[00:29:58] <grrrit-wm>	 (03PS4) 10EBernhardson: Duplicate logstash output to alternate elasticsearch cluster [puppet] - 10https://gerrit.wikimedia.org/r/295442 
[01:02:15] <icinga-wm>	 PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:21:56] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m3 on db1048 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1307.94 seconds
[01:27:46] <icinga-wm>	 RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:51:58] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Replication lag: 0.07 seconds
[02:00:24] <grrrit-wm>	 (03PS1) 10Yurik: (WIP) Notify TileratorUI on new expiry files [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) 
[02:31:23] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.6) (duration: 10m 24s)
[02:31:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:05:49] <logmsgbot>	 !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.7) (duration: 17m 49s)
[03:05:55] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:12:33] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jun 22 03:12:33 UTC 2016 (duration 6m 44s)
[03:12:38] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:29:22] <chasemp>	 !log fix salt key on labtestmetal2001
[04:29:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:47:20] <icinga-wm>	 PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:13:13] <icinga-wm>	 RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:29:08] <grrrit-wm>	 (03PS1) 10KartikMistry: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) 
[05:34:39] <grrrit-wm>	 (03PS1) 10Tim Landscheidt: labstore: Remove redundant calls to lower() for user names [puppet] - 10https://gerrit.wikimedia.org/r/295455 
[05:35:43] <grrrit-wm>	 (03CR) 10Tim Landscheidt: "String arithmetics:" [puppet] - 10https://gerrit.wikimedia.org/r/295455 (owner: 10Tim Landscheidt)
[05:52:07] <grrrit-wm>	 (03PS2) 10KartikMistry: Deploy Compact Language Links as default (Stage 2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295454 (https://phabricator.wikimedia.org/T136677) 
[06:04:48] <icinga-wm>	 PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:36] <icinga-wm>	 RECOVERY - puppet last run on mw2132 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:30:56] <icinga-wm>	 PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: puppet fail
[06:31:27] <icinga-wm>	 PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:27] <icinga-wm>	 PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: puppet fail
[06:31:47] <icinga-wm>	 PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:57] <icinga-wm>	 PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:07] <icinga-wm>	 PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:07] <icinga-wm>	 PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail
[06:32:17] <icinga-wm>	 PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:17] <icinga-wm>	 PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:26] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:48] <icinga-wm>	 PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:08] <icinga-wm>	 PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:27] <icinga-wm>	 PROBLEM - puppet last run on mw2250 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:56] <icinga-wm>	 PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:56:26] <icinga-wm>	 RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:56:38] <icinga-wm>	 RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[06:56:46] <icinga-wm>	 RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:56:57] <icinga-wm>	 RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:57:18] <icinga-wm>	 RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:57:18] <icinga-wm>	 RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:57:27] <icinga-wm>	 RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:57:37] <icinga-wm>	 RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:46] <icinga-wm>	 RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:48] <icinga-wm>	 RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:58:06] <icinga-wm>	 RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:06] <icinga-wm>	 RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:07] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:07] <icinga-wm>	 PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:05:17] <icinga-wm>	 PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]BR
[07:06:23] <moritzm>	 !log restarted hhvm on mw1131
[07:06:28] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:06:46] <icinga-wm>	 PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]BR
[07:08:27] <icinga-wm>	 RECOVERY - Apache HTTP on mw1131 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 627 bytes in 0.237 second response time
[07:08:36] <icinga-wm>	 RECOVERY - HHVM rendering on mw1131 is OK: HTTP OK: HTTP/1.1 200 OK - 67924 bytes in 0.391 second response time
[07:24:58] <icinga-wm>	 ACKNOWLEDGEMENT - Elasticsearch HTTPS on elastic1032 is CRITICAL: Use of uninitialized value sans in concatenation (.) or string at /usr/lib/nagios/plugins/check_ssl line 185. Muehlenhoff Host is in setup, see SAL
[07:26:47] <icinga-wm>	 RECOVERY - puppet last run on mw1260 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:26:58] <icinga-wm>	 PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 136540 MB (3% inode=99%)
[07:30:49] <jynus>	 !log stopping, backing up and reimaging db1061 and db1062
[07:30:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:37:54] <wikibugs>	 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2398200 (10SindyM3) Done :D
[07:51:04] <jynus>	 I am waiting for log rotate to compress 147G file api.log-20160622 on fluorine
[07:56:17] <icinga-wm>	 PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 41 failures
[07:59:01] <moritzm>	 !log rolling restart of hhvm/apache on canary app servers in eqiad for expat security update
[07:59:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:08:27] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (module_deps): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: bgwiki. Query: REPLACE /* DatabaseMysqlBase::replace */ INTO module_deps (md_module,md_skin,md_deps) VALUES (ext.wikimediaBadges,vector
[08:08:50] <jynus>	 grr
[08:08:53] <_joe_>	 lol
[08:09:12] <_joe_>	 I was reading the alert and thinking: jynus commenting in 3,2,...
[08:09:17] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (module_deps): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: commonswiki. Query: REPLACE /* DatabaseMysqlBase::replace */ INTO module_deps (md_module,md_skin,md_deps) VALUES (ext.wikimediaBadges,vector
[08:09:27] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (pagelinks): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: enwiki. Query: INSERT /* LinksUpdate::incrTableUpdate 127.0.0.1 */ IGNORE INTO pagelinks (pl_from,pl_from_namespace,pl_namespace,pl_title) VALUES (50822974,0,10,R_from_ambiguous_page),(50
[08:09:36] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (text): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: huwiki. Query: INSERT /* Revision::insertOn Szilas */ INTO text (old_id,old_text,old_flags) VALUES (NULL,DB://cluster25/2692957,utf-8,gzip,external)
[08:09:37] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (text): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: wikidatawiki. Query: INSERT /* Revision::insertOn ShinePhantom */ INTO text (old_id,old_text,old_flags) VALUES (NULL,DB://cluster24/176847594,utf-8,gzip,external)
[08:09:38] <jynus>	 but why, there is 300GB left!
[08:09:40] <_joe_>	 I can't really help over this network, sorry
[08:09:57] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1001 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1021, Errmsg: Error Disk full (querycache_info): waiting for someone to free some space... (errno: 189 Disk full) on query. Default database: ruwiki. Query: REPLACE /* RecentChangesUpdateJob::{closure} */ INTO querycache_info (qci_type,qci_timestamp) VALUES (activeusers,20160621080106)
[08:10:33] <akosiaris>	 disk full ? with 300GB left ?
[08:11:10] <jynus>	 I mean, I have 7TB used
[08:12:45] <jynus>	 I think it is a toku-db only thing
[08:13:13] <jynus>	 mysql should send an io error otherwise
[08:13:38] <Bsadowski1>	 Uh
[08:13:41] <akosiaris>	 tokudb_fs_reserve_percent  ?
[08:13:49] <akosiaris>	 ah it's 5%
[08:13:54] <akosiaris>	 jynus: spot on
[08:14:16] <jynus>	 another reason to hate toku
[08:14:23] <akosiaris>	 hehe
[08:15:14] <jynus>	 and of course the variable is not hot
[08:15:20] <jynus>	 so it will have to wait
[08:16:03] <jynus>	 it is delayed 24 hours, no problem if it gets delayed 25h
[08:16:37] <icinga-wm>	 RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:16:37] <jynus>	 and I only added 163GB extra
[08:22:04] <jynus>	 nice, and I cannot now login into the management interfaze of db1061
[08:27:27] <icinga-wm>	 RECOVERY - Disk space on fluorine is OK: DISK OK
[08:33:06] <wikibugs>	 06Operations, 10ops-eqiad: db1061 management interface needs physical reset - https://phabricator.wikimedia.org/T138368#2398311 (10jcrespo)
[08:33:44] <wikibugs>	 06Operations, 10ops-eqiad: db1061 management interface needs physical reset - https://phabricator.wikimedia.org/T138368#2398324 (10jcrespo) This is blocking a time-sensitive reimage.
[08:36:00] <wikibugs>	 06Operations, 10ops-eqiad: db1061 and db1062 management interface needs physical reset - https://phabricator.wikimedia.org/T138368#2398326 (10jcrespo)
[08:36:44] <godog>	 akosiaris: thanks for the real_networks etherpad! I'll take a look
[08:41:30] <akosiaris>	 godog: be warned. it's just my thoughts as I tried to capture them in a pad yesterday. I may very well be wrong on some things.
[08:41:59] <icinga-wm>	 RECOVERY - Elasticsearch HTTPS on elastic1032 is OK: SSL OK - Certificate elastic1032.eqiad.wmnet valid until 2021-06-21 08:40:25 +0000 (expires in 1824 days)
[08:45:18] <icinga-wm>	 PROBLEM - HHVM rendering on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:45:55] <godog>	 heheh ok!
[08:46:07] <icinga-wm>	 PROBLEM - Apache HTTP on mw1140 is CRITICAL: HTTP CRITICAL - No data received from host
[08:46:43] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] (WIP) Notify TileratorUI on new expiry files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[08:47:23] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] openldap: enable the memberof overlay [puppet] - 10https://gerrit.wikimedia.org/r/295357 (owner: 10Faidon Liambotis)
[08:47:28] <icinga-wm>	 PROBLEM - dhclient process on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:47:48] <icinga-wm>	 PROBLEM - SSH on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:47:59] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] svc: add graphite LVS addresses [dns] - 10https://gerrit.wikimedia.org/r/289635 (https://phabricator.wikimedia.org/T85451) (owner: 10Filippo Giunchedi)
[08:48:08] <icinga-wm>	 PROBLEM - nutcracker port on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:28] <icinga-wm>	 PROBLEM - nutcracker process on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:47] <icinga-wm>	 PROBLEM - puppet last run on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:57] <icinga-wm>	 PROBLEM - HHVM processes on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:58] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:58] <icinga-wm>	 PROBLEM - configured eth on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:48:58] <icinga-wm>	 PROBLEM - salt-minion processes on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:49:09] <icinga-wm>	 PROBLEM - DPKG on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:49:28] <icinga-wm>	 PROBLEM - Disk space on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:50:18] <icinga-wm>	 RECOVERY - nutcracker port on mw1140 is OK: TCP OK - 0.000 second response time on port 11212
[08:51:29] <grrrit-wm>	 (03PS1) 10Jcrespo: Lower tokudb_fs_reserve_percent to 1% [puppet] - 10https://gerrit.wikimedia.org/r/295457 
[08:53:30] <grrrit-wm>	 (03PS2) 10Jcrespo: Lower tokudb_fs_reserve_percent to 1% [puppet] - 10https://gerrit.wikimedia.org/r/295457 
[08:59:19] <icinga-wm>	 PROBLEM - nutcracker port on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:02:09] <wikibugs>	 06Operations: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2398350 (10siddharth11)
[09:09:35] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Lower tokudb_fs_reserve_percent to 1% [puppet] - 10https://gerrit.wikimedia.org/r/295457 (owner: 10Jcrespo)
[09:12:28] <icinga-wm>	 RECOVERY - configured eth on mw1140 is OK: OK - interfaces up
[09:12:29] <icinga-wm>	 RECOVERY - SSH on mw1140 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[09:13:09] <icinga-wm>	 RECOVERY - dhclient process on mw1140 is OK: PROCS OK: 0 processes with command name dhclient
[09:13:28] <icinga-wm>	 RECOVERY - nutcracker port on mw1140 is OK: TCP OK - 0.000 second response time on port 11212
[09:13:29] <icinga-wm>	 RECOVERY - Apache HTTP on mw1140 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.038 second response time
[09:13:29] <icinga-wm>	 RECOVERY - nutcracker process on mw1140 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[09:13:39] <icinga-wm>	 RECOVERY - DPKG on mw1140 is OK: All packages OK
[09:13:49] <icinga-wm>	 RECOVERY - salt-minion processes on mw1140 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:13:58] <icinga-wm>	 RECOVERY - Disk space on mw1140 is OK: DISK OK
[09:14:08] <icinga-wm>	 RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 59 minutes ago with 0 failures
[09:14:29] <icinga-wm>	 RECOVERY - HHVM processes on mw1140 is OK: PROCS OK: 6 processes with command name hhvm
[09:14:30] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1140 is OK: OK: nf_conntrack is 18 % full
[09:14:38] <icinga-wm>	 RECOVERY - HHVM rendering on mw1140 is OK: HTTP OK: HTTP/1.1 200 OK - 67910 bytes in 0.118 second response time
[09:15:44] <grrrit-wm>	 (03PS3) 10Ema: tlsproxy: enable client/server TFO support in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/295331 (https://phabricator.wikimedia.org/T108827) 
[09:17:00] <grrrit-wm>	 (03CR) 10Ema: [C: 032 V: 032] tlsproxy: enable client/server TFO support in the kernel [puppet] - 10https://gerrit.wikimedia.org/r/295331 (https://phabricator.wikimedia.org/T108827) (owner: 10Ema)
[09:17:44] <ema>	 jynus: looks like there is an unmerged change of yours on palladium (tokudb_fs_reserve_percent)
[09:17:53] <ema>	 can I merge it?
[09:18:08] <jynus>	 I was doing the same
[09:18:17] <jynus>	 please do
[09:18:26] <ema>	 jynus: done :)
[09:19:48] <jynus>	 !log stopping and reconfiguring mysql on dbstore1001
[09:19:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:20:58] <icinga-wm>	 PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: Puppet has 75 failures
[09:21:57] <wikibugs>	 06Operations: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2398350 (10hashar) LDAP accounts are created via https://wikitech.wikimedia.org/  and you seems to already have one: https://wikitech.wikimedia.org/wiki/User:Siddparmar  Your account is neither a member of LDA...
[09:23:06] <grrrit-wm>	 (03CR) 10Muehlenhoff: [C: 031] "Looks good. I tested what's needed to add memberOf attributes for existing group entries: The memberOf attributes on the user accounts are" [puppet] - 10https://gerrit.wikimedia.org/r/295357 (owner: 10Faidon Liambotis)
[09:28:49] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[09:28:49] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[09:28:50] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1001 is OK: OK slave_sql_state Slave_SQL_Running: No, (no error: intentional)
[09:36:51] <wikibugs>	 06Operations, 07Graphite: Grafana login issue for @thiemowmde - https://phabricator.wikimedia.org/T135994#2398393 (10thiemowmde) 05Open>03Resolved a:03thiemowmde I tried again with Chromium and can login, but can't with Firefox. So obviously something Firefox does different (encoding, obviously). Not to...
[09:43:04] <legoktm>	 !log live-hacking on mw1017 to debug T115119
[09:43:05] <stashbot>	 T115119: Implement Schema:ExternalLinkChange - https://phabricator.wikimedia.org/T115119
[09:43:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:44:59] <icinga-wm>	 PROBLEM - HHVM rendering on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:46:18] <moritzm>	 !log rolling restart of restbase in codfw to pick up firejail change in service::node
[09:46:22] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:47:08] <icinga-wm>	 RECOVERY - HHVM rendering on mw1140 is OK: HTTP OK: HTTP/1.1 200 OK - 67903 bytes in 0.819 second response time
[10:07:42] <grrrit-wm>	 (03Abandoned) 10Mobrovac: service::node: Output stdout and stderr seen by firejail to a log file [puppet] - 10https://gerrit.wikimedia.org/r/294499 (owner: 10Mobrovac)
[10:16:09] <icinga-wm>	 PROBLEM - Apache HTTP on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:17:00] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2398559 (10fgiunchedi) switch port configuration wasn't correct (`ge` vs `xe` ports names), I've fixed that and was able to pxe-boot ms-be2022
[10:17:18] <icinga-wm>	 PROBLEM - HHVM rendering on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:17:38] <icinga-wm>	 PROBLEM - SSH on mw1140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:18:55] <moritzm>	 !log rolling restart of restbase in eqiad to pick up firejail change in service::node
[10:18:58] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:19:29] <icinga-wm>	 PROBLEM - HHVM processes on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:29] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:39] <icinga-wm>	 PROBLEM - configured eth on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:23:18] <icinga-wm>	 PROBLEM - DPKG on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:23:20] <icinga-wm>	 PROBLEM - salt-minion processes on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:23:29] <icinga-wm>	 PROBLEM - Disk space on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:23:50] <icinga-wm>	 RECOVERY - HHVM processes on mw1140 is OK: PROCS OK: 6 processes with command name hhvm
[10:25:50] <icinga-wm>	 RECOVERY - Disk space on mw1140 is OK: DISK OK
[10:27:37] <elukey>	 again?
[10:29:18] <wikibugs>	 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#1936600 (10jcrespo) There is indeed a replacement for labsdb100[123] about to arrive. However, there are no short-term plans for these, as they have lower impact.  labsdb10...
[10:29:21] <hashar_>	 lunch &
[10:29:48] <icinga-wm>	 PROBLEM - dhclient process on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:30:00] <elukey>	 going to check mw1140, memory pressure seems to be the cause
[10:31:19] <icinga-wm>	 PROBLEM - HHVM processes on mw1140 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:32:06] <elukey>	 !log mw1140 powercycle after freeze issues due to memory pressure (was not able to ssh to it)
[10:32:10] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:32:20] <icinga-wm>	 PROBLEM - Disk space on mw1140 is CRITICAL: Timeout while attempting connection
[10:33:01] <icinga-wm>	 PROBLEM - nutcracker port on mw1140 is CRITICAL: Timeout while attempting connection
[10:33:11] <icinga-wm>	 PROBLEM - nutcracker process on mw1140 is CRITICAL: Timeout while attempting connection
[10:33:51] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1140 is OK: OK: nf_conntrack is 0 % full
[10:33:52] <icinga-wm>	 RECOVERY - HHVM processes on mw1140 is OK: PROCS OK: 2 processes with command name hhvm
[10:34:01] <icinga-wm>	 RECOVERY - SSH on mw1140 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0)
[10:34:21] <icinga-wm>	 RECOVERY - configured eth on mw1140 is OK: OK - interfaces up
[10:34:30] <icinga-wm>	 RECOVERY - dhclient process on mw1140 is OK: PROCS OK: 0 processes with command name dhclient
[10:34:50] <icinga-wm>	 RECOVERY - nutcracker port on mw1140 is OK: TCP OK - 0.000 second response time on port 11212
[10:35:01] <icinga-wm>	 RECOVERY - nutcracker process on mw1140 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[10:35:11] <icinga-wm>	 RECOVERY - Apache HTTP on mw1140 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.355 second response time
[10:35:30] <icinga-wm>	 RECOVERY - DPKG on mw1140 is OK: All packages OK
[10:35:50] <icinga-wm>	 RECOVERY - salt-minion processes on mw1140 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:35:51] <icinga-wm>	 RECOVERY - Disk space on mw1140 is OK: DISK OK
[10:36:22] <icinga-wm>	 RECOVERY - HHVM rendering on mw1140 is OK: HTTP OK: HTTP/1.1 200 OK - 67903 bytes in 0.595 second response time
[10:37:40] <icinga-wm>	 RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[10:46:37] <godog>	 !log upload libphutil/arcanist 0~git20160620-0wmf1 to carbon
[10:46:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:53:39] <wikibugs>	 06Operations, 06Collaboration-Team-Interested, 10DBA, 10Flow, 07WorkType-Maintenance: Setup separate logical External Store for Flow in production - https://phabricator.wikimedia.org/T107610#2398667 (10jcrespo) I have not forgotten about this, it is on 'Next', blocked on me having proper time (there is n...
[10:59:27] <wikibugs>	 06Operations, 10LDAP-Access-Requests: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2398687 (10Peachey88)
[11:02:13] <wikibugs>	 06Operations, 06Commons, 10media-storage: Update rsvg on the image scalers - https://phabricator.wikimedia.org/T112421#2398690 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[11:15:07] <elukey>	 going afk for lunch, mw128[789] and mw1290 are new api appservers. I can't directly silencing them until they show up in icinga, so if you see some spam please be patient :)
[11:15:09] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: install_server: add prometheus2001 [puppet] - 10https://gerrit.wikimedia.org/r/295470 (https://phabricator.wikimedia.org/T136313) 
[11:16:58] <wikibugs>	 06Operations: Frequent segfaults of rsvg-convert on image scalers - https://phabricator.wikimedia.org/T137876#2398698 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[11:20:18] <wikibugs>	 06Operations, 06Commons, 10Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2398704 (10MoritzMuehlenhoff)
[11:20:34] <wikibugs>	 06Operations, 06Commons, 10Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#1616960 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[11:31:05] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: swift: add ms-be202[2-7] [puppet] - 10https://gerrit.wikimedia.org/r/295472 (https://phabricator.wikimedia.org/T136630) 
[11:31:34] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: add prometheus2001 [puppet] - 10https://gerrit.wikimedia.org/r/295470 (https://phabricator.wikimedia.org/T136313) (owner: 10Filippo Giunchedi)
[11:31:49] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] swift: add ms-be202[2-7] [puppet] - 10https://gerrit.wikimedia.org/r/295472 (https://phabricator.wikimedia.org/T136630) (owner: 10Filippo Giunchedi)
[11:31:56] <grrrit-wm>	 (03CR) 10Gehel: "Puppet compiler looks good https://puppet-compiler.wmflabs.org/3159/" [puppet] - 10https://gerrit.wikimedia.org/r/295369 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[11:34:51] <grrrit-wm>	 (03PS1) 10Gehel: Configuring new elastic1033-1037 servers [puppet] - 10https://gerrit.wikimedia.org/r/295473 (https://phabricator.wikimedia.org/T138329) 
[11:35:37] <grrrit-wm>	 (03CR) 10Gehel: [C: 04-1] "Minor fix to HTTPS required before merging this. Fix coming up right now." [puppet] - 10https://gerrit.wikimedia.org/r/295473 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[11:36:57] <grrrit-wm>	 (03PS2) 10Gehel: Adding missing dependency in exposing puppet SSL certs on elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/295369 (https://phabricator.wikimedia.org/T138329) 
[11:38:31] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Adding missing dependency in exposing puppet SSL certs on elasticsearch [puppet] - 10https://gerrit.wikimedia.org/r/295369 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[11:38:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Puppet has 12 failures
[11:41:31] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#461916 (10MoritzMuehlenhoff) That bug is fixed on the new jessie image scaler using 2.4.16 (tested locally, it's not ye...
[11:41:49] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#2398736 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[11:51:23] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] lvs: rate-limit more ICMP codes, lower to 1/200ms [puppet] - 10https://gerrit.wikimedia.org/r/294467 (https://phabricator.wikimedia.org/T136939) (owner: 10Faidon Liambotis)
[11:54:32] <icinga-wm>	 RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[11:59:10] <grrrit-wm>	 (03PS2) 10Gehel: Configuring new elastic1033-1037 servers [puppet] - 10https://gerrit.wikimedia.org/r/295473 (https://phabricator.wikimedia.org/T138329) 
[12:00:47] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Configuring new elastic1033-1037 servers [puppet] - 10https://gerrit.wikimedia.org/r/295473 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[12:06:27] <gehel>	 !log configuring new elasticsearch servers elastic1033-1037 in eqiad
[12:06:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:14:24] <wikibugs>	 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2398786 (10JeanFred) 05Open>03Resolved Thanks all for this! :)
[12:23:41] <grrrit-wm>	 (03PS6) 10Yuvipanda: ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904 (owner: 10Ladsgroup)
[12:23:47] <grrrit-wm>	 (03CR) 10Yuvipanda: [C: 032 V: 032] ores: fix workers and config [puppet] - 10https://gerrit.wikimedia.org/r/293904 (owner: 10Ladsgroup)
[12:27:16] <grrrit-wm>	 (03PS1) 10Alexandros Kosiaris: ldaplist: Allow searching for more than attribute [puppet] - 10https://gerrit.wikimedia.org/r/295475 
[12:30:25] <grrrit-wm>	 (03PS2) 10Alexandros Kosiaris: ldaplist: Allow searching for more than attribute [puppet] - 10https://gerrit.wikimedia.org/r/295475 
[12:32:01] <grrrit-wm>	 (03PS1) 10Hashar: contint: create /var/lib/jenkins/builds [puppet] - 10https://gerrit.wikimedia.org/r/295477 (https://phabricator.wikimedia.org/T80385) 
[12:33:32] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "@Tim, will this https://gerrit.wikimedia.org/r/295475 serve your needs?" [puppet] - 10https://gerrit.wikimedia.org/r/295198 (https://phabricator.wikimedia.org/T122595) (owner: 10Muehlenhoff)
[12:34:09] <hashar>	 !log T80385 stopping Jenkins and migrating all build records to /var/lib/jenkins/builds
[12:34:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:34:36] <_joe_>	 !log disabling puppet on mw1017, live-hacking it
[12:34:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:35:00] <grrrit-wm>	 (03PS1) 10Urbanecm: Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) 
[12:35:13] <icinga-wm>	 PROBLEM - Apache HTTP on mw1290 is CRITICAL: Connection timed out
[12:35:26] <gehel>	 !log starting reimage of mw1292
[12:35:30] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:35:55] <hashar>	 oh my bad jenkins
[12:36:43] <icinga-wm>	 PROBLEM - puppet last run on mw1290 is CRITICAL: Timeout while attempting connection
[12:37:03] <icinga-wm>	 PROBLEM - salt-minion processes on mw1290 is CRITICAL: Timeout while attempting connection
[12:37:33] <icinga-wm>	 PROBLEM - configured eth on mw1287 is CRITICAL: Timeout while attempting connection
[12:37:33] <icinga-wm>	 PROBLEM - configured eth on mw1288 is CRITICAL: Timeout while attempting connection
[12:37:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1288 is CRITICAL: Connection timed out
[12:37:34] <icinga-wm>	 PROBLEM - Apache HTTP on mw1287 is CRITICAL: Connection timed out
[12:37:43] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "Examples:" [puppet] - 10https://gerrit.wikimedia.org/r/295475 (owner: 10Alexandros Kosiaris)
[12:37:54] <icinga-wm>	 PROBLEM - dhclient process on mw1288 is CRITICAL: Timeout while attempting connection
[12:37:54] <icinga-wm>	 PROBLEM - dhclient process on mw1287 is CRITICAL: Timeout while attempting connection
[12:37:54] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1290 is CRITICAL: Timeout while attempting connection
[12:38:04] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1288 is CRITICAL: Host mw1288 is not in mediawiki-installation dsh group
[12:38:04] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1287 is CRITICAL: Host mw1287 is not in mediawiki-installation dsh group
[12:38:14] <icinga-wm>	 PROBLEM - DPKG on mw1290 is CRITICAL: Timeout while attempting connection
[12:38:33] <icinga-wm>	 PROBLEM - nutcracker port on mw1287 is CRITICAL: Timeout while attempting connection
[12:38:33] <icinga-wm>	 PROBLEM - nutcracker port on mw1288 is CRITICAL: Timeout while attempting connection
[12:38:33] <icinga-wm>	 PROBLEM - Disk space on mw1290 is CRITICAL: Timeout while attempting connection
[12:38:55] <icinga-wm>	 PROBLEM - MD RAID on mw1290 is CRITICAL: Timeout while attempting connection
[12:38:55] <icinga-wm>	 PROBLEM - nutcracker process on mw1287 is CRITICAL: Timeout while attempting connection
[12:38:55] <icinga-wm>	 PROBLEM - nutcracker process on mw1288 is CRITICAL: Timeout while attempting connection
[12:39:09] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] contint: create /var/lib/jenkins/builds [puppet] - 10https://gerrit.wikimedia.org/r/295477 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[12:39:14] <icinga-wm>	 PROBLEM - puppet last run on mw1288 is CRITICAL: Timeout while attempting connection
[12:39:15] <icinga-wm>	 PROBLEM - puppet last run on mw1287 is CRITICAL: Timeout while attempting connection
[12:39:34] <icinga-wm>	 PROBLEM - salt-minion processes on mw1287 is CRITICAL: Timeout while attempting connection
[12:39:34] <icinga-wm>	 PROBLEM - salt-minion processes on mw1288 is CRITICAL: Timeout while attempting connection
[12:39:53] <icinga-wm>	 PROBLEM - configured eth on mw1290 is CRITICAL: Timeout while attempting connection
[12:40:05] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1288 is CRITICAL: Timeout while attempting connection
[12:40:05] <icinga-wm>	 PROBLEM - dhclient process on mw1290 is CRITICAL: Timeout while attempting connection
[12:40:05] <icinga-wm>	 PROBLEM - Check size of conntrack table on mw1287 is CRITICAL: Timeout while attempting connection
[12:40:14] <icinga-wm>	 PROBLEM - mediawiki-installation DSH group on mw1290 is CRITICAL: Host mw1290 is not in mediawiki-installation dsh group
[12:40:24] <icinga-wm>	 PROBLEM - DPKG on mw1288 is CRITICAL: Timeout while attempting connection
[12:40:24] <icinga-wm>	 PROBLEM - DPKG on mw1287 is CRITICAL: Timeout while attempting connection
[12:40:43] <icinga-wm>	 PROBLEM - Disk space on mw1287 is CRITICAL: Timeout while attempting connection
[12:40:44] <icinga-wm>	 PROBLEM - nutcracker port on mw1290 is CRITICAL: Timeout while attempting connection
[12:40:44] <icinga-wm>	 PROBLEM - Disk space on mw1288 is CRITICAL: Timeout while attempting connection
[12:41:03] <icinga-wm>	 PROBLEM - MD RAID on mw1288 is CRITICAL: Timeout while attempting connection
[12:41:03] <icinga-wm>	 PROBLEM - nutcracker process on mw1290 is CRITICAL: Timeout while attempting connection
[12:41:03] <icinga-wm>	 PROBLEM - MD RAID on mw1287 is CRITICAL: Timeout while attempting connection
[12:43:10] <elukey>	 ouch too late
[12:43:16] <paravoid>	 what's all that?
[12:43:16] <elukey>	 sorry this is me
[12:43:45] <icinga-wm>	 PROBLEM - jenkins_service_running on gallium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[12:43:45] <icinga-wm>	 PROBLEM - jenkins_zmq_publisher on gallium is CRITICAL: Connection refused
[12:43:50] <elukey>	 new appservers, I didn't pay attention to icinga for 10 mins
[12:43:52] <elukey>	 and they appeared
[12:43:55] <elukey>	 silencing
[12:43:56] <elukey>	 sorry
[12:44:08] <grrrit-wm>	 (03PS3) 10Alexandros Kosiaris: ldaplist: Allow searching for more than attribute [puppet] - 10https://gerrit.wikimedia.org/r/295475 
[12:44:21] * gehel is hopefully not going to do the same in the next 5 minutes
[12:44:35] <Urbanecm>	 Hi, whats with Jenkins? See https://gerrit.wikimedia.org/r/#/c/295478/ ...
[12:44:47] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "I 'll remove the default substring matches. They cause more bugs than necessary. Perhaps adding them on an per attribute basis makes more " [puppet] - 10https://gerrit.wikimedia.org/r/295475 (owner: 10Alexandros Kosiaris)
[12:45:10] <gehel>	 Urbanecm: I think hashar is already on it...
[12:45:24] <hashar>	 Urbanecm: I have shut it down to move a bunch of files
[12:45:28] <hashar>	 will bring it back soonish
[12:45:50] <elukey>	 gehel: the funny thing is that I paid attention to icinga until 10/15 minutes ago
[12:45:53] <elukey>	 stepped out for a second
[12:45:53] <Urbanecm>	 Thx. Will it find my new patch and verify it?
[12:45:56] <elukey>	 alarms
[12:46:45] <gehel>	 Urbanecm: not sure, worst case add a comment with "recheck" and Jenkins should pick it up again
[12:47:02] <Urbanecm>	 Okay. 
[12:49:04] <hashar>	 !log T80385 Restarting Jenkins with builds dir set to "${JENKINS_HOME}/builds/${ITEM_FULL_NAME}" which is /var/lib/jenkins/builds/XXX
[12:49:09] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:50:32] <hashar>	 INFO: npm-node-4.3 #17374 main build action completed: SUCCESS
[12:50:37] <hashar>	 looks like some build pass :)
[12:50:45] <icinga-wm>	 RECOVERY - jenkins_service_running on gallium is OK: PROCS OK: 1 process with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[12:50:46] <hashar>	 Urbanecm: Jenkins is back, it will catch up
[12:50:53] <icinga-wm>	 RECOVERY - jenkins_zmq_publisher on gallium is OK: TCP OK - 0.000 second response time on port 8888
[12:51:05] <hashar>	 the events are held in Zuul for which you can get an idea of the builds at https://integration.wikimedia.org/zuul/
[12:51:13] <hashar>	 now that Jenkins is back jobs are running again
[12:51:37] <Urbanecm>	 Ok
[12:52:21] <grrrit-wm>	 (03CR) 10Steinsplitter: [C: 031] Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[12:54:55] <James_F>	 hashar: Argh, restarting Jenkins again?
[12:55:04] <hashar>	 yeah stopping it again sorry
[12:55:25] * James_F is just eager to see the latest sync of code to Beta Cluster.
[12:55:27] <James_F>	 :-)
[12:57:14] <icinga-wm>	 PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Puppet has 1 failures
[12:58:54] <grrrit-wm>	 (03PS2) 10Rush: labstore: Remove redundant calls to lower() for user names [puppet] - 10https://gerrit.wikimedia.org/r/295455 (owner: 10Tim Landscheidt)
[13:02:16] <hashar>	 almost done
[13:02:24] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[13:02:53] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[13:03:16] <hashar>	 !log Manually moved some missing build records. Restarting Jenkins
[13:03:20] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:05:36] <grrrit-wm>	 (03CR) 10Rush: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/295455 (owner: 10Tim Landscheidt)
[13:06:05] <icinga-wm>	 RECOVERY - Apache HTTP on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 11378 bytes in 0.002 second response time
[13:07:19] <grrrit-wm>	 (03CR) 10Rush: [C: 032] "thanks Tim, seems seems fine" [puppet] - 10https://gerrit.wikimedia.org/r/295455 (owner: 10Tim Landscheidt)
[13:07:41] <hashar>	 James_F: Jenkins should be all back
[13:07:54] <chasemp>	 hashar: I caught something of yours on merge
[13:07:55] <chasemp>	 Antoine Musso: contint: create /var/lib/jenkins/builds
[13:07:58] <chasemp>	 is this ok?
[13:08:05] <James_F>	 hashar: Thank you!
[13:08:09] <hashar>	 chasemp: yes
[13:08:28] <hashar>	 chasemp: I have manually created it on the host (gallium)   Alexandros merged that change 
[13:08:55] <chasemp>	 cool no biggie just convention to double chek
[13:08:57] <hashar>	 the puppet change should just be about creating a directory
[13:08:58] <chasemp>	 check
[13:09:05] <hashar>	 yeah better to double check :-}
[13:09:24] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[13:09:54] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[13:10:33] <icinga-wm>	 RECOVERY - salt-minion processes on mw1287 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:10:44] <icinga-wm>	 RECOVERY - configured eth on mw1287 is OK: OK - interfaces up
[13:10:59] <grrrit-wm>	 (03CR) 10Hashar: [C: 031] "I have updated update the Jenkins configuration to save the build record under /var/lib/jenkins/builds and have migrated all existing reco" [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:11:04] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1287 is OK: OK: nf_conntrack is 0 % full
[13:11:14] <icinga-wm>	 RECOVERY - dhclient process on mw1287 is OK: PROCS OK: 0 processes with command name dhclient
[13:11:35] <icinga-wm>	 RECOVERY - Disk space on mw1287 is OK: DISK OK
[13:11:44] <icinga-wm>	 RECOVERY - nutcracker port on mw1287 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[13:11:51] <hashar>	 and Icinga checks for gallium are all green
[13:12:04] <icinga-wm>	 RECOVERY - MD RAID on mw1287 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[13:12:15] <icinga-wm>	 RECOVERY - nutcracker process on mw1287 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[13:13:18] <grrrit-wm>	 (03PS2) 10Hashar: Enable backup for gallium [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[13:14:10] <grrrit-wm>	 (03CR) 10Hashar: "I have rebased this change on top of https://gerrit.wikimedia.org/r/#/c/295255/ which add an exclude rule to prevent backing up the Jenkin" [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[13:15:44] <icinga-wm>	 RECOVERY - Apache HTTP on mw1288 is OK: HTTP OK: HTTP/1.1 200 OK - 11378 bytes in 0.027 second response time
[13:16:09] <hashar>	 akosiaris: moritzm: I got the Jenkins build history migrated \O/  Would need to add the exclude rule in bacula director then enable the backup system whenever one can monitor its actions
[13:16:14] <icinga-wm>	 RECOVERY - DPKG on mw1287 is OK: All packages OK
[13:17:29] <grrrit-wm>	 (03PS1) 10Gehel: Configuring new elastic1038-1042 servers [puppet] - 10https://gerrit.wikimedia.org/r/295490 (https://phabricator.wikimedia.org/T138329) 
[13:17:43] <grrrit-wm>	 (03CR) 10Muehlenhoff: "We still have the discrepancy between the retention in Jenkins (15/30 days) compared to the backup retention period (60 days).@Alex, is th" [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:17:54] <icinga-wm>	 RECOVERY - Apache HTTP on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 11378 bytes in 0.011 second response time
[13:19:24] <icinga-wm>	 RECOVERY - nutcracker process on mw1288 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[13:19:54] <icinga-wm>	 RECOVERY - salt-minion processes on mw1288 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:20:14] <icinga-wm>	 RECOVERY - configured eth on mw1288 is OK: OK - interfaces up
[13:20:34] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1288 is OK: OK: nf_conntrack is 0 % full
[13:20:44] <icinga-wm>	 RECOVERY - dhclient process on mw1288 is OK: PROCS OK: 0 processes with command name dhclient
[13:20:46] <grrrit-wm>	 (03CR) 10Hashar: "The 15/30 days Jenkins retentions are for the build records / artifacts etc that are in /var/lib/jenkins/builds . Given this change excl" [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:21:05] <icinga-wm>	 RECOVERY - Disk space on mw1288 is OK: DISK OK
[13:21:14] <icinga-wm>	 RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[13:21:15] <icinga-wm>	 RECOVERY - nutcracker port on mw1288 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[13:21:25] <icinga-wm>	 RECOVERY - MD RAID on mw1288 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[13:22:04] <icinga-wm>	 RECOVERY - salt-minion processes on mw1290 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:22:44] <icinga-wm>	 RECOVERY - configured eth on mw1290 is OK: OK - interfaces up
[13:22:54] <icinga-wm>	 RECOVERY - dhclient process on mw1290 is OK: PROCS OK: 0 processes with command name dhclient
[13:23:03] <icinga-wm>	 RECOVERY - Check size of conntrack table on mw1290 is OK: OK: nf_conntrack is 0 % full
[13:23:24] <icinga-wm>	 RECOVERY - nutcracker port on mw1290 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 11212
[13:23:34] <icinga-wm>	 RECOVERY - Disk space on mw1290 is OK: DISK OK
[13:23:44] <icinga-wm>	 RECOVERY - nutcracker process on mw1290 is OK: PROCS OK: 1 process with UID = 110 (nutcracker), command name nutcracker
[13:24:04] <icinga-wm>	 RECOVERY - MD RAID on mw1290 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[13:24:10] <elukey>	 these are the appservers after the first puppet run, sorry again for the spam
[13:25:34] <icinga-wm>	 RECOVERY - DPKG on mw1288 is OK: All packages OK
[13:25:34] <icinga-wm>	 RECOVERY - DPKG on mw1290 is OK: All packages OK
[13:29:21] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: swift: align partition to 1M boundary [puppet] - 10https://gerrit.wikimedia.org/r/295492 
[13:31:34] <gehel>	 !log configuring new elasticsearch servers elastic1038-1042 in eqiad
[13:31:39] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:31:48] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Configuring new elastic1038-1042 servers [puppet] - 10https://gerrit.wikimedia.org/r/295490 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[13:34:53] <icinga-wm>	 RECOVERY - puppet last run on mw1287 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[13:35:39] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2399018 (10fgiunchedi) @papaul the two ssd were in raid1, was it the default configuration? I'm asking because in this case we need all disks in raid0, this is what I...
[13:41:29] <grrrit-wm>	 (03PS1) 10Giuseppe Lavagetto: role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) 
[13:42:57] <godog>	 !log add 500G to fluorine /a (almost full)
[13:43:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:45:23] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: SVG rendering with marker-element is different between librsvg and Inkscape - https://phabricator.wikimedia.org/T97758#2399052 (10MoritzMuehlenhoff)
[13:45:35] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 07Upstream: SVG rendering with marker-element is different between librsvg and Inkscape - https://phabricator.wikimedia.org/T97758#1251624 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[13:46:23] <hashar>	 godog: we would want to one day revisit what we collect on fluorine :D   api.log is already 45GBytes large ..
[13:50:25] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2399101 (10MoritzMuehlenhoff)
[13:51:14] <icinga-wm>	 RECOVERY - puppet last run on mw1288 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[13:51:31] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2295971 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff
[13:51:37] <godog>	 hashar: heh indeed, there's already 90d retention but it keeps slowly growing too
[13:53:45] <hashar>	 godog: ah yeah Jenkins has its own 90 days history of all configuration changes
[13:54:28] <hashar>	 so in theory if we look at a 60 days old backup from bacula we could get the config from 150 days ago..
[13:56:26] <grrrit-wm>	 (03PS1) 10Elukey: Add new MediaWiki appservers to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295497 
[13:56:38] <grrrit-wm>	 (03PS1) 10Muehlenhoff: Add Amiri font to the scalers [puppet] - 10https://gerrit.wikimedia.org/r/295498 (https://phabricator.wikimedia.org/T135347) 
[13:57:15] <grrrit-wm>	 (03PS2) 10Giuseppe Lavagetto: role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) 
[13:57:53] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032] "We can safely ignore the retention disrepancy. It's a common pattern." [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:58:00] <grrrit-wm>	 (03PS2) 10Alexandros Kosiaris: contint: do not backup Jenkins build history [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:58:03] <grrrit-wm>	 (03CR) 10Eevans: [C: 031] "My Puppet-fu is weak, but this LGTM." [puppet] - 10https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) (owner: 10Nicko)
[13:58:14] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [V: 032] contint: do not backup Jenkins build history [puppet] - 10https://gerrit.wikimedia.org/r/295255 (https://phabricator.wikimedia.org/T80385) (owner: 10Hashar)
[13:59:11] <grrrit-wm>	 (03PS3) 10Alexandros Kosiaris: Enable backup for gallium [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[13:59:18] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Enable backup for gallium [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[13:59:56] <grrrit-wm>	 (03CR) 10Gehel: "lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/295497 (owner: 10Elukey)
[14:00:05] <grrrit-wm>	 (03CR) 10Gehel: [C: 031] Add new MediaWiki appservers to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295497 (owner: 10Elukey)
[14:00:21] <grrrit-wm>	 (03PS2) 10Elukey: Add new MediaWiki appservers to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295497 
[14:02:43] <grrrit-wm>	 (03CR) 10BBlack: role::cache::text: handle url shortener requests (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[14:03:11] <grrrit-wm>	 (03CR) 10Elukey: [C: 032 V: 032] Add new MediaWiki appservers to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295497 (owner: 10Elukey)
[14:04:13] <moritzm>	 !log rolling restart of hhvm/apache on app servers in eqiad for expat security update
[14:04:18] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:05:25] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: install_server: add partman recipe for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/295499 
[14:08:20] <grrrit-wm>	 (03PS2) 10Filippo Giunchedi: install_server: add partman recipe for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/295499 
[14:08:27] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: add partman recipe for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/295499 (owner: 10Filippo Giunchedi)
[14:10:25] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 13Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2399192 (10MoritzMuehlenhoff) @Uwe_a: I have prepared a patch to install that font on the Wikimedia servers. Do you have a test case SVG which would visually improve if...
[14:13:16] <grrrit-wm>	 (03PS3) 10Legoktm: role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[14:13:19] <grrrit-wm>	 (03CR) 10Legoktm: role::cache::text: handle url shortener requests (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[14:20:02] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "2000 OK estimate files=800,331 bytes=31,458,895,274" [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[14:27:20] <grrrit-wm>	 (03CR) 10BBlack: [C: 031] role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[14:28:13] <icinga-wm>	 RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[14:29:33] <icinga-wm>	 RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[14:29:44] <tgr>	 !log running https://phabricator.wikimedia.org/diffusion/ECAU/browse/master/maintenance/checkLocalUser.php for some users T119736
[14:29:45] <stashbot>	 T119736: Could not find local user data for {Username}@{wiki} - https://phabricator.wikimedia.org/T119736
[14:29:48] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:30:24] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2399328 (10Papaul) @fgiunchedi yes the default was raid1 i can but that in raid 0 like the other disks
[14:32:32] <jynus>	 !log checksumming m1 databases in preparation for failover
[14:32:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:40:43] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1288 is OK: OK
[14:40:43] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1287 is OK: OK
[14:43:04] <icinga-wm>	 RECOVERY - mediawiki-installation DSH group on mw1290 is OK: OK
[14:49:37] <grrrit-wm>	 (03PS5) 10EBernhardson: Duplicate logstash output to alternate elasticsearch cluster [puppet] - 10https://gerrit.wikimedia.org/r/295442 
[15:00:04] <jouncebot>	 anomie, ostriches, thcipriani, marktraceur, and Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160622T1500). Please do the needful.
[15:00:04] <jouncebot>	 dapatrick and Urbanecm: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:28] <Urbanecm>	 Around
[15:00:32] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 06WMF-Legal, and 2 others: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#1694145 (10Mpaulson) Has this been adjusted so that it deletes the logs after 30 days?
[15:01:14] <thcipriani>	 I can SWAT today.
[15:01:27] <elukey>	 !log rebooting bohrium.eqiad.wmnet (running piwik) for kernel upgrades
[15:01:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:02:03] <thcipriani>	 dapatrick: is there a backport for https://gerrit.wikimedia.org/r/#/c/295191/ ?
[15:02:45] <dapatrick>	 thcipriani: Shoot. No, I didn't do that.
[15:03:04] <thcipriani>	 I can do it, is it just for wmf.6?
[15:03:31] <dapatrick>	 I believe so, yes. What versions are currently in use?
[15:04:23] <thcipriani>	 wmf.7 just made it to testwiki, not sure if this made it in before the cut https://noc.wikimedia.org/conf/
[15:05:50] <thcipriani>	 hmm, doesn't look like it made it in
[15:06:10] <dapatrick>	 Nope. It was merged after the branch was cut.
[15:06:27] <thcipriani>	 ack. kk, so backport for wmf.6 and wmf.7?
[15:07:54] <dapatrick>	 Yes, please. Thanks! Sorry I didn't cherry pick those myself.
[15:07:55] <thcipriani>	 dapatrick: could you check me on these, please: https://gerrit.wikimedia.org/r/#/c/295510/ https://gerrit.wikimedia.org/r/#/c/295511/
[15:08:00] <thcipriani>	 np :)
[15:08:28] <grrrit-wm>	 (03PS2) 10Thcipriani: Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[15:09:30] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[15:09:55] <dapatrick>	 thcipriani: Those look good.
[15:10:06] <thcipriani>	 dapatrick: cool, thanks
[15:10:09] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[15:13:19] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:295478|Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains]] (duration: 00m 54s)
[15:13:34] <thcipriani>	 ^ Urbanecm check please
[15:13:35] <wikibugs>	 06Operations, 10ops-eqiad: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2399490 (10Cmjohnson)
[15:14:12] <Urbanecm>	 I have no access to any tool which use this whitelist so I have to ask the author of the request on phab. 
[15:14:51] <thcipriani>	 Urbanecm: okie doke, well, it's live now :)
[15:14:59] <grrrit-wm>	 (03CR) 10Dereckson: Add www.wpc.ncep.noaa.gov to wgCopyUploadsDomains (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[15:15:18] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: install_server: smaller root for single-disk /srv [puppet] - 10https://gerrit.wikimedia.org/r/295513 
[15:17:01] <grrrit-wm>	 (03PS1) 10Dereckson: Improve style [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295514 
[15:17:22] <grrrit-wm>	 (03CR) 10Dereckson: "Follow-up: I83bb5af83df49c4243f6bd68002c62b76afc0226" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295478 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[15:17:55] <Dereckson>	 Hi, may I suggest to merge https://gerrit.wikimedia.org/r/#/c/295514/ too to fix the comma issue for 295478?
[15:18:00] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.7/extensions/OATHAuth: SWAT: [[gerrit:295511|Fixup qrcode-generating js, to stop race condition.]] (duration: 00m 27s)
[15:18:05] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:18:08] <thcipriani>	 ^ dapatrick check please
[15:18:22] <dapatrick>	 Checking.
[15:19:03] <Dereckson>	 Urbanecm: the goal of this trailing comma is the next diff only touches the line you add, not the line before, so `git blame` is more accurate
[15:19:26] <thcipriani>	 Dereckson: sure, thank you :)
[15:20:26] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295514 (owner: 10Dereckson)
[15:20:28] <Dereckson>	 You're welcome.
[15:20:56] <Urbanecm>	 Dereckson: Thx for the patch. I just noticed your comment in the task on phab so I'm going to create patch which whitelist *.noaa.gov instead of only one domain. 
[15:21:11] <grrrit-wm>	 (03Merged) 10jenkins-bot: Improve style [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295514 (owner: 10Dereckson)
[15:21:28] <Dereckson>	 Urbanecm: I noticed * only replaces one subdomain
[15:21:48] <dapatrick>	 thcipriani: Weird. This looks like old code on enwiki.
[15:21:49] <Dereckson>	 so *.nooa.gov would be for quux.nooa.gov but not for www.quux.nooa.gov nor www.alpha.beta.noaa.gov:(
[15:22:12] <Urbanecm>	 And do you know how to whitelist all subdomains of nooa.gov?
[15:22:27] <thcipriani>	 dapatrick: ah, enwiki is on wmf.6, I just sync'd wmf.7 so far
[15:22:33] <Dereckson>	 With 3 entries: *.nooa.gov *.*.nooa.gov *.*.*.nooa.gov
[15:22:39] <Dereckson>	 But I'm not sure it's a really good idea.
[15:22:49] <Dereckson>	 If we do that, that means we trust EVERY of their server.
[15:23:10] <Dereckson>	 This whitelist is restricted to avoid DDoS from the Wikimedia Cluster to their network, or from their network to our cluster
[15:23:12] <thcipriani>	 dapatrick: possible to test on any group0 wikis, or should I roll forward with wmf.7?
[15:23:13] <logmsgbot>	 !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:295514|Improve style]] (duration: 00m 33s)
[15:23:17] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:23:21] <thcipriani>	 ^ Dereckson sync'd
[15:23:26] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2399537 (10ema) A few more things tools-wise.  TFO support has been added to curl in version 7.49.0: [[ https://curl.haxx.se/changes.html#7_49_0 | --tcp-fastopen]]. Unfortunately De...
[15:23:28] <Dereckson>	 ack'd
[15:23:37] <Urbanecm>	 I think that at nooa.gov should be only content created by them so it should be all in public domain. 
[15:23:40] <thcipriani>	 dapatrick: er...roll forward with wmf.6
[15:23:41] <Dereckson>	 We don't know how NOAA organizes their server (one datacenter? a lot?)
[15:24:06] <Urbanecm>	 So should we whitelist only the domain which was mentoined in the request?
[15:24:07] <Dereckson>	 Yes, but it's an operations and security concern here, as you would increase our surface attack
[15:24:11] <dapatrick>	 thcipriani: Ah, yes, go ahead with wmf.6.
[15:24:17] <thcipriani>	 dapatrick: ack.
[15:24:34] <Urbanecm>	 And if we add only *.*.nooa.gov? All mentoined domains will match this filter I think. 
[15:24:37] <Dereckson>	 I confirm NOAA content is mostyl PD-Gov, so there isn't any concern for the licensing
[15:25:18] <Dereckson>	 but imagine they have something.nooa.gov in another network from their own, you will also whitelist this one.
[15:25:39] <Dereckson>	 I'd suggest here to be conservative and only whitelist the needed domains;
[15:25:56] <Dereckson>	 you could reach csteipp for a second opinion if you find information about how NOAA manages their network sources.
[15:25:58] <Urbanecm>	 And should I add the domain that you've mentoined in the request?
[15:25:59] <Dereckson>	 resources
[15:26:27] <Dereckson>	 Perhaps, but let's ask Fae first.
[15:26:45] <Urbanecm>	 In the task?
[15:26:45] <logmsgbot>	 !log thcipriani@tin Synchronized php-1.28.0-wmf.6/extensions/OATHAuth: SWAT: [[gerrit:295510|Fixup qrcode-generating js, to stop race condition.]] (duration: 00m 33s)
[15:26:48] <grrrit-wm>	 (03PS1) 10Gehel: Configuring new elastic1043-1047 servers [puppet] - 10https://gerrit.wikimedia.org/r/295524 (https://phabricator.wikimedia.org/T138329) 
[15:26:49] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:26:49] <thcipriani>	 ^ dapatrick check please
[15:28:46] <Dereckson>	 Urbanecm: I pinged it on IRC, but not sure they are online.
[15:29:15] <grrrit-wm>	 (03PS1) 10Gehel: Add new MediaWiki appserver to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295525 
[15:29:17] <Urbanecm>	 Nobody with this nick is in this channel.
[15:29:29] <Dereckson>	 #wikimedia-commons
[15:30:33] <Dereckson>	 By the way, I tested an upload by url, that works for www.wpc.ncep.noaa.gov.
[15:31:14] <Urbanecm>	 Thx
[15:31:19] <grrrit-wm>	 (03CR) 10Elukey: [C: 031] Add new MediaWiki appserver to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295525 (owner: 10Gehel)
[15:31:35] <Dereckson>	 I would like to thank you to take care of these requests so quickly. That's appreciated.
[15:32:27] <dapatrick>	 thcipriani: Working as expected. Thanks!
[15:32:40] <thcipriani>	 dapatrick: awesome, thanks for checking!
[15:33:02] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Add new MediaWiki appserver to the scap DSH list. [puppet] - 10https://gerrit.wikimedia.org/r/295525 (owner: 10Gehel)
[15:39:20] <icinga-wm>	 PROBLEM - swift-object-auditor on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[15:39:31] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[15:39:40] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[15:39:41] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[15:40:00] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[15:40:11] <icinga-wm>	 PROBLEM - swift-container-auditor on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:40:11] <icinga-wm>	 PROBLEM - swift-account-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[15:40:11] <icinga-wm>	 PROBLEM - swift-object-updater on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[15:40:41] <icinga-wm>	 PROBLEM - swift-container-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[15:40:41] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[15:40:58] <godog>	 err, that's me ^ apologies
[15:41:11] <icinga-wm>	 PROBLEM - swift-container-updater on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[15:41:21] <icinga-wm>	 PROBLEM - swift-container-replicator on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[15:41:31] <icinga-wm>	 RECOVERY - swift-object-auditor on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[15:41:50] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[15:41:52] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[15:42:01] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2022 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[15:42:21] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[15:42:31] <icinga-wm>	 RECOVERY - swift-container-auditor on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[15:42:31] <icinga-wm>	 RECOVERY - swift-account-server on ms-be2022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[15:42:31] <icinga-wm>	 RECOVERY - swift-object-updater on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[15:43:01] <icinga-wm>	 RECOVERY - swift-container-server on ms-be2022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[15:43:01] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[15:43:31] <icinga-wm>	 RECOVERY - swift-container-updater on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[15:43:41] <icinga-wm>	 RECOVERY - swift-container-replicator on ms-be2022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[15:48:35] <wikibugs>	 06Operations, 10ops-eqiad, 06DC-Ops: dbstore1001 management interface has saturated the number of available ssh connections - https://phabricator.wikimedia.org/T126227#2399596 (10Cmjohnson) 05Open>03Resolved Fixed
[15:49:10] <wikibugs>	 06Operations, 10ops-eqiad: db1061 and db1062 management interface needs physical reset - https://phabricator.wikimedia.org/T138368#2399599 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Fixed
[15:49:21] <grrrit-wm>	 (03PS1) 10Filippo Giunchedi: install_server: separate /srv for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/295532 
[15:50:51] <grrrit-wm>	 (03CR) 10Filippo Giunchedi: [C: 032 V: 032] install_server: separate /srv for prometheus [puppet] - 10https://gerrit.wikimedia.org/r/295532 (owner: 10Filippo Giunchedi)
[15:51:26] <wikibugs>	 06Operations, 10ops-eqiad: eqiad: Install ssds to labmon1001 - https://phabricator.wikimedia.org/T138415#2399605 (10Krenair)
[15:51:42] <wikibugs>	 06Operations, 10ops-eqiad: eqiad: Install ssds to labmon1001 - https://phabricator.wikimedia.org/T138415#2399608 (10Krenair)
[15:54:47] <wikibugs>	 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2399627 (10Cmjohnson) I cannot rack this in A5, it is a 10G rack.
[15:56:13] <wikibugs>	 06Operations, 10LDAP-Access-Requests: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2398350 (10Krenair) You don't appear on https://wikimediafoundation.org/wiki/Staff_and_contractors nor a couple of other pages I checked... Do you have a " (WMF)" SUL account or somet...
[15:58:02] <wikibugs>	 06Operations, 10ops-eqiad: mw1063 broken - https://phabricator.wikimedia.org/T137381#2399632 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson mw1063 has been decommissioned per T129060
[15:58:54] <grrrit-wm>	 (03PS2) 10Gehel: Configuring new elastic1043-1047 servers [puppet] - 10https://gerrit.wikimedia.org/r/295524 (https://phabricator.wikimedia.org/T138329) 
[16:00:42] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Configuring new elastic1043-1047 servers [puppet] - 10https://gerrit.wikimedia.org/r/295524 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[16:05:07] <grrrit-wm>	 (03PS1) 10Gehel: Adding rack location of new elasticsearch servers [puppet] - 10https://gerrit.wikimedia.org/r/295536 (https://phabricator.wikimedia.org/T138329) 
[16:06:38] <wikibugs>	 06Operations, 10ops-eqiad, 10Analytics-Cluster, 06Analytics-Kanban: analytics1049.eqiad.wmnet disk failure - https://phabricator.wikimedia.org/T137273#2399672 (10Cmjohnson) Disk has been ordered.
[16:06:44] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Adding rack location of new elasticsearch servers [puppet] - 10https://gerrit.wikimedia.org/r/295536 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[16:09:23] <grrrit-wm>	 (03PS1) 10Gehel: Fixed missing location of elastic1045 [puppet] - 10https://gerrit.wikimedia.org/r/295537 (https://phabricator.wikimedia.org/T138329) 
[16:10:06] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: consolidate net tuning and mysterious values [puppet] - 10https://gerrit.wikimedia.org/r/295538 
[16:10:08] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: un-mysterious netdev_max_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295539 
[16:10:10] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: un-mysterious somaxconn + syn_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295540 
[16:10:12] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: un-mysterious the rest [puppet] - 10https://gerrit.wikimedia.org/r/295541 
[16:10:14] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: disable prequeue timestamps [puppet] - 10https://gerrit.wikimedia.org/r/295542 
[16:10:16] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: raise netdev_budget a bit [puppet] - 10https://gerrit.wikimedia.org/r/295543 
[16:10:18] <grrrit-wm>	 (03PS1) 10BBlack: LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 
[16:10:40] <grrrit-wm>	 (03CR) 10Gehel: [C: 032] Fixed missing location of elastic1045 [puppet] - 10https://gerrit.wikimedia.org/r/295537 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[16:13:29] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#1970940 (10Krinkle) How does the cache ttl of Varnish interact with the concept of 304 renewals?  I remember in the past we often had bugs where a cache object had expired (but not ye...
[16:13:34] <Krinkle>	 ori_: bblack: ^
[16:16:50] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 (owner: 10BBlack)
[16:16:58] <wikibugs>	 06Operations, 10ops-eqiad: db1009 degraded RAID (failed disk) - https://phabricator.wikimedia.org/T138203#2393080 (10Cmjohnson) Disk swapped rebuilding  root@db1009:~# megacli -PDList -aALL |grep "Firmware state:" Firmware state: Online, Spun Up Firmware state: Online, Spun Up Firmware state: Online, Spun Up F...
[16:17:27] <wikibugs>	 06Operations, 10LDAP-Access-Requests: LDAP Account required for Transparency Report - https://phabricator.wikimedia.org/T138369#2399740 (10siddharth11) I've just recently joined WMF, that too for couple of months. My main work is to collaborate with the Legal team comprising of Aeryn Palmer and James Buatti to...
[16:18:13] <grrrit-wm>	 (03CR) 10Krinkle: Only mirror refs/heads/ and refs/tags/ for mw core and operations/puppet (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox)
[16:19:43] <grrrit-wm>	 (03CR) 10Paladox: Only mirror refs/heads/ and refs/tags/ for mw core and operations/puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox)
[16:19:50] <paladox>	 Krinkle ^^
[16:20:30] <gehel>	 !log new elasticsearch servers elastic1032-1047 are configured and have joined the eqiad cluster
[16:20:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:20:48] <Krinkle>	 paladox: What did you test? Did you test having a repository in Phab and on GitHub and seeing it replicate correctly?
[16:20:57] <paladox>	 Krinkle yes
[16:21:15] <Krinkle>	 How does it now to connect MW with github/wikimedia/mediawiki ?
[16:21:16] <paladox>	 I tested on my local install, using a imported repo from gerrit which was mw core.
[16:21:37] <paladox>	 Krinkle since mw core is big it wont push.
[16:21:51] <paladox>	 I asked github and they said they do have a memory limit or a limit.
[16:22:18] <paladox>	 mw core now includes refs/changes/ same for operations/puppet and they have so much they wont push.
[16:22:22] <Krinkle>	 How does Phabricator convert repo name "MW (mediawiki)" into url https://github.com/wikimedia/mediawiki and know to use that configuration?
[16:22:42] <Krinkle>	 mw-core isn't replicated from PHabricator right now, it's from Gerrit, right?
[16:22:46] <paladox>	 Krinkle since in the mirror uri it does git push --mirror 
[16:22:50] <Krinkle>	 Which only pushes branches and tags.
[16:22:57] <paladox>	 and yes it is being replicated from phabricator
[16:23:17] <paladox>	 But dosent work due to refs/changes/ which has so many refs
[16:23:42] <paladox>	 Krinkle https://phabricator.wikimedia.org/diffusion/MW/uri/view/4/
[16:24:05] <grrrit-wm>	 (03PS1) 10Yurik: Maps: Limit query exec time for kartotherian user [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) 
[16:24:06] <paladox>	 it does git push --mirror https://github.com/wikimedia/mediawiki <!-- and grabs the credentials too.
[16:28:52] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2399786 (10BBlack) Varnish 3 and 4 may differ a bit on 304 basics, and Varnish 4 clearly does a better job of managing grace-mode in general, and using it for 304-refreshes, and my cu...
[16:30:25] <wikibugs>	 06Operations, 10Traffic, 13Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2399812 (10BBlack) I should have noted above: our current maximum grace is 1 hour beyond whatever the TTL is.  Basically we're really not using grace very effectively today, but it's...
[16:30:41] <grrrit-wm>	 (03PS2) 10Paladox: Only mirror refs/heads/ and refs/tags/ for mw core and operations/puppet [puppet] - 10https://gerrit.wikimedia.org/r/295011 
[16:32:09] <grrrit-wm>	 (03PS2) 10BBlack: LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 
[16:32:54] <paladox>	 Krinkle we would need to manually add refs/changes/ for those two repos on github which is the alternative to the patch. And plus we are switching on the github mirrors in phabricator now.
[16:32:58] <paladox>	 slowly.
[16:33:33] <grrrit-wm>	 (03CR) 10Paladox: Only mirror refs/heads/ and refs/tags/ for mw core and operations/puppet (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox)
[16:35:56] <wikibugs>	 06Operations, 10media-storage: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918#2399821 (10fgiunchedi) I've started provisioning the `swift` user before puppet on the new swift hardware, since there might be a few packages already installed either post-pr...
[16:38:11] <grrrit-wm>	 (03PS3) 10BBlack: LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 
[16:38:13] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: raise netdev_budget a bit [puppet] - 10https://gerrit.wikimedia.org/r/295543 
[16:38:15] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: disable prequeue timestamps [puppet] - 10https://gerrit.wikimedia.org/r/295542 
[16:38:17] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: un-mysterious the rest [puppet] - 10https://gerrit.wikimedia.org/r/295541 
[16:44:57] <wikibugs>	 06Operations, 06Commons, 10Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2399856 (10Amitie_10g) I see the files that {{HugeSVG}} ins transcluded, but them're still unable to render as PNG.  https://commons.wikimedia.org/wiki/Special:Wha...
[16:46:10] <icinga-wm>	 PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail
[16:46:24] <wikibugs>	 06Operations, 06Commons, 10Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2399862 (10matmarex) The update is not deployed yet, Moritz is working on it. Please subscribe to the blocking task {T112421} for updates.
[16:46:50] <icinga-wm>	 RECOVERY - MegaRAID on db1009 is OK: OK: optimal, 1 logical, 2 physical
[16:52:07] <wikibugs>	 06Operations, 10ops-eqiad: labsdb1001:  Swap eth0 cable - https://phabricator.wikimedia.org/T137555#2399880 (10Cmjohnson) Swapped cable but same issue persists.
[16:52:35] <grrrit-wm>	 (03PS1) 10Hoo man: Log PHP/HHVM errors in CLI mode to stderr, not stdout [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295554 (https://phabricator.wikimedia.org/T138291) 
[16:57:12] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2399939 (10fgiunchedi)
[16:57:19] <wikibugs>	 06Operations, 10Wikimedia-SVG-rendering, 13Patch-For-Review: Install Amiri font (arabic) for svg - https://phabricator.wikimedia.org/T135347#2399941 (10Uwe_a) @MoritzMuehlenhoff , sure, https://commons.wikimedia.org/wiki/File:Sinai_-_Camp_David_Treaty_Zones_-_ar.svg would be a ready example!
[17:00:04] <jouncebot>	 jynus: Respected human, time to deploy m1 failoverPotentially affecting gerrit (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160622T1700). Please do the needful.
[17:00:04] <jouncebot>	 jynus: A patch you scheduled for m1 failoverPotentially affecting gerrit is about to be deployed. Please be available during the process.
[17:02:26] <wikibugs>	 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2399951 (10Cmjohnson) labs1009 will go into A2 slot 15/16
[17:03:44] <jynus>	 ok, someone around that may help?
[17:03:59] <jynus>	 moritzm, akosiaris ?
[17:06:14] <moritzm>	 aeound
[17:06:16] <moritzm>	 around
[17:07:33] <jynus>	 ok, I will change haproxy config in a second, and reload the server- then I will see which services behaved correctly and if anyone didn't
[17:08:28] <jynus>	 let's prioritize puppet (for annoying) and gerrit for most impacting
[17:09:12] <moritzm>	 sure
[17:10:03] <jynus>	 !log failovered m1-master from db1001 to db1016
[17:10:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:10:12] <jynus>	 ok, librenms ok
[17:10:35] <jynus>	 puppet is still connected to db1001
[17:11:26] <jynus>	 but new connections are alredy going to the right place
[17:11:33] <jynus>	 (maybe some will fail?)
[17:11:40] <icinga-wm>	 RECOVERY - puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:11:42] <moritzm>	 racktables and etherpad worked fine (with new sessions/logins)
[17:12:02] <Krenair>	 racktables still appears to be working
[17:12:12] <jynus>	 I will kill manually puppet connections
[17:12:23] <jynus>	 rt seems also stuck on db1001
[17:13:01] <Krenair>	 did you say reviewdb was on m1?
[17:13:11] <jynus>	 yes
[17:13:22] <Krenair>	 gsql connects me to m2's master, db1020
[17:14:20] <jynus>	 puppet is ok after kill
[17:14:33] <Krenair>	 can you `select max(last_updated_on) from reviewdb.changes;` on m1?
[17:14:34] <jynus>	 I will kill the rest of the users using perstent connections
[17:14:53] <jynus>	 yeah, 2012
[17:14:57] <Krenair>	 okay, that's old then
[17:14:59] <jynus>	 so it is outdated
[17:15:03] <jynus>	 needs to be killed
[17:15:12] <Krenair>	  max(last_updated_on)
[17:15:12] <Krenair>	  ---------------------
[17:15:12] <Krenair>	  2016-06-22 17:10:39.0
[17:15:16] <Krenair>	 from m2
[17:15:36] <jynus>	 !log killing puppet, rt, librenms user connections on db1001
[17:15:40] <icinga-wm>	 PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: puppet fail
[17:15:41] <icinga-wm>	 PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: puppet fail
[17:15:41] <icinga-wm>	 PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail
[17:15:41] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:15:50] <icinga-wm>	 PROBLEM - puppet last run on rdb2002 is CRITICAL: CRITICAL: puppet fail
[17:15:51] <icinga-wm>	 PROBLEM - puppet last run on maps-test2002 is CRITICAL: CRITICAL: puppet fail
[17:15:52] <icinga-wm>	 PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: puppet fail
[17:15:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail
[17:16:01] <icinga-wm>	 PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: puppet fail
[17:16:01] <icinga-wm>	 PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: puppet fail
[17:16:02] <icinga-wm>	 PROBLEM - puppet last run on mc2009 is CRITICAL: CRITICAL: puppet fail
[17:16:10] <icinga-wm>	 PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: puppet fail
[17:16:11] <icinga-wm>	 PROBLEM - puppet last run on elastic2019 is CRITICAL: CRITICAL: puppet fail
[17:16:11] <icinga-wm>	 PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: puppet fail
[17:16:20] <icinga-wm>	 PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: puppet fail
[17:16:31] <icinga-wm>	 PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: puppet fail
[17:16:32] <icinga-wm>	 PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: puppet fail
[17:16:32] <icinga-wm>	 PROBLEM - puppet last run on titanium is CRITICAL: CRITICAL: puppet fail
[17:16:41] <icinga-wm>	 PROBLEM - puppet last run on elastic2002 is CRITICAL: CRITICAL: puppet fail
[17:16:42] <icinga-wm>	 PROBLEM - puppet last run on pybal-test2001 is CRITICAL: CRITICAL: puppet fail
[17:16:50] <icinga-wm>	 PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: puppet fail
[17:16:51] <icinga-wm>	 PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail
[17:16:51] <icinga-wm>	 PROBLEM - puppet last run on mw2161 is CRITICAL: CRITICAL: puppet fail
[17:16:52] <icinga-wm>	 PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: puppet fail
[17:16:52] <icinga-wm>	 PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: puppet fail
[17:17:01] <icinga-wm>	 PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: puppet fail
[17:17:01] <icinga-wm>	 PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: puppet fail
[17:17:01] <icinga-wm>	 PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: puppet fail
[17:17:02] <icinga-wm>	 PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: puppet fail
[17:17:06] <akosiaris>	 I am around 
[17:17:10] <icinga-wm>	 PROBLEM - puppet last run on db1088 is CRITICAL: CRITICAL: puppet fail
[17:17:11] <icinga-wm>	 PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: puppet fail
[17:17:12] <icinga-wm>	 PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: puppet fail
[17:17:13] <icinga-wm>	 PROBLEM - puppet last run on elastic2022 is CRITICAL: CRITICAL: puppet fail
[17:17:20] <icinga-wm>	 PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: puppet fail
[17:17:21] <icinga-wm>	 PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: puppet fail
[17:17:21] <icinga-wm>	 PROBLEM - puppet last run on mw2243 is CRITICAL: CRITICAL: puppet fail
[17:17:21] <icinga-wm>	 PROBLEM - puppet last run on mc2012 is CRITICAL: CRITICAL: puppet fail
[17:17:22] <icinga-wm>	 PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: puppet fail
[17:17:30] <icinga-wm>	 PROBLEM - puppet last run on mw2139 is CRITICAL: CRITICAL: puppet fail
[17:17:30] <icinga-wm>	 PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: puppet fail
[17:17:30] <icinga-wm>	 PROBLEM - puppet last run on mw2064 is CRITICAL: CRITICAL: puppet fail
[17:17:30] <icinga-wm>	 PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: puppet fail
[17:17:30] <icinga-wm>	 PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail
[17:17:31] <icinga-wm>	 PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: puppet fail
[17:17:31] <icinga-wm>	 PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: puppet fail
[17:17:32] <icinga-wm>	 PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: puppet fail
[17:17:32] <icinga-wm>	 PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: puppet fail
[17:17:34] <jynus>	 I expect this to be a temporary glitch
[17:17:40] <icinga-wm>	 PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: puppet fail
[17:17:41] <icinga-wm>	 PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: puppet fail
[17:17:42] <icinga-wm>	 PROBLEM - puppet last run on mw2116 is CRITICAL: CRITICAL: puppet fail
[17:17:42] <icinga-wm>	 PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: puppet fail
[17:17:43] <akosiaris>	 jynus: yeah, it's killing the puppet connections
[17:17:47] <jynus>	 because those persisten connections
[17:17:51] <icinga-wm>	 PROBLEM - puppet last run on mw2229 is CRITICAL: CRITICAL: puppet fail
[17:17:51] <icinga-wm>	 PROBLEM - puppet last run on elastic2017 is CRITICAL: CRITICAL: puppet fail
[17:17:55] <akosiaris>	 I 'll restart the puppetmaster
[17:18:00] <icinga-wm>	 PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: puppet fail
[17:18:00] <icinga-wm>	 PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: puppet fail
[17:18:01] <icinga-wm>	 PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: puppet fail
[17:18:01] <icinga-wm>	 PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail
[17:18:01] <icinga-wm>	 PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: puppet fail
[17:18:01] <jynus>	 no, it works already
[17:18:02] <icinga-wm>	 PROBLEM - puppet last run on mw2100 is CRITICAL: CRITICAL: puppet fail
[17:18:02] <icinga-wm>	 PROBLEM - puppet last run on mw2075 is CRITICAL: CRITICAL: puppet fail
[17:18:05] <jynus>	 alex
[17:18:10] <akosiaris>	 a ok then
[17:18:11] <icinga-wm>	 PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: puppet fail
[17:18:11] <icinga-wm>	 PROBLEM - puppet last run on mw1266 is CRITICAL: CRITICAL: puppet fail
[17:18:11] <icinga-wm>	 PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: puppet fail
[17:18:12] <icinga-wm>	 PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail
[17:18:16] <jynus>	 is from the time I switched until I killed them
[17:18:21] <icinga-wm>	 PROBLEM - puppet last run on mw2227 is CRITICAL: CRITICAL: puppet fail
[17:18:21] <icinga-wm>	 PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: puppet fail
[17:18:21] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: puppet fail
[17:18:22] <icinga-wm>	 PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: puppet fail
[17:18:28] <akosiaris>	 quite a few boxes though
[17:18:30] <icinga-wm>	 PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: puppet fail
[17:18:31] <icinga-wm>	 PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: puppet fail
[17:18:31] <icinga-wm>	 PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: puppet fail
[17:18:32] <akosiaris>	 a persistent connections
[17:18:34] <akosiaris>	 ok
[17:18:36] <jynus>	 yes
[17:18:40] <icinga-wm>	 PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail
[17:18:41] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: puppet fail
[17:18:41] <icinga-wm>	 PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: puppet fail
[17:18:41] <icinga-wm>	 PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail
[17:18:42] <icinga-wm>	 RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[17:18:43] <jynus>	 now I can document this
[17:18:52] <icinga-wm>	 PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail
[17:18:52] <icinga-wm>	 PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail
[17:18:53] <icinga-wm>	 PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: puppet fail
[17:18:53] <icinga-wm>	 PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail
[17:18:53] <jynus>	 and be faster next time
[17:19:00] <icinga-wm>	 PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail
[17:19:01] <icinga-wm>	 PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: puppet fail
[17:19:04] <akosiaris>	 so, I also have bacula and etherpad
[17:19:09] <jynus>	 do it immediatelly, only affect a few ones
[17:19:10] <icinga-wm>	 PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: puppet fail
[17:19:10] <icinga-wm>	 PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: puppet fail
[17:19:11] <icinga-wm>	 PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: puppet fail
[17:19:11] <icinga-wm>	 PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail
[17:19:11] <icinga-wm>	 PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: puppet fail
[17:19:12] <icinga-wm>	 PROBLEM - puppet last run on rdb2006 is CRITICAL: CRITICAL: puppet fail
[17:19:12] <icinga-wm>	 PROBLEM - puppet last run on mc2016 is CRITICAL: CRITICAL: puppet fail
[17:19:12] <icinga-wm>	 PROBLEM - puppet last run on mw2247 is CRITICAL: CRITICAL: puppet fail
[17:19:20] <icinga-wm>	 PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: puppet fail
[17:19:22] <jynus>	 etherpad worked, moritzm said
[17:19:28] <akosiaris>	 not surprised
[17:19:31] <icinga-wm>	 PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail
[17:19:31] <icinga-wm>	 PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: puppet fail
[17:19:35] <akosiaris>	 probably systemd helped
[17:19:40] <icinga-wm>	 PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: puppet fail
[17:19:41] <icinga-wm>	 PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: puppet fail
[17:19:41] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail
[17:19:49] <jynus>	 I see no connections on the old master
[17:19:50] <icinga-wm>	 PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: puppet fail
[17:19:50] <icinga-wm>	 PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: puppet fail
[17:19:51] <icinga-wm>	 PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail
[17:19:51] <icinga-wm>	 PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: puppet fail
[17:19:51] <icinga-wm>	 PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: puppet fail
[17:19:52] <icinga-wm>	 PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail
[17:20:00] <icinga-wm>	 PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: puppet fail
[17:20:01] <icinga-wm>	 PROBLEM - puppet last run on elastic1046 is CRITICAL: CRITICAL: puppet fail
[17:20:01] <akosiaris>	 active (running) since Wed 2016-06-22 17:11:02 UTC; 8min ago
[17:20:02] <icinga-wm>	 PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail
[17:20:03] <jynus>	 it was only the 2 I mentioned that used some kind of pooling
[17:20:11] <icinga-wm>	 PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: puppet fail
[17:20:11] <icinga-wm>	 PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail
[17:20:11] <icinga-wm>	 PROBLEM - puppet last run on mw2082 is CRITICAL: CRITICAL: puppet fail
[17:20:13] <akosiaris>	 yeah, it almost certainly croaked and died
[17:20:19] <jynus>	 plus reviewdb (gerrit) wasn't even there
[17:20:21] <akosiaris>	 and systemd restarted it
[17:20:32] <icinga-wm>	 PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: puppet fail
[17:20:32] <akosiaris>	 gerrit does not do persistent connections ?
[17:20:36] <akosiaris>	 I am impressed
[17:20:37] <jynus>	 gerrit is not on m1
[17:20:39] <Krenair>	 gerrit was moved away from m1 years ago
[17:20:39] <jynus>	 :-)
[17:20:47] <Krenair>	 the data there is from 2012
[17:20:50] <akosiaris>	 aaa ok
[17:20:53] <akosiaris>	 cool then
[17:20:55] <jynus>	 and that is why I asked for owners other than me
[17:20:56] <Krenair>	 it's on m2 now
[17:21:22] <jynus>	 half of m1 databases were rubbish, tests, etc.
[17:21:36] <jynus>	 the only important thing left is bacula
[17:21:39] <akosiaris>	 expected I 'd say
[17:21:46] <Krenair>	 did we check heartbeat and rt?
[17:21:48] <akosiaris>	 !log restarted bacula-director on helium
[17:21:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:22:08] <jynus>	 Krenair, I said important
[17:22:19] <akosiaris>	 heartbeat ?
[17:22:28] <jynus>	 heartbeat is not a service
[17:22:36] <akosiaris>	 exactly which is why I asked
[17:22:45] <akosiaris>	 lemme have a look at RT though
[17:22:47] <Krenair>	 right, heartbeat is the mysql lag checking thing?
[17:22:56] <jynus>	 yes, I need to migrte that, but that is no rush
[17:23:05] <Krenair>	 so that's not important
[17:23:57] <akosiaris>	 !log restart apache on ununpentium for m1 migration. Hosts RT, just did it for good measure
[17:24:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:24:08] <akosiaris>	 jynus: so, what's left ?
[17:24:12] <jynus>	 bacula
[17:24:21] <akosiaris>	 already done
[17:24:29] <akosiaris>	 works fine
[17:25:08] <akosiaris>	 wow... 
[17:25:19] <akosiaris>	 restore is really fast now
[17:25:24] <akosiaris>	 jynus: thank you!
[17:25:24] <robh>	 things go smoothly, you wonder what you are missing right? ;]
[17:25:25] <jynus>	 ?
[17:25:34] <jynus>	 not so smoothly
[17:25:43] <robh>	 =[
[17:25:46] <jynus>	 too much puppet spam that could have been avoidable
[17:25:52] <jynus>	 but not unexpected
[17:26:08] <akosiaris>	 jynus: tbh, we should have just stopped the puppetmasters right before the migration
[17:26:17] <akosiaris>	 it's not like we can't survive without them
[17:26:26] <jynus>	 no, killing connections worked just as fine
[17:26:39] <jynus>	 but it is ok
[17:26:39] <akosiaris>	 minus the alerts
[17:26:57] <jynus>	 the alerts would have gone off if you killed puppet :-)
[17:27:01] <akosiaris>	 but yeah, stopping the puppetmasters would have caused alerts as well
[17:27:11] <jynus>	 we just needed to do either
[17:27:13] <jynus>	 faster
[17:27:25] <jynus>	 RT works
[17:28:04] <jynus>	 so, I will ask you one last favour, alex, add a one liner for the things you did (saw etherpad killed, restarting services)
[17:28:25] <jynus>	 with that and what I did, I will document it so we can do it better next time
[17:29:27] <akosiaris>	 jynus: I 've already logged it, were would you like me to also add that one liner ? got a pad handy ?
[17:29:50] <jynus>	 yes
[17:30:09] <moritzm>	 https://etherpad.wikimedia.org/p/m1-service-owners ?
[17:30:19] <akosiaris>	 cool
[17:30:32] <jynus>	 better down
[17:30:38] <jynus>	 one the bottom list
[17:30:44] <jynus>	 *on
[17:30:50] <wikibugs>	 06Operations, 10hardware-requests: Replace/refresh carbon - https://phabricator.wikimedia.org/T137117#2399999 (10RobH)
[17:31:01] <jynus>	 well, not literally one liners
[17:31:07] <jynus>	 I meant briefly
[17:31:19] <jynus>	 I will add it more clearly on wiki
[17:32:45] <wikibugs>	 06Operations, 10hardware-requests: eqiad: 1 hardware access request for labs graphite - https://phabricator.wikimedia.org/T137724#2400004 (10RobH) 05stalled>03Resolved So we ended up buying SSDs, which are being installed and upgrading the system via T137924.
[17:33:19] <akosiaris>	 ok I am off to SoS, bbl
[17:33:28] <jynus>	 thank you very much!
[17:33:41] <akosiaris>	 thanks as well!
[17:35:55] <jynus>	 I need to 1) change puppetization of haproxy so that in case haproxy goes down, it does not reload the previous config
[17:36:16] <jynus>	 2) migrate on puppet the master to be db1016 for heartbeat
[17:38:11] <jynus>	 thanks to all. spam aside, it wasn't that bad, wasn't it?
[17:38:37] <Krenair>	 certainly didn't look bad
[17:39:22] <jynus>	 well, m1 are internal services - no user impact
[17:40:41] <icinga-wm>	 PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 664 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5209706 keys - replication_delay is 664
[17:40:45] <moritzm>	 yeah, really smooth
[17:41:00] <icinga-wm>	 RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[17:41:11] <icinga-wm>	 RECOVERY - puppet last run on maps-test2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:41:31] <icinga-wm>	 RECOVERY - puppet last run on druid1003 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[17:41:50] <icinga-wm>	 RECOVERY - puppet last run on titanium is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[17:41:52] <jynus>	 the persistent connection stuff is an issue- most services use it and it complicates things
[17:42:01] <icinga-wm>	 RECOVERY - puppet last run on cp1072 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[17:42:21] <icinga-wm>	 RECOVERY - puppet last run on db1088 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[17:42:30] <icinga-wm>	 RECOVERY - puppet last run on db2070 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[17:42:31] <icinga-wm>	 RECOVERY - puppet last run on elastic2022 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[17:42:40] <icinga-wm>	 RECOVERY - puppet last run on mw1299 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[17:42:41] <icinga-wm>	 RECOVERY - puppet last run on mw2243 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[17:42:41] <icinga-wm>	 RECOVERY - puppet last run on mc2012 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[17:42:42] <icinga-wm>	 RECOVERY - puppet last run on mw2139 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[17:42:42] <icinga-wm>	 RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[17:42:51] <icinga-wm>	 RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[17:42:51] <icinga-wm>	 RECOVERY - puppet last run on cp2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:42:52] <icinga-wm>	 RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5167508 keys - replication_delay is 0
[17:43:01] <icinga-wm>	 RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:02] <icinga-wm>	 RECOVERY - puppet last run on mw2116 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[17:43:11] <icinga-wm>	 RECOVERY - puppet last run on mw2160 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[17:43:12] <icinga-wm>	 RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[17:43:20] <icinga-wm>	 RECOVERY - puppet last run on aqs1003 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[17:43:21] <icinga-wm>	 RECOVERY - puppet last run on rdb2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:43:21] <icinga-wm>	 RECOVERY - puppet last run on mw1266 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:31] <icinga-wm>	 RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:32] <icinga-wm>	 RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[17:43:32] <icinga-wm>	 RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:40] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:40] <icinga-wm>	 RECOVERY - puppet last run on mw2227 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[17:43:41] <icinga-wm>	 RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[17:43:41] <icinga-wm>	 RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:43:41] <icinga-wm>	 RECOVERY - puppet last run on mc2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:43:41] <icinga-wm>	 RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[17:43:42] <icinga-wm>	 RECOVERY - puppet last run on elastic2019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:43:51] <icinga-wm>	 RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:43:51] <icinga-wm>	 RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:00] <icinga-wm>	 RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[17:44:01] <icinga-wm>	 RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[17:44:01] <icinga-wm>	 RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:10] <icinga-wm>	 RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[17:44:11] <icinga-wm>	 RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:21] <icinga-wm>	 RECOVERY - puppet last run on elastic2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:21] <icinga-wm>	 RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:44:21] <icinga-wm>	 RECOVERY - puppet last run on pybal-test2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:44:22] <icinga-wm>	 RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[17:44:22] <icinga-wm>	 RECOVERY - puppet last run on mw2161 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:44:26] <godog>	 jynus: no it wasn't :) thanks (librenms is fine)
[17:44:30] <icinga-wm>	 RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[17:44:31] <icinga-wm>	 RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:31] <icinga-wm>	 RECOVERY - puppet last run on wtp2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:44:31] <icinga-wm>	 RECOVERY - puppet last run on rdb2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:31] <icinga-wm>	 RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[17:44:31] <icinga-wm>	 RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:32] <icinga-wm>	 RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[17:44:32] <icinga-wm>	 RECOVERY - puppet last run on mc2016 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[17:44:40] <icinga-wm>	 RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:44:41] <icinga-wm>	 RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:44:50] <icinga-wm>	 RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[17:44:50] <icinga-wm>	 RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[17:44:51] <icinga-wm>	 RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[17:44:57] <grrrit-wm>	 (03PS1) 10Urbanecm: Add www.photolib.noaa.gov to wgCopyUploadDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295558 (https://phabricator.wikimedia.org/T138383) 
[17:45:00] <icinga-wm>	 RECOVERY - puppet last run on wtp2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:00] <icinga-wm>	 RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[17:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[17:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:45:01] <icinga-wm>	 RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:45:01] <icinga-wm>	 RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[17:45:02] <icinga-wm>	 RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:02] <icinga-wm>	 RECOVERY - puppet last run on mw2064 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[17:45:03] <icinga-wm>	 RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:03] <icinga-wm>	 RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[17:45:10] <icinga-wm>	 RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:10] <icinga-wm>	 RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:45:11] <icinga-wm>	 RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:11] <icinga-wm>	 RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[17:45:11] <icinga-wm>	 RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[17:45:12] <icinga-wm>	 RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:20] <icinga-wm>	 RECOVERY - puppet last run on elastic1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:21] <icinga-wm>	 RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:22] <icinga-wm>	 RECOVERY - puppet last run on mw2229 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:30] <icinga-wm>	 RECOVERY - puppet last run on elastic2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:30] <icinga-wm>	 RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:45:30] <icinga-wm>	 RECOVERY - puppet last run on ganeti1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:31] <icinga-wm>	 RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[17:45:31] <icinga-wm>	 RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:32] <icinga-wm>	 RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:41] <icinga-wm>	 RECOVERY - puppet last run on mw2100 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:41] <icinga-wm>	 RECOVERY - puppet last run on mw2075 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:41] <icinga-wm>	 RECOVERY - puppet last run on mw2199 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:42] <icinga-wm>	 RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[17:45:51] <icinga-wm>	 RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:52] <icinga-wm>	 RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:45:52] <icinga-wm>	 RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[17:46:02] <icinga-wm>	 RECOVERY - puppet last run on ganeti2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:10] <icinga-wm>	 RECOVERY - puppet last run on db2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:20] <icinga-wm>	 RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:46:21] <icinga-wm>	 RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:46:21] <icinga-wm>	 RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:21] <icinga-wm>	 RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:30] <icinga-wm>	 RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:41] <icinga-wm>	 RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:41] <icinga-wm>	 RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:46:50] <icinga-wm>	 RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:50] <icinga-wm>	 RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:46:51] <icinga-wm>	 RECOVERY - puppet last run on mw2247 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:47:01] <icinga-wm>	 RECOVERY - puppet last run on mw2080 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:47:21] <icinga-wm>	 RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[17:47:28] <yurik>	 icinga seems chatty today
[17:47:41] <icinga-wm>	 RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:47:50] <icinga-wm>	 RECOVERY - puppet last run on mw2082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:47:52] <jynus>	 yurik, it was a temporary maintenance of puppet
[17:48:05] <yurik>	 yeah, i figured its not too serious :)
[17:48:13] <yurik>	 otherwise it wouldn't say RECOVERY :)
[17:53:42] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400045 (10RobH)
[17:53:48] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10Wikidata, 10Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2400044 (10RobH)
[18:02:11] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:02:21] <icinga-wm>	 PROBLEM - swift-container-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[18:02:31] <icinga-wm>	 PROBLEM - swift-object-updater on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[18:03:00] <wikibugs>	 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2400064 (10Cmjohnson) p:05Normal>03High
[18:03:02] <icinga-wm>	 PROBLEM - swift-account-server on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[18:03:08] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "that is probably not the best place to ensure that. Why just not ensure that on the service level ? IIRC, service-runner supports max exec" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[18:03:32] <wikibugs>	 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2350219 (10Cmjohnson)
[18:03:42] <icinga-wm>	 PROBLEM - swift-container-updater on ms-be2022 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[18:04:21] <wikibugs>	 06Operations, 10ops-eqiad: db1009 degraded RAID (failed disk) - https://phabricator.wikimedia.org/T138203#2400068 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson All disks are online...resolved
[18:06:44] <wikibugs>	 06Operations, 10ops-eqiad, 10media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2400073 (10Cmjohnson) @fgiunchedi Row A is pretty full at the moment. I can add to A5 which is a 10G rack.  A5/C8 and D8 are all 10G racks  I can add 1 each to 10G and 1 each to 1G if you l...
[18:07:18] <grrrit-wm>	 (03CR) 10Yurik: "service runner does not cancel sql query. We would have to do something fairly elaborate to catch many different cases of errors to cancel" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[18:07:59] <yurik>	 jgirault, ^ replied
[18:08:08] <yurik>	 oops, i meant akosiaris 
[18:08:13] <jgirault>	 :)
[18:08:17] <yurik>	 sorry :)
[18:12:45] <grrrit-wm>	 (03PS1) 10Jdlrobson: WIP: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (https://phabricator.wikimedia.org/T138129) 
[18:13:00] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "no, service-runner kills the entire process. Which should also cancel the query. I am still not sold on that idea. For example it has the " [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[18:14:27] <yurik>	 akosiaris, unless i'm mistaken, one worker can process multiple user requests, correct?  In which case, we do NOT want the service runner to kill it, because other requests will also be killed
[18:17:13] <wikibugs>	 06Operations, 10ops-eqiad: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2400121 (10akosiaris) @Cmjohnson  yes indeed. So, doing this without causing any downtime will take some time.   Proposal:  * I drain one node per time * We install the new SSDs * I 'll reimage the box...
[18:18:04] <akosiaris>	 yurik: I confess not knowing that, perhaps services can help a bit on that. mobrovac ^?
[18:18:47] <yurik>	 akosiaris, i really don't understand why postgress cannot handle this?? I think this is the most natural place for such things
[18:19:20] <wikibugs>	 06Operations, 10DBA: m1-master switch from db1001 to db1016 - https://phabricator.wikimedia.org/T106312#2400123 (10jcrespo)
[18:19:22] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: Upgrade m1 db servers - https://phabricator.wikimedia.org/T135973#2400122 (10jcrespo)
[18:19:46] <yurik>	 akosiaris,  postgres has internal cancellation, which should kill it automatically. This would be much more reliable than anything we implement (simply because there are more people looking at it)
[18:20:24] <yurik>	 plus it is perfectly suited for our usecase - cancelling just the production user role, not the batch processing
[18:20:30] <papaul>	 !log ms-be202[3-7] - signing puppet certs, salt-key, initial run
[18:20:34] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:20:45] <wikibugs>	 06Operations, 10DBA: m1-master switch from db1001 to db1016 - https://phabricator.wikimedia.org/T106312#1464530 (10jcrespo) These are the notes from the migration, to be documented on wiki:  * bacula ; sudo service bacula-director restart after the migration. I had already made sure no jobs were running with s...
[18:21:09] <grrrit-wm>	 (03PS1) 10Jcrespo: Set temporarilly m1 haproxy to failover to itself (db1016) [puppet] - 10https://gerrit.wikimedia.org/r/295562 (https://phabricator.wikimedia.org/T106312) 
[18:22:16] <grrrit-wm>	 (03PS2) 10Jcrespo: Set temporarily m1 haproxy to failover to itself (db1016) [puppet] - 10https://gerrit.wikimedia.org/r/295562 (https://phabricator.wikimedia.org/T106312) 
[18:22:36] <grrrit-wm>	 (03CR) 10Yurik: "Why would we want to kill the whole process??? Unless I'm mistaken, one process can serve multiple requests at the same time (preemptive m" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[18:24:00] <wikibugs>	 06Operations, 10ops-eqiad: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2400143 (10Cmjohnson) Great! Let me know when it's ready.  Chris
[18:27:29] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Set temporarily m1 haproxy to failover to itself (db1016) [puppet] - 10https://gerrit.wikimedia.org/r/295562 (https://phabricator.wikimedia.org/T106312) (owner: 10Jcrespo)
[18:31:57] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2400153 (10Papaul)
[18:32:15] <wikibugs>	 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2341867 (10Papaul) a:05Papaul>03fgiunchedi
[18:36:19] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:36:48] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:36:48] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:37:19] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:37:29] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:37:29] <icinga-wm>	 PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Puppet has 19 failures
[18:37:59] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:38:29] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:38:29] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:38:49] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2025 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:38:49] <icinga-wm>	 PROBLEM - puppet last run on ms-be2024 is CRITICAL: CRITICAL: Puppet has 12 failures
[18:38:49] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:39:29] <icinga-wm>	 PROBLEM - puppet last run on ms-be2027 is CRITICAL: CRITICAL: Puppet has 18 failures
[18:39:48] <icinga-wm>	 PROBLEM - swift-object-replicator on ms-be2026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:40:09] <icinga-wm>	 PROBLEM - swift-account-auditor on ms-be2027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:40:09] <icinga-wm>	 PROBLEM - swift-object-server on ms-be2026 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:40:09] <icinga-wm>	 PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Puppet has 12 failures
[18:40:14] <grrrit-wm>	 (03PS1) 10Jcrespo: Promote db1016 as the m1 shard master, set db1001 as a m1 slave [puppet] - 10https://gerrit.wikimedia.org/r/295563 (https://phabricator.wikimedia.org/T106312) 
[18:40:19] <icinga-wm>	 PROBLEM - swift-account-reaper on ms-be2027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:40:39] <icinga-wm>	 PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 18 failures
[18:40:39] <icinga-wm>	 PROBLEM - swift-account-replicator on ms-be2027 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:41:01] <jynus>	 papaul, those that errored here^ are new hosts, right?
[18:43:24] <grrrit-wm>	 (03PS1) 10Ori.livneh: [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) 
[18:43:41] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) (owner: 10Ori.livneh)
[18:46:09] <grrrit-wm>	 (03PS2) 10Ori.livneh: [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) 
[18:46:10] <jynus>	 !log shutting down and reimaging db1001
[18:46:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:46:25] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) (owner: 10Ori.livneh)
[18:47:08] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be2027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:47:28] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be2027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:47:38] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be2027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:47:49] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2027 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:49:09] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be2027 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:49:16] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Promote db1016 as the m1 shard master, set db1001 as a m1 slave [puppet] - 10https://gerrit.wikimedia.org/r/295563 (https://phabricator.wikimedia.org/T106312) (owner: 10Jcrespo)
[18:49:20] <grrrit-wm>	 (03PS1) 10ArielGlenn: use flow.dblist which now explicitly lists all flow wikis [dumps] - 10https://gerrit.wikimedia.org/r/295566 
[18:49:59] <jynus>	 ^oh, great- the change I asked for is useful for other people too
[18:51:35] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400172 (10ori) Sample invocation:  ```lang=bash $ DB_USER=root DB_PASS=password perl maintain-replicas.pl 2> T...
[18:54:18] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:54:18] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:54:24] <grrrit-wm>	 (03PS1) 10Jcrespo: Install jessie on db1001 by default [puppet] - 10https://gerrit.wikimedia.org/r/295568 (https://phabricator.wikimedia.org/T135973) 
[18:54:29] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:54:48] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2025 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:54:54] <grrrit-wm>	 (03CR) 10Jcrespo: [C: 032] Install jessie on db1001 by default [puppet] - 10https://gerrit.wikimedia.org/r/295568 (https://phabricator.wikimedia.org/T135973) (owner: 10Jcrespo)
[18:55:59] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be2025 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:56:59] <icinga-wm>	 RECOVERY - swift-account-auditor on ms-be2026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[18:57:29] <icinga-wm>	 RECOVERY - swift-account-reaper on ms-be2026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[18:57:40] <icinga-wm>	 RECOVERY - swift-account-replicator on ms-be2026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[18:57:48] <icinga-wm>	 RECOVERY - swift-object-replicator on ms-be2026 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[18:57:50] <grrrit-wm>	 (03PS1) 10ArielGlenn: use explicit list of flow-enabled dbs for dumps config files [puppet] - 10https://gerrit.wikimedia.org/r/295569 
[18:58:00] <icinga-wm>	 RECOVERY - swift-object-server on ms-be2026 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[18:59:06] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] use explicit list of flow-enabled dbs for dumps config files [puppet] - 10https://gerrit.wikimedia.org/r/295569 (owner: 10ArielGlenn)
[18:59:19] <icinga-wm>	 PROBLEM - NTP on etherpad1001 is CRITICAL: NTP CRITICAL: Offset 11.80056512 secs
[18:59:20] <grrrit-wm>	 (03PS2) 10Jcrespo: Install jessie on db1001 by default [puppet] - 10https://gerrit.wikimedia.org/r/295568 (https://phabricator.wikimedia.org/T135973) 
[18:59:49] <icinga-wm>	 PROBLEM - NTP on planet1001 is CRITICAL: NTP CRITICAL: Offset 36.21695542 secs
[19:00:04] <jouncebot>	 hashar: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160622T1900). Please do the needful.
[19:01:04] <thcipriani>	 train for group1 → wmf.7 coming up
[19:02:21] <grrrit-wm>	 (03CR) 10ArielGlenn: [C: 032] use flow.dblist which now explicitly lists all flow wikis [dumps] - 10https://gerrit.wikimedia.org/r/295566 (owner: 10ArielGlenn)
[19:03:38] <grrrit-wm>	 (03PS1) 10Thcipriani: group1 wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295570 
[19:05:02] <grrrit-wm>	 (03CR) 10Thcipriani: [C: 032] group1 wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295570 (owner: 10Thcipriani)
[19:05:37] <grrrit-wm>	 (03Merged) 10jenkins-bot: group1 wikis to 1.28.0-wmf.7 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295570 (owner: 10Thcipriani)
[19:06:01] <logmsgbot>	 !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.7
[19:06:14] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:10:20] <grrrit-wm>	 (03PS4) 10BBlack: role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[19:10:26] <icinga-wm>	 PROBLEM - NTP on fermium is CRITICAL: NTP CRITICAL: Offset 39.35874355 secs
[19:12:48] <wikibugs>	 06Operations, 10Gerrit, 06Release-Engineering-Team, 06WMF-Legal, and 2 others: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#1694145 (10greg) >>! In T114395#2399437, @Mpaulson wrote: > Has this been adjusted so that it deletes the logs after 30 days?  Based...
[19:13:36] <icinga-wm>	 PROBLEM - NTP on krypton is CRITICAL: NTP CRITICAL: Offset 43.89846325 secs
[19:14:07] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] role::cache::text: handle url shortener requests [puppet] - 10https://gerrit.wikimedia.org/r/295493 (https://phabricator.wikimedia.org/T133485) (owner: 10Giuseppe Lavagetto)
[19:17:06] <bblack>	 of course, forgotten semicolons :P
[19:18:16] <grrrit-wm>	 (03PS1) 10BBlack: Post-merge VCL syntax fix for e031db9e [puppet] - 10https://gerrit.wikimedia.org/r/295571 
[19:18:32] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] Post-merge VCL syntax fix for e031db9e [puppet] - 10https://gerrit.wikimedia.org/r/295571 (owner: 10BBlack)
[19:20:40] <grrrit-wm>	 (03CR) 10MaxSem: [C: 04-1] "z0 queries run for 5+ minutes." [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[19:21:00] <yurik>	 MaxSem, its for KARTOTHERIAN user, not TILERATOR
[19:22:05] <grrrit-wm>	 (03CR) 10Yurik: "Max, this is not tilerator user that runs those queries, its karotherian (production) user only" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[19:23:25] <MaxSem>	 yurik, what's it there for, then?
[19:23:52] <yurik>	 MaxSem, in case we misconfigure the simplification shape - and the shape query runs too long
[19:24:13] <yurik>	 this is a safety thing
[19:24:26] <grrrit-wm>	 (03PS4) 10BBlack: LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 
[19:24:28] <grrrit-wm>	 (03PS3) 10BBlack: r::c::perf: raise netdev_budget a bit [puppet] - 10https://gerrit.wikimedia.org/r/295543 
[19:24:30] <grrrit-wm>	 (03PS3) 10BBlack: r::c::perf: disable prequeue timestamps [puppet] - 10https://gerrit.wikimedia.org/r/295542 
[19:24:32] <grrrit-wm>	 (03PS3) 10BBlack: r::c::perf: un-mysterious the rest [puppet] - 10https://gerrit.wikimedia.org/r/295541 
[19:24:34] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: un-mysterious somaxconn + syn_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295540 
[19:24:36] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: un-mysterious netdev_max_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295539 
[19:24:38] <grrrit-wm>	 (03PS2) 10BBlack: r::c::perf: consolidate net tuning and mysterious values [puppet] - 10https://gerrit.wikimedia.org/r/295538 
[19:26:10] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: consolidate net tuning and mysterious values [puppet] - 10https://gerrit.wikimedia.org/r/295538 (owner: 10BBlack)
[19:26:30] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: un-mysterious netdev_max_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295539 (owner: 10BBlack)
[19:26:52] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: un-mysterious somaxconn + syn_backlog [puppet] - 10https://gerrit.wikimedia.org/r/295540 (owner: 10BBlack)
[19:26:58] <legoktm>	 bblack: \o/
[19:27:05] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: un-mysterious the rest [puppet] - 10https://gerrit.wikimedia.org/r/295541 (owner: 10BBlack)
[19:27:42] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: disable prequeue timestamps [puppet] - 10https://gerrit.wikimedia.org/r/295542 (owner: 10BBlack)
[19:27:45] <legoktm>	 bblack: does https://w.wiki/ need to be purged for the new target to show up?
[19:28:05] <bblack>	 legoktm: not sure, so far the VCL change hasn't even rolled out to all the caches
[19:28:07] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[19:28:14] <legoktm>	 ah okay
[19:28:20] <legoktm>	 I tried https://w.wiki/2 and it worked :D
[19:28:38] <bblack>	 probably it "works" because people aren't using those URLs yet and they're not much in cache, if at all
[19:28:51] <bblack>	 if you start hitting them before it finishes rolling out, you'll cause caching of the old bad stuff :)
[19:28:57] <legoktm>	 oops
[19:29:01] <legoktm>	 makes sense then
[19:29:30] <jynus>	 !log archiving and dropping reviewdb on m1 shard
[19:29:37] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:31:56] <wikibugs>	 06Operations, 10MediaWiki-extensions-UrlShortener, 10Traffic, 13Patch-For-Review, 05Wikimania-Hackathon-2016: Configure apache rules for UrlShortener - https://phabricator.wikimedia.org/T133485#2400220 (10Legoktm) 05Open>03Resolved a:05Dereckson>03Joe
[19:32:50] <bblack>	 !log start rollout of first batch of cache sysctl stuff (un-mysterious + disable prequeue timestamps)
[19:32:57] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:41:39] <grrrit-wm>	 (03PS1) 10EBernhardson: Add de_dot filter and rename to logstash-filters-wikimedia [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 
[19:42:06] <papaul>	 jynus: yes there are new hosts
[19:42:13] <grrrit-wm>	 (03PS2) 10EBernhardson: Add de_dot filter and rename to logstash-filters-wikimedia [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 
[19:42:23] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: Upgrade m1 db servers - https://phabricator.wikimedia.org/T135973#2400248 (10jcrespo)
[19:42:25] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: m1-master switch from db1001 to db1016 - https://phabricator.wikimedia.org/T106312#2400246 (10jcrespo) 05Open>03Resolved db1016 is the new master of m1.
[19:42:33] <grrrit-wm>	 (03PS1) 10Ppchelko: Change-Prop: Added rules for ORES cache updates [puppet] - 10https://gerrit.wikimedia.org/r/295576 
[19:42:53] <grrrit-wm>	 (03PS3) 10EBernhardson: Add de_dot filter and rename to logstash-filters-wikimedia [software/logstash/plugins] - 10https://gerrit.wikimedia.org/r/295575 (https://phabricator.wikimedia.org/T138335) 
[19:45:55] <grrrit-wm>	 (03PS1) 10BBlack: move r::c::base classparams down as appropriate [puppet] - 10https://gerrit.wikimedia.org/r/295577 
[19:48:26] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2400265 (10jcrespo) After m1 failover, the only precise hosts left are:  db1043.eqiad.wmnet: True db1048.eqiad.wmnet: True  Which are m3 (phabricator) db hosts, and it requires #phabric...
[19:48:54] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] "Compiler-verified no-op" [puppet] - 10https://gerrit.wikimedia.org/r/295577 (owner: 10BBlack)
[19:50:29] <grrrit-wm>	 (03PS1) 10EBernhardson: logstash: Utilize de_dot filter for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[19:50:34] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2400276 (10jcrespo) >>! In T125028#2400265, @jcrespo wrote: > After m1 failover, the only precise hosts left are: >  > db1043.eqiad.wmnet: True > db1048.eqiad.wmnet: True >  > Which are...
[19:50:45] <grrrit-wm>	 (03CR) 10EBernhardson: [C: 04-1] "needs to be tested on beta cluster" [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[19:52:01] <grrrit-wm>	 (03CR) 10EBernhardson: "i'm also not sure if we really need to be adding an es2 tag and removing the es tag. I need to look more into how logstash runs this stuff" [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[19:52:06] <wikibugs>	 06Operations, 06Performance-Team, 10scap, 07HHVM, and 2 others: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352#2400279 (10greg)
[19:52:07] <grrrit-wm>	 (03CR) 10Mobrovac: [C: 04-1] Change-Prop: Added rules for ORES cache updates (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/295576 (owner: 10Ppchelko)
[19:53:13] <grrrit-wm>	 (03PS2) 10EBernhardson: logstash: Utilize de_dot filter for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[19:54:36] <grrrit-wm>	 (03PS5) 10BBlack: LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 
[19:54:38] <grrrit-wm>	 (03PS4) 10BBlack: r::c::perf: raise netdev_budget a bit [puppet] - 10https://gerrit.wikimedia.org/r/295543 
[19:55:17] <grrrit-wm>	 (03CR) 10BBlack: [C: 032 V: 032] r::c::perf: raise netdev_budget a bit [puppet] - 10https://gerrit.wikimedia.org/r/295543 (owner: 10BBlack)
[19:56:55] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] LVS: sysctl netdev tuning [puppet] - 10https://gerrit.wikimedia.org/r/295544 (owner: 10BBlack)
[19:57:07] <grrrit-wm>	 (03PS2) 10Ppchelko: Change-Prop: Added rules for ORES cache updates [puppet] - 10https://gerrit.wikimedia.org/r/295576 
[19:57:42] <grrrit-wm>	 (03CR) 10BryanDavis: logstash: Utilize de_dot filter for sending to es 2.x (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) (owner: 10EBernhardson)
[19:58:28] <grrrit-wm>	 (03PS3) 10Ppchelko: Change-Prop: Added rules for ORES cache updates [puppet] - 10https://gerrit.wikimedia.org/r/295576 
[20:00:04] <jouncebot>	 gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160622T2000).
[20:00:25] <mdholloway>	 i'll deploy mobileapps in a bit
[20:00:30] <subbu>	 no parsoid deploy today.
[20:07:12] <grrrit-wm>	 (03PS1) 10Yurik: LABS: Enable geoshapes graph protocol [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295580 (https://phabricator.wikimedia.org/T138192) 
[20:09:41] <yurik>	 thcipriani, is the train still on?
[20:09:56] <thcipriani>	 yurik: train is complete
[20:10:03] <greg-g>	 :06 < logmsgbot> !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.7
[20:10:12] <yurik>	 thcipriani, cool, i will do the services then - need to deploy maps
[20:10:40] <thcipriani>	 ack, sounds good :)
[20:11:09] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] "generally lgtm, comment inline. Lemme know when you would like to have it deployed" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/295576 (owner: 10Ppchelko)
[20:12:28] <grrrit-wm>	 (03PS1) 10Yurik: LABS: Enable graphoid geoshapes [puppet] - 10https://gerrit.wikimedia.org/r/295581 (https://phabricator.wikimedia.org/T138192) 
[20:12:33] <yurik>	 greg-g, you are always on top of everything, eh? :)
[20:14:03] <grrrit-wm>	 (03PS4) 10Ppchelko: Change-Prop: Added rules for ORES cache updates [puppet] - 10https://gerrit.wikimedia.org/r/295576 
[20:15:22] <grrrit-wm>	 (03PS5) 10Ppchelko: Change-Prop: Added rules for ORES cache updates [puppet] - 10https://gerrit.wikimedia.org/r/295576 
[20:17:19] <grrrit-wm>	 (03CR) 10Ppchelko: Change-Prop: Added rules for ORES cache updates (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/295576 (owner: 10Ppchelko)
[20:17:48] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: [C: 031] "Happy to merge whenever you want me to" [puppet] - 10https://gerrit.wikimedia.org/r/295576 (owner: 10Ppchelko)
[20:18:55] <yurik>	 !log deployed & restarted kartotherian https://gerrit.wikimedia.org/r/#/c/295449/
[20:19:02] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:19:19] <greg-g>	 yurik: even at half-time working ;) (not really, but I am half-time working)
[20:19:33] <yurik>	 greg-g, what do you do the other half?
[20:19:52] <greg-g>	 parental leave
[20:20:20] <Reedy>	 !log created tmplog_begin_devices on tmplog_end_devices on testwiki.cn_template_log 
[20:20:26] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:25:33] <mdholloway>	 !log starting mobileapps deployment
[20:25:40] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:26:12] <yurik>	 !log deployed & restarted tilerator https://gerrit.wikimedia.org/r/#/c/295447/
[20:26:19] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:27:09] <Reedy>	 greg-g: working half time?
[20:27:13] <Reedy>	 So you're only doing 40 hours?
[20:32:03] <grrrit-wm>	 (03PS4) 10Alexandros Kosiaris: ldaplist: Allow searching for more than attribute [puppet] - 10https://gerrit.wikimedia.org/r/295475 
[20:33:17] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "I 've added (in a rather ugly way indeed) also default substring/equality matches based on the attribute name and the present OpenLDAP ind" [puppet] - 10https://gerrit.wikimedia.org/r/295475 (owner: 10Alexandros Kosiaris)
[20:33:39] <mdholloway>	 !log mobileapps: finished deploying 8046ee2
[20:33:46] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:34:05] <mdholloway>	 smooth sailing with scap3, lemme tell ya
[20:37:28] <greg-g>	 Reedy: :P
[20:43:18] <grrrit-wm>	 (03CR) 10MaxSem: [C: 031] "There can be multiple safeguards from overloading, and this is the first line of defense. Service killer is also useful, however if just s" [puppet] - 10https://gerrit.wikimedia.org/r/295548 (https://phabricator.wikimedia.org/T138422) (owner: 10Yurik)
[20:44:11] <yurik>	 akosiaris, ^
[20:44:38] <thcipriani>	 mdholloway: so mobileapps deploys are going smoothly, sounds like? IIRC there was a bit of an adjustment period :)
[20:45:43] <mdholloway>	 thcipriani: hey tyler!  yeah, i know bernd ran into some issues with scap not restarting the service, iirc.  this was my first scap deployment and it was totally smooth, though
[20:45:43] <grrrit-wm>	 (03PS1) 10Gehel: Moving elasticsearch masters to new servers [puppet] - 10https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) 
[20:46:11] <thcipriani>	 mdholloway: :D happy to hear it!
[20:48:28] <icinga-wm>	 RECOVERY - NTP on planet1001 is OK: NTP OK: Offset 0.008202791214 secs
[20:49:00] <grrrit-wm>	 (03CR) 10Gehel: [C: 04-1] "I'd like to have some discussion on how to deploy this before merging it." [puppet] - 10https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[20:55:26] <wikibugs>	 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint, 13Patch-For-Review: Install and configure new elasticsearch servers in eqiad - https://phabricator.wikimedia.org/T138329#2400433 (10Gehel) Plan to deploy https://gerrit.wikimedia.org/r/295585 (move elastic masters to new servers):  #...
[20:56:18] <icinga-wm>	 RECOVERY - NTP on fermium is OK: NTP OK: Offset -0.01325929165 secs
[20:59:59] <Dereckson>	 !log Run namespaceDupes.php on ptwikinews (T138230) and frwikinews (T138442)
[21:00:01] <stashbot>	 T138442: fr.wikinews contains a unfixable pagelink issue according namespaceDupes.php - https://phabricator.wikimedia.org/T138442
[21:00:01] <stashbot>	 T138230: Restore former namespaces on pt.wikinews - https://phabricator.wikimedia.org/T138230
[21:00:06] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:00:58] <icinga-wm>	 RECOVERY - NTP on etherpad1001 is OK: NTP OK: Offset 0.001503109932 secs
[21:02:44] <grrrit-wm>	 (03PS1) 10ArielGlenn: add job that dumps history of flow pages [dumps] - 10https://gerrit.wikimedia.org/r/295587 (https://phabricator.wikimedia.org/T89398) 
[21:03:07] <grrrit-wm>	 (03PS1) 10BBlack: r::c::perf: disable autocorking [puppet] - 10https://gerrit.wikimedia.org/r/295588 
[21:03:13] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] add job that dumps history of flow pages [dumps] - 10https://gerrit.wikimedia.org/r/295587 (https://phabricator.wikimedia.org/T89398) (owner: 10ArielGlenn)
[21:05:13] <grrrit-wm>	 (03CR) 10BBlack: [C: 032] r::c::perf: disable autocorking [puppet] - 10https://gerrit.wikimedia.org/r/295588 (owner: 10BBlack)
[21:06:43] <bblack>	 !log cache perf: start deploy of -autocorking (probably last experiment I can squeeze in today)
[21:06:50] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:08:46] <grrrit-wm>	 (03CR) 10Yurik: (WIP) Notify TileratorUI on new expiry files (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/295450 (https://phabricator.wikimedia.org/T108459) (owner: 10Yurik)
[21:12:23] <grrrit-wm>	 (03PS3) 10EBernhardson: logstash: Utilize de_dot filter for sending to es 2.x [puppet] - 10https://gerrit.wikimedia.org/r/295578 (https://phabricator.wikimedia.org/T138335) 
[21:12:48] <grrrit-wm>	 (03PS1) 10Gehel: Postgresql: init database with Puppet [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) 
[21:14:01] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Postgresql: init database with Puppet [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) (owner: 10Gehel)
[21:15:55] <grrrit-wm>	 (03CR) 10Gehel: "I'm not really happy about not notifying pg-reload. I could probably work around the cycle be requiring just the needed resources, but tha" [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) (owner: 10Gehel)
[21:16:28] <icinga-wm>	 RECOVERY - NTP on krypton is OK: NTP OK: Offset 0.009777903557 secs
[21:17:18] <grrrit-wm>	 (03PS2) 10Gehel: Postgresql: init database with Puppet [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) 
[21:18:37] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Postgresql: init database with Puppet [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) (owner: 10Gehel)
[21:21:26] <grrrit-wm>	 (03PS2) 10ArielGlenn: add job that dumps history of flow pages [dumps] - 10https://gerrit.wikimedia.org/r/295587 (https://phabricator.wikimedia.org/T89398) 
[21:29:03] <grrrit-wm>	 (03PS3) 10Gehel: Postgresql: init database with Puppet [puppet] - 10https://gerrit.wikimedia.org/r/295589 (https://phabricator.wikimedia.org/T138092) 
[21:33:40] <grrrit-wm>	 (03CR) 10Alexandros Kosiaris: "false alarm on my part. Everything looks fine, first ever backup is running" [puppet] - 10https://gerrit.wikimedia.org/r/293690 (https://phabricator.wikimedia.org/T80385) (owner: 10Muehlenhoff)
[21:56:29] <grrrit-wm>	 (03CR) 10EBernhardson: "4 master capable? I would have thought it would be either 3 or 5 with the quorum at 2 or 3 respectively to prevent a split brain." [puppet] - 10https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[22:00:19] <grrrit-wm>	 (03CR) 10Gehel: "That's what i cet for working when I should ne sleeping. Thanks for putting pour comment in a friendly wording. I might have had less tact" [puppet] - 10https://gerrit.wikimedia.org/r/295585 (https://phabricator.wikimedia.org/T138329) (owner: 10Gehel)
[22:00:59] <grrrit-wm>	 (03PS4) 10Nicko: Include a cassandra::instance::monitoring class [puppet] - 10https://gerrit.wikimedia.org/r/295123 (https://phabricator.wikimedia.org/T137422) 
[22:05:21] <wikibugs>	 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint, 13Patch-For-Review: Install and configure new elasticsearch servers in eqiad - https://phabricator.wikimedia.org/T138329#2400594 (10EBernhardson) seems sane enough. Both ways give us opportunity for failure. I think the 3->2->3 master...
[22:25:04] <ori>	 !log Ran hacked maintain-replicas.pl on labsdb100[13] for T135029
[22:25:05] <stashbot>	 T135029: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029
[22:25:12] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:25:34] <grrrit-wm>	 (03Abandoned) 10Nicko: T137422 Include cassandra monitoring in all the ::cassandra calls [puppet] - 10https://gerrit.wikimedia.org/r/295125 (owner: 10Nicko)
[22:25:47] <ori>	 gehel: ^. Seems to have worked, but MySQL isn't running on labsdb1002, so that one was not updated.
[22:28:00] <grrrit-wm>	 (03PS3) 10Ori.livneh: [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) 
[22:28:17] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] [NOT FOR MERGING] hack maintain-replicas.pl for adywiki/jamwiki [software] - 10https://gerrit.wikimedia.org/r/295564 (https://phabricator.wikimedia.org/T135029) (owner: 10Ori.livneh)
[22:28:54] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400629 (10ori) I ran it against labsdb1001 and labsdb1003 and it seems to have done the trick. labsdb1002's My...
[22:31:09] <grrrit-wm>	 (03Restored) 10Nicko: T137422 Include cassandra monitoring in all the ::cassandra calls [puppet] - 10https://gerrit.wikimedia.org/r/295125 (owner: 10Nicko)
[22:31:28] <grrrit-wm>	 (03PS2) 10Nicko: Include a cassandra::instance::monitoring class [puppet] - 10https://gerrit.wikimedia.org/r/295125 (https://phabricator.wikimedia.org/T137422) 
[22:34:42] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400632 (10MaxSem) Confirmed working. Let's keep this bug open until labsdb1002 also gets fixed.
[22:36:41] <MaxSem>	 ori, :beer: \m/
[22:36:57] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400639 (10Krenair) Thank you @ori! I believe the issues with labsdb1002 are {T126946}
[22:38:53] <ori>	 MaxSem: that script needs a rewrite :/
[22:39:44] <MaxSem>	 ori, at least shit got done ;)
[22:45:45] <grrrit-wm>	 (03PS2) 10Jdlrobson: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 
[22:45:51] <ori>	 Krenair: you rewrote it in Python? I just came across maintain-meta_p.py. Has it ever been used? It's probably a better foundation for future refactoring work than the perl script.
[22:46:02] <Krenair>	 ah
[22:46:07] <Krenair>	 That's actually not the whole maintain-replicas
[22:46:16] <ori>	 just the parts that updates the meta_p table, gotcha.
[22:46:23] <ori>	 s/table/db/
[22:46:27] <Krenair>	 yeah
[22:46:56] <Krenair>	 I'm not sure it was ever run
[22:46:56] <Krenair>	 I could probably do the same with the rest of the script at some point, I guess
[22:59:56] <grrrit-wm>	 (03PS3) 10Jdlrobson: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 
[23:00:04] <jouncebot>	 RoanKattouw, ostriches, Krenair, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160622T2300).
[23:00:24] <jdlrobson>	 (here)
[23:00:53] * MaxSem can do it
[23:01:32] <grrrit-wm>	 (03CR) 10MaxSem: [C: 04-1] Enable lazy loaded images on Ukranian and Farsi Wikipedias (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:03:13] <MaxSem>	 jdlrobson, https://gerrit.wikimedia.org/r/#/c/295560/3/wmf-config/InitialiseSettings.php is pretty huge. does it make sense to put it in a dblist?
[23:03:18] <grrrit-wm>	 (03PS2) 10Jdlrobson: Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) 
[23:03:26] <grrrit-wm>	 (03CR) 10Jdlrobson: Enable lazy loaded images on Ukranian and Farsi Wikipedias (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:03:30] <grrrit-wm>	 (03PS3) 10Jdlrobson: Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) 
[23:03:46] <jdlrobson>	 MaxSem: oohh dblist.. how do i do that?
[23:03:53] <wikibugs>	 06Operations, 10Beta-Cluster-Infrastructure, 13Patch-For-Review: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2400679 (10Krenair) `/mnt/upload7` does not appear to exist anywhere within the beta cluster, it appears to...
[23:03:56] <jdlrobson>	 while writing this i was thinking ... there must be a cleaner way.
[23:04:28] <MaxSem>	 https://github.com/wikimedia/operations-mediawiki-config/tree/master/dblists
[23:04:47] <RoanKattouw>	 You also need to add it to the list of dblists in CommonSettings.php (search for e.g. 'nonbetafeatures')
[23:04:49] <jdlrobson>	 MaxSem: how do i link that to the config variable?
[23:04:56] <jdlrobson>	 ah thanks RoanKattouw lemme take a look
[23:05:14] <RoanKattouw>	 And then when you've created foobar.dblist you can do stuff like array( 'default' => false, 'foobar' => true )
[23:05:28] <MaxSem>	 then just use the name of dblist as if it was a wiki name
[23:06:06] <Reedy>	 array()? ;)
[23:06:19] <RoanKattouw>	 Right, we have [] in core already
[23:06:21] <MaxSem>	 Reedy, valid PHP ;)
[23:06:27] * RoanKattouw lives in Echo which hasn't switched yet
[23:06:58] <grrrit-wm>	 (03PS2) 10MaxSem: Add www.photolib.noaa.gov to wgCopyUploadDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295558 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[23:07:08] <grrrit-wm>	 (03CR) 10MaxSem: [C: 032] Add www.photolib.noaa.gov to wgCopyUploadDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295558 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[23:08:06] <grrrit-wm>	 (03Merged) 10jenkins-bot: Add www.photolib.noaa.gov to wgCopyUploadDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295558 (https://phabricator.wikimedia.org/T138383) (owner: 10Urbanecm)
[23:08:44] <jdlrobson>	 MaxSem: RoanKattouw done! :)
[23:08:46] <grrrit-wm>	 (03PS4) 10Jdlrobson: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 
[23:09:25] <grrrit-wm>	 (03CR) 10Reedy: [C: 04-1] "Needs adding to tags in CommonSettings.php" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:09:30] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:09:30] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/295558/ (duration: 00m 40s)
[23:09:37] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:09:37] <Dereckson>	 Testing.
[23:09:54] <Dereckson>	 RoanKattouw: https://github.com/thomasbachem/php-short-array-syntax-converter gives good results
[23:09:55] <Reedy>	 jdlrobson: look for $wikiTags = []; in CommonSettings
[23:10:11] <wikibugs>	 06Operations, 13Patch-For-Review: Remove secure.wikimedia.org - https://phabricator.wikimedia.org/T120790#1861112 (10demon) >>! In T120790#2395641, @Dzahn wrote: > I guess we should call it declined then...  Bummer :(
[23:10:44] <RoanKattouw>	 Dereckson: Yeah legoktm already submitted a patch to convert everything, but I let it rot because we had a lot of big commits pending, right around the cross-wiki notifications release
[23:10:59] <grrrit-wm>	 (03PS5) 10Jdlrobson: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 
[23:11:01] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[23:11:02] <jdlrobson>	 thanks Reedy !
[23:11:04] <RoanKattouw>	 Now is a much better time to do something like that, but by now that commit is so outdated that it'll need to be rebuilt from scratch
[23:11:07] <Dereckson>	 MaxSem: works
[23:11:13] <MaxSem>	 thanks Dereckson 
[23:11:20] <grrrit-wm>	 (03CR) 10MaxSem: [C: 032] Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:11:38] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:12:21] <grrrit-wm>	 (03PS4) 10MaxSem: Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:12:24] <Reedy>	 heh
[23:12:25] <Reedy>	 +    41 => 'zh-min-nanwiktionary'
[23:12:27] <grrrit-wm>	 (03CR) 10MaxSem: [C: 032] Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:12:29] <grrrit-wm>	 (03PS6) 10Jdlrobson: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 
[23:12:49] <Reedy>	 zh_min_nanwiktionary
[23:13:09] <grrrit-wm>	 (03Merged) 10jenkins-bot: Enable lazy loaded images on Ukranian and Farsi Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/294247 (https://phabricator.wikimedia.org/T134003) (owner: 10Jdlrobson)
[23:14:01] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/294247/ (duration: 00m 24s)
[23:14:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:14:17] <MaxSem>	 jdlrobson, please test
[23:14:26] <jdlrobson>	 MaxSem: on it!
[23:14:28] <Dereckson>	 jdlrobson: dblist is mobilemainpagelegacy but option is MFSpecialCaseMainPage
[23:15:14] <Dereckson>	 could be interesting to harmonize a little bit, excepted if the legacy concept implies future settings changes.
[23:15:42] <jdlrobson>	 Dereckson: ? Not sure I follow.
[23:16:27] <jdlrobson>	 MaxSem: tested. All good
[23:16:48] <Dereckson>	 jdlrobson: if the dblist filename and the option name are coherent, it's easier to know they are related
[23:17:52] <jdlrobson>	 Dereckson: the MFSpecialCaseMainPage is a config variable so not easily changed. mobilemainpagelegacy describes the problem (there may be other config variables that use it in future as we migrate away)
[23:17:57] <jdlrobson>	 so better to be generic no?
[23:18:22] * Dereckson nods
[23:19:01] <Dereckson>	 And yes absolutely as there may be future variables changes.
[23:19:49] <MaxSem>	 jdlrobson, why are you keeping a list of wikis that have the same value as default?
[23:20:11] <jdlrobson>	 MaxSem: once i've finished the audit (a few more wiki projects i'll set default to false)
[23:20:26] <MaxSem>	 mhm
[23:20:33] <jdlrobson>	 this just helps the wiktionary and wikiquote cases
[23:20:47] <jdlrobson>	 i'll make sure i do that by next thursday latest :)
[23:21:02] <grrrit-wm>	 (03PS7) 10MaxSem: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:21:11] <grrrit-wm>	 (03CR) 10MaxSem: [C: 032] Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:21:48] <grrrit-wm>	 (03Merged) 10jenkins-bot: Disable special casing of main page where it is not required [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295560 (owner: 10Jdlrobson)
[23:23:14] <logmsgbot>	 !log maxsem@tin Synchronized dblists/mobilemainpagelegacy.dblist: https://gerrit.wikimedia.org/r/#/c/295560/ (duration: 00m 24s)
[23:23:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:23:49] <wikibugs>	 06Operations, 10DBA, 13Patch-For-Review: reimage or decom db servers on precise - https://phabricator.wikimedia.org/T125028#2400711 (10mmodell) @jcrespo: The not-exactly-official list of phabricator admins would be myself, @demon and @Aklapper.  I don't think there should be any issue with moving phabricator...
[23:23:55] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/295560/ (duration: 00m 24s)
[23:24:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:24:55] <logmsgbot>	 !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/295560/ (duration: 00m 25s)
[23:25:01] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:25:06] <MaxSem>	 jdlrobson, please test
[23:25:12] <jdlrobson>	 MaxSem:  on it!
[23:26:49] <jdlrobson>	 MaxSem: looks good so far
[23:26:54] <MaxSem>	 thanks
[23:26:59] <MaxSem>	 we're done
[23:27:46] <jdlrobson>	 MaxSem: thanks Max.
[23:29:53] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400771 (10ori) Filed {T138450} for tracking issues with the script.
[23:30:29] <wikibugs>	 07Blocked-on-Operations, 06Operations, 10DBA, 06Labs, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400777 (10ori)
[23:30:57] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (10ori)
[23:43:44] <wikibugs>	 07Blocked-on-Operations, 10DBA, 06Labs, 10Labs-Infrastructure: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (10Krenair) I wouldn't call this strictly blocked-on-ops yet (since although ops would have to approve the code, the next step is writing it a...
[23:50:41] <grrrit-wm>	 (03PS1) 10Alex Monk: Follow-up I64f5f5a8: Update maintain-meta_p.py script for dblist move too [software] - 10https://gerrit.wikimedia.org/r/295598 
[23:50:58] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Follow-up I64f5f5a8: Update maintain-meta_p.py script for dblist move too [software] - 10https://gerrit.wikimedia.org/r/295598 (owner: 10Alex Monk)
[23:53:40] <grrrit-wm>	 (03PS2) 10Alex Monk: Follow-up I64f5f5a8: Update maintain-meta_p.py script for dblist move too [software] - 10https://gerrit.wikimedia.org/r/295598 
[23:53:57] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Follow-up I64f5f5a8: Update maintain-meta_p.py script for dblist move too [software] - 10https://gerrit.wikimedia.org/r/295598 (owner: 10Alex Monk)
[23:54:24] <Krenair>	 ... Is Jenkins broken?
[23:54:44] <grrrit-wm>	 (03PS1) 10Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) 
[23:55:25] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Complete list of legacy main pages, switch default to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: 10Jdlrobson)
[23:56:33] <grrrit-wm>	 (03PS2) 10Jdlrobson: Complete list of legacy main pages, switch default to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) 
[23:57:01] <Krenair>	 legoktm, any ideas what's up with jenkins in my operations/software.git patch up there?
[23:57:07] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Complete list of legacy main pages, switch default to false [mediawiki-config] - 10https://gerrit.wikimedia.org/r/295600 (https://phabricator.wikimedia.org/T138425) (owner: 10Jdlrobson)
[23:58:16] <Krenair>	 oh, it's probably 2AM for you now