[00:23:27] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1030, Errmsg: Error Got error 175 File too short: Expected more data in file from storage engine Aria on query. Default database: wikidatawiki. [Query snipped]
[00:23:49] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: commonswiki. [Query snipped]
[00:23:59] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: frwiki. [Query snipped]
[00:24:11] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: mediawikiwiki. [Query snipped]
[00:24:33] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: eswiki. [Query snipped]
[00:24:33] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table ores_classification: try to repair it on query. Default database: enwiki. [Query snipped]
[00:24:33] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: zhwiki. [Query snipped]
[00:28:12] <onimisionipe>	 Hopefully this ^ paged someone..
[00:28:25] <SQL>	 onimisionipe: Well, it pinged me fwiw :P
[00:30:35] <icinga-wm>	 PROBLEM - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: dewiki. [Query snipped]
[00:32:59] <paladox>	 it may have paged the dba
[00:34:49] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1002.14 seconds
[00:36:59] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 965.01 seconds
[00:37:03] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 968.29 seconds
[00:37:09] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 974.22 seconds
[00:37:15] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 961.09 seconds
[00:37:25] <paladox>	 Any ops around for ^^?
[00:37:29] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 993.77 seconds
[00:37:37] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 994.82 seconds
[00:43:39] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 986.95 seconds
[01:07:53] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:10:13] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[01:10:51] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1003 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:13:19] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[01:13:19] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1002 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:14:05] <wikibugs>	 (PS3) AndyRussG: Give protect right to centralnoticeadmin on Meta [mediawiki-config] - https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873)
[01:15:11] <icinga-wm>	 PROBLEM - citoid endpoints health on scb1004 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[01:16:17] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[01:16:53] <icinga-wm>	 RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[01:31:11] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[01:33:35] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[01:35:40] <wikibugs>	 (CR) Kosta Harlan: [C: +1] [WIP] Change links of wgGEHelpPanelLinks for kowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/483996 (https://phabricator.wikimedia.org/T209467) (owner: Revi)
[06:29:25] <icinga-wm>	 PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/smartmontools/run.d/20logger]
[06:29:31] <icinga-wm>	 PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/10-puppet-agent.conf]
[06:32:55] <wikibugs>	 (PS1) Marostegui: Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155
[06:36:51] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "AFAIK we need first to define "ensure: absent" on the user." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/484155 (owner: Marostegui)
[06:39:30] <wikibugs>	 (PS2) Marostegui: Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155
[06:40:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155 (owner: Marostegui)
[06:41:38] <marostegui>	 gah
[06:42:37] <wikibugs>	 (PS3) Marostegui: Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155
[06:43:08] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: +1] Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155 (owner: Marostegui)
[06:44:14] <wikibugs>	 (CR) Marostegui: [C: +2] Offboarding Balazs [puppet] - https://gerrit.wikimedia.org/r/484155 (owner: Marostegui)
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23228.36 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23074.78 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23041.36 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23146.81 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 22706.60 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23050.46 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:44] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23056.95 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:45] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave Lag: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 23120.86 seconds Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:45] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table ores_classification: try to repair it on query. Default database: enwiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:46] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s2 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: zhwiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:46] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: mediawikiwiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:47] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: commonswiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:47] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s5 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: dewiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:48] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s6 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: frwiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:48] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s7 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Incorrect key file for table linter: try to repair it on query. Default database: eswiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:46:49] <icinga-wm>	 ACKNOWLEDGEMENT - MariaDB Slave SQL: s8 on dbstore1002 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1030, Errmsg: Error Got error 175 File too short: Expected more data in file from storage engine Aria on query. Default database: wikidatawiki. [Query snipped] Marostegui T213670 - The acknowledgement expires at: 2019-01-16 06:46:25.
[06:52:02] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[06:52:56] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[06:54:10] <wikibugs>	 (PS1) Marostegui: data.yaml: Remove ssh and email from Balazs [puppet] - https://gerrit.wikimedia.org/r/484156
[06:55:02] <icinga-wm>	 PROBLEM - puppet last run on dbstore1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[06:55:49] <marostegui>	 ^ this gets fixed with a second puppet run
[06:55:58] <icinga-wm>	 PROBLEM - puppet last run on db1078 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[06:56:54] <icinga-wm>	 PROBLEM - puppet last run on db1091 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[07:00:06] <icinga-wm>	 RECOVERY - puppet last run on ms-be1030 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:08] <icinga-wm>	 RECOVERY - puppet last run on dbstore1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:12] <icinga-wm>	 RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:01:26] <icinga-wm>	 PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[07:06:14] <icinga-wm>	 RECOVERY - puppet last run on db1078 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:06:34] <icinga-wm>	 RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:06:36] <wikibugs>	 (CR) Marostegui: [C: +2] install_server: Install pc1007 [puppet] - https://gerrit.wikimedia.org/r/483854 (https://phabricator.wikimedia.org/T207258) (owner: Marostegui)
[07:06:41] <wikibugs>	 (PS2) Marostegui: install_server: Install pc1007 [puppet] - https://gerrit.wikimedia.org/r/483854 (https://phabricator.wikimedia.org/T207258)
[07:06:49] <wikibugs>	 (CR) Vgutierrez: [C: +1] data.yaml: Remove ssh and email from Balazs [puppet] - https://gerrit.wikimedia.org/r/484156 (owner: Marostegui)
[07:07:08] <icinga-wm>	 RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:07:41] <wikibugs>	 (PS2) Marostegui: data.yaml: Remove ssh and email from Balazs [puppet] - https://gerrit.wikimedia.org/r/484156
[07:09:16] <wikibugs>	 (CR) Marostegui: [C: +2] data.yaml: Remove ssh and email from Balazs [puppet] - https://gerrit.wikimedia.org/r/484156 (owner: Marostegui)
[07:10:54] <icinga-wm>	 PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:12:54] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[07:13:08] <icinga-wm>	 PROBLEM - puppet last run on db1122 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[07:13:14] <icinga-wm>	 PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[07:13:30] <vgutierrez>	 the Equinix OOB is the interface that went down
[07:13:34] <vgutierrez>	 (on mr1-eqiad)
[07:14:30] <icinga-wm>	 RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:15:00] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: -1] "Overall lgtm, see the two comments. The absence of mod_passenger from the frontend is the real showstopper in this case." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[07:16:53] <icinga-wm>	 PROBLEM - puppet last run on db1109 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): User[banyek]
[07:17:59] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 3.36 ms
[07:18:09] <icinga-wm>	 RECOVERY - puppet last run on db1122 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:18:21] <icinga-wm>	 RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.50 ms
[07:21:53] <icinga-wm>	 RECOVERY - puppet last run on db1109 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:22:57] <wikibugs>	 (CR) Peachey88: "Is there a off-boarding task this can be linked to?" [puppet] - https://gerrit.wikimedia.org/r/484156 (owner: Marostegui)
[07:29:32] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['pc1007.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-a...
[07:38:22] <wikibugs>	 (PS1) ArielGlenn: add new xml/sql dumps mirror, freemirror.org [puppet] - https://gerrit.wikimedia.org/r/484160
[07:38:50] <wikibugs>	 (CR) jerkins-bot: [V: -1] add new xml/sql dumps mirror, freemirror.org [puppet] - https://gerrit.wikimedia.org/r/484160 (owner: ArielGlenn)
[07:40:49] <wikibugs>	 (PS2) ArielGlenn: add new xml/sql dumps mirror, freemirror.org [puppet] - https://gerrit.wikimedia.org/r/484160
[07:45:58] <wikibugs>	 (CR) ArielGlenn: [C: +2] add new xml/sql dumps mirror, freemirror.org [puppet] - https://gerrit.wikimedia.org/r/484160 (owner: ArielGlenn)
[07:48:36] <elukey>	 !log executed bmc-device --debug --cold-reset on dbstore1002 - "No more sessions available" for mgmt
[07:48:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:02:28] <wikibugs>	 (PS2) Muehlenhoff: hhvm: Remove support for trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/483381
[08:05:04] <wikibugs>	 (CR) Muehlenhoff: [C: +2] hhvm: Remove support for trusty/jessie [puppet] - https://gerrit.wikimedia.org/r/483381 (owner: Muehlenhoff)
[08:15:52] <wikibugs>	 (Abandoned) Elukey: [TEST] Remove user elukey [puppet] - https://gerrit.wikimedia.org/r/483791 (owner: Elukey)
[08:20:04] <wikibugs>	 (PS34) Elukey: admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949)
[08:20:06] <wikibugs>	 (PS1) Elukey: role::analytics_cluster::hadoop::master: add groups without ssh access [puppet] - https://gerrit.wikimedia.org/r/484165 (https://phabricator.wikimedia.org/T212949)
[08:24:33] <wikibugs>	 (CR) Elukey: "I've removed the role::analytics_cluster::hadoop::master example and added in a separate review, so this one can be checked via puppet com" [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: Elukey)
[08:29:55] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['pc1007.eqiad.wmnet'] `  and were **ALL** successful.
[08:31:26] <wikibugs>	 (PS2) Elukey: role::analytics_cluster::hadoop: add groups without ssh access [puppet] - https://gerrit.wikimedia.org/r/484165 (https://phabricator.wikimedia.org/T212949)
[08:33:29] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) pc1007 got installed and looks good: ` root@pc1007:~# megacli -LDPDInfo -aAll  Adapter #0  Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name...
[08:33:48] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui)
[08:33:57] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Open→Resolved
[08:38:07] <marostegui>	 !log Stop MySQL on pc2010 to clone pc1007 - T208383
[08:38:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:10] <stashbot>	 T208383: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383
[08:42:15] <wikibugs>	 Operations, Puppet: puppet.git rake fails with ruby 2.5 - https://phabricator.wikimedia.org/T208566 (fgiunchedi) @hashar I initially added this task to ci-infra because it'll be relevant with buster docker/jenkins jobs, is there a task already for that I could piggyback?
[08:44:53] <marostegui>	 !log Stop mysql on dbstore1002 - T213670
[08:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:44:56] <stashbot>	 T213670: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670
[08:46:49] <wikibugs>	 (PS3) Zoranzoki21: Add new throttle rule for Berklee College of Music library [mediawiki-config] - https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311)
[08:50:11] <wikibugs>	 (PS4) Zoranzoki21: Create Portal namespace on shn.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992)
[08:50:51] <wikibugs>	 Operations, monitoring: Report problems found by mcelog - https://phabricator.wikimedia.org/T197086 (fgiunchedi) >>! In T197086#4865980, @CDanis wrote: > I think this work has mostly already happened?  We have some mtail rules for mce events. > https://phabricator.wikimedia.org/source/operations-puppet/b...
[08:53:26] <wikibugs>	 Operations, Thumbor, serviceops, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki)
[08:53:37] <wikibugs>	 Operations, Thumbor, serviceops, Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (jijiki)
[08:53:39] <wikibugs>	 Operations, Thumbor, serviceops, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki) Resolved→Open
[08:55:20] <wikibugs>	 (PS2) GTirloni: wmcs::nfs::misc - Fix typo and nsswitch.conf file [puppet] - https://gerrit.wikimedia.org/r/484149 (https://phabricator.wikimedia.org/T209527)
[08:56:21] <wikibugs>	 Operations, Thumbor, serviceops, Patch-For-Review, and 2 others: Assess Thumbor upgrade options - https://phabricator.wikimedia.org/T209886 (jijiki)
[08:56:25] <wikibugs>	 (CR) GTirloni: [C: +2] wmcs::nfs::misc - Fix typo and nsswitch.conf file [puppet] - https://gerrit.wikimedia.org/r/484149 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[09:04:35] <wikibugs>	 (PS3) Muehlenhoff: Add support for buster-wikimedia to our internal repository [puppet] - https://gerrit.wikimedia.org/r/483694 (https://phabricator.wikimedia.org/T213527)
[09:05:18] <wikibugs>	 (PS5) Zoranzoki21: Update groupOverrides for srwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/482609 (https://phabricator.wikimedia.org/T213055)
[09:06:00] <wikibugs>	 Operations, ops-eqiad, DBA: db1115 (tendril DB) had OOM for some processes and some hw (memory) issues - https://phabricator.wikimedia.org/T196726 (Marostegui) @Cmjohnson can we request a new DIMM to Dell?
[09:09:59] <wikibugs>	 (CR) Muehlenhoff: [C: +2] Add support for buster-wikimedia to our internal repository [puppet] - https://gerrit.wikimedia.org/r/483694 (https://phabricator.wikimedia.org/T213527) (owner: Muehlenhoff)
[09:20:47] <wikibugs>	 (PS1) Vgutierrez: aptrepo: add component/kernel-proposed-updates to stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/484181 (https://phabricator.wikimedia.org/T203194)
[09:30:46] <wikibugs>	 (PS1) Volans: CHANGELOG: add changelogs for release v0.0.13 [software/spicerack] - https://gerrit.wikimedia.org/r/484184
[09:37:25] <marostegui>	 !log Running aria_chk for all linter tables on dbstore1002 - T213670
[09:37:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:29] <stashbot>	 T213670: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670
[09:38:06] <wikibugs>	 (CR) Volans: [C: +2] CHANGELOG: add changelogs for release v0.0.13 [software/spicerack] - https://gerrit.wikimedia.org/r/484184 (owner: Volans)
[09:40:10] <icinga-wm>	 RECOVERY - puppet last run on cloudstore1008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:43:30] <wikibugs>	 Operations, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Fix prometheus elasticsearch exporter to show all the metrics - https://phabricator.wikimedia.org/T210592 (Gehel) Validation was done by @Mathew.onipe.  .deb is now uploaded to our apt repo
[09:45:09] <wikibugs>	 (PS1) Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/484186
[09:45:11] <wikibugs>	 (PS1) Zoranzoki21: Update groupOverrides for srwikinews [mediawiki-config] - https://gerrit.wikimedia.org/r/484187 (https://phabricator.wikimedia.org/T213679)
[09:45:54] <wikibugs>	 (Abandoned) Zoranzoki21: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/operations/mediawiki-config [mediawiki-config] - https://gerrit.wikimedia.org/r/484186 (owner: Zoranzoki21)
[09:47:39] <wikibugs>	 (Merged) jenkins-bot: CHANGELOG: add changelogs for release v0.0.13 [software/spicerack] - https://gerrit.wikimedia.org/r/484184 (owner: Volans)
[09:48:05] <wikibugs>	 (CR) DCausse: [C: -1] Elasticsearch failed shard allocation check (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850) (owner: Mathew.onipe)
[09:48:45] <wikibugs>	 (CR) Muehlenhoff: [C: +1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/484181 (https://phabricator.wikimedia.org/T203194) (owner: Vgutierrez)
[09:49:09] <wikibugs>	 (CR) Vgutierrez: [C: +2] aptrepo: add component/kernel-proposed-updates to stretch-wikimedia [puppet] - https://gerrit.wikimedia.org/r/484181 (https://phabricator.wikimedia.org/T203194) (owner: Vgutierrez)
[09:49:17] <wikibugs>	 (CR) jenkins-bot: CHANGELOG: add changelogs for release v0.0.13 [software/spicerack] - https://gerrit.wikimedia.org/r/484184 (owner: Volans)
[09:50:43] <wikibugs>	 (PS1) Volans: Upstream release v0.0.13 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/484188
[09:51:47] <marostegui>	 !log Running aria_chk for all myisam tables on dbstore1002 T213670
[09:51:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:50] <stashbot>	 T213670: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670
[09:57:40] <wikibugs>	 (CR) Volans: [C: +2] Upstream release v0.0.13 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/484188 (owner: Volans)
[10:01:52] <wikibugs>	 (CR) Volans: [C: +1] "LGTM, although I didn't run a puppetcompiler to verify it." [puppet] - https://gerrit.wikimedia.org/r/483695 (owner: Muehlenhoff)
[10:03:09] <wikibugs>	 (Merged) jenkins-bot: Upstream release v0.0.13 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/484188 (owner: Volans)
[10:07:45] <moritzm>	 !log install tmpreaper security updates on remaining hosts
[10:07:46] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:48] <volans>	 !log uploaded spicerack_0.0.13-1_amd64.deb to apt.wikimedia.org stretch-wikimedia T205884
[10:11:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:52] <stashbot>	 T205884: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884
[10:13:23] <volans>	 !log installed spicerack 0.0.13 on cumin2001 for final testing - T205884
[10:13:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:15:06] <wikibugs>	 (PS1) Hashar: doc: force users umask for wikidev group [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890)
[10:15:54] <icinga-wm>	 PROBLEM - DPKG on relforge1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:15:55] <wikibugs>	 (CR) jerkins-bot: [V: -1] doc: force users umask for wikidev group [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[10:16:52] <icinga-wm>	 PROBLEM - Check systemd state on relforge1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:17:47] <_joe_>	 uhm what's up with relforge?
[10:18:16] <wikibugs>	 (PS2) Hashar: doc: force users umask for wikidev group [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890)
[10:18:54] <wikibugs>	 Operations, DBA: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (jcrespo) I created to track it, it has gone up to 21 since yesterday. We have to consider the possibility of it crashing due to uncorrectable errors and be prepared for a failo...
[10:19:21] <onimisionipe>	 _joe_: that's me!
[10:19:43] <_joe_>	 onimisionipe: oh ok
[10:19:49] <onimisionipe>	 It should be Ok now
[10:19:51] <_joe_>	 it looks like the prometheus exporter fails
[10:19:52] * fsero wonders if icinga can really downtime things
[10:19:57] <fsero>	 :P
[10:20:06] <onimisionipe>	 yea.. It does.
[10:20:54] <fsero>	 onimisionipe: happens to everyone :)
[10:21:20] <wikibugs>	 Operations, monitoring, Patch-For-Review, Performance-Team (Radar): Provision >= 50% of statsd/Graphite-only metrics in Prometheus - https://phabricator.wikimedia.org/T205870 (fgiunchedi)
[10:26:02] <icinga-wm>	 PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:26:32] <wikibugs>	 (PS1) Zoranzoki21: Update groupOverrides for srwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/484195 (https://phabricator.wikimedia.org/T213684)
[10:29:38] <icinga-wm>	 PROBLEM - puppet last run on relforge1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[prometheus-elasticsearch-exporter]
[10:30:04] <jouncebot>	 jan_drewniak: Time to snap out of that daydream and deploy Wikimedia Portals Update. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1030).
[10:34:06] <wikibugs>	 (CR) Hashar: "The permissions are somehow wrong from time to time :/" [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[10:39:22] <moritzm>	 !log start installing systemd security updates for stretch
[10:39:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:52] <icinga-wm>	 RECOVERY - DPKG on relforge1002 is OK: All packages OK
[10:40:02] <icinga-wm>	 RECOVERY - puppet last run on relforge1002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[10:41:24] <wikibugs>	 (03PS2) 10Giuseppe Lavagetto: profile::services_proxy: simple local proxying for remote services [puppet] - 10https://gerrit.wikimedia.org/r/483788 (https://phabricator.wikimedia.org/T210717)
[10:41:26] <wikibugs>	 (03PS3) 10Giuseppe Lavagetto: mediawiki::common: add proxy for services [puppet] - 10https://gerrit.wikimedia.org/r/483789 (https://phabricator.wikimedia.org/T210717)
[10:42:09] <wikibugs>	 (03CR) 10jenkins-bot: [V: 04-1] profile::services_proxy: simple local proxying for remote services [puppet] - 10https://gerrit.wikimedia.org/r/483788 (https://phabricator.wikimedia.org/T210717) (owner: 10Giuseppe Lavagetto)
[10:44:20] <wikibugs>	 (03CR) 10Gehel: "minor comment inline. PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/14318/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483798 (https://phabricator.wikimedia.org/T198622) (owner: 10Mathew.onipe)
[10:47:05] <wikibugs>	 10Puppet, 10ORES, 10Scoring-platform-team (Current): orespoolcounter1002.eqiad.wmnet reporting compile errors - https://phabricator.wikimedia.org/T213586 (10akosiaris) 05Open→03Invalid All are warnings, that is not errors and are safe to ignore. They are about a feature (exported resources[1]) that is no...
[10:49:08] <wikibugs>	 (03PS35) 10Elukey: admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949)
[10:49:11] <wikibugs>	 (03PS3) 10Elukey: role::analytics_cluster::hadoop: add groups without ssh access [puppet] - 10https://gerrit.wikimedia.org/r/484165 (https://phabricator.wikimedia.org/T212949)
[10:51:42] <icinga-wm>	 RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:57:41] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] "I 'll merge this in the interest of unblocking the current issue. PCC says it's noop for production anyways, we can always revert with min" [puppet] - 10https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327) (owner: 10Alexandros Kosiaris)
[10:57:53] <wikibugs>	 (03PS7) 10Alexandros Kosiaris: Introduce $aggregate_networks, deprecate $all_networks [puppet] - 10https://gerrit.wikimedia.org/r/475714 (https://phabricator.wikimedia.org/T212327)
[11:00:11] <wikibugs>	 (03PS1) 10Muehlenhoff: Update canary host for Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/484196
[11:02:23] <wikibugs>	 (03PS2) 10Muehlenhoff: Update canary host for Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/484196
[11:02:42] <icinga-wm>	 RECOVERY - Check systemd state on relforge1002 is OK: OK - running: The system is fully operational
[11:02:43] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: ferm: Remove unused all_networks erb variable [puppet] - 10https://gerrit.wikimedia.org/r/483429
[11:03:30] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 03+2] ferm: Remove unused all_networks erb variable [puppet] - 10https://gerrit.wikimedia.org/r/483429 (owner: 10Alexandros Kosiaris)
[11:03:32] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+2] Update canary host for Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/484196 (owner: 10Muehlenhoff)
[11:03:47] <wikibugs>	 10Operations, 10Citoid, 10serviceops: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689 (10fselles)
[11:05:42] <wikibugs>	 (03PS3) 10Muehlenhoff: Update canary host for Hadoop workers [puppet] - 10https://gerrit.wikimedia.org/r/484196
[11:09:42] <wikibugs>	 (03PS1) 10Vgutierrez: cache: Add kernel-proposed-updates component for cp1075-99 [puppet] - 10https://gerrit.wikimedia.org/r/484199 (https://phabricator.wikimedia.org/T203194)
[11:12:08] <wikibugs>	 (03CR) 10Fsero: [C: 03+1] profile::services_proxy: simple local proxying for remote services (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483788 (https://phabricator.wikimedia.org/T210717) (owner: 10Giuseppe Lavagetto)
[11:13:15] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "One nit, but LGTM." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/484199 (https://phabricator.wikimedia.org/T203194) (owner: 10Vgutierrez)
[11:13:48] <icinga-wm>	 PROBLEM - DPKG on relforge1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:14:07] <onimisionipe>	 common
[11:20:47] <volans>	 !log installed spicerack 0.0.13 on cumin1001 - T205884
[11:20:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:20:54] <stashbot>	 T205884: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884
[11:24:35] <wikibugs>	 (03CR) 10Volans: [C: 03+2] API: convert to new Spicerack API [cookbooks] - 10https://gerrit.wikimedia.org/r/479463 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans)
[11:26:21] <wikibugs>	 (03Merged) 10jenkins-bot: API: convert to new Spicerack API [cookbooks] - 10https://gerrit.wikimedia.org/r/479463 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans)
[11:51:08] <wikibugs>	 (03PS3) 10Zfilipin: Add 'suppressredirect' user right to patroller user group at zh.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480768 (https://phabricator.wikimedia.org/T212272) (owner: 10Tulsi Bhagat)
[11:57:06] <wikibugs>	 (03PS7) 10Mathew.onipe: Elasticsearch failed shard allocation check [puppet] - 10https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850)
[11:57:31] <wikibugs>	 (03CR) 10Mathew.onipe: Elasticsearch failed shard allocation check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850) (owner: 10Mathew.onipe)
[11:57:33] <wikibugs>	 (03PS2) 10Vgutierrez: cache: Add kernel-proposed-updates component for cp1075-99 [puppet] - 10https://gerrit.wikimedia.org/r/484199 (https://phabricator.wikimedia.org/T203194)
[12:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1200).
[12:00:04] <jouncebot>	 Tulsi, TBhagat, Urbanecm, and Zoranzoki21: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[12:00:16] <zeljkof>	 I can swat today
[12:00:24] <Zoranzoki21>	 Here
[12:00:31] <Zoranzoki21>	 Can you process my patches first
[12:00:55] <wikibugs>	 (03PS2) 10Mathew.onipe: maps: migrate maps1003 to stretch [puppet] - 10https://gerrit.wikimedia.org/r/483798 (https://phabricator.wikimedia.org/T198622)
[12:01:11] <zeljkof>	 Tulsi, TBhagat, Urbanecm: are any patches urgent? no complaints on Zoranzoki21 being the first?
[12:01:18] <Urbanecm>	 No
[12:01:20] <Urbanecm>	 and hi zeljkof 
[12:01:26] <zeljkof>	 hi Urbanecm!
[12:01:27] <TBhagat>	 I have no problem with it.
[12:02:00] <Zoranzoki21>	 Thanks TBhagat and Urbanecm
[12:02:06] <Urbanecm>	 Yw Zoranzoki21 
[12:02:06] <zeljkof>	 ok, deploying Zoranzoki21's first patch, please stand by
[12:02:47] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) (owner: 10Zoranzoki21)
[12:03:02] <Zoranzoki21>	 zeljkof: 483409 needs no testing, it is a throttle rule
[12:03:14] <zeljkof>	 ok
[12:03:56] <wikibugs>	 (03Merged) 10jenkins-bot: Add new throttle rule for Berklee College of Music library [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) (owner: 10Zoranzoki21)
[12:05:19] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:483409|Add new throttle rule for Berklee College of Music library (T213311)]] (duration: 00m 52s)
[12:05:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:22] <stashbot>	 T213311: Request for temporary lift of IP cap on 2019-01-15 - https://phabricator.wikimedia.org/T213311
[12:05:27] <zeljkof>	 Zoranzoki21: 483409 deployed
[12:06:03] <Zoranzoki21>	 Second patch for Portal namespace needs namespaceDupes.php
[12:06:40] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: 10Zoranzoki21)
[12:06:46] <zeljkof>	 Zoranzoki21: ok
[12:06:53] <zeljkof>	 thanks for the reminder
[12:07:43] <wikibugs>	 (03Merged) 10jenkins-bot: Create Portal namespace on shn.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: 10Zoranzoki21)
[12:07:56] <Zoranzoki21>	 zeljkof: Oh, zuul is so fast today :)
[12:08:30] <zeljkof>	 Zoranzoki21: it is :) 482508 is at mwdebug1002 for testing
[12:08:45] * Zoranzoki21 testing
[12:09:00] <wikibugs>	 (03CR) 10Mathew.onipe: maps: migrate maps1003 to stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483798 (https://phabricator.wikimedia.org/T198622) (owner: 10Mathew.onipe)
[12:09:36] <Zoranzoki21>	 zeljkof: looks good, LGTM
[12:09:44] <zeljkof>	 Zoranzoki21: ok, deploying
[12:10:36] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:482508|Create Portal namespace on shn.wikipedia (T212992)]] (duration: 00m 46s)
[12:10:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:39] <stashbot>	 T212992: Create Portal namespace on shn.wikipedia - https://phabricator.wikimedia.org/T212992
[12:11:29] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Wikimedia-Incident: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693 (10fselles)
[12:11:36] <Zoranzoki21>	 zeljkof: ;)
[12:12:01] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Kubernetes, 10Wikimedia-Incident: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693 (10fselles)
[12:12:07] <zeljkof>	 Zoranzoki21: deployed, script did not find anything T212992#4876873
[12:12:19] <zeljkof>	 Zoranzoki21: you are free to go, thanks for deploying with #releng :)
[12:12:31] <zeljkof>	 (and please test the last patch before going) :)
[12:13:03] <Zoranzoki21>	 zeljkof: All is OK, thanks!
[12:13:17] <zeljkof>	 Tulsi, TBhagat: you have two nicks? :)
[12:13:24] <wikibugs>	 (03PS3) 10Tulsi Bhagat: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481579 (https://phabricator.wikimedia.org/T212612)
[12:13:47] <zeljkof>	 anyway, please stand by, you're next, I'll let you know when the first patch is at mwdebug1002 ready for testing
[12:13:57] <TBhagat>	 Hi zeljkof! Yes :)
[12:14:07] <TBhagat>	 Sure
[12:14:25] <zeljkof>	 Tulsi, TBhagat: which one do you prefer for pings?
[12:14:36] <zeljkof>	 so I don't ping both all the time :)
[12:14:50] <TBhagat>	 Go for TBhagat.
[12:14:51] <zeljkof>	 and let me know if you need help on how to test at mwdebug1002
[12:14:57] <zeljkof>	 ok
[12:15:12] <wikibugs>	 (03PS4) 10Zfilipin: Add 'suppressredirect' user right to patroller user group at zh.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480768 (https://phabricator.wikimedia.org/T212272) (owner: 10Tulsi Bhagat)
[12:15:29] <TBhagat>	 No
[12:15:36] <TBhagat>	 Let's start! :)
[12:16:10] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480768 (https://phabricator.wikimedia.org/T212272) (owner: 10Tulsi Bhagat)
[12:16:46] <zeljkof>	 Urbanecm: there are a lot of patches today, I'll do my best but there's a chance one or both of your commits will not make it
[12:17:10] <Urbanecm>	 Ok, that's fine
[12:17:15] <wikibugs>	 (03Merged) 10jenkins-bot: Add 'suppressredirect' user right to patroller user group at zh.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480768 (https://phabricator.wikimedia.org/T212272) (owner: 10Tulsi Bhagat)
[12:18:02] <TBhagat>	 zeljkof, LGTM. Please deploy.
[12:18:05] <zeljkof>	 TBhagat: 480768 is at mwdebug1002, please test and let me know if I can deploy
[12:18:12] <zeljkof>	 oh, that was fast :)
[12:18:17] <zeljkof>	 deploying
[12:18:20] <TBhagat>	 hehe
[12:18:33] <zeljkof>	 TBhagat: let me know if any patches need scripts to run after deployment
[12:18:44] <TBhagat>	 Ok
[12:19:19] <zeljkof>	 the best way is to leave a comment in gerrit (which scripts need to run for which patches)
[12:19:36] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:480768|Add suppressredirect user right to patroller user group at zh.wikivoyage (T212272)]] (duration: 00m 46s)
[12:19:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:19:39] <stashbot>	 T212272: Assign "suppressredirect" to patroller on Chinese Wikivoyage - https://phabricator.wikimedia.org/T212272
[12:19:45] <TBhagat>	 I have already left a comment on 481578
[12:19:53] <zeljkof>	 TBhagat: 480768 deployed, please test
[12:19:56] <zeljkof>	 TBhagat: thanks
[12:20:14] <wikibugs>	 (03PS4) 10Zfilipin: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481579 (https://phabricator.wikimedia.org/T212612) (owner: 10Tulsi Bhagat)
[12:20:23] <TBhagat>	 480768 Working fine.
[12:21:53] <wikibugs>	 (03CR) 10jenkins-bot: Add new throttle rule for Berklee College of Music library [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483409 (https://phabricator.wikimedia.org/T213311) (owner: 10Zoranzoki21)
[12:21:55] <wikibugs>	 (03CR) 10jenkins-bot: Create Portal namespace on shn.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: 10Zoranzoki21)
[12:21:57] <wikibugs>	 (03CR) 10jenkins-bot: Add 'suppressredirect' user right to patroller user group at zh.wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/480768 (https://phabricator.wikimedia.org/T212272) (owner: 10Tulsi Bhagat)
[12:23:20] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481579 (https://phabricator.wikimedia.org/T212612) (owner: 10Tulsi Bhagat)
[12:24:48] <wikibugs>	 (03Merged) 10jenkins-bot: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481579 (https://phabricator.wikimedia.org/T212612) (owner: 10Tulsi Bhagat)
[12:25:21] <zeljkof>	 TBhagat: 481579 is at mwdebug1002, please test
[12:25:53] * TBhagat testing
[12:26:19] <wikibugs>	 (03PS1) 10Jcrespo: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664)
[12:26:47] <TBhagat>	 481579 LGTM, Please deploy.
[12:26:54] <zeljkof>	 ok
[12:27:54] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481579|Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki (T212612)]] (duration: 00m 46s)
[12:27:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:27:57] <stashbot>	 T212612: Add transwiki import on Urdu Wikipedia - https://phabricator.wikimedia.org/T212612
[12:28:20] <zeljkof>	 TBhagat: it's deployed, please test
[12:28:59] <TBhagat>	 zeljkof, 481579 Working fine.
[12:30:25] <zeljkof>	 ok, moving on
[12:30:33] <TBhagat>	 Sure
[12:31:21] <TBhagat>	 Should I rebase 481578?
[12:31:39] <zeljkof>	 TBhagat: I'll rebase as needed
[12:31:48] <TBhagat>	 ok
[12:32:04] <wikibugs>	 (03PS4) 10Zfilipin: Configure $wgNamespaceAliases for yue.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481578 (https://phabricator.wikimedia.org/T212678) (owner: 10Tulsi Bhagat)
[12:33:02] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481578 (https://phabricator.wikimedia.org/T212678) (owner: 10Tulsi Bhagat)
[12:33:30] <TBhagat>	 zeljkof, Reminder: change 481578 - Requires `namespaceDupes.php --wiki=yuewiktionary --fix` to be run after deployment.
[12:33:43] <zeljkof>	 TBhagat: thanks, will do
[12:34:06] <wikibugs>	 (03Merged) 10jenkins-bot: Configure $wgNamespaceAliases for yue.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481578 (https://phabricator.wikimedia.org/T212678) (owner: 10Tulsi Bhagat)
[12:34:55] <zeljkof>	 TBhagat: 481578 is at mwdebug1002
[12:35:25] <TBhagat>	 testing
[12:35:56] <wikibugs>	 (03CR) 10jenkins-bot: Configure $wgAddGroups, $wgRemoveGroups and $wgImportSources for ur.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481579 (https://phabricator.wikimedia.org/T212612) (owner: 10Tulsi Bhagat)
[12:35:58] <wikibugs>	 (03CR) 10jenkins-bot: Configure $wgNamespaceAliases for yue.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481578 (https://phabricator.wikimedia.org/T212678) (owner: 10Tulsi Bhagat)
[12:36:16] <TBhagat>	 481578  LGTM
[12:36:24] <zeljkof>	 ok, deploying
[12:36:24] <TBhagat>	 please deploy
[12:37:37] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481578|Configure $wgNamespaceAliases for yue.wiktionary (T212678)]] (duration: 00m 45s)
[12:37:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:37:41] <stashbot>	 T212678: Add namespace aliases for yuewiktionary - https://phabricator.wikimedia.org/T212678
[12:38:42] <zeljkof>	 TBhagat: deployed, script ran, did not find anything to fix https://phabricator.wikimedia.org/T212678#4876955
[12:39:09] <TBhagat>	 Gr8. Let's move on.
[12:39:51] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:42:28] <wikibugs>	 (03CR) 10Zfilipin: Configure $wgImportSources for ne.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:42:33] <wikibugs>	 (03PS2) 10Zfilipin: Configure $wgImportSources for ne.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:42:42] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:43:48] <wikibugs>	 (03Merged) 10jenkins-bot: Configure $wgImportSources for ne.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:44:55] <logmsgbot>	 !log zfilipin@deploy1001 sync-file aborted: SWAT: [[gerrit:481578|Configure $wgNamespaceAliases for yue.wiktionary (T212678)]] (duration: 00m 01s)
[12:44:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:44:58] <stashbot>	 T212678: Add namespace aliases for yuewiktionary - https://phabricator.wikimedia.org/T212678
[12:45:29] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis, 10User-fgiunchedi: Better organization for SRE grafana dashboards - https://phabricator.wikimedia.org/T178690 (10jcrespo) I've just seen a dashboard I use is scheduled for deletion. I don't see the replacement as particularly better and lacki...
[12:45:31] <zeljkof>	 oops, this ^ is my mistake, wrong link from bash history :( aborted after a second
[12:45:57] <zeljkof>	 TBhagat: 483737 is at mwdebug
[12:46:49] <TBhagat>	 483737 LGTM, Please deploy. 
[12:47:41] <zeljkof>	 ok
[12:48:05] <zeljkof>	 Urbanecm: please stand by, you're next :)
[12:48:07] <Urbanecm>	 ack
[12:48:38] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:483737|Configure $wgImportSources for ne.wiktionary (T213023)]] (duration: 00m 45s)
[12:48:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:40] <stashbot>	 T213023: Enable import feature in Nepali Wiktionary - https://phabricator.wikimedia.org/T213023
[12:48:56] <zeljkof>	 TBhagat: it's deployed, please test and thanks for deploying with #releng :)
[12:49:12] <wikibugs>	 (03CR) 10jenkins-bot: Configure $wgImportSources for ne.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483737 (https://phabricator.wikimedia.org/T213023) (owner: 10Tulsi Bhagat)
[12:49:20] <wikibugs>	 (03PS2) 10Zfilipin: Localisation of Babel categories on nap.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481244 (https://phabricator.wikimedia.org/T123188) (owner: 10Urbanecm)
[12:50:24] <TBhagat>	 zeljkof, Thank you so much! Have a good time ahead! ;)
[12:50:46] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481244 (https://phabricator.wikimedia.org/T123188) (owner: 10Urbanecm)
[12:52:20] <wikibugs>	 (03Merged) 10jenkins-bot: Localisation of Babel categories on nap.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481244 (https://phabricator.wikimedia.org/T123188) (owner: 10Urbanecm)
[12:53:15] <zeljkof>	 Urbanecm: 481244 is at mwdebug, please test
[12:53:32] <Urbanecm>	 will do
[12:55:44] <Urbanecm>	 looks to be working, please deploy zeljkof 
[12:55:51] <zeljkof>	 ok
[12:56:28] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis, 10User-fgiunchedi: Better organization for SRE grafana dashboards - https://phabricator.wikimedia.org/T178690 (10CDanis) Jaime, going to have to guess here; are you referring to [[ https://grafana.wikimedia.org/d/000000274/prometheus-machine-...
[12:56:50] <icinga-wm>	 PROBLEM - DPKG on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:56:53] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481244|Localisation of Babel categories on nap.wikipedia.org (T123188)]] (duration: 00m 44s)
[12:56:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:56] <stashbot>	 T123188: Localisation of user categories for nap.wikipedia - https://phabricator.wikimedia.org/T123188
[12:57:06] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:14] <icinga-wm>	 PROBLEM - dhclient process on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:22] <icinga-wm>	 PROBLEM - Disk space on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:24] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:24] <icinga-wm>	 PROBLEM - configured eth on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:36] <icinga-wm>	 PROBLEM - Check systemd state on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:38] <icinga-wm>	 PROBLEM - Check size of conntrack table on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:57:39] <zeljkof>	 Urbanecm: deployed, please test
[12:57:43] <Urbanecm>	 thx
[12:58:16] <wikibugs>	 (03PS3) 10Zfilipin: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481108 (https://phabricator.wikimedia.org/T212469) (owner: 10Urbanecm)
[12:58:43] <wikibugs>	 (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481108 (https://phabricator.wikimedia.org/T212469) (owner: 10Urbanecm)
[12:58:48] <icinga-wm>	 PROBLEM - puppet last run on proton1002 is CRITICAL: connect to address 10.64.32.61 port 5666: Connection refused
[12:59:49] <wikibugs>	 (03Merged) 10jenkins-bot: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481108 (https://phabricator.wikimedia.org/T212469) (owner: 10Urbanecm)
[13:00:52] <zeljkof>	 Urbanecm: 481108 is at mwdebug
[13:00:56] <Urbanecm>	 thanks
[13:01:48] <zeljkof>	 Urbanecm: can I deploy it?
[13:01:51] <jijiki>	 is anyone aware why nrpe died on proton1002?
[13:01:57] <Urbanecm>	 zeljkof, yes
[13:01:58] <jijiki>	 should I restart it ?
[13:02:06] <zeljkof>	 Urbanecm: ok, deploying
[13:02:16] <wikibugs>	 (03CR) 10jenkins-bot: Localisation of Babel categories on nap.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481244 (https://phabricator.wikimedia.org/T123188) (owner: 10Urbanecm)
[13:02:18] <wikibugs>	 (03CR) 10jenkins-bot: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481108 (https://phabricator.wikimedia.org/T212469) (owner: 10Urbanecm)
[13:03:03] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481108|Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains (T212469)]] (duration: 00m 46s)
[13:03:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:03:08] <stashbot>	 T212469: Add http://mbc.cyfrowemazowsze.pl to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T212469
[13:03:15] <zeljkof>	 Urbanecm: all deployed, please test and thanks for deploying with #releng ;)
[13:03:20] <Urbanecm>	 thanks zeljkof 
[13:03:22] <zeljkof>	 !log eu swat finished
[13:03:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:48] <wikibugs>	 (03PS6) 10CDanis: Reference grafana dashboards by UID for alerting. [puppet] - 10https://gerrit.wikimedia.org/r/483820 (owner: 10Ppchelko)
[13:07:16] <wikibugs>	 10Operations, 10Puppet, 10Continuous-Integration-Config: puppet.git rake fails with ruby 2.5 - https://phabricator.wikimedia.org/T208566 (10hashar) The `Gemfile` uses `puppet ~> 4.8.2` which is the version provided by `jessie-backports` and `stretch`.  The CI job installs it from rubygems hence we lack monke...
[13:07:38] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] Reference grafana dashboards by UID for alerting. [puppet] - 10https://gerrit.wikimedia.org/r/483820 (owner: 10Ppchelko)
[13:08:32] <icinga-wm>	 RECOVERY - Check systemd state on proton1002 is OK: OK - running: The system is fully operational
[13:08:34] <icinga-wm>	 RECOVERY - Check size of conntrack table on proton1002 is OK: OK: nf_conntrack is 0 % full
[13:08:52] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] "Merged and puppet-merged.  Thanks again Petr!" [puppet] - 10https://gerrit.wikimedia.org/r/483820 (owner: 10Ppchelko)
[13:09:00] <icinga-wm>	 RECOVERY - DPKG on proton1002 is OK: All packages OK
[13:09:14] <icinga-wm>	 RECOVERY - puppet last run on proton1002 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[13:09:18] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[13:09:22] <icinga-wm>	 RECOVERY - dhclient process on proton1002 is OK: PROCS OK: 0 processes with command name dhclient
[13:09:27] <jijiki>	 I restarted nrpe on proton1002 for now
[13:09:30] <icinga-wm>	 RECOVERY - Disk space on proton1002 is OK: DISK OK
[13:09:32] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on proton1002 is OK: OK ferm input default policy is set
[13:09:32] <icinga-wm>	 RECOVERY - configured eth on proton1002 is OK: OK - interfaces up
[13:09:57] <jijiki>	 I am not investigating any further; if nrpe dies again, we can dig deeper
[13:10:31] <jijiki>	 !log Restarted nrpe on proton1002
[13:10:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:47] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review, 10User-CDanis, 10User-fgiunchedi: Better organization for SRE grafana dashboards - https://phabricator.wikimedia.org/T178690 (10jcrespo) >>! In T178690#4876994, @CDanis wrote: > Jaime, going to have to guess here; are you referring to [[ https://grafana.wik...
[13:16:29] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-Cache, 10Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10Nikerabbit) The new patch sh...
[13:19:20] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: increase default kafka partitions for logging cluster [puppet] - 10https://gerrit.wikimedia.org/r/484226 (https://phabricator.wikimedia.org/T213081)
[13:23:47] <moritzm>	 jijiki: in such cases it's often the oomkiller which kills unrelated processes
[13:24:33] <jijiki>	 moritzm: yep 
[13:24:44] <jijiki>	 but proton spawns chromium instances 
[13:24:52] <jijiki>	 so maybe one got out of hand 
[13:25:30] <jijiki>	 I will keep an eye 
[13:27:22] <jijiki>	 https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=proton1002&var-datasource=eqiad%20prometheus%2Fops&var-cluster=proton
[13:34:22] <wikibugs>	 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: letsencrypt puppetization: upgrade for scalability - https://phabricator.wikimedia.org/T134447 (10Krenair) 05Open→03Resolved I think at this point the route forward is certcentral and there's not much point keeping this particular ticket open. Feel...
[13:34:25] <wikibugs>	 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548 (10Krenair)
[13:34:35] <wikibugs>	 (03CR) 10Filippo Giunchedi: "Luca/Andrew LMK what you think! Straightforward for new topics I'd say, for existing topics should be fine too (see task)" [puppet] - 10https://gerrit.wikimedia.org/r/484226 (https://phabricator.wikimedia.org/T213081) (owner: 10Filippo Giunchedi)
[13:34:41] <wikibugs>	 10Operations, 10Traffic, 10HTTPS: letsencrypt puppetization: add parallel rsa+ecdsa cert support - https://phabricator.wikimedia.org/T141266 (10Krenair) 05Open→03Resolved I think at this point the route forward is certcentral and there's not much point keeping this particular ticket open. Feel free to re...
[13:34:45] <wikibugs>	 10Operations, 10Traffic, 10HTTPS, 10Patch-For-Review: letsencrypt puppetization: upgrade for scalability - https://phabricator.wikimedia.org/T134447 (10Krenair)
[13:37:13] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[13:37:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:21] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[13:37:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:16] <wikibugs>	 (03PS19) 10DCausse: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381)
[13:38:18] <wikibugs>	 (03PS19) 10DCausse: [cirrus] Start using replica group settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476272 (https://phabricator.wikimedia.org/T210381)
[13:38:20] <wikibugs>	 (03PS21) 10DCausse: [cirrus] Cleanup transitional states [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381)
[13:40:31] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - CRITICAL - zoterov2_1968: Servers kubernetes2002.codfw.wmnet, kubernetes2001.codfw.wmnet are marked down but pooled: zotero_1969: Servers kubernetes2002.codfw.wmnet, kubernetes2001.codfw.wmnet are marked down but pooled
[13:41:39] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - zoterov2_1968: Servers kubernetes2002.codfw.wmnet, kubernetes2004.codfw.wmnet are marked down but pooled: zotero_1969: Servers kubernetes2004.codfw.wmnet, kubernetes2001.codfw.wmnet are marked down but pooled
[13:41:54] <akosiaris>	 !log rollback zotero codfw deployment
[13:41:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:26] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production --dry-run --debug -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[13:42:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:27] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed
[13:42:27] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[13:42:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:51] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy
[13:42:55] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy
[13:46:19] <wikibugs>	 (03PS3) 10Jgreen: Add SHA256 selector record for fundraising mail contractor (IBM/Silverpop). [dns] - 10https://gerrit.wikimedia.org/r/483294 (https://phabricator.wikimedia.org/T210445)
[13:47:05] <wikibugs>	 (03CR) 10Jgreen: [C: 03+2] Add SHA256 selector record for fundraising mail contractor (IBM/Silverpop). [dns] - 10https://gerrit.wikimedia.org/r/483294 (https://phabricator.wikimedia.org/T210445) (owner: 10Jgreen)
[13:48:43] <dcausse>	 !log creating testcommonswiki index in the omega search-elastic cluster (eqiad & codfw)
[13:48:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:05] <Jeff_Green>	 !log authdns update for T210445
[13:49:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:49:08] <stashbot>	 T210445: Stronger DKIM key for fundraising emails? - https://phabricator.wikimedia.org/T210445
[13:51:08] <wikibugs>	 10Operations, 10Fundraising-Backlog, 10Mail, 10fundraising-tech-ops, 10Patch-For-Review: Stronger DKIM key for fundraising emails? - https://phabricator.wikimedia.org/T210445 (10Jgreen) >>! In T210445#4867737, @Jgreen wrote: >>>! In T210445#4867613, @bsisolak wrote: >> The key is correct, and IBM will va...
[13:51:10] <wikibugs>	 (03PS1) 10Jbond: Remove user imarlier as part of the off boarding process [puppet] - 10https://gerrit.wikimedia.org/r/484231
[13:51:41] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[13:51:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:42] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed
[13:51:42] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[13:51:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:12] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/484231 (owner: 10Jbond)
[13:58:09] <wikibugs>	 (03CR) 10Jbond: [C: 03+2] Remove user imarlier as part of the off boarding process [puppet] - 10https://gerrit.wikimedia.org/r/484231 (owner: 10Jbond)
[14:04:13] <marostegui>	 !log Add pc1007 to tendril and zarcillo - T208383
[14:04:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:16] <stashbot>	 T208383: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383
[14:07:32] <wikibugs>	 (03CR) 10Elukey: "I'll wait for Andrew to comment since he knows best, but I'd avoid crossing the 3 partitions unless there is a special need for high traff" [puppet] - 10https://gerrit.wikimedia.org/r/484226 (https://phabricator.wikimedia.org/T213081) (owner: 10Filippo Giunchedi)
[14:09:08] <wikibugs>	 (03CR) 10Marostegui: [C: 03+1] mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664) (owner: 10Jcrespo)
[14:10:34] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
[14:10:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:11:42] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production --debug -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
[14:11:42] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:26] <wikibugs>	 10Operations: Offboard Balazs - https://phabricator.wikimedia.org/T213703 (10MoritzMuehlenhoff)
[14:12:33] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: aptly: create stretch/jessie repos [puppet] - 10https://gerrit.wikimedia.org/r/484233 (https://phabricator.wikimedia.org/T213421)
[14:12:42] <wikibugs>	 10Operations: Offboard Balazs - https://phabricator.wikimedia.org/T213703 (10MoritzMuehlenhoff) p:05Triage→03Normal a:03jbond
[14:13:19] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
[14:13:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:26] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed
[14:13:26] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[14:13:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:19] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: aptly: create stretch/jessie repos [puppet] - 10https://gerrit.wikimedia.org/r/484233 (https://phabricator.wikimedia.org/T213421)
[14:16:49] <icinga-wm>	 RECOVERY - DPKG on relforge1001 is OK: All packages OK
[14:18:03] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] toolforge: aptly: create stretch/jessie repos [puppet] - 10https://gerrit.wikimedia.org/r/484233 (https://phabricator.wikimedia.org/T213421) (owner: 10Arturo Borrero Gonzalez)
[14:18:46] <dcausse>	 !log elasticsearch (search cluster): pre-populating omega & psi clusters in eqiad & codfw (from mwmaint1002 and mwmaint2001 respectively) (T210381)
[14:18:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:49] <stashbot>	 T210381: Update mw-config to use the psi&omega elastic clusters - https://phabricator.wikimedia.org/T210381
[14:18:53] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s5 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:19:01] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s7 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:19:09] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s2 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:19:13] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s6 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:20:28] <wikibugs>	 (03PS1) 10CDanis: Fixes to check_grafana_alert [puppet] - 10https://gerrit.wikimedia.org/r/484234 (https://phabricator.wikimedia.org/T213506)
[14:21:07] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] Fixes to check_grafana_alert [puppet] - 10https://gerrit.wikimedia.org/r/484234 (https://phabricator.wikimedia.org/T213506) (owner: 10CDanis)
[14:21:18] <wikibugs>	 (03PS2) 10CDanis: Fixes to check_grafana_alert [puppet] - 10https://gerrit.wikimedia.org/r/484234 (https://phabricator.wikimedia.org/T213506)
[14:23:37] <icinga-wm>	 PROBLEM - puppet last run on notebook1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): User[imarlier]
[14:25:38] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s3 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:27:14] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s8 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:27:18] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s1 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:27:32] <wikibugs>	 (03CR) 10Volans: "recheck" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/483131 (owner: 10Volans)
[14:28:13] <elukey>	 \o/
[14:29:06] <icinga-wm>	 RECOVERY - MariaDB Slave SQL: s4 on dbstore1002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[14:29:56] <wikibugs>	 (03PS2) 10Arturo Borrero Gonzalez: toolforge: refactor docker registry profile [puppet] - 10https://gerrit.wikimedia.org/r/483765 (https://phabricator.wikimedia.org/T213418)
[14:30:01] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 04-1] "The python/yaml changes look good to me except for one question inline." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483606 (owner: 10BryanDavis)
[14:31:03] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] Enable base::service_auto_restart for uwsgi-striker [puppet] - 10https://gerrit.wikimedia.org/r/483114 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff)
[14:31:59] <wikibugs>	 (03CR) 10Volans: "recheck" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/443366 (https://phabricator.wikimedia.org/T198592) (owner: 10Volans)
[14:32:04] <wikibugs>	 (03CR) 10Volans: "recheck" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/443367 (https://phabricator.wikimedia.org/T198592) (owner: 10Volans)
[14:32:10] <wikibugs>	 (03CR) 10Volans: "recheck" [software/debmonitor] - 10https://gerrit.wikimedia.org/r/443368 (https://phabricator.wikimedia.org/T198592) (owner: 10Volans)
[14:33:21] <wikibugs>	 10Operations, 10TCB-Team, 10WMF-JobQueue, 10monitoring, and 3 others: Grafana alerting broken after upgrade to 5.0.0 - https://phabricator.wikimedia.org/T213506 (10CDanis) 05Open→03Resolved
[14:34:04] <wikibugs>	 (03CR) 10Hashar: "We can fix the permissions ourselves once we are granted sudo as the doc-publisher user  T213169" [puppet] - 10https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: 10Hashar)
[14:35:51] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-Cache, 10Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) @Nikerabbit looking...
[14:36:06] <volans>	 !log uploaded python{,3}-phabricator 0.7.0-2~wmf1 to apt.w.o T205884 (upstream removes egg files)
[14:36:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:08] <stashbot>	 T205884: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884
[14:37:18] <cdanis>	 moritzm: have a moment to update the topic to put me on ops duty?
[14:37:36] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 03+1] "LGTM!  Has this been tested in toolforge already via cherry-pick?  If not I'd like to do that before merging." [puppet] - 10https://gerrit.wikimedia.org/r/482237 (https://phabricator.wikimedia.org/T87001) (owner: 10BryanDavis)
[14:37:38] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Kubernetes, 10Wikimedia-Incident: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693 (10CDanis) p:05Triage→03Normal
[14:38:02] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Patch-For-Review: Create a readiness probe for zotero - https://phabricator.wikimedia.org/T213689 (10CDanis) p:05Triage→03Normal
[14:39:21] <volans>	 !log updated python3-phabricator on cumin[12]001 T205884
[14:39:22] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: correctable memory errors db1068 (commons primary master database) - https://phabricator.wikimedia.org/T213664 (10CDanis) p:05Triage→03High
[14:39:23] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:40:24] <moritzm>	 cdanis: sure, on it
[14:41:12] <moritzm>	 cdanis: done
[14:41:14] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on remaining section 3 wikis for T188327. This may cause lag in codfw.
[14:41:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:17] <stashbot>	 T188327: Deploy refactored actor storage - https://phabricator.wikimedia.org/T188327
[14:41:47] <wikibugs>	 10Operations, 10Puppet, 10Continuous-Integration-Config: puppet.git rake fails with ruby 2.5 - https://phabricator.wikimedia.org/T208566 (10CDanis) p:05Triage→03Normal
[14:41:52] <wikibugs>	 10Operations, 10Certcentral, 10Traffic, 10Goal: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (10Vgutierrez) p:05Triage→03Normal
[14:41:53] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 1 wikis for T188327. This may cause lag in codfw.
[14:41:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:01] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 2 wikis for T188327. This may cause lag in codfw.
[14:42:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:07] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 4 wikis for T188327. This may cause lag in codfw.
[14:42:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:10] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 5 wikis for T188327. This may cause lag in codfw.
[14:42:12] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 6 wikis for T188327. This may cause lag in codfw.
[14:42:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:14] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 7 wikis for T188327. This may cause lag in codfw.
[14:42:16] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on section 8 wikis for T188327. This may cause lag in codfw.
[14:42:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:18] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateActors.php on wikitech for T188327. This may cause lag in codfw.
[14:42:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:42:20] <wikibugs>	 10Operations, 10Certcentral, 10Traffic, 10Goal: Deploy managed LetsEncrypt certs for all public use-cases - https://phabricator.wikimedia.org/T213705 (10Vgutierrez)
[14:42:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:27] <wikibugs>	 (03CR) 10Vgutierrez: [C: 03+2] cache: Add kernel-proposed-updates component for cp1075-99 [puppet] - 10https://gerrit.wikimedia.org/r/484199 (https://phabricator.wikimedia.org/T203194) (owner: 10Vgutierrez)
[14:49:35] <wikibugs>	 (03PS3) 10Vgutierrez: cache: Add kernel-proposed-updates component for cp1075-99 [puppet] - 10https://gerrit.wikimedia.org/r/484199 (https://phabricator.wikimedia.org/T203194)
[14:51:10] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[14:51:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:11] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed
[14:51:12] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[14:51:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:25] <akosiaris>	 !log upgrade zotero pods to 2019-01-14-115905-candidate in codfw T213693
[14:52:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:52:27] <stashbot>	 T213693: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693
[14:52:49] <wikibugs>	 10Operations, 10DBA, 10Patch-For-Review: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10Marostegui) pc1007 is now up and replicating. It is catching up. Tomorrow I will replace pc1010 with pc1007 for consistency with codfw...
[14:56:17] <wikibugs>	 (03PS3) 10Revi: Change links of wgGEHelpPanelLinks for kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483996 (https://phabricator.wikimedia.org/T209467)
[14:57:35] <wikibugs>	 10Operations, 10monitoring, 10Goal: Upgrade production prometheus-node-exporter to >= 0.16 - https://phabricator.wikimedia.org/T213708 (10fgiunchedi) p:05Triage→03Normal
[14:57:49] <marostegui>	 !log Drop table tag_summary from s6 - T212255
[14:57:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:52] <stashbot>	 T212255: Drop tag_summary table - https://phabricator.wikimedia.org/T212255
[15:00:38] <moritzm>	 !log ran systemctl reset-failed on relforge1001
[15:00:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:01:15] <wikibugs>	 (03PS1) 10Volans: spicerack: fix version [software/spicerack] - 10https://gerrit.wikimedia.org/r/484239 (https://phabricator.wikimedia.org/T205884)
[15:02:18] <vgutierrez>	 !log upgrading kernel in cp1075 to 4.9.144-1 - T203194
[15:02:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:21] <stashbot>	 T203194: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194
[15:02:26] <moritzm>	 !log imported debdeploy 0.0.99.6-1+deb10u1 for buster-wikimedia (T213527)
[15:02:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:31] <stashbot>	 T213527: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527
[15:04:33] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
[15:04:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:34] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed
[15:04:34] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[15:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:51] <akosiaris>	 !log upgrade zotero pods to 2019-01-14-115905-candidate in eqiad T213693
[15:04:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:04:53] <stashbot>	 T213693: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693
[15:07:06] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Patch-For-Review, 10Wikimedia-Incident: allow zotero container nodejs server to define the amount of heap used instead of the fixed limit of 1.7Gi - https://phabricator.wikimedia.org/T213414 (10akosiaris) p:05Triage→03Normal An image that allows overriding the...
[15:08:49] <volans>	 !log testing switchdc cookbooks in DRY-RUN mode w/ latest spicerack T205884 (no real changes expected)
[15:08:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:52] <stashbot>	 T205884: Spicerack: split wmf-auto-reimage-lib into Spicerack modules - https://phabricator.wikimedia.org/T205884
[15:09:12] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Kubernetes, 10Wikimedia-Incident: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693 (10akosiaris) p:05Normal→03Low We have already identified a specific url that was able to send zotero into what appears to be a busy loop....
[15:11:31] <wikibugs>	 (03PS1) 10Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484241 (https://phabricator.wikimedia.org/T85757)
[15:13:52] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484241 (https://phabricator.wikimedia.org/T85757) (owner: 10Marostegui)
[15:14:21] <wikibugs>	 10Operations, 10monitoring: Upgrade to Prometheus 2.x - https://phabricator.wikimedia.org/T187987 (10fgiunchedi)
[15:15:20] <wikibugs>	 10Operations, 10monitoring: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi)
[15:15:25] <wikibugs>	 (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484241 (https://phabricator.wikimedia.org/T85757) (owner: 10Marostegui)
[15:16:51] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db1105:3311 T85757 (duration: 00m 46s)
[15:16:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:53] <stashbot>	 T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[15:16:55] <marostegui>	 !log Deploy schema change on db1105:3311 - T85757
[15:16:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:25] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: m5 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.72 seconds
[15:25:18] <marostegui>	 ^ checking
[15:25:29] <jynus>	 I was too
[15:26:11] <jynus>	 it is gone now
[15:26:14] <wikibugs>	 (03CR) 10jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484241 (https://phabricator.wikimedia.org/T85757) (owner: 10Marostegui)
[15:26:28] <marostegui>	 it comes and goes
[15:26:40] <marostegui>	 the master is delayed
[15:27:10] <jynus>	 QPS has gone from 0 to 1000
[15:27:35] <jynus>	 at around 14:40-14:43
[15:27:59] <jynus>	 lots of updates
[15:28:37] <marostegui>	 maybe something from the cloud team? as it is m5?
[15:28:48] <marostegui>	 gtirloni: ^ anything that might be hitting m5 dbs?
[15:28:57] <jynus>	 I think I know what it is
[15:29:00] <jynus>	 wikitech upgrade
[15:29:10] <jynus>	 MigrateActors::migrate
[15:29:14] <jynus>	 maybe andrewbogott
[15:29:32] <marostegui>	 that is from anomie
[15:29:41] <jynus>	 oh
[15:29:41] <marostegui>	 ˜/logmsgbot 15:42> !log anomie@mwmaint1002 Running migrateActors.php on wikitech for T188327. This may cause lag in codfw.
[15:29:42] <stashbot>	 T188327: Deploy refactored actor storage - https://phabricator.wikimedia.org/T188327
[15:30:01] <jynus>	 well, it makes sense as wait for replication doesn't really work there
[15:30:08] <jynus>	 maybe we could make it work?
[15:30:31] <marostegui>	 we need to move wikitech to s5 :)
[15:30:52] <jynus>	 "wikitech for T188327. This may cause lag in codfw."
[15:31:01] <jynus>	 so it is expected, there is not much to do
[15:31:20] <marostegui>	 yep
[15:31:24] <jynus>	 sorry, gtirloni andrewbogott it was not your maintenance
[15:31:41] <andrewbogott>	 np
[15:31:46] <anomie>	 Oops, I forgot to update the Deployments page on wikitech. I'll go do that now.
[15:32:14] <jynus>	 it is ok, you logged, it is just I didn't see it because I just returned to my seat
[15:32:49] <jynus>	 andrewbogott: there was some monitoring about wikitech-static some time ago
[15:33:01] <jynus>	 not sure if you saw it
[15:33:33] <andrewbogott>	 i didn't but I'll make a note to look later.  Often mu.tante is also on top of those
[15:33:40] <jynus>	 ok, sorry
[15:33:48] <jynus>	 not very urgent anyway
[15:33:48] <vgutierrez>	 !log rolling restart of cp1076-cp1090 to upgrade to kernel 4.9.144 - T203194
[15:33:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:33:51] <stashbot>	 T203194: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194
[15:35:53] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: m5 on db2078 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[15:36:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.21 seconds
[15:42:01] <wikibugs>	 (03PS1) 10Mathew.onipe: elasticsearch: mask default exporter service [puppet] - 10https://gerrit.wikimedia.org/r/484243 (https://phabricator.wikimedia.org/T210592)
[15:44:18] <wikibugs>	 10Operations, 10monitoring: Serve >= 50% of production Prometheus systems with Prometheus v2 - https://phabricator.wikimedia.org/T187987 (10fgiunchedi) The list of production Prometheus instances as of today is (gathered from grafana datasources)  http://prometheus-labmon.eqiad.wmnet/labs http://prometheus.svc...
[15:44:28] <fsero>	 !log downscaling old zotero-production-645dccfb64 replicaset on eqiad
[15:44:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:44:56] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 303.19 seconds
[15:45:10] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.24 seconds
[15:45:14] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 307.62 seconds
[15:45:32] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 314.49 seconds
[15:45:58] <anomie>	 !log Running cleanupUsersWithNoIds.php on labswiki and labtestwiki, apparently they were left out when that was done for all other wikis (and so caused issues with the migrateActors.php run).
[15:45:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:51:53] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T206965 (10Marostegui) >>! In T206965#4694827, @Cmjohnson wrote: > @elukey dbstore1002 is out of warranty and has 1.2T disks. I don't have disks this size but can replace with a 2TB disk..  Let's do it Th...
[15:52:43] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T206965 (10elukey) a:03Cmjohnson
[15:53:21] <wikibugs>	 (03CR) 10Mathew.onipe: "PCC Output looks good: https://puppet-compiler.wmflabs.org/compiler1002/14323/" [puppet] - 10https://gerrit.wikimedia.org/r/484243 (https://phabricator.wikimedia.org/T210592) (owner: 10Mathew.onipe)
[15:53:31] <wikibugs>	 (03PS2) 10Jcrespo: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664)
[15:54:50] <wikibugs>	 (03PS1) 10Addshore: Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484245 (https://phabricator.wikimedia.org/T201831)
[15:55:58] <wikibugs>	 (03PS1) 10Gehel: wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234)
[15:56:53] <wikibugs>	 (03CR) 10jenkins-bot: [V: 04-1] wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[15:57:14] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero  [namespace: zotero, clusters: eqiad]
[15:57:14] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed
[15:57:14] <logmsgbot>	 !log akosiaris@deploy1001 scap-helm zotero finished
[15:57:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:57:18] <wikibugs>	 (03PS1) 10Addshore: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484247 (https://phabricator.wikimedia.org/T201831)
[15:57:33] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 445.06 seconds
[15:58:24] <wikibugs>	 (03PS1) 10Addshore: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484248 (https://phabricator.wikimedia.org/T201831)
[15:58:33] <wikibugs>	 (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484249
[15:58:35] <akosiaris>	 ?
[15:58:41] <akosiaris>	 that's wrong... I did nothing
[15:58:53] <akosiaris>	 scap-helm should not have logged anything
[15:58:57] <icinga-wm>	 PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 62 not-conn: cp1078_v4, cp1078_v6
[15:59:23] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2041 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 461.95 seconds
[15:59:25] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2063 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 462.32 seconds
[15:59:31] <addshore>	 jouncebot: now
[15:59:31] <jouncebot>	 No deployments scheduled for the next 2 hour(s) and 0 minute(s)
[15:59:34] <addshore>	 jouncebot: next
[15:59:34] <jouncebot>	 In 2 hour(s) and 0 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1800)
[16:00:05] <icinga-wm>	 RECOVERY - IPsec on cp2026 is OK: Strongswan OK - 64 ESP OK
[16:00:07] <wikibugs>	 (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484249 (owner: 10Marostegui)
[16:01:21] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484249 (owner: 10Marostegui)
[16:01:25] <addshore>	 jouncebot reload
[16:01:33] <addshore>	 jouncebot update
[16:01:43] <addshore>	 oh, i never remember...
[16:01:56] <marostegui>	 jouncebot: refresh
[16:01:57] <jouncebot>	 I refreshed my knowledge about deployments.
[16:01:57] <marostegui>	 no?
[16:01:59] <addshore>	 :D
[16:02:01] <marostegui>	 there you go!
[16:02:07] <addshore>	 jouncebot: next
[16:02:07] <jouncebot>	 In 162 hour(s) and 27 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190121T1030)
[16:02:18] <addshore>	 hmmmp, did I break it now >.>
[16:02:21] * addshore looks back at the diff
[16:02:58] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db1105:3311 T85757 (duration: 00m 46s)
[16:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:00] <stashbot>	 T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[16:03:50] <logmsgbot>	 !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T85757 (duration: 00m 45s)
[16:03:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:04:36] <addshore>	 jouncebot: refresh
[16:04:38] <jouncebot>	 I refreshed my knowledge about deployments.
[16:04:39] <addshore>	 jouncebot: next
[16:04:39] <jouncebot>	 In 0 hour(s) and 55 minute(s): Wikidata: Deploy property link formatter that uses cache instead of wb_terms DB table (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1700)
[16:04:42] <addshore>	 that's better
[16:05:20] <wikibugs>	 (03PS4) 10AndyRussG: Give protect right to centralnoticeadmin on Meta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483044 (https://phabricator.wikimedia.org/T209873)
[16:06:25] <wikibugs>	 (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484249 (owner: 10Marostegui)
[16:12:55] <icinga-wm>	 PROBLEM - puppet last run on matomo1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tshark]
[16:18:36] <wikibugs>	 (03CR) 10Mobrovac: "Hm, but as you noted, stretch-backports gives us node 8, while we will be skipping node 8 and go directly to node 10. Would that still be " [puppet] - 10https://gerrit.wikimedia.org/r/483891 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn)
[16:18:45] <moritzm>	 ^ matomo1001 is me, should recover soon
[16:25:09] <wikibugs>	 (03CR) 10Muehlenhoff: "Yeah, the nodejs 10 package from the component also no longer builds a -legacy package." [puppet] - 10https://gerrit.wikimedia.org/r/483891 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn)
[16:26:34] <wikibugs>	 (03PS1) 10Volans: sre.switchdc.mediawiki: fix update tendril [cookbooks] - 10https://gerrit.wikimedia.org/r/484255
[16:36:30] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 218.29 seconds
[16:37:05] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: add cluster definition to dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/483602 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite)
[16:37:12] <wikibugs>	 (03PS5) 10Cwhite: hiera: add cluster definition to dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/483602 (https://phabricator.wikimedia.org/T210486)
[16:37:23] * elukey looks at moritzm trying to break our dear Matomo
[16:37:26] <elukey>	 :D
[16:39:07] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: add cluster definition to syslog servers [puppet] - 10https://gerrit.wikimedia.org/r/483612 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite)
[16:43:30] <icinga-wm>	 RECOVERY - puppet last run on matomo1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:43:36] <logmsgbot>	 !log mobrovac@deploy1001 scap-helm -h  [namespace: -h, clusters: eqiad,codfw]
[16:43:36] <logmsgbot>	 !log mobrovac@deploy1001 scap-helm -h cluster eqiad completed
[16:43:36] <logmsgbot>	 !log mobrovac@deploy1001 scap-helm -h cluster codfw completed
[16:43:37] <logmsgbot>	 !log mobrovac@deploy1001 scap-helm -h finished
[16:43:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:41] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:43:45] <mobrovac>	 lol
[16:44:01] <mobrovac>	 akosiaris: bug or feature ^ ?
[16:44:02] <mobrovac>	 :P
[16:44:17] <wikibugs>	 (03PS2) 10Cwhite: hiera: add cluster definition to syslog servers [puppet] - 10https://gerrit.wikimedia.org/r/483612 (https://phabricator.wikimedia.org/T210486)
[16:44:24] <volans>	 mobrovac: I bet feature, public help :-P
[16:45:54] <wikibugs>	 (03PS2) 10Nuria: Adding default granularities for monthly datasets [puppet] - 10https://gerrit.wikimedia.org/r/483888 (https://phabricator.wikimedia.org/T209103)
[16:46:24] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 9.86 seconds
[16:46:28] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2041 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[16:46:30] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2063 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[16:46:50] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2095 is OK: OK slave_sql_lag Replication lag: 0.30 seconds
[16:46:58] <wikibugs>	 (03CR) 10Cwhite: [C: 03+2] hiera: add cluster definition to syslog servers [puppet] - 10https://gerrit.wikimedia.org/r/483612 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite)
[16:47:17] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2035 is OK: OK slave_sql_lag Replication lag: 0.40 seconds
[16:47:31] <wikibugs>	 (03PS3) 10Elukey: turnilo: add default granularities for monthly datasets [puppet] - 10https://gerrit.wikimedia.org/r/483888 (https://phabricator.wikimedia.org/T209103) (owner: 10Nuria)
[16:47:40] <wikibugs>	 (03PS4) 10Elukey: turnilo: add default granularities for monthly datasets [puppet] - 10https://gerrit.wikimedia.org/r/483888 (https://phabricator.wikimedia.org/T209103) (owner: 10Nuria)
[16:50:37] <wikibugs>	 (03CR) 10Elukey: [C: 03+2] turnilo: add default granularities for monthly datasets [puppet] - 10https://gerrit.wikimedia.org/r/483888 (https://phabricator.wikimedia.org/T209103) (owner: 10Nuria)
[16:51:05] <icinga-wm>	 PROBLEM - puppet last run on cloudvirt1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:51:22] <akosiaris>	 mobrovac: bug for sure
[16:56:07] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2088 is OK: OK slave_sql_lag Replication lag: 0.42 seconds
[16:57:57] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[16:58:15] <wikibugs>	 (03PS1) 10Huji: Add new synonyms for namespaces in Persian (fa) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484256 (https://phabricator.wikimedia.org/T213733)
[16:59:47] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2091 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[16:59:57] <wikibugs>	 (03PS2) 10Gehel: wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234)
[17:00:04] <jouncebot>	 addshore: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Wikidata: Deploy property link formatter that uses cache instead of wb_terms DB table deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1700).
[17:00:04] <jouncebot>	 Addshore: A patch you scheduled for Wikidata: Deploy property link formatter that uses cache instead of wb_terms DB table is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[17:00:13] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1115 (tendril DB) had OOM for some processes and some hw (memory) issues - https://phabricator.wikimedia.org/T196726 (10Cmjohnson) @Marostegui I have to move the DIMM to another slot and see if the error corrects itself, moves with the DIMM, or remains the same.  Can you...
[17:00:34] <wikibugs>	 (03PS1) 10GTirloni: wmcs::nfs::misc - Configure nsswitch.conf [puppet] - 10https://gerrit.wikimedia.org/r/484257 (https://phabricator.wikimedia.org/T209527)
[17:00:55] <wikibugs>	 10Operations, 10ops-eqiad, 10DBA: db1115 (tendril DB) had OOM for some processes and some hw (memory) issues - https://phabricator.wikimedia.org/T196726 (10Marostegui) Yep, we can do that! Just ping us when you are ready for it Thanks!
[17:01:00] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[17:01:03] <wikibugs>	 (03PS2) 10Huji: Add new synonyms for namespaces in Persian (fa) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484256 (https://phabricator.wikimedia.org/T213733)
[17:01:33] <wikibugs>	 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10Cmjohnson) The log remains clear and no errors have returned. I will give it another 24 hours, and if there is no change it can go back into service.
[17:03:07] <addshore>	 o/
[17:03:14] * addshore is going to go ahead with the slot :)
[17:04:00] <wikibugs>	 10Operations, 10media-storage: Lost file Juan_Guaidó.jpg - https://phabricator.wikimedia.org/T213655 (10CDanis) p:05Triage→03Normal a:03jcrespo
[17:04:16] <wikibugs>	 (03PS2) 10Addshore: Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484245 (https://phabricator.wikimedia.org/T201831)
[17:04:21] <wikibugs>	 (03PS2) 10Addshore: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484247 (https://phabricator.wikimedia.org/T201831)
[17:04:28] <wikibugs>	 (03PS2) 10Addshore: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484248 (https://phabricator.wikimedia.org/T201831)
[17:04:47] <wikibugs>	 10Operations, 10media-storage: Lost file Juan_Guaidó.jpg - https://phabricator.wikimedia.org/T213655 (10CDanis) @jcrespo and @fgiunchedi are going to take a look at what happened to the file in Swift.
[17:04:49] <wikibugs>	 (03PS3) 10Gehel: wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234)
[17:04:59] <wikibugs>	 (03CR) 10Addshore: [C: 03+2] Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484245 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:05:25] <wikibugs>	 (03CR) 10Ottomata: admin: allow users to be deployed without ssh keys configured (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: 10Elukey)
[17:06:09] <wikibugs>	 (03Merged) 10jenkins-bot: Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484245 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:08:42] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 1/2 (duration: 00m 47s)
[17:08:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:08:46] <stashbot>	 T201831: Deploy item/property link formatter that uses cache instead of wb_terms DB table - https://phabricator.wikimedia.org/T201831
[17:08:46] <stashbot>	 T201838: Use link formatter that uses cache instead of wb_terms for all wikidatawiki properties - https://phabricator.wikimedia.org/T201838
[17:09:44] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: T201831 T201838 Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter PT 2/2 (duration: 00m 45s)
[17:09:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:09:55] <wikibugs>	 (03CR) 10Addshore: [C: 03+2] wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484247 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:11:08] <wikibugs>	 (03Merged) 10jenkins-bot: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484247 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:11:25] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2054 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 305.67 seconds
[17:11:27] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2087 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.12 seconds
[17:11:27] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2040 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 306.04 seconds
[17:11:45] <logmsgbot>	 !log addshore@deploy1001 sync-file aborted: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 01s)
[17:11:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:11:51] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2077 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 314.72 seconds
[17:11:55] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.10 seconds
[17:12:05] <wikibugs>	 (03CR) 10jenkins-bot: Introduce wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484245 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:12:05] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 319.96 seconds
[17:12:08] <wikibugs>	 (03CR) 10jenkins-bot: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484247 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:12:13] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 321.78 seconds
[17:12:17] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2061 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 322.26 seconds
[17:13:05] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter 3000 (duration: 00m 46s)
[17:13:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:13:14] <wikibugs>	 10Operations, 10ops-codfw, 10decommission, 10Discovery-Search (Current work), 10Patch-For-Review: Decommission elastic2001-2024 - https://phabricator.wikimedia.org/T211023 (10Papaul) a:05Papaul→03RobH This is complete. All servers are ready to be shipped out.
[17:13:21] * addshore will now watch some graphs for a few mins
[17:13:29] <wikibugs>	 10Operations, 10Certcentral, 10Traffic: Allow specifying a custom period of time before deploying a newly issued certificate - https://phabricator.wikimedia.org/T213737 (10Vgutierrez) p:05Triage→03Normal
[17:14:01] <wikibugs>	 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission of restbase200[1-6] (lease return in December 2018) - https://phabricator.wikimedia.org/T211070 (10Papaul) a:05Papaul→03RobH This is complete. All servers are ready to be shipped out.
[17:17:44] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s7 on db2068 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 426.69 seconds
[17:19:49] <wikibugs>	 (03CR) 10Addshore: [C: 03+2] wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484248 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:20:59] <wikibugs>	 (03Merged) 10jenkins-bot: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484248 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:21:46] <icinga-wm>	 RECOVERY - puppet last run on cloudvirt1030 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[17:21:57] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T201831 T201838 wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on (duration: 00m 46s)
[17:22:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:22:01] <stashbot>	 T201831: Deploy item/property link formatter that uses cache instead of wb_terms DB table - https://phabricator.wikimedia.org/T201831
[17:22:02] <stashbot>	 T201838: Use link formatter that uses cache instead of wb_terms for all wikidatawiki properties - https://phabricator.wikimedia.org/T201838
[17:24:54] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] wmcs::nfs::misc - Configure nsswitch.conf [puppet] - 10https://gerrit.wikimedia.org/r/484257 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni)
[17:25:00] <wikibugs>	 (03CR) 10jenkins-bot: wmgWikibaseMaxItemIdForNewPropertyIdHtmlFormatter fully on [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484248 (https://phabricator.wikimedia.org/T201831) (owner: 10Addshore)
[17:25:02] <addshore>	 !log deploy slot done
[17:25:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:33:28] <wikibugs>	 (03CR) 10Mobrovac: [C: 03+1] "kk, lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/483891 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn)
[17:34:20] <wikibugs>	 (03PS1) 10GTirloni: wmcs::nfs::misc - Second attempt to fix nsswitch.conf [puppet] - 10https://gerrit.wikimedia.org/r/484258 (https://phabricator.wikimedia.org/T209527)
[17:36:18] <icinga-wm>	 PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 50 not-conn: cp1089_v4, cp1089_v6
[17:36:26] <wikibugs>	 10Operations, 10Patch-For-Review, 10User-Marostegui, 10User-fgiunchedi: Audit "misc" cluster hosts - https://phabricator.wikimedia.org/T210486 (10colewhite) a:03colewhite
[17:37:28] <icinga-wm>	 RECOVERY - IPsec on cp2023 is OK: Strongswan OK - 52 ESP OK
[17:42:10] <icinga-wm>	 PROBLEM - Request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:43:27] <vgutierrez>	 cp2023 was me upgrading the kernel in cp1089, it's up again already
[17:43:42] <wikibugs>	 (03CR) 10GTirloni: [C: 03+2] wmcs::nfs::misc - Second attempt to fix nsswitch.conf [puppet] - 10https://gerrit.wikimedia.org/r/484258 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni)
[17:44:36] <icinga-wm>	 PROBLEM - Request latencies on acrux is CRITICAL: instance=10.192.0.93:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:44:47] <wikibugs>	 10Puppet, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current), and 2 others: ORES services should bind to ores config files - https://phabricator.wikimedia.org/T210719 (10Halfak) Maybe we should have a script and a process instead for manually restarting ORES nodes in a safe way.  See {T213743}
[17:47:02] <icinga-wm>	 RECOVERY - Request latencies on acrux is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[17:48:36] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic, 10Patch-For-Review: cp1075-90 - bnxt_en transmit hangs - https://phabricator.wikimedia.org/T203194 (10Vgutierrez) kernel upgraded successfully in cp1075-cp1090: ` vgutierrez@cumin1001:~$ sudo cumin cp[1075-1090].eqiad.wmnet 'uname -v' 16 hosts will be targeted: cp[1075...
[17:57:20] <icinga-wm>	 PROBLEM - Request latencies on acrux is CRITICAL: instance=10.192.0.93:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:00:04] <jouncebot>	 gehel and onimisionipe: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1800).
[18:00:04] <jouncebot>	 onimisionipe: A patch you scheduled for Wikidata Query Service weekly deploy is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:22] <onimisionipe>	 here here
[18:01:46] <icinga-wm>	 RECOVERY - Request latencies on acrux is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:03:40] <wikibugs>	 (03PS1) 10GTirloni: wmcs::nfs::misc - Remove wmcs-root from admin groups [puppet] - 10https://gerrit.wikimedia.org/r/484260 (https://phabricator.wikimedia.org/T209527)
[18:05:12] <wikibugs>	 (03PS4) 10Dzahn: admins: add Greg to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/483623 (https://phabricator.wikimedia.org/T213569)
[18:05:56] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.56 seconds
[18:05:56] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2049 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 352.61 seconds
[18:06:16] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2035 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 308.83 seconds
[18:06:18] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2091 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.49 seconds
[18:06:22] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2063 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 310.12 seconds
[18:06:30] <mutante>	 Hauskatze: re: " looks like more than 'phabricator-admins' are entitled this access" it's because there is also phabricator-roots
[18:06:46] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2088 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 316.31 seconds
[18:06:59] <Hauskatze>	 mutante: ack, but I was referring to the fact that the compiler listed users with "absent" status
[18:07:04] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "approved in SRE meeting" [puppet] - 10https://gerrit.wikimedia.org/r/483623 (https://phabricator.wikimedia.org/T213569) (owner: 10Dzahn)
[18:07:54] <logmsgbot>	 !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet
[18:07:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:07:56] <mutante>	 Hauskatze: ah, that's the special group for absent users.. absent doesn't mean literally not existing, it means "member of a special group"
[18:08:01] <mutante>	 be back in a while
[18:08:30] <icinga-wm>	 RECOVERY - Request latencies on acrab is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:14:48] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db1124 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 304.14 seconds
[18:16:44] <icinga-wm>	 PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:18:40] <icinga-wm>	 PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:18:50] <logmsgbot>	 !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@f71131e]: Category script and GUI updates, blazegraph launcher updates and moved RWStore from scap to puppet (duration: 10m 56s)
[18:18:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:19:32] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2041 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 494.70 seconds
[18:19:34] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 496.53 seconds
[18:19:42] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db1124 is OK: OK slave_sql_lag Replication lag: 22.42 seconds
[18:21:38] <icinga-wm>	 RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:22:18] <icinga-wm>	 RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[18:26:08] <wikibugs>	 10Operations, 10Citoid, 10serviceops, 10Kubernetes, 10Wikimedia-Incident: Zotero service crashes and pages multiple times. - https://phabricator.wikimedia.org/T213693 (10greg) Meta: Reading "This task is sort of an umbrella task for zotero latest incidents, it should be closed when we dont receive multip...
[18:30:58] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 309.10 seconds
[18:33:39] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) Some numbers to help inform the decision  = Graph usage =  Using WMCS resources I extracted th...
[18:38:16] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) >>! In T211881#4820828, @Milimetric wrote: > The reason Graphoid was initially developed was t...
[18:40:28] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris)
[18:40:48] <wikibugs>	 (03PS3) 10Jcrespo: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664)
[18:42:03] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664) (owner: 10Jcrespo)
[18:42:07] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) >>! In T211881#4821719, @Yurik wrote: > @akosiaris also, please add usage before the Varnish -...
[18:42:58] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) >>! In T211881#4822247, @Tgr wrote: >>>! In T211881#4820828, @Milimetric wrote: >> The reason...
[18:43:09] <wikibugs>	 (03Merged) 10jenkins-bot: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664) (owner: 10Jcrespo)
[18:43:52] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Smalyshev) I've tried to read all of it and maybe I've missed something, but I am still not sure what added value having such separate serv...
[18:43:59] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) p:05Triage→03High
[18:44:03] <wikibugs>	 (03CR) 10jenkins-bot: mariadb: Depool db1081 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484213 (https://phabricator.wikimedia.org/T213664) (owner: 10Jcrespo)
[18:45:25] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: depool db1081 (duration: 00m 46s)
[18:45:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:48:10] <wikibugs>	 (03PS1) 10Jcrespo: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484267
[18:48:12] <wikibugs>	 (03PS3) 10Huji: Add new synonyms for namespaces in Persian (fa) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484256 (https://phabricator.wikimedia.org/T213733)
[18:48:27] <jynus>	 !log stop upgrade and restart db1081
[18:48:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:51:35] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[18:53:02] <wikibugs>	 (03CR) 10Smalyshev: [C: 03+1] "lgtm, +some random notes" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[18:54:46] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[18:55:29] <wikibugs>	 10Operations: Offboard Balazs - https://phabricator.wikimedia.org/T213703 (10jbond)
[18:57:29] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[18:58:55] <wikibugs>	 10Operations: Offboard Balazs - https://phabricator.wikimedia.org/T213703 (10jbond)
[19:00:04] <jouncebot>	 Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T1900)
[19:00:04] <jouncebot>	 MaxSem, dcausse, stephanebisson, and James_F: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[19:00:06] * James_F waves for jouncebot.
[19:00:15] <dcausse>	 o/
[19:00:19] <stephanebisson>	 hello
[19:00:22] <James_F>	 Anyone planning to SWAT, or should I?
[19:00:51] <dcausse>	 I can SWAT if there's nothing crazy to deploy :)
[19:00:58] <James_F>	 dcausse: Go for it. :-)
[19:00:59] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10elukey)
[19:01:42] <wikibugs>	 (03PS3) 10DCausse: Remove old ArticleCreationWorkflows config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462041 (https://phabricator.wikimedia.org/T204016) (owner: 10MaxSem)
[19:03:44] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10elukey)
[19:04:39] <dcausse>	 MaxSem: around?
[19:05:21] <dcausse>	 looks like it's just a cleanup, so I guess it's fine to deploy
[19:05:35] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462041 (https://phabricator.wikimedia.org/T204016) (owner: 10MaxSem)
[19:06:39] <wikibugs>	 (03Merged) 10jenkins-bot: Remove old ArticleCreationWorkflows config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462041 (https://phabricator.wikimedia.org/T204016) (owner: 10MaxSem)
[19:08:18] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10akosiaris) > @akosiaris, the logic in <An unorthodox architecture of the API of the service> is fundament...
[19:09:28] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:09:36] <logmsgbot>	 !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T204016: Remove old ArticleCreationWorkflows config (duration: 00m 46s)
[19:09:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:09:39] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:10:24] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483645 (owner: 10Jforrester)
[19:10:34] <dcausse>	 James_F: is there something to test with this patch? ^
[19:10:43] <wikibugs>	 (03CR) 10jenkins-bot: Remove old ArticleCreationWorkflows config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/462041 (https://phabricator.wikimedia.org/T204016) (owner: 10MaxSem)
[19:10:57] <James_F>	 dcausse: It's fine.
[19:11:35] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:13:23] <wikibugs>	 (03PS2) 10DCausse: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483645 (owner: 10Jforrester)
[19:13:45] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:13:49] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] admins: add Greg to phabricator-admins [puppet] - 10https://gerrit.wikimedia.org/r/483623 (https://phabricator.wikimedia.org/T213569) (owner: 10Dzahn)
[19:14:33] <dcausse>	 stephanebisson: hey, is there something you would like to test (should I deploy it on mwdebug1002 first?)
[19:14:43] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: add Greg Grossmeier to Phabricator admins group - https://phabricator.wikimedia.org/T213569 (10Dzahn)
[19:14:45] <stephanebisson>	 dcausse: yes please, I can test it
[19:14:53] <wikibugs>	 (03CR) 10Jcrespo: [C: 03+2] Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484267 (owner: 10Jcrespo)
[19:15:02] <dcausse>	 ok
[19:15:09] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:16:18] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484267 (owner: 10Jcrespo)
[19:16:45] <dcausse>	 stephanebisson: it's live on mwdebug1002
[19:17:51] <stephanebisson>	 dcausse: testing now
[19:18:21] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: add Greg Grossmeier to Phabricator admins group - https://phabricator.wikimedia.org/T213569 (10Dzahn) 05Open→03Resolved a:03Dzahn ` [phab1001:~] $ id gjg uid=2890(gjg) gid=500(wikidev) groups=500(wikidev),746(phabricator-admin)  `  @greg Should w...
[19:18:31] <wikibugs>	 10Operations, 10SRE-Access-Requests: add Greg Grossmeier to Phabricator admins group - https://phabricator.wikimedia.org/T213569 (10Dzahn)
[19:18:37] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:18:47] <ottomata>	 godog: re T213081, why do you want to add more partitions?
[19:18:47] <stashbot>	 T213081: Consider increasing kafka logging topic partitions - https://phabricator.wikimedia.org/T213081
[19:19:00] <logmsgbot>	 !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1081 with low load (duration: 00m 47s)
[19:19:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:19:07] <Hauskatze>	 good to have greg-g added mutante :)
[19:19:20] <stephanebisson>	 dcausse: Looks good, you can deploy
[19:19:28] <jynus>	 Check endpoints for mwdebug1002.eqiad.wmnet' failed: /wiki/{title} (Main Page) timed out before a response was received; /wiki/{title} (Special Version) timed out before a response was received; /w/api.php (Main Page pageprops) timed out before a response was received
[19:19:29] <dcausse>	 stephanebisson: deploying
[19:20:01] <jynus>	 dcausse: ^
[19:20:06] <dcausse>	 jynus: looking
[19:20:09] <wikibugs>	 (03PS3) 10Dzahn: doc: grant doc-uploader access to contint users [puppet] - 10https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) (owner: 10Hashar)
[19:21:20] <wikibugs>	 (03PS6) 10MarcoAurelio: [WIP] mediawiki: Stop logging each run of purge_abusefilter.pp [puppet] - 10https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591)
[19:21:35] <wikibugs>	 (03CR) 10MarcoAurelio: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591) (owner: 10MarcoAurelio)
[19:23:14] <wikibugs>	 (03CR) 10Dzahn: [C: 03+2] "this has been approved in SRE meeting but with the additional comment that this should not stay manual in the long-term and probably needs" [puppet] - 10https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) (owner: 10Hashar)
[19:23:27] <dcausse>	 jynus: I don't see anything wrong, replaying these requests is working well, I wonder if it's related to T204871
[19:23:28] <stashbot>	 T204871: Investigate the spikes of "web request took longer than 60 seconds and timed out" during deployments - https://phabricator.wikimedia.org/T204871
[19:23:59] <wikibugs>	 (03CR) 10jenkins-bot: Revert "mariadb: Depool db1081 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484267 (owner: 10Jcrespo)
[19:24:01] <dcausse>	 can someone confirm that it's "normal" to see some time outs on mwdebug1002 after running scap pull?
[19:24:04] <jynus>	 dcausse: just got that and was the only thing related I could think about
[19:24:10] <dcausse>	 thcipriani: perhaps? ^
[19:24:24] <jynus>	 and the rule #1 is to speak up just in case
[19:24:30] <dcausse>	 sure
[19:24:43] <jynus>	 I don't see any error on the logs either
[19:24:49] <dcausse>	 yes me neither
[19:25:31] <thcipriani>	 dcausse: I have noticed that on occasion when hhvm load becomes high on a particular machine after a pull
[19:25:33] <jynus>	 dcausse: remember it is only when one says "it is nothing" when issues happen, and the other way round
[19:25:46] <jynus>	 :-)
[19:25:48] <dcausse>	 :)
[19:26:11] <dcausse>	 thcipriani: ok thanks, I guess we're in this situation
[19:28:08] <XioNoX>	 !log re-activate BGP to Zayo on cr1-eqiad - T212791
[19:28:10] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:28:11] <stashbot>	 T212791: Interface errors on cr1-eqiad:xe-3/3/1 - https://phabricator.wikimedia.org/T212791
[19:29:15] <logmsgbot>	 !log dcausse@deploy1001 Synchronized php-1.33.0-wmf.12/extensions/GrowthExperiments/includes/WelcomeSurvey.php: Welcome survey: ignore check confirmed email (duration: 00m 45s)
[19:29:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:29:28] <wikibugs>	 10Operations, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Grant sudo access for CI admins to doc.wikimedia.org publishing user - https://phabricator.wikimedia.org/T213169 (10Dzahn) 05Open→03Resolved The request has been appr...
[19:29:46] <dcausse>	 stephanebisson: should be live
[19:30:08] <dcausse>	 James_F: back to your patch, sorry for the delay
[19:30:35] <James_F>	 No worries.
[19:31:09] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483645 (owner: 10Jforrester)
[19:32:48] <XioNoX>	 !log re-deactivate BGP to Zayo on cr1-eqiad - T212791
[19:32:50] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:32:53] <wikibugs>	 (03Merged) 10jenkins-bot: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483645 (owner: 10Jforrester)
[19:34:49] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "I'm sorry for the placeholder -1, but I have to run now. I'll add all the related comments later today." [software/certcentral] - 10https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301) (owner: 10Vgutierrez)
[19:35:23] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:35:55] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:37:10] <logmsgbot>	 !log dcausse@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set (duration: 00m 46s)
[19:37:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:37:13] <wikibugs>	 (03CR) 10Dzahn: [C: 03+1] "yea, this is just a revert of https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/410072/ which i merged. so if it's not WIP anymore it" [puppet] - 10https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591) (owner: 10MarcoAurelio)
[19:37:16] <wikibugs>	 (03CR) 10jenkins-bot: Clean-up: Explain why WBMI wikis don't need wmgWikibaseRepoEntityNamespaces set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483645 (owner: 10Jforrester)
[19:37:36] <dcausse>	 James_F: done
[19:37:49] <James_F>	 Thanks!
[19:37:51] <dcausse>	 yw!
[19:38:26] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse)
[19:41:07] <wikibugs>	 (03PS20) 10DCausse: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381)
[19:41:09] <wikibugs>	 (03PS20) 10DCausse: [cirrus] Start using replica group settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476272 (https://phabricator.wikimedia.org/T210381)
[19:41:11] <wikibugs>	 (03PS22) 10DCausse: [cirrus] Cleanup transitional states [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476273 (https://phabricator.wikimedia.org/T210381)
[19:41:32] <wikibugs>	 (03CR) 10Dzahn: "the quotes i left were from a discussion on #httpd the freenode channel about using protocol in server name but they must have been talkin" [puppet] - 10https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: 10Hashar)
[19:43:46] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse)
[19:43:51] <MaxSem>	 dcausse: sorry I wasn't around. Thanks for deploying!
[19:43:57] <dcausse>	 MaxSem: np!
[19:45:06] <James_F>	 dcausse: All done? I've just realised I didn't schedule a core back-port. :-( I can deploy it if you're busy.
[19:45:13] <wikibugs>	 (03Merged) 10jenkins-bot: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse)
[19:45:34] <dcausse>	 James_F: I'm testing my patch but it's
[19:45:44] <dcausse>	 going to take some time to test :(
[19:47:14] <James_F>	 dcausse: Oh, no worries, I can do it whenever.
[19:47:34] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[19:50:26] <wikibugs>	 (03CR) 10jenkins-bot: [cirrus] Start writing to psi & omega [mediawiki-config] - 10https://gerrit.wikimedia.org/r/476271 (https://phabricator.wikimedia.org/T210381) (owner: 10DCausse)
[19:51:41] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) Please note the work has now been scheduled for Thursday, 2019-01-17 @ 07:00 EST (12:00 GMT).  As both the #dba team and the #analytics team have expressed interest in st...
[19:53:33] <wikibugs>	 (03PS1) 10DCausse: Revert "[cirrus] Start writing to psi & omega" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484273
[19:55:22] <wikibugs>	 (03PS4) 10Ottomata: [WIP] Helm chart for eventgate-analytics deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247)
[19:55:54] <wikibugs>	 (03CR) 10DCausse: [C: 03+2] "SWAT, reverted patch failed testing on mwdebug1002" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484273 (owner: 10DCausse)
[19:57:05] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "[cirrus] Start writing to psi & omega" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484273 (owner: 10DCausse)
[19:57:07] <wikibugs>	 (03PS1) 10Jbond: update the offboard-user script so that it also checks absent users [puppet] - 10https://gerrit.wikimedia.org/r/484276
[19:57:47] <dcausse>	 James_F: I'm done
[19:57:50] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10RStallman-legalteam) This is fully signed and filed. Thanks!
[19:58:03] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH)
[19:58:27] <dcausse>	 !log Morning SWAT done
[19:58:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:58:45] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui) @robh what's your plan with db1075 (the db master)?
[19:59:25] <wikibugs>	 (03CR) 10Gehel: wdqs: prometheus-blazegraph-exporter supports multi instances (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[19:59:48] <icinga-wm>	 PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[19:59:52] <James_F>	 jouncebot: next
[19:59:52] <jouncebot>	 In 1 hour(s) and 0 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T2100)
[19:59:58] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10RobH) >>! In T213748#4878612, @Marostegui wrote: > @robh what's your plan with db1075 (the db master)?   @cmjohnson will take 1 of the 2 power supplies and cross-cable it into...
[20:03:26] <icinga-wm>	 RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[20:03:58] <wikibugs>	 (03CR) 10jenkins-bot: Revert "[cirrus] Start writing to psi & omega" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484273 (owner: 10DCausse)
[20:04:19] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10DBA: swap a2-eqiad PDU with on-site spare - https://phabricator.wikimedia.org/T213748 (10Marostegui) Awesome! Thanks for clarifying!
[20:05:02] <wikibugs>	 (03PS13) 10Gehel: Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev)
[20:08:30] <wikibugs>	 (03CR) 10Anomie: [C: 03+1] "That's a lot of patches that all do basically the same thing though." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483910 (owner: 10MaxSem)
[20:08:54] <gehel>	 !log disabling puppet on all wdqs servers to deploy T213234
[20:08:57] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:57] <stashbot>	 T213234: Create puppet config to run two instances of Blazegraph - https://phabricator.wikimedia.org/T213234
[20:09:26] <wikibugs>	 (03CR) 10MaxSem: "Yep, and I prefer to do it granularly, even if they'll be eventually deployed in one batch." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483910 (owner: 10MaxSem)
[20:09:33] <wikibugs>	 (03CR) 10Samwilson: [C: 03+1] [labs] Remove $wmgUseTemplateWizard [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483900 (owner: 10MaxSem)
[20:10:51] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] Create second Blazegraph instance for categories [puppet] - 10https://gerrit.wikimedia.org/r/483628 (https://phabricator.wikimedia.org/T213234) (owner: 10Smalyshev)
[20:15:56] <wikibugs>	 (03PS5) 10Ottomata: [WIP] Helm chart for eventgate-analytics deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247)
[20:17:47] <wikibugs>	 (03PS6) 10Ottomata: [WIP] Helm chart for eventgate-analytics deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247)
[20:19:17] <wikibugs>	 (03CR) 10Dzahn: doc: fix Apache redirects to use https (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: 10Hashar)
[20:27:34] <logmsgbot>	 !log gehel@deploy1001 Started deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version
[20:27:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:27:58] <logmsgbot>	 !log gehel@deploy1001 Finished deploy [wdqs/wdqs@f71131e]: upgrading wdqs1010 to latest version (duration: 00m 24s)
[20:27:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:29:30] <wikibugs>	 10Operations, 10Code-Stewardship-Reviews, 10Graphoid, 10Core Platform Team Backlog (Watching / External), and 2 others: graphoid: Code stewardship request - https://phabricator.wikimedia.org/T211881 (10Tgr) Are those numbers reliable? Arabic Wikipedia gets about 5M pageviews a day, and it sounds like almos...
[20:37:20] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php-1.33.0-wmf.12/resources/Resources.php: Hot-deploy I18193b19 to add missing message for OOUI v0.30.0 (duration: 00m 47s)
[20:37:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:43:03] <wikibugs>	 (03CR) 10Dzahn: "Could you give an example for URLs that are currently broken? I think there is a lot of explanation here but maybe not the actual problem " [puppet] - 10https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: 10Hashar)
[20:44:34] <wikibugs>	 10Operations, 10Citoid, 10SRE-Access-Requests: Requesting access to Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T213269 (10CDanis)
[20:44:52] <wikibugs>	 (03PS1) 10Gehel: wdqs: make GC log file configurable per blazegraph instance [puppet] - 10https://gerrit.wikimedia.org/r/484288 (https://phabricator.wikimedia.org/T213234)
[20:45:04] <wikibugs>	 (03CR) 10Dzahn: "ok, i see they are described on https://phabricator.wikimedia.org/T213509  gotcha.. let me run some tests with apache-fast-test from deplo" [puppet] - 10https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: 10Hashar)
[20:45:17] <wikibugs>	 10Operations, 10Citoid, 10SRE-Access-Requests: Requesting access to Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T213269 (10CDanis) This was approved at the meeting today.  I'm happy to review a patchset adding your keys :)
[20:49:38] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s3 on db2057 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 302.57 seconds
[20:51:07] <wikibugs>	 (03CR) 10Mathew.onipe: [C: 03+1] wdqs: make GC log file configurable per blazegraph instance [puppet] - 10https://gerrit.wikimedia.org/r/484288 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[20:51:18] <apergos>	 does anyone know what's up with db2057? some job running or...?
[20:51:38] <apergos>	 I'm kinda hoping it's known or someone in sf tz can look (it's 11 pm here)
[20:52:17] <wikibugs>	 (03PS1) 10Kosta Harlan: EditorJourney: Enable data collection for viwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484289 (https://phabricator.wikimedia.org/T213348)
[20:52:27] <wikibugs>	 (03PS2) 10CDanis: Add bmansunov to deploy-service and recommendation-admin groups [puppet] - 10https://gerrit.wikimedia.org/r/482312 (https://phabricator.wikimedia.org/T212945) (owner: 10Mobrovac)
[20:52:35] <wikibugs>	 (03CR) 10Gehel: [C: 03+2] wdqs: make GC log file configurable per blazegraph instance [puppet] - 10https://gerrit.wikimedia.org/r/484288 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[20:52:45] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] Add bmansunov to deploy-service and recommendation-admin groups [puppet] - 10https://gerrit.wikimedia.org/r/482312 (https://phabricator.wikimedia.org/T212945) (owner: 10Mobrovac)
[20:53:09] <wikibugs>	 (03PS3) 10CDanis: Add bmansunov to deploy-service and recommendation-admin groups [puppet] - 10https://gerrit.wikimedia.org/r/482312 (https://phabricator.wikimedia.org/T212945) (owner: 10Mobrovac)
[20:54:28] <wikibugs>	 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10CDanis)
[20:54:48] <wikibugs>	 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10CDanis) 05Open→03Resolved
[20:54:56] <wikibugs>	 (03CR) 10Kosta Harlan: "Waiting for Marshall's feedback on when this would get deployed." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/484289 (https://phabricator.wikimedia.org/T213348) (owner: 10Kosta Harlan)
[20:58:09] <wikibugs>	 10Operations, 10Traffic, 10Wikidata, 10serviceops, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10CRoslof) Transferring the domain name from WMDE to the Foundation requires that WMDE complete an ownership change form. I emailed with @Abraha...
[20:59:08] <mutante>	 apergos: it's a slave in codfw and slow queries from labsdb host. that combo should mean it doesn't need immediate action
[21:00:04] <jouncebot>	 cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T2100).
[21:00:12] <apergos>	 no, I doubt it, but it should be flagged for review if we know there's not a slow predictable job on it
[21:00:15] <subbu>	 no parsoid deploy today
[21:01:00] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s6 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 290.41 seconds
[21:01:22] <wikibugs>	 (03CR) 10Volans: [C: 03+1] "LGTM, two minor optional comments inline." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/484276 (owner: 10Jbond)
[21:05:50] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10WMF-Legal, 10WMF-NDA-Requests: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T213397 (10CDanis) 05Open→03Resolved
[21:06:06] <wikibugs>	 (03CR) 10MarcoAurelio: "Thanks, Daniel. I am still waiting for someone with access to 'grep' the logs (mwmaint1002 & mwmaint2001, cfr. Task) and see if there has " [puppet] - 10https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591) (owner: 10MarcoAurelio)
[21:09:34] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 239, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:12:08] <wikibugs>	 10Operations, 10Citoid, 10SRE-Access-Requests: Requesting access to Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T213269 (10CDanis) Ah, just realized I probably need to prepare the patch myself.  (Sorry, first time on clinic duty.)  Doing so shortly.
[21:14:00] <mutante>	 apergos: saved the slow query / user / client in a private paste.. pinged m.arostegui 
[21:14:16] <wikibugs>	 10Operations, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Grant sudo access for CI admins to doc.wikimedia.org publishing user - https://phabricator.wikimedia.org/T213169 (10hashar) I can confirm it is working fine. Thank you....
[21:14:21] <apergos>	 ok, well we'll see what's said about it eventually
[21:14:29] <apergos>	 thank you for having a look
[21:15:42] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 241, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[21:16:48] <mutante>	 (wikibase)
[21:17:13] <wikibugs>	 (03PS1) 10CDanis: add user mvolz to citoid-admin/deployment/deploy-service [puppet] - 10https://gerrit.wikimedia.org/r/484295 (https://phabricator.wikimedia.org/T213269)
[21:18:33] <wikibugs>	 (03CR) 10MarcoAurelio: "> if you want give it a try to convert it to use the new mediawiki" [puppet] - 10https://gerrit.wikimedia.org/r/483876 (https://phabricator.wikimedia.org/T213591) (owner: 10MarcoAurelio)
[21:20:10] <wikibugs>	 (03CR) 10Volans: [C: 04-1] "I think there is an issue (actually also with current code). See also a couple of suggestions inline." (036 comments) [software/certcentral] - 10https://gerrit.wikimedia.org/r/483163 (https://phabricator.wikimedia.org/T213301) (owner: 10Vgutierrez)
[21:22:00] <wikibugs>	 (03CR) 10CDanis: [C: 03+2] add user mvolz to citoid-admin/deployment/deploy-service [puppet] - 10https://gerrit.wikimedia.org/r/484295 (https://phabricator.wikimedia.org/T213269) (owner: 10CDanis)
[21:23:08] <wikibugs>	 10Operations, 10Citoid, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to Citoid/Zotero production servers for MVOLZ - https://phabricator.wikimedia.org/T213269 (10CDanis) 05Open→03Resolved
[21:26:24] <logmsgbot>	 !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki)
[21:26:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:15] <logmsgbot>	 !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@89c4d8d]: Update mobileapps to f2658de (fix ITN explore feed for dawiki) (duration: 03m 51s)
[21:30:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:41:00] <wikibugs>	 (03PS7) 10Ottomata: [WIP] Helm chart for eventgate-analytics deployment [deployment-charts] - 10https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247)
[21:45:48] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2095 is OK: OK slave_sql_lag Replication lag: 12.16 seconds
[21:46:14] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2041 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[21:46:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2091 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[21:46:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[21:46:20] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2035 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[21:46:26] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2063 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[21:46:44] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2088 is OK: OK slave_sql_lag Replication lag: 0.22 seconds
[21:51:18] <wikibugs>	 (03PS1) 10Andrew Bogott: proxyleaks.py: update for multi-region and other issues [puppet] - 10https://gerrit.wikimedia.org/r/484303
[21:54:09] <wikibugs>	 (03PS1) 10Hashar: rsync: readd incoming and outgoing chmod [puppet] - 10https://gerrit.wikimedia.org/r/484304
[21:55:18] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] rsync: readd incoming and outgoing chmod [puppet] - 10https://gerrit.wikimedia.org/r/484304 (owner: 10Hashar)
[21:59:04] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labweb1001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 291 bytes in 0.107 second response time
[21:59:14] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labweb1002 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 291 bytes in 0.108 second response time
[21:59:32] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labtestweb2001 is CRITICAL: HTTP CRITICAL: HTTP/1.0 500 Internal Server Error - string Wikitech not found on https://wikitech-static.wikimedia.org:443/wiki/Main_Page?debug=true - 291 bytes in 0.100 second response time
[22:00:04] <jouncebot>	 bawolff and Reedy: I, the Bot under the Fountain, allow thee, The Deployer, to do Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190114T2200).
[22:03:04] <andrewbogott>	 I'm working on wikitech-static, sorry for the noise
[22:04:35] <wikibugs>	 (03PS2) 10Hashar: rsync: readd incoming and outgoing chmod [puppet] - 10https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890)
[22:04:37] <wikibugs>	 (03PS1) 10Hashar: doc: make published files group writable [puppet] - 10https://gerrit.wikimedia.org/r/484308 (https://phabricator.wikimedia.org/T137890)
[22:05:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] rsync: readd incoming and outgoing chmod [puppet] - 10https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: 10Hashar)
[22:06:43] <wikibugs>	 (03CR) 10Hashar: "We got sudo for doc-publisher, might as well make sure rsync set received files/dirs group writable?" [puppet] - 10https://gerrit.wikimedia.org/r/484308 (https://phabricator.wikimedia.org/T137890) (owner: 10Hashar)
[22:08:00] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s2 on db2049 is OK: OK slave_sql_lag Replication lag: 23.25 seconds
[22:08:07] <wikibugs>	 (03CR) 10Hashar: "Seems modules/rsync does not pass rubocop which should not happen (TM). Will dig into the issue eventually and fix it in another change." [puppet] - 10https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: 10Hashar)
[22:10:16] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs1010 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[22:12:25] <gehel>	 ^ downtime expired
[22:20:49] <wikibugs>	 10Operations, 10ops-eqiad: Interface errors on cr1-eqiad:xe-3/3/1 - https://phabricator.wikimedia.org/T212791 (10ayounsi) Equinix cleaned and tested the X-connect, but the issue persists. Next step is to do another round of testing/swapping on our side and follow up with Zayo if no resolution.
[22:21:00] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2057 is OK: OK slave_sql_lag Replication lag: 26.06 seconds
[22:25:50] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labweb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 27832 bytes in 0.284 second response time
[22:26:00] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labweb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 27832 bytes in 0.269 second response time
[22:26:18] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labtestweb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 27832 bytes in 0.240 second response time
[22:36:16] <James_F>	 bblack: Don't suppose you've had time to do any of the Zero VCL removal you mentioned last week? Should I create a task under T187716?
[22:36:17] <stashbot>	 T187716: Sunset Wikipedia Zero - https://phabricator.wikimedia.org/T187716
[22:39:47] <andrewbogott>	 !log upgraded packages and MW version on wikitech-static
[22:39:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:42:34] <wikibugs>	 (03PS1) 10Gehel: wdqs: monitor blazegraph process per instance [puppet] - 10https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234)
[22:43:09] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] wdqs: monitor blazegraph process per instance [puppet] - 10https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: 10Gehel)
[22:53:01] <wikibugs>	 (03CR) 10Dzahn: "i'm somewhat skeptical about adding several "hacks" to do things manual and prevent breakage when doing things manual when at the same tim" [puppet] - 10https://gerrit.wikimedia.org/r/484308 (https://phabricator.wikimedia.org/T137890) (owner: 10Hashar)
[22:54:57] <bblack>	 James_F: please do, it's been a pretty swampy january so far :)
[22:57:29] <James_F>	 bblack: Kk.
[22:57:55] <wikibugs>	 (PS3) Dzahn: doc: force users umask for wikidev group [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[22:58:11] <wikibugs>	 Operations, ExternalGuidance, Traffic: Deliver mobile-based version for automatic translations - https://phabricator.wikimedia.org/T212197 (BBlack) I don't have any suggestions, no.  Develop a straw-patch which at least serves in code terms to document the intent (e.g. the explicit header and URI val...
[22:58:55] <wikibugs>	 (PS2) Smalyshev: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[22:59:23] <wikibugs>	 Operations, Traffic, Zero: Zero VCL removal - https://phabricator.wikimedia.org/T213769 (Jdforrester-WMF) p:Triage→Normal
[22:59:30] <wikibugs>	 (CR) Jforrester: [C: -2] "Specifically, blocked on T213769." [mediawiki-config] - https://gerrit.wikimedia.org/r/483193 (owner: Jforrester)
[22:59:56] <wikibugs>	 (CR) jerkins-bot: [V: -1] wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:00:11] <wikibugs>	 (PS3) Jforrester: Revert "Block WP Zero users from accessing Phabricator uploads" [puppet] - https://gerrit.wikimedia.org/r/479399 (https://phabricator.wikimedia.org/T213769) (owner: MaxSem)
[23:00:35] <wikibugs>	 (CR) Dzahn: "> We can fix the permissions ourselves once we are granted sudo as the doc-publisher user  T213169" [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[23:00:50] <wikibugs>	 (CR) Dzahn: [C: +2] doc: force users umask for wikidev group [puppet] - https://gerrit.wikimedia.org/r/484194 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[23:02:13] <wikibugs>	 (CR) Dzahn: "adding Moritz for general rsync module changes" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[23:03:04] <wikibugs>	 (CR) Dzahn: "depends on getting support for it back into rsync module.. so stalled for a moment" [puppet] - https://gerrit.wikimedia.org/r/484308 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[23:05:11] <wikibugs>	 Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Patch-For-Review, Release-Engineering-Team (Kanban): Grant sudo access for CI admins to doc.wikimedia.org publishing user - https://phabricator.wikimedia.org/T213169 (Dzahn) >>! In T213169#4878825, @hashar wrote: >just no...
[23:08:33] <wikibugs>	 (PS3) Smalyshev: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:12:05] <wikibugs>	 (PS4) Gehel: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234)
[23:13:05] <wikibugs>	 (PS4) Gergő Tisza: Improve list of privileged groups [mediawiki-config] - https://gerrit.wikimedia.org/r/483022
[23:13:28] <wikibugs>	 (CR) Smalyshev: [C: +1] wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:15:23] <wikibugs>	 (PS1) Dzahn: doc: ferm, allow http connections from deployment hosts [puppet] - https://gerrit.wikimedia.org/r/484317
[23:15:59] <wikibugs>	 (PS5) Gehel: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234)
[23:17:13] <wikibugs>	 (PS2) Dzahn: doc: ferm, allow http connections from deployment hosts [puppet] - https://gerrit.wikimedia.org/r/484317 (https://phabricator.wikimedia.org/T137890)
[23:17:38] <wikibugs>	 (PS6) Gehel: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234)
[23:18:29] <wikibugs>	 (CR) Dzahn: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14327/doc1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/484317 (https://phabricator.wikimedia.org/T137890) (owner: Dzahn)
[23:18:42] <wikibugs>	 (CR) Gehel: [C: +2] wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:19:06] <gehel>	 mutante: damn, you were faster than me!
[23:19:14] <wikibugs>	 (PS7) Gehel: wdqs: monitor blazegraph process per instance [puppet] - https://gerrit.wikimedia.org/r/484314 (https://phabricator.wikimedia.org/T213234)
[23:20:29] <mutante>	 gehel: oops. a constant race. your time now before i touch the next :)
[23:20:54] <gehel>	 mutante: thanks!
[23:25:57] <balder>	 PROBLEM - testing a script
[23:27:13] <balder>	 PROBLEM - testing a script
[23:27:54] <wikibugs>	 (PS4) Smalyshev: wdqs: prometheus-blazegraph-exporter supports multi instances [puppet] - https://gerrit.wikimedia.org/r/484246 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:28:03] <Hauskatze>	 hmm, isn't john bond our new employee?
[23:28:13] <jbond42>	 yes sorry guys that was me
[23:29:02] <wikibugs>	 (PS1) Gehel: wdqs: removed unused port parameter on wdqs::monitor::blazegraph [puppet] - https://gerrit.wikimedia.org/r/484319 (https://phabricator.wikimedia.org/T213234)
[23:29:44] <wikibugs>	 (CR) Gehel: [C: +2] wdqs: removed unused port parameter on wdqs::monitor::blazegraph [puppet] - https://gerrit.wikimedia.org/r/484319 (https://phabricator.wikimedia.org/T213234) (owner: Gehel)
[23:30:19] <mutante>	 !log doc1001 - disabling puppet, testing apache config change 483775
[23:30:20] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:30:33] <p858snake>	 jbond42: I recommend applying for an IRC cloak, this can be done by "/msg wmopbot cloak" and following the prompts, lot easier to tell when trying to hit spammers
[23:30:37] <p858snake>	 apologies for the removal
[23:31:02] <jbond42>	 p858snake: no worries i should have used a dev room either way
[23:31:21] <jbond42>	 also i did request a cloak for this account just waiting for approval
[23:33:09] <jbond42>	 the other account can remain as a standard account or blocked :)
[23:34:13] <wikibugs>	 (CR) Dzahn: [C: +1] "i opened a firewall hole to allow using apache-fast-test from deploy1001 against doc1001 (https://gerrit.wikimedia.org/r/#/c/operations/pu" [puppet] - https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: Hashar)
[23:34:23] <wikibugs>	 (PS3) Dzahn: doc: fix Apache redirects to use https [puppet] - https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: Hashar)
[23:34:50] <wikibugs>	 (CR) Dzahn: [C: +2] "editing the old config didn't really do anything anymore and we should delete it but shrug :)" [puppet] - https://gerrit.wikimedia.org/r/483775 (https://phabricator.wikimedia.org/T95164) (owner: Hashar)
[23:35:07] <wikibugs>	 Operations, netops, Performance-Team (Radar): Stop prioritizing peering over transit - https://phabricator.wikimedia.org/T204281 (ayounsi)
[23:39:51] <logmsgbot>	 !log gehel@deploy1001 Started deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance
[23:39:52] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:46:06] <wikibugs>	 (PS1) Dzahn: contint: delete unused doc.wikimedia.org site config [puppet] - https://gerrit.wikimedia.org/r/484321 (https://phabricator.wikimedia.org/T137890)
[23:47:20] <wikibugs>	 Operations, MediaWiki Language Extension Bundle, MediaWiki-Cache, Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (aaron) >>! In T203786#487722...
[23:47:26] <wikibugs>	 Operations, CirrusSearch, Discovery-Search (Current work): Add chi, psi and omega selector to the elasticsearch dashboards in grafana - https://phabricator.wikimedia.org/T211956 (debt) Open→Resolved
[23:49:43] <logmsgbot>	 !log gehel@deploy1001 Finished deploy [wdqs/wdqs@59d5f40]: New wdqs startup script for multi-instance (duration: 09m 53s)
[23:49:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:54:45] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2001 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:57:05] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2005 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:57:11] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2004 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:57:13] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2003 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:57:51] <gehel>	 ^ transient failure, icinga needs to be updated as well for new wdqs instances, sorry for the noise
[23:57:51] <mutante>	 i know what this will be :)
[23:57:53] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs1006 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:58:03] <mutante>	 Matt announced running 2 blazegraphs per instance in the meeting today
[23:58:03] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs1008 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:58:07] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2002 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:58:22] <mutante>	 and the check isn't "> 0" but " == 1"
[23:58:41] <gehel>	 mutante: at least someone listens in those meetings!
[23:58:43] <jijiki>	 haha, it looked a lot like "something bad happened"
[23:58:49] <icinga-wm>	 PROBLEM - Blazegraph process on wdqs2006 is CRITICAL: PROCS CRITICAL: 2 processes with UID = 499 (blazegraph), regex args ^java .* blazegraph-service-.*war
[23:58:51] <Hauskatze>	 mutante: left some comments on the purge_abusefilter.pp. Going to bed now.
[23:59:12] <gehel>	 nah, the correction to the check is in the same patch as the new instance, I should have split those for deployment
[23:59:31] <mutante>	 Hauskatze: thanks and good night