[01:49:44] Operations, SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (Slaporte) I support @ssingh's request. Access would help his work on a project he's doing as a Ford-Mozilla Open Web Fellow for Wikimedia.
[02:05:08] (CR) CRusnov: [C: -1] "Incorrect Timer selector thing." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: CRusnov)
[02:08:55] (CR) Aaron Schulz: "That's right except for the fact that resetCheckKey() *does* use broadcasted DELETE, but that is rarely called. Normal "deletes" are broad" [puppet] - https://gerrit.wikimedia.org/r/492948 (owner: Aaron Schulz)
[02:55:52] (PS1) BryanDavis: pbuilder: Ensure ~pbuilder exists and is writable [puppet] - https://gerrit.wikimedia.org/r/494155
[03:53:05] PROBLEM - HHVM rendering on mw1344 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:54:07] RECOVERY - HHVM rendering on mw1344 is OK: HTTP OK: HTTP/1.1 200 OK - 79514 bytes in 0.287 second response time
[05:00:04] kart_: #bothumor I � Unicode. All rise for deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T0500).
[05:16:39] ah. Timezone.
[05:18:51] !log Started manual run of unpublished ContentTranslation draft purge script (T217310)
[05:18:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:54] T217310: Run unpublished draft purge script for CX (Week of 03/03) - https://phabricator.wikimedia.org/T217310
[05:23:39] (CR) Andrew Bogott: [V: +2 C: +2] apache: update ssl config [wikitech-static] - https://gerrit.wikimedia.org/r/492684 (owner: Andrew Bogott)
[05:23:47] (CR) Andrew Bogott: [V: +2 C: +2] mediawiki config: catch up with upstream mw changes [wikitech-static] - https://gerrit.wikimedia.org/r/492685 (owner: Andrew Bogott)
[05:28:01] (CR) Andrew Bogott: [C: +1] "I didn't know about start_delay -- this seems good!" [puppet] - https://gerrit.wikimedia.org/r/493807 (https://phabricator.wikimedia.org/T216040) (owner: GTirloni)
[05:56:39] (CR) Marostegui: [C: +2] db-eqiad.php: Update pc1007 rack [mediawiki-config] - https://gerrit.wikimedia.org/r/493722 (owner: Marostegui)
[05:57:42] (Merged) jenkins-bot: db-eqiad.php: Update pc1007 rack [mediawiki-config] - https://gerrit.wikimedia.org/r/493722 (owner: Marostegui)
[05:59:51] (PS1) Marostegui: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494160
[06:00:45] (PS1) Marostegui: dbproxy1010: Depoo labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/494161
[06:01:51] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494160 (owner: Marostegui)
[06:02:17] (PS2) Marostegui: dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/494161
[06:02:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494160 (owner: Marostegui)
[06:04:10] (CR) Marostegui: [C: +2] dbproxy1010: Depool labsdb1010 [puppet] - https://gerrit.wikimedia.org/r/494161 (owner: Marostegui)
[06:04:22] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094:3314 for schema change (duration: 01m 11s)
[06:04:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:04:28] (CR) jenkins-bot: db-eqiad.php: Update pc1007 rack [mediawiki-config] - https://gerrit.wikimedia.org/r/493722 (owner: Marostegui)
[06:04:30] (CR) jenkins-bot: db-eqiad.php: Depool db1097:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494160 (owner: Marostegui)
[06:05:26] !log Reload haproxy on dbproxy1010 to depool labsdb1010
[06:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:36] !log Run analyze table logging on db2038 and db2059 - T71222
[06:06:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:38] T71222: list=logevents slow for users with last log action long time ago - https://phabricator.wikimedia.org/T71222
[06:13:18] !log Upgrade MySQL on db2041 db2049 db2056 db2095
[06:13:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:29:47] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/40-swift.conf]
[06:34:17] !log downtimed cloudstore1008/9 (T209527)
[06:34:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:34:20] T209527: Set up scratch and maps NFS services on cloudstore1008/9 - https://phabricator.wikimedia.org/T209527
[06:38:09] !log Stop MySQL on labsdb1010 for mysql upgrade
[06:38:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:42:19] PROBLEM - haproxy failover on dbproxy1011 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[06:42:24] ^ expected
[06:44:50] marostegui: morning!
[06:45:02] elukey: is it time?!
[06:45:08] it is :)
[06:45:16] \o\ |o| /o/
[06:45:19] Ok, let me go for it!
[06:45:42] green light from Analytics!
[06:45:55] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:46:20] !log Stop MySQL on dbstore1002 for decommission T210478 T172410 T216491 T215589
[06:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:46:27] T216491: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491
[06:46:27] T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478
[06:46:27] T172410: Replace the current multisource analytics-store setup - https://phabricator.wikimedia.org/T172410
[06:46:28] T215589: Migrate users to dbstore100[3-5] - https://phabricator.wikimedia.org/T215589
[06:47:47] (PS1) GTirloni: cloudstore1008/9: reimage with buster [puppet] - https://gerrit.wikimedia.org/r/494163 (https://phabricator.wikimedia.org/T209527)
[06:48:18] (PS1) Elukey: Delete analytics-store CNAME due to dbstore1002 decom [dns] - https://gerrit.wikimedia.org/r/494164 (https://phabricator.wikimedia.org/T216491)
[06:48:25] marostegui: --^
[06:48:40] (CR) GTirloni: [C: +2] cloudstore1008/9: reimage with buster [puppet] - https://gerrit.wikimedia.org/r/494163 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[06:49:05] RECOVERY - haproxy failover on dbproxy1011 is OK: OK check_failover servers up 2 down 0
[06:49:12] (CR) Marostegui: [C: +1] Delete analytics-store CNAME due to dbstore1002 decom [dns] - https://gerrit.wikimedia.org/r/494164 (https://phabricator.wikimedia.org/T216491) (owner: Elukey)
[06:49:41] (CR) Elukey: [C: +2] Delete analytics-store CNAME due to dbstore1002 decom [dns] - https://gerrit.wikimedia.org/r/494164 (https://phabricator.wikimedia.org/T216491) (owner: Elukey)
[06:50:49] analytics-store is gone :D
[06:50:54] \o/
[06:50:58] * marostegui cries
[06:51:06] \o/
[06:52:38] Operations, ops-eqiad, Analytics, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Marostegui) MySQL has been stopped on dbstore1002 and won't be started again, as this host will be decommissioned
[06:53:34] Operations, ops-eqiad, Analytics, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Marostegui)
[06:53:45] Operations, ops-eqiad, Analytics, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Marostegui) Stalled→Open
[06:54:03] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:00:25] RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:04:45] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:05:33] !log Upgrade MySQL on db2088 and db2091
[07:05:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:09:59] (PS1) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/494165
[07:10:07] (PS2) Marostegui: Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/494165
[07:13:05] !log Remove dbstore1002 from tendril and zarcillo - T216491
[07:13:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:08] T216491: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491
[07:17:23] !log Finished manual run of unpublished ContentTranslation draft purge script (T217310)
[07:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:17:26] T217310: Run unpublished draft purge script for CX (Week of 03/03) - https://phabricator.wikimedia.org/T217310
[07:22:28] Operations, Patch-For-Review: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527 (GTirloni) I've encountered an issue re-imaging cloudstore1008/9 with Buster where the megaraid_sas driver seems to be missing (or is it mpt2sas?), so no disks are detected.
[07:24:32] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:25:01] _joe_: ^
[07:26:02] marostegui: https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now
[07:26:11] there is something weird happening
[07:29:11] started ~ at 6:14 UTC
[07:30:44] <_joe_> elukey: indeed
[07:30:48] <_joe_> let's call ema?
[07:31:04] yeah
[07:31:44] calling him
[07:31:58] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga2001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:32:01] (CR) Marostegui: [C: +2] Revert "dbproxy1010: Depool labsdb1010" [puppet] - https://gerrit.wikimedia.org/r/494165 (owner: Marostegui)
[07:32:28] heh, as soon as you mention ema the recovery arrives
[07:33:12] so there were a couple of servers with mailbox lag
[07:33:28] that might be due to traffic changed (?) or similar
[07:33:37] !log Reload haproxy on dbproxy1010 to repool labsdb1010
[07:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:44] but in the past in some situations we had to restart the varnishes lagging
[07:35:08] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494166
[07:36:25] (PS1) GTirloni: Revert "cloudstore1008/9: reimage with buster" [puppet] - https://gerrit.wikimedia.org/r/494168
[07:37:05] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494166 (owner: Marostegui)
[07:37:38] there are still some varnishes not behaving correctly though
[07:37:45] (CR) GTirloni: [C: +2] Revert "cloudstore1008/9: reimage with buster" [puppet] - https://gerrit.wikimedia.org/r/494168 (owner: GTirloni)
[07:37:52] the alarm is green but not completely good
[07:37:58] elukey: is ema coming in the end?
[07:38:08] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494166 (owner: Marostegui)
[07:38:51] marostegui: called him, didn't answer
[07:39:20] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 (duration: 00m 53s)
[07:39:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:39:30] elukey: ah ok!
[07:39:42] (PS1) Marostegui: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494170
[07:40:12] elukey: yeah, the alert might come back https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?orgId=1&from=now-3h&to=now
[07:40:43] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494170 (owner: Marostegui)
[07:40:54] marostegui: https://logstash.wikimedia.org/app/kibana#/dashboard/Varnish-Webrequest-50X
[07:41:04] we have some 503s too
[07:41:43] (Merged) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494170 (owner: Marostegui)
[07:42:48] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1103:3314 for schema change (duration: 00m 49s)
[07:42:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:44:15] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1097:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494166 (owner: Marostegui)
[07:44:17] (CR) jenkins-bot: db-eqiad.php: Depool db1103:3314 [mediawiki-config] - https://gerrit.wikimedia.org/r/494170 (owner: Marostegui)
[07:44:59] Operations, Discovery-Search: Reshard Commons_wiki - https://phabricator.wikimedia.org/T217531 (Mathew.onipe)
[07:45:08] Operations, Discovery-Search: Reshard Commons_wiki - https://phabricator.wikimedia.org/T217531 (Mathew.onipe) p:Triage→Normal
[07:45:53] Operations, Discovery-Search, Elasticsearch: Reshard Commons_wiki - https://phabricator.wikimedia.org/T217531 (Mathew.onipe)
[07:48:21] !log cp3032/cp3042: restart varnish-be due to mbox lag
[07:48:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:05] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/494171 (https://phabricator.wikimedia.org/T217397)
[08:11:50] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/494171 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[08:12:13] (CR) Muehlenhoff: [C: +1] Remove if statement as we now use defaultdict [debs/debdeploy] - https://gerrit.wikimedia.org/r/493720 (owner: Jbond)
[08:12:56] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/494171 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[08:14:00] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1089 - T217397 (duration: 00m 49s)
[08:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:14:05] T217397: logging.log_title_time and logging.log_title_type_time indexes are not on tables.sql but they exist on most of the wikis - https://phabricator.wikimedia.org/T217397
[08:18:34] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/494171 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[08:29:28] !log Change logging indexes on db1089 to leave the indexes exactly like the ones on tables.sql - T217397
[08:29:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:35] T217397: logging.log_title_time and logging.log_title_type_time indexes are not on tables.sql but they exist on most of the wikis - https://phabricator.wikimedia.org/T217397
[08:30:34] (PS2) Gilles: Enable Priority Hints origin trial on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493036 (https://phabricator.wikimedia.org/T216499)
[08:32:35] (CR) Gilles: [C: +2] Enable Priority Hints origin trial on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493036 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[08:33:40] (Merged) jenkins-bot: Enable Priority Hints origin trial on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493036 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[08:35:30] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494173
[08:35:53] (PS2) Marostegui: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494173
[08:38:06] PROBLEM - Check systemd state on labsdb1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:38:25] !log gilles@deploy1001 scap failed: average error rate on 7/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details)
[08:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:39:20] win 5
[08:40:35] <_joe_> gilles: have you seen the errors?
[08:40:54] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (Marostegui)
[08:41:08] (CR) jenkins-bot: Enable Priority Hints origin trial on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/493036 (https://phabricator.wikimedia.org/T216499) (owner: Gilles)
[08:41:22] (PS1) Gilles: Revert "Enable Priority Hints origin trial on ruwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/494174
[08:41:54] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494173 (owner: Marostegui)
[08:42:57] (CR) Gilles: [C: +2] Revert "Enable Priority Hints origin trial on ruwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/494174 (owner: Gilles)
[08:42:59] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494173 (owner: Marostegui)
[08:43:44] _joe_: I have :)
[08:43:48] reverting
[08:44:01] (Merged) jenkins-bot: Revert "Enable Priority Hints origin trial on ruwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/494174 (owner: Gilles)
[08:44:05] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1103:3314 (duration: 00m 49s)
[08:44:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:18] (PS3) Ammarpad: Add editcontentmodel right to the templateeditor group on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499)
[08:45:32] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T216499 Undo enabling Priority Hints origin trial on ruwiki (duration: 00m 49s)
[08:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:38] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (Marostegui) @elukey there is a few seconds of downtime expected for db1107 (event logging master) during this maintenance.
[08:45:39] T216499: Priority Hints origin trial - https://phabricator.wikimedia.org/T216499
[08:46:27] RECOVERY - Check systemd state on labsdb1010 is OK: OK - running: The system is fully operational
[08:47:35] (CR) Dzahn: "aha! yea, definitely good to have it in a repo. thanks for letting me know" [wikitech-static] - https://gerrit.wikimedia.org/r/492684 (owner: Andrew Bogott)
[08:47:41] (CR) D3r1ck01: "Nice improvement @Ammarpad, this indeed is easier for review." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499) (owner: Ammarpad)
[08:47:52] (CR) D3r1ck01: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499) (owner: Ammarpad)
[08:50:09] (CR) Ammarpad: Add editcontentmodel right to the templateeditor group on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499) (owner: Ammarpad)
[08:50:15] (CR) D3r1ck01: [C: +1] "LGTM! Let another pair of eyes check this for FR." [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499) (owner: Ammarpad)
[08:52:07] (CR) D3r1ck01: [C: +1] Add editcontentmodel right to the templateeditor group on testwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/494016 (https://phabricator.wikimedia.org/T217499) (owner: Ammarpad)
[08:52:39] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1103:3314" [mediawiki-config] - https://gerrit.wikimedia.org/r/494173 (owner: Marostegui)
[08:52:41] (CR) jenkins-bot: Revert "Enable Priority Hints origin trial on ruwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/494174 (owner: Gilles)
[08:55:49] Operations, ops-eqiad, netops, Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (elukey) >>! In T187960#4997110, @Marostegui wrote: > @elukey there is a few seconds of downtime expected for db1107 (event logging master) during this maint...
[08:59:23] (PS2) Dzahn: confd: Remove obsolete Upstart job [puppet] - https://gerrit.wikimedia.org/r/493489 (owner: Muehlenhoff)
[09:01:31] (CR) Dzahn: "It seems you meant the upstart template in module "xdummy" which matches the commit message comment, but the code change is for the confd " [puppet] - https://gerrit.wikimedia.org/r/493489 (owner: Muehlenhoff)
[09:04:11] (PS1) Dzahn: xdummy: remove obsolete upstart template [puppet] - https://gerrit.wikimedia.org/r/494178
[09:05:17] PROBLEM - Check systemd state on labsdb1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:06:28] (CR) Muehlenhoff: [C: +1] xdummy: remove obsolete upstart template [puppet] - https://gerrit.wikimedia.org/r/494178 (owner: Dzahn)
[09:09:21] (PS2) Dzahn: xdummy: remove obsolete upstart template [puppet] - https://gerrit.wikimedia.org/r/494178
[09:09:56] (CR) Dzahn: [C: +2] xdummy: remove obsolete upstart template [puppet] - https://gerrit.wikimedia.org/r/494178 (owner: Dzahn)
[09:10:30] (Abandoned) Muehlenhoff: confd: Remove obsolete Upstart job [puppet] - https://gerrit.wikimedia.org/r/493489 (owner: Muehlenhoff)
[09:12:47] (CR) Dzahn: [C: +2] contint: phaseout android slave [puppet] - https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: Hashar)
[09:13:03] (PS2) Dzahn: contint: phaseout android slave [puppet] - https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: Hashar)
[09:19:11] (PS1) ArielGlenn: pylint, also fix up failure error message for misc dumps [dumps] - https://gerrit.wikimedia.org/r/494180
[09:22:28] !log temporarily stop prometheus on prometheus2004 to take a snapshot
[09:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:00] (PS1) Dzahn: planet: remove wikirigoler.over-blog.com [puppet] - https://gerrit.wikimedia.org/r/494182
[09:25:24] (CR) ArielGlenn: [C: +2] pylint, also fix up failure error message for misc dumps [dumps] - https://gerrit.wikimedia.org/r/494180 (owner: ArielGlenn)
[09:26:08] (PS1) Giuseppe Lavagetto: profile::mediawiki::php: use GET in opcache flushing [puppet] - https://gerrit.wikimedia.org/r/494183
[09:27:02] !log ariel@deploy1001 Started deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer
[09:27:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:12] !log ariel@deploy1001 Finished deploy [dumps/dumps@932bf7e]: make misc dumps failure message nicer (duration: 00m 09s)
[09:27:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:27:17] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::php: use GET in opcache flushing [puppet] - https://gerrit.wikimedia.org/r/494183 (owner: Giuseppe Lavagetto)
[09:27:41] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: use GET in opcache flushing [puppet] - https://gerrit.wikimedia.org/r/494183
[09:27:59] (CR) Dzahn: [C: +2] planet: remove wikirigoler.over-blog.com [puppet] - https://gerrit.wikimedia.org/r/494182 (owner: Dzahn)
[09:28:14] (PS2) Dzahn: planet: remove wikirigoler.over-blog.com [puppet] - https://gerrit.wikimedia.org/r/494182
[09:28:27] (CR) Dzahn: [C: +2] "per T&C request" [puppet] - https://gerrit.wikimedia.org/r/494182 (owner: Dzahn)
[09:29:28] <_joe_> mutante: grrr
[09:29:48] (PS3) Giuseppe Lavagetto: profile::mediawiki::php: use GET in opcache flushing [puppet] - https://gerrit.wikimedia.org/r/494183
[09:29:49] <_joe_> I hate ff-only
[09:29:56] <_joe_> it makes me lose so much time every day
[09:29:58] oops, i thought i was second and you already done
[09:30:12] <_joe_> mutante: nope I was waiting CI
[09:30:24] <_joe_> which decided to take 3 minutes to run
[09:30:28] Operations, MediaWiki-Cache, MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) I think that the main recurrent CAS issue...
[09:30:41] sigh, i see.. yep
[09:30:48] <_joe_> and now again
[09:31:39] (PS2) Giuseppe Lavagetto: profile::mediawiki::php: stop revalidating opcache [puppet] - https://gerrit.wikimedia.org/r/493486 (https://phabricator.wikimedia.org/T211964)
[09:32:07] (CR) Gehel: [C: -1] "Looks mostly good! Comment inline (but we might ignore it if it adds too much complexity)." (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: Mathew.onipe)
[09:33:43] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::php: stop revalidating opcache [puppet] - https://gerrit.wikimedia.org/r/493486 (https://phabricator.wikimedia.org/T211964) (owner: Giuseppe Lavagetto)
[09:35:00] Operations, Mail, WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (Dzahn) a:Dzahn
[09:39:54] (CR) Mathew.onipe: Add wdqs data transfer cookbook (1 comment) [cookbooks] - https://gerrit.wikimedia.org/r/488256 (https://phabricator.wikimedia.org/T213401) (owner: Mathew.onipe)
[09:40:06] (PS1) Marostegui: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/494184 (https://phabricator.wikimedia.org/T217397)
[09:41:58] Operations, Patch-For-Review: Prepare our base system layer for Debian buster - https://phabricator.wikimedia.org/T213527 (MoritzMuehlenhoff) @GTirloni: That's a temporary installer issue, the kernel modules on the last installer images provided use a different kernel ABI than the current kernel in the a...
[09:42:32] (CR) Marostegui: [C: +2] db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/494184 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[09:43:33] (Merged) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/494184 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[09:44:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1089 with low weight (duration: 00m 48s)
[09:44:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:50] Operations, Scap, Patch-For-Review, User-ArielGlenn, User-Joe: Make scap and opcache work consistently together - https://phabricator.wikimedia.org/T211964 (Joe) Open→Resolved
[09:44:52] Operations, Core Platform Team (PHP7 (TEC4)), Core Platform Team Kanban (Doing), HHVM, and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Joe)
[09:45:24] Operations, serviceops, User-Joe: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 (Joe)
[09:45:47] Operations, serviceops, User-Joe: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 - https://phabricator.wikimedia.org/T212828 (Joe)
[09:49:44] (CR) jenkins-bot: db-eqiad.php: Repool db1089 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/494184 (https://phabricator.wikimedia.org/T217397) (owner: Marostegui)
[09:57:00] Operations, monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (Marostegui)
[09:57:20] ACKNOWLEDGEMENT - Check systemd state on labsdb1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Marostegui T217542
[09:58:02] (CR) Filippo Giunchedi: [C: +1] "Thanks Timo!" [puppet] - https://gerrit.wikimedia.org/r/494042 (https://phabricator.wikimedia.org/T136849) (owner: Krinkle)
[10:05:14] RECOVERY - Check systemd state on labsdb1010 is OK: OK - running: The system is fully operational
[10:08:42] Reedy: any idea what happened with this? https://phabricator.wikimedia.org/T204477 I ask because there's still no content in the wiki and that seems odd
[10:08:48] PROBLEM - Check systemd state on labsdb1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[10:08:56] is it possible there's something else still blocking the ug?
[10:09:23] (PS1) Urbanecm: Test rules reference only existing wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541)
[10:10:26] (CR) jerkins-bot: [V: -1] Test rules reference only existing wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) (owner: Urbanecm)
[10:14:14] (CR) Gehel: [C: -1] [WIP] Add support for elasticsearch 6 (6 comments) [puppet] - https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: DCausse)
[10:14:49] RECOVERY - Check systemd state on labsdb1010 is OK: OK - running: The system is fully operational
[10:15:20] Operations, monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (Marostegui) I was able to start it by changing the default file with these options: ` root@labsdb1010:~# diff -u /etc/default/prometheus-mysqld-exporter /root/prometheus-mysqld-e...
[10:17:33] (PS3) Dzahn: contint: phaseout android slave [puppet] - https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: Hashar)
[10:19:30] (PS3) Gehel: Set right owner for admin grants sql [puppet] - https://gerrit.wikimedia.org/r/493231 (owner: MSantos)
[10:20:33] (CR) Gehel: [C: +2] Set right owner for admin grants sql [puppet] - https://gerrit.wikimedia.org/r/493231 (owner: MSantos)
[10:22:33] Operations, serviceops, Performance-Team (Radar), User-Elukey, User-jijiki: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (elukey)
[10:22:35] Operations, serviceops, Performance-Team (Radar), User-Elukey, User-jijiki: Test different growth factors for memcached (prep step for upgrade to newer versions) - https://phabricator.wikimedia.org/T217020 (elukey) Open→Resolved a:elukey
[10:24:11] Operations, monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (fgiunchedi) I also noticed `0.10.0+ds-1~wmf1` for some reason wasn't in `stretch-wikimedia` and I've fixed it now, thus `0.10` should always take precedence now: ` labsdb1009:~$...
[10:24:56] (PS4) Dzahn: contint: phaseout android slave [puppet] - https://gerrit.wikimedia.org/r/491574 (https://phabricator.wikimedia.org/T198495) (owner: Hashar)
[10:26:51] Operations, monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (MoritzMuehlenhoff) Buster has 0.11.0+ds-1, so this will also be an issue once the first DB host runs it
[10:30:04] jan_drewniak: #bothumor My software never has bugs. It just develops random features. Rise for Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1030).
[10:30:56] (03PS1) 10Jdrewniak: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494191 (https://phabricator.wikimedia.org/T128546) [10:31:14] (03CR) 10Gehel: Add cookbook for elastic6 upgrade (036 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/493436 (owner: 10DCausse) [10:32:29] (03CR) 10Volans: [C: 03+1] "LGTM. It would be nice to also:" [puppet] - 10https://gerrit.wikimedia.org/r/490404 (https://phabricator.wikimedia.org/T215183) (owner: 10CDanis) [10:33:48] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494191 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:34:47] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494191 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:35:54] (03CR) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494191 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:37:44] 10Operations, 10monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (10Marostegui) Thanks @fgiunchedi - I have done another `apt full-upgrade` and it downgraded the package and now it works without any manual workaround. Feel free to close this task... 
[10:37:52] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:494191| Bumping portals to master (T128546)]] (duration: 00m 50s) [10:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:55] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546 [10:38:27] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494193 [10:38:43] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:494191| Bumping portals to master (T128546)]] (duration: 00m 50s) [10:38:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:28] (03PS1) 10GTirloni: wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) [10:40:20] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) @bcampbell Cool, thank you! Removed pat@, gary@ and box6699@. Now the only legal-related aliases we maintain are: ` 283 legal-en: legal 284 gc: legal... [10:40:38] 10Operations, 10Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144 (10Dzahn) [10:40:44] 10Operations, 10Mail, 10WMF-Legal: Tracking down gary@ and redirecting it to tsops@ - https://phabricator.wikimedia.org/T210464 (10Dzahn) 05Open→03Resolved [10:40:49] (03CR) 10jerkins-bot: [V: 04-1] wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [10:41:56] 10Operations, 10monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (10jcrespo) If `--collect` is supported on 0.10, we may almost only need that change. 
[10:44:08] (03PS2) 10GTirloni: wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) [10:45:49] (03PS2) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [10:46:33] (03CR) 10jerkins-bot: [V: 04-1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) (owner: 10Urbanecm) [10:48:16] (03CR) 10Muehlenhoff: wmcs::nfs::misc - Fixes and backup role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [10:49:02] (03PS3) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [10:50:05] (03CR) 10jerkins-bot: [V: 04-1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) (owner: 10Urbanecm) [10:50:12] (03PS3) 10GTirloni: wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) [10:50:50] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494193 (owner: 10Marostegui) [10:51:18] (03PS4) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [10:51:47] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494193 (owner: 10Marostegui) [10:52:19] (03CR) 10jerkins-bot: [V: 04-1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 
(https://phabricator.wikimedia.org/T217541) (owner: 10Urbanecm) [10:53:01] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: More weight to db1089 (duration: 00m 48s) [10:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:19] (03PS4) 10GTirloni: wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) [10:53:44] (03CR) 10GTirloni: wmcs::nfs::misc - Fixes and backup role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [10:54:05] (03PS5) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [10:54:22] (03CR) 10Dzahn: "are you planning to add it to SWAT?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [10:55:16] (03CR) 10jerkins-bot: [V: 04-1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) (owner: 10Urbanecm) [10:56:41] (03PS6) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [10:56:58] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Add citoid specific statsd mappings [deployment-charts] - 10https://gerrit.wikimedia.org/r/493669 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [10:57:10] (03CR) 10Alexandros Kosiaris: [V: 03+2 C: 03+2] Publish citoid 0.0.2 version [deployment-charts] - 10https://gerrit.wikimedia.org/r/493670 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [10:57:45] (03CR) 10Dzahn: "but it's not "socket config", it's config for the host name of a log server? 
how does gerrit get the right logserver name then?" [puppet] - 10https://gerrit.wikimedia.org/r/490797 (owner: 10Paladox) [10:58:51] (03PS1) 10Alexandros Kosiaris: kubernetes default calico policy: Allow zotero.discovery.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/494197 [10:58:53] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494193 (owner: 10Marostegui) [10:59:24] (03CR) 10Dzahn: [C: 04-1] "yea..so i don't see a problem statement and hashar says we don't need more." [puppet] - 10https://gerrit.wikimedia.org/r/489475 (owner: 10Paladox) [11:00:31] (03PS5) 10GTirloni: wmcs::nfs::misc - Fixes and backup role [puppet] - 10https://gerrit.wikimedia.org/r/494195 (https://phabricator.wikimedia.org/T209527) [11:03:33] (03CR) 10Dzahn: [C: 04-1] "per previous comments, using the "minsize" command seems wrong here. let's use just "check_https_url_at_address" and the "notes_url" param" [puppet] - 10https://gerrit.wikimedia.org/r/489457 (https://phabricator.wikimedia.org/T215457) (owner: 10Paladox) [11:04:16] !log akosiaris@deploy1001 scap-helm citoid upgrade -f citoid-staging-values.yaml staging stable/citoid [namespace: citoid, clusters: staging] [11:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:17] !log akosiaris@deploy1001 scap-helm citoid cluster staging completed [11:04:17] !log akosiaris@deploy1001 scap-helm citoid finished [11:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:23] (03PS7) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [11:04:39] (03CR) 10jerkins-bot: [V: 04-1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) 
(owner: 10Urbanecm) [11:04:40] !log akosiaris@deploy1001 scap-helm citoid upgrade -f citoid-eqiad-values.yaml production stable/citoid [namespace: citoid, clusters: eqiad] [11:04:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:41] !log akosiaris@deploy1001 scap-helm citoid cluster eqiad completed [11:04:41] !log akosiaris@deploy1001 scap-helm citoid finished [11:04:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:55] !log akosiaris@deploy1001 scap-helm citoid upgrade -f citoid-codfw-values.yaml production stable/citoid [namespace: citoid, clusters: codfw] [11:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:57] !log akosiaris@deploy1001 scap-helm citoid cluster codfw completed [11:04:57] !log akosiaris@deploy1001 scap-helm citoid finished [11:04:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:04:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:31] (03PS8) 10Urbanecm: Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) [11:13:08] (03CR) 10Muehlenhoff: Add system timer for running ganeti->netbox sync. (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [11:20:47] (03CR) 10Dzahn: "one of the comments on upstream link says this only affects data _during_ the gc run but after that it gets auto-fixed. is that right?" 
[puppet] - 10https://gerrit.wikimedia.org/r/493963 (https://phabricator.wikimedia.org/T217497) (owner: 10Paladox) [11:21:41] (03CR) 10Alexandros Kosiaris: [C: 03+2] "Applied already, seems to work fine, merging" [puppet] - 10https://gerrit.wikimedia.org/r/494197 (owner: 10Alexandros Kosiaris) [11:23:04] (03PS6) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [11:26:27] (03PS1) 10Alexandros Kosiaris: Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 [11:28:15] (03PS14) 10Mathew.onipe: [WIP] Add support for elasticsearch 6 [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [11:28:38] (03CR) 10Mathew.onipe: [WIP] Add support for elasticsearch 6 (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [11:33:05] (03PS15) 10Mathew.onipe: Add support for elasticsearch 6 [puppet] - 10https://gerrit.wikimedia.org/r/493234 (https://phabricator.wikimedia.org/T217196) (owner: 10DCausse) [11:33:21] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [11:34:48] akosiaris: is this related to your patch? 
^^^ [11:45:44] (03CR) 10Jbond: [V: 03+2] Remove if statement as we now use defaultdict [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493720 (owner: 10Jbond) [11:45:51] (03CR) 10Jbond: [V: 03+2 C: 03+2] Remove if statement as we now use defaultdict [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493720 (owner: 10Jbond) [11:56:47] 10Operations, 10monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (10fgiunchedi) Yeah I think it makes sense to resolve this @Marostegui as we have {T161296} already. @jcrespo no afaict in 0.10 command line options had one `-` whereas in 0.11 they... [11:57:25] volans: yup, fixed [11:57:26] thanks! [11:57:48] cheers [11:58:07] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. [11:59:26] (03CR) 10Nikerabbit: [C: 03+1] "+1 on the condition that someone verifies that the value of `wgContentTranslationCampaigns` for enwiki/dewiki has newarticle => false befo" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493155 (https://phabricator.wikimedia.org/T216123) (owner: 10KartikMistry) [12:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1200). [12:00:05] No GERRIT patches in the queue for this window AFAICS. [12:00:27] no patches, no problemo :) [12:00:31] 10Operations, 10Mail: Move most (all?) exim personal aliases to OIT - https://phabricator.wikimedia.org/T122144 (10faidon) 05Resolved→03Open >>! In T122144#4152079, @Dzahn wrote: > or they are individual aliases (out of scope of this ticket) Individual/personal aliases were actually the original scope of... [12:03:25] apergos: Honestly, nfi.. 
They've not come back and said they still can't access it etc [12:03:56] that's pretty weird. because there's definitely no pages (says the wiki, and this is why my incrementals job whines too0 [12:04:59] (03CR) 10Alexandros Kosiaris: [V: 03+2] prometheus-statsd-exporter: Run as prometheus-statsd-exporter [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/493196 (owner: 10Alexandros Kosiaris) [12:20:45] RECOVERY - Check systemd state on ganeti1008 is OK: OK - running: The system is fully operational [12:21:55] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (10elukey) @Tbayer what kind of access is needed? I guess analytics-privatedata-users but just want to be sure :) [12:23:19] !log testing component/php72 on mw2224 [12:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:23] PROBLEM - DPKG on mw2224 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [12:27:35] RECOVERY - DPKG on mw2224 is OK: All packages OK [12:30:59] (03PS2) 10Dzahn: admins/enforce-users-groups: remove exception for parsoid-rt user [puppet] - 10https://gerrit.wikimedia.org/r/490407 (https://phabricator.wikimedia.org/T216062) [12:31:46] (03CR) 10Dzahn: [C: 03+2] "double checked with cumin, user exists nowhere , especially not on wtp* parsoid hosts or scandium" [puppet] - 10https://gerrit.wikimedia.org/r/490407 (https://phabricator.wikimedia.org/T216062) (owner: 10Dzahn) [12:37:21] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler1002/14953/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/492390 (owner: 10Herron) [12:38:00] (03CR) 10Elukey: [C: 04-1] "Wrong assumptions, since each host needs to have its own certificate." 
[puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) (owner: 10Elukey) [12:39:12] (03CR) 10Muehlenhoff: [C: 03+1] "JFTR, I did some tests on a codfw app server and that seems fine, it can be considered for further testing in toolforge from my PoV." [puppet] - 10https://gerrit.wikimedia.org/r/493451 (https://phabricator.wikimedia.org/T216712) (owner: 10BryanDavis) [12:40:04] (03PS8) 10Dzahn: jenkins: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/485094 [12:40:26] (03CR) 10Dzahn: [C: 03+2] jenkins: add data types to parameters [puppet] - 10https://gerrit.wikimedia.org/r/485094 (owner: 10Dzahn) [12:44:04] (03CR) 10Paladox: "@Dzahn nope, it does not auto fix it’s self and in matter of fact from what I read it can lead to data loss, corruption or accounts becomi" [puppet] - 10https://gerrit.wikimedia.org/r/493963 (https://phabricator.wikimedia.org/T217497) (owner: 10Paladox) [12:45:06] (03CR) 10Alexandros Kosiaris: Introduce cxserver helm chart (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/492301 (https://phabricator.wikimedia.org/T213195) (owner: 10Alexandros Kosiaris) [12:45:22] (03PS2) 10Alexandros Kosiaris: Introduce cxserver helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/492301 (https://phabricator.wikimedia.org/T213195) [12:49:01] (03PS2) 10Alexandros Kosiaris: Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) [12:49:18] 10Operations, 10Wikimedia-Logstash: Retire udp2log: onboard its producers and consumers to the logging pipeline - https://phabricator.wikimedia.org/T205856 (10fgiunchedi) [12:49:20] 10Operations, 10MediaWiki-Debug-Logger, 10Wikimedia-Logstash, 10monitoring: MediaWiki logging & encryption - https://phabricator.wikimedia.org/T126989 (10fgiunchedi) [12:52:48] (03Abandoned) 10Paladox: Gerrit: Remove socket config from log4j [puppet] - 
10https://gerrit.wikimedia.org/r/490797 (owner: 10Paladox) [12:53:33] (03PS1) 10Jbond: Add exception handeling for pidof call [puppet] - 10https://gerrit.wikimedia.org/r/494210 [12:57:04] (03PS1) 10Muehlenhoff: Switch app servers to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/494212 (https://phabricator.wikimedia.org/T216712) [12:57:39] (03CR) 10jerkins-bot: [V: 04-1] Switch app servers to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/494212 (https://phabricator.wikimedia.org/T216712) (owner: 10Muehlenhoff) [12:58:49] (03PS2) 10Muehlenhoff: Switch app servers to component/php72 [puppet] - 10https://gerrit.wikimedia.org/r/494212 (https://phabricator.wikimedia.org/T216712) [13:02:25] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T217356 (10faidon) [13:02:29] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10faidon) [13:03:32] 10Operations, 10ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T212010 (10faidon) p:05Normal→03High I just merged a duplicate in. @Cmjohnson what's the status of this? 
[13:07:48] (03PS3) 10Alexandros Kosiaris: Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) [13:07:50] (03PS1) 10Alexandros Kosiaris: lvs: Use the kubernetes cluster for citoid [puppet] - 10https://gerrit.wikimedia.org/r/494213 (https://phabricator.wikimedia.org/T213194) [13:07:52] (03PS1) 10Alexandros Kosiaris: citoid: Clean up old scb cluster stanzas [puppet] - 10https://gerrit.wikimedia.org/r/494214 (https://phabricator.wikimedia.org/T213194) [13:07:54] (03PS1) 10Alexandros Kosiaris: Remove citoid role/profile [puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) [13:09:37] (03CR) 10Alexandros Kosiaris: [C: 04-1] "-1 for stalling this until we can give the installation in kubernetes a thumbs up" [puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [13:14:22] (03PS2) 10Bmansurov: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493235 (https://phabricator.wikimedia.org/T217080) [13:14:30] (03CR) 10Muehlenhoff: Add exception handeling for pidof call (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494210 (owner: 10Jbond) [13:19:06] (03PS2) 10Bmansurov: Disable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493236 (https://phabricator.wikimedia.org/T217080) [13:21:10] (03CR) 10Muehlenhoff: Remove citoid role/profile (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [13:24:56] (03PS7) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [13:27:54] (03CR) 10Jbond: Add exception handeling for pidof call (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494210 (owner: 
10Jbond) [13:32:25] 10Operations, 10LDAP-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (10Ottomata) Hm, I'm not sure how you are going to use archiva-deployers to do automated uploading of artifacts. archiva-deployers is an LDAP user group, and you won't be able to use... [13:33:57] 10Operations, 10monitoring: prometheus-mysqld-exporter package 0.11.0 options changed - https://phabricator.wikimedia.org/T217542 (10Marostegui) 05Open→03Resolved a:03fgiunchedi [13:36:39] (03CR) 10Alexandros Kosiaris: [C: 04-1] "yeah, I was following the principle as the comment in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490069/1/modules/admin/data/d" [puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [13:39:24] jouncebot: next [13:39:25] In 4 hour(s) and 20 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1800) [13:42:42] (03PS8) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [13:42:58] (03PS1) 10MarcoAurelio: WIP: Restrict local uploads on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494218 [13:44:41] (03PS2) 10MarcoAurelio: Restrict local uploads on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494218 (https://phabricator.wikimedia.org/T217523) [13:47:24] (03CR) 10Muehlenhoff: "Ok, fine with me." 
[puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [13:52:18] (03PS4) 10Alexandros Kosiaris: Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) [13:52:20] (03PS2) 10Alexandros Kosiaris: lvs: Use the kubernetes cluster for citoid [puppet] - 10https://gerrit.wikimedia.org/r/494213 (https://phabricator.wikimedia.org/T213194) [13:52:22] (03PS2) 10Alexandros Kosiaris: citoid: Clean up old scb cluster stanzas [puppet] - 10https://gerrit.wikimedia.org/r/494214 (https://phabricator.wikimedia.org/T213194) [13:52:24] (03PS2) 10Alexandros Kosiaris: Remove citoid role/profile [puppet] - 10https://gerrit.wikimedia.org/r/494215 (https://phabricator.wikimedia.org/T213194) [13:53:48] (03CR) 10Muehlenhoff: [C: 03+1] Add exception handeling for pidof call (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/494210 (owner: 10Jbond) [13:54:37] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:55:05] gehel or onimisionipe around? [13:55:07] gehel, onimisionipe: you around? 
[13:55:10] XDDD [13:55:11] :) [13:55:14] (03PS1) 10Muehlenhoff: Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) [13:55:15] looking [13:55:19] thanks [13:56:56] wow [13:57:09] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:57:35] !log restarting blazegraph on wdqs eqiad [13:57:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:58:45] 10Operations, 10LDAP-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (10bmansurov) @Ottomata I'll be deploying artifacts from my local machine, similar to [[ https://wikitech.wikimedia.org/wiki/Discovery/Analytics | how the Discovery team is doing it ]]. [13:59:59] (03PS2) 10Muehlenhoff: Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) [14:01:51] RECOVERY - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.230 second response time [14:03:20] (03PS3) 10Alexandros Kosiaris: Introduce cxserver helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/492301 (https://phabricator.wikimedia.org/T213195) [14:03:31] (03PS1) 10Marostegui: db-eqiad.php: Depool db1097:3314,db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494221 (https://phabricator.wikimedia.org/T217397) [14:04:26] (03PS1) 10Sbisson: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) [14:04:48] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1097:3314,db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494221 (https://phabricator.wikimedia.org/T217397) (owner: 10Marostegui) [14:05:49] (03Merged) 10jenkins-bot: db-eqiad.php: Depool 
db1097:3314,db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494221 (https://phabricator.wikimedia.org/T217397) (owner: 10Marostegui) [14:05:55] (03PS2) 10Sbisson: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) [14:05:57] (03PS4) 10Alexandros Kosiaris: Introduce cxserver helm chart [deployment-charts] - 10https://gerrit.wikimedia.org/r/492301 (https://phabricator.wikimedia.org/T213195) [14:06:18] (03CR) 10Sbisson: Enable and configure the ORES goodfaith model on itwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: 10Sbisson) [14:06:59] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1097:3314, db1100 to changeindexes on logging tbale (duration: 00m 50s) [14:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:48] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1097:3314,db1100 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494221 (https://phabricator.wikimedia.org/T217397) (owner: 10Marostegui) [14:12:18] (03PS1) 10Sbisson: Enable GrowthExperiments Homepage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) [14:13:07] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Addshore) For #cognate this will result in DBReadOnlyErrors for page creations, redirect changes, moves, deletions, in the main namespace on wiktionaries. W... 
[14:13:37] (03PS2) 10Elukey: Assign role labs::db::wikireplica_analytics to labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231) [14:15:36] 10Operations, 10LDAP-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (10Ottomata) Hm, ok then, I guess this is a good solution for now. The upload/deploy scripts in https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/493762/ probably should go in... [14:16:30] (03CR) 10Jbond: [C: 03+2] Add exception handeling for pidof call [puppet] - 10https://gerrit.wikimedia.org/r/494210 (owner: 10Jbond) [14:16:39] (03PS2) 10Jbond: Add exception handeling for pidof call [puppet] - 10https://gerrit.wikimedia.org/r/494210 [14:20:01] !log Change indexes on logging table on db1100 (s5) and db1097:3314 (commonswiki) - T217397 [14:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:04] T217397: logging.log_title_time and logging.log_title_type_time indexes are not on tables.sql but they exist on most of the wikis - https://phabricator.wikimedia.org/T217397 [14:20:12] !log update puppet compiler's facts [14:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:46] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Marostegui) >>! In T187960#4997817, @Addshore wrote: > For #cognate this will result in DBReadOnlyErrors for page creations, redirect changes, moves, deleti... 
[14:24:27] (03PS1) 10Herron: rsyslog: replace logstash1006 with logstash1012 in kafka_shipper [puppet] - 10https://gerrit.wikimedia.org/r/494224 (https://phabricator.wikimedia.org/T213898) [14:24:47] (03PS1) 10MarcoAurelio: Create an 'uploader' group on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494225 (https://phabricator.wikimedia.org/T217523) [14:26:30] (03CR) 10Jbond: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) (owner: 10Muehlenhoff) [14:29:17] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 3 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Marostegui) [14:29:31] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14956/" [puppet] - 10https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231) (owner: 10Elukey) [14:30:27] (03CR) 10Marostegui: [C: 03+1] Assign role labs::db::wikireplica_analytics to labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231) (owner: 10Elukey) [14:30:51] (03CR) 10Herron: [C: 03+2] "ready to have notifications re-enabled. 
reverting" [puppet] - 10https://gerrit.wikimedia.org/r/493476 (https://phabricator.wikimedia.org/T213898) (owner: 10Herron) [14:30:58] (03PS1) 10Herron: Revert "logstash: disable notifications on logstash1006 and logstash1012" [puppet] - 10https://gerrit.wikimedia.org/r/494226 [14:32:09] (03PS2) 10Herron: Revert "logstash: disable notifications on logstash1006 and logstash1012" [puppet] - 10https://gerrit.wikimedia.org/r/494226 [14:32:39] (03PS3) 10Muehlenhoff: Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) [14:34:36] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 4 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Marostegui) #reading-infrastructure-team-backlog tagging you here as this affects x1 master (T187960#4997790) which might be something you use, so... [14:35:46] (03CR) 10Herron: [C: 03+2] Revert "logstash: disable notifications on logstash1006 and logstash1012" [puppet] - 10https://gerrit.wikimedia.org/r/494226 (owner: 10Herron) [14:36:31] (03PS3) 10Elukey: Assign role labs::db::wikireplica_analytics to labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231) [14:36:33] (03PS4) 10Muehlenhoff: Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) [14:40:09] (03CR) 10Elukey: [C: 03+2] Assign role labs::db::wikireplica_analytics to labsdb1012 [puppet] - 10https://gerrit.wikimedia.org/r/493653 (https://phabricator.wikimedia.org/T215231) (owner: 10Elukey) [14:42:50] (03PS5) 10Muehlenhoff: Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) [14:44:18] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install 
labsdb1012.eqiad.wmnet - https://phabricator.wikimedia.org/T215231 (10elukey) Next step is to copy data from labsdb1011 and then start mysql and configure analytics-specific things. [14:44:22] (03CR) 10Sbisson: [C: 04-1] GrowthExperiments: Enable help panel for user and user talk NS (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [14:44:32] 10Operations, 10Security: update tar - https://phabricator.wikimedia.org/T216242 (10jbond) 05Open→03Resolved [14:44:39] (03CR) 10Muehlenhoff: [C: 03+2] Remove mcelog on systems which were upgraded from stretch [puppet] - 10https://gerrit.wikimedia.org/r/494220 (https://phabricator.wikimedia.org/T205396) (owner: 10Muehlenhoff) [14:46:57] (03CR) 10Alexandros Kosiaris: "This is working now in tests. Will require some more fiddling with benchmarking but looks already ok" [deployment-charts] - 10https://gerrit.wikimedia.org/r/492301 (https://phabricator.wikimedia.org/T213195) (owner: 10Alexandros Kosiaris) [14:47:01] (03PS5) 10MarcoAurelio: Restore bureaucrat rights on hi.wiktionary to default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/492447 (https://phabricator.wikimedia.org/T214765) (owner: 10Sau226) [14:47:17] (03PS2) 10Ottomata: eventgate: set compression.codec: snappy and message.max.bytes: 4194304 [deployment-charts] - 10https://gerrit.wikimedia.org/r/493444 (https://phabricator.wikimedia.org/T206785) [14:47:39] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate: set compression.codec: snappy and message.max.bytes: 4194304 [deployment-charts] - 10https://gerrit.wikimedia.org/r/493444 (https://phabricator.wikimedia.org/T206785) (owner: 10Ottomata) [14:50:22] (03CR) 10Herron: [C: 03+2] rsyslog: replace logstash1006 with logstash1012 in kafka_shipper [puppet] - 10https://gerrit.wikimedia.org/r/494224 (https://phabricator.wikimedia.org/T213898) (owner: 10Herron) [14:50:29] (03PS2) 10Herron: rsyslog: replace logstash1006 with 
logstash1012 in kafka_shipper [puppet] - 10https://gerrit.wikimedia.org/r/494224 (https://phabricator.wikimedia.org/T213898) [14:51:06] (03PS1) 10Marostegui: db-eqiad.php: Repool db1100 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494230 [14:52:20] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Repool db1100 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494230 (owner: 10Marostegui) [14:52:35] (03PS1) 10Ottomata: Remove legacy EventBus config settings. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494231 [14:52:58] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 (10herron) [14:53:19] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1100 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494230 (owner: 10Marostegui) [14:54:36] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1089 and db1100 after changing index on logging table (duration: 00m 49s) [14:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:56:13] (03CR) 10Jforrester: "This should really be three patches, but eh." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: 10Esanders) [14:56:15] 10Operations, 10ops-eqiad, 10DC-Ops, 10Wikimedia-Logstash, 10User-herron: Decommission old eqiad logstash hardware hosts logstash100[456] - https://phabricator.wikimedia.org/T217556 (10herron) p:05Triage→03Normal [14:56:38] (03PS6) 10Jbond: Add ability to filter out auto restarts [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/493463 [14:57:40] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Socket timeout on wdqs.svc.eqiad.wmnet - https://phabricator.wikimedia.org/T217557 (10Gehel) [14:59:11] (03PS27) 10Jbond: Improve CI checks to ensure a basic catalogue compiles on all supported OS's [puppet] - 10https://gerrit.wikimedia.org/r/487882 (https://phabricator.wikimedia.org/T215275) [14:59:39] 10Operations, 10Wikimedia-Logstash, 10User-fgiunchedi, 10User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (10herron) [14:59:48] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Replace and expand Elasticsearch/Kafka storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 (10herron) 05Open→03Resolved a:03herron Service migration and OS upgrade wor... 
[14:59:57] 10Operations, 10Wikimedia-Logstash, 10User-fgiunchedi, 10User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (10herron) [15:00:03] 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Wikidata-Query-Service-Sprint: Socket timeout on wdqs.svc.eqiad.wmnet - https://phabricator.wikimedia.org/T217557 (10Gehel) [15:00:41] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review, 10User-herron: Migrate at least 3 existing Logstash inputs and associated producers to the new Kafka-logging pipeline, and remove the associated non-Kafka Logstash inputs - https://phabricator.wikimedia.org/T213899 (10herron) [15:01:08] (03CR) 10Ottomata: [C: 03+2] Remove legacy EventBus config settings. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494231 (owner: 10Ottomata) [15:01:12] (03PS2) 10Ottomata: Remove legacy EventBus config settings. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494231 [15:01:14] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1100 and db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494230 (owner: 10Marostegui) [15:06:18] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:06:57] (03PS1) 10Marostegui: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494232 [15:07:30] (03PS11) 10Herron: rsyslog: change udp_localhost_compat to define, add mwlog_compat [puppet] - 10https://gerrit.wikimedia.org/r/492390 [15:08:40] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:09:26] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494232 (owner: 10Marostegui) [15:10:35] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494232 (owner: 10Marostegui) [15:11:20] (03PS9) 10Elukey: hadoop: allow the configuration of ssl-(server|client).xml configs [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) [15:11:48] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1097:3314 after changing index on logging table (duration: 00m 51s) [15:11:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:13:33] (03CR) 10jenkins-bot: Remove legacy EventBus config settings. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494231 (owner: 10Ottomata) [15:13:39] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1097:3314 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494232 (owner: 10Marostegui) [15:14:08] (03PS1) 10Gilles: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494234 (https://phabricator.wikimedia.org/T209857) [15:15:38] (03CR) 10Gilles: [C: 03+2] Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494234 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [15:16:41] (03Merged) 10jenkins-bot: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494234 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [15:18:47] 10Operations, 10User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (10jcrespo) After some minimal changes, it starts correctly. ` e="2019-03-04T14:44:23Z" level=info msg="Starting mysqld_exporter (version=0.11.0+ds, branch=debian/sid, revision=0.11.0 e="2019-03-0... 
[15:19:32] 10Operations, 10DBA, 10User-fgiunchedi: Upgrade mysqld_exporter in production - https://phabricator.wikimedia.org/T161296 (10jcrespo) a:03jcrespo [15:20:14] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/14957/" [puppet] - 10https://gerrit.wikimedia.org/r/493693 (https://phabricator.wikimedia.org/T217412) (owner: 10Elukey) [15:20:45] (03PS1) 10Jcrespo: mariadb: Change the default arguments for buster [puppet] - 10https://gerrit.wikimedia.org/r/494236 (https://phabricator.wikimedia.org/T161296) [15:22:03] !log otto@deploy1001 Synchronized wmf-config/CommonSettings.php: no-op: Remove unused legacy EventBus config settings (duration: 00m 49s) [15:22:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:25:19] (03CR) 10jenkins-bot: Increase CPU benchmark sampling factor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494234 (https://phabricator.wikimedia.org/T209857) (owner: 10Gilles) [15:26:49] (03PS1) 10Hashar: Skip logging 'aux' messages from Docker [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/494238 [15:28:19] (03CR) 10jerkins-bot: [V: 04-1] Skip logging 'aux' messages from Docker [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/494238 (owner: 10Hashar) [15:31:39] 10Operations, 10Toolforge, 10Patch-For-Review, 10cloud-services-team (Kanban): Switch PHP 7.2 packages to an internal component - https://phabricator.wikimedia.org/T216712 (10jbond) p:05Triage→03Normal [15:32:25] (03PS1) 10Muehlenhoff: Initial Kerberos KDC/kadminserver profiles/roles [puppet] - 10https://gerrit.wikimedia.org/r/494242 [15:32:46] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10akosiaris) Per some IRC discussions we had in #wikimedia-serviceops, the code should be updated to be service-runner compatible as this wil... 
[15:33:22] (03CR) 10jerkins-bot: [V: 04-1] Initial Kerberos KDC/kadminserver profiles/roles [puppet] - 10https://gerrit.wikimedia.org/r/494242 (owner: 10Muehlenhoff) [15:34:24] (03PS2) 10Hashar: Skip logging 'aux' messages from Docker [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/494238 [15:37:01] (03CR) 10Effie Mouzeli: [C: 03+1] Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [15:37:10] (03PS6) 10Esanders: VE: Enable true section editing for mobile on labs & testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) [15:42:12] (03PS2) 10Muehlenhoff: Initial Kerberos KDC/kadminserver profiles/roles [puppet] - 10https://gerrit.wikimedia.org/r/494242 [15:42:42] (03PS16) 10Eevans: Initial configuration for session storage service [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) [15:43:07] (03CR) 10jerkins-bot: [V: 04-1] Initial Kerberos KDC/kadminserver profiles/roles [puppet] - 10https://gerrit.wikimedia.org/r/494242 (owner: 10Muehlenhoff) [15:44:12] !log Disabling puppet on sbc* and kubernetes* - T213194 [15:44:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:15] T213194: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 [15:44:39] (03CR) 10Eevans: [C: 03+1] "Ping; Other than the private.git material (which AFAIK could be reproduced from the `cassandra-ca-manager` manifest in the labs repo), is " [puppet] - 10https://gerrit.wikimedia.org/r/487885 (https://phabricator.wikimedia.org/T215883) (owner: 10Eevans) [15:45:07] (03PS5) 10Effie Mouzeli: Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [15:45:23] (03CR) 10Effie Mouzeli: [C: 03+2] 
Send traffic for citoid to kubernetes hosts as well [puppet] - 10https://gerrit.wikimedia.org/r/494200 (https://phabricator.wikimedia.org/T213194) (owner: 10Alexandros Kosiaris) [15:46:18] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Comments inline, plus a quick: Which software relies on $HOME existing? it's not a very valid assumption to make, perhaps it should be fix" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/494155 (owner: 10BryanDavis) [15:47:34] (03PS3) 10Muehlenhoff: Initial Kerberos KDC/kadminserver profiles/roles [puppet] - 10https://gerrit.wikimedia.org/r/494242 [15:51:51] (03PS1) 10Filippo Giunchedi: WIP: mirror udp2log data into the logging pipeline [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494254 (https://phabricator.wikimedia.org/T126989) [15:53:40] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10Tarrow) I am indeed already working on it. Just so you know the current state: we are already using blubber for the CI i.e. we have 'servi... [15:55:12] !log Running puppet on sbc* and kubernetes* - T213194 [15:55:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:55:15] T213194: Migrate citoid to kubernetes - https://phabricator.wikimedia.org/T213194 [15:55:23] (03PS2) 10Jcrespo: mariadb: Change the default arguments for buster [puppet] - 10https://gerrit.wikimedia.org/r/494236 (https://phabricator.wikimedia.org/T161296) [15:58:05] (03CR) 10Muehlenhoff: "JFTR, if you want a test case we can install mariadb on one of the buster test hosts." 
[puppet] - 10https://gerrit.wikimedia.org/r/494236 (https://phabricator.wikimedia.org/T161296) (owner: 10Jcrespo) [15:59:28] (03PS1) 10Dzahn: xhgui: fix class name in comments [puppet] - 10https://gerrit.wikimedia.org/r/494258 [16:00:34] 10Operations, 10Wikidata, 10Wikidata-Termbox-Hike, 10serviceops, and 4 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10akosiaris) >>! In T212189#4998182, @Tarrow wrote: > I am indeed already working on it. > > Just so you know the current state: we are alre... [16:09:50] 10Operations, 10Analytics, 10Analytics-Kanban, 10Wikimedia-Stream, and 2 others: Eventstreams build is broken - https://phabricator.wikimedia.org/T216184 (10Ottomata) [16:11:13] (03PS1) 10Milimetric: Disable all reportupdater jobs [puppet] - 10https://gerrit.wikimedia.org/r/494260 [16:13:22] !log jiji@cumin1001 conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes.* [16:13:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:24] (03PS1) 10Cwhite: prometheus: change escaped to character classes to work around systemd bug [puppet] - 10https://gerrit.wikimedia.org/r/494262 (https://phabricator.wikimedia.org/T214594) [16:13:32] !log jiji@cumin1001 conftool action : set/weight=1; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001 [16:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:44] !log jiji@cumin1001 conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001 [16:13:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:14] (03PS2) 10Elukey: Disable all reportupdater jobs [puppet] - 10https://gerrit.wikimedia.org/r/494260 (owner: 10Milimetric) [16:16:18] (03CR) 10Elukey: [C: 03+2] Disable all reportupdater jobs [puppet] - 10https://gerrit.wikimedia.org/r/494260 (owner: 10Milimetric) [16:18:16] !log installing ldb security 
updates [16:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:18:48] jouncebot: next [16:18:48] In 1 hour(s) and 41 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1800) [16:19:37] Anyone mind if I deploy a new config patch? Enabled on Beta and test wiki only. [16:19:45] (03PS1) 10Andrew Bogott: shinkengen: usemwopenstackclients [puppet] - 10https://gerrit.wikimedia.org/r/494264 (https://phabricator.wikimedia.org/T215847) [16:20:49] (03PS2) 10Andrew Bogott: shinkengen: use mwopenstackclients [puppet] - 10https://gerrit.wikimedia.org/r/494264 (https://phabricator.wikimedia.org/T215847) [16:20:53] (03CR) 10Jforrester: [C: 03+2] VE: Enable true section editing for mobile on labs & testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: 10Esanders) [16:21:34] (03CR) 10Andrew Bogott: [C: 03+2] shinkengen: use mwopenstackclients [puppet] - 10https://gerrit.wikimedia.org/r/494264 (https://phabricator.wikimedia.org/T215847) (owner: 10Andrew Bogott) [16:21:36] (03Merged) 10jenkins-bot: VE: Enable true section editing for mobile on labs & testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: 10Esanders) [16:21:44] (03CR) 10jenkins-bot: VE: Enable true section editing for mobile on labs & testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493482 (https://phabricator.wikimedia.org/T217365) (owner: 10Esanders) [16:22:24] (03PS1) 10Muehlenhoff: Add library hint for ldb [puppet] - 10https://gerrit.wikimedia.org/r/494265 [16:25:41] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 4 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10ayounsi) [16:26:46] (03PS2) 10Muehlenhoff: Add library hint for ldb [puppet] - 10https://gerrit.wikimedia.org/r/494265 [16:27:54] (03CR) 
10Muehlenhoff: [C: 03+2] Add library hint for ldb [puppet] - 10https://gerrit.wikimedia.org/r/494265 (owner: 10Muehlenhoff) [16:27:56] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part I (duration: 00m 51s) [16:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:27:58] T217365: Enable section editing feature flag on beta cluster - https://phabricator.wikimedia.org/T217365 [16:29:02] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10herron) Sadly this bit us again last week. Details outlined in https://wikitech.wikimedia.org/wiki/Incident_documentation/20190228... [16:29:17] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: T217365: Enable VE section editing on mobile for Beta Cluster, part II (duration: 00m 48s) [16:29:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:55] Deploy clear. 
[16:33:10] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 4 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10ayounsi) [16:33:22] !log enabling gtid replication on clouddb1002 [16:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:45] (03PS2) 10Kosta Harlan: GrowthExperiments: Enable help panel for user and user talk NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) [16:34:01] (03CR) 10Kosta Harlan: GrowthExperiments: Enable help panel for user and user talk NS (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [16:35:49] (03CR) 10Sbisson: [C: 03+1] GrowthExperiments: Enable help panel for user and user talk NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [16:36:06] (03CR) 10Volans: [C: 04-1] "It's in a much nicer state now, thanks! I've a few comments inline, feel free to ping me offline if you want more context in any of them." 
(0316 comments) [software/netbox-deploy] - 10https://gerrit.wikimedia.org/r/492007 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [16:43:42] !log Restart MySQL on db1112 for addshore [16:43:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:53] 10Operations, 10Analytics, 10hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (10Milimetric) p:05Triage→03High a:03RobH [16:47:19] PROBLEM - Disk space on contint1001 is CRITICAL: DISK CRITICAL - free space: / 2655 MB (5% inode=63%) [16:48:05] (03CR) 10Alexandros Kosiaris: [C: 03+1] "I haven't audited the files very thoroughly, but approach LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/493769 (https://phabricator.wikimedia.org/T193264) (owner: 10Bstorm) [16:48:06] 10Operations, 10Analytics, 10RESTBase, 10Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (10jbond) p:05Triage→03Normal [16:49:21] RECOVERY - Disk space on contint1001 is OK: DISK OK [16:52:01] !log jiji@cumin1001 conftool action : set/pooled=yes; selector: dc=eqiad,service=citoid,cluster=scb,name=kubernetes1001.* [16:52:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:16] !log contint1001: cleaned all Docker containers, compress /var/log/zuul/ files [16:54:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:20] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10jcrespo) Thanks @herron, I would like to know more information about what caused the extra logging, but I didn't find it on the in... 
[16:55:49] 10Operations, 10ops-eqiad, 10Data-Services, 10cloud-services-team (Kanban): labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 (10Bstorm) [16:55:59] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10jcrespo) a:05jcrespo→03None I am not working on this. [16:56:14] 10Operations, 10ops-eqiad: Update several hosts status in Netbox - https://phabricator.wikimedia.org/T217429 (10Volans) @Marostegui the different states and their transitions (when they are supposed to be updated) are described here: - https://wikitech.wikimedia.org/wiki/Server_Lifecycle#States - https://wikit... [16:58:26] 10Operations, 10ops-eqiad: Update several hosts status in Netbox - https://phabricator.wikimedia.org/T217429 (10Marostegui) a:03Marostegui Thanks! Will do! [16:59:19] (03PS5) 10Thcipriani: gerrit: Disable jgit gc [puppet] - 10https://gerrit.wikimedia.org/r/493963 (https://phabricator.wikimedia.org/T217497) (owner: 10Paladox) [16:59:51] (03CR) 10Thcipriani: [C: 03+1] gerrit: Disable jgit gc [puppet] - 10https://gerrit.wikimedia.org/r/493963 (https://phabricator.wikimedia.org/T217497) (owner: 10Paladox) [17:00:37] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10herron) >>! In T215611#4998584, @jcrespo wrote: > Thanks @herron, I would like to know more information about what caused the extr... 
[17:00:55] (03PS2) 10Elukey: Add timer to delete analytics EL unsanitized events after 90d [puppet] - 10https://gerrit.wikimedia.org/r/493687 (https://phabricator.wikimedia.org/T209503) (owner: 10Mforns) [17:05:00] (03CR) 10Elukey: [C: 03+2] Add timer to delete analytics EL unsanitized events after 90d [puppet] - 10https://gerrit.wikimedia.org/r/493687 (https://phabricator.wikimedia.org/T209503) (owner: 10Mforns) [17:06:04] (03CR) 10Kosta Harlan: [C: 03+1] Enable GrowthExperiments Homepage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) (owner: 10Sbisson) [17:07:03] 10Operations, 10ops-eqiad: Update several hosts status in Netbox - https://phabricator.wikimedia.org/T217429 (10Marostegui) 05Open→03Resolved [17:16:44] (03PS1) 10Tim Eulitz: Set up exceptions for rollback confirmation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494270 [17:17:54] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10Krinkle) 05Open→03Resolved a:03aaron >>! In T215611#4998372, @herron wrote: > Sadly this bit us again last week. Details out... [17:18:10] (03PS1) 10DLynch: Oversample metrics for mobile visualeditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494271 (https://phabricator.wikimedia.org/T212253) [17:20:49] (03PS2) 10Tim Eulitz: Set up exceptions for rollback confirmation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494270 (https://phabricator.wikimedia.org/T217436) [17:30:08] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Addshore) So read only errors are handled nicely in Cognate, well, the data will never end up being written, but users won't see errors. Failures... 
[17:35:50] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10CCicalese_WMF) [17:36:20] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Marostegui) >>! In T187960#4998807, @Addshore wrote: > > > From where I am sat setting $wgReadOnly for the few seconds would be the best thing f... [17:38:20] 10Operations, 10serviceops, 10Core Platform Team (PHP7 (TEC4)), 10Core Platform Team Kanban (Done with CPT), and 4 others: Set up a beta feature offering the use of PHP7 - https://phabricator.wikimedia.org/T213934 (10CCicalese_WMF) [17:38:55] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10Ottomata) Task is here {T217385} [17:39:30] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Addshore) >>! In T187960#4998831, @Marostegui wrote: >>>! In T187960#4998807, @Addshore wrote: >> >> >> From where I am sat setting $wgReadOnly... [17:40:12] 10Operations, 10ops-eqiad, 10Cognate, 10Growth-Team, and 6 others: Rack/cable/configure asw2-a-eqiad switch stack - https://phabricator.wikimedia.org/T187960 (10Marostegui) Will do! Thanks! 
[17:40:24] 10Operations, 10Analytics, 10Analytics-EventLogging, 10EventBus, and 3 others: RFC: Modern Event Platform - Choose Schema Tech - https://phabricator.wikimedia.org/T198256 (10CCicalese_WMF) [17:41:06] 10Operations, 10Availability (MediaWiki-MultiDC), 10Core Platform Team Kanban (Done with CPT), 10Performance-Team (Radar), 10Services (designing): Consider REST with SSL (HyperSwitch/Cassandra) for session storage - https://phabricator.wikimedia.org/T134811 (10CCicalese_WMF) [17:41:42] 10Operations, 10ops-codfw, 10Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Gehel) [17:42:36] 10Operations, 10TCB-Team, 10WMF-JobQueue, 10monitoring, and 3 others: Grafana alerting broken after upgrade to 5.0.0 - https://phabricator.wikimedia.org/T213506 (10CCicalese_WMF) [17:43:16] 10Operations, 10ops-codfw, 10Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Gehel) @Papaul while this is not an emergency, we're already missing a bunch of servers, so ping me if there is anything I can do to help move this forward. [17:45:03] (03PS1) 10Bstorm: dumps distribution: remove labstore1006 for failover [puppet] - 10https://gerrit.wikimedia.org/r/494273 (https://phabricator.wikimedia.org/T217473) [17:49:27] 10Operations, 10MediaWiki-Database, 10Performance-Team, 10Wikimedia-Logstash, and 4 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (10herron) >>! In T215611#4998766, @Krinkle wrote: > The screenshot from Grafana does indicate that starting around 19:40 nearly 90%... 
[17:49:31] (03PS1) 10Bstorm: dumps distrubution: reduce TTL for failover of dumps.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/494275 (https://phabricator.wikimedia.org/T217473) [17:50:38] (03CR) 10Bstorm: [C: 03+2] dumps distrubution: reduce TTL for failover of dumps.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/494275 (https://phabricator.wikimedia.org/T217473) (owner: 10Bstorm) [17:56:51] 10Operations, 10ops-codfw, 10Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Papaul) @Gehel Please power the server off if it is not off. Thanks. [17:57:04] (03CR) 10Bstorm: [C: 03+2] dumps distribution: remove labstore1006 for failover [puppet] - 10https://gerrit.wikimedia.org/r/494273 (https://phabricator.wikimedia.org/T217473) (owner: 10Bstorm) [17:58:14] (03PS1) 10Vgutierrez: pybal: switch lvs5002 BGP peering from cr1-eqsin to cr2-eqsin [puppet] - 10https://gerrit.wikimedia.org/r/494278 (https://phabricator.wikimedia.org/T213121) [17:59:49] 10Operations, 10ops-codfw, 10Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Gehel) I've just powered down elastic2038 via the mgmt interface. [18:00:04] gehel and onimisionipe: Dear deployers, time to do the Wikidata Query Service weekly deploy deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1800). 
[18:00:04] (03CR) 10Vgutierrez: [C: 03+1] "pcc shows NOOP in lvs500[13] and the expected change in lvs5002: https://puppet-compiler.wmflabs.org/compiler1002/14960/" [puppet] - 10https://gerrit.wikimedia.org/r/494278 (https://phabricator.wikimedia.org/T213121) (owner: 10Vgutierrez) [18:00:51] Here here [18:02:33] (03CR) 10Catrope: [C: 03+1] Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: 10Sbisson) [18:03:37] (03CR) 10Catrope: [C: 03+1] Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) (owner: 10Sbisson) [18:05:23] (03CR) 10Jcrespo: "> JFTR, if you want a test case we can install mariadb on one of the" [puppet] - 10https://gerrit.wikimedia.org/r/494236 (https://phabricator.wikimedia.org/T161296) (owner: 10Jcrespo) [18:07:21] (03PS3) 10Sbisson: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) [18:07:49] (03PS2) 10Sbisson: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) [18:14:17] 10Operations, 10ops-eqiad, 10ops-eqsin, 10netops, 10Patch-For-Review: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 (10ayounsi) Moving lvs5002 to cr2-eqsin 1/ push the following (no impact) `name=cr2-eqsin [edit routing-options rib inet6.0 static route 2001:df2:e500:ed1a::2:0/111] - ne... 
[18:16:38] !log push lvs5002 changes on cr2-eqsin - T213121 [18:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:41] T213121: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 [18:17:54] (03CR) 10Vgutierrez: [C: 03+2] pybal: switch lvs5002 BGP peering from cr1-eqsin to cr2-eqsin [puppet] - 10https://gerrit.wikimedia.org/r/494278 (https://phabricator.wikimedia.org/T213121) (owner: 10Vgutierrez) [18:18:08] (03PS2) 10Vgutierrez: pybal: switch lvs5002 BGP peering from cr1-eqsin to cr2-eqsin [puppet] - 10https://gerrit.wikimedia.org/r/494278 (https://phabricator.wikimedia.org/T213121) [18:19:45] 10Operations, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), 10Patch-For-Review, and 3 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey) The spike in GETs (generating several MBs... [18:20:31] (03PS5) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:21:27] (03CR) 10jerkins-bot: [V: 04-1] Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [18:23:52] !log restarting pybal on lvs5002 - T213121 [18:23:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:54] T213121: Deploy cr2-eqsin - https://phabricator.wikimedia.org/T213121 [18:25:12] !log disabled notifications for high load on labstore1007 while failed over T217473 [18:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:25:15] T217473: labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 [18:27:23] (03PS6) 10CRusnov: Add system timer for running ganeti->netbox sync. 
[puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:30:04] 10Operations: Please create talkpageconsultation@wikimedia.org email alias - https://phabricator.wikimedia.org/T217590 (10TBolliger) [18:39:05] RECOVERY - Host elastic2038 is UP: PING OK - Packet loss = 0%, RTA = 0.16 ms [18:39:16] (03PS7) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [18:44:02] (03CR) 10Volans: "Pre-review as requested offline." (035 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: 10CRusnov) [18:47:57] (03PS1) 10Bstorm: dumps distribution: swap do_acme for dumps server failover [puppet] - 10https://gerrit.wikimedia.org/r/494284 (https://phabricator.wikimedia.org/T217473) [18:52:03] (03PS1) 10Bstorm: dumps distribution: fail over to labstore1007 for web access [dns] - 10https://gerrit.wikimedia.org/r/494286 (https://phabricator.wikimedia.org/T217473) [18:56:42] (03CR) 10Bstorm: [C: 03+2] dumps distribution: fail over to labstore1007 for web access [dns] - 10https://gerrit.wikimedia.org/r/494286 (https://phabricator.wikimedia.org/T217473) (owner: 10Bstorm) [18:57:19] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=PATCH https://grafana.wikimedia.org/dashboard/db/kubernetes-api [18:57:21] (03CR) 10Bstorm: [C: 03+2] dumps distribution: swap do_acme for dumps server failover [puppet] - 10https://gerrit.wikimedia.org/r/494284 (https://phabricator.wikimedia.org/T217473) (owner: 10Bstorm) [19:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T1900) [19:00:05] Tpt, bmansurov, stephanebisson, ottomata, and kostajh: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. 
Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:10] here [19:00:11] hi [19:00:14] hello [19:00:14] here [19:00:30] hi [19:01:02] I can SWAT [19:01:38] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493753 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:02:39] (03Merged) 10jenkins-bot: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493753 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:02:55] (03CR) 10jenkins-bot: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493753 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:03:25] Tpt[m]: Your change is on mwdebug1002, can you test? [19:03:29] !log dumps.wikimedia.org is now running off labstore1007 T217473 [19:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:32] T217473: labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 [19:03:52] stephanebisson doing [19:03:59] (03PS3) 10Sbisson: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493235 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:06:51] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [19:08:00] (03PS1) 10Andrew Bogott: boostrap-vz: add initial buster manifest [puppet] - 10https://gerrit.wikimedia.org/r/494287 (https://phabricator.wikimedia.org/T216781) [19:08:22] bmansurov: Is your change testable on a debug server? [19:08:29] stephanebisson: yes [19:09:48] stephanebisson I did a typo in the configuration. Could you revert the change please? 
Sorry for the lost time [19:10:06] Tpt[m]: Sure, no problem [19:10:17] (03PS1) 10Sbisson: Revert "Enables maplink for geocoordinate Wikibase statements display on clients" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494288 [19:10:24] (03CR) 10Sbisson: [C: 03+2] Revert "Enables maplink for geocoordinate Wikibase statements display on clients" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494288 (owner: 10Sbisson) [19:11:29] (03Merged) 10jenkins-bot: Revert "Enables maplink for geocoordinate Wikibase statements display on clients" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494288 (owner: 10Sbisson) [19:11:51] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493235 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:12:33] 10Operations, 10ops-codfw, 10Discovery-Search (Current work): elastic2038 CPU/memory errors - https://phabricator.wikimedia.org/T217398 (10Papaul) The server warranty is until Nov 2021 and the firmware on the serve is old. If i call DELL they will ask me first to upgrade the firmware. Old version BIOS Versi... 
[19:12:38] (03PS1) 10Tpt: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494289 (https://phabricator.wikimedia.org/T217442) [19:12:57] (03Merged) 10jenkins-bot: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493235 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:13:27] (03CR) 10jenkins-bot: Revert "Enables maplink for geocoordinate Wikibase statements display on clients" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494288 (owner: 10Sbisson) [19:13:29] (03CR) 10jenkins-bot: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493235 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:13:38] bmansurov: Your change is on mwdebug1002 [19:13:44] stephanebisson: ok, testing [19:14:49] 10Operations, 10ops-eqiad, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 (10Bstorm) a:03Cmjohnson Ok, this host should now be reasonably safe to work on for checking for firmware issues by DC Ops. NOTE... [19:15:03] (03PS4) 10Sbisson: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) [19:15:23] stephanebisson I just made a corrected version of my wrong config change: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/494289 Is it possible to try it at the end of this SWAT window? [19:15:41] Tpt[m]: We'll do it at the end if there's time [19:15:49] thanks! [19:17:51] stephanebisson: I made a typo too. Can you not deploy that patch please? [19:18:05] bmansurov: Sure, no prob [19:18:19] (03PS1) 10Sbisson: Revert "Enable reader demographics survey" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494290 [19:18:23] stephanebisson: I'll submit a fix soon. 
Sorry for delaying others. I can wait until others are done. [19:18:27] (03CR) 10Sbisson: [C: 03+2] Revert "Enable reader demographics survey" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494290 (owner: 10Sbisson) [19:19:08] You're all making me nervous that I also have typos in my 3 upcoming patches [19:19:28] (03Merged) 10jenkins-bot: Revert "Enable reader demographics survey" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494290 (owner: 10Sbisson) [19:19:42] * kostajh just checked his patch for typos :P [19:19:43] (03PS5) 10Sbisson: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) [19:19:51] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: 10Sbisson) [19:20:49] (03Merged) 10jenkins-bot: Enable and configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: 10Sbisson) [19:23:48] (03PS1) 10Bmansurov: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494292 (https://phabricator.wikimedia.org/T217080) [19:23:50] (03PS1) 10BryanDavis: toolforge: Rewrite envelope From headers when relaying [puppet] - 10https://gerrit.wikimedia.org/r/494291 (https://phabricator.wikimedia.org/T213416) [19:24:37] (03PS1) 10Sbisson: Revert "Enable and configure the ORES goodfaith model on itwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494293 [19:24:44] (03CR) 10Sbisson: [C: 03+2] Revert "Enable and configure the ORES goodfaith model on itwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494293 (owner: 10Sbisson) [19:24:50] (03CR) 10jenkins-bot: Revert "Enable reader demographics survey" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494290 (owner: 10Sbisson) [19:24:52] (03CR) 10jenkins-bot: Enable and 
configure the ORES goodfaith model on itwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493749 (https://phabricator.wikimedia.org/T211032) (owner: 10Sbisson) [19:25:07] Doesn't work as expected. Revert [19:25:52] (03Merged) 10jenkins-bot: Revert "Enable and configure the ORES goodfaith model on itwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494293 (owner: 10Sbisson) [19:26:05] (03CR) 10jenkins-bot: Revert "Enable and configure the ORES goodfaith model on itwiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494293 (owner: 10Sbisson) [19:26:08] (03PS3) 10Sbisson: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) [19:26:25] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) (owner: 10Sbisson) [19:27:29] (03Merged) 10jenkins-bot: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) (owner: 10Sbisson) [19:28:05] (03CR) 10Andrew Bogott: [C: 03+2] boostrap-vz: add initial buster manifest [puppet] - 10https://gerrit.wikimedia.org/r/494287 (https://phabricator.wikimedia.org/T216781) (owner: 10Andrew Bogott) [19:29:30] (03PS1) 10Sbisson: Revert "Enable and configure ORES goodfaith and damaging rcfilters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494295 [19:29:38] (03CR) 10Sbisson: [C: 03+2] Revert "Enable and configure ORES goodfaith and damaging rcfilters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494295 (owner: 10Sbisson) [19:29:46] 0/4 [19:30:10] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates [19:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:41] (03Merged) 10jenkins-bot: Revert 
"Enable and configure ORES goodfaith and damaging rcfilters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494295 (owner: 10Sbisson) [19:30:53] (03PS2) 10Sbisson: Enable GrowthExperiments Homepage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) [19:31:13] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) (owner: 10Sbisson) [19:32:08] (03Merged) 10jenkins-bot: Enable GrowthExperiments Homepage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) (owner: 10Sbisson) [19:32:38] stephanebisson: my patch is ready whenever you are: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494292/ [19:35:15] !log sbisson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:494223|Enable GrowthExperiments Homepage on testwiki]] (duration: 00m 49s) [19:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:35:52] ottomata: Your change depends on https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/490418/ which appears to have been reverted... 
[19:36:15] it was revert reverted this morning [19:36:24] (03CR) 10BryanDavis: "> Which software relies on $HOME" [puppet] - 10https://gerrit.wikimedia.org/r/494155 (owner: 10BryanDavis) [19:36:27] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/494231/ [19:36:40] (03CR) 10jenkins-bot: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494222 (https://phabricator.wikimedia.org/T161628) (owner: 10Sbisson) [19:36:42] (03CR) 10jenkins-bot: Revert "Enable and configure ORES goodfaith and damaging rcfilters" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494295 (owner: 10Sbisson) [19:36:44] (03CR) 10jenkins-bot: Enable GrowthExperiments Homepage on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494223 (https://phabricator.wikimedia.org/T215982) (owner: 10Sbisson) [19:36:48] oh [19:37:00] oh sorry [19:37:12] that one listed there was revert reverted and deployed last week [19:37:32] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/492356/ [19:37:40] stephanebisson: ^ [19:38:06] ottomata: ok, if you're confident I am too [19:38:41] ottomata: So your change is supposed to be a no-op. Is there anything for you to test? [19:39:01] (03PS14) 10Sbisson: Add eventbus analytics logging alongside with kafka logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [19:39:08] in beta yes, in prod i'm not sure what to test other than that configs aren't broken and monolog api action still works [19:39:14] i'm watching throughput of the ApiAction topic in kafka [19:39:15] What does "no-op" stand for? 
[19:39:26] (03CR) 10Herron: [C: 03+1] toolforge: Rewrite envelope From headers when relaying [puppet] - 10https://gerrit.wikimedia.org/r/494291 (https://phabricator.wikimedia.org/T213416) (owner: 10BryanDavis) [19:39:33] Niharika: "no operation", does nothing [19:39:41] haha, good question, i think 'no operation', meaning that it has no intended functional affect [19:39:49] effect* [19:39:54] Gotcha. Thanks. :) [19:40:04] stephanebisson: if you deploy to mwdebug1002 [19:40:09] i'll make an api request there and ensure that that works [19:40:21] is there an easy way to see live error logs from just that host? (logstash I suppose?) [19:40:28] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [19:40:37] ottomata: We'll do that [19:40:47] ok [19:40:58] yeah, logstash with host: [19:41:17] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@20badb3]: Updater and Blazegraph group to report metric domain plus GUI updates (duration: 11m 07s) [19:41:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:27] great [19:41:33] 10Operations, 10Analytics, 10EventBus, 10Core Platform Team (Modern Event Platform (TEC2)), and 2 others: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019. - https://phabricator.wikimedia.org/T217359 (10jbond) p:05Triage→03Normal [19:41:38] (03Merged) 10jenkins-bot: Add eventbus analytics logging alongside with kafka logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [19:43:07] ottomata: We're on mwdebug1002 [19:43:18] ok [19:44:02] stephanebisson: as far as I can tell, all looks ok. 
[19:44:24] (03CR) 10Herron: [C: 03+1] "Looks good, will merge shortly" [puppet] - 10https://gerrit.wikimedia.org/r/494042 (https://phabricator.wikimedia.org/T136849) (owner: 10Krinkle) [19:44:29] Alright [19:44:38] 10Operations, 10ORES, 10Scoring-platform-team: [Discuss] ORES without celery - https://phabricator.wikimedia.org/T216838 (10jbond) p:05Triage→03Normal [19:45:13] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): cloudVPS: drain and rebuild labvirt1008 as cloudvirt1008 - https://phabricator.wikimedia.org/T216661 (10jbond) p:05Triage→03Normal [19:45:40] (03PS3) 10Sbisson: GrowthExperiments: Enable help panel for user and user talk NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [19:46:01] !log sbisson@deploy1001 Synchronized wmf-config/: SWAT: [[gerrit:490668|Add eventbus analytics logging alongside with kafka logging. (part 1)]] (duration: 00m 51s) [19:46:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:46:54] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (10Tbayer) >>! In T217438#4997575, @elukey wrote: > @Tbayer what kind of access is needed? I guess analytics-privatedata-users but just want to be sure :) Per https://wikitech.wikimedia.org/... [19:47:10] 10Operations, 10Maps, 10Reading-Infrastructure-Team-Backlog, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Create Debian packages for Node.js 8 upgrade for Maps - https://phabricator.wikimedia.org/T216521 (10jbond) p:05Triage→03Normal [19:47:17] !log sbisson@deploy1001 Synchronized tests/loggingTest.php: SWAT: [[gerrit:490668|Add eventbus analytics logging alongside with kafka logging. 
(part 2)]] (duration: 00m 48s) [19:47:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:31] 10Operations: Integrate Stretch 9.8 point update - https://phabricator.wikimedia.org/T216384 (10jbond) p:05Triage→03Normal [19:47:33] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [19:47:49] (03CR) 10jenkins-bot: Add eventbus analytics logging alongside with kafka logging. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/490668 (https://phabricator.wikimedia.org/T216163) (owner: 10Ppchelko) [19:47:57] kostajh: Your patch is next [19:48:13] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Move cloudvirt1018 to a 10G rack, connect 10G nics - https://phabricator.wikimedia.org/T217347 (10jbond) p:05Triage→03Normal [19:48:39] (03Merged) 10jenkins-bot: GrowthExperiments: Enable help panel for user and user talk NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [19:48:41] stephanebisson: Alright. I'm still here [19:48:53] (03CR) 10jenkins-bot: GrowthExperiments: Enable help panel for user and user talk NS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/493616 (https://phabricator.wikimedia.org/T215664) (owner: 10Kosta Harlan) [19:49:01] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): Move cloudvirt1012 to a 10G rack and connect 10g nics - https://phabricator.wikimedia.org/T217346 (10jbond) p:05Triage→03Normal [19:49:13] kostajh: Your patch is on mwdebug1002 [19:49:42] Tpt[m]: where is your new patch? 
[19:49:43] looking [19:50:09] stephanebisson https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/494289 [19:51:44] stephanebisson: LGTM [19:52:16] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494289 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:52:22] 10Operations, 10Discovery-Search: Change logstash plugin deployment to use deb packaging and deployment - https://phabricator.wikimedia.org/T217340 (10jbond) @Mathew.onipe this is probably obvious to most but could you provide information on the "current way"? [19:52:23] (03PS2) 10Sbisson: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494292 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:52:33] 10Operations, 10Discovery-Search: Change logstash plugin deployment to use deb packaging and deployment - https://phabricator.wikimedia.org/T217340 (10jbond) p:05Triage→03Normal [19:52:56] !log sbisson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:493616|GrowthExperiments: Enable help panel for user and user talk NS]] (duration: 00m 49s) [19:52:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:14] (03Merged) 10jenkins-bot: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494289 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:53:48] Tpt[m]: your change is on mwdebug1002 [19:54:11] thanks! 
[19:54:15] (03CR) 10Sbisson: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494292 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:54:50] 10Operations, 10ops-eqiad, 10DC-Ops: cloudvirt1015: update raid config and move to 10Gb - https://phabricator.wikimedia.org/T217140 (10jbond) p:05Triage→03Normal [19:55:49] (03Merged) 10jenkins-bot: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494292 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:56:47] bmansurov: your change is on mwdebug1002 [19:56:55] stephanebisson: ok, testing [19:57:11] oh wait.. [19:57:33] (03PS3) 10Herron: logstash: Remove filter for unused 'exception-json' channel [puppet] - 10https://gerrit.wikimedia.org/r/494042 (https://phabricator.wikimedia.org/T136849) (owner: 10Krinkle) [19:57:35] bmansurov: NOW it should be there [19:57:41] ok [19:58:36] stephanebisson: looks good, please ship it. [19:58:59] (03CR) 10jenkins-bot: Enables maplink for geocoordinate Wikibase statements display on clients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494289 (https://phabricator.wikimedia.org/T217442) (owner: 10Tpt) [19:59:01] (03CR) 10jenkins-bot: Enable reader demographics survey [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494292 (https://phabricator.wikimedia.org/T217080) (owner: 10Bmansurov) [19:59:09] 10Operations, 10Analytics, 10RESTBase, 10Traffic, and 2 others: Verify that hit/miss stats in WebRequest are correct - https://phabricator.wikimedia.org/T215987 (10BBlack) The raw data should be accurate. I had thought we were already sending the summarized `X-Cache-Status` to hadoop as well, but apparent... 
[19:59:13] (03CR) 10Herron: [C: 03+2] logstash: Remove filter for unused 'exception-json' channel [puppet] - 10https://gerrit.wikimedia.org/r/494042 (https://phabricator.wikimedia.org/T136849) (owner: 10Krinkle) [19:59:34] stephanebisson: everything looks fine to me about my change [20:00:05] Niharika and bd808: Time to snap out of that daydream and deploy Wikimania scholarships app update. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T2000). [20:00:09] !log sbisson@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:494292|Enable reader demographics survey]] (duration: 00m 49s) [20:00:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:01:11] !log sbisson@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:494289|Enables maplink for geocoordinate Wikibase statements display on clients]] (duration: 00m 48s) [20:01:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:01:52] And that concludes SWAT. One minute late despite 4 reverts! [20:01:52] 10Operations: Audit our puppet tree for uses of jessie-backports - https://phabricator.wikimedia.org/T216711 (10jbond) p:05Triage→03Normal [20:02:23] thank you! [20:02:23] stephanebisson: Thank you! :) [20:02:27] stephanebisson: thanks! 
[20:04:20] (03PS2) 10BryanDavis: pbuilder: Ensure ~pbuilder exists and is writable [puppet] - 10https://gerrit.wikimedia.org/r/494155 [20:09:52] " [20:14:59] (03PS1) 10Sbisson: Enable and configure ORES goodfaith and damaging rcfilters [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494301 (https://phabricator.wikimedia.org/T161628) [20:17:06] !log niharika29@deploy1001 Started deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link [20:17:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:08] !log niharika29@deploy1001 Finished deploy [scholarships/scholarships@2ef7463]: Deploy new version of app with new translations + fix broken privacy policy link (duration: 00m 02s) [20:17:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:09] (03PS2) 10Catrope: Enable and configure ORES goodfaith and damaging rcfilters on kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494301 (https://phabricator.wikimedia.org/T161628) (owner: 10Sbisson) [20:19:39] (03PS1) 10ArielGlenn: move labstore1006 to a role that does no rsync fetches for now [puppet] - 10https://gerrit.wikimedia.org/r/494303 (https://phabricator.wikimedia.org/T217473) [20:23:31] !log niharika29@deploy1001 Started deploy [scholarships/scholarships@2ef7463]: Remove outdated translations [20:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:33] !log niharika29@deploy1001 Finished deploy [scholarships/scholarships@2ef7463]: Remove outdated translations (duration: 00m 02s) [20:23:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:49] jouncebot: now [20:28:49] For the next 0 hour(s) and 31 minute(s): Wikimania scholarships app update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T2000) [20:28:51] jouncebot: next [20:28:52] In 0 hour(s) and 31 minute(s): Services – Parsoid 
/ Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T2100) [20:29:13] Reedy: I'm done with my deployment if you want to do something. [20:29:47] <3 [20:29:58] Gotta wait for jerkins to do his thing :) [20:32:17] 10Operations, 10LDAP-Access-Requests: Add bmansurov to archiva-deployers LDAP group - https://phabricator.wikimedia.org/T217447 (10jbond) p:05Triage→03Normal [20:32:57] 10Operations, 10SRE-Access-Requests: Requesting access to stat1007 for sukhe - https://phabricator.wikimedia.org/T217438 (10jbond) p:05Triage→03Normal [20:35:30] (03CR) 10Bstorm: [C: 03+1] move labstore1006 to a role that does no rsync fetches for now [puppet] - 10https://gerrit.wikimedia.org/r/494303 (https://phabricator.wikimedia.org/T217473) (owner: 10ArielGlenn) [20:35:50] (03PS1) 10Catrope: Reapply "Enable and configure the ORES goodfaith model on itwiki"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494306 [20:36:04] (03CR) 10ArielGlenn: [C: 03+2] move labstore1006 to a role that does no rsync fetches for now [puppet] - 10https://gerrit.wikimedia.org/r/494303 (https://phabricator.wikimedia.org/T217473) (owner: 10ArielGlenn) [20:39:27] RECOVERY - Long running screen/tmux on an-coord1001 is OK: OK: No SCREEN or tmux processes detected. [20:40:50] 10Operations, 10Mail: Please create talkpageconsultation@wikimedia.org email alias - https://phabricator.wikimedia.org/T217590 (10jbond) p:05Triage→03Normal a:03jbond [20:41:53] 10Operations, 10ops-eqiad, 10Data-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): labstore1006 spontaneous reboot - https://phabricator.wikimedia.org/T217473 (10ArielGlenn) The above changeset is live on labstore1006. I've commented out the crons that fetch from stat1007, and a run of puppe... 
[20:44:39] !log reedy@deploy1001 Synchronized php-1.33.0-wmf.19/extensions/Echo/: T217487 (duration: 00m 53s) [20:44:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:44:42] T217487: OAuth tests failing with Echo/UserMerge issues - https://phabricator.wikimedia.org/T217487 [20:45:55] (03CR) 10Framawiki: [C: 03+1] Test rules reference only existing wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494188 (https://phabricator.wikimedia.org/T217541) (owner: 10Urbanecm) [20:48:04] (03CR) 10Framawiki: [C: 03+1] Restrict local uploads on mediawiki.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494218 (https://phabricator.wikimedia.org/T217523) (owner: 10MarcoAurelio) [21:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: How many deployers does it take to do Services – Parsoid / Citoid / Mobileapps / ORES / … deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T2100). [21:01:31] jouncebot: many [21:09:25] (03CR) 10Volans: "clarifying my own comment" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: 10CRusnov) [21:14:40] !log re-enable bgp to AS13489 on cr2-eqiad [21:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:18:15] (03CR) 10CRusnov: "> Patch Set 3:" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/493138 (https://phabricator.wikimedia.org/T217072) (owner: 10CRusnov) [21:19:52] !log add bgp sessions to AS137236 on cr1-eqsin [21:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:21] mobrovac, akosiaris, mutante or someone ... do you know what is going on with T217604 .. 
scap aborts on trying to do a parsoid deploy /cc arlolra [21:23:21] T217604: Could not find config setting ` default_dsh_targets` - https://phabricator.wikimedia.org/T217604 [21:24:00] or thcipriani ^ [21:24:39] subbu: See also T217597 [21:24:39] T217597: Scap: server_groups regression - https://phabricator.wikimedia.org/T217597 [21:25:02] Which is very similar (or likely a dupe) [21:25:31] looks similar [21:25:52] I think it's basically the same, spaces not being trimmed [21:25:52] Just different targets [21:27:47] 10Operations, 10ops-eqiad, 10DC-Ops, 10Wikimedia-Logstash, and 2 others: Decommission old eqiad logstash hardware hosts logstash100[456] - https://phabricator.wikimedia.org/T217556 (10Peachey88) [21:28:48] anomie: if you have a minute: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/494308/ [21:29:07] wondering if we should abort our deploy plans or whether this is something that is easily fixable .. [21:29:51] thcipriani has already put a patch up [21:29:51] I don't think deploying scap is so quick though [21:33:13] ok. i guess that means try deploying tomorrow. [21:34:54] Unfortunately as tyler hasn't responded :( [21:35:03] ok. [21:35:32] i guess i'll go back to dealing with snow then. :) [21:39:15] (03CR) 10Muehlenhoff: Add system timer for running ganeti->netbox sync. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [21:39:37] ottomata: Reviewed. [21:40:14] (03CR) 10CRusnov: "> Patch Set 7:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) (owner: 10CRusnov) [21:40:18] subbu|away: dang, sorry I missed you. It is a regression. 
The workaround while the next version of scap is being built is to remove the spaces in your server_groups: i.e., 'one, two' -> 'one,two' [21:40:20] subbu|away you could strip the whitespace i think [21:43:03] anomie: another one for ya if you find another minute: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventBus/+/494305/ [21:43:21] (sorry my usual reviewers are mia :) ) [21:43:27] thank you! [21:50:18] ottomata: Reviewed. [21:50:30] thank you so much! [21:51:37] anomie: amended., [21:54:16] !log otto@deploy1001 scap-helm eventgate-analytics install -n staging -f eventgate-analytics-staging-values.yaml stable/eventgate-analytics [namespace: eventgate-analytics, clusters: staging] [21:54:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:56] !log otto@deploy1001 scap-helm eventgate-analytics upgrade staging stable/eventgate-analytics -f eventgate-analytics-staging-values.yaml [namespace: eventgate-analytics, clusters: staging] [21:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:57] !log otto@deploy1001 scap-helm eventgate-analytics cluster staging completed [21:54:57] !log otto@deploy1001 scap-helm eventgate-analytics finished [21:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:24] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-eqiad-values.yaml [namespace: eventgate-analytics, clusters: eqiad] [21:58:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:26] !log otto@deploy1001 scap-helm eventgate-analytics cluster eqiad completed [21:58:26] !log otto@deploy1001 scap-helm eventgate-analytics finished [21:58:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:27] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:35] !log otto@deploy1001 scap-helm eventgate-analytics upgrade production stable/eventgate-analytics -f eventgate-analytics-codfw-values.yaml [namespace: eventgate-analytics, clusters: codfw] [21:58:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:37] !log otto@deploy1001 scap-helm eventgate-analytics cluster codfw completed [21:58:37] !log otto@deploy1001 scap-helm eventgate-analytics finished [21:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:58:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:59:22] !log arlolra@deploy1001 Started deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 [21:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:00:04] bawolff and Reedy: It is that lovely time of the day again! You are hereby commanded to deploy Weekly Security deployment window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190304T2200). [22:05:57] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@bdc9e66]: Updating Parsoid to 1660395 (duration: 06m 34s) [22:05:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:09:38] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [22:10:48] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [22:15:05] !log Updated Parsoid to 1660395 (T214099, T202905) [22:15:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:15:09] T214099: Stress test Parsoid's HTTP API - https://phabricator.wikimedia.org/T214099 [22:15:10] T202905: Outreach-17 Project: Add a new Linter Category: Links-in-Links - https://phabricator.wikimedia.org/T202905 [22:23:41] (03PS8) 10CRusnov: Add system timer for running ganeti->netbox sync. [puppet] - 10https://gerrit.wikimedia.org/r/493774 (https://phabricator.wikimedia.org/T215229) [22:26:50] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [22:35:59] (03PS2) 10Bstorm: osmdb: stage the roles and profiles for virtualizing the servers [puppet] - 10https://gerrit.wikimedia.org/r/493769 (https://phabricator.wikimedia.org/T193264) [23:10:38] (03CR) 10Bartosz Dziewoński: [C: 03+1] Oversample metrics for mobile visualeditor [mediawiki-config] - 10https://gerrit.wikimedia.org/r/494271 (https://phabricator.wikimedia.org/T212253) (owner: 10DLynch) [23:57:19] (03CR) 10Bstorm: [C: 03+2] osmdb: stage the roles and profiles for virtualizing the servers [puppet] - 10https://gerrit.wikimedia.org/r/493769 (https://phabricator.wikimedia.org/T193264) (owner: 10Bstorm)