[00:06:12] (PS1) RobH: Revert "icinga1001 production dns entries" [dns] - https://gerrit.wikimedia.org/r/454437
[00:06:27] (CR) RobH: [C: 2] Revert "icinga1001 production dns entries" [dns] - https://gerrit.wikimedia.org/r/454437 (owner: RobH)
[00:10:26] (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/449763 (owner: Dzahn)
[00:12:47] (PS1) RobH: icinga1001.wikimedia.org prod dns [dns] - https://gerrit.wikimedia.org/r/454438 (https://phabricator.wikimedia.org/T201344)
[00:13:22] (CR) RobH: [C: 2] icinga1001.wikimedia.org prod dns [dns] - https://gerrit.wikimedia.org/r/454438 (https://phabricator.wikimedia.org/T201344) (owner: RobH)
[00:17:02] Operations, wikidiff2: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (Legoktm)
[00:24:09] Operations, SRE-Access-Requests, wikidiff2: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (Legoktm)
[00:25:25] Operations, ops-eqiad, monitoring: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (RobH)
[00:25:30] Operations, ops-eqiad, monitoring: rack/setup/install icinga1001.wikimedia.org - https://phabricator.wikimedia.org/T201344 (RobH) p:High>Normal
[00:26:06] Operations, SRE-Access-Requests, wikidiff2: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (Legoktm)
[00:26:08] Operations, wikidiff2: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (Legoktm)
[00:26:48] Operations, SRE-Access-Requests, wikidiff2, User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (Legoktm)
[00:34:21] (PS1) Legoktm: Split
releasers-wikidiff2 from releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473)
[00:34:48] (PS1) RobH: pushing scandium into parsoid test service [puppet] - https://gerrit.wikimedia.org/r/454443 (https://phabricator.wikimedia.org/T201366)
[00:35:27] (CR) jerkins-bot: [V: -1] pushing scandium into parsoid test service [puppet] - https://gerrit.wikimedia.org/r/454443 (https://phabricator.wikimedia.org/T201366) (owner: RobH)
[00:37:39] (CR) Legoktm: "I looked in git log and saw that the last new group was given gid: 803, so I gave this one 804. Is that the correct thing to do?" [puppet] - https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: Legoktm)
[00:38:06] (CR) RobH: "So, this fails due to the fact that the roles need to be parsed down into a single role, not apply multiple roles" [puppet] - https://gerrit.wikimedia.org/r/454443 (https://phabricator.wikimedia.org/T201366) (owner: RobH)
[00:38:37] Operations, Parsoid, Patch-For-Review: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (RobH) So the change to apply things the same as ruthnium won't work quite right, since we applied more than one role to that system.
[00:55:30] Operations: php7.2-cli in thirdparty/php72 isn't installable due to libargon2-1 dependency - https://phabricator.wikimedia.org/T202478 (Legoktm)
[00:55:49] Operations: php7.2-cli in thirdparty/php72 isn't installable due to libargon2-1 dependency - https://phabricator.wikimedia.org/T202478 (Legoktm)
[01:05:58] (PS3) Dzahn: puppetdb: add postgres backup to bacula [puppet] - https://gerrit.wikimedia.org/r/449523
[01:06:52] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/12155/puppetdb1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/449523 (owner: Dzahn)
[01:10:47] PROBLEM - Check systemd state on puppetdb2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:10:52] !log puppetdb2001 - dumping postgres db to local backup path
[01:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:37] PROBLEM - Check systemd state on puppetdb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[01:12:47] RECOVERY - Check systemd state on puppetdb1001 is OK: OK - running: The system is fully operational
[01:13:58] that was because it added the bacula-fd service
[01:24:38] RECOVERY - Check systemd state on puppetdb2001 is OK: OK - running: The system is fully operational
[01:27:24] (PS1) Dzahn: Revert "puppetdb: add postgres backup to bacula" [puppet] - https://gerrit.wikimedia.org/r/454445
[01:29:22] (CR) Dzahn: [C: 2] "reverting it because while testing it on puppetdb2001 i got: pg_dump: Error message from server: ERROR: canceling statement due to confli" [puppet] - https://gerrit.wikimedia.org/r/454445 (owner: Dzahn)
[01:31:39] (CR) Dzahn: [C: 2] "pg_dump: Dumping the contents of table "reports" failed: PQgetResult() failed."
[puppet] - https://gerrit.wikimedia.org/r/454445 (owner: Dzahn)
[01:36:13] (PS6) Dzahn: Remove Jessie-specific Puppet code from Mediawiki math class [puppet] - https://gerrit.wikimedia.org/r/449672 (owner: Muehlenhoff)
[01:59:25] (PS1) Dzahn: contint: don't include mw::packages, use contint::packages::mediawiki [puppet] - https://gerrit.wikimedia.org/r/454447
[02:01:06] (CR) Dzahn: "I made that patch to go before this to unblock this: please take a look:" [puppet] - https://gerrit.wikimedia.org/r/449672 (owner: Muehlenhoff)
[02:02:19] (CR) Dzahn: [C: -1] "formal -1 because it needs to wait for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454447/ or similar" [puppet] - https://gerrit.wikimedia.org/r/449672 (owner: Muehlenhoff)
[02:02:44] (CR) Dzahn: "here to unblock https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/449672/" [puppet] - https://gerrit.wikimedia.org/r/454447 (owner: Dzahn)
[02:04:34] Operations, ops-eqiad, Cloud-VPS, cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (Andrew) @chasemp two questions: 1) was there a reason we requested these with 10G? (Or, did we?) 2) Is it important that these be in a particular rac...
[02:05:03] (CR) Dzahn: contint: don't include mw::packages, use contint::packages::mediawiki (3 comments) [puppet] - https://gerrit.wikimedia.org/r/454447 (owner: Dzahn)
[02:07:11] (PS2) Dzahn: contint: don't include mw::packages, use contint::packages::mediawiki [puppet] - https://gerrit.wikimedia.org/r/454447
[02:09:03] (PS1) Dzahn: authdns::server: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/454448
[02:10:22] (CR) Dzahn: "@jcrespo it won't do anything, it's just the refactoring trick, it's been replaced on almost everything, i just left this one" [puppet] - https://gerrit.wikimedia.org/r/450314 (owner: Dzahn)
[02:11:34] (CR) Dzahn: "possible to compile for integration-slave-jessie ?" [puppet] - https://gerrit.wikimedia.org/r/454447 (owner: Dzahn)
[02:14:22] (PS3) Dzahn: Inline mediawiki::packages::math [puppet] - https://gerrit.wikimedia.org/r/449673 (owner: Muehlenhoff)
[02:16:43] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/12156/bohrium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/453546 (owner: Dzahn)
[02:18:08] (CR) Dzahn: "part of trying to deprecate the apache module everywhere.. you may click the topic branch to see other examples.
compiled on bohrium" [puppet] - https://gerrit.wikimedia.org/r/453546 (owner: Dzahn)
[02:22:31] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.16) (duration: 09m 06s)
[02:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:23:20] (PS2) Dzahn: piwik: convert apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/453546
[02:28:37] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/12157/bohrium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/453546 (owner: Dzahn)
[02:29:39] (CR) Dzahn: [C: 1] Inline mediawiki::packages::math [puppet] - https://gerrit.wikimedia.org/r/449673 (owner: Muehlenhoff)
[02:30:45] (PS4) Dzahn: graphite: delete duplicate role(graphite::primary) [puppet] - https://gerrit.wikimedia.org/r/449763
[02:37:01] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/12158/" [puppet] - https://gerrit.wikimedia.org/r/449673 (owner: Muehlenhoff)
[02:37:16] (CR) Dzahn: [C: 2] graphite: delete duplicate role(graphite::primary) [puppet] - https://gerrit.wikimedia.org/r/449763 (owner: Dzahn)
[02:41:38] (CR) Dzahn: [C: 2] "checked puppet on graphite1001,2001,2003,1004,1003 - nothing happened whatsoever" [puppet] - https://gerrit.wikimedia.org/r/449763 (owner: Dzahn)
[02:43:28] (CR) Dzahn: "fixing packages.pp:31 ERROR trailing whitespace found (trailing_whitespace)" [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[02:46:15] (PS2) Dzahn: Inline mediawiki::packages::tex [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[02:46:50] (CR) jerkins-bot: [V: -1] Inline mediawiki::packages::tex [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[02:50:05] (CR) Dzahn: [C: 1] "+1 but needs rebasing after the other changes, i'll fix the dependency thing tomorrow" [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[02:53:28] (CR)
Dzahn: "what distro exactly is quarry running on?" [puppet] - https://gerrit.wikimedia.org/r/454217 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[02:56:09] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.18) (duration: 15m 36s)
[02:56:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:57:30] (CR) Dzahn: [C: 1] "Thank you for this change @Zhuyifei1999 :) It helps on the ticket i made, it's appreciated." [puppet] - https://gerrit.wikimedia.org/r/454217 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[03:06:28] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0
[03:06:28] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[03:06:38] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Wed Aug 22 03:06:38 UTC 2018 (duration 10m 29s)
[03:06:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:00:15] (PS1) Marostegui: db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469
[05:47:46] (CR) Jcrespo: "Will deploy together with https://gerrit.wikimedia.org/r/450228 by applying puppet slowly- firewall changes even noops are complicated)" [puppet] - https://gerrit.wikimedia.org/r/450314 (owner: Dzahn)
[05:53:08] !log restarting heartbeat on es1011
[05:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:28] !log Start topology changes for es2 failover - T202364
[05:53:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:53:34] T202364: Switchover es2 master (es1011) to es1015 - https://phabricator.wikimedia.org/T202364
[05:56:56] (PS2) Jcrespo: mariadb: Promote es1015 to be the new es2 master instead of es1011 [puppet] -
https://gerrit.wikimedia.org/r/454214 (https://phabricator.wikimedia.org/T202364)
[05:58:17] We are going to take over deploy1001 and puppet for the es2 failover starting in 2 minutes
[06:00:01] (CR) Jcrespo: [C: 2] mariadb: Promote es1015 to be the new es2 master instead of es1011 [puppet] - https://gerrit.wikimedia.org/r/454214 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:00:05] marostegui and jynus: It is that lovely time of the day again! You are hereby commanded to deploy Database maintenance. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T0600).
[06:00:10] \o/
[06:00:18] (PS3) Jcrespo: mariadb: Depool cluster24 (es2) from new writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454210 (https://phabricator.wikimedia.org/T202364)
[06:01:36] (CR) Jcrespo: [C: 2] mariadb: Depool cluster24 (es2) from new writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454210 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:02:50] (Merged) jenkins-bot: mariadb: Depool cluster24 (es2) from new writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454210 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:03:44] (CR) Marostegui: [C: 1] mariadb: Depool cluster24 (es2) from new writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454210 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:04:22] * elukey looks for marostegui's alter tables to start his morning
[06:04:48] We are failing over es2 first!
[06:04:54] :)
[06:06:51] (PS3) Jcrespo: mariadb: Promote es1015 to es2 master and repool es2 for writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454211 (https://phabricator.wikimedia.org/T202364)
[06:07:30] (CR) Marostegui: [C: 1] mariadb: Promote es1015 to es2 master and repool es2 for writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454211 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:08:24] (CR) Jcrespo: [C: 2] mariadb: Promote es1015 to es2 master and repool es2 for writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454211 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:09:38] (Merged) jenkins-bot: mariadb: Promote es1015 to es2 master and repool es2 for writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454211 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:09:46] I will merge without --force
[06:09:49] *deploy
[06:09:51] ok
[06:09:58] as theoricly this should need no rush
[06:10:03] *theretically
[06:10:05] ag
[06:10:09] I get it :)
[06:10:21] do I do it or you do?
[06:10:27] you can go
[06:10:31] I will monitor
[06:10:44] I will then do the critical parts
[06:10:57] go for it!
[06:11:10] !log depool es2 to failover master from es1011 to es1015
[06:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:11:53] check errors and writes to es1011
[06:11:58] will do
[06:12:00] (when it finishes)
[06:12:12] (CR) jenkins-bot: mariadb: Depool cluster24 (es2) from new writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454210 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:12:14] (CR) jenkins-bot: mariadb: Promote es1015 to es2 master and repool es2 for writes [mediawiki-config] - https://gerrit.wikimedia.org/r/454211 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:12:31] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es2 (duration: 01m 05s)
[06:12:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:12:43] activity looks gone on es1011
[06:13:42] select max(blob_id) FROM blobs_cluster24; is frozen
[06:13:47] is es3 ok in load?
[06:13:56] checking
[06:14:12] we have time this time :-)
[06:14:28] Obviously load increased
[06:14:31] But so far looks fine
[06:14:46] I see no db errors except mediawiki and labstestwiki
[06:14:53] same
[06:15:08] looks like the depooling is working fine
[06:15:53] and editing works normally?
[06:16:30] yeah
[06:16:30] it does
[06:16:42] ok, so let's go to the failover phase
[06:16:51] +1
[06:17:46] running the switchover script now
[06:17:52] ok
[06:18:07] SUCCESS: Master switch completed successfully
[06:18:16] Servers sync at master: es1011-bin.002581:62175534 slave: es1015-bin.002563:203776011
[06:18:29] current status: original master read_only: 1 / original slave read_only: 0
[06:18:32] es1015 has read only OFF
[06:18:32] yeah
[06:18:48] heartbeat is running
[06:18:49] is replication direction ok?
[06:19:04] yes
[06:19:09] es1011 replicates from es1015 now
[06:19:18] and es1015 has no master
[06:19:22] will deploy now?
[06:19:25] or see errors?
[06:19:41] (replication, mediawiki?)
[06:19:42] everything looks goo
[06:19:43] d
[06:19:49] deploying, then
[06:19:57] +1
[06:20:53] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Switchover es2 master eqiad from es1011 to es1015 (duration: 00m 55s)
[06:20:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:21:08] select max(blob_id) FROM blobs_cluster24; no longer frozen
[06:21:20] can we edit ok? errors, etc?
[06:21:38] yeah
[06:21:39] I can
[06:21:52] so far so good
[06:22:11] waiting for the load to reduce on es3
[06:22:59] load coming down on es3
[06:23:09] (CR) WMDE-Fisch: [C: 1] Split releasers-wikidiff2 from releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: Legoktm)
[06:23:55] (PS2) Jcrespo: mariadb: Point es2-master to es1015 after master switchover [dns] - https://gerrit.wikimedia.org/r/454219 (https://phabricator.wikimedia.org/T202364)
[06:24:41] tendril/dbtree look ok now
[06:24:44] I will check icinga
[06:25:23] only complaining about puppet was disabled, will go away in a bit
[06:25:43] see something weird or we should call it done?
[06:25:49] Writes are back to normal rate in es2
[06:26:06] I which check edits performance just in case
[06:26:09] *will
[06:26:28] I checked edit counts and they look fine
[06:26:29] e.g. in case there were higher failures/non deterministic failures
[06:26:59] save timing seems up, but starting 2 days ago
[06:28:40] I am going to log the finish of the sync operations
[06:28:48] good
[06:29:11] we can do the cleanup without blocking the deployments
[06:29:15] indeed
[06:29:45] this may have been the easiest switchover we have done
[06:29:54] indeed
[06:30:01] Very nice work!
[06:30:29] !log finished es2 switchover T202364
[06:30:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:36] T202364: Switchover es2 master (es1011) to es1015 - https://phabricator.wikimedia.org/T202364
[06:30:52] you are free to continue deploying mediawiki and puppet
[06:30:59] PROBLEM - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs]
[06:31:07] we have some low prority stuff yet to do
[06:31:21] (CR) Jcrespo: [C: 2] mariadb: Point es2-master to es1015 after master switchover [dns] - https://gerrit.wikimedia.org/r/454219 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[06:45:30] ACKNOWLEDGEMENT - puppet last run on labvirt1017 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 15 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apparmor.d/abstractions/ssl_certs] Arturo Borrero Gonzalez Looking
[06:46:40] herron: you around?
[06:47:38] (CR) Muehlenhoff: "@legoktm: Looks good, those are simply incremented with each group." [puppet] - https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: Legoktm)
[06:48:30] (CR) Muehlenhoff: [C: 1] Split releasers-wikidiff2 from releasers-mediawiki [puppet] - https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: Legoktm)
[06:53:00] (CR) Muehlenhoff: [C: 1] "Looks good, but let's wait until Antoine is back. It's fine to keep the imagemagick::install class as it is, it's generic enough to be use" [puppet] - https://gerrit.wikimedia.org/r/454447 (owner: Dzahn)
[06:53:00] in labvirt1017 `Puppet is disabled.
temporarily disabled during puppetmaster apache updates --herron`
[06:55:21] labvirt1017 puppet-agent[177719]: (/Stage[main]/Sslcert/File[/etc/apparmor.d/abstractions/ssl_certs]) Could not evaluate: Could not retrieve file metadata for puppet:///modules/sslcert/apparmor/ssl_certs: end of file reached
[06:57:46] !log imported libargon-2.1 for thirdparty/php7.2 repo (T202478)
[06:57:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:57:53] T202478: php7.2-cli in thirdparty/php72 isn't installable due to libargon2-1 dependency - https://phabricator.wikimedia.org/T202478
[06:57:55] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) I am back from vacations! I am still seeing some https traffic to lists.w.o and text-lb from stat1005 though, so I think that I...
[06:58:04] RECOVERY - puppet last run on labvirt1017 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:03:03] Operations: php7.2-cli in thirdparty/php72 isn't installable due to libargon2-1 dependency - https://phabricator.wikimedia.org/T202478 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff Imported to the thirdparty/php72 repository.
[07:06:56] !log installing apache update on deploy1001/planet1001
[07:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:28] (PS4) Giuseppe Lavagetto: PHP: create module for modern Debian-based distributions [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140)
[07:07:30] (PS3) Giuseppe Lavagetto: mediawiki: move php to a profile, use the php class [puppet] - https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140)
[07:07:32] (PS1) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[07:08:05] (CR) Giuseppe Lavagetto: "Replies inline" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:08:55] (CR) jerkins-bot: [V: -1] php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:09:14] Operations: Integrate Stretch 9.5 point release - https://phabricator.wikimedia.org/T199670 (MoritzMuehlenhoff) These updates have been fully deployed: ``` dpkg ghostscript systemd apache2 ```
[07:09:30] <_joe_> rubocop we don't really like each other, do we
[07:13:05] Operations, Puppet, Wikidata, Wikidata-Query-Service, cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (Smalyshev) a:Smalyshev I don't see why ldfclient can't use any apache role, so I'll try...
[07:16:41] (PS2) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[07:16:43] (CR) Zhuyifei1999: "Quarry is currently on Trusty and Jessie, but we are going to make a new install on Stretch, and delete the old instances." [puppet] - https://gerrit.wikimedia.org/r/454217 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[07:17:45] Operations, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (Jalexander)
[07:18:04] (CR) jerkins-bot: [V: -1] php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:19:02] (PS3) Jcrespo: mariadb::packages: Unversion server and client [puppet] - https://gerrit.wikimedia.org/r/454217 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[07:20:00] (PS3) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[07:21:12] (CR) jerkins-bot: [V: -1] php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:21:14] (CR) Jcrespo: [C: 2] mariadb::packages: Unversion server and client [puppet] - https://gerrit.wikimedia.org/r/454217 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[07:23:42] (CR) Muehlenhoff: [C: 1] PHP: create module for modern Debian-based distributions [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:27:05] (PS4) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] -
https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[07:29:41] (PS4) Giuseppe Lavagetto: mediawiki: move php to a profile, use the php class [puppet] - https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140)
[07:29:43] (PS5) Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140)
[07:31:28] (PS1) Elukey: profile::statistics::cruncher|private: remove unused bacula settings [puppet] - https://gerrit.wikimedia.org/r/454480 (https://phabricator.wikimedia.org/T201165)
[07:32:41] (CR) Giuseppe Lavagetto: [C: 1] "Nothing besides profiles should include things from the mediawiki module." [puppet] - https://gerrit.wikimedia.org/r/454447 (owner: Dzahn)
[07:41:26] (PS2) Marostegui: db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469
[07:44:31] (CR) Jcrespo: [C: 1] db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469 (owner: Marostegui)
[07:44:48] (CR) Marostegui: [C: 2] db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469 (owner: Marostegui)
[07:45:03] (CR) Jcrespo: [C: 1] "There is also https://gerrit.wikimedia.org/r/451844" [mediawiki-config] - https://gerrit.wikimedia.org/r/454469 (owner: Marostegui)
[07:46:03] (Merged) jenkins-bot: db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469 (owner: Marostegui)
[07:47:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Change x1 eqiad weights to give more weight to the most powerful server (duration: 00m 55s)
[07:47:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:52:00] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is CRITICAL: cluster=cache_text site=eqiad
https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:56:25] (PS1) Zhuyifei1999: quarry::database: Use mariadb module instead of mysql module [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205)
[07:56:40] (CR) Muehlenhoff: php: add service management for php-fpm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[07:57:03] (CR) jerkins-bot: [V: -1] quarry::database: Use mariadb module instead of mysql module [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[07:57:20] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[07:59:27] (CR) jenkins-bot: db-eqiad.php: Change weights on x1 [mediawiki-config] - https://gerrit.wikimedia.org/r/454469 (owner: Marostegui)
[08:01:34] (PS2) Zhuyifei1999: quarry::database: Use mariadb module instead of mysql module [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205)
[08:02:10] (CR) jerkins-bot: [V: -1] quarry::database: Use mariadb module instead of mysql module [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[08:02:25] (CR) Volans: [C: 2] cumin: add alias consistency checker (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454077 (owner: Volans)
[08:02:36] (CR) Jcrespo: quarry::database: Use mariadb module instead of mysql module (3 comments) [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[08:06:14] (PS3) Volans: cumin: add alias consistency checker [puppet] -
https://gerrit.wikimedia.org/r/454077
[08:08:51] (CR) Volans: [C: 2] cumin: add alias consistency checker [puppet] - https://gerrit.wikimedia.org/r/454077 (owner: Volans)
[08:15:05] (CR) MarcoAurelio: Configure gendered namespaces for pl.wiktionary (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: MarcoAurelio)
[08:20:30] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[08:22:29] <_joe_> this is from a known problem ^^
[08:24:41] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[08:33:14] (PS1) Jcrespo: mariadb: Depool es1011 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/454485 (https://phabricator.wikimedia.org/T202364)
[08:33:52] (CR) Marostegui: [C: 1] mariadb: Depool es1011 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/454485 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[08:33:59] Operations, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (sguebo_WMF) Hi @RobH, I hereby confirm that I am the one who generated the key below: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDcAX...
[08:34:50] jouncebot: now
[08:34:50] No deployments scheduled for the next 2 hour(s) and 25 minute(s)
[08:34:56] jouncebot: next
[08:34:56] In 2 hour(s) and 25 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1100)
[08:35:08] _joe_: i guess it relates to https://phabricator.wikimedia.org/T202483 ?
[08:35:17] im just about to patch it
[08:35:31] <_joe_> addshore: nope
[08:35:39] interesting *looks at logstash*
[08:36:00] <_joe_> it's from the Translate/stringmangler/StringMatcher.php issue
[08:36:21] (PS2) Ema: Revert "depool eqiad for front-edge traffic" [dns] - https://gerrit.wikimedia.org/r/454420 (owner: BBlack)
[08:37:28] (CR) Zhuyifei1999: quarry::database: Use mariadb module instead of mysql module (3 comments) [puppet] - https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: Zhuyifei1999)
[08:37:41] _joe_: the spike looks like it is still the wikidata / mcr / blobstore thing to me?
https://logstash.wikimedia.org/goto/ab9a7b94c2e6229a9b7414da945266e1
[08:37:48] unless im miss reading sometihng
[08:38:28] (CR) Ema: [C: 2] Revert "depool eqiad for front-edge traffic" [dns] - https://gerrit.wikimedia.org/r/454420 (owner: BBlack)
[08:38:46] <_joe_> addshore: correct, the preceding one was due to the translate exception
[08:38:52] <_joe_> the one at 8:05
[08:38:53] gotcha
[08:39:20] (PS1) Jcrespo: install_server: Allow manual reimage of es1011 [puppet] - https://gerrit.wikimedia.org/r/454487 (https://phabricator.wikimedia.org/T202364)
[08:39:22] !log repool eqiad for front-edge traffic
[08:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:40:04] !log installing clamav security updates on mendelevium (bundling libmspack security update along)
[08:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:44] (CR) Jcrespo: [C: 2] mariadb: Depool es1011 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/454485 (https://phabricator.wikimedia.org/T202364) (owner: Jcrespo)
[08:43:23] (CR) ArielGlenn: [C: 1] "Yay! This looks fine. BTW hashar is back."
[puppet] - 10https://gerrit.wikimedia.org/r/454447 (owner: 10Dzahn) [08:44:02] (03Merged) 10jenkins-bot: mariadb: Depool es1011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454485 (https://phabricator.wikimedia.org/T202364) (owner: 10Jcrespo) [08:45:56] (03CR) 10jenkins-bot: mariadb: Depool es1011 for reimage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454485 (https://phabricator.wikimedia.org/T202364) (owner: 10Jcrespo) [08:46:44] !log Stop replication on x1 codfw master (db2034) for data fixes - T104459 T201603 [08:46:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:50] T104459: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 [08:46:51] T201603: db2069 storage crash - https://phabricator.wikimedia.org/T201603 [08:50:23] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool es1011 (duration: 00m 56s) [08:50:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:32] o/ jynus. I just saw you deployed something in mediawiki-config. Any chance I can have the next ~15-30 mins to deploy & test a fix for an UBN? [08:57:41] yes, I will not deploy anything for 3 hours [08:57:48] jynus: great, thanks! [09:00:11] (03CR) 10Jcrespo: [C: 032] install_server: Allow manual reimage of es1011 [puppet] - 10https://gerrit.wikimedia.org/r/454487 (https://phabricator.wikimedia.org/T202364) (owner: 10Jcrespo) [09:00:36] (03PS1) 10Jcrespo: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454489 [09:03:00] (03CR) 10Hashar: [C: 031] "I introduced that pattern in the CI manifests, it is way better to avoid repeating ourself and have it handled by docker-pkg instead!" 
[docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/449918 (owner: 10Legoktm) [09:04:42] (03PS11) 10Reedy: Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [09:04:57] (03CR) 10jerkins-bot: [V: 04-1] Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [09:05:42] syncing [09:06:54] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.18/includes: T202483 [[gerrit:454486|The BlobStoreFactory constructor needs an LBFactory]] (duration: 01m 18s) [09:06:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:59] T202483: www.mediawiki.org showing: error:Unknown database 'wikidatawiki' on shard: s3 - https://phabricator.wikimedia.org/T202483 [09:09:29] well, that's all the time I needed :) [09:19:28] !log reimage es1011 to stretch [09:19:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:52] !log installing nodejs security updates on restbase* [09:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:42:04] (03PS1) 10Elukey: profile::archiva::proxy: protect first run with firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) [09:44:39] (03CR) 10Elukey: "Current archiva host (meitnerium) is a no op: https://puppet-compiler.wmflabs.org/compiler02/12159/meitnerium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [09:45:25] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c] (dev-cluster): Expand CSS end points [09:45:26] 10Operations: Redirect 2030.wikimedia.org to the new movement strategy portal - 
https://phabricator.wikimedia.org/T202498 (10Addshore) [09:45:27] (03PS2) 10Elukey: profile::archiva::proxy: protect first run with firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) [09:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:33] 10Operations, 10Wikimedia-Apache-configuration: Redirect 2030.wikimedia.org to the new movement strategy portal - https://phabricator.wikimedia.org/T202498 (10Reedy) [09:45:36] (03PS1) 10Reedy: Update 2030.wikimedia.org redirect [puppet] - 10https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) [09:45:55] (03CR) 10Muehlenhoff: [C: 031] "Looks good. Alternatively you could also make the ferm rule conditional on the option, but this works fine as well." [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [09:48:39] (03PS3) 10Volans: Add MediaWiki module to manipulate its config [software/spicerack] - 10https://gerrit.wikimedia.org/r/454290 (https://phabricator.wikimedia.org/T199079) [09:48:41] (03PS2) 10Volans: dnsdisc: replace retry logic with decorator [software/spicerack] - 10https://gerrit.wikimedia.org/r/454334 (https://phabricator.wikimedia.org/T199079) [09:48:43] (03PS1) 10Volans: spicerack: expose the IRC logger [software/spicerack] - 10https://gerrit.wikimedia.org/r/454499 (https://phabricator.wikimedia.org/T199079) [09:48:45] (03PS1) 10Volans: cookbook: allow to specify the global config path [software/spicerack] - 10https://gerrit.wikimedia.org/r/454500 (https://phabricator.wikimedia.org/T199079) [09:48:56] (03CR) 10Volans: "replies inline" (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/454290 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:49:00] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c] (dev-cluster): Expand CSS end points (duration: 03m 35s) [09:49:03] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:49:05] (03CR) 10Volans: "done" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/454334 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:49:43] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c]: Expand CSS end points - T202425 [09:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:49:48] T202425: Error when specifying a localhost URL for mobileapps.host - https://phabricator.wikimedia.org/T202425 [09:51:29] !log uploaded clustershell 1.8-1~deb9u1 to apt.wikimedia.org/stretch-wikimedia [09:51:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:52:34] (03PS3) 10Elukey: profile::archiva::proxy: protect first run with firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) [09:53:08] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c]: Expand CSS end points - T202425 (duration: 03m 25s) [09:53:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:53:34] (03CR) 10Elukey: [C: 032] profile::archiva::proxy: protect first run with firewall rules [puppet] - 10https://gerrit.wikimedia.org/r/454497 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [09:54:16] (03CR) 10Giuseppe Lavagetto: php: add service management for php-fpm (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: 10Giuseppe Lavagetto) [09:54:33] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c]: Expand CSS end points - T202105 [09:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:38] T202105: Separate pagelib CSS from base CSS - https://phabricator.wikimedia.org/T202105 [09:55:37] (03PS6) 10Giuseppe Lavagetto: php: add service management for php-fpm [puppet] - 10https://gerrit.wikimedia.org/r/454478 
(https://phabricator.wikimedia.org/T201140) [09:57:46] (03PS1) 10Elukey: Add component/archiva to stretch-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/454502 (https://phabricator.wikimedia.org/T192639) [09:59:01] !log restarting backups on codfw due to grant issue [09:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:52] (03PS1) 10Jcrespo: mydumper: Require python3-pymysql to store backup statistics [puppet] - 10https://gerrit.wikimedia.org/r/454504 (https://phabricator.wikimedia.org/T198987) [10:02:25] (03CR) 10Jcrespo: "We all forgot to explicitly require the new dependency." [puppet] - 10https://gerrit.wikimedia.org/r/454504 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [10:02:35] (03PS2) 10Jcrespo: mydumper: Require python3-pymysql to store backup statistics [puppet] - 10https://gerrit.wikimedia.org/r/454504 (https://phabricator.wikimedia.org/T198987) [10:03:21] (03CR) 10Jcrespo: [C: 032] mydumper: Require python3-pymysql to store backup statistics [puppet] - 10https://gerrit.wikimedia.org/r/454504 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [10:07:07] (03CR) 10Giuseppe Lavagetto: [C: 032] apt_install: Allow newline separated list of packages [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/449918 (owner: 10Legoktm) [10:08:01] (03Merged) 10jenkins-bot: apt_install: Allow newline separated list of packages [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/449918 (owner: 10Legoktm) [10:08:06] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c]: Expand CSS end points - T202105 (duration: 13m 33s) [10:08:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:08:12] T202105: Separate pagelib CSS from base CSS - https://phabricator.wikimedia.org/T202105 [10:08:22] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #3 [10:08:25] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:14] (03PS3) 10Giuseppe Lavagetto: tests: migrate from nose to pytest [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/398136 (owner: 10Hashar) [10:10:43] (03CR) 10Giuseppe Lavagetto: [C: 032] tests: migrate from nose to pytest [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/398136 (owner: 10Hashar) [10:11:20] (03Merged) 10jenkins-bot: tests: migrate from nose to pytest [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/398136 (owner: 10Hashar) [10:14:36] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10Addshore) [10:16:50] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #3 (duration: 08m 28s) [10:16:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:17:01] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #4 [10:17:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:04] (03PS1) 10Elukey: archiva: set apt component when running on Debian Stretch [puppet] - 10https://gerrit.wikimedia.org/r/454508 (https://phabricator.wikimedia.org/T192639) [10:23:36] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10Aklapper) [10:23:39] (03PS1) 10Jcrespo: mariadb: Capture connection error exceptions [puppet] - 10https://gerrit.wikimedia.org/r/454509 (https://phabricator.wikimedia.org/T198987) [10:24:55] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10Kubernetes: Evaluate VMWare's Harbour as a docker registry - https://phabricator.wikimedia.org/T202504 (10Joe) [10:25:32] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #4 
(duration: 08m 31s) [10:25:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:25:38] !log mobrovac@deploy1001 Started deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #5 [10:25:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:28:37] (03PS1) 10WMDE-leszek: Wikidata: Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454510 (https://phabricator.wikimedia.org/T201832) [10:28:50] (03CR) 10Muehlenhoff: archiva: set apt component when running on Debian Stretch (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454508 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [10:28:55] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10Kubernetes: Evaluate VMWare's Harbour as a docker registry - https://phabricator.wikimedia.org/T202504 (10Reedy) [10:30:25] !log deploying ores gerrit:454283 to beta (T197097) [10:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:30:30] T197097: Migrate models to LFS - https://phabricator.wikimedia.org/T197097 [10:30:33] (03CR) 10Elukey: ">" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454508 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [10:30:48] (03PS2) 10WMDE-leszek: Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/452365 (https://phabricator.wikimedia.org/T201832) [10:31:53] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@5d03f1c]: Expand CSS end points, take #5 (duration: 06m 15s) [10:31:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:04] (03Abandoned) 10Elukey: archiva: set apt component when running on Debian Stretch [puppet] - 10https://gerrit.wikimedia.org/r/454508 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [10:37:16] (03Abandoned) 10Elukey: Add component/archiva to stretch-wikimedia 
[puppet] - 10https://gerrit.wikimedia.org/r/454502 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [10:39:05] (03CR) 10Muehlenhoff: "Yeah, that's fine for stretch-wikimedia/main, then" [puppet] - 10https://gerrit.wikimedia.org/r/454508 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [10:40:04] (03CR) 10ArielGlenn: "What I meant was, if dumps are produced by multiple invocations of the script, as they are now, we might write one (via 'dumpFormat'), wri" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [10:43:02] !log upload archiva 2.2.3-1 to stretch-wikimedia/main - T192639 [10:43:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:43:07] T192639: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639 [10:46:02] (03PS10) 10Gergő Tisza: Remove sitewide and user CSS/JS editing from old groups [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421124 (https://phabricator.wikimedia.org/T190015) [10:46:04] (03PS12) 10Gergő Tisza: Enforce that interface-admin is the only group that can edit non-own CSS/JS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/421125 (https://phabricator.wikimedia.org/T190015) [10:50:56] (03PS1) 10Elukey: Assign role::archiva to archiva1001 [puppet] - 10https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1100). [11:00:04] No GERRIT patches in the queue for this window AFAICS. 
[11:00:21] o/ [11:00:28] but looks like there is nothing to swat [11:03:36] (03CR) 10Hashar: [C: 031] "It compiles at least:" [puppet] - 10https://gerrit.wikimedia.org/r/454447 (owner: 10Dzahn) [11:06:57] (03CR) 10Elukey: "pcc https://puppet-compiler.wmflabs.org/compiler02/12161/archiva1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [11:08:48] (03CR) 10Muehlenhoff: [C: 031] Assign role::archiva to archiva1001 [puppet] - 10https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [11:10:46] (03PS1) 10Elukey: profile::archiva::proxy: add monitoring_enabled parameter [puppet] - 10https://gerrit.wikimedia.org/r/454514 (https://phabricator.wikimedia.org/T192639) [11:11:07] (03CR) 10ArielGlenn: [C: 031] "This is good to go whenever you look (or would you prefer me to +2 and merge?)" [puppet] - 10https://gerrit.wikimedia.org/r/441135 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [11:11:44] (03PS2) 10Elukey: profile::archiva::proxy: add monitoring_enabled parameter [puppet] - 10https://gerrit.wikimedia.org/r/454514 (https://phabricator.wikimedia.org/T192639) [11:18:26] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10MarcoAurelio) Will administrators at least attempt to warn users before deleting them? I'd appreciate so because as a Herald user (cf. H90) the interface doesn'... 
[11:19:53] (03CR) 10Elukey: [C: 032] "No op https://puppet-compiler.wmflabs.org/compiler02/12162/meitnerium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/454514 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [11:22:38] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10WMDE-Fisch) Following the steps on https://wikitech.wikimedia.org/wiki/Production_shell_access#New_users for new us... [11:25:43] (03PS2) 10Elukey: Assign role::archiva to archiva1001 [puppet] - 10https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) [11:26:29] jouncebot: next [11:26:29] In 0 hour(s) and 33 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1200) [11:28:31] (03CR) 10Elukey: "added a parameter in another cr to avoid checking archiva.wikimedia.org unless explicitly enabled." 
[puppet] - 10https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [11:29:13] (03PS3) 10MarcoAurelio: Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) [11:29:22] (03PS1) 10Marostegui: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454519 [11:29:53] (03CR) 10jerkins-bot: [V: 04-1] Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: 10MarcoAurelio) [11:31:35] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454519 (owner: 10Marostegui) [11:32:26] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (10Marostegui) I have fixed T201603#4513954 and I am going to re-run the checks across all the wikis again. Will repool the db2069 and close this task and create a new ticket for x1 consi... [11:32:29] (03PS4) 10MarcoAurelio: Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) [11:33:09] (03PS3) 10Marostegui: Revert "mariadb: Depool db2069 due to crash" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/451844 (owner: 10Jcrespo) [11:35:20] (03CR) 10ArielGlenn: "ConfigParser.read() will already skip nonexistent files, so again is there something else we're trying to fix here?" 
[dumps] - 10https://gerrit.wikimedia.org/r/348011 (owner: 10Awight) [11:35:22] (03PS5) 10MarcoAurelio: Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) [11:36:10] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454519 (owner: 10Marostegui) [11:37:26] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1113:3315 (duration: 00m 58s) [11:37:27] !log Deploy schema change on db1113:3315 [11:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:47] (03PS2) 10Jcrespo: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454489 [11:38:12] (03CR) 10Marostegui: [C: 032] Revert "mariadb: Depool db2069 due to crash" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/451844 (owner: 10Jcrespo) [11:39:26] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db2069 due to crash" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/451844 (owner: 10Jcrespo) [11:40:54] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2069 - T201603 (duration: 00m 56s) [11:40:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:59] T201603: db2069 storage crash - https://phabricator.wikimedia.org/T201603 [11:43:39] (03PS3) 10Muehlenhoff: contint: don't include mw::packages, use contint::packages::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454447 (owner: 10Dzahn) [11:45:49] (03Abandoned) 10MarcoAurelio: Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: 10MarcoAurelio) [11:45:51] (03CR) 10Muehlenhoff: [C: 032] contint: don't include mw::packages, use 
contint::packages::mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454447 (owner: 10Dzahn) [11:48:44] (03PS1) 10WMDE-leszek: Wikidata (labs): Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) [11:48:46] (03PS1) 10WMDE-leszek: Wikidata: Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) [11:51:25] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1113:3315 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454519 (owner: 10Marostegui) [11:51:28] (03CR) 10jenkins-bot: Revert "mariadb: Depool db2069 due to crash" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/451844 (owner: 10Jcrespo) [11:52:00] (03CR) 10Gehel: [C: 031] "Thanks for the docs! Much clearer now!" [software/spicerack] - 10https://gerrit.wikimedia.org/r/454334 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:52:55] (03CR) 10ArielGlenn: [C: 031] ":)" [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [11:53:35] (03CR) 10Gehel: [C: 031] "LGTM" (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/454290 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:54:41] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/454499 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:55:04] (03Restored) 10MarcoAurelio: Configure gendered namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: 10MarcoAurelio) [11:55:31] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: db2069 storage crash - https://phabricator.wikimedia.org/T201603 (10Marostegui) 05Open>03Resolved a:05Marostegui>03jcrespo [11:55:50] (03CR) 10Gehel: [C: 031] "LGTM" [software/spicerack] - 10https://gerrit.wikimedia.org/r/454500 
(https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [11:56:50] (03PS6) 10MarcoAurelio: Modify gender namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) [11:59:26] (03PS7) 10Muehlenhoff: Remove Jessie-specific Puppet code from Mediawiki math class [puppet] - 10https://gerrit.wikimedia.org/r/449672 [12:00:05] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1200) [12:00:38] (03CR) 10MarcoAurelio: Modify gender namespaces for pl.wiktionary (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: 10MarcoAurelio) [12:04:29] (03CR) 10Muehlenhoff: [C: 032] Remove Jessie-specific Puppet code from Mediawiki math class [puppet] - 10https://gerrit.wikimedia.org/r/449672 (owner: 10Muehlenhoff) [12:04:56] 10Operations, 10netops: set up NAT from 208.80.155.15 to frpig1001 - https://phabricator.wikimedia.org/T202520 (10Jgreen) p:05Triage>03Normal [12:05:52] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) [12:06:02] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) p:05Triage>03Normal [12:06:48] 10Operations: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10Marostegui) 05Open>03stalled Firstly an email is required, OIT does that. 
[12:06:50] (03PS4) 10Muehlenhoff: Inline mediawiki::packages::math [puppet] - 10https://gerrit.wikimedia.org/r/449673 [12:07:33] PROBLEM - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.020 second response time [12:10:52] RECOVERY - toolschecker: check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.011 second response time [12:12:10] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454524 [12:13:32] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454524 (owner: 10Marostegui) [12:14:31] (03PS1) 10Daniel Kinzler: Make current default of MCR migration stage explicit. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) [12:14:51] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454524 (owner: 10Marostegui) [12:15:37] !log Deploy schema change on db1070 (s5 primary master) - T67448 T114117 T51191 [12:15:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:44] T114117: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 [12:15:45] T51191: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 [12:15:45] T67448: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 [12:15:45] (03PS2) 10Daniel Kinzler: Make current default of MCR migration stage explicit. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) [12:16:02] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1113:3315 (duration: 00m 57s) [12:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:22:27] (03PS3) 10Jcrespo: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454489 [12:29:32] (03CR) 10Gergő Tisza: [C: 031] Make current default of MCR migration stage explicit. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) (owner: 10Daniel Kinzler) [12:30:29] marostegui: okay if I merged 2 things into mediawiki-config for beta? [12:30:35] (03CR) 10Muehlenhoff: [C: 032] Inline mediawiki::packages::math [puppet] - 10https://gerrit.wikimedia.org/r/449673 (owner: 10Muehlenhoff) [12:30:47] addshore: yeah, not planning to merge anything :) [12:30:51] cool! [12:30:52] (03PS1) 10Daniel Kinzler: Set MCR migration to read-both/read-new on testwiki. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/454534 (https://phabricator.wikimedia.org/T198309) [12:31:07] (03PS2) 10Addshore: Wikidata (labs): Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:31:12] (03CR) 10Addshore: [C: 032] Wikidata (labs): Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:31:29] (03PS2) 10Addshore: Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:31:33] (03PS3) 10Addshore: Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:31:36] (03CR) 10Addshore: [C: 032] Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:32:12] (03CR) 10jerkins-bot: [V: 04-1] Set MCR migration to read-both/read-new on testwiki. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/454534 (https://phabricator.wikimedia.org/T198309) (owner: 10Daniel Kinzler) [12:32:46] jouncebot: next [12:32:46] In 0 hour(s) and 27 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1300) [12:33:20] (03CR) 10jerkins-bot: [V: 04-1] Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:35:04] leszek_wmde: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/454521/ is marked as a WIP so I can't merge it [12:35:38] oh wait, i can unmark it [12:35:47] (03CR) 10Addshore: [C: 032] Wikidata (labs): Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:35:56] (03CR) 10Addshore: [C: 032] Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:36:38] (03PS2) 10Muehlenhoff: Remove jessie-specific code from mediawiki::packages::tex, cleanups [puppet] - 10https://gerrit.wikimedia.org/r/449674 [12:37:07] (03Merged) 10jenkins-bot: Wikidata (labs): Added config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:37:48] (03Merged) 10jenkins-bot: Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:38:44] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1113:3315" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454524 (owner: 10Marostegui) [12:38:46] (03CR) 10jenkins-bot: Wikidata (labs): Added 
config variable to change new link formatter item range [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454521 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:38:48] (03CR) 10jenkins-bot: Wikidata (labs): Use new item ID formatter for Q1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454522 (https://phabricator.wikimedia.org/T201832) (owner: 10WMDE-leszek) [12:39:44] heck exception log is busy? [12:39:48] *looks on logstash* [12:39:59] jouncebot: now [12:39:59] For the next 0 hour(s) and 20 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1200) [12:40:01] jouncebot: next [12:40:01] In 0 hour(s) and 19 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1300) [12:40:26] 3 [13518ms] at runtime/ext_mysql: slow query: UPDATE /* User::incEditCountImmediate */ `user` SET user_editcount=user_editcount+1 WHERE user_id = '90987' AND (user_editcount IS NOT NULL) [12:40:31] a shit tonne of that in fatal monitor [12:40:55] :( [12:41:06] 2018-08-22 12:40:52 [W31ZzgpAEDIAAIMm160AAABF] mw1285 commonswiki 1.32.0-wmf.16 exception ERROR: [W31ZzgpAEDIAAIMm160AAABF] /w/api.php Wikimedia\Rdbms\DBTransactionSizeError from line 1325 of /srv/mediawiki/php-1.32.0-wmf.16/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Transaction spent 6.0824255943298 second(s) in writes, exceeding the limit of 3. [12:41:11] in exception.log, lots [12:41:33] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [12:41:42] ^^ i guess that this [12:42:24] or is that string matcher again? [12:43:15] looks like the exception spam stopped.. [12:43:20] Mostly commons? [12:43:27] and just started again... 
[12:43:35] yes, all commons it looks like
[12:44:22] (CR) Muehlenhoff: [C: 1] Assign role::archiva to archiva1001 [puppet] - https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[12:45:46] It seems too many concurrent edits by user 90987
[12:46:16] where did you see that?
[12:46:29] on the logs
[12:46:37] Lock wait timeout exceeded; try restarting transaction
[12:46:42] !log addshore@deploy1001 Synchronized wmf-config: BETA ONLY (2 wikidata patches) (duration: 00m 55s)
[12:46:43] User::incEditCountImmediate
[12:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:46:54] Lock wait timeout exceeded
[12:47:45] Ah, I was looking at fatals
[12:47:51] Now I see what you saw
[12:48:34] https://commons.wikimedia.org/wiki/Special:Contributions/Blackcat
[12:49:14] Uff
[12:51:00] addshore: sorry for WIPing it. Forgot I have the default WIP setting on
[12:51:18] leszek_wmde: also, there isn't a Q1 on beta, so I'll set it to 1000 instead? :)
[12:51:27] Operations, SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (JanWMF) approved
[12:52:12] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[12:52:27] addshore: sure, just please be ready to drop to Q1 if things go south
[12:52:44] addshore: I will stop being responsive in 8 minutes
[12:52:52] okay!
[12:53:00] addshore: Will logs from beta go to the Logstash?
[12:53:11] Aleksey_WMDE: they will go to beta logstash yes!
[12:53:19] Where is it?
[12:55:57] addshore: can you add Aleksey_WMDE to the project needed to be able to ssh to the instance where beta logstash password is?
[12:56:04] addshore: I showed the url etc
[12:58:34] addshore: looks like I managed to add him. Sorry for the noise
[12:58:46] :)
[12:59:09] !log Deploy schema change on s6 codfw masters (db2039) this will generate lag on s6 codfw
[12:59:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:24] Can I deploy a quick config patch?
[13:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1300)
[13:00:17] Reedy: i believe so!
[13:00:17] (PS3) Reedy: Make current default of MCR migration stage explicit. [mediawiki-config] - https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) (owner: Daniel Kinzler)
[13:00:20] (CR) Reedy: [C: 2] Make current default of MCR migration stage explicit. [mediawiki-config] - https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) (owner: Daniel Kinzler)
[13:00:26] Reedy: oh, that one was on my list :P
[13:00:29] ;)
[13:00:35] Daniel pinged me in -core
[13:00:39] #jfdi
[13:01:23] :D
[13:01:47] (PS1) Addshore: BETA: wmgWikibaseMaxItemIdForNewItemIdHtmlFormatter 1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/454538
[13:01:56] (Merged) jenkins-bot: Make current default of MCR migration stage explicit. [mediawiki-config] - https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) (owner: Daniel Kinzler)
[13:06:29] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (elukey) For the moment I captured only these flows: ``` elukey@stat1005:~$ grep https ipv6_after_changes.log| while read line; do endp...
[13:07:20] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Set default for wgMultiContentRevisionSchemaMigrationStage T197816 (duration: 00m 56s)
[13:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:25] T197816: Enable MCR migration stage "write both, read old" on live systems - https://phabricator.wikimedia.org/T197816
[13:09:50] (PS2) Addshore: BETA: wmgWikibaseMaxItemIdForNewItemIdHtmlFormatter 1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/454538
[13:09:53] (CR) Addshore: [C: 2] BETA: wmgWikibaseMaxItemIdForNewItemIdHtmlFormatter 1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/454538 (owner: Addshore)
[13:11:14] (Merged) jenkins-bot: BETA: wmgWikibaseMaxItemIdForNewItemIdHtmlFormatter 1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/454538 (owner: Addshore)
[13:14:25] !log addshore@deploy1001 sync-file aborted: BETA ONLY (1 wikidata patche) (duration: 00m 00s)
[13:14:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:14:39] typo
[13:15:27] !log addshore@deploy1001 Synchronized wmf-config: BETA ONLY (1 wikidata patch) (duration: 00m 56s)
[13:15:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:12] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[13:25:30] (CR) jenkins-bot: Make current default of MCR migration stage explicit.
[mediawiki-config] - https://gerrit.wikimedia.org/r/454527 (https://phabricator.wikimedia.org/T197816) (owner: Daniel Kinzler)
[13:25:32] (CR) jenkins-bot: BETA: wmgWikibaseMaxItemIdForNewItemIdHtmlFormatter 1000 [mediawiki-config] - https://gerrit.wikimedia.org/r/454538 (owner: Addshore)
[13:26:09] (CR) Ottomata: [C: 1] ":)" [puppet] - https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[13:26:23] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[13:27:47] (PS1) Ladsgroup: mediawiki: Remove unneeded file decleration on wikidata maintenance script [puppet] - https://gerrit.wikimedia.org/r/454543
[13:38:26] (PS3) Elukey: Assign role::archiva to archiva1001 [puppet] - https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639)
[13:39:00] (PS1) Muehlenhoff: Extend Imagemagick policy file to disable Postscript/PDF [puppet] - https://gerrit.wikimedia.org/r/454544
[13:39:05] (CR) Elukey: [C: 2] Assign role::archiva to archiva1001 [puppet] - https://gerrit.wikimedia.org/r/454511 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[13:44:05] (PS1) Ladsgroup: wikilabels: Add zlib1g-dev package and cronjob to remove expired tasks [puppet] - https://gerrit.wikimedia.org/r/454546 (https://phabricator.wikimedia.org/T168478)
[13:47:22] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (Jgreen)
[13:48:04] (CR) Volans: [C: 2] Add MediaWiki module to manipulate its config [software/spicerack] - https://gerrit.wikimedia.org/r/454290 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:48:23] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (Jgreen)
[13:49:32] (Merged) jenkins-bot: Add MediaWiki module to manipulate its config [software/spicerack] - https://gerrit.wikimedia.org/r/454290 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:49:56] Operations, ops-codfw, fundraising-tech-ops: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (Jgreen) p:Triage>Normal
[13:50:00] (CR) Volans: [C: 2] dnsdisc: replace retry logic with decorator [software/spicerack] - https://gerrit.wikimedia.org/r/454334 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:50:43] Operations, ops-codfw, fundraising-tech-ops: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (Jgreen)
[13:51:30] (Merged) jenkins-bot: dnsdisc: replace retry logic with decorator [software/spicerack] - https://gerrit.wikimedia.org/r/454334 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:51:48] (CR) Volans: [C: 2] spicerack: expose the IRC logger [software/spicerack] - https://gerrit.wikimedia.org/r/454499 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:52:03] PROBLEM - Filesystem available is greater than filesystem size on ms-be2042 is CRITICAL: cluster=swift device=/dev/sdn1 fstype=xfs instance=ms-be2042:9100 job=node mountpoint=/srv/swift-storage/sdn1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops
[13:52:12] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[13:52:47] (Merged) jenkins-bot: spicerack: expose the
IRC logger [software/spicerack] - https://gerrit.wikimedia.org/r/454499 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:52:52] (PS20) Gehel: Switch elasticsearch to use tlsproxy module [puppet] - https://gerrit.wikimedia.org/r/444610 (https://phabricator.wikimedia.org/T198351) (owner: EBernhardson)
[13:52:53] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[13:52:59] (CR) Volans: [C: 2] cookbook: allow to specify the global config path [software/spicerack] - https://gerrit.wikimedia.org/r/454500 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:53:15] Operations, fundraising-tech-ops, netops: adjust NAT for 208.80.152.231 (codfw bastion) to point to frbast2001 (10.195.0.67) - https://phabricator.wikimedia.org/T202536 (Jgreen) p:Triage>Normal
[13:53:51] (PS1) Elukey: profile::archiva::proxy: fix ferm srange [puppet] - https://gerrit.wikimedia.org/r/454548 (https://phabricator.wikimedia.org/T192639)
[13:54:01] (Merged) jenkins-bot: cookbook: allow to specify the global config path [software/spicerack] - https://gerrit.wikimedia.org/r/454500 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[13:54:35] (CR) Elukey: [C: 2] profile::archiva::proxy: fix ferm srange [puppet] - https://gerrit.wikimedia.org/r/454548 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[13:54:36] Operations, ops-codfw, fundraising-tech-ops, Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (Jgreen)
[13:57:23] PROBLEM - Filesystem available is greater than filesystem size on ms-be2040 is CRITICAL: cluster=swift device=/dev/sdc1 fstype=xfs instance=ms-be2040:9100 job=node mountpoint=/srv/swift-storage/sdc1
site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2040&var-datasource=codfw%2520prometheus%252Fops
[14:00:00] Operations, Analytics, Analytics-Kanban: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (Ottomata) Thanks!
[14:03:09] !log installing openssl updates
[14:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:04:33] (PS4) Jcrespo: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/454489
[14:07:47] (CR) Jcrespo: [C: 2] Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/454489 (owner: Jcrespo)
[14:08:49] Operations, fundraising-tech-ops, netops: adjust NAT for 208.80.152.231 (codfw bastion) to point to frbast2001 (10.195.0.67) - https://phabricator.wikimedia.org/T202536 (Jgreen)
[14:09:20] (Merged) jenkins-bot: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/454489 (owner: Jcrespo)
[14:09:59] (CR) Marostegui: [C: 1] "let me know if you want me to merge this for you."
[puppet] - https://gerrit.wikimedia.org/r/454543 (owner: Ladsgroup)
[14:11:11] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool es1011 (duration: 00m 55s)
[14:11:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:12:09] (CR) jenkins-bot: Revert "mariadb: Depool es1011 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/454489 (owner: Jcrespo)
[14:13:05] (PS1) ArielGlenn: tar up dumps status files for rsync for each back end in turn [puppet] - https://gerrit.wikimedia.org/r/454549 (https://phabricator.wikimedia.org/T202482)
[14:13:14] (CR) Marostegui: mediawiki: Remove unneeded file decleration on wikidata maintenance script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/454543 (owner: Ladsgroup)
[14:14:17] Operations, Wikimedia-Mailing-lists, Chinese-Sites: Create mailing list for Bureaucrat of zh.wikipedia - https://phabricator.wikimedia.org/T202435 (Wong128hk) * The requested name of the mailing list is wikizh-bureaucrats@lists.wikimedia.org * The purpose for creating the mailing list is to provide t...
[14:15:48] (PS1) Elukey: Release 2.2.3-2 [debs/archiva] (debian) - https://gerrit.wikimedia.org/r/454551 (https://phabricator.wikimedia.org/T192639)
[14:16:05] (PS1) Ottomata: Include turnilo on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011)
[14:24:48] !log Deploy schema change on dbstore1002
[14:24:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:49] (CR) Imarlier: [C: 1] PHP: create module for modern Debian-based distributions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[14:27:24] (PS1) Ladsgroup: mediawiki: make rebuildTermSqlIndex.log absent [puppet] - https://gerrit.wikimedia.org/r/454553
[14:27:57] (CR) Ladsgroup: "Yup, let's do it: Iffc239dacbbb7816d102ea26ce90858fbe2958b3" [puppet] - https://gerrit.wikimedia.org/r/454543 (owner: Ladsgroup)
[14:29:56] (CR) Imarlier: [C: 1] mediawiki: move php to a profile, use the php class [puppet] - https://gerrit.wikimedia.org/r/453093 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[14:31:46] (CR) Muehlenhoff: [C: 1] "Looks good" [debs/archiva] (debian) - https://gerrit.wikimedia.org/r/454551 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[14:35:16] (CR) Imarlier: [C: 1] php: add service management for php-fpm [puppet] - https://gerrit.wikimedia.org/r/454478 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[14:35:19] (PS2) Elukey: Release 2.2.3-2 [debs/archiva] (debian) - https://gerrit.wikimedia.org/r/454551 (https://phabricator.wikimedia.org/T192639)
[14:36:09] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/12167/" [puppet] - https://gerrit.wikimedia.org/r/454553 (owner: Ladsgroup)
[14:38:53] (CR) Muehlenhoff: [C: 1] PHP: create module for modern Debian-based
distributions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[14:42:35] (PS1) Volans: Add README [cookbooks] - https://gerrit.wikimedia.org/r/454559 (https://phabricator.wikimedia.org/T199079)
[14:43:41] (CR) Elukey: [C: 2] Release 2.2.3-2 [debs/archiva] (debian) - https://gerrit.wikimedia.org/r/454551 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[14:46:41] (CR) Giuseppe Lavagetto: PHP: create module for modern Debian-based distributions (1 comment) [puppet] - https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[14:50:18] <_joe_> /win 24
[14:55:24] (PS1) Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429)
[14:55:34] !log load codfw-1534946573 on pfw3-codfw - T202537
[14:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:08] !log update pcc's facts
[14:56:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:15] (CR) jerkins-bot: [V: -1] Send an alert email when EventLoggingSanitization job fails [puppet] - https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: Mforns)
[14:57:02] !log load eqiad-1534946573 on pfw3-eqiad - T202537
[14:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:27] !log upload archiva 2.2.3-2 to stretch-wikimedia
[14:57:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:14] Operations, fundraising-tech-ops, netops: adjust NAT for 208.80.152.231 (codfw bastion) to point to frbast2001 (10.195.0.67) - https://phabricator.wikimedia.org/T202536 (ayounsi)
[15:01:24] (CR) Muehlenhoff: [C: 1] "Looks good" [puppet] -
https://gerrit.wikimedia.org/r/452664 (https://phabricator.wikimedia.org/T201140) (owner: Giuseppe Lavagetto)
[15:04:24] (CR) Dzahn: [C: 1] Remove jessie-specific code from mediawiki::packages::tex, cleanups [puppet] - https://gerrit.wikimedia.org/r/449674 (owner: Muehlenhoff)
[15:04:48] (CR) Ottomata: "Let's make a wrapper script for this command and deploy it into /usr/local/bin, similar to how the refiney_job.pp define does it." [puppet] - https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: Mforns)
[15:05:35] (PS3) Dzahn: Remove jessie-specific code from mediawiki::packages::tex, cleanups [puppet] - https://gerrit.wikimedia.org/r/449674 (owner: Muehlenhoff)
[15:12:00] (CR) Dzahn: [C: 2] "changes in compiler, but that's normal because we switch to require_package: http://puppet-compiler.wmflabs.org/12174/mw1261.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/449674 (owner: Muehlenhoff)
[15:15:58] (PS1) Jon Harald Søby: Enable arbitrary Wikidata access for oldwikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/454570 (https://phabricator.wikimedia.org/T202543)
[15:19:18] Operations, ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T196507 (Cmjohnson) a:Cmjohnson>RobH @robh I cannot download the service pack to upgrade the firmware for this server. Can you please try and also reach out to our rep and link my HPE account with all of our s...
[15:19:57] (PS3) Dzahn: Inline mediawiki::packages::tex [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[15:22:16] (PS2) Ottomata: Include turnilo on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011)
[15:24:18] Operations, SRE-Access-Requests: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (BPirkle)
[15:27:39] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/12176/mwdebug1001.eqiad.wmnet/ also unblocked and rebased now" [puppet] - https://gerrit.wikimedia.org/r/449675 (owner: Muehlenhoff)
[15:31:40] (PS19) Bstorm: WIP toolforge: write/move a sonofgridengine module and toolforge profile [puppet] - https://gerrit.wikimedia.org/r/448791 (https://phabricator.wikimedia.org/T200557)
[15:41:14] Operations, ops-eqiad, DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (Cmjohnson) Another dispatch was created.. SR978583381
[15:41:28] Operations, Cloud-Services, DBA, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Marostegui) @Bstorm are you planning to apply the final tweaks to nova as mentioned at T188589#4516087 to reduce nova's amount of connectio...
[15:43:34] (CR) Dzahn: zuul: base::service_unit -> systemd::service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[15:43:44] (PS4) Dzahn: zuul: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724)
[15:44:11] (PS5) Dzahn: zuul: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724)
[15:46:26] (PS1) Ppchelko: Switch services to MW connection to https.
[puppet] - https://gerrit.wikimedia.org/r/454574
[15:46:31] Operations, Cloud-Services, DBA, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Bstorm) I've tried some already! I think there's somewhere else I might need to look.
[15:47:12] (PS6) Dzahn: zuul: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724)
[15:47:41] (CR) Dzahn: "changed to "restart false" for both zuul and zuul-merger" [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[15:48:05] (PS1) Elukey: site.pp: add a note about archiva1001's status [puppet] - https://gerrit.wikimedia.org/r/454576 (https://phabricator.wikimedia.org/T192639)
[15:48:45] (CR) Elukey: [C: 2] site.pp: add a note about archiva1001's status [puppet] - https://gerrit.wikimedia.org/r/454576 (https://phabricator.wikimedia.org/T192639) (owner: Elukey)
[15:49:14] Operations, ops-eqiad, fundraising-tech-ops: check cabling/config for payments1004 DRAC interface - https://phabricator.wikimedia.org/T202439 (Cmjohnson) Performed a hard reset
[15:50:02] (CR) Dzahn: [C: -1] "_joe_: any thoughts on how we replace apache::conf using "vars =>" with httpd::conf?"
[puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[15:50:33] (CR) Ottomata: "https://puppet-compiler.wmflabs.org/compiler03/12177/analytics-tool1002.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[15:50:46] (CR) Ottomata: [C: 2] Include turnilo on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[15:50:48] (PS3) Ottomata: Include turnilo on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011)
[15:50:50] (CR) Ottomata: [V: 2 C: 2] Include turnilo on analytics-tool1002 [puppet] - https://gerrit.wikimedia.org/r/454552 (https://phabricator.wikimedia.org/T202011) (owner: Ottomata)
[15:50:53] (CR) Dzahn: [C: -1] "eh, i meant "apache::env" with "vars =>" replaced by "httpd::conf"" [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[15:51:05] Operations, SRE-Access-Requests, User-Addshore: Requesting Access to view EventLogging data for gabriel-wmde / gbirke - https://phabricator.wikimedia.org/T202072 (gabriel-wmde) {F25224449} This is my public SSH key
[15:53:30] Operations, ops-eqiad, DC-Ops, Fundraising-Backlog, fundraising-tech-ops: Decom tellurium - https://phabricator.wikimedia.org/T194408 (Cmjohnson) Open>Resolved Disk is wiped and server removed from the rack. No DNS entries exist
[15:54:27] Operations, Cloud-Services, DBA, Patch-For-Review: m5-master overloaded by idle connections to the nova database - https://phabricator.wikimedia.org/T188589 (Marostegui) Ah right!
Thanks for the heads up, I wasn't aware :-)
[15:54:53] !log Update NAT for 208.80.152.231 - T202536
[15:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:58] T202536: adjust NAT for 208.80.152.231 (codfw bastion) to point to frbast2001 (10.195.0.67) - https://phabricator.wikimedia.org/T202536
[15:55:50] (PS14) Vgutierrez: Refactor certcentral.certificate_management() [software/certcentral] - https://gerrit.wikimedia.org/r/451867
[15:55:52] (PS5) Vgutierrez: Implement different Certificate.save() modes [software/certcentral] - https://gerrit.wikimedia.org/r/453124
[15:55:54] (PS6) Vgutierrez: Certcentral integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/454045
[15:56:09] Operations, ops-codfw, fundraising-tech-ops: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (ayounsi)
[15:56:13] Operations, fundraising-tech-ops, netops: adjust NAT for 208.80.152.231 (codfw bastion) to point to frbast2001 (10.195.0.67) - https://phabricator.wikimedia.org/T202536 (ayounsi) Open>Resolved
[15:58:14] (PS2) Dzahn: puppetmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[15:59:31] (PS1) Ladsgroup: Update links to github repos of scoring platform team [puppet] - https://gerrit.wikimedia.org/r/454577
[16:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Your horoscope predicts another unfortunate Morning SWAT (Max 6 patches) deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1600).
[16:00:04] No GERRIT patches in the queue for this window AFAICS.
[16:01:16] (PS1) RobH: adding user Samuel Guebo to admin module [puppet] - https://gerrit.wikimedia.org/r/454578 (https://phabricator.wikimedia.org/T202362)
[16:01:35] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH)
[16:01:48] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH) a:sguebo_WMF>None
[16:01:48] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:02:00] !log add static NAT for 208.80.155.15 - T202520
[16:02:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:02:05] T202520: set up NAT from 208.80.155.15 to frpig1001 - https://phabricator.wikimedia.org/T202520
[16:02:07] (PS1) Marostegui: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - https://gerrit.wikimedia.org/r/454579
[16:03:00] (PS2) Ladsgroup: Update links to github repos of scoring platform team [puppet] - https://gerrit.wikimedia.org/r/454577 (https://phabricator.wikimedia.org/T194212)
[16:03:48] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:03:50] (PS1) RobH: adding user Samuel Guebo to admin module [puppet] - https://gerrit.wikimedia.org/r/454581 (https://phabricator.wikimedia.org/T202362)
[16:03:57] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1096:3316 [mediawiki-config] - https://gerrit.wikimedia.org/r/454579 (owner: Marostegui)
[16:04:19] Operations,
netops: set up NAT from 208.80.155.15 to frpig1001 - https://phabricator.wikimedia.org/T202520 (ayounsi) Open>Resolved Done. ``` $ nc -zv 208.80.155.15 443 Connection to 208.80.155.15 443 port [tcp/https] succeeded! ```
[16:04:34] (PS1) Jcrespo: Add s8 section to the list of databases [switchdc] - https://gerrit.wikimedia.org/r/454583
[16:04:37] Operations, ops-eqiad, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install labstore1008 & labstore1009 - https://phabricator.wikimedia.org/T193655 (Cmjohnson) a:RobH @robh Can you help with the installs please. right now labstore1008 will not work because it's o...
[16:04:39] (PS2) RobH: adding user Samuel Guebo to groups in the admin module [puppet] - https://gerrit.wikimedia.org/r/454581 (https://phabricator.wikimedia.org/T202362)
[16:04:54] (CR) jerkins-bot: [V: -1] Add s8 section to the list of databases [switchdc] - https://gerrit.wikimedia.org/r/454583 (owner: Jcrespo)
[16:05:13] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH)
[16:05:25] (Merged) jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - https://gerrit.wikimedia.org/r/454579 (owner: Marostegui)
[16:05:48] Operations, SRE-Access-Requests, Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Samuel Guebo - https://phabricator.wikimedia.org/T202362 (RobH)
[16:06:34] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1096:3316 (duration: 00m 55s)
[16:06:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:06:40] !log Deploy schema change on db1096:3316
[16:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:07:57] PROBLEM - MediaWiki memcached error rate on graphite1001
is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:10:39] (CR) Jcrespo: "Yeah, the cumin calls for discovery are not working, that needs work. Or be converted to calls to the zarcillo database." [switchdc] - https://gerrit.wikimedia.org/r/454583 (owner: Jcrespo)
[16:11:26] (CR) Volans: [C: -2] "This software will *not* be used for the switch of the datacenter. The new "stages" will be moved as cookbooks in the new repository for t" [switchdc] - https://gerrit.wikimedia.org/r/454583 (owner: Jcrespo)
[16:11:57] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[16:12:46] (PS2) Jcrespo: Add s8 section to the list of databases [switchdc] - https://gerrit.wikimedia.org/r/454583 (https://phabricator.wikimedia.org/T199079)
[16:13:36] (CR) Jcrespo: "Yes, I created this as a reminder, as I saw no code on on the other related that I could file a TODO to (this explains all changes needed)" [switchdc] - https://gerrit.wikimedia.org/r/454583 (https://phabricator.wikimedia.org/T199079) (owner: Jcrespo)
[16:13:41] Operations, ops-eqiad, fundraising-tech-ops: check cabling/config for payments1004 DRAC interface - https://phabricator.wikimedia.org/T202439 (Cmjohnson) Open>Resolved a:Cmjohnson Replaced the cable.
all is well
[16:13:56] (CR) jerkins-bot: [V: -1] Add s8 section to the list of databases [switchdc] - https://gerrit.wikimedia.org/r/454583 (https://phabricator.wikimedia.org/T199079) (owner: Jcrespo)
[16:14:01] jynus: it's part of the etherpad we were using for the meeting
[16:14:06] line 72
[16:14:12] and 87
[16:14:12] let me see
[16:14:32] see, there are more changes
[16:14:39] please add a note for the zarcillo one
[16:14:42] it was a good reminder
[16:15:57] I intend to help
[16:16:05] I can do those once you port the existing ones
[16:16:50] those meetings were supposed to do exactly that, go over the list and check the changes needed since last time :)
[16:17:21] yes, but I figured these right now
[16:17:33] actually, zarcillo natively supports multi-dc
[16:19:39] I have added it to the etherpad and will abandon 454583 when we test those working
[16:19:55] (CR) jenkins-bot: db-eqiad.php: Depool db1096:3316 [mediawiki-config] - https://gerrit.wikimedia.org/r/454579 (owner: Marostegui)
[16:20:05] akc
[16:20:07] *ack
[16:20:18] thanks for checking
[16:21:09] !log prometheus-trafficserver-exporter 0.0.2-1 uploaded to stretch-wikimedia T202381
[16:21:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:14] T202381: Traffic Server - Prometheus integration - https://phabricator.wikimedia.org/T202381
[16:23:18] (CR) Jcrespo: [C: -2] "Not for deployment, just a TODO." [switchdc] - https://gerrit.wikimedia.org/r/454583 (https://phabricator.wikimedia.org/T199079) (owner: Jcrespo)
[16:26:08] Operations, Maps, Maps-Sprint, Reading-Infrastructure-Team-Backlog, Traffic: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (Gehel) for reference, the mediawiki implementation of cache invalidation: https://github.com/wikimedia/mediawiki/blob/0ac1ee6...
[16:26:27] (03PS1) 10Jgreen: Simplify frbast* A/PTR/CNAME scheme, add payments-listener.frdev.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/454587 (https://phabricator.wikimedia.org/T196417) [16:26:47] (03CR) 10Bartosz Dziewoński: [C: 031] Modify gender namespaces for pl.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454213 (https://phabricator.wikimedia.org/T202347) (owner: 10MarcoAurelio) [16:28:54] 10Operations, 10Wikimedia-Mailing-lists, 10Chinese-Sites: Create mailing list for Bureaucrat of zh.wikipedia - https://phabricator.wikimedia.org/T202435 (10MarcoAurelio) [16:36:08] 10Operations, 10Wikimedia-Mailing-lists: Wikimedia Community User Group Albania mailing list request - https://phabricator.wikimedia.org/T201670 (10Sidorela) @Aklapper no we are not on the same IP, noone was able to register yet. @Dzahn yes I tried from the subscriber form there, from different PCs and diff... [16:38:58] 10Operations, 10Wikimedia-Mailing-lists: Wikimedia Community User Group Albania mailing list request - https://phabricator.wikimedia.org/T201670 (10Dzahn) @herron Can you think of any antivandalism measures triggering this? Maybe it affects all of a large Albanian provider? 
[16:41:08] (03PS1) 10DCausse: [cirrus] Increase number of shards for wikidata content and commons file [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454588 [16:47:14] 10Operations, 10CommRel-Specialists-Support (Jul-Sep-2018), 10User-Johan: Community Relations support for the 2018 data center switchover - https://phabricator.wikimedia.org/T199676 (10akosiaris) [16:47:59] !log repair on ms-be2020 sdh/sdc - T199198 [16:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:48:05] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [16:48:10] 10Operations, 10User-notice: 2018 data center switchover: Move all the things over to codfw - https://phabricator.wikimedia.org/T200022 (10akosiaris) FYI, the definite schedule for this is: **Switchover** ``` Services: Tuesday, September 11th 2018 14:30 UTC Media storage/Swift: Tuesday, September 11th 2018 15... [16:48:52] (03PS2) 10Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) [16:49:59] (03CR) 10Mforns: "Not sure if that was your suggestion Andrew, lmk" [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [16:51:07] (03PS1) 10MarcoAurelio: Require autoconfirmed status to edit the 828 namespace at es.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454590 (https://phabricator.wikimedia.org/T202555) [16:51:11] !log correction, ms-be2040 - T199198 [16:51:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:51:33] !log repair on ms-be2042 sdk/sdn - T199198 [16:51:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:16] (03CR) 10Mforns: Send an alert email when EventLoggingSanitization job fails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454562 
(https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [16:54:07] (03CR) 10Filippo Giunchedi: mariadb: Capture connection error exceptions (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454509 (https://phabricator.wikimedia.org/T198987) (owner: 10Jcrespo) [16:54:13] jouncebot: next [16:54:13] In 2 hour(s) and 5 minute(s): MediaWiki train - Americas version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1900) [16:54:15] (03PS1) 10Dzahn: conftool/client: rm 'obsolete distribution check in ubuntu <= trusty' [puppet] - 10https://gerrit.wikimedia.org/r/454592 [16:55:01] damn, I arrive 6 minutes before the finishing of the swat [16:55:07] :) [16:56:35] (03CR) 10Dzahn: [C: 04-1] "works on backend, fails on frontend: Duplicate declaration: File[/etc/apache2/mods-available/status.conf] is already declared" [puppet] - 10https://gerrit.wikimedia.org/r/451821 (owner: 10Dzahn) [16:56:46] jouncebot: now [16:56:46] For the next 0 hour(s) and 3 minute(s): Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1600) [16:57:10] (03PS3) 10Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) [16:57:38] RECOVERY - Filesystem available is greater than filesystem size on ms-be2040 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2040&var-datasource=codfw%2520prometheus%252Fops [16:58:26] !log start backfilling metrics from graphite1001 into graphite1004 - T196484 [16:58:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:31] T196484: rack/setup/install graphite1004 - https://phabricator.wikimedia.org/T196484 [16:59:56] (03PS4) 10Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) [17:00:40] (03CR) 10jerkins-bot: [V: 04-1] Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [17:00:56] (03PS5) 10Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) [17:01:17] mutante: the timezones :) thankfully it can wait [17:04:11] 10Operations, 10ops-eqiad, 10decommission, 10netops: unrack/decom pfw1-eqiad and pfw2-eqiad - https://phabricator.wikimedia.org/T183390 (10ayounsi) [17:06:06] Anyone deploying? There's a train blocker I'd be keen to sling out. [17:07:49] (03CR) 10Mobrovac: [C: 031] Switch services to MW connection to https. [puppet] - 10https://gerrit.wikimedia.org/r/454574 (owner: 10Ppchelko) [17:08:51] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight) [17:10:06] James_F: You can deploy now! [17:10:22] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Harej) [17:10:28] (03CR) 10Ottomata: "Looks good! 
A nit about naming (can't have an otto review without one :) )" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [17:10:30] Kk. [17:10:44] James_F: I can do it if you like. :) [17:10:57] Niharika: I'll do it, no worries. [17:11:21] (Y) [17:12:18] 10Operations, 10Analytics: Allow ganeti instance inside of the Analytics VLAN; move analytics-tool* to it and change IPs. - https://phabricator.wikimedia.org/T202559 (10Ottomata) p:05Triage>03Normal [17:15:57] PROBLEM - High lag on wdqs1003 is CRITICAL: 3605 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [17:16:46] (03CR) 10Jgreen: [C: 032] Simplify frbast* A/PTR/CNAME scheme, add payments-listener.frdev.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/454587 (https://phabricator.wikimedia.org/T196417) (owner: 10Jgreen) [17:19:08] !log authdns-update to deploy commit If54277153b6 [17:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:41] !log ppchelko@deploy1001 Started deploy [cpjobqueue/deploy@32a81be]: Revisit jobs concurrencies T202107 [17:20:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:46] T202107: Job queue should not overload the DB servers when there is replication lag - https://phabricator.wikimedia.org/T202107 [17:21:15] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10RobH) Email to EQ: > Shaun, > > Our Equinix portal lists you as our account rep for SG3, so I'm hoping you can assist me in a recent issue we're having. > > We have a defecti... 
[17:21:23] 10Operations, 10ops-eqsin, 10Traffic: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (10RobH) a:03RobH [17:21:30] !log ppchelko@deploy1001 Finished deploy [cpjobqueue/deploy@32a81be]: Revisit jobs concurrencies T202107 (duration: 00m 49s) [17:21:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:24:14] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10Cmjohnson) Moved cloudvirt1023 to B1 ports ge-1/0/8 and ge-1/0/10 and cloudvirt1024 to B8 ports ge-8/0/22 and 8/0/23. BIOS will need updating to enable... [17:27:00] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) a:05Cmjohnson>03RobH [17:27:44] (03PS6) 10Mforns: Send an alert email when EventLoggingSanitization job fails [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) [17:31:26] (03CR) 10Mforns: Send an alert email when EventLoggingSanitization job fails (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454562 (https://phabricator.wikimedia.org/T202429) (owner: 10Mforns) [17:31:37] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/JsonConfig/includes/JCSingleton.php: T202469 fix UBN trainblocker for wmf.18 (duration: 00m 56s) [17:31:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:31:42] T202469: jsonconfig-license-notice-license-unset showing up on non-tabular data pages - https://phabricator.wikimedia.org/T202469 [17:32:22] (03PS1) 10Jgreen: monitoring swap: frbast2001.frack.codfw.wmnet replaces rigel.frack.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/454597 (https://phabricator.wikimedia.org/T196417) [17:32:50] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup 
frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (10Jgreen) [17:33:35] (03CR) 10Jgreen: [C: 032] monitoring swap: frbast2001.frack.codfw.wmnet replaces rigel.frack.codfw.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/454597 (https://phabricator.wikimedia.org/T196417) (owner: 10Jgreen) [17:34:52] OK, all looks good, I give up the conch. [17:38:42] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (10Jgreen) [17:42:29] anybody knows why wikidata is so slow today? getting a lot of timeouts from recentchanges API [17:43:25] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T196507 (10RobH) I'm currently downloading this, and will shove it into a directory on install1002 when done. This SPP can be used on all of this gen HP systems as well. [17:43:56] 10Operations, 10Discovery-Search (Current work), 10Patch-For-Review: mjolnir-kafka-bulk-daemon failed on all elastic / eqiad nodes - https://phabricator.wikimedia.org/T202120 (10EBernhardson) A full export ran over the daemon from 8/20 13:24 to 8/22 02:00 without triggering this issue again. I think it can b... [17:45:51] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@6fdae90]: updates to support msearch daemon [17:45:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:50:07] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@6fdae90]: updates to support msearch daemon (duration: 04m 16s) [17:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:52:18] RECOVERY - Filesystem available is greater than filesystem size on ms-be2042 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [17:52:46] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) - https://phabricator.wikimedia.org/T202563 (10MarcoAurelio) [17:53:03] 10Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 10HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (10matmarex) >>! In T191184#4523999, @Arlolra wrote: >> Host: sv.wikipedia.beta.wmflabs.org. is not in the cert > > ``` > ssh deployme... [17:53:23] (03CR) 10Smalyshev: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [17:56:03] 10Operations, 10Beta-Cluster-Infrastructure, 10Traffic, 10HTTPS: https://sv.wikipedia.beta.wmflabs.org/ has invalid certificate - https://phabricator.wikimedia.org/T202564 (10Krenair) Pretty much the same thing as {T199387} [17:57:23] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (10Jgreen) [17:57:53] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (10Jgreen) [18:00:56] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: decom rigel.frack.codfw.wmnet - https://phabricator.wikimedia.org/T202535 (10Jgreen) a:05Jgreen>03None @papaul this host is ready for decommissioning [18:02:37] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10Aklapper) @MarcoAurelio: Basically https://www.mediawiki.org/w/index.php?title=Phabricator%2FHelp%2FHerald_Rules&type=revision&diff=2837388&oldid=2647356 It's n... 
[18:07:02] 10Operations, 10Puppet, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (10Smalyshev) 05Open>03Resolved I switched it to profile::microsites::httpd, seems to be w... [18:08:41] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@ebb466b]: Add prometheus-client based metrics export [18:08:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:47] (03CR) 10ArielGlenn: "OK! Would you want to prep the followup commit of the cron manifest update?" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [18:09:56] 10Operations, 10Maps, 10Maps-Sprint, 10Reading-Infrastructure-Team-Backlog, 10Traffic: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10Mholloway) a:05Pnorman>03Mholloway [18:10:17] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@ebb466b]: Add prometheus-client based metrics export (duration: 01m 36s) [18:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:25] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@00671e6]: Add prometheus-client based metrics export [18:10:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:36] 10Operations, 10Maps, 10Maps-Sprint, 10Traffic, 10Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10Mholloway) [18:11:48] (03PS1) 10Framawiki: Enable Wikilove extension on zh_yuewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454611 (https://phabricator.wikimedia.org/T202548) [18:14:47] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@00671e6]: Add prometheus-client based metrics export (duration: 04m 22s) [18:14:50] Logged 
the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:29] (03PS1) 10Jgreen: adjust smokeping targets for frack bastion hostname changes [puppet] - 10https://gerrit.wikimedia.org/r/454612 (https://phabricator.wikimedia.org/T202535) [18:16:33] 10Operations, 10Traffic, 10monitoring: False alarms on varnish-http-requests 70% GET drop in 30 min alert - https://phabricator.wikimedia.org/T201630 (10fgiunchedi) If we can avoid false positives I believe the alert has value, also because AIUI a traffic drop might not necessarily result in visible errors o... [18:17:24] (03CR) 10Jgreen: [C: 032] adjust smokeping targets for frack bastion hostname changes [puppet] - 10https://gerrit.wikimedia.org/r/454612 (https://phabricator.wikimedia.org/T202535) (owner: 10Jgreen) [18:17:55] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10chasemp) >>! In T199125#4521647, @Andrew wrote: > @chasemp two questions: 1) was there a reason we requested these with 10G? (Or, did we?) 2) Is it i... 
[18:19:08] (03PS1) 10Ayounsi: Per DC alerting on sudden traffic drop [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) [18:25:38] (03PS1) 10Framawiki: Add mhs.ox.ac.uk to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454616 (https://phabricator.wikimedia.org/T201604) [18:28:55] (03CR) 10Framawiki: "Note that maintenance script run is needed, see https://www.mediawiki.org/wiki/Extension:WikiLove#Installation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454611 (https://phabricator.wikimedia.org/T202548) (owner: 10Framawiki) [18:29:43] (03PS2) 10Framawiki: Enable Wikilove extension on zh_yuewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454611 (https://phabricator.wikimedia.org/T202548) [18:32:36] (03CR) 10Filippo Giunchedi: "See inline, LGTM modulo TODO and copy/paste comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) (owner: 10Ayounsi) [18:34:58] (03PS1) 10Smalyshev: Use proper data types for indexing items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) [18:36:07] (03CR) 10jerkins-bot: [V: 04-1] Use proper data types for indexing items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) (owner: 10Smalyshev) [18:40:29] (03PS2) 10Ayounsi: Per DC alerting on sudden traffic drop [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) [18:41:18] (03CR) 10Ayounsi: "> Patch Set 1:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/454613 (https://phabricator.wikimedia.org/T201630) (owner: 10Ayounsi) [18:41:25] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) Ok, switch port update diff: ``` robh@asw2-b-eqiad# show | compare [edit interfaces 
interface-range disabled] member ge-3/0/2 { ... } +... [18:42:06] 10Operations, 10Traffic, 10Maps (Tilerator): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (10Mholloway) a:03Mholloway [18:42:23] 10Operations, 10Maps-Sprint, 10Traffic, 10Maps (Tilerator): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (10Mholloway) [18:42:29] PROBLEM - swift-account-reaper on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:42:30] PROBLEM - swift-object-replicator on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:42:41] 10Operations, 10Maps-Sprint, 10Traffic, 10Maps (Tilerator), 10Reading-Infrastructure-Team-Backlog (Kanban): Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (10Mholloway) [18:42:46] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10Patch-For-Review: Rack/Setup frbast2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T196417 (10Jgreen) 05Open>03Resolved [18:42:49] PROBLEM - swift-container-server on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [18:42:50] PROBLEM - swift-object-auditor on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [18:42:59] PROBLEM - swift-account-server on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [18:43:00] PROBLEM - swift-container-updater on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:43:09] PROBLEM - swift-container-auditor on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:43:19] PROBLEM - swift-object-updater on ms-be2040 is CRITICAL: 
PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:43:29] PROBLEM - swift-account-auditor on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:43:29] PROBLEM - swift-container-replicator on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:43:29] PROBLEM - swift-account-replicator on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [18:43:29] PROBLEM - swift-object-server on ms-be2040 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:48:39] (03PS2) 10Smalyshev: Use proper data types for indexing items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) [18:48:49] PROBLEM - swift-object-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [18:48:59] PROBLEM - swift-object-server on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [18:49:00] PROBLEM - swift-container-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [18:49:00] PROBLEM - swift-container-auditor on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [18:49:00] PROBLEM - swift-account-auditor on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [18:49:00] PROBLEM - swift-account-reaper on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [18:49:09] PROBLEM - swift-account-replicator on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 
processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [18:49:10] PROBLEM - swift-object-auditor on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [18:49:10] PROBLEM - swift-container-updater on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [18:49:12] ugh sorry [18:49:16] expired downtime [18:49:19] PROBLEM - swift-object-updater on ms-be2042 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [18:49:24] fixed [18:52:23] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10daniel) I'd like to comment on something that @mark said three weeks ago: > As I understand it, several a... [18:54:49] 10Operations, 10Growth-Team, 10Mail, 10Notifications, 10User-herron: SRE query: Is it possible to measure how many e-mails are sent to "black hole" e-mail addresses? - https://phabricator.wikimedia.org/T202329 (10herron) Yes, messages using the blackhole transport are indeed logged. What timeframe are y... [18:59:15] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Karen Brown - https://phabricator.wikimedia.org/T201668 (10Kbrown) Hi there. Hopefully I'm doing this right - please let me know if I've borked it: > ssh-rsa AAAAB... [19:00:04] thcipriani: Dear deployers, time to do the MediaWiki train - Americas version deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T1900). 
[19:00:09] PROBLEM - High lag on wdqs1003 is CRITICAL: 3626 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [19:00:15] * thcipriani on top of things [19:00:29] 10Operations, 10ops-eqiad: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T196507 (10RobH) a:05RobH>03Cmjohnson [19:01:02] 10Operations, 10Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (10jijiki) [19:01:36] !log re-running populateContentTables.php on cawiki for T183488 [19:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:41] T183488: MCR schema migration stage 2: populate new fields - https://phabricator.wikimedia.org/T183488 [19:04:29] PROBLEM - High lag on wdqs1003 is CRITICAL: 3656 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [19:07:26] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10daniel) Another thing: it's unclear to me how judgments are going to be used. Is it enough to be able to... [19:09:30] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Ladsgroup) >>! In T200297#4524312, @daniel wrote: > //However//, this does not at all address the primary... 
[19:10:31] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10Aklapper) [19:10:33] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Ladsgroup) oh I forgot to mention that custom tables need to be built anyway as raw storage are not query... [19:13:30] (03PS1) 10Jijiki: icinga: Added jijiki to icinga authorizations [puppet] - 10https://gerrit.wikimedia.org/r/454628 (https://phabricator.wikimedia.org/T201816) [19:14:44] (03CR) 10Dzahn: [C: 031] "yea, the non-capitalized version of the "cn" field in LDAP and it matches the Icinga contact name in private repo.. this one we usually ge" [puppet] - 10https://gerrit.wikimedia.org/r/454628 (https://phabricator.wikimedia.org/T201816) (owner: 10Jijiki) [19:15:29] (03CR) 10Jijiki: [C: 031] icinga: Added jijiki to icinga authorizations [puppet] - 10https://gerrit.wikimedia.org/r/454628 (https://phabricator.wikimedia.org/T201816) (owner: 10Jijiki) [19:17:16] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) >>! In T199125#4523936, @Cmjohnson wrote: > Moved cloudvirt1023 to B1 ports ge-1/0/8 and ge-1/0/10 and cloudvirt1024 to B8 ports ge-8/0/22 and 8/0... [19:20:31] (03CR) 10Jijiki: [C: 032] icinga: Added jijiki to icinga authorizations [puppet] - 10https://gerrit.wikimedia.org/r/454628 (https://phabricator.wikimedia.org/T201816) (owner: 10Jijiki) [19:23:05] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight) >>! In T200297#4524386, @Ladsgroup wrote: > oh I forgot to mention that custom tables need to be... 
[19:24:59] RECOVERY - swift-account-auditor on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [19:24:59] RECOVERY - swift-container-replicator on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [19:24:59] RECOVERY - swift-account-replicator on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [19:25:00] RECOVERY - swift-object-server on ms-be2040 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [19:25:09] RECOVERY - swift-account-reaper on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [19:25:10] RECOVERY - swift-object-replicator on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [19:25:29] RECOVERY - swift-container-server on ms-be2040 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [19:25:30] RECOVERY - swift-object-auditor on ms-be2040 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [19:25:30] RECOVERY - swift-account-server on ms-be2040 is OK: PROCS OK: 49 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [19:25:40] RECOVERY - swift-container-updater on ms-be2040 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [19:25:49] RECOVERY - swift-container-auditor on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [19:25:59] RECOVERY - swift-object-updater on ms-be2040 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [19:29:40] ACKNOWLEDGEMENT - High lag on wdqs1003 is CRITICAL: 3932 ge 3600 Gehel Bot overload on wikidata, not much to do except wait. 
https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [19:29:45] RoanKattouw: I've pulled your flow change over to mwdebug1002 if you've got a few minutes and are inclined to test. [19:36:06] seems to work looking at the pages to view from the task, so going live [19:37:48] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10daniel) > For the initial release, we aren't building any of this machinery however, we'll simply provide... [19:38:52] !log thcipriani@deploy1001 Synchronized php-1.32.0-wmf.18/extensions/Flow/includes/TemplateHelper.php: [[gerrit:454626|Work around exception in DifferenceEngine::showDiffStyle()]] T202454 (duration: 00m 57s) [19:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:57] T202454: Call to a member function getRevisionRecord() on a non-object (boolean) - https://phabricator.wikimedia.org/T202454 [19:40:29] 10Operations, 10Puppet, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (10Dzahn) Thank you very much @Smalyshev ! (There are still more VPS projects to convert fr... 
[19:46:17] (03PS1) 10Thcipriani: group1 wikis to 1.32.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454633 [19:46:19] (03CR) 10Thcipriani: [C: 032] group1 wikis to 1.32.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454633 (owner: 10Thcipriani) [19:47:24] 10Operations, 10Puppet, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team: convert cloud VPS projects from apache to httpd module (wikidata-query/ldfclient) - https://phabricator.wikimedia.org/T202092 (10Dzahn) [19:49:13] (03PS4) 10Dzahn: simplelamp: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/415510 (https://phabricator.wikimedia.org/T202574) [19:50:05] (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454633 (owner: 10Thcipriani) [19:50:50] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight) >>! In T200297#4524436, @daniel wrote: >> For the initial release, we aren't building any of this... 
[19:54:40] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.32.0-wmf.18 [19:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:35] !log thcipriani@deploy1001 Synchronized php: group1 wikis to 1.32.0-wmf.18 (duration: 00m 54s) [19:55:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:34] CUSTOM - Host analytics1038 is UP: PING OK - Packet loss = 0%, RTA = 0.62 ms [19:56:59] ^ test to make sure Effie has permissions :) works [19:57:26] test tets [19:57:29] :D [19:58:10] the one that _always_ fails at first, but works here :) [19:58:51] 10Operations, 10Patch-For-Review: Onboarding Effie Mouzeli - https://phabricator.wikimedia.org/T201816 (10jijiki) [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: Time to snap out of that daydream and deploy Services – Parsoid / Citoid / Mobileapps / ORES / …. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T2000). 
[20:00:33] I'm going to deploy some stuff about ores [20:04:30] (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.18 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454633 (owner: 10Thcipriani) [20:05:40] 10Operations, 10Cloud-VPS, 10Patch-For-Review: convert cloud VPS projects from apache to httpd module - https://phabricator.wikimedia.org/T202574 (10Dzahn) [20:05:51] (03CR) 10Smalyshev: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [20:05:55] !log ladsgroup@deploy1001 Started deploy [ores/deploy@8ff4da1]: Update ORES to use git lfs for model T197097 [20:05:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:00] T197097: Migrate models to LFS - https://phabricator.wikimedia.org/T197097 [20:07:03] !log ebernhardson@deploy1001 Started deploy [search/mjolnir/deploy@8dd6d74]: bump to master, msearch fixes [20:07:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:44] !log ebernhardson@deploy1001 Finished deploy [search/mjolnir/deploy@8dd6d74]: bump to master, msearch fixes (duration: 03m 41s) [20:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:09] canary looks fine, moving forward [20:12:33] (03CR) 10Smalyshev: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [20:16:58] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10awight) I should also say, this is the type of (greatly simplified) query I expect once we're ready to in... 
[20:20:12] (03PS1) 10EBernhardson: Collect prometheus metrics from mjolnir [puppet] - 10https://gerrit.wikimedia.org/r/454644 [20:20:50] (03CR) 10jerkins-bot: [V: 04-1] Collect prometheus metrics from mjolnir [puppet] - 10https://gerrit.wikimedia.org/r/454644 (owner: 10EBernhardson) [20:24:54] !log thcipriani@deploy1001 rebuilt and synchronized wikiversions files: Revert "Group1 wikis to 1.32.0-wmf.18" [20:24:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:03] !log thcipriani@deploy1001 Synchronized php: Revert "Group1 wikis to 1.32.0-wmf.18" (duration: 00m 54s) [20:26:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:26:39] (03PS1) 10Thcipriani: Revert "group1 wikis to 1.32.0-wmf.18" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454645 [20:28:38] !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@f7fa1df]: Update mobileapps to 141ff20 (T202105 T202237) [20:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:28:44] T202237: Enable FA feed for sd.wikipedia in Android app - https://phabricator.wikimedia.org/T202237 [20:28:44] T202105: Separate pagelib CSS from base CSS - https://phabricator.wikimedia.org/T202105 [20:29:14] (03CR) 10Thcipriani: [C: 032] Revert "group1 wikis to 1.32.0-wmf.18" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454645 (owner: 10Thcipriani) [20:29:42] :( [20:30:26] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.32.0-wmf.18" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454645 (owner: 10Thcipriani) [20:31:04] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:31:53] related to the deploy ^ ? 
[20:32:20] !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@f7fa1df]: Update mobileapps to 141ff20 (T202105 T202237) (duration: 03m 43s) [20:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:14] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:37:12] godog: my deploy was a rollback for train, I hadn't seen any memcached errors. Unclear. [20:37:37] !log ladsgroup@deploy1001 Finished deploy [ores/deploy@8ff4da1]: Update ORES to use git lfs for model T197097 (duration: 31m 42s) [20:37:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:42] T197097: Migrate models to LFS - https://phabricator.wikimedia.org/T197097 [20:38:07] thcipriani: ack, thanks ! [20:39:34] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:39:53] (03PS2) 10EBernhardson: Collect prometheus metrics from mjolnir [puppet] - 10https://gerrit.wikimedia.org/r/454644 [20:40:23] RECOVERY - swift-object-updater on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [20:40:53] RECOVERY - swift-object-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [20:41:03] (03PS1) 10RobH: cloudvirt102[34] mac address update [puppet] - 10https://gerrit.wikimedia.org/r/454692 (https://phabricator.wikimedia.org/T199125) [20:41:04] RECOVERY - swift-object-server on ms-be2042 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [20:41:04] RECOVERY - swift-container-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args 
^/usr/bin/python /usr/bin/swift-container-replicator [20:41:13] RECOVERY - swift-account-auditor on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [20:41:13] RECOVERY - swift-account-reaper on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [20:41:13] RECOVERY - swift-container-auditor on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [20:41:14] RECOVERY - swift-account-replicator on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [20:41:14] RECOVERY - swift-object-auditor on ms-be2042 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [20:41:14] RECOVERY - swift-container-updater on ms-be2042 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [20:41:24] (03CR) 10RobH: [C: 032] cloudvirt102[34] mac address update [puppet] - 10https://gerrit.wikimedia.org/r/454692 (https://phabricator.wikimedia.org/T199125) (owner: 10RobH) [20:41:26] that's me ^ [20:41:44] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:42:38] 10Operations, 10Wikimedia-Mailing-lists: Wikimedia Community User Group Albania mailing list request - https://phabricator.wikimedia.org/T201670 (10herron) There is also an RBL check that will refuse subscription requests from IP addresses that are listed in spam blocklists. If you could provide a few of the... [20:44:02] (03CR) 10jenkins-bot: Revert "group1 wikis to 1.32.0-wmf.18" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454645 (owner: 10Thcipriani) [20:45:15] oh, problems with wmf.18? [20:46:43] yeah, added https://phabricator.wikimedia.org/T202580 as a blocker. 
[20:50:34] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is CRITICAL: cluster=cache_text site=eqsin https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [20:50:42] oh [20:52:02] 10Operations, 10Maps, 10Maps-Sprint, 10Traffic, and 2 others: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10Mholloway) Deployed to beta cluster. Note that we won't be able to deploy the updated max-age to production until the production upgrade to Stretch (T1... [20:53:43] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [21:00:16] 10Operations, 10Maps, 10Maps-Sprint, 10Traffic, 10Reading-Infrastructure-Team-Backlog (Kanban): Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732 (10Mholloway) [21:05:05] 10Operations, 10Wikimedia-Logstash, 10Goal, 10Patch-For-Review, and 2 others: Shorten logstash retention temporarily - https://phabricator.wikimedia.org/T201971 (10fgiunchedi) >>! In T201971#4517412, @fgiunchedi wrote: > Thanks for your help on investigating this everyone! Very helpful insights. > > As it... [21:06:55] thcipriani: Thanks for deploying that. I was stuck in meetings and didn't see your ping until just now [21:07:14] RoanKattouw: sure thing! Thank you for the patch! [21:12:55] 10Operations, 10Growth-Team, 10Mail, 10Notifications, 10User-herron: SRE query: Is it possible to measure how many e-mails are sent to "black hole" e-mail addresses? - https://phabricator.wikimedia.org/T202329 (10MMiller_WMF) @Jdforrester-WMF -- to put some specific around this, here's what I would like... 
[21:14:47] 10Operations, 10Growth-Team, 10Mail, 10Notifications, 10User-herron: SRE query: Is it possible to measure how many e-mails are sent to "black hole" e-mail addresses? - https://phabricator.wikimedia.org/T202329 (10MMiller_WMF) Adding Morten, our team's data analyst, as this involves data. [21:18:14] wow, we can probably add him to everything, then [21:20:55] 10Operations, 10media-storage: cleanupUploadStash.php / swift-codfw backend-fail-delete / cron spam - https://phabricator.wikimedia.org/T202584 (10Dzahn) [21:22:06] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) So these successfully boot into the jessie installer, but then seem to lack the support for the Perc H740P hw raid controlle... [21:24:50] (03PS2) 10Dzahn: Split releasers-wikidiff2 from releasers-mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: 10Legoktm) [21:26:41] (03CR) 10Dzahn: "taking this, will check on releases servers the file" [puppet] - 10https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: 10Legoktm) [21:27:41] 10Operations, 10Electron-PDFs, 10Proton, 10Patch-For-Review, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748 (10Jdlrobson) [21:28:03] (03CR) 10Dzahn: [C: 032] Split releasers-wikidiff2 from releasers-mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: 10Legoktm) [21:31:08] well.. that will fail on releases* but on it [21:32:44] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/srv/org/wikimedia/releases/wikidiff2] [21:33:01] (03PS1) 10Filippo Giunchedi: graphite: install dummy carbonate.conf [puppet] - 10https://gerrit.wikimedia.org/r/454695 (https://phabricator.wikimedia.org/T196484) [21:33:37] (03CR) 10jerkins-bot: [V: 04-1] graphite: install dummy carbonate.conf [puppet] - 10https://gerrit.wikimedia.org/r/454695 (https://phabricator.wikimedia.org/T196484) (owner: 10Filippo Giunchedi) [21:34:37] (03PS1) 10Dzahn: releases: set owner of wikidiff2 releases back to mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454696 (https://phabricator.wikimedia.org/T202473) [21:35:14] PROBLEM - puppet last run on releases1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/srv/org/wikimedia/releases/wikidiff2] [21:35:18] (03CR) 10jerkins-bot: [V: 04-1] releases: set owner of wikidiff2 releases back to mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454696 (https://phabricator.wikimedia.org/T202473) (owner: 10Dzahn) [21:36:06] (03PS2) 10Dzahn: releases: set owner of wikidiff2 releases back to mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454696 (https://phabricator.wikimedia.org/T202473) [21:36:27] (03CR) 10Dzahn: [C: 032] "needs follow-up" [puppet] - 10https://gerrit.wikimedia.org/r/454442 (https://phabricator.wikimedia.org/T202473) (owner: 10Legoktm) [21:36:51] (03CR) 10Dzahn: [C: 032] releases: set owner of wikidiff2 releases back to mediawiki [puppet] - 10https://gerrit.wikimedia.org/r/454696 (https://phabricator.wikimedia.org/T202473) (owner: 10Dzahn) [21:37:09] (03PS2) 10Filippo Giunchedi: graphite: install dummy carbonate.conf [puppet] - 10https://gerrit.wikimedia.org/r/454695 (https://phabricator.wikimedia.org/T196484) [21:40:13] (03PS1) 10Dzahn: releases: add new group releasers-wikidiff2 to role [puppet] - 10https://gerrit.wikimedia.org/r/454698 (https://phabricator.wikimedia.org/T202473) [21:40:23] 
RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:40:24] (03CR) 10Filippo Giunchedi: [C: 032] "PCC https://puppet-compiler.wmflabs.org/compiler02/12181/" [puppet] - 10https://gerrit.wikimedia.org/r/454695 (https://phabricator.wikimedia.org/T196484) (owner: 10Filippo Giunchedi) [21:40:42] (03PS3) 10Filippo Giunchedi: graphite: install dummy carbonate.conf [puppet] - 10https://gerrit.wikimedia.org/r/454695 (https://phabricator.wikimedia.org/T196484) [21:41:55] (03CR) 10Dzahn: [C: 032] "no new access request are involved here, users already on these hosts from the mediawiki role" [puppet] - 10https://gerrit.wikimedia.org/r/454698 (https://phabricator.wikimedia.org/T202473) (owner: 10Dzahn) [21:42:10] (03PS2) 10Dzahn: releases: add new group releasers-wikidiff2 to role [puppet] - 10https://gerrit.wikimedia.org/r/454698 (https://phabricator.wikimedia.org/T202473) [21:42:53] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [21:44:36] (03CR) 10Dzahn: [C: 032] "this is what actually created the group" [puppet] - 10https://gerrit.wikimedia.org/r/454698 (https://phabricator.wikimedia.org/T202473) (owner: 10Dzahn) [21:45:38] (03PS1) 10Dzahn: Revert "releases: set owner of wikidiff2 releases back to mediawiki" [puppet] - 10https://gerrit.wikimedia.org/r/454701 [21:47:10] (03PS2) 10Dzahn: Revert "releases: set owner of wikidiff2 releases back to mediawiki" [puppet] - 10https://gerrit.wikimedia.org/r/454701 (https://phabricator.wikimedia.org/T202473) [21:47:18] (03PS3) 10Dzahn: Revert "releases: set owner of wikidiff2 releases back to mediawiki" [puppet] - 10https://gerrit.wikimedia.org/r/454701 (https://phabricator.wikimedia.org/T202473) [21:47:29] (03CR) 10Dzahn: [C: 032] "Now that the new group has been created by I35630abf9e969c147 we can revert this change and actually use the new group." 
[puppet] - 10https://gerrit.wikimedia.org/r/454701 (https://phabricator.wikimedia.org/T202473) (owner: 10Dzahn) [21:48:10] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) Same thing in stretch. This is odd, since we must have installed other R440s with H740P controllers.... [21:50:16] (03PS1) 10Andrew Bogott: nova region-migrate: update eqiad/eqiad1 settings [puppet] - 10https://gerrit.wikimedia.org/r/454703 (https://phabricator.wikimedia.org/T191790) [21:51:47] (03CR) 10Andrew Bogott: [C: 032] nova region-migrate: update eqiad/eqiad1 settings [puppet] - 10https://gerrit.wikimedia.org/r/454703 (https://phabricator.wikimedia.org/T191790) (owner: 10Andrew Bogott) [21:52:02] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) We had to create the group first by adding it to the Hiera file for the role, then afterwards use it. After these patches above it is now resol... 
[21:57:41] !log releases1001/2001: sudo find /srv/org/wikimedia/releases/wikidiff2 -name wikidiff2\* -exec chown :releasers-wikidiff2 {} \; (T202473) [21:57:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:57:47] T202473: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 [21:58:49] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10Dzahn) [21:58:51] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) 05Open>03Resolved a:03Dzahn [21:59:25] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) [22:00:12] 10Operations, 10Traffic, 10monitoring, 10Patch-For-Review: False alarms on varnish-http-requests 70% GET drop in 30 min alert - https://phabricator.wikimedia.org/T201630 (10ayounsi) a:03ayounsi [22:01:34] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) [22:01:36] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10Dzahn) [22:01:39] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) [22:01:41] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10Dzahn) [22:02:01] 10Operations, 10wikidiff2, 
10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) [22:02:04] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10Dzahn) [22:02:08] 10Operations, 10netops, 10Wikimedia-Incident: asw2-a-eqiad FPC5 gets disconnected every 10 minutes - https://phabricator.wikimedia.org/T201145 (10ayounsi) A chassis reboot cleared that specific issue. [22:02:14] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) [22:02:20] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10Dzahn) [22:02:50] 10Operations, 10wikidiff2, 10Patch-For-Review: Create releasers-wikidiff2 group, split from releasers-mediawiki - https://phabricator.wikimedia.org/T202473 (10Dzahn) [22:05:10] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10RobH) So, these are booting and they have a single raid10 setup of all the disks. However, they get to the point in the installer... [22:05:41] (03CR) 10Dzahn: [C: 04-1] "unless something changed it needs to be 2 files, redirect.conf and redirects.dat, template and generated result" [puppet] - 10https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: 10Reedy) [22:10:50] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10Dzahn) @imarlier There is restbase-roots and restbase-admins. Both have sudo privileges but different levels of it. 
A restbase-admin can control the restbase and cassandr... [22:11:26] (03PS2) 10Dzahn: conftool/client: rm 'obsolete distribution check in ubuntu <= trusty' [puppet] - 10https://gerrit.wikimedia.org/r/454592 [22:14:22] 10Operations, 10Quarry, 10Patch-For-Review, 10cloud-services-team (Kanban): Let quarry use the mariadb module - https://phabricator.wikimedia.org/T181205 (10Framawiki) [22:16:23] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1197 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [22:17:47] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10RobH) [22:20:01] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10RobH) p:05Triage>03Normal [22:21:39] 10Operations, 10SRE-Access-Requests: Access to restbase servers (including sudo) for Imarlier - https://phabricator.wikimedia.org/T202563 (10RobH) a:03Imarlier @Imarlier, We'll need the following from you to make this happen: [] - determination on which group you need. Please select the most restrictive g... 
[22:21:59] (03CR) 10Dzahn: "let me help you with the part that jenkins bot hates:" [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [22:22:30] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) [22:23:40] @seen [22:23:40] mutante: I have never seen [22:23:44] @seen zhuyifei1999 [22:23:44] mutante: I have never seen zhuyifei1999 [22:24:07] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) [22:25:50] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) p:05Triage>03Normal [22:27:50] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) @bpirkle: Please coordinate with your manager to have them approve your access (via comment) on this task. Once they do that, we should have everything w... [22:30:21] (03CR) 10Dzahn: "is this the class that is applied on the instance running the db? 
https://tools.wmflabs.org/openstack-browser/puppetclass/role::labs::qua" [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [22:32:18] (03PS1) 10RobH: adding user Bill Pirkle [puppet] - 10https://gerrit.wikimedia.org/r/454709 (https://phabricator.wikimedia.org/T202546) [22:33:55] (03PS1) 10RobH: adding bill pirkle to groups [puppet] - 10https://gerrit.wikimedia.org/r/454710 (https://phabricator.wikimedia.org/T202546) [22:34:20] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access for Bill Pirkle - https://phabricator.wikimedia.org/T202546 (10RobH) [22:35:13] 10Operations, 10Growth-Team, 10Mail, 10Notifications, 10User-herron: SRE query: Is it possible to measure how many e-mails are sent to "black hole" e-mail addresses? - https://phabricator.wikimedia.org/T202329 (10Jdforrester-WMF) >>! In T202329#4524878, @MMiller_WMF wrote: > @Jdforrester-WMF -- to put so... [22:37:17] (03PS1) 10Dzahn: quarry::database: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454711 [22:37:38] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10RobH) I'm not sure if this should be a #sre-access-requests or a request for the #release-engineering-team to approve (since they are the primary maintainers of... 
[22:38:29] (03CR) 10Dzahn: "this should make it possible for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454481/ to be merged without getting a jenkins dow" [puppet] - 10https://gerrit.wikimedia.org/r/454711 (owner: 10Dzahn) [22:39:44] (03CR) 10Dzahn: [C: 031] Phab: Allow aklapper to delete personal Herald filter rules [puppet] - 10https://gerrit.wikimedia.org/r/448505 (owner: 10Aklapper) [22:40:34] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [22:41:00] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Sunset Watchmouse's status.wikimedia.org - https://phabricator.wikimedia.org/T199816 (10Framawiki) @waldyrious Just note that if Wikipedia was down, people will try to find more information, and Grafana, Wikitech or other alternative spaces... [22:41:02] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (10RobH) [22:41:57] (03PS2) 10Dzahn: quarry::database: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454711 [22:43:00] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/454711/" [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [22:44:54] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [22:45:25] !log temp depooled wdq3 to make it catch up faster [22:45:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:45:41] (03PS1) 10RobH: adding new shell user Kalliope Tsouroupidou [puppet] - 10https://gerrit.wikimedia.org/r/454712 
(https://phabricator.wikimedia.org/T202486) [22:46:10] 10Operations, 10Analytics, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Tune Varnishkafka delivery errors to be more sensitive - https://phabricator.wikimedia.org/T173492 (10Nuria) 05Open>03Resolved [22:46:43] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [22:46:55] (03PS1) 10RobH: adding user ktsouroupidou to groups [puppet] - 10https://gerrit.wikimedia.org/r/454713 (https://phabricator.wikimedia.org/T202486) [22:47:23] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to restricted production access and analytics-privatedata-users for Kalliope Tsouroupidou - https://phabricator.wikimedia.org/T202486 (10RobH) p:05Triage>03Normal [22:48:34] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10RobH) [22:48:53] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [22:49:02] 10Operations, 10Wikimedia-Logstash, 10Patch-For-Review: Logstash packet loss - https://phabricator.wikimedia.org/T200960 (10fgiunchedi) A couple of days ago a sudden spike of syslog udp input caused again packet loss. IOW we have mitigated the common cases but sudden udp surges will still causes loss. Next... 
[22:56:16] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10RobH) p:05Triage>03Normal a:03thiemowmde Please note that @thiemowmde will need to provide/work on some furth... [22:56:20] (03PS1) 10Dzahn: quarry::web: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454715 [22:56:57] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202476 (10RobH) [22:57:17] (03CR) 10jerkins-bot: [V: 04-1] quarry::web: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454715 (owner: 10Dzahn) [22:57:41] (03CR) 10Reedy: "Things have changed" [puppet] - 10https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: 10Reedy) [22:57:43] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10RobH) p:05Triage>03Normal [22:58:50] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10RobH) a:03WMDE-Fisch Please note that @WMDE-Fisch will need to provide/work on some further steps for us to proc... [22:59:24] 10Operations, 10SRE-Access-Requests, 10wikidiff2, 10User-Addshore: Give WMDE-Fisch permission to upload wikidiff2 releases (releasers-wikidiff2) - https://phabricator.wikimedia.org/T202475 (10RobH) [23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180822T2300). Please do the needful. 
[23:00:04] Smalyshev: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:26] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10Aklapper) >>! In T202503#4525188, @RobH wrote: > I'm not sure if this should be a #sre-access-requests @RobH: I don't think so either but was told that "[[ http... [23:00:40] (03CR) 10Dzahn: [C: 031] "oh! thanks, i missed that during vacation. very nice!" [puppet] - 10https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: 10Reedy) [23:00:53] (03PS1) 10Dzahn: quarry::redis: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454717 [23:01:04] (03CR) 10Reedy: "Heh, yeah. I spent a little while working out where redirects.conf had gone :D" [puppet] - 10https://gerrit.wikimedia.org/r/454498 (https://phabricator.wikimedia.org/T202498) (owner: 10Reedy) [23:05:11] here [23:06:03] I can SWAT [23:06:20] (03PS2) 10Dzahn: quarry::web: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454715 [23:06:34] cool [23:06:41] (03PS3) 10Thcipriani: Use proper data types for indexing items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) (owner: 10Smalyshev) [23:06:51] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) (owner: 10Smalyshev) [23:07:10] (03CR) 10jerkins-bot: [V: 04-1] quarry::web: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454715 (owner: 10Dzahn) [23:07:29] (03PS3) 10Dzahn: quarry::database: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454711 [23:08:24] (03Merged) 10jenkins-bot: Use proper data types for indexing items [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) (owner: 10Smalyshev) [23:09:34] (03PS3) 10Dzahn: quarry::web: convert to profile [puppet] - 10https://gerrit.wikimedia.org/r/454715 [23:09:50] SMalyshev: your change is live on mwdebug1002, check if possible please [23:10:30] thcipriani: checking [23:13:02] hmm... still not exactly what I want, but I'm not sure if it's because it's done by jobs which aren't on mwdebug.... [23:13:14] thcipriani: do you know if this also affects jobs? ^ [23:14:00] hrm, I would suspect it does not, but I'm not positive [23:14:24] 10Operations, 10monitoring, 10Patch-For-Review: Netbox: add Icinga check for PostgreSQL - https://phabricator.wikimedia.org/T185504 (10Dzahn) p:05Normal>03High [23:14:43] thcipriani: ok, then I'd say we can deploy it - it doesn't break any stuff that worked, worst case new stuff isn't working but then I'll debug it on terbium [23:15:01] ok, thank you for checking, going live [23:15:09] because I suspect then we can't check the jobs stuff on mwdebug [23:15:39] !log repooled wdq3 after recovery [23:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:16:48] (03CR) 10jenkins-bot: Use proper data types for indexing items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/454620 (https://phabricator.wikimedia.org/T199884) (owner: 10Smalyshev) [23:16:59] !log thcipriani@deploy1001 Synchronized wmf-config/WikibaseSearchSettings.php: SWAT: [[gerrit:454620|Use proper data types for indexing items]] T199884 (duration: 00m 55s) [23:17:03] ^ SMalyshev live now [23:17:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:05] T199884: Support haswbstatement in other properties - https://phabricator.wikimedia.org/T199884 [23:17:23] 10Operations, 10DBA, 10JADE, 10TechCom-RFC, 10Scoring-platform-team (Current): Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 
(10kchapman) TechCom hosted a meeting on this today: Minutes: https://tools.wmflabs.org/meetbot/wikimedia-o... [23:17:36] thcipriani: thanks! [23:17:41] yw :) [23:18:45] yep, it's working in prod now! yay! [23:20:26] nice [23:42:25] (03PS1) 10EBernhardson: Deploy msearch daemon to cirrus servers [puppet] - 10https://gerrit.wikimedia.org/r/454722 (https://phabricator.wikimedia.org/T200740) [23:43:12] (03PS2) 10EBernhardson: Deploy msearch daemon to cirrus servers [puppet] - 10https://gerrit.wikimedia.org/r/454722 (https://phabricator.wikimedia.org/T200740) [23:46:27] 10Operations, 10SRE-Access-Requests: Phabricator: Allow aklapper to delete personal Herald filter rules - https://phabricator.wikimedia.org/T202503 (10RobH) >>! In T202503#4525275, @Aklapper wrote: >>>! In T202503#4525188, @RobH wrote: >> I'm not sure if this should be a #sre-access-requests > @RobH: I don't t... [23:48:22] Is there a way to do group by queries in logstash. e.g. I have a column, I want to know what the top results in that column are?