[00:00:05] twentyafterfour: #bothumor I � Unicode. All rise for Phabricator update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T0000).
[00:10:40] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [36.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[00:11:41] (CR) Dzahn: "lol, it's quite funny how that was my own change and i removed it myself" [puppet] - https://gerrit.wikimedia.org/r/456175 (https://phabricator.wikimedia.org/T199073) (owner: Dzahn)
[00:12:47] (CR) Dzahn: [C: 2] "tried to revert myself in https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/456175/ for other reasons, haha" [puppet] - https://gerrit.wikimedia.org/r/431047 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[00:13:37] (CR) Dzahn: "well, that was needed to migrate the maintenance server within a DC, and this is better if there is only 1 per DC but we want to switch DC" [puppet] - https://gerrit.wikimedia.org/r/456175 (https://phabricator.wikimedia.org/T199073) (owner: Dzahn)
[00:13:38] (Abandoned) Dzahn: mediawiki::maintenance: use mw_primary to enable/disable crons [puppet] - https://gerrit.wikimedia.org/r/456175 (https://phabricator.wikimedia.org/T199073) (owner: Dzahn)
[00:18:03] (PS7) Dzahn: zuul: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724)
[00:19:58] (CR) Dzahn: [C: 2] "http://puppet-compiler.wmflabs.org/12287/contint1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/434427 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[00:20:20] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [36.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[00:23:13] (CR) Dzahn: "noop on contint1001" [puppet] - https://gerrit.wikimedia.org/r/434427
(https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[00:24:40] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [36.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[00:24:58] (CR) Dzahn: [C: -1] "i'm not sure if shinken will move off of trusty any time soon.. so might want to abandon this" [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[00:26:06] (CR) Dzahn: "i'm wondering if this change got superseded by Alex Monk's change and ib3" [puppet] - https://gerrit.wikimedia.org/r/405594 (owner: Paladox)
[00:29:48] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Move internal sites hosted on thorium to ganeti instance(s) - https://phabricator.wikimedia.org/T202011 (Nuria) Open>Resolved
[00:30:13] Operations, Analytics, Analytics-Kanban, netops, Patch-For-Review: Review analytics-in4/6 rules on cr1/cr2 eqiad - https://phabricator.wikimedia.org/T198623 (Nuria) Open>Resolved
[00:31:45] (PS3) Dzahn: piwik: add support for stretch/PHP7 [puppet] - https://gerrit.wikimedia.org/r/453553
[00:31:52] (CR) Dzahn: piwik: add support for stretch/PHP7 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/453553 (owner: Dzahn)
[00:32:54] (PS4) Dzahn: piwik: add support for stretch/PHP7 [puppet] - https://gerrit.wikimedia.org/r/453553
[00:35:35] (PS5) Dzahn: piwik: add support for stretch/PHP7 [puppet] - https://gerrit.wikimedia.org/r/453553
[00:39:59] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/12289/bohrium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/453553 (owner: Dzahn)
[00:42:16] (PS1) Dzahn: systemd::sidekick: replace base_service::unit comment with systemd::service [puppet] - https://gerrit.wikimedia.org/r/456312 (https://phabricator.wikimedia.org/T194724)
[00:46:56] (PS1) Bstorm: labstore: move the load numbers for
our problem children up temporarily [puppet] - https://gerrit.wikimedia.org/r/456314
[00:47:34] (CR) Andrew Bogott: [C: 1] "Hope it's enough" [puppet] - https://gerrit.wikimedia.org/r/456314 (owner: Bstorm)
[00:47:56] (PS2) Bstorm: labstore: move the load numbers for our problem children up temporarily [puppet] - https://gerrit.wikimedia.org/r/456314
[00:48:51] (CR) Bstorm: [C: 2] labstore: move the load numbers for our problem children up temporarily [puppet] - https://gerrit.wikimedia.org/r/456314 (owner: Bstorm)
[00:51:08] RECOVERY - Check systemd state on cloudservices1004 is OK: OK - running: The system is fully operational
[00:52:39] RECOVERY - High load average on labstore1004 is OK: OK: Less than 50.00% above the threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[00:53:01] (PS1) Dzahn: apertium: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724)
[00:53:51] (CR) jerkins-bot: [V: -1] apertium: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[00:54:45] (PS2) Dzahn: apertium: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724)
[00:55:46] (CR) jerkins-bot: [V: -1] apertium: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[01:00:13] (PS3) Dzahn: apertium: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456316 (https://phabricator.wikimedia.org/T194724)
[01:04:18] (PS1) Dzahn: confd: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456317 (https://phabricator.wikimedia.org/T194724)
[01:10:36] (PS1) Dzahn: hhvm: base::service_unit ->
systemd::service [puppet] - https://gerrit.wikimedia.org/r/456319 (https://phabricator.wikimedia.org/T194724)
[01:11:13] (CR) jerkins-bot: [V: -1] hhvm: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456319 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[01:13:57] (PS2) Dzahn: hhvm: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/456319 (https://phabricator.wikimedia.org/T194724)
[01:20:12] (PS1) Dzahn: smokeping: replace radon with dnsauth1001 as a target [puppet] - https://gerrit.wikimedia.org/r/456320 (https://phabricator.wikimedia.org/T202040)
[01:24:03] (CR) Dzahn: "i am not sure if "C4" is actually right. these are virtual now, right. does another target in C4 make more sense than replacing it with dn" [puppet] - https://gerrit.wikimedia.org/r/456320 (https://phabricator.wikimedia.org/T202040) (owner: Dzahn)
[01:24:18] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [48.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[01:28:57] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 26 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[01:31:03] RECOVERY - High load average on labstore1004 is OK: OK: Less than 50.00% above the threshold [36.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[01:32:09] (PS1) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456321
[01:32:11] (PS1) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322
[01:33:17] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up.
[software/spicerack] - https://gerrit.wikimedia.org/r/456321 (owner: Mathew.onipe)
[01:33:44] (PS1) Andrew Bogott: labstore: move the load numbers yet higher [puppet] - https://gerrit.wikimedia.org/r/456323
[01:33:47] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (owner: Mathew.onipe)
[01:33:52] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 15 probes of 318 (alerts on 19) - https://atlas.ripe.net/measurements/11645088/#!map
[01:34:28] (CR) Andrew Bogott: [C: 2] labstore: move the load numbers yet higher [puppet] - https://gerrit.wikimedia.org/r/456323 (owner: Andrew Bogott)
[01:37:03] (PS2) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079)
[01:37:41] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[01:40:52] (Abandoned) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[01:41:18] (Abandoned) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456321 (owner: Mathew.onipe)
[01:42:47] (PS1) Andrew Bogott: rabbitmq: allow designate_host_standby to access rabbit [puppet] - https://gerrit.wikimedia.org/r/456324
[01:45:05] (CR) Andrew Bogott: [C: 2] rabbitmq: allow designate_host_standby to access rabbit [puppet] - https://gerrit.wikimedia.org/r/456324 (owner: Andrew Bogott)
[01:50:56] (Restored) Mathew.onipe: Elasticsearch module is coming up.
[software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[01:51:21] (PS3) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079)
[01:52:01] (CR) Mathew.onipe: "Please review" [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[01:52:49] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[02:06:27] (CR) Dzahn: "Hi Matt, if you follow the link in the comment from jenkins-bot below, you can see some things it doesn't like that you could potentially " [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[02:10:05] (CR) Mathew.onipe: "> Patch Set 3:" [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[02:11:40] (PS4) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079)
[02:12:44] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[02:20:43] (CR) Legoktm: [C: 1] noc: Add Cache-Control with short max-age for noc.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/456206 (https://phabricator.wikimedia.org/T202734) (owner: Krinkle)
[02:34:29] PROBLEM - puppet last run on labsdb1009 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[02:36:32] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.18) (duration: 15m 20s)
[02:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:39:07] (PS5) Dzahn: tor::relay: add configurable thirdparty APT source [puppet] - https://gerrit.wikimedia.org/r/456056 (https://phabricator.wikimedia.org/T196701)
[02:42:01] (CR) Dzahn: tor::relay: add configurable thirdparty APT source (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456056 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[02:45:23] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/12291/radium.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/456056 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[02:58:09] (PS5) Mathew.onipe: Elasticsearch module is coming up. [software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079)
[02:59:43] (CR) jerkins-bot: [V: -1] Elasticsearch module is coming up.
[software/spicerack] - https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: Mathew.onipe)
[03:04:39] RECOVERY - puppet last run on labsdb1009 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:11:41] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.19) (duration: 15m 12s)
[03:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:21:55] !log l10nupdate@deploy1001 ResourceLoader cache refresh completed at Thu Aug 30 03:21:55 UTC 2018 (duration 10m 14s)
[03:21:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 796.70 seconds
[03:30:19] Operations, Wikimedia-Planet: en.planet hasn't updated since July 25 - https://phabricator.wikimedia.org/T203055 (Dzahn) ``` 1 Traceback (most recent call last): 2 File "/usr/bin/rawdog", line 33, in 3 launch() 4 File "/usr/bin/rawdog", line 26, in launch 5 sys.exit(main(sy...
[03:33:22] mutante: ugh, I see the issue
[03:33:41] (PS1) Legoktm: planet: Fix syntax error in rss.py [puppet] - https://gerrit.wikimedia.org/r/456329 (https://phabricator.wikimedia.org/T203055)
[03:36:21] legoktm: yes, i already hit enter after git review but you still beat me to it :)
[03:36:33] i broke it with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447743/ sorry about that
[03:36:50] maybe can have monitoring for that
[03:37:54] we should probably have a check that there is a post for $currentday - 1
[03:38:04] yea, something about the date, ack
[03:38:16] (CR) Dzahn: [C: 2] planet: Fix syntax error in rss.py [puppet] - https://gerrit.wikimedia.org/r/456329 (https://phabricator.wikimedia.org/T203055) (owner: Legoktm)
[03:38:20] I can't remember a normal day in which there were no new planet posts
[03:39:28] i cant get my change to upload..
so thanks for yours, aborted my git review
[03:39:39] also on mobile hotspot
[03:41:55] :) :(
[03:42:05] !log running manual update of en.planet feeds (T203055)
[03:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:42:10] T203055: en.planet hasn't updated since July 25 - https://phabricator.wikimedia.org/T203055
[03:43:35] Operations, Wikimedia-Planet, Patch-For-Review: en.planet hasn't updated since July 25 - https://phabricator.wikimedia.org/T203055 (Dzahn) yep, i broke it with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447743/ sorry about that maybe we can add monitoring for the feed last update getting...
[03:51:09] update still running
[03:51:49] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 235.69 seconds
[04:03:04] legoktm: try reloading
[04:07:10] https://en.planet.wikimedia.org/?cachebust
[04:11:11] mutante: I still see the same July 25
[04:11:26] legoktm: it's the cache
[04:11:35] add ?somethingrandom
[04:12:18] hmm, https://en.planet.wikimedia.org/?cachebustadshfdkfjhdsf still didn't work
[04:12:55] mutante: if you can see it I'll just wait another hour for my RSS client to pick up the new posts?
[04:13:37] legoktm: yes, i see August 29, 2018 in my browser, no special RSS reader
[04:13:42] ok :)
[04:22:53] the other language versions are also affected, i'll just let crons run, bbiaw
[04:29:24] Operations, Wikimedia-Planet, Patch-For-Review: en.planet hasn't updated since July 25 - https://phabricator.wikimedia.org/T203055 (Dzahn) should be fixed in an hour.. for all language versions..
crons run at random minutes for each language version and cache should also expire soonish
[04:30:51] PROBLEM - High load average on labstore1004 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [70.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[04:33:33] grrrrrrr
[04:47:51] RECOVERY - High load average on labstore1004 is OK: OK: Less than 50.00% above the threshold [50.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[05:44:05] (PS1) Jcrespo: mariadb: Depool db1119 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/456333
[05:45:37] (CR) Jcrespo: [C: 2] mariadb: Depool db1119 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/456333 (owner: Jcrespo)
[05:47:21] (Merged) jenkins-bot: mariadb: Depool db1119 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/456333 (owner: Jcrespo)
[05:47:27] (PS1) Jcrespo: Revert "mariadb: Depool db1119 for maintenance" [mediawiki-config] - https://gerrit.wikimedia.org/r/456334
[05:49:43] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1119 (duration: 00m 58s)
[05:49:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:49:54] (CR) jenkins-bot: mariadb: Depool db1119 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/456333 (owner: Jcrespo)
[05:52:04] !log upgrade and restart db1119
[05:52:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:10] (PS1) Jcrespo: mariadb: Depool db1090 (s2 and s7) [mediawiki-config] - https://gerrit.wikimedia.org/r/456335
[06:11:59] RECOVERY - DPKG on stat1006 is OK: All packages OK
[06:26:39] (CR) Elukey: "Nice!
\o/" (7 comments) [debs/presto] (debian) - https://gerrit.wikimedia.org/r/456277 (https://phabricator.wikimedia.org/T203115) (owner: Ottomata)
[06:40:39] (CR) Jcrespo: [C: 2] mariadb: Depool db1090 (s2 and s7) [mediawiki-config] - https://gerrit.wikimedia.org/r/456335 (owner: Jcrespo)
[06:41:57] (Merged) jenkins-bot: mariadb: Depool db1090 (s2 and s7) [mediawiki-config] - https://gerrit.wikimedia.org/r/456335 (owner: Jcrespo)
[06:43:45] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1090 (s2 and s7) (duration: 00m 56s)
[06:43:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:39] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:46:45] !log upgrade and restart db1090
[06:46:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:13] Operations, SRE-Access-Requests, Patch-For-Review, Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (ArielGlenn) Once the user(s) verify that uploads work as expected, we can close this ticket.
[06:48:28] Operations, SRE-Access-Requests, Patch-For-Review: Please add aaron to perf-team - https://phabricator.wikimedia.org/T202650 (ArielGlenn) @aaron and @Imarlier which set of permissions should we clean up?
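Going back to the en.planet outage (T203055): the suggestion at 03:37:54 of "a check that there is a post for $currentday - 1" could look roughly like the sketch below. It is only an illustration, not the monitoring that was actually deployed. The function names, the one-day threshold and the assumption that post dates arrive as RFC 2822 strings (as in an RSS `pubDate`) are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def newest_post_age(pub_dates, now):
    """Return how old the most recent post is, or None if there are no posts.

    pub_dates: RFC 2822 date strings with an explicit UTC offset (assumed input format).
    now: timezone-aware datetime to compare against.
    """
    parsed = [parsedate_to_datetime(d) for d in pub_dates]
    if not parsed:
        return None
    return now - max(parsed)


def feed_is_stale(pub_dates, now, max_age=timedelta(days=1)):
    """True if the feed has no posts, or its newest post is older than max_age.

    max_age is a hypothetical threshold, chosen to mirror "$currentday - 1".
    """
    age = newest_post_age(pub_dates, now)
    return age is None or age > max_age


if __name__ == "__main__":
    now = datetime(2018, 8, 30, 3, 37, tzinfo=timezone.utc)
    # Newest post from July 25, as in the outage: the check should fire.
    print(feed_is_stale(["Wed, 25 Jul 2018 12:00:00 +0000"], now))
```

A real check would also have to fetch and parse the feed and bypass the cache mentioned later in the log; both are left out here.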
[06:50:00] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.912 second response time
[06:53:01] (CR) jenkins-bot: mariadb: Depool db1090 (s2 and s7) [mediawiki-config] - https://gerrit.wikimedia.org/r/456335 (owner: Jcrespo)
[06:56:40] Operations, SRE-Access-Requests, Patch-For-Review, Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (MoritzMuehlenhoff) Same issue as https://phabricator.wikimedia.org/T202650#4541158; Aaron ha...
[06:57:42] Operations, SRE-Access-Requests, Discovery-Search (Current work): Onboarding Mathew Onipe - https://phabricator.wikimedia.org/T202708 (ArielGlenn) If we expect that at some point Matt will do general ops work for us (example: clinic duty), we want to make sure there is a path forward that provides hi...
[06:58:54] (PS1) Jcrespo: Revert "mariadb: Depool db1090 (s2 and s7)" [mediawiki-config] - https://gerrit.wikimedia.org/r/456336
[06:58:56] Operations, SRE-Access-Requests, Patch-For-Review, Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (ArielGlenn) Which means we need to remove something or other; @Imarlier and/or @aaron, which...
[07:00:44] (CR) Muehlenhoff: [C: 1] "Looks good to me" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456056 (https://phabricator.wikimedia.org/T196701) (owner: Dzahn)
[07:00:50] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:00:58] (PS4) Banyek: mariadb: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155
[07:01:24] (PS5) Banyek: MariaDB: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155
[07:02:16] (CR) Elukey: "Looks good, left a nit, thanks for this work!"
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/453553 (owner: Dzahn)
[07:02:51] (PS6) Banyek: MariaDB: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155
[07:06:26] (CR) Jcrespo: [C: 1] "Looks good" [mediawiki-config] - https://gerrit.wikimedia.org/r/456155 (owner: Banyek)
[07:07:32] (CR) Banyek: [C: 2] MariaDB: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155 (owner: Banyek)
[07:08:50] (Merged) jenkins-bot: MariaDB: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155 (owner: Banyek)
[07:09:04] (CR) jenkins-bot: MariaDB: Depool db2070 [mediawiki-config] - https://gerrit.wikimedia.org/r/456155 (owner: Banyek)
[07:13:01] (PS5) Elukey: profile::archiva: allow rsync to bind to IPv6 interfaces [puppet] - https://gerrit.wikimedia.org/r/456156 (https://phabricator.wikimedia.org/T192639)
[07:16:27] (PS6) Elukey: profile::archiva: allow rsync to bind to IPv6 interfaces [puppet] - https://gerrit.wikimedia.org/r/456156 (https://phabricator.wikimedia.org/T192639)
[07:34:15] !log banyek@deploy1001 Synchronized wmf-config/db-codfw.php: Depool db2070 (duration: 00m 57s)
[07:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:37:49] (CR) Joal: [C: -1] "Host definition to update, except from that looks correct."
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) (owner: Fdans)
[07:49:05] (CR) Elukey: Add druid snapshot removal cron job (1 comment) [puppet] - https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) (owner: Fdans)
[08:04:23] (CR) Filippo Giunchedi: "LGTM overall, see note about version sorting" (1 comment) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:04:45] (PS2) Filippo Giunchedi: Send Thumbor-Request-Id in haproxy response [puppet] - https://gerrit.wikimedia.org/r/456151 (https://phabricator.wikimedia.org/T187765) (owner: Gilles)
[08:06:40] hashar: https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-php70-docker/8599/console - can you look at it?
[08:06:58] hashar: also https://gerrit.wikimedia.org/r/#/c/integration/config/+/450508/
[08:07:28] (PS1) Elukey: profile::analytics::cluster::packages::statistics: add lynx [puppet] - https://gerrit.wikimedia.org/r/456344
[08:08:39] (CR) Elukey: [C: 2] profile::analytics::cluster::packages::statistics: add lynx [puppet] - https://gerrit.wikimedia.org/r/456344 (owner: Elukey)
[08:10:44] (PS1) Gehel: wdqs: redirect stderr from cron jobs to log file [puppet] - https://gerrit.wikimedia.org/r/456345
[08:12:26] Operations, Beta-Cluster-Infrastructure, wikidata-tech-focus, User-Addshore, User-Joe: Run mediawiki::maintenance scripts in Beta Cluster - https://phabricator.wikimedia.org/T125976 (Addshore) p:Normal>High It looks like the fix for running wikidata dispatching is on more since we hav...
[08:14:19] (PS1) Volans: setup.py: add long description [software/spicerack] - https://gerrit.wikimedia.org/r/456347 (https://phabricator.wikimedia.org/T199079)
[08:14:37] (CR) Filippo Giunchedi: [C: 1] "LGTM, modulo Volans' comments" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:15:45] godog: it seems that the whole class is using autolookup, so I don't want to make it a blocker for this important change
[08:16:06] but we should really migrate it to the profile paradigm
[08:16:06] (CR) Gilles: Upgrade to 2.01 (1 comment) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:17:34] volans: indeed :|
[08:19:03] (CR) Gehel: [C: 1] "LGTM" [software/spicerack] - https://gerrit.wikimedia.org/r/456347 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:19:31] (CR) Volans: [C: 2] setup.py: add long description [software/spicerack] - https://gerrit.wikimedia.org/r/456347 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:19:49] (CR) Gilles: Send blind thumbnail requests to inactive DC (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:20:35] (Merged) jenkins-bot: setup.py: add long description [software/spicerack] - https://gerrit.wikimedia.org/r/456347 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[08:21:43] (CR) Filippo Giunchedi: Upgrade to 2.01 (1 comment) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:22:19] (CR) Volans: Send blind thumbnail requests to inactive DC (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:22:27] (PS3) Gilles: Send blind
thumbnail requests to inactive DC [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858)
[08:23:09] (CR) Gilles: Send blind thumbnail requests to inactive DC (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:24:27] gilles: given we're on thumbor topic... any plan for Py3?
[08:24:29] * volans hides
[08:25:40] volans: https://github.com/APSL/docker-thumbor/issues/79#issuecomment-387153264
[08:25:51] https://github.com/thumbor/thumbor/issues/1004
[08:26:12] it's still not quite there on their end
[08:26:38] :(
[08:27:17] they said 'soon' last november :D
[08:27:53] seems like they have the same definition of "soon" as us
[08:28:04] and most open source projects!
[08:28:26] (PS2) Gilles: Upgrade to 2.1 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370)
[08:28:39] (CR) Gilles: Upgrade to 2.1 (1 comment) [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:29:08] (CR) Filippo Giunchedi: [C: 1] Upgrade to 2.1 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145 (https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:29:09] heheh
[08:29:30] nice, I'll merge and try the inactive dc rewrite.py patch
[08:30:13] (PS4) Filippo Giunchedi: Send blind thumbnail requests to inactive DC [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:31:28] (CR) Filippo Giunchedi: [C: 2] Send blind thumbnail requests to inactive DC [puppet] - https://gerrit.wikimedia.org/r/456167 (https://phabricator.wikimedia.org/T201858) (owner: Gilles)
[08:40:35] (CR) Filippo Giunchedi: [C: 2] Upgrade to 2.1 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/456145
(https://phabricator.wikimedia.org/T198370) (owner: Gilles)
[08:45:30] PROBLEM - Check systemd state on db1090 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[08:46:18] (PS2) Volans: sre.switchdc.mediawiki: add common parse_args [cookbooks] - https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079)
[08:46:20] (PS2) Volans: sre.switchdc.mediawiki: add Phase 0 cookbooks [cookbooks] - https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079)
[08:46:22] (PS1) Volans: Completing setup [cookbooks] - https://gerrit.wikimedia.org/r/456350 (https://phabricator.wikimedia.org/T199079)
[08:52:38] !log roll-restart swift-proxy to send requests to thumbor in eqiad and codfw - T201858
[08:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:44] T201858: Push thumbnails to both data centers - https://phabricator.wikimedia.org/T201858
[08:56:20] (PS3) Gehel: Add health check for categories endpoint without lag check [puppet] - https://gerrit.wikimedia.org/r/456187 (owner: Smalyshev)
[08:58:50] (PS4) Gehel: Add health check for categories endpoint without lag check [puppet] - https://gerrit.wikimedia.org/r/456187 (owner: Smalyshev)
[08:59:25] (CR) Gehel: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/456187 (owner: Smalyshev)
[09:00:00] (CR) Vgutierrez: [C: -1] "Thanks for taking care of this Daniel :D" [puppet] - https://gerrit.wikimedia.org/r/456320 (https://phabricator.wikimedia.org/T202040) (owner: Dzahn)
[09:00:01] Operations, ops-codfw, Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (MoritzMuehlenhoff) Thanks! I've installed my backported test kernel and figured out why additional firmware we need, it looks promising, the driver gets loaded along with the firmware:...
[09:00:41] (CR) Gehel: "I'd prefer to have 2 different functions to check lag vs ping, since we want to have different logic to report the status. My proposal is " [puppet] - https://gerrit.wikimedia.org/r/456187 (owner: Smalyshev)
[09:03:50] (PS5) Gehel: Add health check for categories endpoint without lag check [puppet] - https://gerrit.wikimedia.org/r/456187 (owner: Smalyshev)
[09:04:28] Operations, Beta-Cluster-Infrastructure, Wikidata, wikidata-tech-focus, and 2 others: Run mediawiki::maintenance scripts in Beta Cluster - https://phabricator.wikimedia.org/T125976 (Addshore)
[09:06:20] (CR) ArielGlenn: [C: 1] "Agree the cache length is reasonable given the low burden this will place on the server." [puppet] - https://gerrit.wikimedia.org/r/456206 (https://phabricator.wikimedia.org/T202734) (owner: Krinkle)
[09:06:27] (PS1) Aleksey Bekh-Ivanov (WMDE): Beta Wikidata: Use new item ID formatter for all the items [mediawiki-config] - https://gerrit.wikimedia.org/r/456357 (https://phabricator.wikimedia.org/T203147)
[09:06:29] (PS1) Aleksey Bekh-Ivanov (WMDE): Test Wikidata: Use new item ID formatter for all the items [mediawiki-config] - https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147)
[09:06:58] (CR) Aleksey Bekh-Ivanov (WMDE): "This change is ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/456357 (https://phabricator.wikimedia.org/T203147) (owner: Aleksey Bekh-Ivanov (WMDE))
[09:07:06] (CR) Aleksey Bekh-Ivanov (WMDE): "This change is ready for review."
[mediawiki-config] - https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: Aleksey Bekh-Ivanov (WMDE))
[09:08:37] gilles volans LGTM, I see swift in codfw getting requests https://grafana.wikimedia.org/dashboard/file/swift.json?orgId=1&from=now-3h&to=now-1m&var-DC=codfw&refresh=1m
[09:09:37] (CR) Filippo Giunchedi: [C: 2] Send Thumbor-Request-Id in haproxy response [puppet] - https://gerrit.wikimedia.org/r/456151 (https://phabricator.wikimedia.org/T187765) (owner: Gilles)
[09:09:43] godog: why are GETs also getting more requests?
[09:09:47] (PS3) Filippo Giunchedi: Send Thumbor-Request-Id in haproxy response [puppet] - https://gerrit.wikimedia.org/r/456151 (https://phabricator.wikimedia.org/T187765) (owner: Gilles)
[09:09:59] volans: thumbor has to fetch the original
[09:10:17] and it tries to see if the thumbnail is already there as well
[09:10:31] so we're making the thumb twice? one per DC?
[09:10:35] yep
[09:10:51] ok
[09:11:11] * volans pondering pro/cons of that vs sending the generated one
[09:12:18] so with this we could have active/active, as requests will go both ways and thumbor will do the thumb only if it doesn't have it locally already
[09:12:31] yes
[09:12:35] I guess there will be races and we might waste CPU re-doing some thumbs
[09:12:54] sure, but right now codfw is 100% idle
[09:13:01] it's going to waste a bit of electricity that's all
[09:13:19] it would be nice to track those when we go active/active
[09:14:06] to make sure we don't have patterns like: if a new image is seen, most of the time is requested within the same seconds in both DCs and we have a non-negligible percentage of double requests
[09:14:36] (CR) Vgutierrez: [C: 1] "pcc shows a sane output: https://puppet-compiler.wmflabs.org/compiler03/12293/" [puppet/nginx] - https://gerrit.wikimedia.org/r/455830 (https://phabricator.wikimedia.org/T200722) (owner: Alexandros Kosiaris)
[09:18:32] Operations,
10Wikimedia-Planet, 10Patch-For-Review: en.planet hasn't updated since July 25 - https://phabricator.wikimedia.org/T203055 (10Paladox) 05Open>03Resolved Seems resolved now. [09:22:05] (03CR) 10ArielGlenn: [C: 031] "Ah! I see the cron command change now, yes it looks fine. I have not tested the script however." [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [09:22:34] (03CR) 10Volans: [C: 04-1] "I like the approach, one typo and a couple of optional comments inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [09:27:13] 10Operations, 10TCB-Team, 10WMDE-QWERTY-Team, 10wikidiff2, 10WMDE-QWERTY-Sprint-2018-08-29: Release wikidiff2 v1.7.3 - https://phabricator.wikimedia.org/T202301 (10ArielGlenn) p:05Triage>03Normal [09:30:59] 10Operations, 10Beta-Cluster-Infrastructure, 10Jenkins, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Upgrade deployment-prep deployment servers to stretch - https://phabricator.wikimedia.org/T192561 (10hashar) A while ago, once a change got merged for parsoid we would trigger a Jenkins job th... [09:39:21] (03CR) 10Gehel: "Congratulation on that second change! There are a lot of comments inline, but mostly about minor things. Thanks!" (0311 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/456322 (https://phabricator.wikimedia.org/T199079) (owner: 10Mathew.onipe) [09:48:09] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 20 probes of 320 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [09:50:47] 10Operations, 10MediaWiki-extensions-Translate, 10Language-2018-July-September, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), and 4 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293 (10Niker... 
[09:51:15] 10Operations, 10MediaWiki-extensions-Translate, 10Language-2018-July-September, 10MW-1.32-release-notes (WMF-deploy-2018-08-21 (1.32.0-wmf.18)), and 4 others: 503 error attempting to open multiple projects (Wikipedia and meta wiki are loading very slowly) - https://phabricator.wikimedia.org/T195293 (10Niker... [09:52:44] volans: operations/cookbooks.git now has the tox ci job :) [09:53:10] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 17 probes of 320 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map [09:53:50] hashar: thanks a lot! [09:54:05] (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/456350 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:54:34] 10Operations, 10Release Pipeline, 10Release-Engineering-Team (Watching / External): Update Debian package of Blubber (0.5.0-1) - https://phabricator.wikimedia.org/T203121 (10ArielGlenn) p:05Triage>03Normal [09:55:25] (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:55:38] (03CR) 10Volans: "recheck" [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [09:56:29] PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100% [09:57:34] moritzm: is this you? 
^^^ [09:58:45] yeah, will silence again, downtime expired [09:58:53] ack, no prob, thx [10:04:14] !log roll-restart thumbor to upgrade 2.1 [10:04:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:20] (03PS4) 10ArielGlenn: move huwiki, arwiki to 'bigwikis' for xml/sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/453990 (https://phabricator.wikimedia.org/T202268) [10:06:26] (03CR) 10ArielGlenn: [C: 032] move huwiki, arwiki to 'bigwikis' for xml/sql dumps [puppet] - 10https://gerrit.wikimedia.org/r/453990 (https://phabricator.wikimedia.org/T202268) (owner: 10ArielGlenn) [10:09:31] (03PS3) 10Fdans: Add druid snapshot removal cron job [puppet] - 10https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) [10:17:27] (03PS1) 10Gilles: Catch and log inactive DC HTTP errors in Swift Proxy [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) [10:24:19] (03PS2) 10Arturo Borrero Gonzalez: striker: Point at cloudcontrol1003 for OpenStack APIs [puppet] - 10https://gerrit.wikimedia.org/r/456288 (https://phabricator.wikimedia.org/T201504) (owner: 10BryanDavis) [10:32:36] (03CR) 10Arturo Borrero Gonzalez: [C: 032] striker: Point at cloudcontrol1003 for OpenStack APIs [puppet] - 10https://gerrit.wikimedia.org/r/456288 (https://phabricator.wikimedia.org/T201504) (owner: 10BryanDavis) [10:37:01] !log upgrading deployment-prep to wikidiff 1.7.3 [10:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:06] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1100). [11:00:06] tgr and Aleksey_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:16] here [11:00:20] o/ [11:00:35] I can SWAT today [11:00:41] Cool! [11:00:45] tgr: go ahead while I review Aleksey_WMDE's patches [11:01:22] (03PS2) 10Gergő Tisza: Add editsitejson to everyone who has editinterface [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455561 (https://phabricator.wikimedia.org/T190015) [11:02:07] (03CR) 10Gergő Tisza: [C: 032] Add editsitejson to everyone who has editinterface [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455561 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:02:54] Aleksey_WMDE: is there a reason your two patches are not not one patch? :D [11:03:17] zeljkof: I've been told that it is a "bad practice"... [11:03:32] Planed to make a single patch initially [11:03:33] (03Merged) 10jenkins-bot: Add editsitejson to everyone who has editinterface [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455561 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:03:33] Aleksey_WMDE: to put related code in one patch? ;) [11:03:58] These are not totally related. [11:04:12] ah, ok, in that case it makes sense to separate them [11:04:14] zeljkof: I think thats related to the putting things in seperate patches that should be synced in an order [11:04:16] either is fine with me [11:04:56] patches looked related to me, hence the question, just for my information, I have no preferences either way [11:05:52] addshore: want to deploy those patches? I can do it, asking since you're around and looks like you're familiar with them [11:06:16] and I'm always looking for an excuse for somebody else to do the swat! 
:D [11:06:28] * addshore has his head stuck in something, sorry :/ [11:06:39] no problemo, I'll swat [11:07:24] (03CR) 10jenkins-bot: Add editsitejson to everyone who has editinterface [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455561 (https://phabricator.wikimedia.org/T190015) (owner: 10Gergő Tisza) [11:07:26] (03CR) 10Filippo Giunchedi: Catch and log inactive DC HTTP errors in Swift Proxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [11:09:16] !log tgr@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:455561|Add editsitejson to everyone who has editinterface (T190015)]] (duration: 01m 00s) [11:09:19] zeljkof: done [11:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:09:53] tgr: great! I'll continue with swat [11:10:47] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456357 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:10:48] I imagine the patches are separate because they are load tests and one of them need to be reverted while the other would still be fine [11:11:11] tgr: makes sence [11:11:11] that or cargo cult programming [11:11:13] sense [11:11:29] Aleksey_WMDE: can both patches be tested at mwdebug1002? [11:11:41] Yeah [11:11:43] "last time InitializeSettings and CommonSettings in the same patch broke SWAT so let's make every file is a separate patch" [11:12:05] (03Merged) 10jenkins-bot: Beta Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456357 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:12:15] you can't test the beta patch via mwdebug [11:12:29] then again, it's just a beta patch [11:12:58] tgr: that was my thought, when merged, it gets deployed to beta intermediately, right? [11:13:08] tgr: Not a problem. 
Can go live [11:13:09] or, automatically, anyway [11:13:35] not immediately but soon [11:13:45] anyway, deploying [11:13:50] puppet does a git pull every 5 minutes I think [11:14:25] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:456357|Beta Wikidata: Use new item ID formatter for all the items (T203147)]] (duration: 00m 57s) [11:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:14:30] T203147: Enable new ItemId formatter for all items on Test and Beta Wikidata - https://phabricator.wikimedia.org/T203147 [11:14:38] Aleksey_WMDE 456357 is deployed ^ [11:15:09] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:15:22] (03CR) 10Zfilipin: Test Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:15:24] (03PS2) 10Zfilipin: Test Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:15:32] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:17:45] zeljkof: Will look [11:18:08] (03Merged) 10jenkins-bot: Test Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:18:42] 456357 looks fine [11:18:59] Aleksey_WMDE: 456358 is at mwdebug1002 [11:19:21] Will check. 
Give me 5 minutes [11:19:29] sure [11:23:19] (03CR) 10jenkins-bot: Beta Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456357 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:23:21] (03CR) 10jenkins-bot: Test Wikidata: Use new item ID formatter for all the items [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456358 (https://phabricator.wikimedia.org/T203147) (owner: 10Aleksey Bekh-Ivanov (WMDE)) [11:23:23] zeljkof: All is good [11:24:35] Aleksey_WMDE: did you run a script to test? since I'm seeing a lot of these in mwdebug logs [11:24:50] `[report-only] Received CSP report: blocked from being loaded on ` [11:24:58] Nope. Just testing in a browser [11:25:13] any reason logs are full of those? [11:25:19] The CSP reports are my doing [11:25:44] bawolff: ah, ok then, it's test wikidata, the same one we're deploying a change for :) [11:25:51] Aleksey_WMDE: ok, deploying [11:25:56] I'm testing CSP on all group0 wikis [11:26:25] * zeljkof googles CSP [11:26:48] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:456358|Test Wikidata: Use new item ID formatter for all the items (T203147)]] (duration: 00m 53s) [11:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:56] T203147: Enable new ItemId formatter for all items on Test and Beta Wikidata - https://phabricator.wikimedia.org/T203147 [11:27:00] zeljkof: That said, you should not be getting warnings about loading fonts.gstatic.com unless you put something in your special:MyPage/common.js [11:27:09] Aleksey_WMDE: deployed! please test and thanks for deploying with #releng! 
:) [11:27:11] Content Security Policy - HTTP headers blocking loading of assets from third party websites :) [11:27:25] bawolff: that's from logs [11:27:25] Although some poorly written browser extensions, and various addware sometimes can cause that warning too [11:27:28] zeljkof: Thanks [11:27:44] zeljkof: Ok, you can ignore in the logs, I thought you meant you personally [11:27:57] bawolff: no, sorry, forgot to make it explicit [11:28:15] The logspam is almost entirely poorly written browser extensions and users with adware that's inserting crap into our pages [11:28:44] no more commits for swat, so... [11:28:47] For background, see the draft plan at https://www.mediawiki.org/wiki/User:BWolff_(WMF)/CSP_plan [11:28:53] !log EU SWAT finished [11:28:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:29:24] jouncebot: next [11:29:25] In 0 hour(s) and 30 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1200) [11:30:33] Reedy: but there is no EU train this week, so the next deployment is at 16 UTC [11:30:42] Cheers :) [11:30:45] Need to create a wiki [11:31:06] * zeljkof runs as far as possible ;P [11:33:02] (03PS1) 10Banyek: Revert "MariaDB: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456374 [11:34:53] (03PS3) 10Reedy: Initial configuration for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455723 (https://phabricator.wikimedia.org/T202819) (owner: 10Urbanecm) [11:35:02] (03CR) 10Reedy: [C: 032] Initial configuration for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455723 (https://phabricator.wikimedia.org/T202819) (owner: 10Urbanecm) [11:35:30] (03CR) 10Jcrespo: [C: 031] Revert "MariaDB: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456374 (owner: 10Banyek) [11:35:55] (03CR) 10Banyek: [C: 032] Revert "MariaDB: Depool db2070" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/456374 (owner: 10Banyek) [11:36:31] (03Merged) 10jenkins-bot: Initial configuration for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455723 (https://phabricator.wikimedia.org/T202819) (owner: 10Urbanecm) [11:37:05] (03Merged) 10jenkins-bot: Revert "MariaDB: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456374 (owner: 10Banyek) [11:39:11] Urbanecm: bankek is going to do a quick merge, ok? [11:39:24] *banyek [11:39:56] jynus, sure, I have no problems with it [11:40:08] (03CR) 10jenkins-bot: Initial configuration for fixcopyrightwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/455723 (https://phabricator.wikimedia.org/T202819) (owner: 10Urbanecm) [11:40:10] (03CR) 10jenkins-bot: Revert "MariaDB: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456374 (owner: 10Banyek) [11:40:23] 👍 [11:40:45] (03PS2) 10Gilles: Catch and log inactive DC HTTP errors in Swift Proxy [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) [11:40:49] oh, sorry, I saw reedy was the one that merged [11:40:53] so ping [11:41:09] I'm paying attention, don't worry ;) [11:41:12] he he [11:41:30] no problems here either, I am just making sure he doesn't get confused [11:41:47] * Urbanecm has no problems with merges at any time, others may have problems with it, through [11:41:53] !log banyek@deploy1001 Synchronized wmf-config/db-codfw.php: Repool db2070 (duration: 00m 55s) [11:41:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:42:23] he==banyek [11:42:34] this is his 2nd deploy [11:43:31] How many deploys does a deployer need to be able to call themselves an experienced deployer? [11:44:02] 34 [11:44:10] no more no less [11:44:13] :-D [11:44:17] Ok, writing it down... 
[11:44:32] I still haven't broken anything => no t-shirt for me [11:45:23] I guess your 35th deploy will be breaking, because you start to be experienced per jynus and pay less attention to the deploys :D [11:50:24] Are there achievements for getting that many deploys? [11:50:29] :P [11:53:57] :) [11:54:35] I've broken the wikis more times than that ;) [11:54:49] (03PS1) 10Reedy: Add fixcopyrightwiki to wikiversion.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456375 (https://phabricator.wikimedia.org/T202819) [11:54:55] BUT you only get one (1) t-shirt [11:54:59] Mine's framed [11:55:03] So, who's winning? [11:55:12] (03CR) 10Reedy: [C: 032] Add fixcopyrightwiki to wikiversion.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456375 (https://phabricator.wikimedia.org/T202819) (owner: 10Reedy) [11:55:48] do we count number of outages or add up the total outage lengths? [11:56:00] Sounds like a multiplication [11:56:19] and do we start with the hire of the first employee, or when WMF was incorporated? 
[11:56:29] (03Merged) 10jenkins-bot: Add fixcopyrightwiki to wikiversion.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456375 (https://phabricator.wikimedia.org/T202819) (owner: 10Reedy) [11:56:42] (03CR) 10jenkins-bot: Add fixcopyrightwiki to wikiversion.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456375 (https://phabricator.wikimedia.org/T202819) (owner: 10Reedy) [11:57:30] PROBLEM - MariaDB Slave Lag: x1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 883.00 seconds [11:57:48] damn it [11:58:10] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 926.76 seconds [11:58:24] (03PS1) 10Reedy: Add -wmf in fixcopyright in wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456376 [11:58:39] (03CR) 10Reedy: [C: 032] Add -wmf in fixcopyright in wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456376 (owner: 10Reedy) [11:59:56] (03Merged) 10jenkins-bot: Add -wmf in fixcopyright in wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456376 (owner: 10Reedy) [12:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1200) [12:01:59] wow [12:02:02] we have new wiki? [12:02:16] * revi was triggered @ newprojects list [12:02:52] yes, fixcopyright.wikimedia.org [12:03:04] T202819 [12:03:05] T202819: Create production wiki: fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T202819 [12:04:16] uh fishbowl [12:04:30] don't like fishbowl? [12:04:33] no [12:04:40] that means I don't need to create my userpage, I guess [12:04:58] that's true [12:05:05] that doesn't sound like SUL so [12:05:30] and weirdly enough stewards doesn't have autocreate on SUL as well :P [12:05:45] so I can't attach an account on closed wikis xD [12:06:10] if SUL doesn't stand for simple unnecessary large [wiki]... 
[12:06:24] lol [12:06:29] no the Single Unified Login [12:06:50] closed wikis cannot have no new account and stewards are no exceptions it looks like [12:07:08] Steward is an exception, since he is logged in [12:07:10] I've created my account on closed wikis :P [12:07:21] heh [12:07:21] by using createAndPromote thing? :D [12:07:22] I can't [12:07:26] oh that lol [12:07:29] that one I can't use [12:07:36] !log reedy@deploy1001 Synchronized dblists/: fixcopyrightwiki (duration: 00m 56s) [12:07:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:55] Urbanecm: they're no exception for stewards [12:08:09] stewards also have to have an local account before the wiki is closed [12:08:10] https://meta.wikimedia.org/wiki/Steward_requests/Miscellaneous#Edit_request_in_a_locked_wiki [12:08:10] RECOVERY - MariaDB Slave Lag: x1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [12:08:21] or you won't be able to edit there [12:08:57] (03PS1) 10Reedy: Remove fixcopyright from wikimedia.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 [12:09:21] revi, hmm, I thought local steward on closed can create [12:09:24] But he cannot [12:10:16] !log reedy@deploy1001 rebuilt and synchronized wikiversions files: fixcopyright [12:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:24] Reedy, why you're removing it from wikimedia.dblist? [12:10:26] if I may ask? 
[12:10:30] (03PS6) 10Gehel: Add health check for categories endpoint without lag check [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [12:10:31] Because it's not a chapter wiki [12:10:43] (03CR) 10Gehel: Add health check for categories endpoint without lag check (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [12:11:14] the name is confusing [12:11:34] well [12:11:44] should probably a) rename it someday or b) allow dblist files to take comments, so that we can put notes like that in the header [12:11:47] wikimedia conference is the chapters meeting just like that dblists lol [12:11:57] !log reedy@deploy1001 Synchronized static/images/project-logos: fixcopyrightwiki (duration: 00m 55s) [12:12:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:12:01] (03CR) 10jenkins-bot: Add -wmf in fixcopyright in wikiversions.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456376 (owner: 10Reedy) [12:12:11] (03CR) 10Gehel: [C: 031] "LGTM, trivial enough" [cookbooks] - 10https://gerrit.wikimedia.org/r/456350 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:13:04] !log reedy@deploy1001 Synchronized wmf-config/InitialiseSettings.php: fixcopyrightwiki (duration: 00m 56s) [12:13:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:13:44] Reedy, then you should probably add it to special.dblist [12:14:00] per https://wikitech.wikimedia.org/wiki/Add_a_wiki#Database_creation, each wiki must be in at least one "classification" list [12:14:13] !bug 1 [12:14:13] https://bugzilla.wikimedia.org/show_bug.cgi?id=1 [12:14:16] :D [12:14:21] We use 'special' for almost no purpose [12:14:28] # Special wikis [12:14:28] 'special' => 'en', # default - overridden below by some wikis [12:14:30] Woop de doo :P [12:14:43] 10Operations, 10Dumps-Generation: Reboots of dumps/snapshot hosts for L1TF/microcode updates - 
https://phabricator.wikimedia.org/T202623 (10ArielGlenn) Were waiting for the truthy nt bz2 files to be written, probly late tonight, and then I'll be able to reboot the last host. [12:14:43] still, it is in the rules :D [12:14:52] we should follow the rules, shouldn't we [12:15:06] (03CR) 10Gehel: [C: 031] "LGTM" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:15:10] rules are there to be broken [12:15:21] (03PS2) 10Reedy: Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 [12:15:22] are you sure? [12:15:25] (03CR) 10Reedy: [C: 032] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:15:29] Reedy is a rebel [12:15:54] we have only one rule [12:15:54] ok then [12:15:55] IAR [12:16:24] {{sojustfixit}} [12:16:29] I'm against rudeness :D [12:16:31] * Reedy deletes the page [12:16:34] (03CR) 10jerkins-bot: [V: 04-1] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:16:47] * revi blocks the user [12:16:54] (03CR) 10jerkins-bot: [V: 04-1] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:17:14] 12:16:32 Contents of 'nowikidatadescriptiontaglines' must match expansion of 'nowikidatadescriptiontaglines-computed' [12:17:16] bah [12:17:30] * addshore hides as it says wikidata in it [12:18:18] (03PS3) 10Reedy: Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 [12:18:22] (03CR) 10Reedy: [C: 032] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:19:49] (03CR) 10jerkins-bot: [V: 04-1] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 
(owner: 10Reedy) [12:19:51] (03CR) 10jerkins-bot: [V: 04-1] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:20:31] (03PS4) 10Reedy: Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 [12:20:35] (03CR) 10Reedy: [C: 032] Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:21:47] Urbanecm: Are you updating parsoids sitematrix manually? [12:21:52] (03Merged) 10jenkins-bot: Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:22:21] (03CR) 10Gehel: [C: 04-1] "see comments inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:22:26] !log uploaded grafana 4.6.4 to apt.wikimedia.org [12:22:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:05] !log reedy@deploy1001 Synchronized dblists/: fixcopyrightwiki (duration: 00m 56s) [12:23:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:39] (03PS1) 10Reedy: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456380 [12:23:41] (03CR) 10Reedy: [C: 032] Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456380 (owner: 10Reedy) [12:24:59] (03Merged) 10jenkins-bot: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456380 (owner: 10Reedy) [12:25:55] !log reedy@deploy1001 Synchronized wmf-config/interwiki.php: Updating interwiki cache (duration: 02m 36s) [12:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:37] Sweet. 
Thanks for doing the prep patch Urbanecm :) [12:28:38] !log roll restart thumbor in eqiad to upgrade to 2.1 [12:28:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:28:55] (03CR) 10jenkins-bot: Fixup fixcopyright dblists and multiversion [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456378 (owner: 10Reedy) [12:28:57] (03CR) 10jenkins-bot: Updating interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456380 (owner: 10Reedy) [12:39:40] yw Reedy [12:42:10] (03CR) 10Volans: [C: 032] Completing setup [cookbooks] - 10https://gerrit.wikimedia.org/r/456350 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:43:00] (03Merged) 10jenkins-bot: Completing setup [cookbooks] - 10https://gerrit.wikimedia.org/r/456350 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:43:40] (03CR) 10Volans: "reply inline" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:47:40] Reedy, no, by a script [12:49:54] (03CR) 10Volans: "thanks for the review, replies inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:49:56] (03CR) 10Gehel: [C: 031] sre.switchdc.mediawiki: add common parse_args (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:50:22] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: add common parse_args [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:50:59] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: add common parse_args [cookbooks] - 10https://gerrit.wikimedia.org/r/456111 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:53:46] (03CR) 10Joal: [C: 031] "Looks good to me :)" [puppet] - 10https://gerrit.wikimedia.org/r/455605 
(https://phabricator.wikimedia.org/T197889) (owner: 10Fdans) [12:56:10] (03CR) 10Volans: [C: 04-1] "Sorry, almost there, 2 small issues." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [12:57:37] (03CR) 10Elukey: "IIUC this nginx config is the one running on all the puppet compiler hosts, and it receives requests in which the fwd_host is specified in" [puppet] - 10https://gerrit.wikimedia.org/r/456287 (https://phabricator.wikimedia.org/T191438) (owner: 10Herron) [12:58:10] (03CR) 10Gehel: [C: 031] "Still the title to improve, feel free to ignore the other comment. And if the only change is the title, feel free to merge without another" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [12:59:48] !log drain + reboot analytics10[28-79]* for kernel updates (will take multiple days) [12:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1300) [13:00:53] (03PS1) 10ArielGlenn: double bw cap for rsync between dumps peers [puppet] - 10https://gerrit.wikimedia.org/r/456385 (https://phabricator.wikimedia.org/T202614) [13:02:40] (03PS7) 10Gehel: Add health check for categories endpoint without lag check [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [13:02:51] (03CR) 10Gehel: Add health check for categories endpoint without lag check (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [13:03:09] !log upgrading grafana to 4.6.4 (security release) [13:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:16] (03CR) 10ArielGlenn: [C: 032] double bw cap for rsync between dumps peers [puppet] - 10https://gerrit.wikimedia.org/r/456385 (https://phabricator.wikimedia.org/T202614) (owner: 10ArielGlenn) 
[13:05:48] (03PS3) 10Arturo Borrero Gonzalez: cloudvps: nova-network: allow extra dnsmasq/dhcp option [puppet] - 10https://gerrit.wikimedia.org/r/456127 (https://phabricator.wikimedia.org/T202636) [13:06:20] (03PS3) 10Volans: sre.switchdc.mediawiki: add Phase 0 cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) [13:06:40] (03CR) 10Arturo Borrero Gonzalez: [C: 032] cloudvps: nova-network: allow extra dnsmasq/dhcp option [puppet] - 10https://gerrit.wikimedia.org/r/456127 (https://phabricator.wikimedia.org/T202636) (owner: 10Arturo Borrero Gonzalez) [13:07:57] (03CR) 10Volans: "inline" (032 comments) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:08:33] !log sanitizing fixcopyrightwiki on db1124 and children T202820 [13:08:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:08:38] T202820: Prepare and check storage layer for fixcopyright.wikimedia.org - https://phabricator.wikimedia.org/T202820 [13:10:42] !log sanitizing fixcopyrightwiki on db2094 and children T202820 [13:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:11:44] (03CR) 10Gehel: [C: 031] sre.switchdc.mediawiki: add Phase 0 cookbooks (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:12:39] (03CR) 10Alexandros Kosiaris: [C: 032] ores in labs: issue 403 for two user agents [puppet] - 10https://gerrit.wikimedia.org/r/456126 (https://phabricator.wikimedia.org/T202655) (owner: 10Ladsgroup) [13:12:45] (03PS2) 10Alexandros Kosiaris: ores in labs: issue 403 for two user agents [puppet] - 10https://gerrit.wikimedia.org/r/456126 (https://phabricator.wikimedia.org/T202655) (owner: 10Ladsgroup) [13:13:20] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ores in labs: issue 403 for two user agents [puppet] - 
10https://gerrit.wikimedia.org/r/456126 (https://phabricator.wikimedia.org/T202655) (owner: 10Ladsgroup) [13:13:53] (03CR) 10Volans: [C: 032] sre.switchdc.mediawiki: add Phase 0 cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:14:32] (03Merged) 10jenkins-bot: sre.switchdc.mediawiki: add Phase 0 cookbooks [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [13:14:48] not sure why, but the new wiki hasn't been created on labs [13:15:04] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (10Papaul) @MoritzMuehlenhoff both the 10GB and 1GB NIC's are already connected to the switch 10 GB NIC is on xe-2/0/11 1GB NIC is on ge-2/0/12 [13:15:56] oh, there is a lot of lag [13:15:58] checking why [13:16:16] there is 2 hours of lag [13:19:25] Reedy: there is 0 users on the new wiki [13:19:32] I hope that is normal [13:19:37] or expected [13:20:32] (03CR) 10Elukey: "Looks good! https://puppet-compiler.wmflabs.org/compiler02/12298/" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455605 (https://phabricator.wikimedia.org/T197889) (owner: 10Fdans) [13:21:30] (03PS1) 10Ladsgroup: Revert "ores in labs: issue 403 for two user agents" [puppet] - 10https://gerrit.wikimedia.org/r/456387 [13:23:53] (03CR) 10Ottomata: "Wow hahaha amazing." 
[puppet] - 10https://gerrit.wikimedia.org/r/456344 (owner: 10Elukey) [13:24:09] PROBLEM - Filesystem available is greater than filesystem size on ms-be2042 is CRITICAL: cluster=swift device=/dev/sdf1 fstype=xfs instance=ms-be2042:9100 job=node mountpoint=/srv/swift-storage/sdf1 site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [13:24:24] chasemp: btw VPN would be really helpful for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/456344/ too [13:26:47] ottomata: yeah fair, we should write up a brief proposal of functionality with needed features to explore looking for a solution [13:27:02] aye, we don't even know how to solve for that one really [13:27:15] the problem with that is, the hadoop web ui allows you to view jobs running [13:27:20] but if you want to view current logs in it [13:27:33] you need to access the http port of the individual node manager processes on the workers [13:27:53] and the html given by the resourcemanager server has the internal nodemanager hostnames in the links [13:28:32] so, you have to click on the link, see what changed, then tunnel to the host itself and then change url in browser to localhost:port [13:28:43] but, only us opsen can do that, because users don't have shell access to the worker nodes [13:30:00] yeah interesting [13:31:38] !log depooling labsdb1009 due to extra lag [13:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:36:38] ottomata: there is also the yarn -logs command to use, I forgot to mention it to dcausee [13:36:41] *dcausse [13:36:47] oh ya [13:36:48] so this particular case might not be really interesting [13:36:53] elukey: that is good, but it only works after the job has finished [13:36:59] ah yes yes [13:37:08] yarn -log? 
[13:37:18] yarn logs -applicationId application_XXX [13:37:24] after your job has finished [13:37:27] ah [13:37:29] yes sorry I didn't remember the exact syntax [13:37:33] that will dump everything from all workers [13:37:39] but it returns the logs for an appid [13:37:40] all logs [13:37:49] RECOVERY - Check systemd state on db1090 is OK: OK - running: The system is fully operational [13:37:57] I needed something live, so that I can kill the jobs if the output does not match what I expect (debug) [13:38:14] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1090 (s2 and s7)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456336 (owner: 10Jcrespo) [13:38:19] RECOVERY - Host backup2001 is UP: PING OK - Packet loss = 0%, RTA = 36.12 ms [13:38:28] for lynx on the worker output is perfect, I can refresh [13:38:33] s/for/for me/ [13:38:35] oh dcausse just found this on your wikitech stuff https://wikitech.wikimedia.org/wiki/Discovery/Analytics#Yarn_manager_logs [13:38:36] also not bad [13:38:45] ottomata: thanks [13:38:51] coo [13:39:00] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1119 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456334 (owner: 10Jcrespo) [13:39:01] will update with lynx [13:39:08] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1119 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456334 [13:39:33] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1090 (s2 and s7)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456336 (owner: 10Jcrespo) [13:40:51] (03PS3) 10Jcrespo: Revert "mariadb: Depool db1119 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456334 [13:41:55] !log upgrading mw1261 to wikidiff 1.7.3 [13:41:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:44:40] ACKNOWLEDGEMENT - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 7276.13 seconds Jcrespo new wiki creation [13:45:42] 
(03PS3) 10Filippo Giunchedi: Catch and log inactive DC HTTP errors in Swift Proxy [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [13:46:24] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1119, db1090 (s2 and s7) (duration: 00m 59s) [13:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:48:40] (03CR) 10Alexandros Kosiaris: [C: 032] Revert "ores in labs: issue 403 for two user agents" [puppet] - 10https://gerrit.wikimedia.org/r/456387 (owner: 10Ladsgroup) [13:49:43] (03PS1) 10Ladsgroup: ores in labs: fix up the nginx rule [puppet] - 10https://gerrit.wikimedia.org/r/456390 (https://phabricator.wikimedia.org/T202655) [13:49:51] (03Abandoned) 10Alexandros Kosiaris: Revert "ores in labs: issue 403 for two user agents" [puppet] - 10https://gerrit.wikimedia.org/r/456387 (owner: 10Ladsgroup) [13:50:35] (03PS2) 10Alexandros Kosiaris: ores in labs: fix up the nginx rule [puppet] - 10https://gerrit.wikimedia.org/r/456390 (https://phabricator.wikimedia.org/T202655) (owner: 10Ladsgroup) [13:50:41] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] ores in labs: fix up the nginx rule [puppet] - 10https://gerrit.wikimedia.org/r/456390 (https://phabricator.wikimedia.org/T202655) (owner: 10Ladsgroup) [13:51:11] (03CR) 10Filippo Giunchedi: [C: 031] "Tweaked the function a little" [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [13:51:22] volans: ^ if you have two minutes [13:51:29] godog: sure [13:52:17] (03PS1) 10Bstorm: labstore: make nfsd threads configurable [puppet] - 10https://gerrit.wikimedia.org/r/456391 [13:53:17] (03CR) 10Volans: "question inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [13:54:11] volans: thanks! 
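Note on the Hadoop log discussion from 13:27 to 13:39: the manual workaround described there (follow a NodeManager link from the ResourceManager UI, tunnel to that worker, then swap the internal hostname for localhost:port) could be sketched roughly as below. This is only an illustration under stated assumptions: `tunnel_plan` and `yarn_logs_command` are hypothetical helper names, the default local port 18042 is an arbitrary example value, and nothing here removes the need for shell access to the worker. Only the `yarn logs -applicationId` form comes from the channel itself.

```python
from urllib.parse import urlsplit, urlunsplit


def tunnel_plan(link, local_port=18042):
    """Return (ssh_command, local_url) for a NodeManager link.

    `link` is a URL copied from the ResourceManager UI that points at an
    internal worker hostname. Nothing is executed here; the caller runs the
    ssh command by hand and then opens the rewritten URL.
    (Hypothetical helper; the default local port is an assumption.)
    """
    parts = urlsplit(link)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected a URL with an explicit host and port")
    # Forward a local port to the worker's own HTTP port.
    ssh_command = [
        "ssh", "-N",
        "-L", f"{local_port}:localhost:{parts.port}",
        parts.hostname,
    ]
    # Same path and query, but served through the local end of the tunnel.
    local_url = urlunsplit(
        (parts.scheme, f"localhost:{local_port}",
         parts.path, parts.query, parts.fragment)
    )
    return ssh_command, local_url


def yarn_logs_command(application_id):
    """Build the post-run log dump command quoted in the channel."""
    return ["yarn", "logs", "-applicationId", application_id]
```

As noted in the channel, the `yarn logs` form only helps once the job has finished, so the tunnel (or lynx on the worker itself) remains the option for watching a running job.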
[13:55:12] (03CR) 10Filippo Giunchedi: [C: 031] Catch and log inactive DC HTTP errors in Swift Proxy (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [13:56:08] (03CR) 10Banyek: [C: 031] "I think going to 80% shouldn't be a trouble, if there aren't too many connections, then I'd say more buffer pool will be good." [puppet] - 10https://gerrit.wikimedia.org/r/455769 (owner: 10Jcrespo) [13:57:51] (03PS2) 10Bstorm: labstore: make nfsd threads configurable [puppet] - 10https://gerrit.wikimedia.org/r/456391 [14:02:25] (03CR) 10Filippo Giunchedi: memcached: add the possibility to configure -v* parameters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456096 (owner: 10Elukey) [14:03:16] (03PS4) 10Filippo Giunchedi: Send Thumbor-Request-Id in haproxy response [puppet] - 10https://gerrit.wikimedia.org/r/456151 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [14:03:34] (03CR) 10Volans: [C: 031] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [14:03:51] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1090 (s2 and s7)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456336 (owner: 10Jcrespo) [14:03:53] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1119 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456334 (owner: 10Jcrespo) [14:05:14] (03PS1) 10Ottomata: Initial debian packaging version 0.208 [debs/presto] - 10https://gerrit.wikimedia.org/r/456394 (https://phabricator.wikimedia.org/T203115) [14:05:16] (03CR) 10Ottomata: Initial debian packaging version 0.208 (037 comments) [debs/presto] (debian) - 10https://gerrit.wikimedia.org/r/456277 (https://phabricator.wikimedia.org/T203115) (owner: 10Ottomata) [14:05:25] (03PS2) 10ArielGlenn: make sure dump checker always looks at the dump dir from earliest run [puppet] - 10https://gerrit.wikimedia.org/r/441179 
[14:06:31] (03CR) 10ArielGlenn: [C: 032] make sure dump checker always looks at the dump dir from earliest run [puppet] - 10https://gerrit.wikimedia.org/r/441179 (owner: 10ArielGlenn) [14:08:17] (03CR) 10Volans: "Nitpick suggestions inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/455769 (owner: 10Jcrespo) [14:08:37] !log upgrading mw1262-1265 to wikidiff 1.7.3 [14:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:52] (03CR) 10Elukey: memcached: add the possibility to configure -v* parameters (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/456096 (owner: 10Elukey) [14:10:42] (03PS4) 10Filippo Giunchedi: Catch and log inactive DC HTTP errors in Swift Proxy [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [14:11:59] (03CR) 10Filippo Giunchedi: [C: 032] Catch and log inactive DC HTTP errors in Swift Proxy [puppet] - 10https://gerrit.wikimedia.org/r/456366 (https://phabricator.wikimedia.org/T201858) (owner: 10Gilles) [14:12:20] (03CR) 10Herron: "Indeed this config has been tested on compiler1001 and from there it does the right thing (proxies requests to both compilerNN and compile" [puppet] - 10https://gerrit.wikimedia.org/r/456287 (https://phabricator.wikimedia.org/T191438) (owner: 10Herron) [14:13:46] (03PS3) 10Bstorm: labstore: make nfsd threads configurable [puppet] - 10https://gerrit.wikimedia.org/r/456391 [14:13:50] (03PS3) 10Elukey: memcached: enable basic logging with the -v parameter [puppet] - 10https://gerrit.wikimedia.org/r/456096 [14:14:31] (03CR) 10Elukey: "Lovely, thanks for the clarification!" 
[puppet] - 10https://gerrit.wikimedia.org/r/456287 (https://phabricator.wikimedia.org/T191438) (owner: 10Herron) [14:15:10] (03PS4) 10Bstorm: labstore: make nfsd threads configurable [puppet] - 10https://gerrit.wikimedia.org/r/456391 [14:15:40] (03PS4) 10Elukey: memcached: enable basic logging with the -v parameter [puppet] - 10https://gerrit.wikimedia.org/r/456096 [14:16:15] (03PS5) 10Elukey: memcached: enable basic logging with the -v parameter [puppet] - 10https://gerrit.wikimedia.org/r/456096 [14:17:06] (03CR) 10Bstorm: [C: 032] labstore: make nfsd threads configurable [puppet] - 10https://gerrit.wikimedia.org/r/456391 (owner: 10Bstorm) [14:34:37] (03CR) 10Jcrespo: "2) According to the commit message this change could be deployed only on servers with more than 256GB of RAM. We could implement that logi" [puppet] - 10https://gerrit.wikimedia.org/r/455769 (owner: 10Jcrespo) [14:35:11] 10Operations: reinstall rdb100[56] with RAID - https://phabricator.wikimedia.org/T140442 (10ArielGlenn) [14:35:32] 10Operations, 10monitoring, 10User-fgiunchedi: prometheus on bast3002 misbehaving - https://phabricator.wikimedia.org/T192610 (10ArielGlenn) [14:35:59] 10Operations, 10Discovery, 10Wikidata, 10Wikidata-Query-Service: WDQS diskspace is low - https://phabricator.wikimedia.org/T196485 (10ArielGlenn) [14:36:49] 10Operations, 10Phabricator, 10Release-Engineering-Team (Watching / External): Reimage both phab1001 and phab2001 to stretch - https://phabricator.wikimedia.org/T190568 (10ArielGlenn) [14:37:21] 10Operations: Netbox: postgres cannot be restarted w/ current config - https://phabricator.wikimedia.org/T184634 (10ArielGlenn) [14:38:26] (03CR) 10Volans: "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/455769 (owner: 10Jcrespo) [14:38:36] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10jcrespo) I have signed @Banyek GPG's key, I checked his fingerprint and he showed me a piece of 
identity with a photo to confirm it. [14:39:56] (03CR) 10Eevans: [C: 04-1] "> No idea if it will have an impact on Cassandra, but "-XX:+UseNUMA"" [puppet] - 10https://gerrit.wikimedia.org/r/426152 (https://phabricator.wikimedia.org/T192112) (owner: 10Eevans) [14:40:01] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10MoritzMuehlenhoff) @jcrespo Can you please push the signed key to the keyservers? ``` gpg --send-key $KEYID ``` [14:40:31] 10Operations, 10Scap, 10Release-Engineering-Team (Kanban): mwscript rebuildLocalisationCache.php takes 40 minutes on HHVM (rather than ~5 on PHP 5) - https://phabricator.wikimedia.org/T191921 (10ArielGlenn) [14:41:12] 10Operations, 10Wikidata, 10Wikidata-Query-Service: WDQS disk usage increase is correlated with reloading of categories - https://phabricator.wikimedia.org/T200202 (10ArielGlenn) [14:42:42] 10Operations, 10Pybal, 10Traffic: Add support for setting weight=0 when depooling - https://phabricator.wikimedia.org/T86650 (10ArielGlenn) [14:45:43] 10Operations, 10ops-codfw, 10Patch-For-Review: rack/setup/install backup2001 - https://phabricator.wikimedia.org/T196477 (10MoritzMuehlenhoff) The hardware side is fixed, but I'm seeing a kernel error, looking into it. [14:50:39] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Onboarding Balazs Pocze - https://phabricator.wikimedia.org/T202521 (10jcrespo) I already did. [15:06:08] 10Operations, 10Parsoid: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 (10Daimona) Ergh, sorry, wrong task. 
[15:08:21] 10Operations, 10Wikimedia-Logstash, 10User-fgiunchedi, 10User-herron: Logstash hardware expansion - https://phabricator.wikimedia.org/T203169 (10fgiunchedi) p:05Triage>03Normal [15:16:36] (03CR) 10Vgutierrez: [C: 031] certcentral_api: basic functionality fixes and error log [software/certcentral] - 10https://gerrit.wikimedia.org/r/456067 (owner: 10Alex Monk) [15:17:44] !log increased number of nfsd threads on labstore1004 to 300 [15:17:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:09] PROBLEM - Filesystem available is greater than filesystem size on ms-be1041 is CRITICAL: cluster=swift device=/dev/sdf1 fstype=xfs instance=ms-be1041:9100 job=node mountpoint=/srv/swift-storage/sdf1 site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be1041&var-datasource=eqiad%2520prometheus%252Fops [15:24:19] RECOVERY - Filesystem available is greater than filesystem size on ms-be2042 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2042&var-datasource=codfw%2520prometheus%252Fops [15:34:32] 10Operations, 10wikimediafoundation.org: wikimediafoundation.org in deutsch shows 'suspended or shutdown' - https://phabricator.wikimedia.org/T203172 (10chasemp) [15:34:52] 10Operations, 10wikimediafoundation.org: wikimediafoundation.org in deutsch shows 'suspended or shutdown' - https://phabricator.wikimedia.org/T203172 (10chasemp) I tagged with #operations so they are aware but I'm not sure if we can do anything directly from the tech side [15:40:53] 10Operations, 10wikimediafoundation.org: wikimediafoundation.org in deutsch shows 'suspended or shutdown' - https://phabricator.wikimedia.org/T203172 (10RobH) So, the language content is mixed (per T200742) and I can confirm the behavior they list above. Click on https://wikimediafoundation.org then click on... 
[15:42:28] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudvirt102[34] - https://phabricator.wikimedia.org/T199125 (10Cmjohnson) I connectd a SATA cable and turned on the SATA settings in bios to AHCI and auto for disks...the raid controller is st... [15:45:11] (03CR) 10Smalyshev: [C: 031] Add health check for categories endpoint without lag check [puppet] - 10https://gerrit.wikimedia.org/r/456187 (owner: 10Smalyshev) [15:51:18] (03PS1) 10Jgreen: add frdeploy - WMF Fundraising deploy tools [software] - 10https://gerrit.wikimedia.org/r/456407 [15:54:55] (03CR) 10Jgreen: [C: 032] add frdeploy - WMF Fundraising deploy tools [software] - 10https://gerrit.wikimedia.org/r/456407 (owner: 10Jgreen) [15:55:56] (03Merged) 10jenkins-bot: add frdeploy - WMF Fundraising deploy tools [software] - 10https://gerrit.wikimedia.org/r/456407 (owner: 10Jgreen) [15:59:26] (03PS2) 10Herron: puppet_compiler: temporarily proxy two project names with nginx [puppet] - 10https://gerrit.wikimedia.org/r/456287 (https://phabricator.wikimedia.org/T191438) [16:00:05] godog and _joe_: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Puppet SWAT(Max 6 patches) . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1600). [16:00:05] Ebe123 and Reedy: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[16:00:14] Around [16:00:28] (03CR) 10Herron: [C: 032] puppet_compiler: temporarily proxy two project names with nginx [puppet] - 10https://gerrit.wikimedia.org/r/456287 (https://phabricator.wikimedia.org/T191438) (owner: 10Herron) [16:03:59] Can test https://gerrit.wikimedia.org/r/c/operations/puppet/+/445603 [16:07:00] (03CR) 10Cwhite: [C: 031] memcached: enable basic logging with the -v parameter [puppet] - 10https://gerrit.wikimedia.org/r/456096 (owner: 10Elukey) [16:12:40] (03CR) 10Cwhite: [C: 031] cassandra: restore (most) G1GC settings to defaults [puppet] - 10https://gerrit.wikimedia.org/r/426152 (https://phabricator.wikimedia.org/T192112) (owner: 10Eevans) [16:13:56] Herron, mind doing my patch? [16:15:05] !log restart of logstash to move data directory - T198351 [16:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:10] T198351: Refactor puppet to support multiple elasticsearch instances on same node - https://phabricator.wikimedia.org/T198351 [16:15:14] (03PS2) 10Gehel: logstash: move elasticsearch data directory [puppet] - 10https://gerrit.wikimedia.org/r/456133 (https://phabricator.wikimedia.org/T198351) [16:16:24] Ebe123: o/ - is there any test that we can do on say mwdebug* first before applying it to all mw servers? [16:16:29] PROBLEM - HP RAID on ms-be1019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:16:42] or even in codfw [16:16:57] What are those? [16:17:18] (03CR) 10Gehel: [C: 032] logstash: move elasticsearch data directory [puppet] - 10https://gerrit.wikimedia.org/r/456133 (https://phabricator.wikimedia.org/T198351) (owner: 10Gehel) [16:17:30] Also, the effect will be controlled as timidity++ is kept as a fallback in both Score and puppet [16:18:18] Ebe123: those are servers meant for testing before applying it to the whole production.. 
Since I don't have much context, I am trying to figure out what kind of problems this change might cause [16:18:41] (regular sre paranoia before merging :) [16:19:11] do you mind to give us a bit of explanation? I saw the task and I got an idea but better safe than sorry [16:19:31] also moritzm asked "Looks fine. Will the mediawiki side of this use the recently introduced wrapper function for executing binaries in firejail?" [16:20:25] Cc: Reedy (if you are around) [16:21:05] yes, the binaries will be through firejail [16:22:38] The potential problem is the audio becomes unplayable (if the fallback is not well configured, but that has already been tested, or due to the soundfonts) [16:23:10] That shouldn't be a problem, as it was like that for 2 weeks waiting for the last train... Oups [16:24:19] RECOVERY - Filesystem available is greater than filesystem size on ms-be1041 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be1041&var-datasource=eqiad%2520prometheus%252Fops [16:26:39] A test could be to create a (new) score, and check if there is a crackling sound at the beginning. [16:26:52] \relative c'' { \time 5/4 \key g \minor \clef treble \tempo "Allegro giusto, nel modo russico; senza allegrezza, ma poco sostenuto." 
g--\f f-- bes-- c8--( f d4-- ) \bar "||" \time 6/4 c8--( f d4-- ) bes-- c-- g-- f-- } [16:26:59] ah ok so there might be impact if we install these [16:27:27] (I am Italian lol) [16:27:44] And you know that snippet of course [16:28:08] 10Operations, 10Traffic, 10User-Urbanecm: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (10BBlack) p:05Triage>03High [16:28:47] I admit my complete ignorance and I say no :( [16:29:15] But the possible (non-probable) impact is minimal and was the status-quo for a bit [16:29:36] https://it.wikipedia.org/wiki/Quadri_da_un%27esposizione [16:29:44] !log shutting down wdqs1005 for new SSD and reimaging - T198351 [16:29:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:49] T198351: Refactor puppet to support multiple elasticsearch instances on same node - https://phabricator.wikimedia.org/T198351 [16:30:18] oops, wrong ticket [16:30:39] !log shutting down wdqs1005 for new SSD and reimaging - T202779 [16:30:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:44] T202779: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 [16:32:17] Ebe123: just as a precaution, I'd like to limit the scope of the deployment initially to the mwdebug servers (will do it via sre tools), and maybe you could use https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug to verify that everything is ok? 
[16:32:44] ok [16:33:07] (03PS6) 10Elukey: Add fluidsynth to wikimedia servers [puppet] - 10https://gerrit.wikimedia.org/r/445603 (https://phabricator.wikimedia.org/T184598) (owner: 10Reedy) [16:33:19] PROBLEM - Device not healthy -SMART- on ms-be1019 is CRITICAL: cluster=swift device={cciss,0,cciss,1} instance=ms-be1019:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1019&var-datasource=eqiad%2520prometheus%252Fops [16:34:19] PROBLEM - Check systemd state on ms-be1019 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:34:39] PROBLEM - SSH on ms-be1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:35:14] !log gehel@puppetmaster1001 conftool action : set/pooled=inactive; selector: dc=eqiad,cluster=wdqs,name=wdqs1005.eqiad.wmnet [16:35:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:39] RECOVERY - SSH on ms-be1019 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u2 (protocol 2.0) [16:36:12] (03CR) 10Elukey: [C: 032] Add fluidsynth to wikimedia servers [puppet] - 10https://gerrit.wikimedia.org/r/445603 (https://phabricator.wikimedia.org/T184598) (owner: 10Reedy) [16:36:40] PROBLEM - HP RAID on ms-be1019 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake. [16:37:45] godog: --^ [16:37:49] PROBLEM - Check systemd state on ms-be1019 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[16:37:53] (03PS1) 10Gilles: Revert "Send Thumbor-Request-Id in haproxy response" [puppet] - 10https://gerrit.wikimedia.org/r/456416 [16:39:07] womp womp [16:39:15] thanks I'll take a look [16:39:31] Tell me which server for test [16:39:33] Ebe123: packages deployed to mwdebug servers [16:39:44] all of them :) [16:40:34] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs1005.eqiad.wmnet'] ``` The log can be found in `/var/log/w... [16:42:33] 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team: labvirt1009 HP Raid alert - https://phabricator.wikimedia.org/T198479 (10Cmjohnson) @bstorm yes the disk will work for labvirt1009. Do you want me to swap it? [16:42:46] If you want to hear (it's sharper now), it's https://en.wikipedia.org/w/index.php?title=User:Ebe123/Sandbox (debug) [16:43:24] lovely, so I guess green light :) [16:43:28] going to re-enable puppet [16:43:35] it will be rolled out incrementally during the next hour [16:43:36] 10Operations, 10ops-eqiad, 10DC-Ops: Degraded RAID on labvirt1019 - https://phabricator.wikimedia.org/T201957 (10Cmjohnson) 05Open>03declined [16:43:37] is it ok? [16:43:40] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 236.94 seconds [16:43:45] Better [16:44:22] Ebe123: done :) [16:44:41] thanks for the patience, and if you could stick around a bit it would be great (to check if anything pops up) [16:45:00] I'm here [16:45:11] 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review: rack/setup/install analytics-master100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T201939 (10Cmjohnson) [16:45:51] 10Operations, 10ops-codfw, 10ops-eqiad, 10netops: Audit switch ports/descriptions/enable - https://phabricator.wikimedia.org/T189519 (10Cmjohnson) @ayounsi is this okay to close? 
[16:47:49] RECOVERY - HP RAID on ms-be1019 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Battery/Capacitor: OK [16:47:50] 10Operations, 10ops-eqiad, 10Cloud-VPS, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudstore1008 & cloudstore1009 - https://phabricator.wikimedia.org/T193655 (10Cmjohnson) [16:51:43] 10Operations, 10ops-eqiad, 10DC-Ops: Replace wtp1043's sda - https://phabricator.wikimedia.org/T196886 (10Cmjohnson) 05Open>03Resolved [16:52:09] RECOVERY - Check systemd state on ms-be1019 is OK: OK - running: The system is fully operational [16:56:55] 10Operations, 10Maps: maps.wikimedia.org is showing old vandalized version of OSM - https://phabricator.wikimedia.org/T201772 (10Jdforrester-WMF) Fallout for users of MapBox caught by the same vandalism: https://www.buzzfeednews.com/article/ryanhatesthis/snapchats-snap-map-is-currently-showing-jewtropolis-instead [16:59:51] !log xfs_repair on ms-be1041 sdf1 - T199198 (retroactive, started at 15:32 [16:59:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:59:56] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [17:00:04] cscott, arlolra, subbu, halfak, and Amir1: (Dis)respected human, time to deploy Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1700). Please do the needful. 
[17:00:09] I believe that ^ is the reason for additional load on ms-be1019 [17:00:51] no parsoid deploy today [17:02:15] (03PS2) 10Herron: wikidump: change smtpserver to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441135 (https://phabricator.wikimedia.org/T196920) [17:03:20] (03CR) 10Herron: [C: 032] wikidump: change smtpserver to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441135 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [17:05:43] (03CR) 10Smalyshev: [C: 031] wdqs: redirect stderr from cron jobs to log file [puppet] - 10https://gerrit.wikimedia.org/r/456345 (owner: 10Gehel) [17:06:00] (03CR) 10Dzahn: [C: 031] "i can confirm this behaviour. tested on releaeses1001. when leaving address blank it also listens on :::873 in addition to 0.0.0.0:873" [puppet] - 10https://gerrit.wikimedia.org/r/456156 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [17:07:46] 10Operations, 10SRE-Access-Requests: Requesting access to EventLogging in Hive (analytics-privatedata-users) for Cicalese - https://phabricator.wikimedia.org/T203182 (10CCicalese_WMF) [17:08:44] (03PS2) 10Herron: sentry: change EMAIL_HOST to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441134 (https://phabricator.wikimedia.org/T196920) [17:09:55] (03CR) 10Herron: [C: 032] sentry: change EMAIL_HOST to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441134 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [17:10:18] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['wdqs1005.eqiad.wmnet'] ``` The log can be found in `/var/log/w... 
[17:10:57] (03CR) 10Dzahn: "ah right :) so i think i'm supposed to find another target still in C4 then rather than replacing radon with the host that replace its fun" [puppet] - 10https://gerrit.wikimedia.org/r/456320 (https://phabricator.wikimedia.org/T202040) (owner: 10Dzahn) [17:11:39] (03PS22) 10Dzahn: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:13:07] (03CR) 10Herron: "sounds like a plan! when is good for you?" [puppet] - 10https://gerrit.wikimedia.org/r/441132 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [17:14:10] (03PS2) 10Herron: wikimania_scholarships: change smtp_host to localhost [puppet] - 10https://gerrit.wikimedia.org/r/441133 (https://phabricator.wikimedia.org/T196920) [17:16:42] (03CR) 10Dzahn: "regarding the use of "cache_misc_nodes" in there. might have to double check if this is still "misc" nowadays since cache_misc got merged " [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:17:51] (03CR) 10Herron: [C: 032] "asked via the list and received acknowledgement of the change (and provided more details about it) but didn't have luck luck finding a use" [puppet] - 10https://gerrit.wikimedia.org/r/441133 (https://phabricator.wikimedia.org/T196920) (owner: 10Herron) [17:21:26] (03CR) 10Dzahn: "doesnt /var/www/avatars have to be created before the apache snippet?" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:21:28] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (10Imarlier) @dzahn @ArielGlenn Confirmed, can connect and write to that dir. Thanks! @Moritz... 
[17:24:04] (03PS8) 10Paladox: Gerrit: Clone avatars repo into /var/www/gerrit-avatars [puppet] - 10https://gerrit.wikimedia.org/r/440104 (https://phabricator.wikimedia.org/T191183) [17:24:12] (03PS9) 10Paladox: Gerrit: Clone avatars repo into /var/www/gerrit-avatars [puppet] - 10https://gerrit.wikimedia.org/r/440104 (https://phabricator.wikimedia.org/T191183) [17:24:37] (03CR) 10Alex Monk: [C: 032] certcentral_api: basic functionality fixes and error log [software/certcentral] - 10https://gerrit.wikimedia.org/r/456067 (owner: 10Alex Monk) [17:26:09] (03Merged) 10jenkins-bot: certcentral_api: basic functionality fixes and error log [software/certcentral] - 10https://gerrit.wikimedia.org/r/456067 (owner: 10Alex Monk) [17:27:34] (03CR) 10jenkins-bot: certcentral_api: basic functionality fixes and error log [software/certcentral] - 10https://gerrit.wikimedia.org/r/456067 (owner: 10Alex Monk) [17:32:49] 10Operations, 10ops-eqiad, 10Discovery, 10Wikidata, and 2 others: add SSDs to wdqs100[45] - https://phabricator.wikimedia.org/T202779 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['wdqs1005.eqiad.wmnet'] ``` and were **ALL** successful. [17:33:13] (03CR) 10Dzahn: [C: 032] Gerrit: Clone avatars repo into /var/www/gerrit-avatars [puppet] - 10https://gerrit.wikimedia.org/r/440104 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:33:15] RECOVERY - Device not healthy -SMART- on ms-be1019 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ms-be1019&var-datasource=eqiad%2520prometheus%252Fops [17:33:35] PROBLEM - Host analytics1028 is DOWN: PING CRITICAL - Packet loss = 100% [17:34:32] (03CR) 10Dzahn: [C: 032] "Notice: /Stage[main]/Gerrit::Jetty/Git::Clone[All-Avatars]/File[/var/www/gerrit-avatars]/ensure: created" [puppet] - 10https://gerrit.wikimedia.org/r/440104 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:35:06] (03PS23) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [17:38:34] analytics1028 is me [17:39:35] RECOVERY - Host analytics1028 is UP: PING WARNING - Packet loss = 86%, RTA = 138.94 ms [17:40:24] PROBLEM - HHVM rendering on mw2204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:41:15] RECOVERY - HHVM rendering on mw2204 is OK: HTTP OK: HTTP/1.1 200 OK - 76297 bytes in 0.434 second response time [17:44:24] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (10ArielGlenn) The issue is that there are duplicate permissions granted by virtue of the two r... [17:44:40] (03PS24) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [17:46:20] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (10Imarlier) @ArielGlenn Gotcha, that helps. I think you can remove ops. (I don't have any se... 
[17:48:14] (03PS25) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [17:54:33] 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10Performance-Team (Radar): add performance team members to webserver_misc_static servers to maintain sitemaps - https://phabricator.wikimedia.org/T202910 (10ArielGlenn) OK just to clarify a bit more, 'ops' gives global root on everything. So before... [17:57:40] (03CR) 10Paladox: "Applied on gerrit-test3 and looks like a noop" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [17:59:33] (03PS2) 10Krinkle: Revert "Send Thumbor-Request-Id in haproxy response" [puppet] - 10https://gerrit.wikimedia.org/r/456416 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [18:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Morning SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1800). [18:00:05] No GERRIT patches in the queue for this window AFAICS. [18:00:07] (03CR) 10Krinkle: [C: 031] "Too bad.. I suppose backporting haproxy is not worth it?" 
[puppet] - 10https://gerrit.wikimedia.org/r/456416 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [18:02:50] (03PS9) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [18:10:01] (03PS26) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [18:10:10] (03PS10) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [18:11:50] CUSTOM - HP RAID on ms-be1036 is UNKNOWN: (Service Check Timed Out) [18:12:28] ^ testing successful :) [18:16:20] (03CR) 10Paladox: "This change works" [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [18:28:35] (03PS27) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [18:34:31] (03PS28) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [18:34:45] (03CR) 10Paladox: [C: 031] "Tested locally and works (no puppet errors)" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [18:34:53] (03PS11) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [18:35:23] (03CR) 10Paladox: [C: 031] "Tested locally (no puppet errors) (it defaulted to default.png if an image does not exist)" [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [18:37:05] Any problems the time I was gone? [18:39:40] nope! 
:) [18:39:46] nothing on fire afaics [18:39:47] all good [18:42:22] * Krinkle staging on mwdebug1002/deploy1001 [18:43:14] (03CR) 10Reedy: [C: 04-1] "Needs rebasing/reworking" [puppet] - 10https://gerrit.wikimedia.org/r/445604 (owner: 10Reedy) [18:52:28] !log restarting aphlict on phab1001 to pick up nodejs security update [18:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:28] (03PS3) 10Smalyshev: Create wikidata ntriples dump from ttl dump [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) [18:56:05] (03PS3) 10Zhuyifei1999: quarry::database: Use mariadb module instead of mysql module [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) [18:56:46] (03CR) 10jerkins-bot: [V: 04-1] quarry::database: Use mariadb module instead of mysql module [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [18:59:20] (03CR) 10Muehlenhoff: [C: 031] "Looks fine" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/456156 (https://phabricator.wikimedia.org/T192639) (owner: 10Elukey) [19:00:05] marxarelli: Dear deployers, time to do the MediaWiki train - Americas version deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T1900). [19:05:11] 10Operations, 10SRE-Access-Requests: Requesting access to restricted production access and analytics-privatedata-users for Patrick Earley - https://phabricator.wikimedia.org/T201667 (10PEarleyWMF) Thanks kindly, @RobH ! [19:06:58] (03PS6) 10Zhuyifei1999: quarry: Move the install into a venv and upgrade to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/451698 (https://phabricator.wikimedia.org/T192698) [19:07:44] marxarelli: fyi, still waiting on a patch to land that I want to deploy from last hour. 
[19:07:48] been 20-30min :( [19:07:54] (Jenkins) [19:10:57] (03PS7) 10Zhuyifei1999: quarry: Move the install into a venv and upgrade to Python 3 [puppet] - 10https://gerrit.wikimedia.org/r/451698 (https://phabricator.wikimedia.org/T192698) [19:12:35] Krinkle: k. no problem. i'll wait [19:12:42] yeah, i noticed the long queue [19:12:54] okay 1/2 is rolling out now [19:13:47] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.19/resources/src/startup: I13a996e01b48 (duration: 01m 06s) [19:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:18:05] (03PS4) 10Zhuyifei1999: quarry::database: Use mariadb module instead of mysql module [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) [19:18:45] (03CR) 10jerkins-bot: [V: 04-1] quarry::database: Use mariadb module instead of mysql module [puppet] - 10https://gerrit.wikimedia.org/r/454481 (https://phabricator.wikimedia.org/T181205) (owner: 10Zhuyifei1999) [19:20:19] and 2/2 [19:21:23] !log krinkle@deploy1001 Synchronized php-1.32.0-wmf.19/includes/: I11b390f2e4f5e7 (duration: 01m 16s) [19:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:22:53] marxarelli: go ahead :) [19:23:05] * Krinkle releases scap lock [19:23:37] Krinkle: ty! [19:28:02] thcipriani: going to start the train but i just noticed that integration-slave-docker-1025 is offline due to a full partition (you're maintenance script is working!). want to take care of that? 
[19:28:11] *your* [19:28:17] embarrassing [19:29:54] :) [19:30:05] sure, I'll take a look [19:36:18] hrm, script didn't alert anybody :\ [19:36:30] !log Deploying 1.32.0-wmf.19 to all wikis [19:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:20] (03PS1) 10Volans: puppetboard: add graph to processor0 fact [puppet] - 10https://gerrit.wikimedia.org/r/456432 [19:39:32] (03CR) 10RobH: [C: 031] puppetboard: add graph to processor0 fact [puppet] - 10https://gerrit.wikimedia.org/r/456432 (owner: 10Volans) [19:39:42] thcipriani: huh. deploy-promote keeps bailing on me [19:39:54] "Git repo is not clean" [19:39:59] but, it is clean [19:40:16] hrm [19:41:11] (03PS4) 10Smalyshev: Create wikidata ntriples dump from ttl dump [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) [19:41:11] git status --porcelain say anything? [19:41:13] oh, wait. it's looking at its own repo [19:41:16] yeah [19:41:18] :D [19:41:55] a lot of the tools in release do that [19:41:57] i can never keep all things scripts and their usage straight [19:42:05] *these* scripts [19:42:34] could use some sane-making, they've all grown up pretty ad-hoc [19:42:49] PROBLEM - Host cp1080 is DOWN: PING CRITICAL - Packet loss = 100% [19:43:00] (03PS1) 10Dduvall: all wikis to 1.32.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456433 [19:43:17] (03CR) 10Dduvall: [C: 032] all wikis to 1.32.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456433 (owner: 10Dduvall) [19:45:34] (03CR) 10Volans: [C: 032] puppetboard: add graph to processor0 fact [puppet] - 10https://gerrit.wikimedia.org/r/456432 (owner: 10Volans) [19:46:25] (03CR) 10Gilles: "Damnit you're right, it's already in jessie-backports. We just have to tell Puppet to install that one." 
[puppet] - 10https://gerrit.wikimedia.org/r/456416 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [19:46:30] (03Merged) 10jenkins-bot: all wikis to 1.32.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456433 (owner: 10Dduvall) [19:46:32] (03Abandoned) 10Gilles: Revert "Send Thumbor-Request-Id in haproxy response" [puppet] - 10https://gerrit.wikimedia.org/r/456416 (https://phabricator.wikimedia.org/T187765) (owner: 10Gilles) [19:47:01] (03PS5) 10Smalyshev: Create wikidata ntriples dump from ttl dump [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) [19:47:12] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: all wikis to 1.32.0-wmf.19 [19:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:28] PROBLEM - IPsec on cp2002 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:29] PROBLEM - IPsec on cp4024 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:29] PROBLEM - IPsec on cp2020 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:29] PROBLEM - IPsec on cp5005 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:29] PROBLEM - IPsec on cp2017 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:39] PROBLEM - IPsec on cp5003 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:39] PROBLEM - IPsec on cp5002 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:39] PROBLEM - IPsec on cp2024 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:39] PROBLEM - IPsec on cp3045 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:48] PROBLEM - IPsec on cp5004 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:48] PROBLEM - IPsec on cp5001 is CRITICAL: 
Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:49] PROBLEM - IPsec on cp4023 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:49] PROBLEM - IPsec on cp4022 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:58] PROBLEM - IPsec on cp2005 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:58] PROBLEM - IPsec on cp3038 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:48:58] PROBLEM - IPsec on cp2022 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:48:59] PROBLEM - IPsec on cp4025 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:08] PROBLEM - IPsec on cp4026 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:08] PROBLEM - IPsec on cp2008 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:49:09] PROBLEM - IPsec on cp2026 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:49:18] PROBLEM - IPsec on cp3037 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:18] PROBLEM - IPsec on cp3034 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:18] PROBLEM - IPsec on cp3035 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:18] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:18] PROBLEM - IPsec on cp3046 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:19] PROBLEM - IPsec on cp5006 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:19] PROBLEM - IPsec on cp3044 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:20] PROBLEM - IPsec on cp3036 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:20] PROBLEM - IPsec on 
cp3049 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:21] PROBLEM - IPsec on cp3039 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:28] PROBLEM - IPsec on cp4021 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:29] PROBLEM - IPsec on cp3047 is CRITICAL: Strongswan CRITICAL - ok: 34 connecting: cp1080_v4, cp1080_v6 [19:49:38] PROBLEM - IPsec on cp2011 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:49:38] PROBLEM - IPsec on cp2014 is CRITICAL: Strongswan CRITICAL - ok: 62 connecting: cp1080_v4, cp1080_v6 [19:50:21] strongswan swan song [19:50:22] (03CR) 10jenkins-bot: all wikis to 1.32.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/456433 (owner: 10Dduvall) [19:51:33] i don't see _any_ action in fatalmonitor [19:51:37] which bothers me [19:52:48] ah, there's some [19:52:51] * marxarelli feels better [19:56:46] (03CR) 10Smalyshev: "Seems to be working on my test setup" [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [20:04:55] !log dzahn@neodymium conftool action : set/pooled=no; selector: name=cp1080.eqiad.wmnet [20:04:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:04] depooled cp1080 [20:05:41] it had a memory issue in the past https://phabricator.wikimedia.org/T201174 [20:06:48] !log dzahn@neodymium conftool action : set/pooled=no; selector: name=cp1080.eqiad.wmnet| reason: Strongswan CRITICALs from Icinga (T201174) [20:06:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:53] T201174: cp1080 uncorrectable DIMM error slot A5 - https://phabricator.wikimedia.org/T201174 [20:17:23] !log powercycling cp1080 [20:17:24] (03PS29) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) 
[20:17:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:58] (03PS30) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [20:18:14] (03CR) 10Paladox: [C: 031] "Confirmed working locally. (no puppet errors)" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [20:18:24] (03PS12) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [20:19:02] (03PS1) 10Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - 10https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) [20:20:29] RECOVERY - IPsec on cp4024 is OK: Strongswan OK - 36 ESP OK [20:20:38] RECOVERY - IPsec on cp2024 is OK: Strongswan OK - 64 ESP OK [20:20:38] RECOVERY - Host cp1080 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [20:20:39] RECOVERY - IPsec on cp5005 is OK: Strongswan OK - 36 ESP OK [20:20:39] RECOVERY - IPsec on cp3045 is OK: Strongswan OK - 36 ESP OK [20:20:48] RECOVERY - IPsec on cp5002 is OK: Strongswan OK - 36 ESP OK [20:20:48] RECOVERY - IPsec on cp5003 is OK: Strongswan OK - 36 ESP OK [20:20:49] RECOVERY - IPsec on cp4023 is OK: Strongswan OK - 36 ESP OK [20:20:49] RECOVERY - IPsec on cp2005 is OK: Strongswan OK - 64 ESP OK [20:20:49] RECOVERY - IPsec on cp4022 is OK: Strongswan OK - 36 ESP OK [20:20:49] RECOVERY - IPsec on cp2022 is OK: Strongswan OK - 64 ESP OK [20:20:58] RECOVERY - IPsec on cp5001 is OK: Strongswan OK - 36 ESP OK [20:20:58] RECOVERY - IPsec on cp5004 is OK: Strongswan OK - 36 ESP OK [20:20:58] RECOVERY - IPsec on cp3038 is OK: Strongswan OK - 36 ESP OK [20:20:58] RECOVERY - IPsec on cp4025 is OK: Strongswan OK - 36 ESP OK [20:20:59] RECOVERY - IPsec on cp2008 is OK: Strongswan OK - 64 ESP OK [20:20:59] RECOVERY - IPsec on cp2026 is OK: 
Strongswan OK - 64 ESP OK [20:21:08] RECOVERY - IPsec on cp4026 is OK: Strongswan OK - 36 ESP OK [20:21:18] RECOVERY - IPsec on cp3037 is OK: Strongswan OK - 36 ESP OK [20:21:18] RECOVERY - IPsec on cp3046 is OK: Strongswan OK - 36 ESP OK [20:21:18] RECOVERY - IPsec on cp3035 is OK: Strongswan OK - 36 ESP OK [20:21:18] RECOVERY - IPsec on cp3034 is OK: Strongswan OK - 36 ESP OK [20:21:18] RECOVERY - IPsec on cp3043 is OK: Strongswan OK - 36 ESP OK [20:21:19] RECOVERY - IPsec on cp3044 is OK: Strongswan OK - 36 ESP OK [20:21:19] RECOVERY - IPsec on cp3049 is OK: Strongswan OK - 36 ESP OK [20:21:20] RECOVERY - IPsec on cp3036 is OK: Strongswan OK - 36 ESP OK [20:21:28] RECOVERY - IPsec on cp2011 is OK: Strongswan OK - 64 ESP OK [20:21:28] RECOVERY - IPsec on cp5006 is OK: Strongswan OK - 36 ESP OK [20:21:28] RECOVERY - IPsec on cp4021 is OK: Strongswan OK - 36 ESP OK [20:21:28] RECOVERY - IPsec on cp2002 is OK: Strongswan OK - 64 ESP OK [20:21:28] RECOVERY - IPsec on cp2014 is OK: Strongswan OK - 64 ESP OK [20:21:29] RECOVERY - IPsec on cp3047 is OK: Strongswan OK - 36 ESP OK [20:21:38] RECOVERY - IPsec on cp2020 is OK: Strongswan OK - 64 ESP OK [20:21:38] RECOVERY - IPsec on cp2017 is OK: Strongswan OK - 64 ESP OK [20:22:29] RECOVERY - IPsec on cp3039 is OK: Strongswan OK - 36 ESP OK [20:23:01] !log cp1080 - powercycled - lots of RECOVERY from Icinga for IPsec connections - leaving depooled so far (T201174) [20:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:08] T201174: cp1080 uncorrectable DIMM error slot A5 - https://phabricator.wikimedia.org/T201174 [20:23:57] (03PS1) 10Smalyshev: Move settings from hardcoded values to configs. 
[puppet] - 10https://gerrit.wikimedia.org/r/456439 [20:23:59] (03PS2) 10Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - 10https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) [20:25:01] (03CR) 10jerkins-bot: [V: 04-1] Gerrit: Setup avatars url in gerrit config [puppet] - 10https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [20:26:00] (03PS3) 10Ottomata: Initial debian packaging version 0.208 [debs/presto] (debian) - 10https://gerrit.wikimedia.org/r/456277 (https://phabricator.wikimedia.org/T203115) [20:26:02] (03PS3) 10Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - 10https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) [20:30:26] 10Operations: prometheus-varnish-exporter@frontend.service: Unit entered failed state - invalid character 'C' - https://phabricator.wikimedia.org/T203191 (10Dzahn) [20:31:34] (03PS2) 10Smalyshev: Move settings from hardcoded values to configs. [puppet] - 10https://gerrit.wikimedia.org/r/456439 [20:50:08] RECOVERY - Check for gridmaster host resolution TCP on cloudservices1004 is OK: DNS OK - 0.016 seconds response time (tools-grid-master.tools.eqiad.wmflabs. 60 IN A 10.68.20.158) [20:50:28] RECOVERY - Check for gridmaster host resolution UDP on cloudservices1004 is OK: DNS OK - 0.012 seconds response time (tools-grid-master.tools.eqiad.wmflabs. 60 IN A 10.68.20.158) [20:55:48] 10Operations, 10ops-eqiad, 10Traffic: cp1080 - kernel / bnxt_en failures - https://phabricator.wikimedia.org/T203194 (10Dzahn) [20:59:45] (03CR) 10Hoo man: "The code itself looks good, but I didn't it (I can if needed)." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/447922 (https://phabricator.wikimedia.org/T144103) (owner: 10Smalyshev) [21:10:56] 10Operations: prometheus-varnish-exporter@frontend.service: Unit entered failed state - invalid character 'C' - https://phabricator.wikimedia.org/T203191 (10Dzahn) I found this while looking at T203194 It's well possible this was only triggered because the network went down due to that issue. But "Invalid cha... [21:19:22] jouncebot: now [21:19:22] No deployments scheduled for the next 1 hour(s) and 40 minute(s) [21:19:23] (03CR) 10Volans: sre.switchdc.mediawiki: add Phase 0 cookbooks (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456112 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [21:21:34] marxarelli: OK if I do a deployment to revert JsonConfig? We're pulling the new feature that's caused multiple UBNs so that it can have more testing first. [21:21:40] ( https://gerrit.wikimedia.org/r/c/mediawiki/extensions/JsonConfig/+/456446 ) [21:22:01] James_F: yes please [21:22:46] Thanks. 
[21:25:11] (03PS1) 10Mholloway: Add variables for map tile invalidation [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) [21:25:58] (03CR) 10jerkins-bot: [V: 04-1] Add variables for map tile invalidation [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) (owner: 10Mholloway) [21:27:17] (03CR) 10Mholloway: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) (owner: 10Mholloway) [21:28:01] (03CR) 10jerkins-bot: [V: 04-1] Add variables for map tile invalidation [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) (owner: 10Mholloway) [21:31:59] (03PS2) 10Mholloway: Add variables for map tile invalidation [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) [21:42:29] (03CR) 10Ppchelko: [C: 031] Add variables for map tile invalidation [puppet] - 10https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) (owner: 10Mholloway) [21:48:21] 10Operations, 10Maps-Sprint, 10Traffic, 10Maps (Tilerator), and 2 others: Tilerator should purge Varnish cache - https://phabricator.wikimedia.org/T109776 (10Mholloway) Related PRs: https://github.com/wikimedia/change-propagation/pull/290 (merged) https://github.com/kartotherian/tilerator/pull/41 (merged)... 
[21:54:23] (03PS1) 10Volans: mediawiki: add checks to set_readonly() [software/spicerack] - 10https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) [21:55:03] (03PS1) 10Volans: sre.switchdc.mediawiki: add Phase 1 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456502 (https://phabricator.wikimedia.org/T199079) [21:56:05] (03CR) 10Volans: sre.switchdc.mediawiki: add Phase 1 cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456502 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [21:56:30] (03PS1) 10Volans: sre.switchdc.mediawiki: add Phase 2 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456503 (https://phabricator.wikimedia.org/T199079) [21:57:15] !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.19/extensions/JsonConfig/: Hot-deploy Ieaded578ffd revert of T200968 due to bugs (duration: 00m 51s) [21:57:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:57:21] T200968: Support appropriate documentation of CC BY SA data on Commons - https://phabricator.wikimedia.org/T200968 [21:57:25] (03CR) 10Volans: sre.switchdc.mediawiki: add Phase 2 cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456503 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:01:18] marxarelli: Will there be another l10n bot run this evening or should I do a `scap sync` to re-build i18n? 
[22:12:48] !log START - Cookbook sre.switchdc.mediawiki.00-disable-puppet (switchdc/volans@sarin) [22:12:48] !log END (NOTRUN) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) (switchdc/volans@sarin) [22:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:15] !log this was a test for the logging to IRC, please ignore ^^^^ [22:13:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:19:38] (03CR) 10Dzahn: "it creates an Apache site called "undef" per http://puppet-compiler.wmflabs.org/12303/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:20:28] (03CR) 10Paladox: [C: 031] "In testing it worked with undef." [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:21:43] (03PS31) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [22:25:01] !log jforrester@deploy1001 Started scap: Full sync for i18n re-build following Ieaded578ffd [22:25:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:27:09] (03CR) 10Gehel: sre.switchdc.mediawiki: add Phase 1 cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456502 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:27:26] (03PS32) 10Paladox: Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) [22:28:20] (03PS1) 10Volans: sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) [22:28:36] (03PS1) 10Volans: sre.switchdc.mediawiki: add Phase 4 cookbook [cookbooks] - 
10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) [22:28:56] (03CR) 10jerkins-bot: [V: 04-1] sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:29:09] (03CR) 10jerkins-bot: [V: 04-1] sre.switchdc.mediawiki: add Phase 4 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:29:38] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 898.96 seconds [22:30:36] (03PS2) 10Volans: sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) [22:31:34] (03PS2) 10Volans: sre.switchdc.mediawiki: add Phase 4 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) [22:33:54] (03CR) 10Gehel: mediawiki: add checks to set_readonly() (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:35:40] (03CR) 10Gehel: [C: 031] sre.switchdc.mediawiki: add Phase 2 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456503 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:38:09] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [22:38:42] (03CR) 10Gehel: [C: 031] sre.switchdc.mediawiki: add Phase 3 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456510 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:38:44] (03CR) 10Dzahn: [C: 032] Gerrit: Add support for avatars url in apache [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:38:49] :) [22:39:18] 
RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 [22:39:55] (03CR) 10Dzahn: [V: 031 C: 032] "http://puppet-compiler.wmflabs.org/12304/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:41:24] (03CR) 10Gehel: [C: 04-1] sre.switchdc.mediawiki: add Phase 4 cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:43:58] (03PS13) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [22:44:11] (03PS4) 10Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - 10https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) [22:44:57] (03PS14) 10Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) [22:45:42] (03CR) 10Dzahn: "loaded mod_remoteip, that was all as expected. 
applied on gerrit2001, then on cobalt" [puppet] - 10https://gerrit.wikimedia.org/r/439783 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:46:24] (03CR) 10Volans: sre.switchdc.mediawiki: add Phase 1 cookbook (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456502 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:49:28] (03PS3) 10Volans: sre.switchdc.mediawiki: add Phase 4 cookbook [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) [22:49:31] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/12305/" [puppet] - 10https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183) (owner: 10Paladox) [22:50:52] (03CR) 10Volans: "done" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:51:27] (03CR) 10Gehel: [C: 031] "LGTM" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456502 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:51:28] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 268.55 seconds [22:53:31] (03CR) 10Gehel: [C: 031] "LGTM" (031 comment) [cookbooks] - 10https://gerrit.wikimedia.org/r/456511 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:53:40] (03PS1) 10Volans: mysql: rename ensure_core_masters_in_sync [software/spicerack] - 10https://gerrit.wikimedia.org/r/456513 (https://phabricator.wikimedia.org/T199079) [22:55:00] (03PS1) 10Paladox: gerrit: redirect root url to on avatars url [puppet] - 10https://gerrit.wikimedia.org/r/456514 [22:55:59] (03PS2) 10Paladox: gerrit: Index avatars url showing directory listings [puppet] - 10https://gerrit.wikimedia.org/r/456514 [22:56:13] (03CR) 10Gehel: [C: 031] "LGTM, trivial enough" [software/spicerack] - 10https://gerrit.wikimedia.org/r/456513 (https://phabricator.wikimedia.org/T199079) (owner: 10Volans) [22:57:31] !log jforrester@deploy1001 
Finished scap: Full sync for i18n re-build following Ieaded578ffd (duration: 32m 29s)
[22:57:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:57:43] Finally.
[22:57:48] I give up the conch.
[22:58:00] (scap-cdb-rebuild seemed stuck on one machine for five minutes.)
[23:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 6 patches) deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180830T2300).
[23:00:04] No GERRIT patches in the queue for this window AFAICS.
[23:00:45] (PS3) Paladox: gerrit: Index avatars url showing directory listings [puppet] - https://gerrit.wikimedia.org/r/456514 (https://phabricator.wikimedia.org/T191183)
[23:02:50] (CR) Volans: [C: 2] mysql: rename ensure_core_masters_in_sync [software/spicerack] - https://gerrit.wikimedia.org/r/456513 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:03:04] (CR) Gehel: [C: 1] "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler02/12306/" [puppet] - https://gerrit.wikimedia.org/r/456463 (https://phabricator.wikimedia.org/T109776) (owner: Mholloway)
[23:04:05] (Merged) jenkins-bot: mysql: rename ensure_core_masters_in_sync [software/spicerack] - https://gerrit.wikimedia.org/r/456513 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:05:48] (PS2) Volans: mediawiki: split set_readonly() and add checks [software/spicerack] - https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079)
[23:06:01] (CR) Volans: "done" (2 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:06:12] Operations, Traffic, User-Urbanecm: Sort out HTTP caching issues for fixcopyright wiki -
https://phabricator.wikimedia.org/T203179 (Legoktm) There are two similar but different things, and I want to clarify first exactly which one we plan on enabling. $wgULSAnonCanChangeLanguage allows anonymous user...
[23:09:24] (PS4) Paladox: gerrit: redirect root url to on avatars url [puppet] - https://gerrit.wikimedia.org/r/456514 (https://phabricator.wikimedia.org/T191183)
[23:09:40] (PS5) Paladox: gerrit: redirect root url to on avatars url [puppet] - https://gerrit.wikimedia.org/r/456514 (https://phabricator.wikimedia.org/T191183)
[23:09:50] (CR) Gehel: [C: 1] "Sooo much nicer! Thanks!" [software/spicerack] - https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:16:20] (CR) Volans: [C: 2] mediawiki: split set_readonly() and add checks [software/spicerack] - https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:17:24] (Merged) jenkins-bot: mediawiki: split set_readonly() and add checks [software/spicerack] - https://gerrit.wikimedia.org/r/456501 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[23:17:26] Operations, Traffic, User-Urbanecm: Sort out HTTP caching issues for fixcopyright wiki - https://phabricator.wikimedia.org/T203179 (Platonides) I'm not familiar with VCL, but parsing the accept-language header (correctly) is more complex than most headers.
[23:17:32] (PS6) Dzahn: gerrit: redirect avatars_host root URL to regular gerrit [puppet] - https://gerrit.wikimedia.org/r/456514 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:20:28] (CR) Dzahn: [C: 2] "yep, talked about it.
if a user opens the root URL of the avatars host for some reason it will be nice if they just get redirect to regula" [puppet] - https://gerrit.wikimedia.org/r/456514 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:23:01] (PS15) Paladox: Gerrit: Hook up gerrit.wmfusercontent.org to apache [puppet] - https://gerrit.wikimedia.org/r/439808 (https://phabricator.wikimedia.org/T191183)
[23:23:14] (PS5) Paladox: Gerrit: Setup avatars url in gerrit config [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183)
[23:26:30] (CR) Paladox: [C: -1] "We need to install the avatars-external plugin at the same time." [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:26:59] (CR) Paladox: [C: 1] "Actually this could be done anytime." [puppet] - https://gerrit.wikimedia.org/r/456437 (https://phabricator.wikimedia.org/T191183) (owner: Paladox)
[23:36:30] !log releases1001 - killing outdated rsync processes for releases from bromine
[23:36:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:38:28] !log bromine - remove outdated rsync conf fragment for static-bugzilla, stopping rsync, running puppet
[23:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:36] Operations, WMF-JobQueue, Patch-For-Review: Dismantle most of the old jobqueue infrastructure - https://phabricator.wikimedia.org/T197003 (Krinkle)
[23:42:13] !log vega - removed rsync config and let puppet regenerate it
[23:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:47] Operations: move human users out of UID range for system accounts - https://phabricator.wikimedia.org/T114446 (Dzahn) >>! In T114446#4541622, @ArielGlenn wrote: > Do we know how many current users this impacts in labs? I found an answer to this straight with ldapsearch, The answer is 28.
{P7503}