[00:01:31] (PS1) Dzahn: releases-jenkins: remove now unused jenkins_proxy file [puppet] - https://gerrit.wikimedia.org/r/382097 (https://phabricator.wikimedia.org/T164030)
[00:02:03] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:02:04] (PS2) Dzahn: releases-jenkins: remove now unused jenkins_proxy file [puppet] - https://gerrit.wikimedia.org/r/382097 (https://phabricator.wikimedia.org/T164030)
[00:02:31] (CR) Dzahn: [C: 2] releases-jenkins: remove now unused jenkins_proxy file [puppet] - https://gerrit.wikimedia.org/r/382097 (https://phabricator.wikimedia.org/T164030) (owner: Dzahn)
[00:04:33] RECOVERY - puppet last run on releases1001 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[00:07:03] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:22:38] (PS1) Dzahn: releases: remove proxy_jenkins class, simplify [puppet] - https://gerrit.wikimedia.org/r/382098
[00:23:17] Operations, Community-Tech, MediaWiki-General-or-Unknown, Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3655917 (Dzahn) Yep, once you know what command exactly you want to run you can ping...
[00:23:27] (CR) jerkins-bot: [V: -1] releases: remove proxy_jenkins class, simplify [puppet] - https://gerrit.wikimedia.org/r/382098 (owner: Dzahn)
[00:24:27] Operations, Community-Tech, MediaWiki-General-or-Unknown, Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3655920 (Dzahn) ..of course pending that DBAs are ok with this running as cron.
[00:29:56] (PS2) Dzahn: releases: remove proxy_jenkins class, simplify [puppet] - https://gerrit.wikimedia.org/r/382098
[00:33:38] (CR) jerkins-bot: [V: -1] releases: remove proxy_jenkins class, simplify [puppet] - https://gerrit.wikimedia.org/r/382098 (owner: Dzahn)
[00:35:52] -1 for "includes apache::mod::rewrite from another module"
[00:35:56] well this is going to be common
[00:36:24] we always said "no, use the module to setup Apache" in like everything
[00:41:57] yeah... that's everywhere and the right pattern
[00:42:28] unless all usage like that is supposed to move up to a profile?
[00:42:46] yea, that
[00:42:49] i think so
[00:43:13] just not from module to module
[00:43:16] so most of the modules I've ever written are really profiles I guess
[00:44:27] but then what about things like service::uwsgi that roll common patterns into a define? Is that still ok?
[00:45:40] yes, i heard "defines are ok, even encouraged" today
[00:45:41] heh
[00:46:35] I guess I'll learn the rules. Can't be worse than Puppet generally. ;
[00:46:37] the roles we wrote were all like profiles
[00:47:04] and actual roles can only ever be a single one per node
[01:00:45] Operations, Community-Tech, DBA, MediaWiki-General-or-Unknown, Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3655999 (EddieGP) a:EddieGP >>! In T176754#3655920, @Dzahn wrote: > ..of...
[01:47:02] RECOVERY - IPMI Sensor Status on analytics1035 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK
[02:08:00] Operations, monitoring, Graphite, User-fgiunchedi: Upgrade grafana to 4.5.2 - https://phabricator.wikimedia.org/T175980#3656013 (Krinkle)
[02:16:47] Operations, Community-Tech, DBA, MediaWiki-General-or-Unknown, Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3656030 (Legoktm) Open>declined I agree with T176754#3636245 and am...
[02:25:34] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.1) (duration: 07m 44s)
[02:25:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:30:42] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:00:42] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[03:02:19] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.2) (duration: 15m 20s)
[03:02:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:34] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Oct 4 03:09:34 UTC 2017 (duration 7m 15s)
[03:09:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:46:57] Operations, Traffic, Community-Liaisons (Oct-Dec 2017), Patch-For-Review, User-Johan: Communicate dropping IE8-on-XP support (a security change) to affected editors and other community members - https://phabricator.wikimedia.org/T163251#3656120 (Johan)
[04:16:45] Operations, Traffic, Community-Liaisons (Oct-Dec 2017), Patch-For-Review, User-Johan: Communicate dropping IE8-on-XP support (a security change) to affected editors and other community members - https://phabricator.wikimedia.org/T163251#3656134 (Johan) There will be a new reminder in [[ https...
[04:43:07] (PS2) Mridubhatnagar: Explain scripts/webservice does [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[05:23:46] (PS3) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[05:24:56] Operations, ops-eqiad, DBA: Degraded RAID on db1056 - https://phabricator.wikimedia.org/T177171#3656157 (Marostegui) Open>Resolved Raid back to optimal - thank you Chris!: ``` root@db1056:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name...
[05:42:42] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[05:44:38] (PS1) Marostegui: Revert "db-codfw.php: Depool db2034, db2055 and db2062" [mediawiki-config] - https://gerrit.wikimedia.org/r/382115
[05:44:41] (PS2) Marostegui: Revert "db-codfw.php: Depool db2034, db2055 and db2062" [mediawiki-config] - https://gerrit.wikimedia.org/r/382115
[05:44:43] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[05:54:42] (CR) Marostegui: [C: 2] Revert "db-codfw.php: Depool db2034, db2055 and db2062" [mediawiki-config] - https://gerrit.wikimedia.org/r/382115 (owner: Marostegui)
[05:57:16] (Merged) jenkins-bot: Revert "db-codfw.php: Depool db2034, db2055 and db2062" [mediawiki-config] - https://gerrit.wikimedia.org/r/382115 (owner: Marostegui)
[05:57:26] (CR) jenkins-bot: Revert "db-codfw.php: Depool db2034, db2055 and db2062" [mediawiki-config] - https://gerrit.wikimedia.org/r/382115 (owner: Marostegui)
[05:57:43] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:58:33] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2034, db2055 db2062 - T174509 (duration: 00m 51s)
[05:58:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:41] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[05:58:52] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:01:59] (PS1) Urbanecm: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/382116 (https://phabricator.wikimedia.org/T177370)
[06:09:52] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[06:09:53] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[06:13:37] !log Optimize pagelinks and templatelinks tables on s7 codfw master (db2029) this will generate lag - T174509
[06:13:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:13:44] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:14:42] checking those 503s
[06:16:20] !log restart varnish backend on cp3041
[06:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:26] the other peak was cp3043, will restart that one too after cp3041 is up
[06:19:10] !log Optimize pagelinks and templatelinks on dbstore2002 s1 - T174509
[06:19:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:19:19] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:21:45] !log restart varnish backend on cp3043
[06:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:02] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:22:17] (PS1) Marostegui: db-codfw.php: Depool db2035 and db2056 [mediawiki-config] - https://gerrit.wikimedia.org/r/382117 (https://phabricator.wikimedia.org/T174509)
[06:22:36] * elukey waves to the alter table master marostegui
[06:22:48] hahaha
[06:22:53] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:22:59] * marostegui waves to the drop table master elukey
[06:24:39] (CR) Marostegui: [C: 2] db-codfw.php: Depool db2035 and db2056 [mediawiki-config] - https://gerrit.wikimedia.org/r/382117 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:26:19] (Merged) jenkins-bot: db-codfw.php: Depool db2035 and db2056 [mediawiki-config] - https://gerrit.wikimedia.org/r/382117 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:26:29] (CR) jenkins-bot: db-codfw.php: Depool db2035 and db2056 [mediawiki-config] - https://gerrit.wikimedia.org/r/382117 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[06:27:39] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2035 and db2056 - T174509 (duration: 00m 50s)
[06:27:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:46] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[06:30:39] !log Optimize templatelinks and pagelinks tables on db1066 - T174509
[06:30:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:33:24] (PS1) Marostegui: db-eqiad.php: Depool db1087 to clone db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/382118 (https://phabricator.wikimedia.org/T172679)
[06:34:43] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3656202 (Paladox) Bump.
[06:36:10] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3628916 (Marostegui) >>! In T176532#3656202, @Paladox wrote: > Bump. Hey Paladox Chec...
[06:36:21] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1087 to clone db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/382118 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:37:52] (Merged) jenkins-bot: db-eqiad.php: Depool db1087 to clone db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/382118 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:38:02] (CR) jenkins-bot: db-eqiad.php: Depool db1087 to clone db1104 [mediawiki-config] - https://gerrit.wikimedia.org/r/382118 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:39:08] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1087 - T172679 (duration: 00m 50s)
[06:39:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:15] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[06:42:03] (PS1) Marostegui: mariadb: Add db1104 to s5 cloned from db1087 [puppet] - https://gerrit.wikimedia.org/r/382119 (https://phabricator.wikimedia.org/T172679)
[06:45:09] (CR) Marostegui: [C: 2] "https://puppet-compiler.wmflabs.org/compiler02/8165/" [puppet] - https://gerrit.wikimedia.org/r/382119 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:49:50] (PS1) Marostegui: db1087.yaml: Update socket location [puppet] - https://gerrit.wikimedia.org/r/382120
[06:51:55] (PS1) Marostegui: s5.hosts: Add db1104 to s5 [software] - https://gerrit.wikimedia.org/r/382121 (https://phabricator.wikimedia.org/T172679)
[06:53:02] (CR) Marostegui: [C: 2] s5.hosts: Add db1104 to s5 [software] - https://gerrit.wikimedia.org/r/382121 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:53:46] !log Stop MySQL on db1087 to clone db1104 from it - T172679
[06:53:51] (Merged) jenkins-bot: s5.hosts: Add db1104 to s5 [software] - https://gerrit.wikimedia.org/r/382121 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[06:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:53:52] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[06:55:15] (PS1) Muehlenhoff: Remove access for zhousquared [puppet] - https://gerrit.wikimedia.org/r/382122
[06:56:57] (CR) Muehlenhoff: [C: 2] Remove access for zhousquared [puppet] - https://gerrit.wikimedia.org/r/382122 (owner: Muehlenhoff)
[06:57:10] (CR) Marostegui: [C: 2] db1087.yaml: Update socket location [puppet] - https://gerrit.wikimedia.org/r/382120 (owner: Marostegui)
[06:57:19] (PS2) Marostegui: db1087.yaml: Update socket location [puppet] - https://gerrit.wikimedia.org/r/382120
[07:01:45] (PS1) Marostegui: db-eqiad,db-codfw.php: Remove db2010 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/382123 (https://phabricator.wikimedia.org/T175685)
[07:04:31] (CR) Hashar: "./vendor/bundle is used when installing with --deployment, similar to our /vendor.git repos" [puppet] - https://gerrit.wikimedia.org/r/381934 (owner: Giuseppe Lavagetto)
[07:06:51] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Remove db2010 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/382123 (https://phabricator.wikimedia.org/T175685) (owner: Marostegui)
[07:08:13] hello hashar
[07:08:18] welcome to Nigeria
[07:08:27] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Remove db2010 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/382123 (https://phabricator.wikimedia.org/T175685) (owner: Marostegui)
[07:08:37] (CR) jenkins-bot: db-eqiad,db-codfw.php: Remove db2010 from config [mediawiki-config] - https://gerrit.wikimedia.org/r/382123 (https://phabricator.wikimedia.org/T175685) (owner: Marostegui)
[07:10:04] (CR) Giuseppe Lavagetto: [C: 2] profile::ci::slave: install blubber [puppet] - https://gerrit.wikimedia.org/r/382004 (https://phabricator.wikimedia.org/T175296) (owner: Giuseppe Lavagetto)
[07:10:08] (PS1) Elukey: Revert "releases-jenkins: remove now unused jenkins_proxy file" [puppet] - https://gerrit.wikimedia.org/r/382124
[07:10:11] (PS2) Giuseppe Lavagetto: profile::ci::slave: install blubber [puppet] - https://gerrit.wikimedia.org/r/382004 (https://phabricator.wikimedia.org/T175296)
[07:10:15] (PS2) Elukey: Revert "releases-jenkins: remove now unused jenkins_proxy file" [puppet] - https://gerrit.wikimedia.org/r/382124
[07:10:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Remove db2010 from config as it will be decommissioned - T175685 (duration: 00m 48s)
[07:10:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:10:42] T175685: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685
[07:10:48] (CR) Elukey: [C: 2] Revert "releases-jenkins: remove now unused jenkins_proxy file" [puppet] - https://gerrit.wikimedia.org/r/382124 (owner: Elukey)
[07:11:02] that did not last long :] klined!
[07:11:08] (PS1) Elukey: Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates" [puppet] - https://gerrit.wikimedia.org/r/382125
[07:11:18] (PS2) Elukey: Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates" [puppet] - https://gerrit.wikimedia.org/r/382125
[07:11:30] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Remove db2010 from config as it will be decommissioned - T175685 (duration: 00m 48s)
[07:11:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:11:56] (CR) Elukey: [C: 2] Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates" [puppet] - https://gerrit.wikimedia.org/r/382125 (owner: Elukey)
[07:12:45] <_joe_> let's protect the topic at least
[07:12:48] hashar: sorry but releases.w.o was down due to an apache misconfig :(
[07:13:00] <_joe_> elukey: it was *down*?
[07:13:15] <_joe_> I thought it was just unable to reload the config
[07:13:19] <_joe_> so we had an outage
[07:13:22] <_joe_> for 11 hours
[07:13:27] elukey: I got a patch floating around to move that prefix to a hiera key (iirc)
[07:13:29] <_joe_> and no one noticed
[07:13:45] nono I need to check, now that I see I tried to access "release.w.o" at first try (needed coffee)
[07:13:50] so it might be only a reload
[07:14:00] _joe_ o/ same joker is in #wikimedia-cloud, if you have the creds there and inclination to fix the topic
[07:14:02] elukey: and I guess if something is broken, revert is 100% fine :] Can always be redone later :]
[07:14:13] <_joe_> awight: no I do not
[07:14:39] <_joe_> elukey: wait please
[07:14:44] <_joe_> I want to assess the situation
[07:15:04] <_joe_> so the server is *up*
[07:15:11] <_joe_> did you restart it?
[07:15:13] nono I was wrong, the cron error msg said "failed to reload"
[07:15:15] my bad
[07:15:20] <_joe_> cron?
[07:15:26] <_joe_> why cron btw?
[07:15:26] yeah I did it, with the correct config though
[07:15:28] <_joe_> why not puppet?
[07:15:33] <_joe_> elukey: uhm
[07:15:37] <_joe_> ok
[07:15:50] so Cron (PS3) Giuseppe Lavagetto: profile::ci::slave: install blubber [puppet] - https://gerrit.wikimedia.org/r/382004 (https://phabricator.wikimedia.org/T175296)
[07:16:07] before restarting I of course ran apachectl -t etc..
[07:16:11] I didn't do the cowboy restart :)
[07:16:22] <_joe_> yeah not sure why cron is reloading apache there
[07:16:27] <_joe_> seems quite wrong
[07:16:32] <_joe_> unless it's logrotate
[07:16:41] yep it is
[07:18:04] apache reloaded, config good, https://releases.wikimedia.org/ up and running fine
[07:18:21] Operations: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#3656267 (MoritzMuehlenhoff)
[07:18:36] <_joe_> elukey: so it was effectively unreachable until you did that?
[07:18:46] <_joe_> I need to understand if we need an incident report
[07:18:49] <_joe_> I guess we do
[07:18:53] (PS1) Marostegui: linux-host-entries: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127
[07:19:07] (CR) jerkins-bot: [V: -1] linux-host-entries: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127 (owner: Marostegui)
[07:19:48] nope it was fine, every time I changed anything the site was completely up and running. The only issue that is completely my fault was that I typed "release.w.o" instead of releases.w.o and I thought it was down after a wrong reload
[07:20:08] <_joe_> ahahah ok
[07:20:17] so I am not sure if it was down before I started working, but almost surely not
[07:20:27] (PS6) Hashar: contint: move an include from site.pp to role [puppet] - https://gerrit.wikimedia.org/r/381648
[07:20:29] (PS2) Marostegui: install_server: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127
[07:20:31] (PS4) Hashar: contint: move jenkins from role to a profile [puppet] - https://gerrit.wikimedia.org/r/381649
[07:22:50] (CR) jerkins-bot: [V: -1] install_server: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127 (owner: Marostegui)
[07:22:52] (PS3) Marostegui: install_server: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127 (https://phabricator.wikimedia.org/T175685)
[07:27:29] Operations, RelEng-Archive-FY201718-Q1, Patch-For-Review, Release-Engineering-Team (Kanban), Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3218929 (elukey) I had to revert the last changes since apache was failing to r...
[07:30:45] !log stopping s5 replication on dbstore1002 and converting wikidatawiki.wb_terms into TokuDB
[07:30:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:30:54] \o/
[07:34:22] (CR) Hashar: [C: 1] "Still OK to go after a rebase https://puppet-compiler.wmflabs.org/compiler02/8166/" [puppet] - https://gerrit.wikimedia.org/r/381648 (owner: Hashar)
[07:38:59] Operations, monitoring, Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3656316 (akosiaris) I am guessing we can finally close this ?
[07:39:09] (PS5) Hashar: contint: move jenkins from role to a profile [puppet] - https://gerrit.wikimedia.org/r/381649
[07:40:17] (CR) Alexandros Kosiaris: [C: 2] contint: move an include from site.pp to role [puppet] - https://gerrit.wikimedia.org/r/381648 (owner: Hashar)
[07:41:04] (CR) Alexandros Kosiaris: [C: 2] "Ok, let's do the active/passive part in a different patch" [puppet] - https://gerrit.wikimedia.org/r/381649 (owner: Hashar)
[07:41:44] (CR) Hashar: "https://puppet-compiler.wmflabs.org/compiler02/8168/" [puppet] - https://gerrit.wikimedia.org/r/381649 (owner: Hashar)
[07:41:56] akosiaris: that chain was a bit messy sorry :]
[07:42:38] no worries
[07:46:06] bah somehow contint2001 has a nrpe::monitor_service[jenkins] being added ( https://puppet-compiler.wmflabs.org/compiler02/8168/contint2001.wikimedia.org/ )
[07:48:41] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[07:48:52] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0
[07:49:00] akosiaris: _joe_ : should we get the nrpe::monitor things in the module that defines the service or inside the profile ? :]
[07:49:20] (CR) Marostegui: [C: 2] install_server: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127 (https://phabricator.wikimedia.org/T175685) (owner: Marostegui)
[07:49:25] (PS4) Marostegui: install_server: Remove db2010 [puppet] - https://gerrit.wikimedia.org/r/382127 (https://phabricator.wikimedia.org/T175685)
[07:49:49] hashar: the profile is preferable
[07:50:23] the use case is Jenkins, I am tempted to have the jenkins class accept a $service_monitor
[07:50:44] so that different profiles relying on jenkins would have the check automagically
[07:51:09] (Abandoned) Muehlenhoff: profile::microsites::endowment: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366817 (owner: Muehlenhoff)
[07:51:26] (PS3) Muehlenhoff: Remove beta::saltmaster::tools and /usr/local/bin/beta-apaches [puppet] - https://gerrit.wikimedia.org/r/379502
[07:51:31] hashar: if it helps, the rule is more or less like this: "the module installs/configures/runs the software, the profile assembles modules to get a service functional the wmf way, the role just aggregates profiles and is applied to a host"
[07:51:59] of course like all rules it can and will be broken.. but it's a nice rule of thumb
[07:52:36] Or I can use a profile for the CI Jenkins
[07:52:37] (CR) Muehlenhoff: [C: 2] Remove beta::saltmaster::tools and /usr/local/bin/beta-apaches [puppet] - https://gerrit.wikimedia.org/r/379502 (owner: Muehlenhoff)
[07:52:52] and another profile for a standard/basic jenkins that would include the default monitor
[07:53:02] bah E_TOO_MANY_CHOICES
[07:54:57] (PS1) Hashar: contint: properly support jenkins monitoring [puppet] - https://gerrit.wikimedia.org/r/382128
[08:02:55] Operations, Continuous-Integration-Infrastructure (shipyard), User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3656360 (Joe)
[08:03:15] (CR) Hashar: "https://puppet-compiler.wmflabs.org/compiler02/8170/" [puppet] - https://gerrit.wikimedia.org/r/382128 (owner: Hashar)
[08:03:22] Error: Failed to compile catalog for node releases2001.codfw.wmnet: Attempt to assign to a reserved variable name: 'trusted' on node releases2001.codfw.wmnet
[08:03:25] that never ends :]
[08:10:01] PROBLEM - parsoid on wtp2003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:10:52] RECOVERY - parsoid on wtp2003 is OK: HTTP OK: HTTP/1.1 200 OK - 1051 bytes in 0.104 second response time
[08:11:26] Operations, Goal, Technical-Debt, User-fgiunchedi: Reduce technical debt in metrics monitoring - https://phabricator.wikimedia.org/T177195#3656369 (fgiunchedi)
[08:11:39] Operations, Goal, User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3656370 (fgiunchedi)
[08:11:47] Operations, Goal, User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#3656371 (fgiunchedi)
[08:11:57] Operations, Goal, User-fgiunchedi: Add Prometheus client support for varnish/statsd metrics daemons - https://phabricator.wikimedia.org/T177199#3656372 (fgiunchedi)
[08:15:44] (PS3) Giuseppe Lavagetto: For HHVM set LANG=C.UTF-8 [puppet] - https://gerrit.wikimedia.org/r/353228 (https://phabricator.wikimedia.org/T107128) (owner: Tim Starling)
[08:16:05] (CR) Hashar: [V: 1 C: 1] "So that would disable monitoring of Jenkins on contint2001 (expected) and enable again monitoring on the releases hosts." [puppet] - https://gerrit.wikimedia.org/r/382128 (owner: Hashar)
[08:19:05] (PS2) Alexandros Kosiaris: contint: properly support jenkins monitoring [puppet] - https://gerrit.wikimedia.org/r/382128 (owner: Hashar)
[08:19:10] (CR) Alexandros Kosiaris: [V: 2 C: 2] contint: properly support jenkins monitoring [puppet] - https://gerrit.wikimedia.org/r/382128 (owner: Hashar)
[08:20:26] (PS1) Ema: prometheus::node_gdnsd: ensure python-requests is installed [puppet] - https://gerrit.wikimedia.org/r/382129 (https://phabricator.wikimedia.org/T147426)
[08:21:16] (CR) Filippo Giunchedi: [C: 1] prometheus::node_gdnsd: ensure python-requests is installed [puppet] - https://gerrit.wikimedia.org/r/382129 (https://phabricator.wikimedia.org/T147426) (owner: Ema)
[08:21:40] (CR) Elukey: [C: 1] prometheus::node_gdnsd: ensure python-requests is installed [puppet] - https://gerrit.wikimedia.org/r/382129 (https://phabricator.wikimedia.org/T147426) (owner: Ema)
[08:22:19] Operations, ops-eqiad, hardware-requests: decommission wdqs100[12] - https://phabricator.wikimedia.org/T175595#3656409 (Gehel) Removing this from the WDQS board, nothing more to do on our side...
[08:22:52] (CR) Ema: [C: 2] prometheus::node_gdnsd: ensure python-requests is installed [puppet] - https://gerrit.wikimedia.org/r/382129 (https://phabricator.wikimedia.org/T147426) (owner: Ema)
[08:22:56] PROBLEM - jenkins_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war
[08:23:38] ^^ that one should go once https://gerrit.wikimedia.org/r/382128 has been applied on the icinga box
[08:23:53] (CR) Alexandros Kosiaris: [C: 1] "IIRC, I had this setting set to 0 to ensure the replication worked fine when having to catch up from the latest full planet dump which can" [puppet] - https://gerrit.wikimedia.org/r/382016 (owner: Gehel)
[08:24:27] (CR) Filippo Giunchedi: "Indeed :( Getting rid of AAAAs will break production though so we'll have to find another strategy" [puppet] - https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T176314) (owner: Hashar)
[08:25:27] ACKNOWLEDGEMENT - jenkins_service_running on contint2001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/java .*-jar /usr/share/jenkins/jenkins.war amusso Should be disabled by merged https://gerrit.wikimedia.org/r/382128 eg: pending a puppet run on Icinga host - The acknowledgement expires at: 2017-10-05 09:24:29.
[08:26:55] (PS2) Gehel: osm: ensure we do have a maxInterval on OSM replication [puppet] - https://gerrit.wikimedia.org/r/382016
[08:27:32] (CR) Gehel: [C: 2] osm: ensure we do have a maxInterval on OSM replication [puppet] - https://gerrit.wikimedia.org/r/382016 (owner: Gehel)
[08:28:52] !log Stop MySQL on db2010 as it will be decommissioned - T175685
[08:28:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:29:00] T175685: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685
[08:29:21] (PS1) Hashar: contint: move backup from role to a profile [puppet] - https://gerrit.wikimedia.org/r/382131
[08:29:30] (PS1) Alexandros Kosiaris: maps: Use conftool to populate dsh hosts [puppet] - https://gerrit.wikimedia.org/r/382132
[08:30:31] Operations, ops-codfw, DBA: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3656422 (Marostegui) a:Papaul db2010 is ready to be fully decommissioned by @Papaul
[08:30:52] !log mobrovac@tin Started deploy [restbase/deploy@8eb758a]: Switch the mobile feeds to the new storage schema and Cassandra 3
[08:30:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:33:14] Operations, monitoring, Patch-For-Review: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3656423 (akosiaris) So I am guessing that as far as this task goes, we are done, having the potential to do future improvements on a per host/cluster/dc/role basis (yay hiera). I...
[08:33:57] (CR) Hashar: "https://puppet-compiler.wmflabs.org/compiler02/8173/" [puppet] - https://gerrit.wikimedia.org/r/382131 (owner: Hashar)
[08:35:34] (PS1) Marostegui: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/382134 (https://phabricator.wikimedia.org/T174509)
[08:36:24] !log upgrading app servers mw1180-mw1188, mw1209-mw1220 to HHVM 3.18.5
[08:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:57] PROBLEM - DPKG on mw1180 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:39:56] RECOVERY - DPKG on mw1180 is OK: All packages OK
[08:40:57] !log mobrovac@tin Finished deploy [restbase/deploy@8eb758a]: Switch the mobile feeds to the new storage schema and Cassandra 3 (duration: 10m 05s)
[08:41:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:41:29] !log akosiaris@puppetmaster1001 conftool action : set/pooled=no; selector: name=wtp10([01][0-9]|2[0-4]).eqiad.wmnet
[08:41:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:43:25] (CR) Hashar: "recheck" [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/381744 (owner: Hashar)
[08:44:52] (CR) Hashar: "The CI job works https://integration.wikimedia.org/ci/job/puppet-wmf-styleguide-rake-jessie/1/ :)" [puppet-lint/wmf_styleguide-check] - https://gerrit.wikimedia.org/r/381744 (owner: Hashar)
[08:45:32] Operations, ops-eqiad, DC-Ops: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656431 (akosiaris)
[08:47:11] Operations, ops-eqiad, DC-Ops: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656444 (akosiaris) The boxes are old enough (Jan 2013, soon to be 5 years old) to warrant full removal.
[08:50:21] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[08:53:10] Operations, monitoring, Patch-For-Review: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3656448 (jcrespo) +1
[08:53:11] (CR) Alexandros Kosiaris: [C: 2] "It's one of those corner cases that it's fine (at least for now). Note that omitting profile::backup::host will have no ill effects on pup" [puppet] - https://gerrit.wikimedia.org/r/382131 (owner: Hashar)
[08:53:17] (PS2) Alexandros Kosiaris: contint: move backup from role to a profile [puppet] - https://gerrit.wikimedia.org/r/382131 (owner: Hashar)
[08:53:19] (CR) Alexandros Kosiaris: [V: 2 C: 2] contint: move backup from role to a profile [puppet] - https://gerrit.wikimedia.org/r/382131 (owner: Hashar)
[08:54:26] (PS4) Giuseppe Lavagetto: For HHVM set LANG=C.UTF-8 [puppet] - https://gerrit.wikimedia.org/r/353228 (https://phabricator.wikimedia.org/T107128) (owner: Tim Starling)
[08:56:12] (CR) Giuseppe Lavagetto: [C: 2] For HHVM set LANG=C.UTF-8 [puppet] - https://gerrit.wikimedia.org/r/353228 (https://phabricator.wikimedia.org/T107128) (owner: Tim Starling)
[08:59:00] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/382134 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[08:59:49] Operations, Release Pipeline, Patch-For-Review, Release-Engineering-Team (Kanban), User-Joe: Install Blubber on contint1001 - https://phabricator.wikimedia.org/T175296#3656451 (hashar) Open>Resolved blubber is on contint1001 and contint2001 (installed via role::ci:slave so that is on...
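The conftool selector logged at 08:41:29 (and again at 10:18:31) is a regular expression over host names. The sketch below is only an illustration of which names that pattern covers: it assumes the selector is applied as a full match against the host name, which the log does not confirm, and the candidate host list and the helper `matching_hosts` are hypothetical. Under that assumption the pattern also accepts `wtp1000`, one more name than the wtp1001-wtp1024 range in the task title, and the unescaped dots match any character.

```python
import re

# Selector copied from the !log entries; everything else here is illustrative.
SELECTOR = r"wtp10([01][0-9]|2[0-4]).eqiad.wmnet"


def matching_hosts(pattern, candidates):
    """Return the candidates that the pattern matches in full.

    Assumption: the selector is treated as a full-string match.
    """
    compiled = re.compile(pattern)
    return [host for host in candidates if compiled.fullmatch(host)]


# Hypothetical candidate names around the decommissioned range.
candidates = [f"wtp{n}.eqiad.wmnet" for n in range(1000, 1030)]
matched = matching_hosts(SELECTOR, candidates)

print(matched[0], matched[-1], len(matched))
```

If a stricter selector were wanted, escaping the dots and excluding `00` would narrow the match to exactly the 24 named hosts; whether that matters depends on whether a `wtp1000` object exists in conftool, which the log does not say.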
[09:00:21] PROBLEM - DPKG on mw1186 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:01:21] RECOVERY - DPKG on mw1186 is OK: All packages OK
[09:02:30] akosiaris: thanks for the ci backup change :] I wasnt sure how to solve that one without the ci profile requiring profile::backup::host :]
[09:03:10] PROBLEM - DPKG on mw1188 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:03:23] role::ci::master now solely include profiles \o/
[09:04:10] RECOVERY - DPKG on mw1188 is OK: All packages OK
[09:04:19] (PS1) Alexandros Kosiaris: Disable notifications for wtp1001-wtp1024 [puppet] - https://gerrit.wikimedia.org/r/382135 (https://phabricator.wikimedia.org/T177374)
[09:04:22] (PS1) Alexandros Kosiaris: decom wtp1001-wtp1024 [puppet] - https://gerrit.wikimedia.org/r/382136 (https://phabricator.wikimedia.org/T177374)
[09:04:30] hashar: yeah it's a corner case and it doesn't hurt so we leave it for now
[09:04:32] (Merged) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/382134 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[09:05:21] <_joe_> hashar: <3
[09:06:00] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1097 - T174509 (duration: 00m 50s)
[09:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:08] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[09:06:10] (CR) jenkins-bot: db-eqiad.php: Depool db1097 [mediawiki-config] - https://gerrit.wikimedia.org/r/382134 (https://phabricator.wikimedia.org/T174509) (owner: Marostegui)
[09:07:36] !log Optimize templatelinks and pagelinks tables on db1097 (s4) - T174509
[09:07:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:07:45] (CR) Alexandros Kosiaris: [C: 2] Disable notifications for wtp1001-wtp1024 [puppet] - https://gerrit.wikimedia.org/r/382135 (https://phabricator.wikimedia.org/T177374) (owner: Alexandros Kosiaris)
[09:09:30] Operations, monitoring, Patch-For-Review: Review check_raid_hpssacli frequency - https://phabricator.wikimedia.org/T173311#3656473 (akosiaris) Open>Resolved
[09:09:53] <_joe_> !log restarting hhvm on mwdebug*, T107128
[09:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:02] T107128: Scribunto string comparison works case insensitive while the standard Lua case sensitive - https://phabricator.wikimedia.org/T107128
[09:10:22] PROBLEM - puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[09:10:22] (PS3) Elukey: Add mw videoscaler hiera config for the new eqiad hosts [puppet] - https://gerrit.wikimedia.org/r/381969 (https://phabricator.wikimedia.org/T165519)
[09:11:55] (CR) Elukey: [C: 2] Add mw videoscaler hiera config for the new eqiad hosts [puppet] - https://gerrit.wikimedia.org/r/381969 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[09:12:57] (PS4) Sowjanyavemuri: Understand the scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/380568 (https://phabricator.wikimedia.org/T176624)
[09:15:58] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3656491 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['mw1307.eqiad.wmnet', 'mw1318.eqiad.wmnet...
[09:16:32] !log Reboot db1087 to pick up new kernel
[09:16:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:20:22] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:22:17] (PS1) Hashar: contint::website to a profile [puppet] - https://gerrit.wikimedia.org/r/382140
[09:22:39] akosiaris: I have a logstash endpoint change for mediawiki (https://gerrit.wikimedia.org/r/#/c/380994/), is there anything specific to do when deploying it ?
[09:22:50] (CR) jerkins-bot: [V: -1] contint::website to a profile [puppet] - https://gerrit.wikimedia.org/r/382140 (owner: Hashar)
[09:25:20] !log installing ocaml security updates on trusty
[09:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:25:39] gehel: can't think of any
[09:25:59] (CR) Gehel: [C: 1] "Oh that is much nicer!" [puppet] - https://gerrit.wikimedia.org/r/382132 (owner: Alexandros Kosiaris)
[09:26:10] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db1104 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382141 (https://phabricator.wikimedia.org/T172679)
[09:26:57] (PS2) Gehel: maps: Use conftool to populate dsh hosts [puppet] - https://gerrit.wikimedia.org/r/382132 (owner: Alexandros Kosiaris)
[09:27:28] (PS2) Hashar: contint::website to a profile [puppet] - https://gerrit.wikimedia.org/r/382140
[09:27:41] (CR) Gehel: [C: 2] maps: Use conftool to populate dsh hosts [puppet] - https://gerrit.wikimedia.org/r/382132 (owner: Alexandros Kosiaris)
[09:28:21] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db1104 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382141 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[09:28:59] ocaml? wow :D
[09:29:22] (PS2) Elukey: aqs: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380992 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:29:40] ocaml is fun! But I did not know anyone was actually using it...
[09:30:13] (CR) Hashar: "Based on the wmf style guide utility (which is awesome) and the puppet coding recommendation, it looks like a lot of the contint module sh" [puppet] - https://gerrit.wikimedia.org/r/382140 (owner: Hashar)
[09:30:38] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db1104 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382141 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[09:30:52] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db1104 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/382141 (https://phabricator.wikimedia.org/T172679) (owner: Marostegui)
[09:32:17] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db1104 to the config - T172679 (duration: 00m 51s)
[09:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:32:24] T172679: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679
[09:33:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db1104 to the config - T172679 (duration: 00m 50s)
[09:33:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:29] Operations, Contributors-Team, MobileFrontend, wikidiff2, and 3 others: Diff page consistently produces 503 on beta cluster on first visit - https://phabricator.wikimedia.org/T176637#3656593 (Tobi_WMDE_SW)
[09:34:57] (CR) Gehel: "It looks like this change will require a rolling restart of OCG to be activated." [puppet] - https://gerrit.wikimedia.org/r/380995 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:35:23] (CR) Hashar: [C: -1] "Yeah that patch is just a quick hack for labs, I have forgot to -1 it to prevent it from being merged for prod." [puppet] - https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T176314) (owner: Hashar)
[09:39:51] !log Killed remaining dumpRdf runners on snapshot1007 after two crashed due to the db1087 depool. Auto-restart will do the rest.
[09:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:01] apergos: FYI ^
[09:40:04] :/
[09:41:18] (CR) Elukey: [C: 2] aqs: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380992 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:42:11] (CR) Alexandros Kosiaris: "I 'd wait this out a bit to see how long ocg will be around for yet" [puppet] - https://gerrit.wikimedia.org/r/380995 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:43:16] (CR) Gehel: "Ok, I'll start pushing for this when OCG will be the last service using the ready to be decommissioned logstash servers." [puppet] - https://gerrit.wikimedia.org/r/380995 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[09:45:27] !log rolling restart of aqs nodes to pick up the new logstash lvs config
[09:45:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:55] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[09:47:36] !log upgrading API app servers mw1189-mw1208 to HHVM 3.18.5
[09:47:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:49:09] (PS1) Jcrespo: mydumper: Expand saved projects to all local datasets [puppet] - https://gerrit.wikimedia.org/r/382144 (https://phabricator.wikimedia.org/T162789)
[09:49:14] PROBLEM - DPKG on mw1220 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:50:14] RECOVERY - DPKG on mw1220 is OK: All packages OK
[09:52:49] <_joe_> moritzm: oh, right now?
[09:53:01] <_joe_> ok, it will pick up the locale change then
[09:53:06] <_joe_> that's ok
[09:53:09] (PS1) Marostegui: db-eqiad.php: Repool db1087 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382146
[09:53:52] <_joe_> plan to upgrade other hosts today?
[09:54:17] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3656638 (Paladox) @Marostegui oh thanks. Is there a way we can fix this please? As it w...
[09:54:19] (CR) Marostegui: [C: 1] mydumper: Expand saved projects to all local datasets [puppet] - https://gerrit.wikimedia.org/r/382144 (https://phabricator.wikimedia.org/T162789) (owner: Jcrespo)
[09:54:50] <_joe_> !log restarting hhvm on canaries (both appservers and canaries) T107128
[09:54:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:57] T107128: Scribunto string comparison works case insensitive while the standard Lua case sensitive - https://phabricator.wikimedia.org/T107128
[09:56:41] (PS1) Muehlenhoff: Add Cumin alias for kafka-jumbo [puppet] - https://gerrit.wikimedia.org/r/382147
[09:57:07] gehel: done!
[09:57:17] elukey: thanks!
[09:57:20] _joe_: yeah, I'm slowly rolling out the 3.18.5, but I can stop until you've deployed the UTC locale change
[09:57:24] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[09:57:27] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/382147 (owner: Muehlenhoff)
[09:57:31] <_joe_> moritzm: no please go on
[09:57:35] ok
[09:57:56] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1087 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382146 (owner: Marostegui)
[09:59:11] (PS2) Muehlenhoff: Add Cumin alias for kafka-jumbo [puppet] - https://gerrit.wikimedia.org/r/382147
[09:59:36] thanks!
[09:59:47] (CR) Muehlenhoff: [C: 2] Add Cumin alias for kafka-jumbo [puppet] - https://gerrit.wikimedia.org/r/382147 (owner: Muehlenhoff)
[10:00:21] (Merged) jenkins-bot: db-eqiad.php: Repool db1087 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382146 (owner: Marostegui)
[10:00:32] (CR) jenkins-bot: db-eqiad.php: Repool db1087 with low weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382146 (owner: Marostegui)
[10:01:15] Operations, DBA, Gerrit, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3628916 (jcrespo) > As it was working before It wasn't working before- there was a sec...
[10:01:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 with low weight (duration: 00m 50s)
[10:01:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:01:57] (PS2) Jcrespo: mydumper: Expand saved projects to all local datasets [puppet] - https://gerrit.wikimedia.org/r/382144 (https://phabricator.wikimedia.org/T162789)
[10:03:40] !log install libidn security updates
[10:03:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:57] wtf php exits with 0 when it encounters an uncaught exception *facepalm*
[10:04:07] (CR) Jcrespo: [C: 2] mydumper: Expand saved projects to all local datasets [puppet] - https://gerrit.wikimedia.org/r/382144 (https://phabricator.wikimedia.org/T162789) (owner: Jcrespo)
[10:05:05] well, apparently that's MediaWiki magic, yay
[10:06:30] (CR) Gehel: "It looks like this is /only/ changing rsyslog config and auto reloading it. So it should be pretty safe / easy to deploy." [puppet] - https://gerrit.wikimedia.org/r/380994 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[10:12:26] (CR) Giuseppe Lavagetto: [C: 1] mediawiki: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380994 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[10:16:54] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:18:14] Operations, Goal, User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3656694 (fgiunchedi)
[10:18:31] !log akosiaris@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=wtp10([01][0-9]|2[0-4]).eqiad.wmnet
[10:18:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:51] !log T177374, fully depool wtp1001-wtp1024
[10:18:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:58] T177374: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374
[10:20:08] (PS2) Alexandros Kosiaris: decom wtp1001-wtp1024 [puppet] - https://gerrit.wikimedia.org/r/382136 (https://phabricator.wikimedia.org/T177374)
[10:20:13] (CR) Alexandros Kosiaris: [V: 2 C: 2] decom wtp1001-wtp1024 [puppet] - https://gerrit.wikimedia.org/r/382136 (https://phabricator.wikimedia.org/T177374) (owner: Alexandros Kosiaris)
[10:22:53] !log draining labsdb1011 connections
[10:22:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:27:15] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[10:27:35] (PS1) Marostegui: db-eqiad.php: Increase db1087 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382148
[10:29:44] !log upgrade and reboot labsdb1011
[10:29:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:29:57] (CR) Marostegui: [C: 2] db-eqiad.php: Increase db1087 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382148 (owner: Marostegui)
[10:30:56] (PS1) Muehlenhoff: Add library hints for libidn/libidn2 [puppet] - https://gerrit.wikimedia.org/r/382149
[10:32:45] (Merged) jenkins-bot: db-eqiad.php: Increase db1087 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382148 (owner: Marostegui)
[10:32:59] (CR) jenkins-bot: db-eqiad.php: Increase db1087 weight [mediawiki-config] - https://gerrit.wikimedia.org/r/382148 (owner: Marostegui)
[10:33:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase db1087 weight (duration: 00m 51s)
[10:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:40] Operations, media-storage: unknown error occurred in storage backend "local-swift-codfw" - https://phabricator.wikimedia.org/T155323#3656728 (Yann) Bug reported with https://archive.org/details/larevuedejanfeb1900pariuoft via IRC.
[10:44:04] Operations, ops-eqiad, DC-Ops, Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656739 (akosiaris)
[10:44:39] Operations, ops-eqiad, DC-Ops, Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656431 (akosiaris)
[10:46:38] (PS2) Muehlenhoff: Add library hints for libidn/libidn2 [puppet] - https://gerrit.wikimedia.org/r/382149
[10:53:11] (CR) Muehlenhoff: [C: 2] Add library hints for libidn/libidn2 [puppet] - https://gerrit.wikimedia.org/r/382149 (owner: Muehlenhoff)
[10:54:46] RECOVERY - MariaDB Slave Lag: s4 on dbstore1001 is OK: OK slave_sql_lag not a slave
[10:59:25] Operations, Goal, User-fgiunchedi: Export Prometheus-compatible JVM metrics from JVMs in production - https://phabricator.wikimedia.org/T177197#3656770 (fgiunchedi)
[11:04:57] PROBLEM - DPKG on mw1199 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:05:37] (PS1) Marostegui: db-eqiad.php: Fully repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/382151
[11:05:57] RECOVERY - DPKG on mw1199 is OK: All packages OK
[11:10:11] (PS1) Alexandros Kosiaris: Remove DNS entries for wtp1001-wtp1024 [dns] - https://gerrit.wikimedia.org/r/382152 (https://phabricator.wikimedia.org/T177374)
[11:10:23] (CR) Marostegui: [C: 2] db-eqiad.php: Fully repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/382151 (owner: Marostegui)
[11:10:28] Operations, ops-eqiad, DC-Ops, Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656779 (akosiaris)
[11:11:44] (CR) Alexandros Kosiaris: [C: 2] Remove DNS entries for wtp1001-wtp1024 [dns] - https://gerrit.wikimedia.org/r/382152 (https://phabricator.wikimedia.org/T177374) (owner: Alexandros Kosiaris)
[11:12:34] (Merged) jenkins-bot: db-eqiad.php: Fully repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/382151 (owner: Marostegui)
[11:12:47] (CR) jenkins-bot: db-eqiad.php: Fully repool db1087 [mediawiki-config] - https://gerrit.wikimedia.org/r/382151 (owner: Marostegui)
[11:13:36] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Restore db1087 original weight (duration: 00m 47s)
[11:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:29:57] (PS1) Muehlenhoff: Add library hint for ghostscript [puppet] - https://gerrit.wikimedia.org/r/382153
[11:32:04] (PS2) Muehlenhoff: Add library hint for ghostscript [puppet] - https://gerrit.wikimedia.org/r/382153
[11:32:48] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[11:33:07] (CR) Muehlenhoff: [C: 2] Add library hint for ghostscript [puppet] - https://gerrit.wikimedia.org/r/382153 (owner: Muehlenhoff)
[11:34:06] !log installing ghostscript security updates
[11:34:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:34:59] (PS1) Hashar: .gitignore private in case it is a symlink [puppet] - https://gerrit.wikimedia.org/r/382154
[11:38:59] PROBLEM - DPKG on mw1201 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:39:59] RECOVERY - DPKG on mw1201 is OK: All packages OK
[11:49:08] !log upgrading job runners mw1161-mw1167 to HHVM 3.18.5
[11:49:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:03] PROBLEM - HHVM jobrunner on mw1161 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[11:52:12] RECOVERY - HHVM jobrunner on mw1161 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time
[11:57:59] (PS2) Gehel: mediawiki: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380994 (https://phabricator.wikimedia.org/T175242)
[11:58:58] (CR) Gehel: [C: 2] mediawiki: switch to LVS endpoint for logstash [puppet] - https://gerrit.wikimedia.org/r/380994 (https://phabricator.wikimedia.org/T175242) (owner: Gehel)
[12:00:51] !log mediawiki now uses the LVS endpoint for logstash - T175242
[12:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:58] T175242: all log producers need to use the logstash LVS endpoint - https://phabricator.wikimedia.org/T175242
[12:02:52] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:08:35] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3656905 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw1307.eqiad.wmnet', 'mw1318.eqiad.wmnet'] ``` and were **ALL** successful.
[12:09:45] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1307.eqiad.wmnet
[12:09:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:51] !log elukey@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1318.eqiad.wmnet
[12:09:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:11] !log added two new mediawiki videoscalers - mw1307/1318
[12:10:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:11:16] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3656906 (elukey)
[12:11:36] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3284670 (elukey) The mw1307-28 batch has been completed!
[12:12:19] !log upgrading HHVM on deployment servers to 3.18.5
[12:12:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:27] (PS9) Hashar: contint: move from /mnt to /srv [puppet] - https://gerrit.wikimedia.org/r/312523 (https://phabricator.wikimedia.org/T146381)
[12:18:38] Operations, ops-eqiad, User-Elukey, User-Joe: Decomission mw1161-69 - https://phabricator.wikimedia.org/T177387#3656944 (elukey)
[12:18:59] (PS4) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[12:19:53] (PS2) Urbanecm: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/382116 (https://phabricator.wikimedia.org/T177370)
[12:20:23] (PS4) Hashar: Migrate puppet compiler instance from /mnt to /srv [puppet] - https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381)
[12:21:09] !log upgrading image scalers mw1293-mw1295 to HHVM 3.18.5
[12:21:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:24:59] PROBLEM - HHVM rendering on mw1293 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:25:22] (PS5) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[12:25:49] RECOVERY - HHVM rendering on mw1293 is OK: HTTP OK: HTTP/1.1 200 OK - 74670 bytes in 0.146 second response time
[12:26:15] Operations, ops-eqiad, User-Elukey, User-Joe: Decomission mw1161-69 - https://phabricator.wikimedia.org/T177387#3656968 (elukey) Current status for the jobrunners in eqiad: ``` elukey@neodymium:~$ sudo cumin '*.eqiad.wmnet and R:class = role::mediawiki::jobrunner' 'lldpcli show neighbors | grep...
[12:26:17] Operations, ops-eqiad, Patch-For-Review, User-Elukey, User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3656969 (elukey)
[12:26:48] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 655.19 seconds
[12:30:18] (CR) Mridubhatnagar: "> (2 comments)" (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296 (owner: Mridubhatnagar)
[12:34:24] (CR) Dbarratt: [C: 1] Enable AbuseFilter runtime profile on Portuguese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/382080 (https://phabricator.wikimedia.org/T177336) (owner: Dmaza)
[12:34:48] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 296.18 seconds
[12:35:07] (PS6) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296
[12:36:17] (PS7) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296 (https://phabricator.wikimedia.org/T176018)
[12:36:18] Operations, Wikimedia-Logstash, Discovery-Search (Current work), Patch-For-Review: all log producers need to use the logstash LVS endpoint - https://phabricator.wikimedia.org/T175242#3657031 (Gehel) Correction, https://gerrit.wikimedia.org/r/380994 is actually a noop, cleaning up a default that i...
[12:36:23] (PS3) Hashar: Move jenkins agent username to hiera [puppet] - https://gerrit.wikimedia.org/r/379729
[12:38:14] !log upgrading app servers in codfw to HHVM 3.18.5
[12:38:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:39:17] Operations, ops-eqiad, DC-Ops, Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3657081 (akosiaris) a:akosiaris>Cmjohnson
[12:44:13] Operations, Kubernetes: Implement authentication/authorization in Kubernetes clusters - https://phabricator.wikimedia.org/T177393#3657094 (akosiaris)
[12:44:33] (PS4) Hashar: Apply jenkins agent username from hiera [puppet] - https://gerrit.wikimedia.org/r/379729
[12:46:21] (PS8) Mridubhatnagar: Explain scripts/webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/381296 (https://phabricator.wikimedia.org/T176018)
[12:55:00] Operations, Kubernetes: Experiment with a TLS proxy/router for pods - https://phabricator.wikimedia.org/T177394#3657130 (akosiaris)
[12:56:31] !log Optimize templatelinks and pagelinks tables on s2 and s7 - labsdb1010 - T174509
[12:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:56:38] T174509: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509
[12:57:10] (Abandoned) Hashar: openstack: explicitly define default_log_levels [puppet] - https://gerrit.wikimedia.org/r/377321 (owner: Hashar)
[12:57:14] (Abandoned) Hashar: openstack: debug oslo.messaging nova.network.manager [puppet] - https://gerrit.wikimedia.org/r/377322 (owner: Hashar)
[12:58:09] PROBLEM - DPKG on mw2118 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[12:59:09] RECOVERY - DPKG on mw2118 is OK: All packages OK
[12:59:38] !log upgrade and reboot labsdb1009
[12:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a European Mid-day SWAT(Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T1300).
[13:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:16] Hello
[13:00:18] I can SWAT today
[13:00:33] zeljkof: take in priority the Urbanecm rule, the event is 3pm
[13:00:35] (so now)
[13:00:39] Operations, Kubernetes: Improve monitoring of the Kubernetes clusters - https://phabricator.wikimedia.org/T177395#3657156 (akosiaris)
[13:00:53] Dereckson: uh oh, but it's the only patch, so deploying now! :)
[13:02:17] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/382116 (https://phabricator.wikimedia.org/T177370) (owner: Urbanecm)
[13:02:59] I'm here
[13:03:11] BTW the lecture told me that he'll need the throttle rule in 30 minutes or so :)
[13:03:17] Operations, Kubernetes: Design pod-level monitoring and service-level alerting - https://phabricator.wikimedia.org/T177396#3657180 (akosiaris)
[13:03:28] Urbanecm: should be deployed in a few minutes, waiting for CI
[13:03:37] Great
[13:03:39] Thank you
[13:03:55] (Merged) jenkins-bot: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/382116 (https://phabricator.wikimedia.org/T177370) (owner: Urbanecm)
[13:05:40] !log zfilipin@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:382116|New throttle rule (T177370)]] (duration: 00m 53s)
[13:05:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:47] T177370: Lift account creation IP cap for 2017-10-04 - https://phabricator.wikimedia.org/T177370
[13:06:02] Urbanecm, Dereckson: deployed, thanks for releasing with #releng ;)
[13:06:11] more patches for EU SWAT?
[13:06:13] (CR) jenkins-bot: New throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/382116 (https://phabricator.wikimedia.org/T177370) (owner: Urbanecm)
[13:06:37] I don't think so :)
[13:06:55] !log EU SWAT finished
[13:07:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:07:12] this was one of the shortest swats ever
[13:07:15] :D
[13:07:58] PROBLEM - HHVM rendering on mw2109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:08:49] RECOVERY - HHVM rendering on mw2109 is OK: HTTP OK: HTTP/1.1 200 OK - 74616 bytes in 0.306 second response time
[13:10:26] Thanks
[13:11:49] zeljkof: ouf
[13:11:57] zeljkof: I was wondering about https://gerrit.wikimedia.org/r/#/c/382095/ which I just noticed
[13:13:20] Operations, Kubernetes: Create scaffolding of services templates for deployment in production/staging - https://phabricator.wikimedia.org/T177397#3657212 (akosiaris)
[13:15:17] (PS1) Muehlenhoff: Provide deb-src entries for older distros on package builders [puppet] - https://gerrit.wikimedia.org/r/382160
[13:15:46] (CR) jerkins-bot: [V: -1] Provide deb-src entries for older distros on package builders [puppet] - https://gerrit.wikimedia.org/r/382160 (owner: Muehlenhoff)
[13:17:22] (PS2) Muehlenhoff: Provide deb-src entries for older distros on package builders [puppet] - https://gerrit.wikimedia.org/r/382160
[13:20:43] Nikerabbit: sorry, just saw you ping
[13:20:48] want to deploy it now?
[13:21:50] zeljkof: just trying to understand... from logstash I observe the error is still happening, is the newest branch wmf2?
[13:22:00] zeljkof: in that case, I think it would be good to SWAT it [13:22:46] Nikerabbit: this is all I know https://tools.wmflabs.org/versions/ [13:22:51] !log Optimize enwiki.ores_classification on db2069 [13:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:03] looks like wmf2 is at group0 only [13:23:16] if that was your question [13:23:34] zeljkof: yep, I would deploy then [13:24:10] Nikerabbit: ok, could you please add it to the calendar? I will start with the deploy [13:24:14] zeljkof: ok [13:25:04] zeljkof: ah I see now that Krinkle scheduled it for the next SWAT, but let's do it now [13:25:19] Nikerabbit: ok [13:26:11] !log Optimize table ores_classification on enwiki codfw master db2048 with replication - might generate lag - T159753 [13:26:16] !log one more commit for EU SWAT [13:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:26:17] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753 [13:26:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:29] zeljkof: calendar is updated [13:27:37] Nikerabbit: thanks, merging [13:28:15] Nikerabbit: is there anything to check at mwdebug? or should I just deploy? [13:28:27] zeljkof: I have a test page that fails [13:28:39] at least in incognito mode: https://www.mediawiki.org/wiki/Thread:Project:Support_desk/Some_pages_taking_7_-_15_seconds_to_load/reply_(2) [13:29:00] Nikerabbit: ok, in that case I will ping you when it's at mwdebug1002, in a minute or two, please stand by [13:29:05] sure [13:32:10] PROBLEM - puppet last run on mw2108 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. 
Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg] [13:32:42] 10Operations, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint, 10Wikidata-Sprint-2016-11-08: Create wikibase/wikiba.se-deploy repo - https://phabricator.wikimedia.org/T176841#3657240 (10Ladsgroup) Made an entry there [[https://www.mediawiki.org/w/index.php?title=Gerrit/New_repositories/Requests/Entries&diff=p... [13:33:20] Nikerabbit: any order the files should be deployed in? or any order would do? (since there are two files to deploy) [13:33:47] zeljkof: they are independent [13:33:58] Nikerabbit: ok, then random order :) [13:34:02] just checking [13:34:19] (03PS1) 10Lucas Werkmeister (WMDE): Change /data/ redirect to Special:Pagedata [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) [13:37:38] PROBLEM - DPKG on mw2120 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [13:38:19] Nikerabbit: the patch is at mwdebug1002, please test and let me know if I can deploy [13:38:28] (03CR) 10Ladsgroup: [C: 031] "Nice catch, I am wondering if we should rename the special page to Special:PageData (so no fix here is needed)" [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: 10Lucas Werkmeister (WMDE)) [13:38:38] RECOVERY - DPKG on mw2120 is OK: All packages OK [13:39:18] zeljkof: my test page works now [13:39:32] Nikerabbit: ok to deploy then? [13:39:37] zeljkof: yep [13:39:47] deploying [13:40:33] PROBLEM - LVS HTTP IPv4 on api.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:40:47] (03CR) 10Lucas Werkmeister (WMDE): "Yeah, that’s also possible… no opinion from me on that :)" [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: 10Lucas Werkmeister (WMDE)) [13:40:52] not good [13:40:53] is the api down? 
[13:40:59] mmh
[13:41:12] !log zfilipin@tin Synchronized php-1.31.0-wmf.2/extensions/Translate/TranslateHooks.php: php-1.31.0-wmf.2/extensions/Translate/tag/PageTranslationHooks.php SWAT: [[gerrit:382095|Revert "Use Language type hint in hooks" (T177352)]] (duration: 00m 52s)
[13:41:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:19] T177352: PageTranslationHooks::onPageContentLanguage() must be an instance of Language, StubUserLangGiven - https://phabricator.wikimedia.org/T177352
[13:41:23] RECOVERY - LVS HTTP IPv4 on api.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 23478 bytes in 0.604 second response time
[13:41:28] is some code change ongoing?
[13:41:30] I think that Moritz is upgrading hhvm in there, it might be a side effect
[13:41:38] mmm
[13:41:46] Nikerabbit: deployed, please check
[13:41:56] elukey: worst case scenario, some errors
[13:42:04] but there are multiple api servers
[13:42:08] or there should be
[13:42:18] PROBLEM - DPKG on mw2133 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[13:42:41] zeljkof: both files? not working yet
[13:42:47] (CR) Ladsgroup: [C: 1] "I think we should go that way because of the consistency with other special pages like Special:NewPages or Special:RecentChanges" [puppet] - https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: Lucas Werkmeister (WMDE))
[13:42:50] high 4XX
[13:43:09] the dashboard is still very much under construction, but: https://grafana.wikimedia.org/dashboard/db/pybal?orgId=1&var-datasource=codfw%20prometheus%2Fops&var-server=lvs2003&var-service=api_80&var-service=api-https_443
[13:43:14] strange
[13:43:28] RECOVERY - DPKG on mw2133 is OK: All packages OK
[13:43:31] Nikerabbit: let me check try one more time
[13:43:55] jynus: the api.svc.codfw.wmnet endpoint is not used, right?
[13:43:56] what is the change being deployed?
[13:44:07] I doubt that's the HHVM updates, this was only one updated codfw host at a time
[13:44:17] jynus: fixing an exception on some pages
[13:44:17] elukey: not at this exact moment, but configuration to use it is pretty easy
[13:44:30] Operations: Upgrade Cumin masters to stretch - https://phabricator.wikimedia.org/T177385#3657259 (Reedy)
[13:44:33] bblack: sure sure, just wanted to make sure about it :)
[13:44:33] yeah, what I mean is, either the endpoint is badly configured (LVS)
[13:44:37] or it is a code problem
[13:44:38] moritzm: it may not be the update process itself, but faulty results afterwards?
[13:44:44] there was one ongoing deployment
[13:44:50] or
[13:44:54] it is a false positive
[13:45:02] !log zfilipin@tin Synchronized php-1.31.0-wmf.2/extensions/Translate/TranslateHooks.php: SWAT: [[gerrit:382095|Revert "Use Language type hint in hooks" (T177352)]] (duration: 00m 50s)
[13:45:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:45:27] Operations, Operations-Software-Development: Upgrade Cumin masters to stretch - https://phabricator.wikimedia.org/T177385#3657264 (Volans)
[13:45:32] can deployers pause deployment and attend to a potential outage?
[13:45:57] <_joe_> zeljkof: can you pause, please?
[13:46:10] !log zfilipin@tin Synchronized php-1.31.0-wmf.2/extensions/Translate/tag/PageTranslationHooks.php: SWAT: [[gerrit:382095|Revert "Use Language type hint in hooks" (T177352)]] (duration: 00m 50s)
[13:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:17] _joe_: done, actually
[13:46:29] it is likely to be a false positive, but the first reaction should be stop, check, then continue
[13:46:29] Nikerabbit: try again, please
[13:46:41] zeljkof: works now
[13:47:09] Nikerabbit: I guess I messed up the first time, tried some fancy new way, did not work, will stick to the old way that works :)
[13:47:10] proxyfetch failed but not idleconn
[13:47:17] !log EU SWAT finished
[13:47:20] so, the api servers are still taking connections, but http reqs are failing...
[13:47:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:47:38] <_joe_> bblack: yeah let me check that
[13:50:01] I do not see a slowdown on db api traffic, so I can almost rule out real user impact
[13:51:48] 40:23-41:20
[13:51:57] doesn't anyone else see something else?
[13:52:44] *does
[13:52:48] I don't spot anything unusual in logstash
[13:53:47] Nikerabbit: from what other people are saying and what I am seeing, the affected endpoint wasn't active
[13:54:45] mw1227 had a spike of "pl proc line: 2959: warning: points must have either 4 or 2 values per line"
[13:55:01] oh, not only that one
[13:55:01] Operations: Phase out DSA keys for SSH access (ssh-dss) - https://phabricator.wikimedia.org/T177371#3656267 (faidon) We have at least another usage, the Ganeti key (cf. `modules/role/manifests/ganeti.pp`). This was for legacy reasons -- Ganeti didn't support RSA, but I think it does now, at least in the vers...
[13:57:01] Nikerabbit: don't worry, for now we do not think it is code [13:57:18] RECOVERY - puppet last run on mw2108 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [14:01:17] okay [14:05:30] (03CR) 10Alexandros Kosiaris: "Which in turn redirects with a 303 to index.php?title=Data:blahblah. Of course this can be made better (at least removing 1 redirect). I a" [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: 10Lucas Werkmeister (WMDE)) [14:10:13] !log base package upgrades (+autoremove) on lvs4* [14:10:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:49] (03CR) 10Mforns: [C: 04-1] "> * Except for the skin field (cf. below), this purging strategy had" [puppet] - 10https://gerrit.wikimedia.org/r/379829 (https://phabricator.wikimedia.org/T175395) (owner: 10Bmansurov) [14:18:04] !log base package upgrades (+autoremove) on lvs2* [14:18:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:22:53] (03PS1) 10Giuseppe Lavagetto: Rakefile: split in modules, refactor git interaction [puppet] - 10https://gerrit.wikimedia.org/r/382165 [14:24:40] !log upgrading nginx to 1.13.5-1+wmf1~jessie1 on cp4021 (first live cache, upload@ulsfo) [14:24:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:24:56] (03CR) 10Lucas Werkmeister (WMDE): "As long as the lowercase version then redirects to the uppercase version, I wouldn’t expect any breakage. (famous last words…)" [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: 10Lucas Werkmeister (WMDE)) [14:38:24] 10Operations, 10ops-esams, 10Traffic: esams rack OE10 power redundancy issues? 
(cp3030-9) - https://phabricator.wikimedia.org/T177403#3657409 (10BBlack) [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3030 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3031 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3032 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3033 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3034 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3035 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:11] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3036 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:12] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3037 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:12] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3038 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] Brandon Black T177403 [14:39:13] ACKNOWLEDGEMENT - IPMI Sensor Status on cp3039 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = 
Critical, Status = Critical] Brandon Black T177403 [14:41:34] !log base package upgrades (+autoremove) on lvs3003 [14:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:28] !log base package upgrades (+autoremove) on lvs3004 [14:43:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:45:38] !log base package upgrades (+autoremove) on lvs3001-2 [14:45:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:47:10] !log replacing faulty VC cable on fasw-c-eqiad [14:47:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:52:14] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3657440 (10MarcoAurelio) 05declined>03Open Sorry, I dare to totally disagr... [14:57:56] !log cp4025: restart varnish backend [14:58:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:03:36] 10Operations, 10ops-esams, 10Traffic: esams rack OE10 power redundancy issues? (cp3030-9) - https://phabricator.wikimedia.org/T177403#3657475 (10faidon) [15:03:38] 10Operations, 10ops-esams, 10DC-Ops: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3657478 (10faidon) [15:06:29] 10Operations, 10ops-esams, 10DC-Ops, 10Traffic: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3657486 (10BBlack) [15:13:45] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3635590 (10jcrespo) This is my (I hope) neutral evaluation of the issue: * Th... 
[15:13:59] 10Operations, 10ops-eqiad: rack and setup db1107 and db1008 - https://phabricator.wikimedia.org/T177405#3657517 (10Cmjohnson) [15:14:38] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657543 (10Cmjohnson) [15:18:31] (03Abandoned) 10Zoranzoki21: Removing expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381635 (owner: 10Zoranzoki21) [15:20:27] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657615 (10jcrespo) Wait, should those be named db1* ? CC @Cmjohnson @elukey @Ottomata I personally do not have a problem with that (it gets some regex configurations because it is a database), but maybe you... [15:21:18] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657618 (10Ottomata) I have no opinions on the naming of these boxes :) [15:24:11] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657644 (10elukey) I have no preference, the alternative would be to explicitly mention eventlogging in their names, going to ask to my team and report back asap. 
[15:24:52] !log rebuilding labvirt1016 [15:24:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:46] !log Disable puppet on db2010 - it will be decommissioned - T175685 [15:28:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:53] T175685: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685 [15:29:30] 10Operations, 10ops-codfw, 10DBA: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3657671 (10Marostegui) [15:29:48] (03PS1) 10Jcrespo: mariadb: Add file to control additional wikireplicas-only indexes [puppet] - 10https://gerrit.wikimedia.org/r/382170 (https://phabricator.wikimedia.org/T177096) [15:31:05] (03PS1) 10Marostegui: site.pp: Remove db2010 [puppet] - 10https://gerrit.wikimedia.org/r/382171 (https://phabricator.wikimedia.org/T175685) [15:31:41] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3657682 (10MarcoAurelio) @jcrespo Thank you. Point number two is what people a... 
[15:32:15] !log caches: apt autoremove -> install python-requests [15:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:33:01] (03CR) 10Marostegui: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/8175/" [puppet] - 10https://gerrit.wikimedia.org/r/382171 (https://phabricator.wikimedia.org/T175685) (owner: 10Marostegui) [15:33:08] (03PS2) 10Giuseppe Lavagetto: Rakefile: split in modules, refactor git interaction [puppet] - 10https://gerrit.wikimedia.org/r/382165 [15:34:09] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3657692 (10Marostegui) [15:35:35] !log recdns: apt autoremove -> install python-requests [15:35:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:35:58] !log Power off db2010 to decommission it - T175685 [15:36:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:36:06] T175685: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685 [15:36:48] PROBLEM - MariaDB Slave Lag: s4 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 614.21 seconds [15:37:22] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: Decommission db2010 and move m1 codfw to db2078 - https://phabricator.wikimedia.org/T175685#3657708 (10Marostegui) [15:37:23] !log hydrogen (recdns@eqiad) - base package upgrades [15:37:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:37:54] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3657709 (10ema) So, packages-wise, I've packaged and uploaded [[ https://packages.qa.debian.org/v/varnish-modules/news/20171004T150923Z.html | varnish-modules 0.12.1 ]] to Debian unstable. @a... 
[15:39:05] !log lvs1007-12: base package upgrades, autoremove, +python-requests
[15:39:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:40:16] (CR) Giuseppe Lavagetto: [C: 1] wmf-auto-reimage: add support for rename [puppet] - https://gerrit.wikimedia.org/r/381200 (https://phabricator.wikimedia.org/T176955) (owner: Volans)
[15:42:33] I thought it was policy to not rename?
[15:42:36] Operations, ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657730 (Marostegui) I don't really have any preference on where to rack them, I would just suggest they are placed on a different rack. But as these two hosts will be mostly for Analytics, I would leave i...
[15:43:42] <_joe_> bblack: servers? we did that from time to time
[15:44:44] well, it's at least strongly frowned on. maybe we should revisit that policy at some point and clarify it in some direction or other.
[15:45:18] <_joe_> bblack: when we transformed an imagescaler into a thumbor host, we did rename it
[15:45:46] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 245.05 seconds
[15:45:50] IIRC the basic argument against all renaming is that our tickets about past hw/sw issues, etc all tend to use the hostname as a key, so renames confuse history (can't lookup history of same bad problem on this host under another name, or even worse if you rename to the former name of another host)
[15:46:24] + confusing in terms of anything else that references hostnames as if they were tied to the hardware I guess, like server lifecycle spreadsheets about purchasing + decoms, etc
[15:47:01] there's a sort-of solution in saying that we should use asset tags to refer to hardware in all these cases instead of hostnames, but it's certainly not current practice to do so (and brings its own level of confusion)
[15:50:04] (CR) Volans: [C: 1] "The split logic looks sane to me, let's see if all still works
together!" [puppet] - 10https://gerrit.wikimedia.org/r/382165 (owner: 10Giuseppe Lavagetto) [15:51:09] (03PS1) 10Ayounsi: Imported Upstream version 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382174 [15:51:11] (03PS1) 10Ayounsi: Adapt WMF specific patches for Varnish5 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382175 [15:51:13] (03PS1) 10Ayounsi: Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 [15:51:15] (03PS1) 10Ayounsi: Varnish5: Install devicedetect.vcl in the proper path [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382177 [15:51:17] (03PS1) 10Ayounsi: Varnish5: Update libvarnishapi1.symbols [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382178 [15:51:28] (03CR) 10jerkins-bot: [V: 04-1] Imported Upstream version 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382174 (owner: 10Ayounsi) [15:51:48] (03CR) 10jerkins-bot: [V: 04-1] Adapt WMF specific patches for Varnish5 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382175 (owner: 10Ayounsi) [15:52:30] bblack: just to clarify, that patch is just to delete the old wmf-reimage script and have the old functionality available in the new script, just in case it's needed ;) [15:53:14] (03PS3) 10Volans: wmf-auto-reimage: add support for rename [puppet] - 10https://gerrit.wikimedia.org/r/381200 (https://phabricator.wikimedia.org/T176955) [15:53:21] (03CR) 10BryanDavis: [C: 031] striker: switch to LVS endpoint for logstash [puppet] - 10https://gerrit.wikimedia.org/r/380993 (https://phabricator.wikimedia.org/T175242) (owner: 10Gehel) [15:53:23] (03CR) 10jerkins-bot: [V: 04-1] Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 (owner: 10Ayounsi) [15:54:04] (03CR) 10Volans: [C: 032] wmf-auto-reimage: add support for rename [puppet] - 10https://gerrit.wikimedia.org/r/381200 
(https://phabricator.wikimedia.org/T176955) (owner: 10Volans) [15:54:52] (03PS2) 10Gehel: striker: switch to LVS endpoint for logstash [puppet] - 10https://gerrit.wikimedia.org/r/380993 (https://phabricator.wikimedia.org/T175242) [15:55:00] (03CR) 10jerkins-bot: [V: 04-1] Varnish5: Install devicedetect.vcl in the proper path [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382177 (owner: 10Ayounsi) [15:55:11] (03CR) 10jerkins-bot: [V: 04-1] Varnish5: Update libvarnishapi1.symbols [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382178 (owner: 10Ayounsi) [15:55:38] (03CR) 10Gehel: [C: 032] striker: switch to LVS endpoint for logstash [puppet] - 10https://gerrit.wikimedia.org/r/380993 (https://phabricator.wikimedia.org/T175242) (owner: 10Gehel) [15:56:38] (03CR) 10Filippo Giunchedi: [WIP] smart: new module (0313 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [15:57:17] (03PS3) 10Filippo Giunchedi: smart: new module [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) [15:57:18] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3657784 (10jcrespo) Legoktm was totally legitimate about closing the ticket wi... [15:58:23] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3657787 (10ayounsi) pristine-tar and upstream pushed to origin debian-wmf pushed to gerrit: https://gerrit.wikimedia.org/r/382174 Imported Upstream version 5.1.3 https://gerrit.wikimedia.org... [16:00:36] PROBLEM - puppet last run on puppetmaster2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage] [16:00:47] PROBLEM - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage] [16:01:31] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657820 (10elukey) We are super fine with db1* names, no real preference, just asked to my team. If it is fine for the DBA team, we can proceed! [16:01:52] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3657821 (10jcrespo) db1* is ok, then, @Cmjohnson . I only asked because I may have thought you proposed to rename them. I am more than ok as keeping them as part of the db* family as m4 replica set. Sorry fo... [16:02:16] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage] [16:05:56] PROBLEM - puppet last run on labpuppetmaster1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage] [16:05:57] PROBLEM - puppet last run on puppetmaster1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage] [16:06:04] !log performing schema change on labsdb1009/10/11 [16:06:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:06:16] (03CR) 10Giuseppe Lavagetto: [C: 04-1] smart: new module (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [16:09:56] (03PS2) 10Ayounsi: Varnish5: Install devicedetect.vcl in the proper path [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382177 [16:09:58] (03PS2) 10Ayounsi: Varnish5: Update libvarnishapi1.symbols [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382178 [16:10:00] (03PS2) 10Ayounsi: Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 [16:10:18] (03CR) 10jerkins-bot: [V: 04-1] Varnish5: Install devicedetect.vcl in the proper path [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382177 (owner: 10Ayounsi) [16:10:27] (03CR) 10jerkins-bot: [V: 04-1] Varnish5: Update libvarnishapi1.symbols [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382178 (owner: 10Ayounsi) [16:11:45] !log lvs1004-6: base package upgrades, autoremove, +python-requests [16:11:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:23] (03CR) 10jerkins-bot: [V: 04-1] Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 (owner: 10Ayounsi) [16:15:52] (03CR) 10BBlack: "recheck" [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/369643 (owner: 10Ema) [16:15:56] PROBLEM - puppet last run on rhodium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/wmf-reimage]
[16:18:35] <_joe_> volans: ^^
[16:18:52] _joe_: looking
[16:18:52] <_joe_> I thought you removed references to the file previously
[16:19:03] I thought I did
[16:19:36] but apparently I didn't, fixing
[16:19:41] sorry about that
[16:19:59] I'll remove the file manually
[16:20:27] !log maelant: upgrade base packages
[16:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:19] (PS1) Volans: wmf-reimage: remove dangling reference [puppet] - https://gerrit.wikimedia.org/r/382182 (https://phabricator.wikimedia.org/T176955)
[16:22:22] (CR) Volans: [C: 2] wmf-reimage: remove dangling reference [puppet] - https://gerrit.wikimedia.org/r/382182 (https://phabricator.wikimedia.org/T176955) (owner: Volans)
[16:23:50] !log deleted 'R:File = /usr/local/bin/wmf-reimage' on matching hosts T176955
[16:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:23:59] T176955: wmf-auto-reimage: add support for renaming while re-imaging - https://phabricator.wikimedia.org/T176955
[16:24:01] !log recdns: upgrade base packages
[16:24:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:25:03] running puppet, should recover soon
[16:25:47] RECOVERY - puppet last run on labpuppetmaster1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[16:25:56] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[16:25:56] RECOVERY - puppet last run on labpuppetmaster1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:27:07] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:28:09] _joe_: thanks for noticing, I was distracted by a CR ;) all fixed
[16:29:24] !log lvs1001-3: base package upgrades, autoremove, +python-requests
[16:29:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:36] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:30:56] RECOVERY - puppet last run on puppetmaster1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:30:58] (03CR) 10Volans: "Much nicer! Thanks for all the fixes. Just a couple of comments/replies inline." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: 10Filippo Giunchedi) [16:33:32] (03PS2) 10Jcrespo: mariadb: Add file to control additional wikireplicas-only indexes [puppet] - 10https://gerrit.wikimedia.org/r/382170 (https://phabricator.wikimedia.org/T177096) [16:34:26] PROBLEM - DPKG on lvs1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:35:18] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[pybal] [16:36:26] RECOVERY - DPKG on lvs1003 is OK: All packages OK [16:40:16] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:41:56] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 20 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [16:43:07] (03PS3) 10Giuseppe Lavagetto: Rakefile: split in modules, refactor git interaction [puppet] - 10https://gerrit.wikimedia.org/r/382165 [16:44:15] (03PS1) 10Gehel: logstash: DSH groups based on conftool [puppet] - 10https://gerrit.wikimedia.org/r/382184 [16:46:54] (03CR) 10Giuseppe Lavagetto: [C: 032] Rakefile: split in modules, refactor git interaction [puppet] - 10https://gerrit.wikimedia.org/r/382165 (owner: 10Giuseppe Lavagetto) [16:46:56] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is OK: OK - failed 11 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [16:47:48] (03PS2) 10Ottomata: Add druid-analytics LVS svc name [dns] - 10https://gerrit.wikimedia.org/r/378967 (https://phabricator.wikimedia.org/T176223) [16:48:06] (03PS3) 10Giuseppe Lavagetto: contint::website to a profile [puppet] - 10https://gerrit.wikimedia.org/r/382140 (owner: 10Hashar) [16:51:40] (03PS3) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) [16:53:38] I'm getting a 500 when at https://zh.wikipedia.org/wiki/Special:%E5%B1%95%E5%BC%80%E6%A8%A1%E6%9D%BF I put {{#property:P1402|from=Q776995}} in the textbox [16:54:04] (03PS4) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) [16:54:20] (03PS5) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 
(https://phabricator.wikimedia.org/T176223) [16:55:17] (03CR) 10Ottomata: [C: 032] Add druid-analytics LVS svc name [dns] - 10https://gerrit.wikimedia.org/r/378967 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [16:58:16] (03Abandoned) 10Lucas Werkmeister (WMDE): Change /data/ redirect to Special:Pagedata [puppet] - 10https://gerrit.wikimedia.org/r/382163 (https://phabricator.wikimedia.org/T163922) (owner: 10Lucas Werkmeister (WMDE)) [16:58:21] !log krinkle@tin Synchronized php-1.31.0-wmf.2/extensions/TimedMediaHandler/resources/mw.MediaWikiPlayer.loader.js: T175506 (duration: 00m 52s) [16:58:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:58:29] T175506: jQuery.fn.bind() on Special:Search from TimedMediaHandler - https://phabricator.wikimedia.org/T175506 [17:03:50] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3658071 (10Cmjohnson) @jcrespo I only recommended that named based on the procurement ticket subject: "eqiad: replacements for db1046 and db1047" I have zero preference to naming. 
[17:06:18] !log cp4022-26 (now all of upload@ulsfo) - upgrade nginx to 1.13.5-1+wmf1~jessie1 [17:06:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:07:32] (03PS2) 10Elukey: Remove reportupdater job that triggers abandoned discovery-stats [puppet] - 10https://gerrit.wikimedia.org/r/380778 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns) [17:08:06] (03CR) 10Elukey: [C: 032] Remove reportupdater job that triggers abandoned discovery-stats [puppet] - 10https://gerrit.wikimedia.org/r/380778 (https://phabricator.wikimedia.org/T176639) (owner: 10Mforns) [17:08:33] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3658083 (10jcrespo) > db1* is ok [17:09:06] PROBLEM - DPKG on cp4022 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [17:09:59] bleh [17:10:06] RECOVERY - DPKG on cp4022 is OK: All packages OK [17:11:10] (03PS6) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) [17:14:45] (03Draft1) 10Paladox: Jenkins: Install java 8 on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/382188 (https://phabricator.wikimedia.org/T168644) [17:14:49] (03PS2) 10Paladox: Jenkins: Install java 8 on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/382188 (https://phabricator.wikimedia.org/T168644) [17:15:28] (03CR) 10jerkins-bot: [V: 04-1] Jenkins: Install java 8 on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/382188 (https://phabricator.wikimedia.org/T168644) (owner: 10Paladox) [17:16:02] (03PS3) 10Paladox: Jenkins: Install java 8 on debian jessie [puppet] - 10https://gerrit.wikimedia.org/r/382188 (https://phabricator.wikimedia.org/T168644) [17:16:39] !log uploaded icu52 to stretch-wikimedia (co-installable forward port of jessie's ICU for stretch to be used for HHVM) [17:16:44] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:18:36] ACKNOWLEDGEMENT - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi Telia outage #00783305 [17:18:36] ACKNOWLEDGEMENT - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0: Ayounsi Telia outage #00783305 [17:24:43] 10Operations, 10Analytics-Kanban, 10Patch-For-Review, 10User-Elukey: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493#3658150 (10elukey) Added the following ACLs (still not active since the above patch is not merged): ``` elukey@kafka-jumbo1001:~$ kafka acls --list... [17:24:54] (03PS4) 10Elukey: Enable basic ACL handling on the Kafka Jumbo cluster [puppet] - 10https://gerrit.wikimedia.org/r/381980 (https://phabricator.wikimedia.org/T173493) [17:25:24] (03CR) 10Ottomata: [C: 032] "looks ok to me! 
https://puppet-compiler.wmflabs.org/compiler02/8180/" [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [17:26:04] (03PS7) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) [17:27:38] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler02/8181/" [puppet] - 10https://gerrit.wikimedia.org/r/381980 (https://phabricator.wikimedia.org/T173493) (owner: 10Elukey) [17:28:37] (03PS3) 10Ayounsi: Update changelog for Varnish 5.1.3 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382176 [17:28:39] (03PS1) 10Ayounsi: Add lintian-overrides for statically linked libs [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382193 [17:29:02] (03CR) 10jerkins-bot: [V: 04-1] Add lintian-overrides for statically linked libs [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/382193 (owner: 10Ayounsi) [17:29:11] (03PS8) 10Ottomata: Add LVS service for druid-analytics-broker [puppet] - 10https://gerrit.wikimedia.org/r/378956 (https://phabricator.wikimedia.org/T176223) [17:32:04] !log deploying new LVS service for druid-analytics-broker [17:32:04] !log text@ulsfo - upgrade nginx to 1.13.5-1+wmf1~jessie1 [17:32:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:32:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:40] (03PS1) 10Gehel: wdqs: GC tuning [puppet] - 10https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919) [17:35:15] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3658217 (10Marostegui) As for the racking I would suggest db1107: Take db1036 (T176311) place on B2 db1108: Take db1015 (T173570) on A2 So we would have different racks and different rows. 
[17:37:29] !log enabled basic ACLs on the Kafka Jumbo cluster - T173493 [17:37:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:37] T173493: Tune Kafka logs to register clients connected - https://phabricator.wikimedia.org/T173493 [17:38:16] 10Operations, 10ops-eqiad: rack and setup db1107 and db1108 - https://phabricator.wikimedia.org/T177405#3658231 (10jcrespo) Note that in theory, these new servers should be 1/2 the size of the original ones, so they should fit there and even have space for extra 1U servers. [17:38:57] (03PS3) 10Jcrespo: mariadb: Add file to control additional wikireplicas-only indexes [puppet] - 10https://gerrit.wikimedia.org/r/382170 (https://phabricator.wikimedia.org/T177096) [17:39:29] (03CR) 10Jcrespo: [C: 032] mariadb: Add file to control additional wikireplicas-only indexes [puppet] - 10https://gerrit.wikimedia.org/r/382170 (https://phabricator.wikimedia.org/T177096) (owner: 10Jcrespo) [17:42:58] PROBLEM - Host druid-analytics.svc.eqiad.wmnet is DOWN: PING CRITICAL - Packet loss = 100% [17:45:03] ACKNOWLEDGEMENT - Host druid-analytics.svc.eqiad.wmnet is DOWN: PING CRITICAL - Packet loss = 100% ottomata new service, not working yet. sorry for page. [17:46:40] (03CR) 10Krinkle: [C: 031] Fix expanddblist shebang [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371966 (owner: 10Dereckson) [17:46:51] (03PS2) 10Krinkle: multiversion: Fix expanddblist shebang [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371966 (owner: 10Dereckson) [17:50:41] 10Operations, 10DBA, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3658342 (10Dzahn) Yea, this should just wait for the proper setup in codfw. I don't see a... 
[17:50:58] (03PS2) 10Krinkle: Fix notice when no argument is given [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [17:51:15] (03PS3) 10Krinkle: multiversion: Fix PHP notice when no argument is given [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [17:51:20] (03CR) 10Krinkle: [C: 031] multiversion: Fix PHP notice when no argument is given [mediawiki-config] - 10https://gerrit.wikimedia.org/r/371967 (https://phabricator.wikimedia.org/T173342) (owner: 10Dereckson) [17:52:13] 10Operations, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint, 10Wikidata-Sprint-2016-11-08: Create wikibase/wikiba.se-deploy repo - https://phabricator.wikimedia.org/T176841#3658348 (10Dzahn) Looks good, thanks. When i needed one in the past i always got a response from that wiki page and usually @QChris did... [17:54:32] 10Operations, 10monitoring, 10Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3658367 (10Dzahn) Yes, eh.. tentatively closing :) Of course we can still comment here and reopen if necessary. I just see minor adjustments to thresholds or whitelists in the... [17:54:44] 10Operations, 10monitoring, 10Patch-For-Review: Check long-running screen/tmux sessions - https://phabricator.wikimedia.org/T165348#3658368 (10Dzahn) 05Open>03Resolved [17:54:52] PROBLEM - PyBal backends health check on lvs1009 is CRITICAL: PYBAL CRITICAL - Bad Response from pybal: 500 Internal Server Error [17:55:57] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658385 (10kaldari) >If purging is undesirable on production, which is somethi... 
[17:57:02] PROBLEM - PyBal IPVS diff check on lvs1009 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([druid1002.eqiad.wmnet, druid1001.eqiad.wmnet]) [17:57:23] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - druid-analytics-broker_8082 - Could not depool server druid1001.eqiad.wmnet because of too many down! [17:57:42] this one is work in progress, not a real issue --^ [17:58:48] RECOVERY - Host druid-analytics.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms [17:59:02] 10Operations, 10RelEng-Archive-FY201718-Q1, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3658401 (10Dzahn) Thanks @elukey , that issue was known to me and the fix was pen... [18:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I, the Bot under the Fountain, allow thee, The Deployer, to do Morning SWAT (Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T1800). [18:00:06] Krinkle: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:06] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3658406 (10QChris) [18:00:20] (03PS1) 10Dzahn: Revert "Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates"" [puppet] - 10https://gerrit.wikimedia.org/r/382198 [18:00:23] (03PS1) 10Dzahn: Revert "Revert "releases-jenkins: remove now unused jenkins_proxy file"" [puppet] - 10https://gerrit.wikimedia.org/r/382199 [18:01:01] I can SWAT. Krinkle - you around? 
[18:01:08] yep [18:01:09] (03PS2) 10Dzahn: Revert "Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates"" [puppet] - 10https://gerrit.wikimedia.org/r/382198 [18:01:28] PROBLEM - LVS HTTP IPv4 on druid-analytics.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.38 and port 8082: No route to host [18:01:51] (03PS2) 10Niharika29: Enable jQuery 3 on most group1 wikis (non-Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379948 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:02:56] (03CR) 10Niharika29: [C: 032] Enable jQuery 3 on most group1 wikis (non-Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379948 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:03:12] PROBLEM - PyBal IPVS diff check on lvs1006 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([druid1002.eqiad.wmnet, druid1001.eqiad.wmnet]) [18:03:17] Niharika: i'll add another thing to SWAT if you don't mind [18:03:21] sorry for lateness [18:03:30] Sure. [18:04:01] (03CR) 10Zoranzoki21: [C: 031] Revert "Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates"" [puppet] - 10https://gerrit.wikimedia.org/r/382198 (owner: 10Dzahn) [18:04:02] PROBLEM - PyBal backends health check on lvs1003 is CRITICAL: PYBAL CRITICAL - druid-analytics-broker_8082 - Could not depool server druid1001.eqiad.wmnet because of too many down! 
[18:05:11] (03Merged) 10jenkins-bot: Enable jQuery 3 on most group1 wikis (non-Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379948 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:05:27] (03CR) 10Zoranzoki21: [C: 031] wikitech: Align 'contentadmin' and 'sysop' permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382024 (https://phabricator.wikimedia.org/T171208) (owner: 10EddieGP) [18:05:37] (03CR) 10Dzahn: [C: 032] Revert "Revert "releases: drop /ci/ suffix for jenkins-proxy, unify templates"" [puppet] - 10https://gerrit.wikimedia.org/r/382198 (owner: 10Dzahn) [18:05:58] Krinkle: Your patch is on mwdebug1002 if you'd like to test. [18:06:04] Niharika: done. https://gerrit.wikimedia.org/r/#/c/382201/ for wmf.1 and https://gerrit.wikimedia.org/r/#/c/382202/ for wmf.2. i can only test on wmf.1 though, i don't think any wmf.2 wikis are using configuration that exposes the bug. [18:06:04] k [18:06:09] (03CR) 10jenkins-bot: Enable jQuery 3 on most group1 wikis (non-Wikipedia) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/379948 (https://phabricator.wikimedia.org/T124742) (owner: 10Krinkle) [18:06:13] (03PS2) 10Dzahn: Revert "Revert "releases-jenkins: remove now unused jenkins_proxy file"" [puppet] - 10https://gerrit.wikimedia.org/r/382199 [18:06:23] PROBLEM - PyBal IPVS diff check on lvs1003 is CRITICAL: CRITICAL: Hosts in IPVS but unknown to PyBal: set([druid1002.eqiad.wmnet, druid1001.eqiad.wmnet]) [18:06:38] (it seems that only meta and commons are affected) [18:07:01] (03CR) 10Dzahn: [C: 032] Revert "Revert "releases-jenkins: remove now unused jenkins_proxy file"" [puppet] - 10https://gerrit.wikimedia.org/r/382199 (owner: 10Dzahn) [18:07:29] Niharika: confirmed, lgtm. 
[18:07:32] (03PS3) 10Dzahn: releases: remove proxy_jenkins class, simplify [puppet] - 10https://gerrit.wikimedia.org/r/382098 [18:09:51] !log niharika29@tin Synchronized wmf-config/InitialiseSettings.php: Enable jQuery 3 on most group1 wikis (T124742) (duration: 00m 51s) [18:09:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:59] T124742: Upgrade to jQuery 3 - https://phabricator.wikimedia.org/T124742 [18:10:01] Krinkle: Synced. [18:10:03] Thx [18:10:09] (03PS6) 10Zoranzoki21: Enable Extension:Newsletter on hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/381537 (https://phabricator.wikimedia.org/T177151) [18:15:32] MatmaRex: Your change is live for wmf.1 wikis. [18:15:35] On mwdebug1002. [18:16:43] (03CR) 10Smalyshev: [C: 031] wdqs: GC tuning [puppet] - 10https://gerrit.wikimedia.org/r/382195 (https://phabricator.wikimedia.org/T175919) (owner: 10Gehel) [18:17:29] Niharika: hmm, for sure? i still see the old behavior [18:18:43] MatmaRex: Oh sorry, I forgot the submodule update. :| [18:18:56] :) [18:19:00] MatmaRex: Check now. [18:19:11] Niharika: yup, works fine now on Commons [18:19:39] MatmaRex: Okay. I'll sync it out for both wmf.1 and wmf.2. [18:19:45] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658483 (10EddieGP) When digging a bit further into this, I found that it was... 
[18:20:30] (03PS1) 10Papaul: DNS:Remove production & mgmt DNS for db2010 [dns] - 10https://gerrit.wikimedia.org/r/382205 (https://phabricator.wikimedia.org/T175685) [18:22:04] !log niharika29@tin Synchronized php-1.31.0-wmf.1/skins/Vector/: Do not special-case ULS and Not logged in in RTL in personal bar T48947, T177312 (duration: 00m 51s) [18:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:12] T177312: Universal Language Selector position (rtl/ltr) - https://phabricator.wikimedia.org/T177312 [18:22:12] T48947: Vector: Horizontal nav elements should be flipped with CSS instead of in HTML - https://phabricator.wikimedia.org/T48947 [18:23:13] !log niharika29@tin Synchronized php-1.31.0-wmf.2/skins/Vector/: Do not special-case ULS and Not logged in in RTL in personal bar T48947, T177312 (duration: 00m 50s) [18:23:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:40] Niharika: thanks! [18:23:53] You're welcome. :) [18:25:37] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658511 (10MarcoAurelio) Sorry but I don't see it that way. As much as I respe... [18:28:13] (03Draft2) 10Zoranzoki21: Removing expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382206 [18:28:21] Hi all. 
Can anyone deploy this change, please: https://gerrit.wikimedia.org/r/#/c/382206/ [18:29:48] (03PS4) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 [18:30:07] (03CR) 10jerkins-bot: [V: 04-1] releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (owner: 10Dzahn) [18:30:44] Zoranzoki21: add it to the SWAT deploy calendar [18:30:52] jouncebot: next [18:30:52] In 0 hour(s) and 29 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T1900) [18:30:58] (03PS1) 10Ayounsi: Fix vmod_abi.h version parsing [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382207 [18:31:00] (03PS1) 10Ayounsi: Add description to vmod_netmapper.vcc [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382208 [18:31:02] (03PS1) 10Ayounsi: Bump version to 1.5 [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382209 [18:31:18] ok mutante [18:31:20] Zoranzoki21: see the "Evening SWAT" section there [18:31:29] ok.. thank you [18:32:03] you would add the gerrit link and your IRC nick, and the bot will ping you later [18:32:59] (03PS5) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 [18:33:11] jerkins, come on, do you like it now? [18:33:22] (03PS1) 10Ottomata: Revert "Add LVS service for druid-analytics-broker" [puppet] - 10https://gerrit.wikimedia.org/r/382210 [18:33:38] 10Operations, 10cloud-services-team: Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3658537 (10herron) [18:33:41] (03CR) 10Ottomata: "This won't work. Druid is in the analytics vlan. 
:(" [puppet] - 10https://gerrit.wikimedia.org/r/382210 (owner: 10Ottomata) [18:33:57] 10Operations, 10cloud-services-team: Upgrade to puppet 4 (4.8 or newer) - https://phabricator.wikimedia.org/T177254#3652273 (10herron) [18:34:06] (03PS1) 10Andrew Bogott: labweb: add appserver and ldap classes [puppet] - 10https://gerrit.wikimedia.org/r/382211 [18:34:18] (03CR) 10Ottomata: [C: 032] Revert "Add LVS service for druid-analytics-broker" [puppet] - 10https://gerrit.wikimedia.org/r/382210 (owner: 10Ottomata) [18:34:23] (03PS2) 10Ottomata: Revert "Add LVS service for druid-analytics-broker" [puppet] - 10https://gerrit.wikimedia.org/r/382210 [18:34:25] it does :) yay, got some score for the competition to remove style violations [18:34:26] (03CR) 10Ottomata: [V: 032 C: 032] Revert "Add LVS service for druid-analytics-broker" [puppet] - 10https://gerrit.wikimedia.org/r/382210 (owner: 10Ottomata) [18:34:36] (03CR) 10jerkins-bot: [V: 04-1] labweb: add appserver and ldap classes [puppet] - 10https://gerrit.wikimedia.org/r/382211 (owner: 10Andrew Bogott) [18:35:55] (03PS2) 10Andrew Bogott: labweb: add appserver and ldap classes [puppet] - 10https://gerrit.wikimedia.org/r/382211 [18:36:02] bd808: btw, using "class { '::apache::mod::rewrite': }" instead of "include apache::mod::rewrite" and moving it to the profile = jenkins is happy [18:36:27] (03PS3) 10Andrew Bogott: labweb: add appserver and ldap classes [puppet] - 10https://gerrit.wikimedia.org/r/382211 [18:37:29] (03CR) 10Andrew Bogott: [C: 032] labweb: add appserver and ldap classes [puppet] - 10https://gerrit.wikimedia.org/r/382211 (owner: 10Andrew Bogott) [18:39:21] !log reverting roll out of LVS service druid-analytics.svc.eqiad.wmnet - this won't work with hosts inside of Analytics VLAN [18:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:33] (03PS6) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 [18:46:03] 
10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658551 (10kaldari) @EddieGP: I believe there were performance concerns with t... [18:48:39] (03PS7) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) [18:49:10] (03CR) 10jerkins-bot: [V: 04-1] releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [18:50:35] 10Operations, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint, 10Wikidata-Sprint-2016-11-08: Create wikibase/wikiba.se-deploy repo - https://phabricator.wikimedia.org/T176841#3658560 (10Dzahn) Heh, that worked ! Thanks @QChris :) [18:52:26] (03PS8) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) [18:52:37] RECOVERY - PyBal backends health check on lvs1009 is OK: PYBAL OK - All pools are healthy [18:55:07] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [18:55:47] (03PS9) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) [18:56:37] RECOVERY - PyBal backends health check on lvs1003 is OK: PYBAL OK - All pools are healthy [18:56:59] (03CR) 10Dzahn: "18:52:57 wmf-style: total violations delta -1 (by adding 3 new ones but fixing 4, haha)" [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [18:57:08] RECOVERY - PyBal IPVS diff check on lvs1009 is OK: OK: no difference between hosts in IPVS/PyBal [18:57:20] i am adding 3 new violations... 
but i'm removing 4 violations, so still a score of -1 , hehe [18:57:43] (03CR) 10BBlack: [C: 031] Fix vmod_abi.h version parsing [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382207 (owner: 10Ayounsi) [18:57:45] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, 10Stewards-and-global-tools (Temporary-UserRights): Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658580 (10MarcoAurelio) @kaldari Tool Labs is mentioned here because I discov... [18:57:50] (03CR) 10BBlack: [C: 031] Add description to vmod_netmapper.vcc [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382208 (owner: 10Ayounsi) [18:57:56] (03CR) 10BBlack: [C: 031] Bump version to 1.5 [software/varnish/libvmod-netmapper] - 10https://gerrit.wikimedia.org/r/382209 (owner: 10Ayounsi) [18:58:08] RECOVERY - PyBal IPVS diff check on lvs1006 is OK: OK: no difference between hosts in IPVS/PyBal [18:58:37] RECOVERY - PyBal IPVS diff check on lvs1003 is OK: OK: no difference between hosts in IPVS/PyBal [19:00:05] twentyafterfour: Your horoscope predicts another unfortunate MediaWiki train deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T1900). [19:00:05] No GERRIT patches in the queue for this window AFAICS. 
[19:06:14] !log cp2* (all codfw caches): upgrade nginx to 1.13.5-1+wmf1~jessie1 [19:06:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:43] (03PS10) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) [19:08:47] (03CR) 10Dzahn: "19:07:13 wmf-style: total violations delta -4 :)" [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [19:09:09] (03PS11) 10Dzahn: releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) [19:09:16] (03CR) 10Dzahn: [C: 032] releases: rm proxy_jenkins class, mv Apache includes [puppet] - 10https://gerrit.wikimedia.org/r/382098 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [19:19:26] (03CR) 10BryanDavis: "> How would you feel about keeping the barest of this ideas w/" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/381713 (https://phabricator.wikimedia.org/T166712) (owner: 10BryanDavis) [19:19:34] PROBLEM - Check systemd state on releases2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:25:08] releases is me.. working on it [19:25:20] !log Deploying MediaWiki 1.31.0-wmf.2 to Group 1 wikis refs T174358 [19:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:26] T174358: 1.31.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T174358 [19:26:49] (03CR) 10BryanDavis: [C: 031] "Description looks good to me and comments from the initial review have been addressed. Thanks!" 
[software/tools-webservice] - 10https://gerrit.wikimedia.org/r/381296 (https://phabricator.wikimedia.org/T176018) (owner: 10Mridubhatnagar) [19:31:54] PROBLEM - DPKG on labvirt1016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:36:03] 10Operations, 10Performance-Team, 10Thumbor, 10MW-1.31-release-notes (WMF-deploy-2017-09-26 (1.31.0-wmf.1)), 10User-fgiunchedi: Remove X-Content-Dimensions for multipage originals - https://phabricator.wikimedia.org/T175689#3658744 (10Gilles) [19:37:17] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623015 (10ksmith) Should there be some kind of task relationship between this and {T172165}? [19:37:51] (03PS1) 10Andrew Bogott: nova: add labvirt1017 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/382213 (https://phabricator.wikimedia.org/T176044) [19:38:07] !log Start regenerating map tiles on eqiad for z13-z14 - T176252 [19:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:13] T176252: Regenerate tiles - https://phabricator.wikimedia.org/T176252 [19:39:15] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623015 (10Reedy) >>! In T176370#3658751, @ksmith wrote: > Should there be some kind of task relationship between this and {T172165}? Probably. This ta... 
[19:40:07] (03CR) 10Rush: [C: 031] nova: add labvirt1017 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/382213 (https://phabricator.wikimedia.org/T176044) (owner: 10Andrew Bogott) [19:40:35] (03CR) 10Andrew Bogott: [C: 032] nova: add labvirt1017 to the scheduling pool [puppet] - 10https://gerrit.wikimedia.org/r/382213 (https://phabricator.wikimedia.org/T176044) (owner: 10Andrew Bogott) [19:41:23] (03PS1) 10Dzahn: releases-jenkins: fix proxy setup, prefix setting [puppet] - 10https://gerrit.wikimedia.org/r/382214 (https://phabricator.wikimedia.org/T164030) [19:46:11] (03PS2) 10Dzahn: releases-jenkins: fix proxy setup, prefix setting [puppet] - 10https://gerrit.wikimedia.org/r/382214 (https://phabricator.wikimedia.org/T164030) [19:46:35] PROBLEM - puppet last run on releases2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Service[apache2] [19:50:04] (03PS1) 1020after4: group1 wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382215 [19:50:06] (03CR) 1020after4: [C: 032] group1 wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382215 (owner: 1020after4) [19:52:15] (03Merged) 10jenkins-bot: group1 wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382215 (owner: 1020after4) [19:52:28] (03CR) 10jenkins-bot: group1 wikis to 1.31.0-wmf.2 refs T174358 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382215 (owner: 1020after4) [19:52:38] (03PS3) 10Dzahn: releases-jenkins: fix proxy setup, prefix setting [puppet] - 10https://gerrit.wikimedia.org/r/382214 (https://phabricator.wikimedia.org/T164030) [19:54:39] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3658801 (10Joe) So what I extract from the errors is 
you're trying to connect to db2048 by IP and not by h... [19:55:02] (03PS4) 10Dzahn: releases-jenkins: fix proxy setup, prefix setting [puppet] - 10https://gerrit.wikimedia.org/r/382214 (https://phabricator.wikimedia.org/T164030) [19:55:06] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.31.0-wmf.2 refs T174358 [19:55:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:55:16] T174358: 1.31.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T174358 [19:55:27] (03PS1) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [19:55:55] !log twentyafterfour@tin Synchronized php: group1 wikis to 1.31.0-wmf.2 refs T174358 (duration: 00m 48s) [19:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:03] (03CR) 10jerkins-bot: [V: 04-1] jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [19:56:15] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/8187/releases1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/382214 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [19:57:51] (03PS2) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [19:58:52] RECOVERY - Check systemd state on releases2001 is OK: OK - running: The system is fully operational [19:59:10] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3658811 (10Jdforrester-WMF) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: I, the Bot under the Fountain, allow thee, The Deployer, to do Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T2000). [20:00:04] No GERRIT patches in the queue for this window AFAICS. [20:00:25] Nothing for ores today [20:01:13] 10Operations, 10Community-Tech, 10DBA, 10MediaWiki-General-or-Unknown, and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3658814 (10EddieGP) p:05Normal>03Low >>! In T176754#3658580, @MarcoAurelio wrote: > The issue here is that MediaWi... [20:01:41] RECOVERY - puppet last run on releases2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [20:06:11] PROBLEM - DPKG on labvirt1016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:18:05] !log Start regenerating map tiles on codfw for z13-z14 - T176252 [20:18:07] (03PS1) 10Dzahn: releases-jenkins: fix prefix for proxy setup, pt.2 [puppet] - 10https://gerrit.wikimedia.org/r/382221 (https://phabricator.wikimedia.org/T164030) [20:18:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:13] T176252: Regenerate tiles - https://phabricator.wikimedia.org/T176252 [20:18:29] 10Operations, 10Analytics, 10Analytics-Cluster, 10Research-management: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843#3658835 (10dr0ptp4kt) Just so we have it here, for TensorFlow people, there's an encouraging comment at https://github.com/tensorflow/tensorflow/issues/22#issueco... 
[20:18:37] (03CR) 10jerkins-bot: [V: 04-1] releases-jenkins: fix prefix for proxy setup, pt.2 [puppet] - 10https://gerrit.wikimedia.org/r/382221 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [20:18:39] (03PS3) 10Hashar: jenkins: switch to Java8 [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) [20:21:48] (03CR) 10Paladox: [C: 031] "LGTM :)" [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:22:12] 10Operations, 10DBA, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3658840 (10demon) p:05Triage>03Low [20:26:49] 10Operations, 10MediaWiki-Platform-Team, 10TechCom-RfC, 10HHVM, 10NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3658845 (10EddieGP) >>! In T176370#3658764, @Reedy wrote: > Probably. This task would block that; we can't change MW core until WMF production is migrat... [20:28:37] (03CR) 10Hashar: [V: 031 C: 031] "Both the master and slave classes defines an alternative::select['java']. 
They are both applied on contint1001/contint2001 so that ended " [puppet] - 10https://gerrit.wikimedia.org/r/382217 (https://phabricator.wikimedia.org/T162828) (owner: 10Hashar) [20:32:52] !log arlolra@tin Started deploy [parsoid/deploy@5b80254]: Updating Parsoid to 477942c3 [20:33:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:50] !log arlolra@tin Finished deploy [parsoid/deploy@5b80254]: Updating Parsoid to 477942c3 (duration: 00m 58s) [20:33:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:58] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3658863 (10aaron) [20:37:08] akosiaris _joe_ looks like wtp1001..wtp1024 got decommissioned .. which is fine since we have the new servers, but it would have been useful to have been notified .. so we could have updated our set of canaries for deployment. [20:39:14] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3658870 (10aaron) Also, there is https://bugs.php.net/bug.php?id=74445 :) [20:41:21] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review: decom wtp1001-wtp1024 - https://phabricator.wikimedia.org/T177374#3656431 (10Arlolra) #parsing-team could have used a ping here, since our canaries are still hardcoded to the decommissioned nodes. https://github.com/wikimedia/mediawiki-services-parso... [20:47:05] 10Operations, 10DBA, 10Availability (Multiple-active-datacenters), 10Performance-Team (Radar): Make apache/maintenance hosts TLS connections to mariadb work - https://phabricator.wikimedia.org/T175672#3658884 (10aaron) >>! In T175672#3658801, @Joe wrote: > So what I extract from the errors is you're trying... 
[20:53:32] !log arlolra@tin Started deploy [parsoid/deploy@617cdb3]: Updating Parsoid to 477942c3 [20:53:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:12] 10Operations, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint, 10Wikidata-Sprint-2016-11-08: Create wikibase/wikiba.se-deploy repo - https://phabricator.wikimedia.org/T176841#3658910 (10Ladsgroup) Thank you :) [21:03:27] (03PS2) 10Dzahn: releases-jenkins: fix prefix for proxy setup, pt.2 [puppet] - 10https://gerrit.wikimedia.org/r/382221 (https://phabricator.wikimedia.org/T164030) [21:03:33] !log arlolra@tin Finished deploy [parsoid/deploy@617cdb3]: Updating Parsoid to 477942c3 (duration: 10m 01s) [21:03:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:06] (03CR) 10Dzahn: [C: 032] releases-jenkins: fix prefix for proxy setup, pt.2 [puppet] - 10https://gerrit.wikimedia.org/r/382221 (https://phabricator.wikimedia.org/T164030) (owner: 10Dzahn) [21:07:26] !log Updated the Wikidata property suggester with data from Monday's JSON dump and applied the T132839 workarounds [21:07:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:07:56] T132839: [RfC] Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [21:09:21] !log Updated Parsoid to 477942c3 (T177115) [21:09:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:09:29] T177115: Paragraph wrapping bug when inline tags wrap block tags - https://phabricator.wikimedia.org/T177115 [21:09:35] 10Operations, 10Traffic, 10Patch-For-Review: Text eqiad varnish 503 spikes - https://phabricator.wikimedia.org/T175803#3658969 (10Aklapper) Two weeks later, is there more to do in this open task? Did https://gerrit.wikimedia.org/r/376751 "properly" fix this? Or is some "underlying issue investigation" needed? 
[21:28:09] 10Operations, 10RelEng-Archive-FY201718-Q1, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3658984 (10Dzahn) https://releases-jenkins.wikimedia.org works now without the /... [21:36:31] (03PS1) 10Ayounsi: Fix vmod_abi.h version parsing [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382309 [21:36:33] (03PS1) 10Ayounsi: Add description to vmod_netmapper.vcc [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382310 [21:36:35] (03PS1) 10Ayounsi: Bump version to 1.5 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382311 [21:36:37] (03PS1) 10Ayounsi: Update debian changelog for version 1.5 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382312 [21:36:39] (03PS1) 10Ayounsi: Bump libvarnishapi-dev dependency to version 5.1.3 [software/varnish/libvmod-netmapper] (debian) - 10https://gerrit.wikimedia.org/r/382313 [21:38:55] (03PS1) 10Andrew Bogott: diskspace.py: add some new flavors [puppet] - 10https://gerrit.wikimedia.org/r/382314 [21:39:26] (03CR) 10Andrew Bogott: [C: 032] diskspace.py: add some new flavors [puppet] - 10https://gerrit.wikimedia.org/r/382314 (owner: 10Andrew Bogott) [21:55:39] PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:57:14] (03PS1) 10Andrew Bogott: labweb: I don't think we need a local db anymore [puppet] - 10https://gerrit.wikimedia.org/r/382316 [21:58:02] (03CR) 10Andrew Bogott: [C: 032] labweb: I don't think we need a local db anymore [puppet] - 10https://gerrit.wikimedia.org/r/382316 (owner: 10Andrew Bogott) [22:03:20] PROBLEM - Apache HTTP on mw1289 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 3.425 second response time [22:04:19] RECOVERY - Apache HTTP on mw1289 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.045 second response time [22:04:39] PROBLEM - Nginx local proxy to apache on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.007 second response time [22:04:40] PROBLEM - Apache HTTP on mw1282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time [22:05:39] RECOVERY - Nginx local proxy to apache on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.028 second response time [22:05:40] RECOVERY - Apache HTTP on mw1282 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.039 second response time [22:18:05] (03PS1) 10Andrew Bogott: labweb: add hiera host files. [puppet] - 10https://gerrit.wikimedia.org/r/382318 [22:18:50] (03CR) 10Andrew Bogott: [C: 032] labweb: add hiera host files. [puppet] - 10https://gerrit.wikimedia.org/r/382318 (owner: 10Andrew Bogott) [22:20:26] 10Operations, 10HHVM, 10User-Elukey: Missing .deb dependencies for appserver on Stretch - https://phabricator.wikimedia.org/T177443#3659028 (10Andrew) [22:25:42] RECOVERY - puppet last run on elastic1047 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [22:27:37] 10Operations, 10Traffic, 10Performance-Team (Radar): Upgrade to Varnish 5 - https://phabricator.wikimedia.org/T168529#3659048 (10ayounsi) libvmod-netmapper pushed to gerrit as well. 
libvmod-netmapper / varnish pushed to the experimental apt repo. [22:44:32] (03PS1) 10Reedy: Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) [22:45:03] (03CR) 10jerkins-bot: [V: 04-1] Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) (owner: 10Reedy) [22:46:03] This indenting gets silly [22:46:21] 10Operations, 10Traffic, 10Patch-For-Review: Text eqiad varnish 503 spikes - https://phabricator.wikimedia.org/T175803#3659119 (10BBlack) There's still some overlap and/or confusion between the 503 issues in this ticket, T174932 and T145661, and there's some still lesser recurrent 503s in esams that we don't... [22:46:26] (03PS2) 10Reedy: Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) [22:46:34] Reedy: one reason i dislike python [22:46:49] This is ruby ;) [22:47:13] 10Operations, 10Traffic, 10Patch-For-Review: Text eqiad varnish 503 spikes - https://phabricator.wikimedia.org/T175803#3659123 (10BBlack) p:05High>03Normal [22:47:25] Reedy: im aware but its the same reason i dislike python [22:47:52] It makes a little more sense in python, in comparison to this expecting => at a certain distance after the longest key [22:49:30] Agreed [22:52:16] the guy who started Puppet explained how he couldn't get a grip of Python for a week but then tried Ruby and had a working prototype in two hours [23:00:12] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Evening SWAT (Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171004T2300). 
[23:00:12] DMaza, Zoranzoki21, Jdlrobson, VolkerE, and RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:28] I can swat [23:00:47] here! [23:01:12] (03PS2) 10Catrope: Remove unnecessary `id` attributes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: 10VolkerE) [23:01:15] (03CR) 10Catrope: [C: 032] Remove unnecessary `id` attributes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: 10VolkerE) [23:01:25] Jon wins today's "who gets to go first" race [23:02:26] here [23:02:57] (03Merged) 10jenkins-bot: Remove unnecessary `id` attributes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: 10VolkerE) [23:03:11] (03CR) 10jenkins-bot: Remove unnecessary `id` attributes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377406 (https://phabricator.wikimedia.org/T175670) (owner: 10VolkerE) [23:03:36] Hi, I believe somehow an incorrect location got uploaded alongside https://commons.wikimedia.org/wiki/File:Newcastle_Town_Moor_Skyline.jpg (by the Android Commons app?), how can I check / change / delete this? 
[23:04:15] (03PS3) 10Reedy: Regenerate FancyCaptchas weekly rather than monthly [puppet] - 10https://gerrit.wikimedia.org/r/382322 (https://phabricator.wikimedia.org/T157736) [23:04:23] Sorry I think wrong channel [23:04:42] (03PS2) 10Catrope: Enable AbuseFilter runtime profile on Portuguese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382080 (https://phabricator.wikimedia.org/T177336) (owner: 10Dmaza) [23:04:51] (03CR) 10Catrope: [C: 032] Enable AbuseFilter runtime profile on Portuguese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382080 (https://phabricator.wikimedia.org/T177336) (owner: 10Dmaza) [23:05:04] jdlrobson: Your logo change is on mwdebug1002, please test [23:07:21] (03Merged) 10jenkins-bot: Enable AbuseFilter runtime profile on Portuguese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382080 (https://phabricator.wikimedia.org/T177336) (owner: 10Dmaza) [23:07:31] (03CR) 10jenkins-bot: Enable AbuseFilter runtime profile on Portuguese Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382080 (https://phabricator.wikimedia.org/T177336) (owner: 10Dmaza) [23:13:03] jdlrobson: Your MinervaNeue change is now also on mwdebug1002, please test [23:13:12] RoanKattouw: teessting [23:14:48] RoanKattouw: good to sync! [23:15:03] jdlrobson: Both the device threshold and the logo change? 
[23:16:18] RoanKattouw: just checking logo [23:16:33] RoanKattouw: that looks good too [23:17:20] OK cool [23:18:17] !log catrope@tin Synchronized php-1.31.0-wmf.2/skins/MinervaNeue/: Correct feature phone threshold detection (T176286) (duration: 00m 53s) [23:18:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:18:25] T176286: Mobile font size is 0.8em for devices with screen width 320px and smaller - https://phabricator.wikimedia.org/T176286 [23:19:35] !log catrope@tin Synchronized static/images/mobile/copyright/wikipedia-wordmark-en.svg: Remove unnecessary id attributes (T175670) (duration: 00m 50s) [23:19:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:42] T175670: Squeeze some more bytes out of the wordmark SVG - https://phabricator.wikimedia.org/T175670 [23:20:06] DMaza: Krinkle: Your changes are on mwdebug1002 now, please test [23:20:13] thanks RoanKattouw [23:20:17] ok, testing [23:20:19] That's AbuseFilter for DMaza and TMH for Krinkle [23:22:12] (03PS3) 10Catrope: Removing expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382206 (owner: 10Zoranzoki21) [23:22:15] (03CR) 10Catrope: [C: 032] Removing expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382206 (owner: 10Zoranzoki21) [23:25:22] RoanKattouw: everything looks good [23:26:19] OK syncing [23:26:46] thank you [23:27:06] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Enable AbuseFilter runtime profile on ptwiki (T177336) (duration: 00m 51s) [23:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:14] T177336: Enable AbuseFilterRuntimeProfile on Portuguese Wikipedia - https://phabricator.wikimedia.org/T177336 [23:27:35] (03Merged) 10jenkins-bot: Removing expired rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/382206 (owner: 10Zoranzoki21) [23:27:48] (03CR) 10jenkins-bot: Removing expired rules [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/382206 (owner: 10Zoranzoki21) [23:29:18] !log catrope@tin Synchronized wmf-config/throttle.php: Remove expired rules (duration: 00m 50s) [23:29:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:02] Checking [23:31:39] Confirmed. RoanKattouw [23:32:10] Thanks [23:32:24] !log Upgrading xhgui on tungsten (operations/software/xhgui.git) from 0.5.2 to 0.8.1 (matching version used in operations/mediawiki-config vendor) - new feature: flamegraphs [23:32:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:54] !log catrope@tin Synchronized php-1.31.0-wmf.2/includes/changes/ChangesListFilterGroup.php: Fix PHP fatal when transcluding Special:Recentchanges (T176236) (duration: 00m 50s) [23:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:33:02] T176236: PHP Warning: Attempted to serialize unserializable builtin class Closure$ChangesListSpecialPage::__construct#28;1852 - https://phabricator.wikimedia.org/T176236 [23:36:21] !log catrope@tin Synchronized php-1.31.0-wmf.2/includes/changes/ChangesListFilter.php: ORES highlights for RCFilters (T172757) (duration: 00m 50s) [23:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:29] T172757: Migrate and convert user preferences to the new UX - https://phabricator.wikimedia.org/T172757 [23:49:29] !log catrope@tin Synchronized php-1.31.0-wmf.2/resources/src/: ORES highlights for RCFilters (T172757) (duration: 00m 50s) [23:49:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:49:36] T172757: Migrate and convert user preferences to the new UX - https://phabricator.wikimedia.org/T172757 [23:51:12] !log catrope@tin Synchronized php-1.31.0-wmf.2/extensions/ORES/includes/Hooks.php: ORES highlights for RCFilters (T172757) (duration: 00m 50s) [23:51:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:49] 
!log catrope@tin Synchronized php-1.31.0-wmf.2/extensions/TimedMediaHandler/MwEmbedModules/EmbedPlayer/resources/jquery.embedPlayer.js: Fix jqmigrate warning (T169385) (duration: 00m 50s) [23:53:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:57] T169385: jQuery 3 migration warnings - https://phabricator.wikimedia.org/T169385