[01:20:33] (PS1) Madhuvishy: firstboot: Fix force puppet run after ensure NFS mounts available [puppet] - https://gerrit.wikimedia.org/r/378653 (https://phabricator.wikimedia.org/T171508)
[01:22:11] (PS2) Madhuvishy: firstboot: Fix force puppet run after ensure NFS mounts available [puppet] - https://gerrit.wikimedia.org/r/378653 (https://phabricator.wikimedia.org/T171508)
[01:24:35] (PS3) Madhuvishy: firstboot: Fix force puppet run after ensure NFS mounts available [puppet] - https://gerrit.wikimedia.org/r/378653 (https://phabricator.wikimedia.org/T171508)
[01:24:36] (CR) Madhuvishy: [C: 2] firstboot: Fix force puppet run after ensure NFS mounts available [puppet] - https://gerrit.wikimedia.org/r/378653 (https://phabricator.wikimedia.org/T171508) (owner: Madhuvishy)
[02:29:44] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.18) (duration: 08m 04s)
[02:30:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:36:50] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Sep 18 02:36:50 UTC 2017 (duration 7m 6s)
[02:37:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:24:25] PROBLEM - recommendation_api endpoints health on scb1003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) timed out before a response was received
[03:25:25] RECOVERY - recommendation_api endpoints health on scb1003 is OK: All endpoints are healthy
[05:05:44] (PS2) Tim Starling: Re-enable EtcdConfig in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/375108
[05:22:19] (PS3) Tim Starling: Re-enable EtcdConfig in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/375108 (https://phabricator.wikimedia.org/T156924)
[05:33:57] (CR) Tim Starling: [C: 2] Re-enable EtcdConfig in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/375108 (https://phabricator.wikimedia.org/T156924) (owner: Tim Starling)
[05:37:11] (Merged) jenkins-bot: Re-enable EtcdConfig in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/375108 (https://phabricator.wikimedia.org/T156924) (owner: Tim Starling)
[05:37:21] (CR) jenkins-bot: Re-enable EtcdConfig in beta cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/375108 (https://phabricator.wikimedia.org/T156924) (owner: Tim Starling)
[05:40:20] !log tstarling@tin Synchronized wmf-config: just for consistency, should have no effect (gerrit 375108) (duration: 00m 49s)
[05:40:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:18:29] !log testing EtcdConfig in beta cluster
[06:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:27:35] PROBLEM - ores on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 8081: Connection refused
[06:31:45] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 3666 bytes in 0.023 second response time
[06:42:24] !log installing freexl security updates
[06:42:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:05:31] (PS1) Muehlenhoff: Extend Cumin aliases for missing maps test roles [puppet] - https://gerrit.wikimedia.org/r/378657
[07:07:17] (CR) Muehlenhoff: [C: 2] Extend Cumin aliases for missing maps test roles [puppet] - https://gerrit.wikimedia.org/r/378657 (owner: Muehlenhoff)
[07:07:35] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[07:09:15] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[07:09:55] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[07:10:05] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
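The 5xx alerts above report the share of recent datapoints that sit above a threshold (for example "22.22% of data above the critical threshold [1000.0]"). A minimal sketch of that arithmetic, assuming a window of nine made-up per-minute samples; the function name and the sample values are hypothetical, and the percentage at which the real check turns CRITICAL is not shown in this log.

```python
def percent_above(datapoints, threshold):
    """Return the percentage of datapoints strictly above threshold."""
    if not datapoints:
        return 0.0
    above = sum(1 for value in datapoints if value > threshold)
    return 100.0 * above / len(datapoints)


# Hypothetical 5xx reqs/min samples; two of the nine exceed 1000.0.
samples = [120.0, 90.0, 1500.0, 2200.0, 300.0, 80.0, 75.0, 60.0, 55.0]
pct = percent_above(samples, 1000.0)
print(f"{pct:.2f}% of data above the critical threshold [1000.0]")
```

With two of nine samples over the threshold this gives 2/9, which is the 22.22% figure seen in the Esams and Text alerts; a ten-sample window with one or two spikes would give the 10.00% and 20.00% figures.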
[07:12:06] PROBLEM - puppet last run on ganeti1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:12:25] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:12:36] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:12:46] PROBLEM - puppet last run on labcontrol1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:12:55] PROBLEM - puppet last run on db1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:12:56] PROBLEM - puppet last run on ores1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:05] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:05] PROBLEM - puppet last run on wtp1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:06] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:15] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:13:15] PROBLEM - puppet last run on db1099 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:14:15] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:14:55] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:15:05] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:15:45] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:16:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0]
[07:24:25] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0]
[07:40:15] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:40:45] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[07:40:45] RECOVERY - puppet last run on ganeti1003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[07:40:55] RECOVERY - puppet last run on db1099 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[07:41:01] Operations: unaccepted salt keys - https://phabricator.wikimedia.org/T170510#3613818 (MoritzMuehlenhoff) Open>declined Salt is being removed.
[07:41:05] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[07:41:25] RECOVERY - puppet last run on labcontrol1003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[07:41:35] RECOVERY - puppet last run on db1035 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[07:41:35] RECOVERY - puppet last run on ores1005 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[07:41:45] RECOVERY - puppet last run on wtp1032 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[07:41:47] Operations, Puppet, Deployment-Systems, Beta-Cluster-reproducible, Patch-For-Review: grain-ensure erroneous mismatch with (bool)True vs (str)true - https://phabricator.wikimedia.org/T146914#3613824 (MoritzMuehlenhoff) Open>declined Salt is being removed.
[07:41:55] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[07:42:45] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[07:43:37] so cp1067 is the responsible for the last big spike
[07:44:24] it doesn't show any warning/error for mailbox expiry lag
[07:44:53] but after the restart of cp105[23] yesterday we didn't see more issues from them afaics
[07:48:40] so I'd be inclined to restart cp1067's backend sooner rather than later
[07:48:54] cp105[32] caches should have already recovered nicely
[07:52:33] ema should be back today IIRC, I can wait for him
[08:05:16] (CR) Volans: "I personally don't like this approach for the YAML, but I don't want to block this, so feel free to proceed as is." (1 comment) [debs/debdeploy] - https://gerrit.wikimedia.org/r/375980 (owner: Muehlenhoff)
[08:18:46] (CR) Matthias Mullie: "recheck" [puppet] - https://gerrit.wikimedia.org/r/378233 (https://phabricator.wikimedia.org/T160185) (owner: Matthias Mullie)
[08:19:07] (CR) jerkins-bot: [V: -1] Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/378233 (https://phabricator.wikimedia.org/T160185) (owner: Matthias Mullie)
[08:23:59] elukey: looking
[08:24:36] yeah so this time around it's cp1067, it recovered a while ago
[08:24:46] https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cache_type=text&from=now-3h&to=now
[08:30:53] (CR) Filippo Giunchedi: ">" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[08:44:28] (PS1) Hashar: Allow contint-admins to interact with zuul service [puppet] - https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845)
[08:45:06] Operations, Continuous-Integration-Infrastructure, Patch-For-Review, Release-Engineering-Team (Watching / External), Zuul: Migrate zuul-server behind systemd service - https://phabricator.wikimedia.org/T167845#3614093 (hashar) Resolved>Open contint-admins can not interact with the `zu...
[08:47:57] Operations, Mail: update exim::listserve::private::mailing_lists value in puppet - https://phabricator.wikimedia.org/T82350#3614097 (MoritzMuehlenhoff)
[08:49:55] (PS2) ArielGlenn: dumps: Align box-shadow with WikimediaUI standard [puppet] - https://gerrit.wikimedia.org/r/378408 (owner: Ladsgroup)
[08:51:14] (CR) ArielGlenn: [C: 2] dumps: Align box-shadow with WikimediaUI standard [puppet] - https://gerrit.wikimedia.org/r/378408 (owner: Ladsgroup)
[08:52:56] (CR) Volans: "I'm suggesting the -X version for two reasons:" [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[09:04:38] apergos: thanks :)
[09:05:18] (CR) ArielGlenn: "There's a lot of code in common between the dumpcategoriesrdf.sh and the dumpcategoriesrdf-daily.sh scripts. Why not pull the common func" [puppet] - https://gerrit.wikimedia.org/r/378355 (https://phabricator.wikimedia.org/T173774) (owner: Smalyshev)
[09:06:16] (CR) Filippo Giunchedi: "> I'm suggesting the -X version for two reasons:" [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[09:08:07] Amir1: yw
[09:08:11] thanks for the fixup
[09:11:43] (CR) Giuseppe Lavagetto: [C: -1] site.pp: assign roles to mw1307-28 (4 comments) [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[09:13:33] (CR) Muehlenhoff: [C: 1] site.pp: assign roles to mw1307-28 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[09:13:40] (CR) Elukey: site.pp: assign roles to mw1307-28 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[09:29:21] (PS7) Ema: VCL: stabilize backend storage patterns [puppet] - https://gerrit.wikimedia.org/r/376751 (https://phabricator.wikimedia.org/T145661) (owner: BBlack)
[09:46:45] (PS8) Ema: VCL: stabilize backend storage patterns [puppet] - https://gerrit.wikimedia.org/r/376751 (https://phabricator.wikimedia.org/T145661) (owner: BBlack)
[09:58:33] (PS15) ArielGlenn: restructure dumps webserver, zim manifests to module/role/profile [puppet] - https://gerrit.wikimedia.org/r/376750 (https://phabricator.wikimedia.org/T175592)
[10:03:04] (CR) ArielGlenn: [C: 2] restructure dumps webserver, zim manifests to module/role/profile [puppet] - https://gerrit.wikimedia.org/r/376750 (https://phabricator.wikimedia.org/T175592) (owner: ArielGlenn)
[10:09:26] PROBLEM - MariaDB Slave Lag: s2 on db1047 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 379.83 seconds
[10:11:26] RECOVERY - MariaDB Slave Lag: s2 on db1047 is OK: OK slave_sql_lag Replication lag: 22.37 seconds
[10:22:21] (PS2) Ema: Remove Puppet code which is obsolete with Salt grain removal [puppet] - https://gerrit.wikimedia.org/r/377292 (owner: Muehlenhoff)
[10:25:08] (CR) Ema: "noop on lvs1001/3003 https://puppet-compiler.wmflabs.org/compiler02/7906/" [puppet] - https://gerrit.wikimedia.org/r/377292 (owner: Muehlenhoff)
[10:28:31] (CR) Ema: [C: 2] Remove Puppet code which is obsolete with Salt grain removal [puppet] - https://gerrit.wikimedia.org/r/377292 (owner: Muehlenhoff)
[10:30:09] (CR) Elukey: role::kafka::jumbo::broker: enable Prometheus JMX monitoring (2 comments) [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[10:37:44] (CR) Filippo Giunchedi: "> > I'm suggesting the -X version for two reasons:" [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[10:38:15] (PS1) Ema: cache::text: enable nginx-lua-prometheus [puppet] - https://gerrit.wikimedia.org/r/378671
[10:40:58] !log cp1099 - backend restart, mailbox lag
[10:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:43:28] (CR) Paladox: [C: 1] Allow contint-admins to interact with zuul service [puppet] - https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845) (owner: Hashar)
[10:53:25] Operations, OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3466797 (akosiaris) Thanks! From the looks of it, this peaked at 95% at 18:36 on the 16th. Seems like there was a spam campaign that same day which caused some issues but is not directly related to this...
[11:22:31] (PS11) Elukey: role::kafka::jumbo::broker: enable Prometheus JMX monitoring [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922)
[11:26:08] (CR) Elukey: "PCC https://puppet-compiler.wmflabs.org/compiler02/7907/" [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[11:31:51] (PS2) Elukey: site.pp: assign roles to mw1307-28 [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519)
[11:34:27] (CR) Elukey: "Added comments related to rows (as consequence I've split the node definition of the video scalers). Left in place the base firewall entri" [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[11:43:48] (PS1) Phuedx: pagePreviews: Stop A/B test on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068)
[11:44:39] hashar: could you schedule https://gerrit.wikimedia.org/r/#/c/378686/ for deployment during the swat window?
[11:44:57] i can't login to wikitech right now as i don't have my authentication device
[11:45:04] (for 2fa)
[11:45:30] cc zeljkof
[11:45:32] <3
[11:46:18] phuedx: but how I can identify that you are who you pretend to be ??? :]
[11:48:05] who else would put an ascii art heart after their request?
[11:49:13] jouncebot: refresh
[11:49:16] I refreshed my knowledge about deployments.
[11:49:17] jouncebot: next
[11:49:17] In 1 hour(s) and 10 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T1300)
[11:49:20] phuedx: done :D
[11:49:25] hashar: messaged you via a different messenger app with a different 2fa scheme (text message instead of app) :D
[11:49:29] thanks hashar!
[12:10:13] (CR) Muehlenhoff: [C: 1] site.pp: assign roles to mw1307-28 [puppet] - https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: Elukey)
[12:25:57] (CR) Muehlenhoff: [C: 2] Readd rollback handling to debdeploy [debs/debdeploy] - https://gerrit.wikimedia.org/r/375980 (owner: Muehlenhoff)
[12:26:15] (PS4) Muehlenhoff: Add a new debdeploy command query_version [debs/debdeploy] - https://gerrit.wikimedia.org/r/376715
[12:26:57] (CR) Muehlenhoff: [C: 2] Add a new debdeploy command query_version [debs/debdeploy] - https://gerrit.wikimedia.org/r/376715 (owner: Muehlenhoff)
[12:27:21] (PS3) Muehlenhoff: Fix parsing of necessary restarts in query_restart [debs/debdeploy] - https://gerrit.wikimedia.org/r/376688
[12:36:01] !log installing tomcat8 security updates
[12:36:08] (CR) Elukey: eventlogging_cleaner.py: add feature to pick start_ts from file (4 comments) [puppet] - https://gerrit.wikimedia.org/r/378236 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[12:36:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:43:38] (PS4) MarcoAurelio: New 'abusefilter-helper' configuration for en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684)
[12:43:46] (PS2) MarcoAurelio: Meta(Talk)Namespace configuration for be.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/378234 (https://phabricator.wikimedia.org/T175950)
[12:46:08] (PS2) Elukey: eventlogging_cleaner.py: add feature to pick start_ts from file [puppet] - https://gerrit.wikimedia.org/r/378236 (https://phabricator.wikimedia.org/T156933)
[12:47:59] (CR) Filippo Giunchedi: [C: 1] cache::text: enable nginx-lua-prometheus [puppet] - https://gerrit.wikimedia.org/r/378671 (owner: Ema)
[12:48:42] (CR) Filippo Giunchedi: Add 3d2png deploy repo to image scalers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/378233 (https://phabricator.wikimedia.org/T160185) (owner: Matthias Mullie)
[12:51:46] (CR) Filippo Giunchedi: [C: -1] "LGTM, just the jar name" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[12:57:57] (PS2) Filippo Giunchedi: [WIP] smart: new module [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552)
[12:58:17] (CR) Filippo Giunchedi: [WIP] smart: new module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/378039 (https://phabricator.wikimedia.org/T86552) (owner: Filippo Giunchedi)
[12:59:10] !log installing libxen updates
[12:59:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to pull up your socks and deploy European Mid-day SWAT(Max 8 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T1300).
[13:00:04] tabbycat, phuedx, and Pchelolo: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break the wikis, you will be rewarded with a sticker.
[13:00:14] o///
[13:00:23] I'm here
[13:00:48] o/
[13:00:56] o/
[13:01:19] (PS1) MarcoAurelio: Create a 'patroller' user group at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378695 (https://phabricator.wikimedia.org/T176079)
[13:02:17] (CR) Hashar: [C: 2] "SWAT !" [mediawiki-config] - https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684) (owner: MarcoAurelio)
[13:02:35] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068) (owner: Phuedx)
[13:03:00] (CR) jerkins-bot: [V: -1] Create a 'patroller' user group at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378695 (https://phabricator.wikimedia.org/T176079) (owner: MarcoAurelio)
[13:03:15] ... sigh ...
[13:04:02] Single space expected between "//" and comment
[13:04:25] bah
[13:04:43] (Merged) jenkins-bot: New 'abusefilter-helper' configuration for en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684) (owner: MarcoAurelio)
[13:05:12] tabbycat: the worst
[13:05:38] tabbycat: "New 'abusefilter-helper' configuration for en.wikipedia" patch is on mwdebug1001
[13:05:43] err
[13:05:44] no
[13:05:45] it timed out
[13:05:46] (PS2) MarcoAurelio: Create a 'patroller' user group at Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378695 (https://phabricator.wikimedia.org/T176079)
[13:06:01] ah it is on
[13:06:04] hashar: it's there or not :) ?
[13:06:08] yes yes :)
[13:06:13] (CR) jenkins-bot: New 'abusefilter-helper' configuration for en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/377473 (https://phabricator.wikimedia.org/T175684) (owner: MarcoAurelio)
[13:06:16] okay, let's take a peek
[13:06:40] (PS2) Hashar: pagePreviews: Stop A/B test on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068) (owner: Phuedx)
[13:06:46] (CR) Hashar: [C: 2] pagePreviews: Stop A/B test on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068) (owner: Phuedx)
[13:06:50] stupid file
[13:06:52] and gerrit
[13:06:53] bah
[13:07:17] Pchelolo: going to deploy the beta feature thing to mwdebug1001 but I guess there is not much to chec
[13:07:18] k
[13:07:48] hashar: looks good to me
[13:07:57] hashar: ye, it's executed on a job runner, so I can't really test it
[13:08:01] rights and grants on Special:ListGroupRights are okay
[13:08:10] Pchelolo: so I am going to sync it in a few minutes :]
[13:09:34] !log hashar@tin Synchronized wmf-config: New 'abusefilter-helper' configuration for en.wikipedia - T175684 (duration: 00m 49s)
[13:09:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:49] T175684: Please create the Edit filter helper user group on en.wp - https://phabricator.wikimedia.org/T175684
[13:10:46] !log hashar@tin Synchronized php-1.30.0-wmf.18/extensions/BetaFeatures: Temporary log all the executions of the update job. - T175637 (duration: 00m 46s)
[13:10:50] Pchelolo: synced!
[13:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:11:00] (Merged) jenkins-bot: pagePreviews: Stop A/B test on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068) (owner: Phuedx)
[13:11:00] T175637: End of September milestone: Migrate first production use case - https://phabricator.wikimedia.org/T175637
[13:11:14] (CR) jenkins-bot: pagePreviews: Stop A/B test on enwiki and dewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/378686 (https://phabricator.wikimedia.org/T176068) (owner: Phuedx)
[13:11:20] Thank you hashar I'll monitor it for a bit, the logs should start showing up
[13:11:46] phuedx: and your "pagePreviews: Stop A/B test on enwiki and dewiki" is on mwdebug1001
[13:11:53] hashar: ta -- testing now
[13:12:00] Pchelolo: I dont think we have a logger for that log bucket
[13:12:03] hashar: next patch of mine requires running a maintenance script after being merged
[13:12:12] tabbycat: yeah I will do it last
[13:12:27] okay! I'll still be around
[13:12:34] poke me when you're ready
[13:13:03] Pchelolo: what mean is that if I remember correctly, the 'updateBetaFeaturesUserCounts' log bucket has to be explciitly configured in mediawiki-config . Though I might mis remember
[13:13:05] (PS1) Muehlenhoff: Remove salt grains used for trebuchet [puppet] - https://gerrit.wikimedia.org/r/378696
[13:13:45] hashar: ye, seems like you are right... will do shortly, but I guess it will have to wait till the next SWAT window
[13:14:02] Pchelolo: i dont mind enabling it out of the window :)
[13:15:18] hashar: ok, thank you, I'll add you to the patch as a reviewer
[13:15:28] Pchelolo: check wmgMonologChannels in wmf-config/InitialiseSettings.php
[13:15:49] hashar: lgtm -- checked that page previews works on cawiki and that it's configured as expected on enwiki
[13:16:04] Pchelolo: adding: 'updateBetaFeaturesUserCounts' => 'info' should do the job
[13:16:15] and that it's off by default on dewiki
[13:17:14] Pchelolo: and we also have logs for the job on mwlog1001.eqiad.wmnet in /srv/mw-log/runJobs.log
[13:17:18] Pchelolo: eg: grep updateBetaFeaturesUserCounts.*job_duration /srv/mw-log/runJobs.log
[13:17:25] (which should be somehow in logstash iirc)
[13:18:10] phuedx: syncing it:
[13:18:52] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: pagePreviews: Stop A/B test on enwiki and dewiki - T176068 (duration: 00m 45s)
[13:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:19:07] T176068: Stop page previews A/B test on enwiki and dewiki - https://phabricator.wikimedia.org/T176068
[13:19:34] (PS3) Hashar: Meta(Talk)Namespace configuration for be.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/378234 (https://phabricator.wikimedia.org/T175950) (owner: MarcoAurelio)
[13:19:54] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/378234 (https://phabricator.wikimedia.org/T175950) (owner: MarcoAurelio)
[13:19:54] =^o^=
[13:20:01] (CR) Volans: [C: 1] "thanks for the fixes! LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/378236 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[13:20:21] (CR) Ema: [C: 2] cache::text: enable nginx-lua-prometheus [puppet] - https://gerrit.wikimedia.org/r/378671 (owner: Ema)
[13:21:19] tabbycat: doing to the bewiktionary one now
[13:21:29] k
[13:21:30] (Merged) jenkins-bot: Meta(Talk)Namespace configuration for be.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/378234 (https://phabricator.wikimedia.org/T175950) (owner: MarcoAurelio)
[13:21:33] I'm ready to test
[13:21:40] (CR) jenkins-bot: Meta(Talk)Namespace configuration for be.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/378234 (https://phabricator.wikimedia.org/T175950) (owner: MarcoAurelio)
[13:22:03] when you run the script hashar, please paste the output on the Phab task so we can check if everything went okay if possible
[13:24:03] (PS1) Herron: Lists: Change zen.spamhaus.org DNSBL action from warn to drop [puppet] - https://gerrit.wikimedia.org/r/378697 (https://phabricator.wikimedia.org/T175878)
[13:24:05] hashar: hm I wasn't aware of those logs, they actually might be sufficient for what we're trying to do here
[13:24:57] tabbycat: syncing it
[13:25:07] hashar: can we test it first?
[13:25:11] on mwdebug?
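The grep suggested above (`updateBetaFeaturesUserCounts.*job_duration` against /srv/mw-log/runJobs.log) can be mirrored in a few lines of Python when the matches need further processing. This is only a minimal sketch: the sample lines are invented, since the real runJobs.log layout is not shown in this log, and `job_lines` is a hypothetical helper name.

```python
import re


def job_lines(lines, job_type):
    """Yield lines that mention job_type followed later by 'job_duration'."""
    pattern = re.compile(re.escape(job_type) + r".*job_duration")
    for line in lines:
        if pattern.search(line):
            yield line.rstrip("\n")


# Invented sample lines; the real log layout may differ.
sample = [
    "updateBetaFeaturesUserCounts Special: prefs=x job_duration=0.12\n",
    "refreshLinks Main_Page job_duration=1.50\n",
    "updateBetaFeaturesUserCounts Special: STARTING\n",
]
matches = list(job_lines(sample, "updateBetaFeaturesUserCounts"))
print(len(matches))
```

To run it against the real file, pass an open file handle for /srv/mw-log/runJobs.log instead of the sample list; the generator reads one line at a time, so a large log is not loaded into memory at once.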
[13:25:14] tabbycat: and the paste is at https://phabricator.wikimedia.org/T175950#3614693
[13:25:19] ah, well
[13:25:35] Pchelolo: which would be quite magic :]
[13:25:38] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Meta(Talk)Namespace configuration for be.wiktionary - T175950 (duration: 00m 46s)
[13:25:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:25:54] T175950: Fix the project namespace at be.wiktionary.org - https://phabricator.wikimedia.org/T175950
[13:25:56] Pchelolo: we might even have some stats in graphite/grafana on a per job basis
[13:26:37] Pchelolo: potentially: https://grafana.wikimedia.org/dashboard/db/job-queue-rate?orgId=1&var-Job=all&var-Job=updateBetaFeaturesUserCounts
[13:27:03] Pchelolo: errrr proper link is https://grafana.wikimedia.org/dashboard/db/job-queue-rate?orgId=1&var-Job=updateBetaFeaturesUserCounts
[13:27:06] (CR) Elukey: [C: -1] "Self -1 for extra blame for the sloppiness :)" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[13:27:25] hashar: such graph! https://grafana.wikimedia.org/dashboard/db/eventlogging-schema?orgId=1&from=now-15m&to=now
[13:27:50] phuedx: E_TOO_MANY_GRAPHS :]
[13:28:01] hashar: NEVER!
[13:28:02] (PS1) Ema: cache::text: enable lua support [puppet] - https://gerrit.wikimedia.org/r/378700
[13:28:20] tabbycat: so be.wiktionary should be good. With 3 pages that gotta be fixed up because they were duplicates
[13:28:21] are you guys turning off Popups?
[13:28:38] hashar: I'm fixing the redirects now
[13:28:39] elukey: we're turning off popups eventlogging
[13:28:39] thanks
[13:28:42] yes
[13:28:45] tabbycat: you are awesome
[13:28:45] nice!
[13:28:51] !log ppchelko@tin Started deploy [cpjobqueue/deploy@2c422f5]: Annotate the job event with pipeline property
[13:28:53] (CR) Filippo Giunchedi: [C: 1] cache::text: enable lua support [puppet] - https://gerrit.wikimedia.org/r/378700 (owner: Ema)
[13:28:58] i believe tbayer notified you on thursday/friday?
[13:28:58] (PS1) ArielGlenn: Move nfs and directory setup for dumpsdata hosts into dumps module [puppet] - https://gerrit.wikimedia.org/r/378701 (https://phabricator.wikimedia.org/T175606)
[13:29:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:04] hashar: tu est bien gentil
[13:29:12] (that's the last time i synced w/ him)
[13:29:13] Pchelolo: then I dont mind enabling logging since that is low traffic anyway. That might save you the trouble of having to grep in huge fils
[13:29:21] !log ppchelko@tin Finished deploy [cpjobqueue/deploy@2c422f5]: Annotate the job event with pipeline property (duration: 00m 29s)
[13:29:21] elukey: thanks for all of your support <3
[13:29:23] (CR) jerkins-bot: [V: -1] Move nfs and directory setup for dumpsdata hosts into dumps module [puppet] - https://gerrit.wikimedia.org/r/378701 (https://phabricator.wikimedia.org/T175606) (owner: ArielGlenn)
[13:29:27] (CR) Ema: [C: 2] cache::text: enable lua support [puppet] - https://gerrit.wikimedia.org/r/378700 (owner: Ema)
[13:29:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:09] phuedx: is Popups the ext that, when hovering a link was showing a preview of the article?
[13:30:19] hashar: yes
[13:30:23] :((
[13:30:27] no no no
[13:30:33] we're just turning off the instrumentation!
[13:30:36] (PS12) Elukey: role::kafka::jumbo::broker: enable Prometheus JMX monitoring [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922)
[13:31:07] we were just testing on enwiki and dewiki to collect data alongside fundraising tech
[13:31:20] it'll be out on enwiki and dewiki one day
[13:31:22] !
[13:31:45] meanwhile it is a beta features only right? :)
[13:31:50] (PS2) ArielGlenn: Move nfs and directory setup for dumpsdata hosts into dumps module [puppet] - https://gerrit.wikimedia.org/r/378701 (https://phabricator.wikimedia.org/T175606)
[13:31:53] hashar: when you got a minute if you can run the script again just to be sure I got everything?
[13:32:05] tabbycat: sure!
[13:32:43] tabbycat: yeah it is all good \o/
[13:32:56] 0 pages to fix, 0 were resolvable.
[13:32:56] 0 links to fix, 0 were resolvable.
[13:33:08] chachi :)
[13:33:24] okay so I'm done for this swat window
[13:34:37] hashar: it's beta features only on enwiki and dewiki
[13:34:44] it's 100% rolled out to all other wikis
[13:36:52] (PS1) Ppchelko: [Logging config] Enable logging for updateBetaFeaturesUserCounts [mediawiki-config] - https://gerrit.wikimedia.org/r/378703
[13:38:21] hashar: heh, apparently the logs in the file are not sufficient for us.. I've made a patch for the config and added you as a reviewer there
[13:39:21] (PS2) Ppchelko: [Logging config] Enable logging for updateBetaFeaturesUserCounts [mediawiki-config] - https://gerrit.wikimedia.org/r/378703
[13:45:51] (PS1) Ema: prometheus: add nginx_cache_text cluster definition [puppet] - https://gerrit.wikimedia.org/r/378705
[13:47:17] (CR) Ottomata: [C: 1] ":D" [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: Elukey)
[13:47:56] (CR) Filippo Giunchedi: [C: 1] prometheus: add nginx_cache_text cluster definition [puppet] - https://gerrit.wikimedia.org/r/378705 (owner: Ema)
[13:48:27] ottomata: o/ shall we merge https://gerrit.wikimedia.org/r/377753 ?
[13:50:39] (PS3) Matthias Mullie: Add 3d2png deploy repo to image scalers [puppet] - https://gerrit.wikimedia.org/r/378233 (https://phabricator.wikimedia.org/T160185)
[13:50:53] (CR) Ema: [C: 2] prometheus: add nginx_cache_text cluster definition [puppet] - https://gerrit.wikimedia.org/r/378705 (owner: Ema)
[13:51:05] (CR) Mobrovac: [C: 1] [Logging config] Enable logging for updateBetaFeaturesUserCounts [mediawiki-config] - https://gerrit.wikimedia.org/r/378703 (owner: Ppchelko)
[13:55:16] (PS1) Ayounsi: Add OpenGear support to Rancid [puppet] - https://gerrit.wikimedia.org/r/378708 (https://phabricator.wikimedia.org/T175876)
[13:58:23] (PS3) Ppchelko: [Logging config] Enable logging for updateBetaFeaturesUserCounts [mediawiki-config] - https://gerrit.wikimedia.org/r/378703
[14:00:22] (CR) Mobrovac: [C: 1] [Logging config] Enable logging for updateBetaFeaturesUserCounts [mediawiki-config] - https://gerrit.wikimedia.org/r/378703 (owner: Ppchelko)
[14:04:05] PROBLEM - Check Varnish expiry mailbox lag on cp1050 is CRITICAL: CRITICAL: expiry mailbox lag is 2120633
[14:09:20] (CR) Matthias Mullie: Add 3d2png deploy repo to image scalers (1 comment) [puppet] - https://gerrit.wikimedia.org/r/378233 (https://phabricator.wikimedia.org/T160185) (owner: Matthias Mullie)
[14:11:30] (CR) Muehlenhoff: Fix parsing of necessary restarts in query_restart (2 comments) [debs/debdeploy] - https://gerrit.wikimedia.org/r/376688 (owner: Muehlenhoff)
[14:13:28] (PS3) Elukey: eventlogging_cleaner.py: add feature to pick start_ts from file [puppet] - https://gerrit.wikimedia.org/r/378236 (https://phabricator.wikimedia.org/T156933)
[14:14:12] (CR) Elukey: [C: 2] eventlogging_cleaner.py: add feature to pick start_ts from file [puppet] - https://gerrit.wikimedia.org/r/378236 (https://phabricator.wikimedia.org/T156933) (owner: Elukey)
[14:15:14] jouncebot: now
[14:15:14] No deployments scheduled for the next 2 hour(s) and 44 minute(s)
[14:15:17] jouncebot: next
[14:15:17] In 2 hour(s) and 44 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T1700)
[14:16:38] (PS13) Elukey: role::kafka::jumbo::broker: enable Prometheus JMX monitoring [puppet] - https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922)
[14:17:46] godog: ok to merge --^ ?
[14:21:12] (PS4) Muehlenhoff: Fix parsing of necessary restarts in query_restart [debs/debdeploy] - https://gerrit.wikimedia.org/r/376688
[14:21:52] (CR) Rush: [C: 1] "I left some subjective feedback, this should work I think as-is" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/377787 (https://phabricator.wikimedia.org/T175712) (owner: Volans)
[14:23:52] (CR) Filippo Giunchedi: [C: 1] "LGTM!"
[puppet] - 10https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [14:23:55] elukey: yup [14:25:27] (03PS2) 10Giuseppe Lavagetto: Add fluent-bit image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/378260 (https://phabricator.wikimedia.org/T175527) [14:25:43] (03CR) 10Elukey: [C: 032] role::kafka::jumbo::broker: enable Prometheus JMX monitoring [puppet] - 10https://gerrit.wikimedia.org/r/377753 (https://phabricator.wikimedia.org/T175922) (owner: 10Elukey) [14:27:38] !log reedy@tin Synchronized php-1.30.0-wmf.18/includes/filerepo/file/LocalFile.php: T175444 (duration: 00m 47s) [14:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:53] T175444: File History: Comments are not displayed all versions - https://phabricator.wikimedia.org/T175444 [14:30:42] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Improvements to the build script [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/378259 (owner: 10Giuseppe Lavagetto) [14:30:56] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add fluent-bit image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/378260 (https://phabricator.wikimedia.org/T175527) (owner: 10Giuseppe Lavagetto) [14:34:24] (03PS1) 10Giuseppe Lavagetto: Fix container references [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/378714 [14:34:55] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to scb* and pdfrender-admin for tgr - https://phabricator.wikimedia.org/T175882#3614854 (10RobH) [14:50:01] (03PS1) 10Filippo Giunchedi: prometheus: add analytics instance [puppet] - 10https://gerrit.wikimedia.org/r/378716 (https://phabricator.wikimedia.org/T175922) [14:50:06] (03PS1) 10Herron: MX: Add zen.spamhaus.org DNSBL check to MTA rcpt acl [puppet] - 10https://gerrit.wikimedia.org/r/378717 (https://phabricator.wikimedia.org/T175879) [14:50:08] (03PS1) 10Herron: MX: Change 
zen.spamhaus.org DNSBL action from warn to drop [puppet] - 10https://gerrit.wikimedia.org/r/378718 (https://phabricator.wikimedia.org/T175879) [14:51:03] (03CR) 10Herron: "No to be merged before 378717" [puppet] - 10https://gerrit.wikimedia.org/r/378718 (https://phabricator.wikimedia.org/T175879) (owner: 10Herron) [14:53:54] (03CR) 10Herron: [C: 032] MX: Add zen.spamhaus.org DNSBL check to MTA rcpt acl [puppet] - 10https://gerrit.wikimedia.org/r/378717 (https://phabricator.wikimedia.org/T175879) (owner: 10Herron) [14:54:01] (03PS2) 10Herron: MX: Add zen.spamhaus.org DNSBL check to MTA rcpt acl [puppet] - 10https://gerrit.wikimedia.org/r/378717 (https://phabricator.wikimedia.org/T175879) [15:00:15] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2104790 [15:01:55] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 2210554 [15:04:34] (03PS1) 10Rush: openstack: prometheus node_exporter restrict listen IP [puppet] - 10https://gerrit.wikimedia.org/r/378725 (https://phabricator.wikimedia.org/T169039) [15:17:39] (03Abandoned) 10Elukey: [WIP] Yandex ClickHouse puppetization [puppet] - 10https://gerrit.wikimedia.org/r/325797 (https://phabricator.wikimedia.org/T150343) (owner: 10Elukey) [15:19:48] 10Operations, 10monitoring: Monitor internal CA expirations - https://phabricator.wikimedia.org/T171157#3614975 (10faidon) a:05Dzahn>03akosiaris [15:20:25] (03Abandoned) 10Hashar: salt: fix grain-ensure comparison [puppet] - 10https://gerrit.wikimedia.org/r/348928 (https://phabricator.wikimedia.org/T146914) (owner: 10Hashar) [15:20:26] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625#3614977 (10Cmjohnson) @robh there is not a reset button just an erase button. which I did try....there is zero power to the switch and did not work. 
[15:23:16] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to scb* and pdfrender-admin for tgr - https://phabricator.wikimedia.org/T175882#3614981 (10GWicke) I strongly support @tgr's access request as well. [15:26:02] !log cp1050 - backend restart for mailbox lag [15:26:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:39] (03PS1) 10BBlack: Revert "VCL: fixed keep values: 7d def, 1d for text" [puppet] - 10https://gerrit.wikimedia.org/r/378731 [15:27:49] (03PS2) 10BBlack: Revert "VCL: fixed keep values: 7d def, 1d for text" [puppet] - 10https://gerrit.wikimedia.org/r/378731 [15:28:39] !log cp1074 - backend restart for mailbox lag [15:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:29:51] (03CR) 10Giuseppe Lavagetto: [C: 031] "did not recheck the division, but otherwise lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/377774 (https://phabricator.wikimedia.org/T165519) (owner: 10Elukey) [15:30:15] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [15:31:12] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Extract metrics from logs - https://phabricator.wikimedia.org/T147923#3615009 (10fgiunchedi) Update: with latest upstream git of `mtail` things seem stable so far.
[15:31:16] (03PS1) 10Ema: VCL: remove wikiScrape rate limiting [puppet] - 10https://gerrit.wikimedia.org/r/378732 [15:32:28] (03CR) 10Ema: [C: 031] Revert "VCL: fixed keep values: 7d def, 1d for text" [puppet] - 10https://gerrit.wikimedia.org/r/378731 (owner: 10BBlack) [15:33:23] (03PS2) 10BryanDavis: Install composer for PHP imaages [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) [15:33:31] 10Operations, 10Traffic, 10monitoring, 10Patch-For-Review: prometheus -> grafana stats for per-numa-node meminfo - https://phabricator.wikimedia.org/T175636#3615011 (10fgiunchedi) @bblack your patch to add `meminfo_numa` seems to be working! Anything left to do ? [15:34:05] RECOVERY - Check Varnish expiry mailbox lag on cp1050 is OK: OK: expiry mailbox lag is 0 [15:35:09] (03CR) 10BryanDavis: Install composer for PHP imaages (031 comment) [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) (owner: 10BryanDavis) [15:38:04] 10Operations, 10Traffic, 10monitoring, 10Patch-For-Review: prometheus -> grafana stats for per-numa-node meminfo - https://phabricator.wikimedia.org/T175636#3615020 (10BBlack) yeah, put it somewhere useful in grafana :) [15:38:24] !log cp1072 - varnish backend restart, mailbox lag [15:38:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:41:55] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 0 [15:44:03] 10Operations, 10DBA: Lost access to x1-analytics-slave - https://phabricator.wikimedia.org/T175970#3615045 (10Etonkovidova) @jcrespo Thank you very much! 
[15:48:23] (03CR) 10Thcipriani: [C: 031] Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/377304 (owner: 10Chad) [15:48:59] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to scb* and pdfrender-admin for tgr - https://phabricator.wikimedia.org/T175882#3615054 (10RobH) Noted that you've signed L3 and all that is needed id pfdrender-admin. I've prepared the patchset (linked in task description) and it is... [15:51:47] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se, 10Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3615093 (10Dzahn) a:03Dzahn [15:53:46] (03Abandoned) 10ArielGlenn: adapt trebuchet-trigger for timeout to restart function [software/deployment/trebuchet-trigger] - 10https://gerrit.wikimedia.org/r/269465 (https://phabricator.wikimedia.org/T63882) (owner: 10ArielGlenn) [15:54:20] (03CR) 10Thcipriani: "Few comments inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378696 (owner: 10Muehlenhoff) [15:54:34] <_joe_> win 94 [15:55:00] (03Abandoned) 10ArielGlenn: don't deployment_server_init on every puppet run [puppet] - 10https://gerrit.wikimedia.org/r/219372 (owner: 10ArielGlenn) [15:55:04] you should close some , joe [15:55:16] 100 is the psychological limit, heh [15:56:11] (03CR) 10Dzahn: [C: 031] "yes, i'll take care of that. 
separately, we should start using systemctl instead of "service"" [puppet] - 10https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [16:05:50] !log cp1063 - backend restart, mailbox lag [16:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:07:29] 10Operations, 10Traffic: cp1066 unexplained 503 spikes - https://phabricator.wikimedia.org/T175319#3615165 (10BBlack) [16:07:31] 10Operations, 10Traffic, 10Patch-For-Review: Text eqiad varnish 503 spikes - https://phabricator.wikimedia.org/T175803#3615163 (10BBlack) [16:10:56] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [16:12:08] 10Operations, 10OTRS: mendelevium (otrs) running out of inodes - https://phabricator.wikimedia.org/T171490#3615173 (10akosiaris) 05Open>03Resolved a:03akosiaris [16:12:18] 10Operations, 10MediaWiki-Maintenance-scripts: wikitech-static sync failing - https://phabricator.wikimedia.org/T176090#3615175 (10Reedy) p:05Triage>03Normal So this should be fixed now... Both with the `strval()` change, but also the underlying issue being fixed in production The `strval()` change could... [16:14:22] ^ above seems to be a small 500-spike, probably not the recent cache_text 503 stuff [16:14:34] akosiaris, arlolra and i were wondering if we should hold parsoid deployments while you are investigating the pooling issue with the new eqiad servers. [16:17:35] subbu: no, don't. I don't want you blocked on me and it looks like the issue is a race between having the hosts defined in conftool-data and code being deployed, which means it's possibly easily avoidable. I 'll probably take a node of rotation and try to reproduce, but it won't somehow cause you issues [16:17:54] s/won't/shouldn't/ [16:17:55] :-D [16:18:10] ok. 
[16:19:02] (03CR) 10Elukey: "Looks good except the few comments (triggering also the jenkins -1)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: 10Nuria) [16:19:05] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:20:00] (03PS3) 10Krinkle: Install composer for PHP images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) (owner: 10BryanDavis) [16:20:46] (03CR) 10Krinkle: [C: 031] Allow contint-admins to interact with zuul service [puppet] - 10https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [16:22:32] 10Operations: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153#3615202 (10Reedy) a:05ops-monitoring-bot>03None [16:22:47] 10Operations: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153#3615189 (10Reedy) Isn't this a request for #office-it ? 
[16:23:13] 10Operations: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153#3615209 (10egalvezwmf) p:05Triage>03Normal [16:24:22] 10Operations, 10Discovery, 10Discovery-Search, 10Elasticsearch, 10Epic: EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) - https://phabricator.wikimedia.org/T109089#3615226 (10demon) [16:30:06] (03PS2) 10Muehlenhoff: Remove salt grains used for trebuchet [puppet] - 10https://gerrit.wikimedia.org/r/378696 [16:30:55] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 28 probes of 287 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [16:31:35] 10Operations, 10Gerrit, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Investigate seemingly random Gerrit slow-downs - https://phabricator.wikimedia.org/T148478#3615250 (10demon) p:05High>03Low [16:34:59] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615256 (10RobH) We should update the firmware versions (as they are likely not the latest.) If we open a case with Dell, it will be the first thing they recommend. Updating the bios firmware requires reboot when... 
[16:35:38] (03PS1) 10BBlack: depool codfw front edge traffic [dns] - 10https://gerrit.wikimedia.org/r/378738 [16:35:55] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 287 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [16:35:55] (03CR) 10BBlack: [C: 032] depool codfw front edge traffic [dns] - 10https://gerrit.wikimedia.org/r/378738 (owner: 10BBlack) [16:40:04] !log reedy@tin Synchronized php-1.30.0-wmf.18/maintenance/populateIpChanges.php: perf improvements T175962 (duration: 00m 46s) [16:40:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:40:20] T175962: Issue with maintenance script: SELECTing revisions with high rev_id is painfully slow - https://phabricator.wikimedia.org/T175962 [16:40:27] (03CR) 10Muehlenhoff: Remove salt grains used for trebuchet (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378696 (owner: 10Muehlenhoff) [16:43:13] (03PS1) 10BryanDavis: wmcs: Add wikireplica_dns management script [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) [16:43:38] (03CR) 10jerkins-bot: [V: 04-1] wmcs: Add wikireplica_dns management script [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) (owner: 10BryanDavis) [16:46:32] !log shutting down db1100 T175973 [16:46:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:47] T175973: db1100 crashed - https://phabricator.wikimedia.org/T175973 [16:48:56] (03PS2) 10BryanDavis: wmcs: Add wikireplica_dns management script [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) [16:49:39] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615312 (10jcrespo) db1100 is depooled, I have downtime'ed it for a week so the BIOS update can happen at any time.
[16:53:05] (03PS3) 10BryanDavis: wmcs: Add wikireplica_dns management script [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) [16:54:30] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3615327 (10RobH) a:05Cmjohnson>03RobH Ok, since the firmware updates, the host lvs1007 won't pxe boot. I'll investigate today/tomorrow and try to make it so this host will pxe b... [16:55:31] (03PS2) 10RobH: add tgr to pdfrender-admin sudo group [puppet] - 10https://gerrit.wikimedia.org/r/378060 (https://phabricator.wikimedia.org/T175882) [16:55:47] (03CR) 10RobH: [C: 032] add tgr to pdfrender-admin sudo group [puppet] - 10https://gerrit.wikimedia.org/r/378060 (https://phabricator.wikimedia.org/T175882) (owner: 10RobH) [16:56:56] 10Operations, 10Ops-Access-Requests: Requesting access to scb* and pdfrender-admin for tgr - https://phabricator.wikimedia.org/T175882#3615351 (10RobH) [17:00:04] gehel: It is that lovely time of the day again! You are hereby commanded to deploy Wikidata Query Service weekly deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T1700). [17:00:04] No patches in the queue for this window. Wheeee! [17:00:09] 10Operations, 10Release Pipeline, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Provision Docker >= 17.05 on contint1001 - https://phabricator.wikimedia.org/T175293#3615375 (10akosiaris) >>! In T175293#3611511, @thcipriani wrote: >>>! In T175293#3599497, @hashar wrote: >> Potentially the require... 
[17:00:54] (03CR) 10BryanDavis: "I would like Jamie and/or Manuel to confirm if managing these service names at the shard level is enough flexibility for their expected ne" [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) (owner: 10BryanDavis) [17:01:08] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615381 (10jcrespo) And put it down CC @Cmjohnson. [17:09:34] (03PS1) 10Muehlenhoff: Remove access for nschaaf [puppet] - 10https://gerrit.wikimedia.org/r/378742 [17:10:02] (03CR) 10Catrope: Enable structured change filters by default on cawiki, frwiki and hewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378264 (https://phabricator.wikimedia.org/T157642) (owner: 10Catrope) [17:10:04] (03CR) 10Catrope: Enable $wgStructuredChangeFiltersOnWatchlist on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378263 (https://phabricator.wikimedia.org/T164234) (owner: 10Catrope) [17:10:23] (03PS2) 10Dzahn: Allow contint-admins to interact with zuul service [puppet] - 10https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [17:11:12] (03CR) 10Dzahn: [C: 032] Allow contint-admins to interact with zuul service [puppet] - 10https://gerrit.wikimedia.org/r/378664 (https://phabricator.wikimedia.org/T167845) (owner: 10Hashar) [17:12:38] (03PS2) 10Muehlenhoff: Remove access for nschaaf [puppet] - 10https://gerrit.wikimedia.org/r/378742 [17:13:05] (03CR) 10Chad: "Is this still needed?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320980 (https://phabricator.wikimedia.org/T115713) (owner: 10DCausse) [17:13:47] (03Abandoned) 10DCausse: [WIP] test job jenkins with mw-core [mediawiki-config] - 10https://gerrit.wikimedia.org/r/320980 (https://phabricator.wikimedia.org/T115713) (owner: 10DCausse) [17:13:49] (03CR) 10Volans: "I might miss something obvious here but I think there is an error, see inline." 
(032 comments) [debs/debdeploy] - 10https://gerrit.wikimedia.org/r/376688 (owner: 10Muehlenhoff) [17:13:51] (03CR) 10Muehlenhoff: [C: 032] Remove access for nschaaf [puppet] - 10https://gerrit.wikimedia.org/r/378742 (owner: 10Muehlenhoff) [17:14:20] (03Abandoned) 10DCausse: [DNM] new discovery service for CirrusSearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/345115 (owner: 10DCausse) [17:14:49] 10Operations: Create affcom-staff email account - https://phabricator.wikimedia.org/T176153#3615469 (10Framawiki) [17:15:32] (03CR) 10Dzahn: "@Paladox i see your comments, but see inline as well.. am i not reducing the hardcoding and making it more flexible? Is it just that the " (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [17:15:54] (03PS3) 10Chad: Remove outdated comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336704 (owner: 10Platonides) [17:16:15] (03CR) 10Chad: [C: 032] Remove outdated comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336704 (owner: 10Platonides) [17:16:44] (03CR) 10Paladox: [C: 04-1] ">" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [17:18:04] (03CR) 10Jcrespo: [C: 031] "It makes little sense to do it finer; unless something changes that I could not even think of now a day." [puppet] - 10https://gerrit.wikimedia.org/r/378739 (https://phabricator.wikimedia.org/T174860) (owner: 10BryanDavis) [17:19:03] (03CR) 10Paladox: [C: 04-1] gerrit: fix host for TLS cert/monitoring if on slave (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [17:19:14] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3615489 (10RobH) a:05RobH>03Cmjohnson This is an HP gen8, so I cannot actually load the bios remotely and check the PXE settings for the cards. This issue sounds like the NIC c... 
[17:20:35] (03CR) 10Chad: gerrit: fix host for TLS cert/monitoring if on slave (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [17:23:41] (03Merged) 10jenkins-bot: Remove outdated comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336704 (owner: 10Platonides) [17:25:06] (03CR) 10Paladox: [C: 04-1] gerrit: fix host for TLS cert/monitoring if on slave (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [17:26:11] !log demon@tin Synchronized wmf-config/ProductionServices.php: no-op/comment fix (duration: 00m 45s) [17:26:19] (03CR) 10jenkins-bot: Remove outdated comment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336704 (owner: 10Platonides) [17:26:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:27:11] 10Operations, 10Release-Engineering-Team, 10Epic, 10Services (watching): FY2017/18 Program 6 - Outcome 2 - Objective 2: Set up a continuous integration and deployment pipeline - https://phabricator.wikimedia.org/T170481#3615520 (10thcipriani) [17:28:01] (03PS7) 10Jcrespo: Add new m1 host db2078, enable firewall on all misc services [puppet] - 10https://gerrit.wikimedia.org/r/377460 (https://phabricator.wikimedia.org/T175685) [17:41:58] (03CR) 10BryanDavis: [C: 032] Install composer for PHP images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) (owner: 10BryanDavis) [17:42:20] !log smalyshev@tin Started deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS [17:42:32] (03Merged) 10jenkins-bot: Install composer for PHP images [docker-images/toollabs-images] - 10https://gerrit.wikimedia.org/r/369838 (https://phabricator.wikimedia.org/T172358) (owner: 10BryanDavis) [17:42:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:42:42] !log smalyshev@tin Finished deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category 
scripts for WDQS (duration: 00m 21s) [17:42:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:56:16] (03PS11) 10Volans: WMCS: install Cumin for WMCS admins [puppet] - 10https://gerrit.wikimedia.org/r/377787 (https://phabricator.wikimedia.org/T175712) [17:56:49] (03PS1) 10Thcipriani: Beta: Scap: canary_dashboard_url to beta logstash [puppet] - 10https://gerrit.wikimedia.org/r/378750 (https://phabricator.wikimedia.org/T168211) [17:57:19] (03CR) 10Volans: "Thanks for the review, replies inline. I've dropped a parameter I don't think I'll need and added another one in the configuration. Can be" (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/377787 (https://phabricator.wikimedia.org/T175712) (owner: 10Volans) [17:57:57] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615645 (10RobH) a:03Cmjohnson Actually, turns out this is already running the newest bios firmware: http://www.dell.com/support/home/us/en/19/product-support/servicetag/jcb8hh2/drivers BIOS Version: 2.4.3 is the... [18:00:04] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T1800). Please do the needful. [18:00:05] MaxSem: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break the wikis, you will be rewarded with a sticker. 
[18:00:35] guess I'll be the trigger puller [18:01:28] (03PS2) 10MaxSem: Leave a comment that ACW must be loaded before VE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376791 [18:01:42] (03CR) 10MaxSem: [C: 032] Leave a comment that ACW must be loaded before VE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376791 (owner: 10MaxSem) [18:02:18] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615656 (10RobH) some googling http://www.dell.com/support/manuals/uk/en/ukdhs1/dell-opnmang-sw-v8.2/eemi_13g_v1.3-v2/pwr-event-messages?guid=guid-5bc564bd-a527-4c29-828b-ff4720644565&lang=en-us Message The I... [18:04:28] 10Operations, 10ops-eqiad, 10DBA: db1100 crashed - https://phabricator.wikimedia.org/T175973#3615664 (10RobH) @jcrespo: Was the error output in the task description from the OS? We don't see any errors in the idrac/ilom event log. [18:07:52] (03Merged) 10jenkins-bot: Leave a comment that ACW must be loaded before VE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376791 (owner: 10MaxSem) [18:08:03] (03CR) 10jenkins-bot: Leave a comment that ACW must be loaded before VE [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376791 (owner: 10MaxSem) [18:08:18] (03PS1) 10RobH: adding wmf staff david chan to admin module ldap section [puppet] - 10https://gerrit.wikimedia.org/r/378754 (https://phabricator.wikimedia.org/T176142) [18:09:08] (03CR) 10RobH: [C: 032] adding wmf staff david chan to admin module ldap section [puppet] - 10https://gerrit.wikimedia.org/r/378754 (https://phabricator.wikimedia.org/T176142) (owner: 10RobH) [18:10:10] (03PS2) 10MaxSem: Start migration to Unicode sections everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378357 (https://phabricator.wikimedia.org/T152540) [18:10:19] (03CR) 10MaxSem: [C: 032] Start migration to Unicode sections everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378357 (https://phabricator.wikimedia.org/T152540) (owner: 
10MaxSem) [18:13:54] (03CR) 10Paladox: [C: 031] "I guess this is now ready now that the scap change is merged?" [puppet] - 10https://gerrit.wikimedia.org/r/374667 (https://phabricator.wikimedia.org/T157414) (owner: 10Chad) [18:15:47] (03Merged) 10jenkins-bot: Start migration to Unicode sections everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378357 (https://phabricator.wikimedia.org/T152540) (owner: 10MaxSem) [18:16:12] (03CR) 10jenkins-bot: Start migration to Unicode sections everywhere [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378357 (https://phabricator.wikimedia.org/T152540) (owner: 10MaxSem) [18:17:21] (03PS1) 10Gehel: scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 [18:18:01] (03CR) 10jerkins-bot: [V: 04-1] scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:18:52] it works! [18:19:02] (03PS2) 10Gehel: scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 [18:19:42] (03CR) 10Thcipriani: [C: 04-1] scap: manually manage the deploy_user home directory (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:20:06] (03CR) 10Dzahn: [C: 031] "yea, puppet does not create the dir if a user already exists and then the location is changed. 
it would only work on brandnew installs" [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:20:40] (03PS3) 10Gehel: scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 [18:20:42] !log maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/378357/ https://gerrit.wikimedia.org/r/#/c/376791/ (duration: 00m 48s) [18:20:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:21:02] okay, swat done for now [18:21:04] (03CR) 10Dzahn: [C: 031] scap: manually manage the deploy_user home directory (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:21:26] maybe I'll concoct something else before the window closes [18:22:32] (03CR) 10Thcipriani: [C: 031] scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:23:24] (03PS4) 10Nuria: Add cron to purge old mediawiki data snapshots [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) [18:23:59] (03CR) 10Nuria: Add cron to purge old mediawiki data snapshots (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/376640 (https://phabricator.wikimedia.org/T162034) (owner: 10Nuria) [18:25:39] (03CR) 10Dzahn: "follow-up fix needed https://gerrit.wikimedia.org/r/#/c/378757/" [puppet] - 10https://gerrit.wikimedia.org/r/365891 (https://phabricator.wikimedia.org/T166013) (owner: 1020after4) [18:28:34] (03CR) 1020after4: [C: 031] scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:30:33] (03CR) 10Gehel: [C: 032] scap: manually manage the deploy_user home directory [puppet] - 10https://gerrit.wikimedia.org/r/378757 (owner: 10Gehel) [18:34:25] 10Operations, 10ops-eqiad, 10Traffic, 10netops: Upgrade BIOS/RBSU/etc on lvs1007 - https://phabricator.wikimedia.org/T167299#3615765 (10BBlack) I did the NIC card bios
check last week when I first found the PXE booting problem. It is enabled there. My guess is either something else in BIOS settings got c... [18:34:29] !log smalyshev@tin Started deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS [18:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:36:54] !log smalyshev@tin Finished deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS (duration: 02m 26s) [18:37:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:37:28] !log smalyshev@tin (no justification provided) [18:37:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:44:22] !log smalyshev@tin Started deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS, take 3 [18:44:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:35] PROBLEM - Blazegraph Port on wdqs1003 is CRITICAL: connect to address 127.0.0.1 and port 9999: Connection refused [18:46:36] PROBLEM - Check systemd state on wdqs1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:46:48] !log smalyshev@tin Finished deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS, take 3 (duration: 02m 25s) [18:47:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:47:35] RECOVERY - Blazegraph Port on wdqs1003 is OK: TCP OK - 0.000 second response time on 127.0.0.1 port 9999 [18:47:36] RECOVERY - Check systemd state on wdqs1003 is OK: OK - running: The system is fully operational [18:48:26] :) [18:49:45] (03CR) 10Chad: "Yes." 
[puppet] - 10https://gerrit.wikimedia.org/r/374667 (https://phabricator.wikimedia.org/T157414) (owner: 10Chad) [18:51:35] (03PS4) 10Dzahn: Gerrit: Start using plugins from scap-deployed version [puppet] - 10https://gerrit.wikimedia.org/r/374667 (https://phabricator.wikimedia.org/T157414) (owner: 10Chad) [18:52:15] PROBLEM - Host analytics1062 is DOWN: PING CRITICAL - Packet loss = 100% [18:52:59] (03CR) 10Dzahn: [C: 032] Gerrit: Start using plugins from scap-deployed version [puppet] - 10https://gerrit.wikimedia.org/r/374667 (https://phabricator.wikimedia.org/T157414) (owner: 10Chad) [18:53:17] :) [18:55:37] paladox: fail [18:55:43] oh [18:55:47] what does it fail on? [18:56:07] ah.. "Error: Could not remove existing file [18:56:10] ah [18:56:10] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/var/lib/gerrit2/review_site/plugins] is already declared in file /etc/puppet/modules/gerrit/manifests/jetty.pp:252; cannot redeclare at /etc/puppet/modules/gerrit/manifests/jetty.pp:275 on node gerrit-test3.git.eqiad.wmflabs [18:56:10] Warning: Not using cache on failed catalog [18:56:10] Error: Could not retrieve catalog; skipping run [18:56:25] well that's different , why :) [18:56:33] we could do [18:56:34] rm -rf /var/lib/gerrit2/review_site/plugins [18:56:37] and then re run puppet [18:56:50] though will leave that up to chad [18:56:59] !log T160570: Upgrading restbase-dev1004.eqiad.wmnet to Cassandra 3.11.0-wmf5 (canary) [18:57:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:57:14] T160570: Cassandra 3.x Tracking - https://phabricator.wikimedia.org/T160570 [18:57:15] PROBLEM - puppet last run on cobalt is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/var/lib/gerrit2/review_site/plugins] [18:57:16] Er, yeah [18:57:18] Good point [18:57:27] ah [18:57:42] (03PS2) 10Rush: openstack: prometheus node_exporter restrict listen IP [puppet] - 10https://gerrit.wikimedia.org/r/378725 (https://phabricator.wikimedia.org/T169039) [18:58:11] paladox: we dont see the same error though.. [18:58:25] i mean it's not a big one, just remove the dir, but the part that it is ... not the same [18:58:26] * paladox just realised its due to some custom stuff [18:58:29] merge conflicts [18:58:36] ok [18:58:47] (03PS3) 10Rush: openstack: prometheus node_exporter restrict listen IP [puppet] - 10https://gerrit.wikimedia.org/r/378725 (https://phabricator.wikimedia.org/T169039) [18:58:56] no_justification: should i delete it on cobalt? you got it? [18:59:13] I got it [18:59:18] cool [18:59:32] works [18:59:36] PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/var/lib/gerrit2/review_site/plugins] [18:59:38] Fixing on gerrit2001 first [18:59:54] Notice: /Stage[main]/Gerrit::Jetty/File[/var/lib/gerrit2/review_site/plugins]/ensure: created [18:59:56] :D [18:59:57] (03CR) 10Rush: [C: 032] openstack: prometheus node_exporter restrict listen IP [puppet] - 10https://gerrit.wikimedia.org/r/378725 (https://phabricator.wikimedia.org/T169039) (owner: 10Rush) [19:00:24] !log smalyshev@tin Started deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS, take 4 [19:00:36] RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:00:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:10] ah [19:01:12] lrwxrwxrwx 1 root root 37 Sep 18 18:59 plugins -> /srv/deployment/gerrit/gerrit/plugins [19:01:22] no_justification we need to add a user to puppet [19:01:29] so that it is chown as gerrit2 [19:01:39] It doesn't need to be owned by gerrit [19:01:40] Root is fine [19:02:07] oh [19:02:14] !log gerrit: restarting service, back in a moment [19:02:25] RECOVERY - puppet last run on cobalt is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [19:02:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:44] paladox: It's actually a security issue if gerrit could own its own program--a vulnerability could allow gerrit to change its own code. 
[19:02:53] So let it be owned by root, all gerrit can do is *read* the files :) [19:02:58] oh, ah i see [19:03:00] !log smalyshev@tin Finished deploy [wdqs/wdqs@ecdbd0d]: Deploying Blazegraph fixes and category scripts for WDQS, take 4 (duration: 02m 36s) [19:03:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:35] Ok, gerrit & puppet happy on 2001 & colabt [19:03:37] *cobalt [19:03:49] Both running the 2.13.9 plugins I checked in to scap-deployed repo :) [19:03:51] YAY! [19:04:24] :)) [19:04:43] :) [19:05:20] scap's a success :) [19:05:40] !log smalyshev@tin Started deploy [wdqs/wdqs@2523836]: Deploy correct prefixes conf [19:05:45] PROBLEM - puppet last run on kafka2002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_mediawiki/event-schemas] [19:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:31] paladox: yea, s.c.a.p. "succesfully copying advanced plugins" [19:06:38] :) [19:07:00] !log smalyshev@tin Finished deploy [wdqs/wdqs@2523836]: Deploy correct prefixes conf (duration: 01m 20s) [19:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:10] !log T160570: Upgrading restbase-dev100[5-6].eqiad.wmnet to Cassandra 3.11.0-wmf5 [19:10:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:25] T160570: Cassandra 3.x Tracking - https://phabricator.wikimedia.org/T160570 [19:14:53] (03PS1) 10Rush: openstack: prometheus node_exporter labtest [puppet] - 10https://gerrit.wikimedia.org/r/378769 (https://phabricator.wikimedia.org/T169039) [19:15:17] (03CR) 10jerkins-bot: [V: 04-1] openstack: prometheus node_exporter labtest [puppet] - 10https://gerrit.wikimedia.org/r/378769 (https://phabricator.wikimedia.org/T169039) (owner: 10Rush) [19:16:40] (03Draft1) 10Paladox: Gerrit: Use base::service for systemd and also add script [puppet] - 
10https://gerrit.wikimedia.org/r/378768 [19:16:43] (03PS2) 10Paladox: Gerrit: Use base::service for systemd and also add script [puppet] - 10https://gerrit.wikimedia.org/r/378768 [19:17:09] (03CR) 10jerkins-bot: [V: 04-1] Gerrit: Use base::service for systemd and also add script [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [19:18:01] (03CR) 10Chad: "This can land now." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/374668 (owner: 10Chad) [19:19:05] (03PS2) 10Rush: openstack: prometheus node_exporter labtest [puppet] - 10https://gerrit.wikimedia.org/r/378769 (https://phabricator.wikimedia.org/T169039) [19:19:49] (03PS3) 10Rush: openstack: prometheus node_exporter labtest [puppet] - 10https://gerrit.wikimedia.org/r/378769 (https://phabricator.wikimedia.org/T169039) [19:20:48] (03CR) 10Rush: [C: 032] openstack: prometheus node_exporter labtest [puppet] - 10https://gerrit.wikimedia.org/r/378769 (https://phabricator.wikimedia.org/T169039) (owner: 10Rush) [19:29:53] !log restarting pfw3-codfw for software upgrade [19:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:30:47] (03PS1) 10Madhuvishy: notebook: Add packages to enable PDF exports [puppet] - 10https://gerrit.wikimedia.org/r/378776 (https://phabricator.wikimedia.org/T159617) [19:33:25] RECOVERY - puppet last run on kafka2002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [19:35:15] PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 103, down: 1, dormant: 0, excluded: 3, unused: 0 [19:36:15] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [19:36:17] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0 [19:37:43] (03CR) 10Paladox: [C: 031] gerrit (2.13.8+git1-wmf.7) 
jessie-wikimedia; urgency=medium [debs/gerrit] - 10https://gerrit.wikimedia.org/r/374668 (owner: 10Chad) [19:38:42] (03CR) 10Dzahn: "see: < _joe_> mutante: ok, fyi, if you have only systemd, systemd::service is a better option" [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [19:40:07] (03PS3) 10BBlack: Revert "VCL: fixed keep values: 7d def, 1d for text" [puppet] - 10https://gerrit.wikimedia.org/r/378731 [19:40:13] (03CR) 10BBlack: [V: 032 C: 032] Revert "VCL: fixed keep values: 7d def, 1d for text" [puppet] - 10https://gerrit.wikimedia.org/r/378731 (owner: 10BBlack) [19:41:55] (03PS9) 10BBlack: VCL: stabilize backend storage patterns [puppet] - 10https://gerrit.wikimedia.org/r/376751 (https://phabricator.wikimedia.org/T145661) [19:43:02] (03CR) 10Dzahn: [C: 032] gerrit (2.13.8+git1-wmf.7) jessie-wikimedia; urgency=medium [debs/gerrit] - 10https://gerrit.wikimedia.org/r/374668 (owner: 10Chad) [19:44:26] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [19:44:26] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0 [19:44:35] (03PS2) 10Madhuvishy: notebook: Add packages to enable PDF exports [puppet] - 10https://gerrit.wikimedia.org/r/378776 (https://phabricator.wikimedia.org/T159617) [19:46:20] (03PS3) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [19:46:46] (03CR) 10jerkins-bot: [V: 04-1] Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [19:46:48] (03PS4) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [19:47:07] (03CR) 10jerkins-bot: [V: 04-1] Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [19:47:39] (03PS3) 10Madhuvishy: notebook: Add 
packages to enable PDF exports [puppet] - 10https://gerrit.wikimedia.org/r/378776 (https://phabricator.wikimedia.org/T159617) [19:48:10] (03CR) 10Madhuvishy: [C: 032] notebook: Add packages to enable PDF exports [puppet] - 10https://gerrit.wikimedia.org/r/378776 (https://phabricator.wikimedia.org/T159617) (owner: 10Madhuvishy) [19:48:45] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 104, down: 0, dormant: 0, excluded: 3, unused: 0 [19:49:07] (03PS1) 10Chad: Follow-up Ib5a4306c, should remove plugins from install rules [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378779 [19:49:14] mutante: Forgot that part ^ [19:51:21] (03CR) 10Dzahn: [C: 032] Follow-up Ib5a4306c, should remove plugins from install rules [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378779 (owner: 10Chad) [19:51:35] PROBLEM - Router interfaces on pfw3-codfw is CRITICAL: CRITICAL: host 208.80.153.197, interfaces up: 68, down: 1, dormant: 0, excluded: 0, unused: 0 [19:51:41] (03PS5) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [19:51:50] mutante: Can you build + upload that too? That way we'll be in sync :) [19:52:11] eh..m.. yea :) [19:52:21] 10Operations, 10ops-eqiad, 10DC-Ops: scs-c1-eqiad unresponsive - https://phabricator.wikimedia.org/T175625#3616042 (10RobH) I've sent a followup to opengear requesting a replacement scs console. [19:52:43] no_justification could you review https://gerrit.wikimedia.org/r/#/c/378768/ please? :) [19:53:08] I mean I can look at it :p [19:53:11] But I dunno systemd [19:53:14] It's a black box to me [19:55:07] paladox: one by one, we'll get to that .. 
first we need to build the package one last time or something :) [19:55:55] and yea, that you are using systemd::service should be right, j.oe said it's a better option if we only have systemd and not a mix [19:55:57] It'll be two last times, probably :p [19:57:07] (03CR) 10Chad: Gerrit: Use systemd::service for systemd (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [20:00:05] gwicke, cscott, arlolra, subbu, bearND, halfak, and Amir1: How many deployers does it take to do Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T2000). [20:00:05] No patches in the queue for this window. Wheeee! [20:00:10] (03PS1) 10Dzahn: admins: add new SSH key for Zhou Zhou [puppet] - 10https://gerrit.wikimedia.org/r/378780 (https://phabricator.wikimedia.org/T175959) [20:00:30] dpkg-source: info: building gerrit in gerrit_2.13.8+git1-wmf.7.tar.gz [20:00:41] no ores [20:03:00] ok :) [20:03:13] (03CR) 10Paladox: Gerrit: Use systemd::service for systemd (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [20:04:20] ah found it [20:04:20] packedGitOpenFiles [20:04:52] i am thinking about naming it git_open_files [20:06:58] (03PS6) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [20:10:11] (03CR) 10Dzahn: [C: 032] "also just verified with a quick Google chat to the @wikimedia address" [puppet] - 10https://gerrit.wikimedia.org/r/378780 (https://phabricator.wikimedia.org/T175959) (owner: 10Dzahn) [20:10:27] the package did build.. 5 min.. will upload it.. 
[20:10:38] multi-tasking [20:11:57] (03PS1) 10Greg Grossmeier: Revert "Limit thanks for new users at pl.wikipedia to 3 per day" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378784 (https://phabricator.wikimedia.org/T176174) [20:13:14] (03PS7) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [20:13:42] !log arlolra@tin Started deploy [parsoid/deploy@a01064d]: Updating Parsoid to 05a0965 [20:13:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:30] 10Operations, 10Ops-Access-Requests, 10Patch-For-Review: Requesting access to Stat1005 for zhousquared - https://phabricator.wikimedia.org/T175959#3616109 (10Dzahn) 05Open>03Resolved We just had a brief Google chat to cross-verify this, then i merged the new key. Puppet ran and replaced key on bast1001... [20:14:54] no_justification, i think i've got everything in https://gerrit.wikimedia.org/r/#/c/378768/ :) [20:14:58] i will test it now [20:15:36] !log arlolra@tin Finished deploy [parsoid/deploy@a01064d]: Updating Parsoid to 05a0965 (duration: 01m 53s) [20:15:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:16] hmmm [20:17:33] !log arlolra@tin (no justification provided) [20:17:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:46] applys successfully :) [20:17:49] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: connect second ethernet interface for fundraising codfw hosts - https://phabricator.wikimedia.org/T176175#3616118 (10Jgreen) [20:17:55] note that gerrit will have access to increased resources with this [20:17:56] - packedGitOpenFiles = 6000 [20:17:56] + packedGitOpenFiles = 20000 [20:17:56] + packedGitOpenFiles = 20000 [20:18:26] is looking for ways how to avoid down and uploading the 250MB tar.gz [20:18:29] Notice: /Stage[main]/Gerrit::Jetty/File[/etc/default/gerrit]/mode: mode changed '0644' to '0440' [20:18:53] wants to add rsync 
from build host to apt host [20:20:32] ah, we need to link gerrit.war to scap. [20:20:36] oh, or i already did that [20:20:42] like plugins, but this time it's a file [20:20:47] yes :) can sync into "incoming" [20:22:37] ..in theory :p [20:23:59] (03Draft1) 10Paladox: Gerrit: Link gerrit.war to scap [puppet] - 10https://gerrit.wikimedia.org/r/378791 [20:24:02] akosiaris: any idea why I might be seeing? [20:24:03] ERROR:conftool:Error when trying to set/pooled=yes on service=parsoid,name=wtp1001.eqiad.wmnet [20:24:03] ERROR:conftool:Failure writing to the kvstore: Backend error: The request requires user authentication : Insufficient credentials [20:24:03] WARNING:etcd.client:etcd response did not contain a cluster ID [20:24:17] (03PS2) 10Paladox: Gerrit: Link gerrit.war to scap [puppet] - 10https://gerrit.wikimedia.org/r/378791 [20:24:33] no_justification ^^ :) [20:24:42] arlolra: try with "sudo -i" [20:25:09] Yeah, I wanted to test the plugins first :) [20:25:24] well, if you can.. but that looks like what i get when i don't use -i [20:25:27] We should be able to do core gerrit now. Then cleanup stuff like defaults file and such [20:25:55] yep [20:25:57] (03CR) 10Chad: [C: 031] "We want to symlink to both spaces, but we'll start with just this one." 
[puppet] - 10https://gerrit.wikimedia.org/r/378791 (owner: 10Paladox) [20:28:38] (03CR) 10Chad: Gerrit: Use systemd::service for systemd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [20:29:17] (03PS8) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [20:29:21] (03CR) 10Paladox: Gerrit: Use systemd::service for systemd (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/378768 (owner: 10Paladox) [20:29:35] mutante this https://gerrit.wikimedia.org/r/378791 works locally :) [20:29:40] Notice: /Stage[main]/Gerrit::Jetty/File[/var/lib/gerrit2/gerrit.war]/ensure: ensure changed 'file' to 'link' [20:30:01] lrwxrwxrwx 1 root root 40 Sep 18 20:28 /var/lib/gerrit2/gerrit.war -> /srv/deployment/gerrit/gerrit/gerrit.war [20:31:33] mutante: I can't ssh to that node from the deployment server either [20:34:24] arlolra: wtp1001, i can ssh to it. do you want me to try pooling it from there? [20:34:32] paladox: ok :) [20:34:48] :) [20:34:50] * mutante is still copying the .tar.gz for new gerrit version [20:35:18] will open ticket about the rsync issue.. it's currently not allowed apparently [20:35:52] mutante: {"wtp1001.eqiad.wmnet": {"pooled": "yes", "weight": 15}, "tags": "dc=eqiad,cluster=parsoid,service=parsoid"} [20:36:04] i can ssh to it directly, just not from tin [20:36:10] arlolra: so you want it depooled? [20:36:19] no, i want to be able to deploy to it [20:36:30] but it's giving me that credential error [20:37:13] Error when trying to set/pooled=yes [20:37:23] but now it's pooled? [20:37:31] so you manually did it from the host itself? [20:37:58] yea, i think that's a general issue with deployment permissions, i saw something similar the other day.. looking to find it [20:38:43] !log krinkle@tin Started deploy [jobrunner/jobrunner@57f5f47]: No-op sync - first time scap3 - T129148 [20:38:44] i didn't do it manually.
scap is trying to toggle it so it can deploy code [20:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:38:56] T129148: Deploy jobrunner with scap3 (Trebuchet jobrunner/jobrunner) - https://phabricator.wikimedia.org/T129148 [20:39:17] https://phabricator.wikimedia.org/T104352 [20:39:36] T104352 [20:39:37] T104352: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352 [20:41:21] Um, two things. [20:41:38] or maybe T172333 [20:41:39] T172333: Scap: keyholder Too many authentication failures - https://phabricator.wikimedia.org/T172333 [20:41:40] 1) That doesn't touch non-MW deploys yet. `scap deploy` does not speak pooling/depooling logic yet [20:41:56] 2) It's untested in production yet, which is why it's behind a config flag ;-) [20:42:06] !log krinkle@tin Finished deploy [jobrunner/jobrunner@57f5f47]: No-op sync - first time scap3 - T129148 (duration: 03m 23s) [20:42:10] T172333 much more likely [20:42:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:42:23] (if you can't ssh. pooling wouldn't touch your ability to SSH) [20:43:20] (03CR) 10Dzahn: [C: 032] Gerrit: Link gerrit.war to scap [puppet] - 10https://gerrit.wikimedia.org/r/378791 (owner: 10Paladox) [20:43:42] thanks :) [20:43:53] i am not sure what changed here. it is preventing us from deploying. [20:44:03] (03PS9) 10Paladox: Gerrit: Use systemd::service for systemd [puppet] - 10https://gerrit.wikimedia.org/r/378768 [20:44:23] subbu: maybe it's the "keyholder_key: deploy_service" line [20:44:28] subbu: You can't deploy to the rest and just depool the busted one? [20:44:47] no_justification, those were the canaries .. i think. [20:44:58] no_justification: 20:14:20 4 of 4 canary targets failed, exceeding limit [20:44:59] arlolra, good thought [20:45:01] Depooling them should remove from canary list. 
[20:45:07] Oh, it's multiple failures [20:45:10] Boo [20:45:13] but, if all 4 canaries fail, that is not a good sign. [20:45:32] Try again with --verbose? [20:45:36] I'm curious what it's failing on [20:45:39] Pastebin too :) [20:46:47] I think this is a deploy check that's failing. Unclear why. It's one of the pooling and depooling shell scripts that scap calls out to as part of deployment. [20:47:05] Oh, manual depooling via shell out [20:47:07] Duh [20:47:23] (forgot that was a workaround prior to me writing Pooler) [20:47:26] > ERROR:conftool:Failure writing to the kvstore: Backend error: The request requires user authentication : Insufficient credentials [20:47:58] just running the "pool" command on wtp2001.codfw.wmnet [20:48:18] whatever that does. IIRC it's a shell script wrapping a ruby script that calls out to a python script. [20:48:32] no_justification: this is what happened before https://phabricator.wikimedia.org/P6018 [20:50:54] bad idea to comment out "keyholder_key: deploy_service" from https://gerrit.wikimedia.org/r/#/c/377966/3/scap/scap.cfg [20:50:57] ? [20:51:54] 10Operations: allow rsyncing between build host and install hosts - https://phabricator.wikimedia.org/T176178#3616206 (10Dzahn) [20:52:02] thcipriani, no_justification mutante ^^ /cc akosiaris [20:53:52] comment out keyholder_key? I didn't see anything in the logs that would indicate that that is a problem. [20:54:34] was there a message about "too many authentication failures"? [20:55:04] thcipriani: greg-g: I'd like to roll out https://gerrit.wikimedia.org/r/#/c/378740/ asap to fix a UBN [20:55:09] extensions/NavigationTiming [20:56:26] anyone know what is going on with the failures then? [20:56:26] Krinkle: kk [20:56:37] thcipriani: no, just insufficient [20:56:46] Any other MW deploys on-going? I see some talk here about keyholders and scap. [20:56:51] mutante: On the upside, the git-fat-backed scap3 deployed version? 
568K on disk for the git repo :D [20:57:00] Krinkle: Non-MW scap deploys [20:57:07] no_justification: And scap for MW is working fine? [20:57:14] News to me if it's not :) [20:57:17] Thanks :) [20:57:18] arlolra: subbu: I think the failure is a result of something changing in puppet on wtp1002.eqiad.wmnet or something changing inside conftool. This is a bit of a black box to me. [20:57:26] no_justification: like your new nick btw :) [20:57:32] :) [20:58:09] 10Operations: allow rsyncing between build host and install hosts - https://phabricator.wikimedia.org/T176178#3616247 (10Dzahn) ``` root@install1002:/srv/wikimedia/incoming# cat /etc/rsync.d/frag-incoming [ incoming ] path = /srv/wikimedia/incoming read only = no write only = no list = yes uid = 0 gid = 0 max... [20:58:20] 10Operations: allow rsyncing between build host and install hosts - https://phabricator.wikimedia.org/T176178#3616248 (10Dzahn) a:03Dzahn [20:58:43] arlolra: subbu what's failing is scap is ssh-ing to wtp1002 as deploy-service and running "pool" and that is failing. pool is a shell script that talks to conftool and I'm not sure how it works or why it's failing :( [20:59:05] no_justification: heh :) [20:59:11] maybe it's supposed to be "deploy-service", not "deploy_service"? [20:59:12] https://github.com/wikimedia/puppet/blob/production/modules/scap/manifests/conftool.pp [20:59:25] arlolra: subbu -- mobrovac: may be a person to ask about the "pool" and "depool" scripts [20:59:33] https://gerrit.wikimedia.org/r/#/c/377966/3/scap/scap.cfg [20:59:37] yes, deploy-service with - [20:59:39] is the username [20:59:45] ha! [20:59:56] saw that earlier when we talked about it for wdqs [21:00:04] and it wasn't creating the home dir [21:00:04] dapatrick, bawolff, and Reedy: (Dis)respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T2100). Please do the needful. [21:00:04] No patches in the queue for this window. Wheeee! 
[21:00:05] arlolra: that refers to /etc/keyholder.d/deploy_service.pub which is a file that exists [21:00:08] so that should be correct [21:00:21] it is the deploy-service users public key [21:00:26] *user's [21:00:51] arlolra, i guess file a high priority task and have ops / releng take a look. let us call off the deploy now. [21:01:00] also, if it were not correct we wouldn't be able to ssh to the server at all [21:01:16] so "pool" and "depool" wouldn't fail, we'd fail much earlier [21:01:35] Can double check with ssh -vvv + SSH_AUTH_SOCK [21:06:34] (03CR) 10Krinkle: "Is this something we can upstream with a minimal test case? https://github.com/brendangregg/FlameGraph" [puppet] - 10https://gerrit.wikimedia.org/r/377451 (https://phabricator.wikimedia.org/T169249) (owner: 10Aaron Schulz) [21:06:50] !log uploaded gerrit 2.13.8+git1-wmf.7 for jessie-wikimedia on APT.wm.org [21:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:07:30] :) [21:08:06] PROBLEM - Check systemd state on restbase1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [21:08:26] PROBLEM - Check systemd state on restbase1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[21:10:26] 10Operations, 10JobRunner-Service, 10Beta-Cluster-reproducible, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): jobrunner / jobchron systemd services are in error state after a stop - https://phabricator.wikimedia.org/T168044#3616275 (10thcipriani) [21:12:36] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3616302 (10thcipriani) [21:16:00] (03Draft1) 10Paladox: Remove gerrit.war [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 [21:16:02] (03PS2) 10Paladox: Drop gerrit.war [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 [21:16:12] no_justification ^^ :) [21:16:23] !log repeating import of gerrit_2.13.8+git1-wmf.7 on install1002, followed by reprepro copy stretch-wikimedia jessie-wikimedia gerrit to make it available on stretch [21:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:38] i copied it in the wrong destination for a moment [21:16:51] heh [21:16:51] duuhh,, [21:17:06] normally cp is not that way :p [21:17:28] (03CR) 10Chad: "Can we delete debian/install entirely?" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 (owner: 10Paladox) [21:17:39] paladox: test on stretch :) [21:17:40] (03PS3) 10Paladox: Drop gerrit.war [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 [21:17:49] (03CR) 10Paladox: "> Can we delete debian/install entirely?" 
[debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 (owner: 10Paladox) [21:17:56] mutante ok :) [21:18:47] !log krinkle@tin Synchronized php-1.30.0-wmf.18/extensions/NavigationTiming/modules/ext.navigationTiming.js: Unbreak - T176105 (duration: 00m 48s) [21:19:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:02] T176105: Navigation Timing is broken - https://phabricator.wikimedia.org/T176105 [21:19:35] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:19:55] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: connect second ethernet interface for fundraising codfw hosts - https://phabricator.wikimedia.org/T176175#3616118 (10faidon) All of them? Wasn't the plan to only do it for the few hosts that are important SPOFs? Again, I fear that this gives a fa... [21:20:35] paladox: I'm curious if we need that final build of gerrit package [21:20:48] Nope we wont :) [21:20:49] Or if just finish dropping the stuff into puppet and then kill the package outright [21:21:00] we can kill it straight away [21:21:10] once https://gerrit.wikimedia.org/r/#/c/378768/ is merged [21:21:25] i compared the difference [21:21:46] didnt see anything else that was missing [21:21:48] mutante i have no stretch hosts to test gerrit [21:22:04] except from if i try installing on jenkins-slave-01 to see if it at least installs [21:22:38] paladox: for some reason i thought the reason i added it to stretch last time was your instance :) ok [21:22:39] Yeah, so I'm thinking just wait for 373768 to land and forget 378801 [21:23:18] 10Operations, 10Operations-Software-Development, 10Goal, 10Patch-For-Review, 10Technical-Debt: Sunset our use of Salt - https://phabricator.wikimedia.org/T164780#3616325 (10demon) [21:23:25] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.414 second response time [21:23:36] mutante yeh, that was i think i tryed to apt-get install it (though i did 
not really need it installed on that host, it was to see if it was available). [21:24:00] we can start testing on stretch :) [21:24:26] yeh. [21:24:46] but we also want one that is like prod [21:24:49] so ideally 2 [21:24:52] and ok no_justification, we can erase that repo (or archive it) :) [21:25:02] I'm going to erase it [21:25:09] It uses an absurd amount of disk space [21:25:19] yeh [21:26:00] It's down to 335M cuz I just did another aggressive gc with pruned reflogs.... [21:26:06] But it balloons to 1G+ easily [21:26:10] oh [21:26:12] Silly for a bunch of old jar files :) [21:26:13] that's a large repo [21:26:33] it would have been probably 2gb with 2.14 as it balloons in size [21:27:29] (03Abandoned) 10Paladox: Drop gerrit.war [debs/gerrit] - 10https://gerrit.wikimedia.org/r/378801 (owner: 10Paladox) [21:29:45] (03PS1) 10Krinkle: Enable jQuery 3 on metawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378802 (https://phabricator.wikimedia.org/T124742) [21:30:15] (03PS2) 10Krinkle: Enable jQuery 3 on meta.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378802 (https://phabricator.wikimedia.org/T124742) [21:30:17] (03PS1) 10Krinkle: Enable jQuery 3 on commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378803 [21:34:43] (03PS2) 10Krinkle: Enable logging of post-send DB updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: 10Aaron Schulz) [21:34:45] (03PS3) 10Krinkle: Enable logging of post-send DB updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: 10Aaron Schulz) [21:34:48] (03CR) 10Krinkle: [C: 031] Enable logging of post-send DB updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377412 (https://phabricator.wikimedia.org/T166199) (owner: 10Aaron Schulz) [21:35:08] (03CR) 10Krinkle: [C: 031] Make it easy to set PHP ini flags with mwscript [puppet] -
10https://gerrit.wikimedia.org/r/378007 (owner: 10Aaron Schulz) [21:39:56] (03PS2) 10Krinkle: Enable jQuery 3 on commons.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/378803 (https://phabricator.wikimedia.org/T124742) [21:41:25] PROBLEM - HHVM jobrunner on mw1260 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:42:46] * bawolff is going to deploy a security patch [21:45:25] RECOVERY - HHVM jobrunner on mw1260 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [21:57:29] !log deployed patch T175900 [21:57:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:00] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3616435 (10Ejegg) [22:13:44] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#2852630 (10Ejegg) [22:14:05] 10Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#2852630 (10Ejegg) For CiviCRM code: PHP client library: https://github.com/Jimdo/prometheus_client_php Metric types: https://prometheus.io/docs/concepts/metric_types/ All we're using it for is... 
[22:15:02] 10Operations, 10Fundraising-Backlog, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#3616441 (10Ejegg) [22:18:50] (03PS3) 10Chad: Just include PrivateSettings.php directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376762 [22:19:44] (03PS1) 10Dzahn: aptrepo: allow rsyncing from package build hosts [puppet] - 10https://gerrit.wikimedia.org/r/378810 (https://phabricator.wikimedia.org/T176178) [22:21:16] (03CR) 10Chad: [C: 032] Just include PrivateSettings.php directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376762 (owner: 10Chad) [22:24:27] (03Merged) 10jenkins-bot: Just include PrivateSettings.php directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376762 (owner: 10Chad) [22:25:34] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: connect second ethernet interface for fundraising codfw hosts - https://phabricator.wikimedia.org/T176175#3616465 (10Jgreen) >>! In T176175#3616318, @faidon wrote: > All of them? Wasn't the plan to only do it for the few hosts that are important... [22:27:23] 10Operations, 10Parsoid, 10Scap, 10Release-Engineering-Team (Backlog): Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616468 (10ssastry) p:05Triage>03High [22:27:46] 10Operations, 10Parsoid, 10Scap, 10Release-Engineering-Team (Backlog): Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616450 (10ssastry) This is blocking deployments right now. 
[22:27:53] (03CR) 10Krinkle: [C: 031] Moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828 (owner: 10Chad) [22:29:33] (03CR) 10jenkins-bot: Just include PrivateSettings.php directly [mediawiki-config] - 10https://gerrit.wikimedia.org/r/376762 (owner: 10Chad) [22:30:55] PROBLEM - Check Varnish expiry mailbox lag on cp1062 is CRITICAL: CRITICAL: expiry mailbox lag is 2014361 [22:31:58] were there any changes to abuse filter deployed recently? some filters which used to work, no longer work anymore... [22:32:00] !log demon@tin Synchronized wmf-config/CommonSettings.php: Use PrivateSettings.php directly (duration: 00m 46s) [22:32:07] 10Operations, 10Patch-For-Review: allow rsyncing between build host and install hosts - https://phabricator.wikimedia.org/T176178#3616475 (10Dzahn) p:05Triage>03Normal [22:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:03] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: connect second ethernet interface for fundraising codfw hosts - https://phabricator.wikimedia.org/T176175#3616476 (10faidon) >>! In T176175#3616465, @Jgreen wrote: >> Again, I fear that this gives a false sense of redundancy > > This does not co... [22:41:46] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616478 (10Josve05a) [22:42:02] (03CR) 10Chad: "Holy fuck this worked." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/376762 (owner: 10Chad) [22:43:47] (03PS8) 10Chad: Setup apache vhost on scap proxies as well [puppet] - 10https://gerrit.wikimedia.org/r/344221 (https://phabricator.wikimedia.org/T147938) [22:43:56] RECOVERY - MariaDB Slave Lag: s2 on dbstore1001 is OK: OK slave_sql_lag Replication lag: 89998.38 seconds [22:44:19] 10Operations, 10Parsoid, 10Scap, 10Release-Engineering-Team (Backlog): Check 'depool' failed while deploying - https://phabricator.wikimedia.org/T176184#3616450 (10thcipriani) > This is probably related to T172333 where keyholder_key: deploy_service is added. I have my doubts about this. The `keyholder_ke... [22:45:04] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616496 (10Reedy) [22:56:28] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616514 (10Reedy) [23:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170918T2300). Please do the needful. [23:00:05] kaldari: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break the wikis, you will be rewarded with a sticker. [23:00:15] here [23:01:02] (03CR) 10Dzahn: "ok, you are right, let's just call the template "gerrit.erb" or so" [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [23:02:09] I can SWAT [23:02:19] (03CR) 10Dzahn: "and ok.. 
let me amend so the hostname also stays flexible somehow.. while getting rid of the flexible template name" [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [23:05:20] (03CR) 10Chad: "I'd go with apache.conf.erb or somesuch so it's super clear what the file is for :)" [puppet] - 10https://gerrit.wikimedia.org/r/378360 (owner: 10Dzahn) [23:05:54] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616520 (10Josve05a) > Reedy closed this task as a duplicate of Restricted Task... [23:06:01] thcipriani: I'm done with my super scary PrivateSettings swap. [23:06:02] (03PS17) 10Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] - 10https://gerrit.wikimedia.org/r/363738 [23:06:04] You should be good to go [23:06:54] no_justification: cool, thanks :) [23:06:55] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616478 (10Reedy) You were CC'd in... T176101 [23:07:11] kaldari: your change is on mwdebug1002, check please [23:07:17] thanks... [23:09:32] thcipriani: gimme a couple minutes for testing it.... [23:09:42] ok, np :) [23:10:37] thcipriani: OK, looks good! [23:10:52] kaldari: okie doke, going live everywhere [23:12:53] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616531 (10Josve05a) >>! In T176185#3616521, @Reedy wrote: > You were CC'd in..... 
[23:13:55] !log thcipriani@tin Synchronized php-1.30.0-wmf.18/extensions/ArticleCreationWorkflow: SWAT: [[gerrit:378806|Only intercept users who can potentially create articles otherwise]] T176100 (duration: 00m 46s) [23:14:04] ^ kaldari live now [23:14:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:09] T176100: Logged out users get redirected to landing page when following a red link - https://phabricator.wikimedia.org/T176100 [23:14:26] (03PS18) 10Paladox: gerrit: Ajust scap files (DO NOT MERGE) [software/gerrit] - 10https://gerrit.wikimedia.org/r/363738 [23:14:29] (03PS16) 10Paladox: Gerrit: Upgrading gerrit to 2.14.4-pre (DO NOT MERGE) [software/gerrit] - 10https://gerrit.wikimedia.org/r/363734 [23:14:53] thcipriani: Thanks! [23:15:43] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616552 (10Reedy) Because the wrong security policy was applied. Fixed [23:16:48] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, 10Wikimedia-log-errors: Unable to delete file on Commons (Fatal exception of type "Wikimedia\Rdbms\DBQueryError") - https://phabricator.wikimedia.org/T176185#3616582 (10Josve05a) >>! In T176185#3616552, @Reedy wrote: > Because the wrong... [23:28:50] !log tstarling@tin Synchronized php-1.30.0-wmf.18/includes/filerepo/file/LocalFile.php: (no justification provided) (duration: 00m 46s) [23:29:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:50:31] thcipriani: Lemme know when you're done, I wanna test something [23:51:11] no_justification: all yours, I'm clear [23:59:33] (03PS14) 10Chad: Moving update wikiversions to scap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/367828