[00:00:06] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Evening SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T0000).
[00:00:06] No GERRIT patches in the queue for this window AFAICS.
[00:01:09] no SWAT?
[00:01:15] Operations, ops-eqiad, netops: Replace eqiad mgmt switches with EX4200s - https://phabricator.wikimedia.org/T213128 (ayounsi) p:Triage→Normal
[00:01:30] I'll keep on wrangling the password config patch then
[00:01:49] PROBLEM - High lag on wdqs1007 is CRITICAL: 3885 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[00:05:41] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:06:34] !log depooling wdqs1007 (something looks like DB corruption)
[00:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:09] RECOVERY - Check systemd state on wdqs1007 is OK: OK - running: The system is fully operational
[00:14:58] Operations, ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (ayounsi) I created T213128 for using EX4200 as mgmt switches. Even though it would be the best option in the long term, it comes with its own challenges. @Cmjohnson We can start by connecting "Camera Row A...
[00:15:29] PROBLEM - Check systemd state on wdqs1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[00:18:37] PROBLEM - tilerator on maps1004 is CRITICAL: connect to address 10.64.48.154 and port 6534: Connection refused
[00:19:09] PROBLEM - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused
[00:19:25] PROBLEM - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused
[00:20:02] (PS1) Catrope: Enable logging for GrowthExperiments help panel [mediawiki-config] - https://gerrit.wikimedia.org/r/482743 (https://phabricator.wikimedia.org/T211991)
[00:21:26] (PS7) Gergő Tisza: Make password policy code saner [mediawiki-config] - https://gerrit.wikimedia.org/r/481115
[00:22:08] !log restarting tilerator on all maps servers
[00:22:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:53] PROBLEM - tilerator on maps2001 is CRITICAL: connect to address 10.192.0.144 and port 6534: Connection refused
[00:24:53] PROBLEM - tilerator on maps2004 is CRITICAL: connect to address 10.192.48.57 and port 6534: Connection refused
[00:24:53] PROBLEM - tilerator on maps2002 is CRITICAL: connect to address 10.192.16.179 and port 6534: Connection refused
[00:25:11] * gehel is looking at those maps servers
[00:25:35] PROBLEM - tilerator on maps2003 is CRITICAL: connect to address 10.192.32.146 and port 6534: Connection refused
[00:25:43] PROBLEM - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused
[00:29:12] MaxSem: are you also looking at those issues? I see you logged on maps1001
[00:29:35] gehel: nope, I was looking up weird data
[00:29:55] you're not running a crazy expensive and locking query by any chance :)
[00:30:04] nah
[00:30:45] ok, so one more case of "I have no idea what's going on and the logs are not helping" :(
[00:30:49] RECOVERY - tilerator on maps1004 is OK: HTTP OK: HTTP/1.1 200 OK - 304 bytes in 0.074 second response time
[00:35:10] looks like the known T204047. It should recover on its own, no direct user impact, we might get some more lag on tile generation
[00:35:11] T204047: investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047
[00:35:17] so I'll go get some sleep
[00:35:37] call me if things are on fire for real!
[00:36:26] ACKNOWLEDGEMENT - tilerator on maps1001 is CRITICAL: connect to address 10.64.0.79 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:26] ACKNOWLEDGEMENT - tilerator on maps1002 is CRITICAL: connect to address 10.64.16.42 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:26] ACKNOWLEDGEMENT - tilerator on maps1003 is CRITICAL: connect to address 10.64.32.117 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:26] ACKNOWLEDGEMENT - tilerator on maps2001 is CRITICAL: connect to address 10.192.0.144 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:27] ACKNOWLEDGEMENT - tilerator on maps2002 is CRITICAL: connect to address 10.192.16.179 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:28] ACKNOWLEDGEMENT - tilerator on maps2003 is CRITICAL: connect to address 10.192.32.146 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:36:29] ACKNOWLEDGEMENT - tilerator on maps2004 is CRITICAL: connect to address 10.192.48.57 and port 6534: Connection refused Gehel T204047 has struck again, tilerator should recover on its own at some point.
[00:38:27] Operations, Maps (Tilerator), Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (Gehel) the problem just struck again. Restarting tilerator did not immediately fix the issue. It should recover once...
[00:45:51] (PS8) Gergő Tisza: Make password policy code saner [mediawiki-config] - https://gerrit.wikimedia.org/r/481115
[00:48:13] RECOVERY - tilerator on maps2001 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.099 second response time
[00:48:13] RECOVERY - tilerator on maps2004 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.098 second response time
[00:48:13] RECOVERY - tilerator on maps2002 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.095 second response time
[00:48:53] RECOVERY - tilerator on maps2003 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.099 second response time
[00:54:41] RECOVERY - tilerator on maps1002 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.034 second response time
[00:54:57] RECOVERY - tilerator on maps1003 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.037 second response time
[00:55:09] RECOVERY - tilerator on maps1001 is OK: HTTP OK: HTTP/1.1 200 OK - 305 bytes in 0.039 second response time
[00:56:24] Operations, Collection, OfflineContentGenerator, Core Platform Team Backlog (Watching / External), and 2 others: Replace OCG in collection extension with Electron - https://phabricator.wikimedia.org/T150872 (PFWOz) What happens to this ticket with [Electron being phased out](https://phabricator.w...
[01:00:33] (PS9) Gergő Tisza: Make password policy code saner [mediawiki-config] - https://gerrit.wikimedia.org/r/481115
[01:02:05] (PS10) Gergő Tisza: Make password policy code saner [mediawiki-config] - https://gerrit.wikimedia.org/r/481115
[01:20:56] (PS11) Gergő Tisza: Make password policy and logging code saner [mediawiki-config] - https://gerrit.wikimedia.org/r/481115
[01:26:29] (CR) Gergő Tisza: "I'll sleep one more on this. I fixed all the bugs (and added fixes for some more problems I found in the old code) and I think the patch i" [mediawiki-config] - https://gerrit.wikimedia.org/r/481115 (owner: Gergő Tisza)
[01:34:24] (CR) Gergő Tisza: "Side notes:" [mediawiki-config] - https://gerrit.wikimedia.org/r/481115 (owner: Gergő Tisza)
[03:08:46] Operations, Discovery-Search (Current work): Test spicerack elasticsearch module on relforge - https://phabricator.wikimedia.org/T207920 (Mathew.onipe)
[03:14:13] Operations, Wikimedia-Mailing-lists, Privacy: Potential privacy violations in emails on mailing lists (links posted in emails to external websites which track users) - https://phabricator.wikimedia.org/T213044 (Aklapper) @Bawolff added (thanks!) what I should have also written before: Feel free to co...
[03:18:54] !log kartik@deploy1001 Started deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f
[03:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:23:55] !log kartik@deploy1001 Finished deploy [cxserver/deploy@b669f95]: Update cxserver to d6b1d6f (duration: 05m 00s)
[03:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:34:55] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 927.52 seconds
[04:06:35] PROBLEM - High lag on wdqs1008 is CRITICAL: 3667 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[04:19:45] PROBLEM - Check systemd state on wdqs1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:21:34] * onimisionipe is looking at wdqs1008
[04:21:55] (CR) Krinkle: [C: +1] robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: Jforrester)
[04:22:13] RECOVERY - Check systemd state on wdqs1008 is OK: OK - running: The system is fully operational
[04:23:30] (CR) Krinkle: Re-write mobilelanding.php to not break when we drop ZeroBanner (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/482098 (https://phabricator.wikimedia.org/T212865) (owner: Jforrester)
[04:25:26] (CR) Krinkle: "If I understand correctly, this would serve the www portal from m-dot and zero-dot as-is, which seems unexpected. I'm not sure all other a" [puppet] - https://gerrit.wikimedia.org/r/482492 (https://phabricator.wikimedia.org/T187716) (owner: MaxSem)
[04:25:53] PROBLEM - Check systemd state on wdqs1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[04:26:38] !log depooling wdqs1008 - T213134
[04:26:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:26:43] T213134: wdqs1007 database corruption - https://phabricator.wikimedia.org/T213134
[04:27:12] onimisionipe: please don't depool it - it can serve read queries
[04:27:36] otherwise we're left with only wdq3, which is already kinda weak
[04:27:42] SMalyshev: Ok. I'll leave it
[04:27:51] ACKNOWLEDGEMENT - Check systemd state on wdqs1008 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Stas Malychev https://phabricator.wikimedia.org/T213134
[04:27:51] ACKNOWLEDGEMENT - High lag on wdqs1008 is CRITICAL: 4919 ge 3600 Stas Malychev https://phabricator.wikimedia.org/T213134 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[04:28:32] SMalyshev: I can see similar pattern of database corruption
[04:28:46] onimisionipe: leave wdq7 and wdq8 as is for now, I'll talk to Bryan from Blazegraph tomorrow morning, to see if we can reset them
[04:29:06] read queries still seem to work, so I shut down the updater and leave them be for now
[04:29:30] SMalyshev: alright!
[04:34:29] (CR) MaxSem: "All our vhosts by default run with `UseCanonicalName On` (set in modules/mediawiki/templates/apache/apache2.conf.erb) which makes them red" [puppet] - https://gerrit.wikimedia.org/r/482492 (https://phabricator.wikimedia.org/T187716) (owner: MaxSem)
[04:44:10] What's `puppet`?
[04:44:23] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:49:05] Operations, Gerrit, Traffic, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Krinkle) >>! In T191183#4656943, @Tgr wrote: > [Phabricator] Conduit is [..] not very fast, so the Gerrit plugin would still have to store images somewhere. Yeah, anything that qu...
[04:50:18] shreyasminocha: https://en.wikipedia.org/wiki/Puppet_(software)
[04:50:55] The operations/puppet repository is the configuration for our installation of Puppet.
[05:08:47] (CR) Krinkle: profile::mediawiki::php: install php-tideways, php-mongodb (1 comment) [puppet] - https://gerrit.wikimedia.org/r/478594 (https://phabricator.wikimedia.org/T206152) (owner: Giuseppe Lavagetto)
[05:11:23] (CR) Krinkle: "tiny nit. Quite cool to see this. I know it's WiP still, but I assume there'd also be a removal somewhere, right? Is that in puppet curren" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/465411 (owner: Giuseppe Lavagetto)
[05:11:47] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 284.78 seconds
[05:15:39] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[05:28:25] Operations, Gerrit, Traffic, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (demon) I'm increasingly convinced that avatars aren't worth the effort.
[05:39:54] !log restarted some Blazegraph servers as precaution against corruption issues
[05:39:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:57] PROBLEM - HHVM rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:24:07] RECOVERY - HHVM rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 200 OK - 74670 bytes in 6.595 second response time
[06:29:49] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/spark2_yarn_shuffle_jar_install]
[06:31:45] PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/smartmontools/run.d/20logger]
[06:32:01] PROBLEM - puppet last run on labstore1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/jgreen]
[06:33:09] PROBLEM - puppet last run on an-worker1084 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/field.sh]
[06:33:41] PROBLEM - puppet last run on analytics1071 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/update-library.R]
[06:47:13] Operations, CirrusSearch, Discovery-Search, serviceops: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4860057, @EBernhardson wrote: > Even more generally, it we install a reverse proxy for local TLS connection poolin...
[06:48:07] Operations, CirrusSearch, Discovery-Search, serviceops: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) I think I have a decent idea of how to implement a basic version of what we want via nginx. I'll work on it this week hopefully.
[06:55:53] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:57:47] RECOVERY - puppet last run on ms-be1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:07] RECOVERY - puppet last run on labstore1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:30] Operations, Gerrit, Traffic, Patch-For-Review: Enable avatars in gerrit - https://phabricator.wikimedia.org/T191183 (Tgr) Well the effort was inflated by dubious privacy requirements :) I still think the best course of action would be gravatar with a proxy.
[06:59:15] RECOVERY - puppet last run on an-worker1084 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[06:59:37] Krinkle: ok, thanks. thought it was something WM-specific
[06:59:43] RECOVERY - puppet last run on analytics1071 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:23:02] (PS2) Elukey: Decommission analytics10[39-41] from Hadoop Analytics [puppet] - https://gerrit.wikimedia.org/r/482649 (https://phabricator.wikimedia.org/T209929)
[07:25:53] (CR) Elukey: [C: +2] Decommission analytics10[39-41] from Hadoop Analytics [puppet] - https://gerrit.wikimedia.org/r/482649 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[07:32:17] (PS4) Elukey: Update analytics eventlogging_to_druid_job.pp to mirror changes in scala job [puppet] - https://gerrit.wikimedia.org/r/479847 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[07:32:41] PROBLEM - Hadoop NodeManager on analytics1040 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[07:32:47] PROBLEM - Hadoop NodeManager on analytics1041 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[07:32:59] this is me --^
[07:33:23] PROBLEM - Hadoop NodeManager on analytics1039 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[07:34:10] forgot to silence those, they are in decom phase and they don't want to let go
[07:42:31] (CR) Krinkle: "Thanks" [mediawiki-config] - https://gerrit.wikimedia.org/r/482722 (https://phabricator.wikimedia.org/T204183) (owner: Ppchelko)
[07:45:16] (CR) Giuseppe Lavagetto: "LGTM, minor nit inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/479131 (owner: Dzahn)
[07:46:30] (CR) Giuseppe Lavagetto: [C: +1] "Seems reasonable." [puppet] - https://gerrit.wikimedia.org/r/481932 (owner: Paladox)
[07:47:50] (CR) Elukey: [C: +2] Update analytics eventlogging_to_druid_job.pp to mirror changes in scala job [puppet] - https://gerrit.wikimedia.org/r/479847 (https://phabricator.wikimedia.org/T210099) (owner: Mforns)
[07:54:42] (PS1) Elukey: Remove decommed nodes from Analytics Hadoop's net topology [puppet] - https://gerrit.wikimedia.org/r/482767 (https://phabricator.wikimedia.org/T209929)
[08:16:51] Operations, CirrusSearch, Discovery-Search, serviceops: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (dcausse) Just one thought as I discovered this last week. A non negligible time spent by curl is by reading `/etc/ssl/certs/ca-certifi...
[08:19:47] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[08:20:47] spike at 8:14 ^
[08:24:12] <_joe_> hi akosiaris
[08:24:29] hey
[08:25:53] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[08:26:22] Operations, CirrusSearch, Discovery-Search, serviceops: Find an alternative to HHVM curl connection pooling for PHP 7 - https://phabricator.wikimedia.org/T210717 (Joe) >>! In T210717#4861725, @dcausse wrote: > I think this explain why we've seen a +15ms when I broke connection pooling (T212768)....
[08:28:49] (CR) Alexandros Kosiaris: [C: +1] wmcs: Add postgres maps users for eqiad1-r region [puppet] - https://gerrit.wikimedia.org/r/481341 (https://phabricator.wikimedia.org/T212596) (owner: BryanDavis)
[08:30:50] Operations, Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (akosiaris) p:Triage→Normal
[08:34:27] Operations, MediaWiki Language Extension Bundle, MediaWiki-Cache, Language-Team (Language-2019-January-March), and 5 others: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (Pginer-WMF)
[08:35:02] Operations, Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (akosiaris) @yuvipanda Unless you object, I think we can transfer ownership of this list.
[08:38:11] (CR) Alexandros Kosiaris: [C: +2] toolforge: Add missing php packages [puppet] - https://gerrit.wikimedia.org/r/482481 (owner: BryanDavis)
[08:38:18] (PS3) Alexandros Kosiaris: toolforge: Add missing php packages [puppet] - https://gerrit.wikimedia.org/r/482481 (owner: BryanDavis)
[08:38:50] (CR) Alexandros Kosiaris: [C: +2] toolforge: remove duplicate stretch python packages [puppet] - https://gerrit.wikimedia.org/r/482688 (owner: BryanDavis)
[08:39:39] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:39:53] (PS2) Alexandros Kosiaris: toolforge: remove duplicate stretch python packages [puppet] - https://gerrit.wikimedia.org/r/482688 (owner: BryanDavis)
[08:42:05] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:45:12] (CR) Alexandros Kosiaris: [C: +2] Add fu-berlin.de networks to our poolcounter whitelist [puppet] - https://gerrit.wikimedia.org/r/482683 (https://phabricator.wikimedia.org/T210103) (owner: Awight)
[08:45:36] (PS4) Alexandros Kosiaris: Add fu-berlin.de networks to our poolcounter whitelist [puppet] - https://gerrit.wikimedia.org/r/482683 (https://phabricator.wikimedia.org/T210103) (owner: Awight)
[08:58:47] (CR) Alexandros Kosiaris: [C: -1] "Minor comment, rest LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482268 (owner: Giuseppe Lavagetto)
[09:01:08] (CR) Alexandros Kosiaris: [C: +1] "I like the fact this is disjoint from the discovery URLS, so we are still keeping videoscaler.discovery.wmnet in case we end up needing it" [puppet] - https://gerrit.wikimedia.org/r/482269 (owner: Giuseppe Lavagetto)
[09:11:59] !log mobrovac@deploy1001 Started deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691
[09:12:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:02] T192691: [Commons] A new image added to a category is not shown in Watchlist - https://phabricator.wikimedia.org/T192691
[09:12:58] !log mobrovac@deploy1001 Finished deploy [cpjobqueue/deploy@f91cf04]: Increase the concurrency of categoryMembershipJob - T192691 (duration: 00m 59s)
[09:13:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:39] !log gerrit: resaved configuration for All-Projects by changing "Max Reviewers" from 3 to 4. Might enable adding reviewers automatically based on git blame. See task for config diff # T101131
[09:19:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:19:42] T101131: Enable Gerrit reviewers-by-blame plugin - https://phabricator.wikimedia.org/T101131
[09:21:20] !log akosiaris@deploy1001 scap-helm zotero install --name production2 -f zotero-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad]
[09:21:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:22] !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed
[09:21:22] !log akosiaris@deploy1001 scap-helm zotero finished
[09:21:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:42] !log stop replication on db1124:s5 T213108
[09:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:45] T213108: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108
[09:25:25] (CR) Filippo Giunchedi: "Thanks for taking a look!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:25:42] Operations, DBA, Data-Services: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (jcrespo) db1124:s5 stopped at db1082-bin.002490:667685191 ` root@db1124[(none)]> show global variables like '%gtid%'; +------------------------+------------------------------...
[09:26:23] !log akosiaris@deploy1001 scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw]
[09:26:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:25] !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed
[09:26:25] !log akosiaris@deploy1001 scap-helm zotero finished
[09:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:26:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:33:23] (CR) Volans: icinga: fix URLs to dashboard links (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:33:54] (PS1) Alexandros Kosiaris: Add the zotero.svc.$::site.wmnet LVS IP to kubernetes [puppet] - https://gerrit.wikimedia.org/r/482772
[09:39:00] (CR) Gehel: [C: +1] phabricator: add phabricator module (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[09:39:40] !log reset user email for Zergiorubio
[09:39:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:39:42] (CR) Volans: [C: +2] phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[09:39:50] (PS9) Volans: phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884)
[09:41:41] (CR) Filippo Giunchedi: [C: +1] icinga: fix URLs to dashboard links (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:46:42] (CR) Filippo Giunchedi: [C: +1] icinga: fix URLs to dashboard links (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:47:29] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 53, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:47:33] (CR) Alexandros Kosiaris: [C: +2] Add the zotero.svc.$::site.wmnet LVS IP to kubernetes [puppet] - https://gerrit.wikimedia.org/r/482772 (owner: Alexandros Kosiaris)
[09:47:53] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[09:48:05] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:55:15] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=eqiad&var-cache_type=All&var-status_type=5
[09:55:46] (CR) Muehlenhoff: [C: +1] Add krinkle to contint-docker group [puppet] - https://gerrit.wikimedia.org/r/482483 (https://phabricator.wikimedia.org/T213015) (owner: Krinkle)
[09:56:13] (CR) Volans: [C: +2] phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[09:56:40] (PS2) Volans: icinga: fix URLs to dashboard links [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052)
[09:56:52] (CR) Volans: icinga: fix URLs to dashboard links (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:58:28] (PS1) Hashar: Update plugins reviewers-by-blame to stable-2.15 [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/482775 (https://phabricator.wikimedia.org/T101131)
[09:58:49] (CR) Volans: "Compiler results:" [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[09:59:29] (CR) Hashar: "Our plugin version does not work on 2.15 since it uses reviewdb instead of notedb." [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/482775 (https://phabricator.wikimedia.org/T101131) (owner: Hashar)
[10:01:55] (CR) Filippo Giunchedi: [C: +1] "Nice! LGTM" [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[10:01:56] (Merged) jenkins-bot: phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[10:02:47] (CR) Volans: [C: +2] icinga: fix URLs to dashboard links [puppet] - https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: Volans)
[10:02:56] (CR) jenkins-bot: phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[10:03:38] godog: ^^^ merged, I'll keep an eye on potential puppet failures, ping me if I miss any ;)
[10:03:57] volans: sweet, thanks!
[10:04:43] ~1h to full resolution in Icinga ofc ;)
[10:05:21] (PS4) Volans: debmonitor: add debmonitor module [software/spicerack] - https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884)
[10:06:10] Operations, Performance-Team (Radar), User-Elukey: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (MoritzMuehlenhoff) When backporting this to jessie, we'll need to carefully review the systemd hardening options used in the memcached systemd unit, some of the...
[10:18:55] (CR) Volans: [C: +2] debmonitor: add debmonitor module (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[10:20:32] (CR) Volans: [C: -2] "Keeping it around for now if we discover it's needed, but clearly marking it as not-to-merge for now with a -2." (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/481858 (https://phabricator.wikimedia.org/T212783) (owner: Volans)
[10:24:22] (Merged) jenkins-bot: debmonitor: add debmonitor module [software/spicerack] - https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[10:25:22] (CR) jenkins-bot: debmonitor: add debmonitor module [software/spicerack] - https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) (owner: Volans)
[10:25:44] !log executing schema change on db1062 - T85757
[10:25:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:47] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[10:26:26] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:34:16] !log elastic@eqiad setting crosscluster conf on production search cluster (T213150)
[10:34:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:19] T213150: Configure elasticsearch crosscluster on production search servers - https://phabricator.wikimedia.org/T213150
[10:36:38] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 53, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:39:35] Operations, Scap, serviceops, Goal: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki) p:Triage→Normal
[10:39:43] (PS2) Fsero: Initial docker::registry::ha puppetization. [puppet] - https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076)
[10:39:51] Operations, Scap, serviceops, Goal, User-jijiki: SRE FY2019 Q3:TEC6: First steps towards Canary Deployments - https://phabricator.wikimedia.org/T213156 (jijiki)
[10:40:16] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:40:22] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[10:40:30] (CR) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: Elukey)
[10:40:33] (CR) jerkins-bot: [V: -1] Initial docker::registry::ha puppetization. [puppet] - https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) (owner: Fsero)
[10:45:08] Operations, Wikimedia-Logstash: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (fgiunchedi)
[10:45:21] (CR) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: Elukey)
[10:46:56] Operations, Thumbor, Wikimedia-Logstash, serviceops, User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (jijiki) @fgiunchedi @herron I agree, it is a good opportunity to move it to the new infrastructure. Since we will be upgrading Thumbor servers to Stre...
[10:47:40] Operations, monitoring: Upgrade metrics monitoring infrastructure core components (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213158 (fgiunchedi)
[10:48:08] (PS3) Fsero: Initial docker::registry::ha puppetization.
[puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [10:48:54] !log installing libseccomp updates from stretch point release [10:48:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:53:27] (03PS20) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [10:54:56] (03PS21) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [10:56:11] (03PS22) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [10:57:22] (03PS23) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:03:09] (03CR) 10Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler1001/14193/an-master1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: 10Elukey) [11:04:44] !log rebooting mw1261 [11:04:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:05:26] (03PS4) 10Fsero: Initial docker::registry::ha puppetization. [puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [11:07:29] !log stoping and restarting db1102 (s5, s4) for upgrade [11:07:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:10:13] (03PS24) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:10:37] (03PS5) 10Fsero: Initial docker::registry::ha puppetization. 
[puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [11:11:26] (03CR) 10jerkins-bot: [V: 04-1] Initial docker::registry::ha puppetization. [puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) (owner: 10Fsero) [11:13:14] (03PS6) 10Fsero: Initial docker::registry::ha puppetization. [puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [11:16:42] (03PS25) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:17:57] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (10jijiki) [11:17:59] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, 10User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (10jijiki) [11:18:47] 10Operations, 10Thumbor, 10serviceops, 10Patch-For-Review, and 3 others: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817 (10jijiki) [11:20:13] PROBLEM - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:20:37] good ol zotero [11:20:39] <_joe_> akosiaris: that's you right? [11:20:44] expected? [11:20:57] (03PS26) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:21:01] (03PS7) 10Fsero: Initial docker::registry::ha puppetization. 
[puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [11:21:25] (03CR) 10jerkins-bot: [V: 04-1] [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: 10Elukey) [11:22:05] <_joe_> vgutierrez: I would assume it's not, lemme see [11:22:14] if it was expected it should have been downtimed [11:22:18] so i guess not [11:23:22] <_joe_> so zotero is now pointing to kubernetes in eqiad [11:23:38] (03Abandoned) 10Hashar: contint: instances are fully on eqiad1-r [puppet] - 10https://gerrit.wikimedia.org/r/480769 (https://phabricator.wikimedia.org/T210288) (owner: 10Hashar) [11:23:42] <_joe_> and there it's up [11:23:47] <_joe_> in codfw, it's down [11:24:19] <_joe_> ok, let's depool zotero in codfw for now [11:24:58] !log oblivian@puppetmaster1001 conftool action : set/pooled=false; selector: dnsdisc=zotero,name=codfw [11:24:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:05] (03PS27) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:25:24] <_joe_> fsero: can you take a look at the zotero deployments in eqiad and codfw? [11:25:34] yep looking into it [11:25:43] <_joe_> I think it's just a wrong IP or something [11:27:15] there are two deploys zotero-production and zotero-production2 [11:27:30] both seems running [11:27:37] that's me [11:27:51] ignore it I am turning over the old zotero to the kubernetes installation [11:28:06] but yes, I should have downtimed it [11:28:08] <_joe_> akosiaris: so it's not called by citoid? 
[11:28:13] no it's not [11:28:24] <_joe_> anyways, lemme know when should I repool codfw :) [11:28:28] <_joe_> eqiad looks fine btw [11:28:32] citoid currently calls zoterov2.scv.site.wmnet [11:28:38] <_joe_> ok [11:28:47] _joe_: feel free to repool it [11:28:54] (03PS1) 10Fdans: Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) [11:29:18] !log oblivian@puppetmaster1001 conftool action : set/pooled=true; selector: dnsdisc=zotero,name=codfw [11:29:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:30:05] (03PS28) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:30:33] (03CR) 10jerkins-bot: [V: 04-1] [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: 10Elukey) [11:32:15] (03PS29) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [11:32:55] (03CR) 10Nuria: [C: 03+1] Changes wording in uniques dump link to reflect project families (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [11:33:03] !log mobrovac@deploy1001 Started restart [electron-render/deploy@94d27d7]: Electron strugling, restart - T213154 [11:33:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:33:06] T213154: /api/rest_v1/page/pdf/* service unstable - https://phabricator.wikimedia.org/T213154 [11:35:06] !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad] [11:35:07] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:07] !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed [11:35:08] !log akosiaris@deploy1001 scap-helm zotero finished [11:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:05] !log akosiaris@deploy1001 scap-helm zotero upgrade production -f zoterov2-values-eqiad.yaml stable/zotero [namespace: zotero, clusters: eqiad] [11:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:06] !log akosiaris@deploy1001 scap-helm zotero cluster eqiad completed [11:36:06] !log akosiaris@deploy1001 scap-helm zotero finished [11:36:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:36:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:37:50] (03PS2) 10Nuria: Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [11:38:42] (03CR) 10Elukey: "With this last change the pcc looks strange:" [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: 10Elukey) [11:38:53] (03PS1) 10Volans: CHANGELOG: add changelogs for release v0.0.11 [software/spicerack] - 10https://gerrit.wikimedia.org/r/482788 [11:40:10] * mobrovac will take over deploy1001 for 5 mins to get https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/482722/ out [11:40:51] (03PS2) 10Mobrovac: Increase MW -> EventBus service HTTP request timeout. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/482722 (https://phabricator.wikimedia.org/T204183) (owner: 10Ppchelko) [11:40:54] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_1969: Servers kubernetes2001.codfw.wmnet, kubernetes2003.codfw.wmnet are marked down but pooled [11:42:02] (03PS2) 10Volans: CHANGELOG: add changelogs for release v0.0.11 [software/spicerack] - 10https://gerrit.wikimedia.org/r/482788 [11:42:33] (03CR) 10Mobrovac: [C: 03+2] Increase MW -> EventBus service HTTP request timeout. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482722 (https://phabricator.wikimedia.org/T204183) (owner: 10Ppchelko) [11:43:39] (03Merged) 10jenkins-bot: Increase MW -> EventBus service HTTP request timeout. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482722 (https://phabricator.wikimedia.org/T204183) (owner: 10Ppchelko) [11:44:30] 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: base::monitoring::host's alarm dashboard links are broken - https://phabricator.wikimedia.org/T213052 (10Volans) 05Open→03Resolved It should be resolved with the above patch, feel free to reopen if not. [11:44:45] (03CR) 10jenkins-bot: Increase MW -> EventBus service HTTP request timeout. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482722 (https://phabricator.wikimedia.org/T204183) (owner: 10Ppchelko) [11:46:06] !log mobrovac@deploy1001 Synchronized wmf-config/CommonSettings.php: Increase time out on the MW side to 60s - T204183 (duration: 00m 51s) [11:46:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:46:11] T204183: EventBus error "Unable to deliver all events: (curl error: 28) Timeout was reached" - https://phabricator.wikimedia.org/T204183 [11:46:20] * mobrovac is done [11:47:11] (03PS1) 10Fsero: Adding docker_registry_ha secrets for PCC [labs/private] - 10https://gerrit.wikimedia.org/r/482789 [11:48:27] ^ is a small change anyone want to review it? 
[11:48:41] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v0.0.11 [software/spicerack] - 10https://gerrit.wikimedia.org/r/482788 (owner: 10Volans) [11:49:13] (03CR) 10Filippo Giunchedi: [C: 03+1] Adding docker_registry_ha secrets for PCC [labs/private] - 10https://gerrit.wikimedia.org/r/482789 (owner: 10Fsero) [11:49:16] !log mobrovac@deploy1001 Started deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource [11:49:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:24] fsero: done, though usually labs/private is ok to self-merge IMO [11:49:33] fsero: usually no need for review for the labs/private repo if you're just replicating prod hiera [11:49:55] fsero: pro-tip: you need to both C+2, V+2 and submit to merge as there is no CI in that repo ;) [11:50:06] (03CR) 10Fsero: [V: 03+2 C: 03+2] "thanks Filippo!" [labs/private] - 10https://gerrit.wikimedia.org/r/482789 (owner: 10Fsero) [11:50:34] (03PS1) 10Giuseppe Lavagetto: systemd: introduce timer::job define [puppet] - 10https://gerrit.wikimedia.org/r/482790 [11:50:36] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::webserver: inline mediawiki::conftool [puppet] - 10https://gerrit.wikimedia.org/r/482791 [11:50:38] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - 10https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) [11:50:40] (03PS1) 10Giuseppe Lavagetto: profile::mediawiki::maintenance: migrate tor job to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/482793 (https://phabricator.wikimedia.org/T211250) [11:50:55] 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10mobrovac) ping @Cmjohnson [11:51:01] (03CR) 10jerkins-bot: [V: 04-1] systemd: introduce timer::job define [puppet] - 10https://gerrit.wikimedia.org/r/482790 (owner: 10Giuseppe 
Lavagetto) [11:53:43] ACKNOWLEDGEMENT - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_1969: Servers kubernetes2001.codfw.wmnet, kubernetes2003.codfw.wmnet are marked down but pooled alexandros kosiaris known, looking into it [11:53:49] ACKNOWLEDGEMENT - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds alexandros kosiaris known, looking into it [11:54:09] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.11 [software/spicerack] - 10https://gerrit.wikimedia.org/r/482788 (owner: 10Volans) [11:55:12] (03CR) 10jenkins-bot: CHANGELOG: add changelogs for release v0.0.11 [software/spicerack] - 10https://gerrit.wikimedia.org/r/482788 (owner: 10Volans) [11:56:21] (03PS3) 10Fdans: Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) [11:56:30] (03CR) 10Fdans: Changes wording in uniques dump link to reflect project families (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [11:56:34] (03PS1) 10Fsero: forgot registry shared_secret secret [labs/private] - 10https://gerrit.wikimedia.org/r/482794 [11:58:26] (03CR) 10Fsero: [V: 03+2 C: 03+2] forgot registry shared_secret secret [labs/private] - 10https://gerrit.wikimedia.org/r/482794 (owner: 10Fsero) [11:59:48] (03PS2) 10Giuseppe Lavagetto: jobrunner: support php7 [puppet] - 10https://gerrit.wikimedia.org/r/481866 [12:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for European Mid-day SWAT(Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1200). [12:00:04] dcausse and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. 
Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [12:00:30] o/ [12:00:33] I can swat today [12:00:41] dcausse: go ahead while I get ready [12:00:45] sure [12:01:21] * Urbanecm waves [12:01:26] (03CR) 10Effie Mouzeli: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14205/mwmaint1002.eqiad.wmnet/ Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: 10Thifranc) [12:01:54] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@503b29c] (dev-cluster): Add test-commons and nap.wikisource (duration: 12m 38s) [12:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:01:59] (03CR) 10Effie Mouzeli: [C: 03+2] puppet:Reduce cronspam from modules/mediawiki/ [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: 10Thifranc) [12:02:02] (03CR) 10DCausse: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482082 (https://phabricator.wikimedia.org/T212768) (owner: 10DCausse) [12:02:12] (03PS7) 10Effie Mouzeli: puppet:Reduce cronspam from modules/mediawiki/ [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: 10Thifranc) [12:03:13] (03Merged) 10jenkins-bot: [cirrus] re-enable HHVM connection pooling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482082 (https://phabricator.wikimedia.org/T212768) (owner: 10DCausse) [12:05:27] Urbanecm: there's an extra G here?
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/482586 [12:05:32] looking [12:05:38] (according to ZoranZoki) [12:06:26] Zoran's right, fixing [12:06:34] cool [12:07:38] (03PS2) 10Urbanecm: New throttle rule for University of Southern California editathon and clean old rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) [12:07:42] zeljkof, ^^^^ that's it ^^^^ [12:07:42] !log dcausse@deploy1001 Synchronized wmf-config/CirrusSearch-production.php: T212768 [cirrus] re-enable HHVM connection pooling (duration: 00m 45s) [12:07:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:47] T212768: Completion suggester: TP50 increased from 9ms to 24ms - https://phabricator.wikimedia.org/T212768 [12:07:48] (03CR) 10Giuseppe Lavagetto: [C: 03+1] "https://puppet-compiler.wmflabs.org/compiler1002/14207/mw1300.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/481866 (owner: 10Giuseppe Lavagetto) [12:07:49] (03CR) 10jerkins-bot: [V: 04-1] New throttle rule for University of Southern California editathon and clean old rules [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) (owner: 10Urbanecm) [12:08:10] ehh, merge conflict, fixing [12:08:16] 10Operations, 10Patch-For-Review: Tracking and Reducing cron-spam to root@ - https://phabricator.wikimedia.org/T132324 (10jijiki) [12:08:20] 10Operations, 10Commons, 10MediaWiki-File-management, 10MediaWiki-Maintenance-scripts, and 3 others: cronspam cleanup: Cron /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null - https://phabricator.wikimedia.org/T150375 (10jijiki) 05Open→03Resolved a:03jijiki... [12:08:35] zeljkof: I'm done [12:08:49] dcausse: great! I'll continue with swat [12:09:15] (03PS8) 10Fsero: Initial docker::registry::ha puppetization. 
[puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [12:09:35] (03PS3) 10Urbanecm: New throttle rule for University of Southern California editathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) [12:09:36] ^^ finally done ^^ [12:10:29] Urbanecm: :) [12:10:48] (03CR) 10jenkins-bot: [cirrus] re-enable HHVM connection pooling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482082 (https://phabricator.wikimedia.org/T212768) (owner: 10DCausse) [12:11:42] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) (owner: 10Urbanecm) [12:12:47] (03Merged) 10jenkins-bot: New throttle rule for University of Southern California editathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) (owner: 10Urbanecm) [12:13:55] Urbanecm: 481240 now has merge conflict [12:13:56] 10Operations, 10Traffic, 10Wikidata, 10wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (10MasinAlDujailiWMDE) wikiba.se is now authoritatively nameserved by ns[012].wikimedia.org. Well, at least that's what I told the DNS. 
[12:13:59] fixing [12:14:08] !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:482586|New throttle rule for University of Southern California editathon (T212917)]] (duration: 00m 45s) [12:14:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:11] T212917: Request for temporary lift of IP cap for account creation on 2019-01-15 - https://phabricator.wikimedia.org/T212917 [12:14:15] Urbanecm: 482586 deployed [12:14:19] ack [12:14:53] (03PS1) 10Fsero: redis_password is not fetched moving it [labs/private] - 10https://gerrit.wikimedia.org/r/482796 [12:15:08] (03CR) 10Fsero: [V: 03+2 C: 03+2] redis_password is not fetched moving it [labs/private] - 10https://gerrit.wikimedia.org/r/482796 (owner: 10Fsero) [12:15:34] (03PS2) 10Zfilipin: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294) (owner: 10Urbanecm) [12:16:36] (03PS2) 10Urbanecm: New throttle rule for students writing Wikipedia program [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481240 (https://phabricator.wikimedia.org/T212226) [12:16:37] ^^ fixed ^^ [12:16:40] zeljkof, [12:16:48] Urbanecm: do I need to run a script after 481663? [12:17:02] Yes, namespaceDupes.php [12:17:17] could you please add a note to gerrit, so I don't forget? [12:17:32] will do [12:17:55] (03CR) 10Urbanecm: "Please run namespaceDupes.php after deploying." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294) (owner: 10Urbanecm) [12:18:02] ^^ is this enough? 
^^ [12:18:17] Urbanecm: it is [12:18:22] good [12:18:48] Urbanecm: a bit confused, why is from and to time in different timezone utc/gmt https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/481240 [12:19:04] well, I guess it's the same timezone, but still a bit confusing :) [12:19:18] probably because I was sleepy when committing [12:19:25] :D [12:19:36] could you please pick one? [12:19:49] also, the commit message needs update, there is no more cleanup, right? [12:20:07] (03PS3) 10Urbanecm: New throttle rule for students writing Wikipedia program [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481240 (https://phabricator.wikimedia.org/T212226) [12:20:11] you're right, fixed both things ^ [12:20:52] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481240 (https://phabricator.wikimedia.org/T212226) (owner: 10Urbanecm) [12:21:55] (03Merged) 10jenkins-bot: New throttle rule for students writing Wikipedia program [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481240 (https://phabricator.wikimedia.org/T212226) (owner: 10Urbanecm) [12:23:03] !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:481240|New throttle rule for students writing Wikipedia program (T212226)]] (duration: 00m 44s) [12:23:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:23:06] T212226: Temporary lift of IP cap on en.wikipedia for 29 Jan 2019 - https://phabricator.wikimedia.org/T212226 [12:23:23] Urbanecm: 481240 deployed [12:23:25] thx [12:23:48] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294) (owner: 10Urbanecm) [12:24:05] (03CR) 10jenkins-bot: New throttle rule for University of Southern California editathon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482586 (https://phabricator.wikimedia.org/T212917) (owner: 10Urbanecm) [12:24:06] (03CR) 10jenkins-bot: New throttle rule 
for students writing Wikipedia program [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481240 (https://phabricator.wikimedia.org/T212226) (owner: 10Urbanecm) [12:24:56] (03Merged) 10jenkins-bot: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294) (owner: 10Urbanecm) [12:25:09] (03CR) 10jenkins-bot: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294) (owner: 10Urbanecm) [12:26:15] Urbanecm: 481663 is at mwdebug1002 [12:26:20] thx, testing [12:26:31] (03PS1) 10Fsero: move eqiad and codfw values into role/site [labs/private] - 10https://gerrit.wikimedia.org/r/482798 [12:26:50] (03CR) 10Fsero: [V: 03+2 C: 03+2] move eqiad and codfw values into role/site [labs/private] - 10https://gerrit.wikimedia.org/r/482798 (owner: 10Fsero) [12:27:04] (03CR) 10Nuria: [C: 03+1] Changes wording in uniques dump link to reflect project families [puppet] - 10https://gerrit.wikimedia.org/r/482787 (https://phabricator.wikimedia.org/T168477) (owner: 10Fdans) [12:27:18] zeljkof, it works, please deploy [12:28:24] ok [12:29:18] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481663|Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki (T211294)]] (duration: 00m 45s) [12:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:29:21] T211294: Please translate the "Wikipedia" namespace in Santali for Santali Wikipedia - https://phabricator.wikimedia.org/T211294 [12:31:16] Urbanecm: deployed, running script [12:31:19] thx [12:32:49] Urbanecm: no problems with scripts [12:32:52] thx [12:33:34] (03PS2) 10Zfilipin: Allow ptwiki's bureaucrats to grant/revoke rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735) 
(owner: 10Urbanecm) [12:34:35] Urbanecm: should task number be added to the line 11818? https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/481662 [12:34:50] task number is added to line 11035 [12:35:12] thx, fixing [12:37:17] (03PS3) 10Urbanecm: Allow ptwiki's bureaucrats to grant/revoke rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735) [12:37:17] ^^^ fixed ^^^ zeljkof [12:38:28] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735) (owner: 10Urbanecm) [12:38:35] Urbanecm: looks good :) [12:38:38] thanks [12:39:33] (03Merged) 10jenkins-bot: Allow ptwiki's bureaucrats to grant/revoke rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735) (owner: 10Urbanecm) [12:39:49] (03CR) 10Ema: [C: 03+1] "pcc output looks good https://puppet-compiler.wmflabs.org/compiler1002/14211/cp3030.esams.wmnet/, patch tested successfully on cp1008." 
[puppet] - 10https://gerrit.wikimedia.org/r/482666 (https://phabricator.wikimedia.org/T209590) (owner: 10Vgutierrez) [12:39:50] !log akosiaris@deploy1001 scap-helm zotero upgrade production2 -f zoterov2-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw] [12:39:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:42:13] Urbanecm: 481662 is at mwdebug1002 [12:42:44] (03CR) 10Vgutierrez: [C: 03+2] tlsproxy: Set http2_max_field_size to 8k [puppet] - 10https://gerrit.wikimedia.org/r/482666 (https://phabricator.wikimedia.org/T209590) (owner: 10Vgutierrez) [12:42:52] (03PS2) 10Vgutierrez: tlsproxy: Set http2_max_field_size to 8k [puppet] - 10https://gerrit.wikimedia.org/r/482666 (https://phabricator.wikimedia.org/T209590) [12:42:58] looking [12:43:27] zeljkof, working, please deploy [12:43:50] ok [12:44:55] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481662|Allow ptwikis bureaucrats to grant/revoke rollbacker user group (T212735)]] (duration: 00m 45s) [12:44:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:58] T212735: ptwikipedia: Allow bureaucrats to grant and remove rollbacker usergroup - https://phabricator.wikimedia.org/T212735 [12:45:27] Urbanecm: deployed [12:45:29] thx [12:45:37] !log akosiaris@deploy1001 scap-helm zotero install --name production2 -f zotero-values-codfw.yaml stable/zotero [namespace: zotero, clusters: codfw] [12:45:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:38] !log akosiaris@deploy1001 scap-helm zotero cluster codfw completed [12:45:38] !log akosiaris@deploy1001 scap-helm zotero finished [12:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:02] Urbanecm: merge conflict on the last one, 481107 [12:46:09] fixing [12:48:09] (03PS2) 10Urbanecm: Give all 
users (including IPs) the pagequality right in plwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481107 (https://phabricator.wikimedia.org/T212478) [12:48:28] fixed zeljkof [12:48:35] Urbanecm: thanks! [12:48:53] yw [12:49:46] (03CR) 10Zfilipin: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481107 (https://phabricator.wikimedia.org/T212478) (owner: 10Urbanecm) [12:50:26] (03CR) 10jenkins-bot: Allow ptwiki's bureaucrats to grant/revoke rollbacker user group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735) (owner: 10Urbanecm) [12:51:08] (03Merged) 10jenkins-bot: Give all users (including IPs) the pagequality right in plwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481107 (https://phabricator.wikimedia.org/T212478) (owner: 10Urbanecm) [12:51:22] (03CR) 10jenkins-bot: Give all users (including IPs) the pagequality right in plwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481107 (https://phabricator.wikimedia.org/T212478) (owner: 10Urbanecm) [12:52:12] (03PS9) 10Fsero: Initial docker::registry::ha puppetization. [puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [12:52:24] Urbanecm: 481107 is at mwdebug [12:54:09] PROBLEM - PyBal backends health check on lvs1006 is CRITICAL: PYBAL CRITICAL - CRITICAL - zotero_1969: Servers kubernetes1002.eqiad.wmnet, kubernetes1004.eqiad.wmnet are marked down but pooled [12:54:34] (03PS10) 10Fsero: Initial docker::registry::ha puppetization. [puppet] - 10https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) [12:55:18] Urbanecm: still around? do you need more time to test? 
[12:55:28] ah, sorry, missed the "do test" msg [12:55:29] looking [12:56:00] zeljkof, works, please deploy [12:56:47] Urbanecm: ok [12:57:37] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:481107|Give all users (including IPs) the pagequality right in plwikisource (T212478)]] (duration: 00m 45s) [12:57:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:43] (CR) Fsero: "PCC diff https://puppet-compiler.wmflabs.org/compiler1002/14213/registry1001.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/482675 (https://phabricator.wikimedia.org/T210076) (owner: Fsero) [12:57:43] T212478: Give all users (including IPs) the pagequality right in plwikisource - https://phabricator.wikimedia.org/T212478 [12:58:09] Urbanecm: deployed! [12:58:12] thanks [12:58:18] (CR) Esanders: "> Also how do i get it so the header is not focused too much?" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox) [12:58:20] thanks for deploying with #releng ;) [12:59:09] !log transfering db1102:s5 mariadb datadir to db1082 [12:59:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:01] (PS2) Mforns: Bump up refinery_version in refine.pp to v0.0.83 [puppet] - https://gerrit.wikimedia.org/r/482727 [13:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1300) [13:02:11] !log EU SWAT finished [13:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:37] (PS1) Volans: Upstream release v0.0.11 [software/spicerack] (debian) - https://gerrit.wikimedia.org/r/482804 [13:09:30] Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Release-Engineering-Team (Kanban): Grant sudo access for CI admins to doc.wikimedia.org publishing user - https://phabricator.wikimedia.org/T213169 (hashar) p:Triage→Normal [13:10:24] (PS2)
Hashar: doc: grant doc-uploader access to contint users [puppet] - https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) [13:16:00] (PS7) Hashar: swift: lower replication interval for beta [puppet] - https://gerrit.wikimedia.org/r/344387 (https://phabricator.wikimedia.org/T160990) [13:17:19] RECOVERY - PyBal backends health check on lvs1006 is OK: PYBAL OK - All pools are healthy [13:19:14] RECOVERY - LVS HTTP IPv4 on zotero.svc.codfw.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 138 bytes in 0.089 second response time [13:19:59] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy [13:20:07] (CR) Hashar: "Rebased. It had a trivial conflict in hieradata/labs/deployment-prep/common.yaml" [puppet] - https://gerrit.wikimedia.org/r/344387 (https://phabricator.wikimedia.org/T160990) (owner: Hashar) [13:33:16] (PS1) Alexandros Kosiaris: Remove zoterov2 RRs [dns] - https://gerrit.wikimedia.org/r/482807 [13:33:46] (PS1) Alexandros Kosiaris: citoid: Move back to using zotero.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/482808 [13:33:48] (PS1) Alexandros Kosiaris: sca: Remove the cluster from conftool [puppet] - https://gerrit.wikimedia.org/r/482809 [13:33:50] (PS1) Alexandros Kosiaris: lvs: Remove all mentions of zoterov2 [puppet] - https://gerrit.wikimedia.org/r/482810 [13:39:35] (CR) Alexandros Kosiaris: "Now that the new version/infra has proven as functional and stable as the old one ;), let's clear up the migration steps and use the zoter" [puppet] - https://gerrit.wikimedia.org/r/482808 (owner: Alexandros Kosiaris) [13:41:28] (PS2) Alexandros Kosiaris: citoid: Move back to using zotero.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/482808 [13:41:29] (PS2) Alexandros Kosiaris: sca: Remove the cluster from conftool [puppet] - https://gerrit.wikimedia.org/r/482809 [13:41:31] (PS2) Alexandros Kosiaris: lvs: Remove all
mentions of zoterov2 [puppet] - https://gerrit.wikimedia.org/r/482810 [13:43:35] (CR) Hashar: "Gave it a try locally via:" [puppet] - https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: Hashar) [13:48:06] (PS2) Hashar: hhvm: fix typo in RUN_AS_GROUP [puppet] - https://gerrit.wikimedia.org/r/474910 (https://phabricator.wikimedia.org/T209946) [13:48:08] (PS2) Hashar: hhvm: add basic specs [puppet] - https://gerrit.wikimedia.org/r/474915 [13:48:10] (PS2) Hashar: hhvm: test default file generation [puppet] - https://gerrit.wikimedia.org/r/474917 (https://phabricator.wikimedia.org/T209946) [13:48:12] (PS4) Hashar: ci: add some gated extensions to git cache [puppet] - https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) [13:48:29] (CR) Hashar: "That is one is for CI instances on WMCS :)" [puppet] - https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: Hashar) [13:48:48] (CR) Hashar: [V: +1] "(already cherry picked on the integration CI puppet master)." [puppet] - https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: Hashar) [13:55:52] (Abandoned) Brian Wolff: Add wikimedia.org to allowed source list for Mathoid [mediawiki-config] - https://gerrit.wikimedia.org/r/454180 (owner: Brian Wolff) [13:56:21] (PS2) Hashar: admin: test for absent users [puppet] - https://gerrit.wikimedia.org/r/482611 [13:56:24] (PS1) Hashar: admin: run tox suite for any changes in data dir [puppet] - https://gerrit.wikimedia.org/r/482812 [13:57:38] (CR) Hashar: "I have fixed the flake8 issues." [puppet] - https://gerrit.wikimedia.org/r/482611 (owner: Hashar) [13:58:36] (CR) Hashar: [V: +1] "Found while running tests for https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482611/ .
The change touches modules/admin/data/data" [puppet] - https://gerrit.wikimedia.org/r/482812 (owner: Hashar) [14:00:04] Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1400) [14:01:23] (PS5) Hashar: zuul: allow email connection [puppet] - https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) [14:01:34] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: Hashar) [14:02:09] (PS3) Giuseppe Lavagetto: jobrunner: support php7 [puppet] - https://gerrit.wikimedia.org/r/481866 [14:02:11] (CR) Hashar: "Rebased. I have added Hosts header in the commit message for the puppet compiler." [puppet] - https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: Hashar) [14:04:33] (CR) Hashar: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1001/92/ which looks good to me." [puppet] - https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: Hashar) [14:07:35] (Abandoned) Hashar: Jenkins job validation (DO NOT SUBMIT) [debs/php-excimer] (debian/stretch-wikimedia) - https://gerrit.wikimedia.org/r/481612 (owner: Hashar) [14:09:42] Operations, media-storage, Patch-For-Review, User-fgiunchedi: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (fgiunchedi) >>! In T209618#4854119, @Dzahn wrote: > @Cmjohnson @fgiunchedi > > There are 2 new Icinga alerts saying that on ms-be1044 and ms-b... [14:09:52] Operations, media-storage, Patch-For-Review, User-fgiunchedi: rack/setup/install ms-be10[44-50].eqiad.wmnet - https://phabricator.wikimedia.org/T209618 (fgiunchedi) Open→Resolved All new hosts in service, resolving.
[14:18:59] (CR) Paladox: [C: +1] "LGTM" [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/482775 (https://phabricator.wikimedia.org/T101131) (owner: Hashar) [14:27:36] (PS3) Arturo Borrero Gonzalez: wmcs: Add postgres maps users for eqiad1-r region [puppet] - https://gerrit.wikimedia.org/r/481341 (https://phabricator.wikimedia.org/T212596) (owner: BryanDavis) [14:28:15] akosiaris: could we merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481341/ ? [14:30:24] Operations, ops-codfw, DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T212966 (Marostegui) Open→Resolved Thanks! ` root@db2047:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337E0DB0) Port Name: 1I Port Name: 2I Gen... [14:30:37] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui) [14:30:39] (CR) Gehel: [C: -1] wdqs: prefix exporter with wdqs_updater_ (1 comment) [puppet] - https://gerrit.wikimedia.org/r/479395 (https://phabricator.wikimedia.org/T208215) (owner: Mathew.onipe) [14:43:01] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) Moving this to "backlog" for now. This should have happened right after last week's TechCom meeting, but I forgot to do... [14:44:25] (PS1) Marostegui: db-codfw.php: Depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/482814 (https://phabricator.wikimedia.org/T212833) [14:45:59] (CR) Elukey: [C: +1] "Left a couple of comments but it looks good to me, didn't test the httpd config though (but it seems very fine to me)."
(3 comments) [puppet] - https://gerrit.wikimedia.org/r/481866 (owner: Giuseppe Lavagetto) [14:50:45] (CR) Elukey: "> With this last change the pcc looks strange:" [puppet] - https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) (owner: Elukey) [14:51:04] Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Patch-For-Review, Performance-Team (Radar): Add krinkle to contint-docker group - https://phabricator.wikimedia.org/T213015 (hashar) The group comes from T182860 and https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/... [14:54:00] hashar: is the train done? [14:54:25] (CR) Giuseppe Lavagetto: jobrunner: support php7 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/481866 (owner: Giuseppe Lavagetto) [14:54:38] (PS4) Giuseppe Lavagetto: jobrunner: support php7 [puppet] - https://gerrit.wikimedia.org/r/481866 [14:55:24] (PS1) Joal: Update AQS druid datasource to new snapshot [puppet] - https://gerrit.wikimedia.org/r/482816 [14:56:59] (CR) Giuseppe Lavagetto: [C: +2] jobrunner: support php7 [puppet] - https://gerrit.wikimedia.org/r/481866 (owner: Giuseppe Lavagetto) [14:57:00] Operations, monitoring, Patch-For-Review: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (Marostegui) Thank you! [14:57:56] (CR) Elukey: [C: +2] Update AQS druid datasource to new snapshot [puppet] - https://gerrit.wikimedia.org/r/482816 (owner: Joal) [14:58:00] (PS2) Elukey: Update AQS druid datasource to new snapshot [puppet] - https://gerrit.wikimedia.org/r/482816 (owner: Joal) [14:59:05] Operations, TechCom-RFC, Wikidata, Wikidata-Termbox-Hike, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (daniel) Did the chat with @joe happen? What was the outcome? [14:59:25] marostegui: nop.
Will be done later tonight by the USA folks :) [15:04:27] hashar: that means I can deploy? [15:04:30] !log Restarted CI Jenkins [15:04:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:04:35] marostegui: I guess so !! :) [15:05:30] thanks! [15:08:01] (CR) Marostegui: [C: +2] db-codfw.php: Depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/482814 (https://phabricator.wikimedia.org/T212833) (owner: Marostegui) [15:09:06] (Merged) jenkins-bot: db-codfw.php: Depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/482814 (https://phabricator.wikimedia.org/T212833) (owner: Marostegui) [15:10:12] (PS1) CDanis: Revert "grafana-old.wikimedia.org DNS points to text caches" [dns] - https://gerrit.wikimedia.org/r/482818 (https://phabricator.wikimedia.org/T211712) [15:10:15] (PS6) Marostegui: Recommendation API: increase mysql connection limit for service [puppet] - https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T205294) (owner: Bmansurov) [15:10:22] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Depool es2019 - T212833 (duration: 00m 44s) [15:10:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:10:25] T212833: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 [15:11:05] (CR) CDanis: [C: +2] Revert "grafana-old.wikimedia.org DNS points to text caches" [dns] - https://gerrit.wikimedia.org/r/482818 (https://phabricator.wikimedia.org/T211712) (owner: CDanis) [15:12:22] Operations, ops-codfw, DBA, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) I have depooled es2019 so it is ready to be powered off once @Papaul is ready for it [15:13:15] (CR) Marostegui: [C: +2] Recommendation API: increase mysql connection limit for service [puppet] - https://gerrit.wikimedia.org/r/481871 (https://phabricator.wikimedia.org/T205294)
(owner: Bmansurov) [15:14:48] (PS1) CDanis: Revert "add hiera for grafana-old.w.o pointing to krypton" [puppet] - https://gerrit.wikimedia.org/r/482819 (https://phabricator.wikimedia.org/T211712) [15:15:25] (CR) CDanis: [C: +2] Revert "add hiera for grafana-old.w.o pointing to krypton" [puppet] - https://gerrit.wikimedia.org/r/482819 (https://phabricator.wikimedia.org/T211712) (owner: CDanis) [15:17:31] !log Increase connections from 10 to 50 for recommendationapiservice on m2 - T212154 [15:17:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:17:34] T212154: Recommendation API exceeds max_user_connections in MySQL - https://phabricator.wikimedia.org/T212154 [15:17:38] (CR) jenkins-bot: db-codfw.php: Depool es2019 [mediawiki-config] - https://gerrit.wikimedia.org/r/482814 (https://phabricator.wikimedia.org/T212833) (owner: Marostegui) [15:18:26] (PS2) Nuria: Setting default config to monthly for monthly datasets [puppet] - https://gerrit.wikimedia.org/r/482336 (https://phabricator.wikimedia.org/T209103) [15:18:39] Operations, Cloud-Services, Wikibase-Quality, Wikibase-Quality-Constraints, and 4 others: Flood of WDQS requests from wbqc - https://phabricator.wikimedia.org/T204267 (Addshore) >>! In T204267#4652483, @Smalyshev wrote: > Happened again, bumping the priority. The spike can be seen here https:/... [15:19:24] (PS3) Elukey: Setting default config to monthly for monthly datasets [puppet] - https://gerrit.wikimedia.org/r/482336 (https://phabricator.wikimedia.org/T209103) (owner: Nuria) [15:19:51] Operations, Release Pipeline, Core Platform Team Backlog (Watching / External), Release-Engineering-Team (Watching / External), Services (watching): Revisit the logging work done on Q1 2017-2018 for the standard pod setup - https://phabricator.wikimedia.org/T207200 (fgiunchedi) >>! In T207200...
[15:22:05] Operations, ops-codfw, DBA, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) a:Banyek→Papaul Assigning to @Papaul as per our chat [15:23:58] !log briefly stop carbon daemons on graphite1004 to move /srv/whisper -> /srv/carbon/whisper [15:23:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:24:58] Operations, Release Pipeline, Core Platform Team Backlog (Watching / External), Release-Engineering-Team (Watching / External), Services (watching): Revisit the logging work done on Q1 2017-2018 for the standard pod setup - https://phabricator.wikimedia.org/T207200 (fselles) @fgiunchedi i do... [15:27:38] PROBLEM - carbon-cache@g service on graphite1004 is CRITICAL: CRITICAL - Expecting active but unit carbon-cache@g is inactive [15:27:45] (CR) Nuria: [C: +1] Update AQS druid datasource to new snapshot [puppet] - https://gerrit.wikimedia.org/r/482816 (owner: Joal) [15:29:14] (CR) Elukey: [C: +2] Setting default config to monthly for monthly datasets [puppet] - https://gerrit.wikimedia.org/r/482336 (https://phabricator.wikimedia.org/T209103) (owner: Nuria) [15:30:54] RECOVERY - carbon-cache@g service on graphite1004 is OK: OK - carbon-cache@g is active [15:32:35] !log Stop MySQL on es2019 for upgrade - T212833 [15:32:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:37] T212833: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 [15:33:30] PROBLEM - PHP7 rendering on mw2152 is CRITICAL: HTTP CRITICAL: HTTP/1.1 403 Forbidden - header X-Powered-By: PHP/7.
not found on http://10.192.32.40:9005/w/health-check.php - 391 bytes in 0.073 second response time [15:37:19] (CR) Herron: "> One possible attack is to run a job/deployment/replicateset that" [puppet] - https://gerrit.wikimedia.org/r/379239 (https://phabricator.wikimedia.org/T175964) (owner: Herron) [15:40:12] PROBLEM - Request latencies on acrux is CRITICAL: instance=10.192.0.93:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:40:26] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:41:28] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:41:54] PROBLEM - Request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:42:01] (CR) Hashar: [C: +1] "So I think this one is good to go :-]" [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/480533 (https://phabricator.wikimedia.org/T210438) (owner: Hashar) [15:42:01] mmm not good [15:42:40] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:42:50] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:43:07] (PS1) Elukey: profile::hive::site_hdfs: move exec to cdh::exec [puppet] - https://gerrit.wikimedia.org/r/482823 [15:44:04] (CR) Hashar: "Should be good to go. Breaking change being that it defaults to pulling images but that can be disabled by passing --no-pull." [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/475843 (https://phabricator.wikimedia.org/T200720) (owner: Hashar) [15:44:18] RECOVERY - Request latencies on acrab is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:44:58] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler1002/14215/" [puppet] - https://gerrit.wikimedia.org/r/482823 (owner: Elukey) [15:45:12] !log Drop valid_tag table from s2 - T212254 [15:45:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:15] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [15:45:37] (PS1) Giuseppe Lavagetto: profile::mediawiki::jobrunner: actually allow requests [puppet] - https://gerrit.wikimedia.org/r/482825 [15:45:53] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::jobrunner: actually allow requests [puppet] - https://gerrit.wikimedia.org/r/482825 (owner: Giuseppe Lavagetto) [15:46:44] (CR) Giuseppe Lavagetto: [V: +2 C: +2] profile::mediawiki::jobrunner: actually allow requests [puppet] - https://gerrit.wikimedia.org/r/482825 (owner: Giuseppe Lavagetto) [15:46:53] (PS2) Giuseppe Lavagetto: profile::mediawiki::jobrunner: actually allow requests [puppet] - https://gerrit.wikimedia.org/r/482825 [15:47:09] (CR) Giuseppe Lavagetto: [V: +2 C: +2] profile::mediawiki::jobrunner: actually allow requests [puppet] - https://gerrit.wikimedia.org/r/482825 (owner: Giuseppe Lavagetto) [15:47:18] jouncebot: now [15:47:18] For the next 0 hour(s) and 12 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1400) [15:47:26] jouncebot: next [15:47:26] In 1 hour(s) and 12 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1700) [15:47:26] RECOVERY - Request latencies on acrux is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/kubernetes-api [15:50:20] (PS1) Mforns: Switch on RefineMonitor for Analytics EventLoggingSanitization [puppet] - https://gerrit.wikimedia.org/r/482826 (https://phabricator.wikimedia.org/T202429) [15:50:38] (CR) GTirloni: [C: +2] "> Patch Set 3:" [puppet] - https://gerrit.wikimedia.org/r/379239 (https://phabricator.wikimedia.org/T175964) (owner: Herron) [15:52:42] (PS1) Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/482827 (https://phabricator.wikimedia.org/T86338) [15:54:13] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/482827 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [15:54:19] (CR) Mobrovac: [C: +1] citoid: Move back to using zotero.svc.eqiad.wmnet [puppet] - https://gerrit.wikimedia.org/r/482808 (owner: Alexandros Kosiaris) [15:55:17] (Merged) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/482827 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [15:55:31] (CR) jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - https://gerrit.wikimedia.org/r/482827 (https://phabricator.wikimedia.org/T86338) (owner: Marostegui) [15:56:16] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1089 T86338 T202167 (duration: 00m 45s) [15:56:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:20] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [15:56:21] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [15:59:55] !log Deploy schema change on db1089 T86338 T202167 [15:59:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:10] !log mobrovac@deploy1001 Started deploy [restbase/deploy@503b29c]: Add test-commons and
nap.wikisource - T210752 T197616 [16:00:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:14] T210752: Create Wikisource Neapolitan - https://phabricator.wikimedia.org/T210752 [16:00:15] T197616: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 [16:00:39] (PS1) Giuseppe Lavagetto: profile::mediawiki::jobrunner: fix RewriteRule [puppet] - https://gerrit.wikimedia.org/r/482828 [16:01:17] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) The meeting happened. @Krinkle confirmed that his questions had been answered. I imagined that he'd report to y'all. [16:01:27] (CR) Giuseppe Lavagetto: [V: +2 C: +2] profile::mediawiki::jobrunner: fix RewriteRule [puppet] - https://gerrit.wikimedia.org/r/482828 (owner: Giuseppe Lavagetto) [16:03:24] PROBLEM - Restbase root url on restbase1007 is CRITICAL: connect to address 10.64.0.223 and port 7231: Connection refused [16:04:15] !log Drop valid_tag table from s7 - T212254 [16:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:04:17] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [16:05:46] RECOVERY - Restbase root url on restbase1007 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.009 second response time [16:07:37] (PS1) Giuseppe Lavagetto: profile::mediawiki::jobrunner: also fix the other vhost [puppet] - https://gerrit.wikimedia.org/r/482829 [16:11:02] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) @faidon @RobH can we follow up with Dell to see what's going on a more formal way?
This server has been unusable since it arrived and it is brand new :-| [16:11:09] (CR) Giuseppe Lavagetto: [C: +2] profile::mediawiki::jobrunner: also fix the other vhost [puppet] - https://gerrit.wikimedia.org/r/482829 (owner: Giuseppe Lavagetto) [16:12:37] !log add BGP sessions to AS64050 in AMS-IX [16:12:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:13:16] (CR) Filippo Giunchedi: [C: +1] hiera: add certcentral cluster definition [puppet] - https://gerrit.wikimedia.org/r/482108 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite) [16:13:48] (PS2) Elukey: Switch on RefineMonitor for Analytics EventLoggingSanitization [puppet] - https://gerrit.wikimedia.org/r/482826 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [16:15:09] (PS1) Jbond42: add jbond user as part of onboarding Bug: T123456 [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T123456) [16:15:11] (CR) Welcome, new contributor!: "Thank you for making your first contribution to Wikimedia! :) To learn how to get your code changes reviewed faster and more likely to get" [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T123456) (owner: Jbond42) [16:16:53] (CR) Jcrespo: "Hi!"
[puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T123456) (owner: Jbond42) [16:17:53] (PS2) Cwhite: hiera: add certcentral cluster definition [puppet] - https://gerrit.wikimedia.org/r/482108 (https://phabricator.wikimedia.org/T210486) [16:19:04] (CR) Cwhite: [C: +2] hiera: add certcentral cluster definition [puppet] - https://gerrit.wikimedia.org/r/482108 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite) [16:19:10] (CR) Elukey: [C: +2] Switch on RefineMonitor for Analytics EventLoggingSanitization [puppet] - https://gerrit.wikimedia.org/r/482826 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [16:19:20] (PS3) Elukey: Switch on RefineMonitor for Analytics EventLoggingSanitization [puppet] - https://gerrit.wikimedia.org/r/482826 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [16:19:23] (CR) Elukey: [V: +2 C: +2] Switch on RefineMonitor for Analytics EventLoggingSanitization [puppet] - https://gerrit.wikimedia.org/r/482826 (https://phabricator.wikimedia.org/T202429) (owner: Mforns) [16:19:30] (CR) Jforrester: add jbond user as part of onboarding Bug: T123456 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T123456) (owner: Jbond42) [16:19:34] (PS3) Elukey: Bump up refinery_version in refine.pp to v0.0.83 [puppet] - https://gerrit.wikimedia.org/r/482727 (owner: Mforns) [16:20:05] Operations, ops-codfw, DBA, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Papaul) a:Papaul→Marostegui Update BIOS from 2.4.3 to 2.8.0 IDRAC from 2.40 to 2.61 system is power on [16:20:48] (CR) Elukey: [C: +2] Bump up refinery_version in refine.pp to v0.0.83 [puppet] - https://gerrit.wikimedia.org/r/482727 (owner: Mforns) [16:20:54] PROBLEM - Restbase root url on restbase1018 is CRITICAL: connect to address 10.64.48.97 and port 7231: Connection
refused [16:21:06] (PS2) Jbond42: add jbond user as part of onboarding Bug: T213079 [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) [16:21:36] (CR) jerkins-bot: [V: -1] add jbond user as part of onboarding Bug: T213079 [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42) [16:23:38] Operations, ops-codfw, DC-Ops, decommission, Patch-For-Review: decommission of restbase200[1-6] (lease return in December 2018) - https://phabricator.wikimedia.org/T211070 (Papaul) [16:23:55] (CR) Jbond42: "Have updated thanks for the pointers" [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42) [16:24:26] rb problem known ^ [16:24:40] (PS1) Giuseppe Lavagetto: profile::mediawiki::jobqueue: hopefully last fixup [puppet] - https://gerrit.wikimedia.org/r/482833 [16:25:06] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/482834 [16:25:13] (PS3) Jbond42: add jbond user as part of onboarding [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) [16:25:21] (CR) Muehlenhoff: add jbond user as part of onboarding (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42) [16:25:23] (CR) Giuseppe Lavagetto: [V: +2 C: +2] profile::mediawiki::jobqueue: hopefully last fixup [puppet] - https://gerrit.wikimedia.org/r/482833 (owner: Giuseppe Lavagetto) [16:25:27] (PS2) Cwhite: hiera: add wmcs cluster definition [puppet] - https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) [16:26:08] (CR) Jforrester: "Thanks!"
[puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42) [16:26:56] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/482834 (owner: Marostegui) [16:26:56] RECOVERY - Restbase root url on restbase1018 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.009 second response time [16:27:39] (PS1) Marostegui: Revert "db-codfw.php: Depool es2019" [mediawiki-config] - https://gerrit.wikimedia.org/r/482835 [16:27:45] (CR) Jbond42: "> Patch Set 2:" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: Jbond42) [16:27:56] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (RobH) >>! In T207258#4862960, @Marostegui wrote: > @faidon @RobH can we follow up with Dell to see what's going on a more formal way? This server has been unusable since it arrived and it is bran... [16:28:00] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/482834 (owner: Marostegui) [16:28:19] Operations, ops-eqiad, DBA: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Thank you!
[16:28:34] RECOVERY - PHP7 rendering on mw2152 is OK: HTTP OK: HTTP/1.1 200 OK - 320 bytes in 0.075 second response time [16:29:14] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1089 T86338 T202167 (duration: 00m 45s) [16:29:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:17] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [16:29:18] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [16:29:30] (CR) Marostegui: [C: +2] Revert "db-codfw.php: Depool es2019" [mediawiki-config] - https://gerrit.wikimedia.org/r/482835 (owner: Marostegui) [16:30:36] (Merged) jenkins-bot: Revert "db-codfw.php: Depool es2019" [mediawiki-config] - https://gerrit.wikimedia.org/r/482835 (owner: Marostegui) [16:32:54] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - https://gerrit.wikimedia.org/r/482834 (owner: Marostegui) [16:32:56] (CR) jenkins-bot: Revert "db-codfw.php: Depool es2019" [mediawiki-config] - https://gerrit.wikimedia.org/r/482835 (owner: Marostegui) [16:34:17] !log marostegui@deploy1001 Synchronized wmf-config/db-codfw.php: Repool es2019 - T212833 (duration: 02m 51s) [16:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:34:20] T212833: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 [16:34:54] Operations, ops-codfw, DBA: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702 (Marostegui) [16:34:57] Operations, ops-codfw, DBA, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) Open→Resolved Thank you! I have repooled the server!
[16:35:58] PROBLEM - Restbase root url on restbase1008 is CRITICAL: connect to address 10.64.32.178 and port 7231: Connection refused [16:38:11] (03CR) 10Muehlenhoff: [C: 04-1] add jbond user as part of onboarding (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: 10Jbond42) [16:40:50] (03PS4) 10Jbond42: add jbond user as part of onboarding [puppet] - 10https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) [16:41:56] PROBLEM - WDQS HTTP Port on wdqs1006 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - 387 bytes in 0.001 second response time [16:41:58] RECOVERY - Restbase root url on restbase1008 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.008 second response time [16:42:20] gehel, onimisionipe: FYI ^^^ wdqs1006 [16:44:23] Crap [16:44:44] (03PS1) 10Jbond42: address review comments [puppet] - 10https://gerrit.wikimedia.org/r/482838 [16:44:57] I'm looking into it [16:45:48] PROBLEM - Request latencies on acrab is CRITICAL: instance=10.192.16.26:6443 verb=CONNECT https://grafana.wikimedia.org/dashboard/db/kubernetes-api [16:48:01] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10bmansurov) @mobrovac @DarTar is on vacation. @Leila is a stand-in manager. Would her approval be good? 
[16:48:40] PROBLEM - Restbase root url on restbase1014 is CRITICAL: connect to address 10.64.48.133 and port 7231: Connection refused [16:48:58] PROBLEM - Restbase root url on restbase1013 is CRITICAL: connect to address 10.64.32.80 and port 7231: Connection refused [16:49:28] PROBLEM - Restbase root url on restbase1009 is CRITICAL: connect to address 10.64.48.110 and port 7231: Connection refused [16:50:29] (03PS1) 10Marostegui: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482839 (https://phabricator.wikimedia.org/T86338) [16:51:04] RECOVERY - Request latencies on acrab is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [16:52:16] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) >>! In T212945#4863061, @bmansurov wrote: > @mobrovac @DarTar is on vacation. @Leila is a stand-in manager. Would her a... [16:52:18] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482839 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [16:52:53] 10Operations, 10TechCom-RFC, 10Wikidata, 10Wikidata-Termbox-Hike, and 5 others: New Service Request: Wikidata Termbox SSR - https://phabricator.wikimedia.org/T212189 (10WMDE-leszek) It did (today, not on Monday though). I hope the outcome is I hope that @Joe and @akosiaris have a better understanding of wh... 
[16:53:12] RECOVERY - WDQS HTTP Port on wdqs1006 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.029 second response time [16:53:25] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482839 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [16:54:18] !log stopping s5 replication on labsdb1009/10/11 to prevent undoable mistakes [16:54:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:53] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1105:3311 T86338 T202167 (duration: 00m 44s) [16:54:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:57] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [16:54:57] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [16:54:58] !log Deploy schema change on db1105:3311 T86338 T202167 [16:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:55:36] RECOVERY - Restbase root url on restbase1009 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.011 second response time [16:55:42] PROBLEM - restbase endpoints health on restbase1013 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received [16:55:44] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10leila) approved. 
[16:55:44] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) timed out before a response was received: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was recei [16:55:44] a.org/v1/page/mobile-sections/{title}{/revision} (Get mobile-sections for a test page on enwiki) timed out before a response was received [16:55:48] RECOVERY - Restbase root url on restbase1014 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.013 second response time [16:56:05] !log changing db1124:s5 replication to db2066 [16:56:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:06] PROBLEM - restbase endpoints health on restbase1014 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) timed out before a response was received: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was recei [16:56:06] a.org/v1/page/mobile-sections/{title}{/revision} (Get mobile-sections for a test page on enwiki) timed out before a response was received [16:56:10] RECOVERY - Restbase root url on restbase1013 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.015 second response time [16:56:36] RECOVERY - restbase endpoints health on restbase1013 is OK: All endpoints are healthy [16:56:38] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy [16:56:50] !log forcing removal of restbase1016-a (host down way too long to salvage) -- T212418 [16:56:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:56:52] T212418: Memory error 
on restbase1016 - https://phabricator.wikimedia.org/T212418 [16:56:58] RECOVERY - MariaDB Slave IO: s5 on db1124 is OK: OK slave_io_state Slave_IO_Running: Yes [16:57:04] RECOVERY - restbase endpoints health on restbase1014 is OK: All endpoints are healthy [16:57:11] onimisionipe: how is it going with wdqs1006? (sorry, baby issue, was away for a bit) [16:57:59] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) [16:58:23] gehel: It's OK now. digging through logs to find some info. blazegraph ran into an error and restarted on its own [16:58:33] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1105:3311 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482839 (https://phabricator.wikimedia.org/T86338) (owner: 10Marostegui) [16:58:40] the restbase errors are probably related [16:58:58] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) @herron manager approval received. Has the request been discussed during the SRE weekly meeting? [17:00:04] godog and _joe_: Dear deployers, time to do the Puppet SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1700). [17:00:04] MaxSem and James_F: A patch you scheduled for Puppet SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [17:00:29] rb errors are known gehel :) [17:00:47] * James_F waves. [17:00:53] mobrovac: and not related to the wdqs hiccup? [17:01:16] nope gehel, they are due to a forced removal of a node and deployment [17:01:25] <_joe_> MaxSem, James_F are you around? [17:01:32] _joe_: I am.
[17:01:34] ok, nice to know I only have one problem to deal with! [17:02:00] gehel: found it [17:02:33] we've seen a similar issue before. A query with a particular UA string setting causing blazegraph to crash [17:02:54] It happened while you were doing baby business :) [17:03:16] onimisionipe: I expect that what is actually crashing blazegraph is the query itself, not the UA :) [17:03:21] onimisionipe: do we have a task for it? [17:03:26] (03PS2) 10Giuseppe Lavagetto: Get rid of Zero landing support [puppet] - 10https://gerrit.wikimedia.org/r/482492 (https://phabricator.wikimedia.org/T187716) (owner: 10MaxSem) [17:03:46] PROBLEM - mobileapps endpoints health on scb1004 is CRITICAL: /{domain}/v1/page/mobile-sections/{title}{/revision}{/tid} (retrieve test page via mobile-sections) timed out before a response was received [17:03:52] PROBLEM - Restbase root url on restbase1015 is CRITICAL: connect to address 10.64.48.134 and port 7231: Connection refused [17:03:56] PROBLEM - Restbase root url on restbase1017 is CRITICAL: connect to address 10.64.32.129 and port 7231: Connection refused [17:03:59] <_joe_> James_F: so just to confirm, we want to remove the special treatment of the mobile pages [17:04:03] gehel: I don't think so. I'll create one [17:04:15] onimisionipe: thanks! [17:04:26] <_joe_> mobrovac: should those alerts be acknowledged? [17:04:38] PROBLEM - Restbase root url on restbase1012 is CRITICAL: connect to address 10.64.32.79 and port 7231: Connection refused [17:04:52] RECOVERY - mobileapps endpoints health on scb1004 is OK: All endpoints are healthy [17:04:57] _joe_: The Zero-related rewrite config is no longer needed.
[17:05:03] (03PS1) 10Gehel: wdqs: dashboard for WDQS lag has moved [puppet] - 10https://gerrit.wikimedia.org/r/482847 [17:05:39] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Get rid of Zero landing support [puppet] - 10https://gerrit.wikimedia.org/r/482492 (https://phabricator.wikimedia.org/T187716) (owner: 10MaxSem) [17:05:45] <_joe_> ok, let's merge and verify on mwdebug1002 [17:06:38] (03Abandoned) 10Jbond42: address review comments [puppet] - 10https://gerrit.wikimedia.org/r/482838 (owner: 10Jbond42) [17:07:25] <_joe_> jbond42: gerrit is hard to get used to once you've been using GH or GH clones for a long time :) [17:08:31] _joe_: Sounds good! [17:08:33] <_joe_> but now that I'm used to it, I can't really use gh anymore [17:08:45] yes, I'm used to github and gitlab, still getting to grips with gerrit [17:08:46] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: 10Jbond42) [17:09:06] jbond42: you'll love to hate gerrit in a few weeks!
[17:09:12] :D [17:09:38] <_joe_> James_F: mwdebug1001 has the change applied [17:10:14] PROBLEM - restbase endpoints health on restbase1012 is CRITICAL: /en.wikipedia.org/v1/page/summary/{title} (Get summary from storage) timed out before a response was received: /en.wikipedia.org/v1/page/html/{title} (Get html by title from storage) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was recei [17:10:14] a.org/v1/page/mobile-sections/{title}{/revision} (Get mobile-sections for a test page on enwiki) timed out before a response was received [17:10:32] RECOVERY - Restbase root url on restbase1012 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.012 second response time [17:10:56] RECOVERY - Restbase root url on restbase1015 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.012 second response time [17:11:00] RECOVERY - Restbase root url on restbase1017 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.006 second response time [17:11:08] _joe_: I think that looks right. [17:11:22] RECOVERY - restbase endpoints health on restbase1012 is OK: All endpoints are healthy [17:11:24] 10Operations, 10Discovery-Search (Current work): Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart - https://phabricator.wikimedia.org/T213191 (10Mathew.onipe) [17:11:37] (03PS5) 10Jbond42: add jbond user as part of onboarding [puppet] - 10https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) [17:11:45] 10Operations, 10ops-eqiad, 10Analytics: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10RobH) So I asked for an update for the quote on T210776 and nothing yet. Dell acknowledges they received and are working on it. If we do not have a quote back today, I'd recommend s... 
[17:12:12] _joe_: they're transient and will stop very soon [17:13:17] <_joe_> James_F: uhm is mobilelanding.php configured to redirect to www anyways? [17:14:14] _joe_: Why? [17:14:18] _joe_: Or "is"? [17:14:26] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received [17:14:43] (03CR) 10Muehlenhoff: [C: 03+2] add jbond user as part of onboarding [puppet] - 10https://gerrit.wikimedia.org/r/482831 (https://phabricator.wikimedia.org/T213079) (owner: 10Jbond42) [17:15:06] _joe_: It point[s/ed] the user to zero.wikipedia.org vs. www.wikipedia.org. [17:15:30] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy [17:16:47] _joe_: Next step is to delete mobilelanding.php, but I can't do that until prod doesn't point at it. :-) [17:16:55] !log restarted Blazegraph wdqs1006 due to unresponsiveness (caused by load?) [17:16:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:31] <_joe_> James_F: you can do so in ~ 30 minutes [17:17:43] _joe_: Brilliant. 
[17:20:57] <_joe_> !log depooling mw1299 for testing of the apache change [17:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:21:30] PROBLEM - Restbase root url on restbase1011 is CRITICAL: connect to address 10.64.0.113 and port 7231: Connection refused [17:24:42] 10Operations, 10Release Pipeline, 10Release-Engineering-Team, 10Core Platform Team Backlog (Watching / External), and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (10thcipriani) [17:24:45] !log roll restart of aqs on aqs100* to pick up new Druid settings [17:24:45] 10Operations, 10Prod-Kubernetes, 10Release Pipeline, 10Release-Engineering-Team, 10Documentation: TEC3:O6:O:6.1:Q3: Deployment Pipeline Documentation - https://phabricator.wikimedia.org/T213090 (10thcipriani) [17:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:25:36] RECOVERY - Restbase root url on restbase1011 is OK: HTTP OK: HTTP/1.1 200 - 16234 bytes in 0.014 second response time [17:30:44] (03PS1) 10Elukey: profile::statistics::private: allow geoip to use kerberos [puppet] - 10https://gerrit.wikimedia.org/r/482854 [17:32:10] 10Operations, 10ops-eqiad, 10RESTBase, 10RESTBase-Cassandra, and 3 others: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 (10Eevans) We're currently in the process of force-removing these instances. We'll need to coordinate when the host comes back up, as we'll have to re-bootstr... 
[17:33:27] <_joe_> !log applying the new apache configuration to jobrunners in eqiad [17:33:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:33:33] (03PS6) 10Dzahn: zuul: allow email connection [puppet] - 10https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: 10Hashar) [17:34:49] (03CR) 10Dzahn: [C: 03+2] zuul: allow email connection [puppet] - 10https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: 10Hashar) [17:35:36] 10Operations, 10Release-Engineering-Team: Add IRC SRE bot for SAL !log actions to #wikimedia-serviceops - https://phabricator.wikimedia.org/T213196 (10Jdforrester-WMF) [17:36:08] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10thcipriani) [17:36:20] (03CR) 10Dzahn: [C: 03+2] "applied on contint1001/2001. zuul-merger got restarted" [puppet] - 10https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: 10Hashar) [17:37:00] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource - T210752 T197616 (duration: 96m 50s) [17:37:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:37:04] T210752: Create Wikisource Neapolitan - https://phabricator.wikimedia.org/T210752 [17:37:04] T197616: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 [17:37:36] (03PS3) 10Cwhite: hiera: add wmcs cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) [17:37:54] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber): Build blubber file for ORES - https://phabricator.wikimedia.org/T210268 (10thcipriani) [17:37:56] 10Operations, 10ORES, 10Scoring-platform-team: [Epic] Deploy ORES in kubernetes cluster - 
https://phabricator.wikimedia.org/T182331 (10thcipriani) [17:38:02] 10Operations, 10ORES, 10Scoring-platform-team, 10Release Pipeline (Blubber), 10Release-Engineering-Team (Backlog): The continuous release pipeline should support more than one service per repo - https://phabricator.wikimedia.org/T210267 (10thcipriani) [17:38:05] (03CR) 10Cwhite: [C: 03+2] hiera: add wmcs cluster definition [puppet] - 10https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) (owner: 10Cwhite) [17:38:18] !log mobrovac@deploy1001 Started deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2 [17:38:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:38:58] 10Operations, 10ops-eqiad, 10netops: Move servers off asw2-a5-eqiad - https://phabricator.wikimedia.org/T212348 (10ayounsi) [17:39:59] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14216/stat1007.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/482854 (owner: 10Elukey) [17:40:06] (03PS2) 10Elukey: profile::statistics::private: allow geoip to use kerberos [puppet] - 10https://gerrit.wikimedia.org/r/482854 [17:40:47] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@503b29c]: Add test-commons and nap.wikisource, take #2 (duration: 02m 29s) [17:40:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:40:52] (03PS3) 10Dzahn: httpd::mpm: add support for a content param [puppet] - 10https://gerrit.wikimedia.org/r/481932 (owner: 10Paladox) [17:41:05] !log installing libseccomp updates from stretch point release [17:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:45:27] (03CR) 10Dzahn: [C: 03+2] httpd::mpm: add support for a content param [puppet] - 10https://gerrit.wikimedia.org/r/481932 (owner: 10Paladox) [17:45:37] (03PS4) 10Dzahn: httpd::mpm: add support for a content param [puppet] - 10https://gerrit.wikimedia.org/r/481932 (owner: 10Paladox) 
[17:46:20] 10Operations: Add IRC SRE bot for SAL !log actions to #wikimedia-serviceops - https://phabricator.wikimedia.org/T213196 (10greg) [17:49:13] 10Operations, 10Prod-Kubernetes, 10Release Pipeline, 10Release-Engineering-Team, 10Documentation: Document helm chart creation - https://phabricator.wikimedia.org/T213197 (10thcipriani) p:05Triage→03Normal [17:49:26] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10herron) >>! In T212945#4863092, @mobrovac wrote: > @herron manager approval received. Has the request been discussed during the S... [17:50:11] 10Operations, 10Prod-Kubernetes, 10Release-Engineering-Team, 10Documentation, 10Release Pipeline (Blubber): Update Blubber documentation - https://phabricator.wikimedia.org/T213198 (10thcipriani) p:05Triage→03Normal [17:55:45] RECOVERY - MariaDB Slave Lag: s5 on db1124 is OK: OK slave_sql_lag Replication lag: 0.00 seconds [17:57:01] (03PS1) 10MSantos: Change frequency of OSM replication on maps1004 [puppet] - 10https://gerrit.wikimedia.org/r/482860 [18:00:04] cscott, arlolra, subbu, halfak, and Amir1: (Dis)respected human, time to deploy Services – Graphoid / Parsoid / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1800). Please do the needful. [18:00:42] Nothing for ORES today. 
[18:01:49] (03CR) 10Dzahn: mediawiki: rename the maintenance role to match other roles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/479131 (owner: 10Dzahn) [18:02:01] (03PS3) 10Dzahn: mediawiki: rename the maintenance role to match other roles [puppet] - 10https://gerrit.wikimedia.org/r/479131 [18:14:59] (03PS2) 10ArielGlenn: move iohandler code for compression/decompression out to a separate file [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/441484 (https://phabricator.wikimedia.org/T213200) [18:15:02] (03PS4) 10ArielGlenn: use iohandlers for recompressxml input and output [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/441485 (https://phabricator.wikimedia.org/T213200) [18:15:04] (03PS3) 10ArielGlenn: option to skip siteinfo header, mw footer for recompresing files [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442774 (https://phabricator.wikimedia.org/T213200) [18:15:05] (03PS3) 10ArielGlenn: options for writeuptopageid to skip writing header or footer [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/442775 (https://phabricator.wikimedia.org/T213200) [18:15:07] (03PS1) 10ArielGlenn: version 0.0.9 [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/482861 (https://phabricator.wikimedia.org/T213200) [18:15:16] * apergos snickers and runs away [18:18:59] hehehe [18:20:38] (03PS1) 10Gehel: wdqs: reduce heap size to 31G to keep compressed oops [puppet] - 10https://gerrit.wikimedia.org/r/482863 [18:23:06] (03Abandoned) 10Jforrester: Re-write mobilelanding.php to not break when we drop ZeroBanner [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482098 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [18:24:06] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482865 [18:26:36] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482865 (owner: 10Marostegui) [18:27:15] !log 
restarting s5 replication on labsdb1009/10/11 [18:27:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:27:43] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482865 (owner: 10Marostegui) [18:27:57] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1105:3311" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482865 (owner: 10Marostegui) [18:28:45] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1105:3311 T86338 T202167 (duration: 00m 45s) [18:28:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:28:49] T86338: Dropping page.page_counter on wmf databases - https://phabricator.wikimedia.org/T86338 [18:28:49] T202167: Schema change for rc_this_oldid index - https://phabricator.wikimedia.org/T202167 [18:35:31] (03PS1) 10Ottomata: WIP Serve event-schemas repo via http [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) [18:36:19] (03CR) 10jerkins-bot: [V: 04-1] WIP Serve event-schemas repo via http [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [18:38:13] (03PS2) 10Ottomata: WIP Serve event-schemas repo via http [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) [18:38:36] !log temporarily permit ssh from frpm1001 to pfw3-eqiad on pfw3-eqiad [18:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:00] (03CR) 10jerkins-bot: [V: 04-1] WIP Serve event-schemas repo via http [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [18:41:22] (03PS3) 10Ottomata: WIP Serve event-schemas repo via http [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) [18:47:54] (03CR) 10Ottomata: 
"https://puppet-compiler.wmflabs.org/compiler1002/14218/thorium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/482867 (https://phabricator.wikimedia.org/T211247) (owner: 10Ottomata) [18:49:45] !log arlolra@deploy1001 Started deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b [18:49:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:35] !log add NAT workaround to pfw3-eqiad - T211028 [18:50:37] !log Drop valid_tag table from s4 - T212254 [18:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:45] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [18:54:32] !log make pfw3-codfw source NAT similar to pfw3-eqiad - T211028 [18:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:04] Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T1900) [19:00:25] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@4b82683]: Updating Parsoid to 2c5dc7b (duration: 10m 40s) [19:00:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:01:29] 10Operations, 10DBA, 10Data-Services: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (10jcrespo) This is mostly fixed, except gtid must be enabled on 82 and 1124, plus 82 must be repooled. 
[19:05:24] (03PS5) 10Jforrester: Install but don't enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - 10https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981) [19:05:26] (03PS4) 10Jforrester: Enable WikibaseMediaInfo on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708) [19:05:28] (03PS2) 10Jforrester: [Beta Cluster] Cleanup SDC config, all same as prod now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470459 [19:06:54] !log Drop valid_tag table from s1 - T212254 [19:06:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:56] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [19:11:39] !log Updated Parsoid to 2c5dc7b (T197616, T205491, T209772, T199926, T209194, T204622) [19:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:11:51] T197616: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 [19:11:51] T205491: QuoteTransformer and quote-tokens use a ".value" property on the token instead of adding it to the token's attributes - https://phabricator.wikimedia.org/T205491 [19:11:52] T209772: visitDOM makes an invalid assumption about its handlers - https://phabricator.wikimedia.org/T209772 [19:11:52] T199926: html -> wt: Parsoid sometimes trips up on | chars in hrefs - https://phabricator.wikimedia.org/T199926 [19:11:53] T204622: Use native Javascript (ES6) classes instead of prototype-based definition pattern in the Parsoid codebase - https://phabricator.wikimedia.org/T204622 [19:11:53] T209194: Export one "real" ES6 class per file - https://phabricator.wikimedia.org/T209194 [19:21:15] 10Operations: Add IRC SRE bot for SAL !log actions to #wikimedia-serviceops - https://phabricator.wikimedia.org/T213196 (10herron) p:05Triage→03Normal I'm aware of the expectation for this to work for messages sent to 
`#wikimedia-operations`. Is there consensus that should should work in other channels as w... [19:22:21] 10Operations, 10monitoring: Upgrade metrics monitoring infrastructure core components (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213158 (10herron) p:05Triage→03Normal [19:24:06] (03PS1) 10ArielGlenn: specify output file explicitly for recompress dump jobs [dumps] - 10https://gerrit.wikimedia.org/r/482870 (https://phabricator.wikimedia.org/T213200) [19:24:10] 10Operations, 10Wikimedia-Logstash, 10User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (10herron) p:05Triage→03Normal [19:24:50] 10Operations, 10Wikimedia-Logstash, 10User-herron: Increase utilization of application logging pipeline (FY2018-2019 Q3 TEC6) - https://phabricator.wikimedia.org/T213157 (10herron) [19:25:45] 10Operations: Add IRC SRE bot for SAL !log actions to #wikimedia-serviceops - https://phabricator.wikimedia.org/T213196 (10Jdforrester-WMF) Cloud Services and RelEng have the bot installed in `#wikimedia-cloud` and `#wikimedia-releng` respectively to point to the Cloud SAL. Adding it for the main SAL in other Op... [19:26:27] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack and cable frav1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T213104 (10herron) p:05Triage→03Normal [19:26:45] (03PS1) 10Kosta Harlan: GrowthExperiments help panel: Add more example links in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482871 (https://phabricator.wikimedia.org/T211206) [19:27:58] 10Operations, 10Wikidata, 10Wikidata-Query-Service: Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart - https://phabricator.wikimedia.org/T213191 (10Smalyshev) The exceptions above look normal, but we did have a downtime on wdqs1006 which seems to be caused by a flood of incoming requests.... 
[19:28:27] (03PS2) 10Kosta Harlan: GrowthExperiments help panel: Add more example links in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482871 (https://phabricator.wikimedia.org/T212905) [19:28:30] 10Operations, 10Wikidata, 10Wikidata-Query-Service: Some queries causes wdqs-blazegraph on wdqs1006 to crash and restart - https://phabricator.wikimedia.org/T213191 (10Smalyshev) [19:29:05] 10Operations, 10Packaging, 10uprightdiff, 10Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (10herron) p:05Triage→03Normal [19:39:27] (03PS1) 10Dduvall: group0 to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482874 [19:46:29] (03CR) 10Dzahn: [C: 03+2] "all that changes is the role name https://puppet-compiler.wmflabs.org/compiler1002/14217/" [puppet] - 10https://gerrit.wikimedia.org/r/479131 (owner: 10Dzahn) [19:46:39] (03PS4) 10Dzahn: mediawiki: rename the maintenance role to match other roles [puppet] - 10https://gerrit.wikimedia.org/r/479131 [19:48:16] ^ i will check deployment-prep now to adjust role name in wmcs [19:48:22] !log dduvall@deploy1001 Pruned MediaWiki: 1.33.0-wmf.12 (duration: 06m 26s) [19:48:22] (if needed) [19:48:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:51] !log krinkle@deploy1001 Started deploy [performance/navtiming@68fd54d]: (no justification provided) [19:48:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:56] !log krinkle@deploy1001 Finished deploy [performance/navtiming@68fd54d]: (no justification provided) (duration: 00m 05s) [19:48:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:49:07] 10Operations, 10Wikimedia-General-or-Unknown: load.php URL hanging sometimes - https://phabricator.wikimedia.org/T213030 (10herron) p:05Triage→03Normal Hello, thank you for the report. 
Could you please describe a little bit the environment from which you are seeing this problem? What type of network(s) a... [19:50:58] (03CR) 10Dzahn: [C: 03+2] "i used https://tools.wmflabs.org/openstack-browser/puppetclass/role::mediawiki_maintenance to check where this is used in wmcs and going t" [puppet] - 10https://gerrit.wikimedia.org/r/479131 (owner: 10Dzahn) [19:55:07] (03CR) 10Catrope: [C: 03+2] GrowthExperiments help panel: Add more example links in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482871 (https://phabricator.wikimedia.org/T212905) (owner: 10Kosta Harlan) [19:55:36] (03CR) 10Dzahn: [C: 03+2] "role changed on deployment-mwmaint01 in deployment-prep" [puppet] - 10https://gerrit.wikimedia.org/r/479131 (owner: 10Dzahn) [19:56:11] (03Merged) 10jenkins-bot: GrowthExperiments help panel: Add more example links in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482871 (https://phabricator.wikimedia.org/T212905) (owner: 10Kosta Harlan) [19:58:14] (03PS2) 10Krinkle: Fix minor tech debt around AuthManager audit logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479156 [20:00:04] marxarelli: How many deployers does it take to do MediaWiki train - Americas version deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190108T2000). [20:02:26] (03CR) 10Paladox: "@Esanders ah ok." [puppet] - 10https://gerrit.wikimedia.org/r/482379 (owner: 10Paladox) [20:02:53] marxarelli: train blocked or not? If not, would deploy some minor clean up in mw-config. [20:03:04] if blocked and not deploying [20:03:23] 10Operations, 10Wikimedia-General-or-Unknown: load.php URL hanging sometimes - https://phabricator.wikimedia.org/T213030 (10Ciencia_Al_Poder) This happened at home (no proxy, etc. Normal ADSL). I'm using Firefox on linux (OpenSUSE). When this happened (page taking too long to actually display anything), I copi... 
[20:03:43] Krinkle: not blocked, just late :) [20:03:58] marxarelli: np, saw the task had an unresolved subtask [20:04:36] right. it's unresolved but patches were merged to master and addshore was able to verify them on beta, so we're unblocked [20:05:32] ah, i see it's still UBN [20:05:35] i'll change it [20:05:59] oh nm, James_F did it already [20:07:08] (03CR) 10jenkins-bot: GrowthExperiments help panel: Add more example links in beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482871 (https://phabricator.wikimedia.org/T212905) (owner: 10Kosta Harlan) [20:07:13] marxarelli: Yeah, I didn't want to close it in case the WMDE people want to do more on the task. [20:07:24] (Yay for everyone having their own Phabricator processes. ;-)) [20:07:38] :) np [20:09:25] 10Operations, 10VisualEditor, 10Wikimedia-Apache-configuration: Visual Editor gets stuck opening article - https://phabricator.wikimedia.org/T213214 (10matmarex) I was able to reproduce this right after you filed the task, but it started working again after a few minutes. I'm using Chromium 71 (Opera 58). I... [20:09:58] (03CR) 10Dzahn: [C: 03+1] "ah, i somehow expected it would have to be converted to mariadb::config but also despite what the ticket might sound like i haven't actual" [puppet] - 10https://gerrit.wikimedia.org/r/482693 (https://phabricator.wikimedia.org/T162070) (owner: 10Ottomata) [20:11:04] !log change BGP_fundraising_aggregates term nat from static to aggregate on pfw3-eqiad - T211028 [20:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:47] (03CR) 10Dzahn: [C: 03+1] ":) thanks Thifranc and Effie. 
We should not get that cronspam anymore we have been getting for years:)" [puppet] - 10https://gerrit.wikimedia.org/r/470877 (https://phabricator.wikimedia.org/T150375) (owner: 10Thifranc) [20:14:12] (03PS1) 10Jforrester: TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 [20:14:47] marxarelli: I've got a quick testcommons-only config patch ^^, can I push it now? [20:15:01] (03CR) 10Smalyshev: [C: 03+1] wdqs: reduce heap size to 31G to keep compressed oops [puppet] - 10https://gerrit.wikimedia.org/r/482863 (owner: 10Gehel) [20:15:03] (03CR) 10jerkins-bot: [V: 04-1] TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 (owner: 10Jforrester) [20:15:10] James_F: sure [20:15:24] Thanks. [20:15:44] (03PS2) 10Jforrester: TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 [20:15:59] (03CR) 10Dzahn: "thanks for taking that :)" [puppet] - 10https://gerrit.wikimedia.org/r/482681 (https://phabricator.wikimedia.org/T213052) (owner: 10Volans) [20:16:55] (03CR) 10Jforrester: [C: 03+2] TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 (owner: 10Jforrester) [20:16:56] (03CR) 10Dzahn: "same here, thank you for fixing this oversight" [puppet] - 10https://gerrit.wikimedia.org/r/482403 (https://phabricator.wikimedia.org/T212969) (owner: 10Volans) [20:18:00] (03Merged) 10jenkins-bot: TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 (owner: 10Jforrester) [20:18:08] (03CR) 10Dzahn: [C: 03+2] "yup, already cherry-picked" [puppet] - 10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: 10Hashar) [20:18:16] (03PS5) 10Dzahn: ci: add some gated extensions to git cache [puppet] - 
10https://gerrit.wikimedia.org/r/440539 (https://phabricator.wikimedia.org/T197469) (owner: 10Hashar) [20:20:31] (03CR) 10jenkins-bot: TestCommons: Don't enable entities, we're not Wikidata.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482877 (owner: 10Jforrester) [20:21:32] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TestCommons: Don't enable entities, we're not Wikidata.org (duration: 01m 44s) [20:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:21:41] marxarelli: OK, deployed. Thanks! [20:21:55] cool! [20:24:17] !log starting data copy from wdqs1004 to wdqs1007 (both will be depooled) - T213217 [20:24:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:19] T213217: Copy database from wdq[345] to wdq7 and wdq8 - https://phabricator.wikimedia.org/T213217 [20:24:27] !log dduvall@deploy1001 Started scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache [20:24:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:18] (03CR) 10Dzahn: [C: 03+2] admin: run tox suite for any changes in data dir [puppet] - 10https://gerrit.wikimedia.org/r/482812 (owner: 10Hashar) [20:25:29] (03PS2) 10Dzahn: admin: run tox suite for any changes in data dir [puppet] - 10https://gerrit.wikimedia.org/r/482812 (owner: 10Hashar) [20:27:19] (03CR) 10Dzahn: "merged that parent change to run test on changes to any file in admin module" [puppet] - 10https://gerrit.wikimedia.org/r/482611 (owner: 10Hashar) [20:29:17] (03PS1) 10Cwhite: hiera: add cluster definition for graphite [puppet] - 10https://gerrit.wikimedia.org/r/482884 (https://phabricator.wikimedia.org/T210486) [20:29:19] 10Operations, 10Wikimedia-Mailing-lists: recovering wikimedia-mx mailing list password - https://phabricator.wikimedia.org/T212920 (10herron) 05Open→03Resolved a:03herron Hello, this list password has been reset and the new value automatically sent to the owner by the system. 
Please don't hesitate to re... [20:30:34] (03CR) 10Dzahn: [C: 03+1] profile: point to real modules for specs [puppet] - 10https://gerrit.wikimedia.org/r/480957 (owner: 10Hashar) [20:30:52] 10Operations, 10Wikimedia-Mailing-lists, 10User-Urbanecm: Post hold because of "invalid headers" in wikimediacz-l - https://phabricator.wikimedia.org/T210223 (10herron) 05Open→03Resolved Please consider this a soft-close and reopen if any follow up is needed. Thanks! [20:31:45] (03CR) 10Dzahn: [C: 03+1] hhvm: fix typo in RUN_AS_GROUP [puppet] - 10https://gerrit.wikimedia.org/r/474910 (https://phabricator.wikimedia.org/T209946) (owner: 10Hashar) [20:33:15] (03CR) 10Dzahn: "are we actually expecting a scenario where a human user goes to the server and edits docs with a text editor in the document root?" [puppet] - 10https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) (owner: 10Hashar) [20:36:08] 10Operations, 10Packaging, 10uprightdiff, 10Parsoid-Tests: stretch version of uprightdiff package - https://phabricator.wikimedia.org/T212987 (10Dzahn) @Legoktm Thank you! It's definitely not an issue to wait a few more days. [20:37:59] 10Operations, 10Wikimedia-Mailing-lists: Create mailing list for Wikimedia's Google Code-in mentors - https://phabricator.wikimedia.org/T212747 (10herron) 05Open→03Resolved a:03herron This list has been created an initial password emailed by the system to `aklapper@wm`. The list has been set to "confirm... [20:46:43] musikanimal: hello :) [20:46:57] hey! [20:47:14] argh sorry [20:47:21] musikanimal: I poked the wrong person. But hi still! [20:47:26] haha, okay [20:47:59] I got pinged in -operations and my heart dropped... thought I broke something! [20:50:00] (03CR) 10Dzahn: [C: 03+1] robots.php: Drop the special treatment for Wikipedia Zero [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [20:52:10] (03CR) 10Dzahn: "ehm.. 
actually not so sure anymore per " still receiving a couple thousand requests per day on the zero subdomain," https://phabricator.wi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [20:54:10] (03CR) 10Hashar: "I tried to describe the use cases on T213169 . We sometime have to manually delete directories, typically when a repository is archived fr" [puppet] - 10https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) (owner: 10Hashar) [20:54:54] PROBLEM - High CPU load on API appserver on mw1317 is CRITICAL: CRITICAL - load average: 74.04, 32.95, 21.79 [20:55:12] PROBLEM - High CPU load on API appserver on mw1227 is CRITICAL: CRITICAL - load average: 55.28, 24.12, 14.80 [20:55:32] PROBLEM - High CPU load on API appserver on mw1233 is CRITICAL: CRITICAL - load average: 52.07, 22.65, 13.89 [20:56:08] RECOVERY - High CPU load on API appserver on mw1317 is OK: OK - load average: 34.67, 30.31, 21.76 [20:56:46] RECOVERY - High CPU load on API appserver on mw1233 is OK: OK - load average: 24.57, 20.74, 13.91 [20:57:06] (03PS1) 10Jforrester: dblists: Split 'wikidata' into 'wikidatarepo' as well for Commonses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482891 [20:57:20] checked API request rate on grafana and did not look spikey [20:57:40] RECOVERY - High CPU load on API appserver on mw1227 is OK: OK - load average: 22.10, 24.07, 16.32 [20:57:41] could be my scap? 
[20:59:10] PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops [20:59:21] not sure, but if that matches up with a deployment, possible [21:01:15] (03PS1) 10Cwhite: hiera: add cluster definition to webperf servers [puppet] - 10https://gerrit.wikimedia.org/r/482894 (https://phabricator.wikimedia.org/T210486) [21:03:29] 10Operations, 10DBA, 10Data-Services: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (10bd808) [21:03:49] !log dduvall@deploy1001 Finished scap: testwiki to php-1.33.0-wmf.12 and rebuild l10n cache (duration: 39m 22s) [21:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:14] 10Operations, 10Continuous-Integration-Infrastructure, 10SRE-Access-Requests, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Grant sudo access for CI admins to doc.wikimedia.org publishing user - https://phabricator.wikimedia.org/T213169 (10Dzahn) Thanks for excellent justification for https://... [21:07:47] (03CR) 10Dzahn: [C: 03+1] "thank you Hashar, that ticket is very nice as justification for the access request" [puppet] - 10https://gerrit.wikimedia.org/r/480798 (https://phabricator.wikimedia.org/T213169) (owner: 10Hashar) [21:10:20] (03CR) 10Dzahn: "found these about the "cluster.mailers"" [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:12:26] musikanimal: is the train done? [21:12:47] (03CR) 10Dzahn: "https://secure.phabricator.com/T13053 https://secure.phabricator.com/T12677" [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:16:04] (03CR) 10MaxSem: [C: 03+1] "It's fine. This code is for language subdomains (e.g. en.zero.wikipedia.org), and we don't even have them anymore. 
zero.wikipedia.org is n" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482100 (https://phabricator.wikimedia.org/T212865) (owner: 10Jforrester) [21:19:58] (03CR) 10Dzahn: "https://secure.phabricator.com/book/phabricator/article/configuring_inbound_email/" [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:23:48] 10Operations, 10ops-eqiad, 10fundraising-tech-ops: rack and cable frav1002.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T213104 (10RobH) a:03Cmjohnson Jeff updated the task with the following: > I've reserved 10.64.40.74 and hostname will be frav1002.frack.eqiad.wmnet. > > The first two ethern... [21:25:02] (03PS2) 10Dduvall: group0 to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482874 [21:25:03] 10Operations, 10Wikimedia-Mailing-lists: Create mailing list for Wikimedia's Google Code-in mentors - https://phabricator.wikimedia.org/T212747 (10Aklapper) Thanks herron! [21:25:28] (03PS7) 10Paladox: phabricator: Add new cluster.mailers [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) [21:26:25] (03CR) 10Dduvall: [C: 03+2] group0 to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482874 (owner: 10Dduvall) [21:26:27] (03CR) 10Dzahn: "https://secure.phabricator.com/book/phabricator/article/configuring_outbound_email/" [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:27:32] (03Merged) 10jenkins-bot: group0 to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482874 (owner: 10Dduvall) [21:29:35] (03PS8) 10Paladox: phabricator: Add new cluster.mailers [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) [21:30:02] herron: services, like Phabricator, should send mail to localhost and not directly to mx servers, right? 
https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482400/7/modules/profile/manifests/phabricator/main.pp [21:30:11] cc paladox [21:30:35] oh that explains why we are not using tls for mail. [21:30:35] mutante: yeah, services should relay to localhost [21:30:54] this way they queue and failover without special config at the service level [21:31:03] !log dduvall@deploy1001 rebuilt and synchronized wikiversions files: group0 to 1.33.0-wmf.12 [21:31:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:32:25] sendmail: Use the local sendmail binary. [21:32:26] smtp: Connect directly to an SMTP server. [21:32:36] so... smtp but .. using localhost [21:33:10] paladox: can old and new mail config live together at the same time? [21:33:17] Im not sure. [21:33:23] herron: ack, thanks [21:35:12] (03PS1) 10Marostegui: db-eqiad.php: Depool db1123 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483004 (https://phabricator.wikimedia.org/T212254) [21:35:30] paladox: in some version it becomes mandatory to use only new config , right [21:36:29] yep [21:36:32] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1123 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483004 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:36:41] (03CR) 10Dzahn: "the new config looks all good after looking at upstream docs and confirming we want to use smtp to localhost. 
as long as Phabricator is ok" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:36:42] it become manditory in the release that removes the old config [21:37:36] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1123 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483004 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:37:38] does it generally ignore config settings that are unknown [21:37:50] because they will only be introduced in future version [21:38:20] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483005 [21:38:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1123 T212254 (duration: 00m 53s) [21:38:53] !log Drop valid_tag table from db1123 (s3) - T212254 [21:38:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:38:54] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [21:38:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:39:02] (03CR) 10jenkins-bot: group0 to 1.33.0-wmf.12 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482874 (owner: 10Dduvall) [21:39:04] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1123 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483004 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:39:52] paladox: just not sure if the space is ok inside that string, better use - like in example? [21:40:14] which string? [21:40:30] $mail_config => key [21:40:58] key: Required string. A unique name for this mailer. 
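[editor's sketch] The mailer setup discussed above (one `cluster.mailers` entry of type `smtp` that relays to localhost, identified by a unique `key`) might look roughly like the structure built below. This is a minimal illustration, not the contents of the actual puppet change: the key name `local-smtp`, port 25, and the helper `build_mailers` are assumptions, and rejecting whitespace in the key only mirrors the cautious suggestion in the chat to use "-" rather than a space. It is not a confirmed Phabricator rule.

```python
import json


def build_mailers(key: str, host: str = "localhost", port: int = 25) -> list:
    """Build a one-entry cluster.mailers list for an SMTP relay.

    Hypothetical helper: the defaults (localhost, port 25) are assumptions
    based on the "services should relay to localhost" advice in the log.
    """
    # Conservative choice taken from the chat, not a documented requirement:
    # keep the mailer key free of whitespace.
    if not key or any(ch.isspace() for ch in key):
        raise ValueError("mailer key should be non-empty and contain no whitespace")
    return [
        {
            "key": key,          # unique name for this mailer
            "type": "smtp",      # connect to an SMTP server (here: the local MTA)
            "options": {"host": host, "port": port},
        }
    ]


if __name__ == "__main__":
    # Print the JSON value that would be assigned to cluster.mailers.
    print(json.dumps(build_mailers("local-smtp"), indent=2))
```

Relaying through the local MTA keeps queueing and failover out of the service's own configuration, which is the reason given in the conversation for not pointing Phabricator directly at the mx servers.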
[21:41:22] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483005 (owner: 10Marostegui) [21:42:04] ok [21:42:27] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483005 (owner: 10Marostegui) [21:42:33] Krinkle: all done with the train if you're still wanting to do some cleanup [21:42:36] done [21:42:38] (03PS9) 10Paladox: phabricator: Add new cluster.mailers [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) [21:43:28] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1123 T212254 (duration: 00m 53s) [21:43:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:44:01] thanks paladox, it looks good, with the only exception that phab doesn't go "omg, an unknown config option that i have not heard of" [21:44:09] can test on wmcs? [21:44:09] heh [21:44:12] yup [21:44:15] thanks! [21:44:44] (03PS1) 10Marostegui: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483006 (https://phabricator.wikimedia.org/T212254) [21:45:08] (03PS3) 10Krinkle: Fix minor tech debt around AuthManager audit logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479156 [21:45:12] (03CR) 10Krinkle: [C: 03+2] Fix minor tech debt around AuthManager audit logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479156 (owner: 10Krinkle) [21:45:19] marxarelli: okidoki [21:45:46] * Krinkle staging on mwdebug1001 [21:46:08] mutante tested and works [21:46:12] (mail still is sent) [21:46:13] Krinkle: Can I deploy mediawiki-config to depool a DB or should I wait till you are done? 
[21:46:19] (03Merged) 10jenkins-bot: Fix minor tech debt around AuthManager audit logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479156 (owner: 10Krinkle) [21:46:32] marostegui: go ahead, you'll fetch my change as well, but don't sync it :) [21:46:44] Krinkle: yeah, will only sync db-eqiad.php [21:47:00] (03CR) 10Marostegui: [C: 03+2] db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483006 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:47:13] Krinkle: Once you're done I've got one I want to deploy. [21:47:17] paladox: cool! does it keep using the "metamta.*" config options? they are not changed in upstream? [21:47:30] Im not sure, carn't really tell. [21:47:36] paladox: so like the no-reply@ address [21:48:03] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483006 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:48:05] i get a email from [21:48:29] paladox: no-reply sounds like metameta.* are unffected. cool [21:48:46] metamta [21:49:07] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483007 [21:49:08] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1078 T212254 (duration: 00m 53s) [21:49:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:49:10] !log Drop valid_tag table from db1078 (s3) - T212254 [21:49:11] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [21:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:49:24] paladox: do you have metamta.domain set in Hiera in labs? [21:49:42] nope [21:49:45] the config does: [21:49:45] "no-reply@${domain}", [21:49:56] so it gets the domain from the hiera value i set [21:50:20] Krinkle: I am done, I will revert my change sync it and get out of your way! 
[21:50:21] i've removed the old config now [21:50:23] and get: [21:50:27] [21:50:43] so looks like a noop [21:51:10] hmm [21:51:11] metamta.default-addres [21:51:18] is still a config according to the docs [21:51:31] (03CR) 10Marostegui: [C: 03+2] Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483007 (owner: 10Marostegui) [21:52:44] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483007 (owner: 10Marostegui) [21:52:54] OK. [21:52:55] Syncing now [21:52:59] yup new config and old config work mutante [21:53:10] just the old config is taking precident over the new one [21:53:51] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1078 T212254 (duration: 00m 52s) [21:53:52] paladox: there is a second config value, called: [21:53:53] Krinkle: I am fully done! [21:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:54:14] paladox: metamta.reply-handler-domain ? [21:54:35] paladox: cool, then +1 for all of that [21:54:48] yup [21:54:51] (03CR) 10Dzahn: [C: 03+1] phabricator: Add new cluster.mailers [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:54:51] that still exists [21:55:30] (03CR) 10Dzahn: [C: 03+1] "the domain and sender address configured with metamta* settings all look unaffected and paladox tested in wmcs" [puppet] - 10https://gerrit.wikimedia.org/r/482400 (https://phabricator.wikimedia.org/T212989) (owner: 10Paladox) [21:55:38] (03PS12) 10Gergő Tisza: Make password policy and logging code saner [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481115 [21:57:19] Krinkle: You done with mw-config? 
[21:57:55] 10Operations, 10Phabricator, 10Release-Engineering-Team, 10Patch-For-Review: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10Dzahn) Thanks @Paladox, per IRC and Gerrit +1 to that config change. Could you add some info in what version exactly this will... [21:58:39] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1123" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483005 (owner: 10Marostegui) [21:58:43] (03CR) 10jenkins-bot: Fix minor tech debt around AuthManager audit logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/479156 (owner: 10Krinkle) [21:58:45] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1078 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483006 (https://phabricator.wikimedia.org/T212254) (owner: 10Marostegui) [21:58:47] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1078" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483007 (owner: 10Marostegui) [21:58:51] 10Operations, 10Phabricator, 10Release-Engineering-Team, 10Patch-For-Review: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10Paladox) It has been removed in version 2019.01 (https://secure.phabricator.com/w/changelog/2019.01/) [21:59:07] 10Operations, 10Phabricator, 10Release-Engineering-Team, 10Patch-For-Review: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10Dzahn) https://secure.phabricator.com/book/phabricator/article/configuring_outbound_email/ [22:00:16] (03PS1) 10Cwhite: hiera: add cluster definition to poolcounter servers [puppet] - 10https://gerrit.wikimedia.org/r/483009 (https://phabricator.wikimedia.org/T210486) [22:01:48] James_F: sorry, not yet. 
[22:01:49] 1 minute [22:03:51] !log krinkle@deploy1001 Synchronized wmf-config/CommonSettings.php: cleanup - Idfa129a65a41 (duration: 00m 53s) [22:03:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:04:17] * Krinkle releases deploy handle [22:04:18] James_F: :) [22:04:35] Thanks! [22:04:44] (03CR) 10Catrope: [C: 03+1] Disable unused Flow extension on ur.wikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478464 (https://phabricator.wikimedia.org/T207627) (owner: 10Zoranzoki21) [22:05:17] (03CR) 10Catrope: [C: 03+1] Disable unused Flow extension on de.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/478463 (https://phabricator.wikimedia.org/T207626) (owner: 10Zoranzoki21) [22:07:25] (03CR) 10Jforrester: [C: 03+2] dblists: Split 'wikidata' into 'wikidatarepo' as well for Commonses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482891 (owner: 10Jforrester) [22:07:28] 10Operations, 10Phabricator, 10Release-Engineering-Team, 10Patch-For-Review: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10Dzahn) looked at upstream docs, config change looks ok to me, confirmed with @herron services should use smtp to localhost and t... [22:07:58] 10Operations, 10Mail, 10Phabricator, 10Release-Engineering-Team, and 2 others: Convert Phabricator mail config to use cluster.mailers - https://phabricator.wikimedia.org/T212989 (10Dzahn) [22:08:25] !log Drop valid_tag table from db2043 with replication (s3 codfw master - lag will be generated) - T212254 [22:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:27] (03Merged) 10jenkins-bot: dblists: Split 'wikidata' into 'wikidatarepo' as well for Commonses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482891 (owner: 10Jforrester) [22:08:28] T212254: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 [22:08:44] James_F: Stupid fat commonses? 
[22:08:53] Reedy: Indeed. :-) [22:12:07] 10Operations, 10fundraising-tech-ops, 10netops: Refresh Minfraud IP list - https://phabricator.wikimedia.org/T213100 (10Dzahn) p:05Triage→03High i am told this is in progress right now [22:12:54] !log forcing removal of restbase1016-b (host down way too long to salvage) -- T212418 [22:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:57] (03PS1) 10Jforrester: TestCommons: Add default search NSes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483015 [22:12:59] T212418: Memory error on restbase1016 - https://phabricator.wikimedia.org/T212418 [22:13:00] 10Operations, 10fundraising-tech-ops, 10netops: Refresh Minfraud IP list - https://phabricator.wikimedia.org/T213100 (10cwdent) [22:13:03] !log jforrester@deploy1001 Synchronized dblists/wikidatarepo.dblist: dblists: Add wikidatarepo list (duration: 00m 53s) [22:13:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:13:04] (03CR) 10jenkins-bot: dblists: Split 'wikidata' into 'wikidatarepo' as well for Commonses [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482891 (owner: 10Jforrester) [22:14:14] !log jforrester@deploy1001 Synchronized dblists/wikidata.dblist: dblists: Remove testcommons from wikidata list (duration: 00m 52s) [22:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:15:09] (03CR) 10Hashar: [C: 03+1] "Thank you. The zuul-merger can be restarted anytime, that is why puppet triggers a restart when the config change is applied." 
[puppet] - 10https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: 10Hashar) [22:15:14] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 3 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10bmansurov) [22:15:39] !log jforrester@deploy1001 scap failed: average error rate on 9/11 canaries increased by 10x (rerun with --force to override this check, see https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040 for details) [22:15:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:15:52] Sugar. [22:20:19] !log jforrester@deploy1001 Synchronized wmf-config/CommonSettings.php: dblists: Load wikibaserepo (duration: 00m 52s) [22:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:20:45] (03PS2) 10Jforrester: TestCommons: Add default search NSes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483015 [22:20:46] (03PS1) 10Jforrester: dblists: Load new wikibaserepo list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483016 [22:21:01] (03CR) 10Jforrester: [C: 03+2] dblists: Load new wikibaserepo list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483016 (owner: 10Jforrester) [22:21:24] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Use new wikidatarepo dblist where appropriate (duration: 00m 52s) [22:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:02] !log Ran /docroot/noc/createTxtFileSymlinks.sh for new dblist [22:22:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:22:08] (03Merged) 10jenkins-bot: dblists: Load new wikibaserepo list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483016 (owner: 10Jforrester) [22:23:50] (03CR) 10Jforrester: [C: 03+2] TestCommons: Add default search NSes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483015 (owner: 
10Jforrester) [22:24:48] jouncebot: next [22:24:48] In 1 hour(s) and 35 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190109T0000) [22:24:56] (03Merged) 10jenkins-bot: TestCommons: Add default search NSes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483015 (owner: 10Jforrester) [22:26:03] (03CR) 10jenkins-bot: dblists: Load new wikibaserepo list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483016 (owner: 10Jforrester) [22:26:05] (03CR) 10jenkins-bot: TestCommons: Add default search NSes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/483015 (owner: 10Jforrester) [22:27:08] !log jforrester@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TestCommons: Add default search NSes (duration: 00m 51s) [22:27:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:27:24] OK, I'm done (for now). [22:29:44] !log starting data copy from wdqs1007 to wdqs1008 (both will be depooled) - T213217 [22:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:29:46] T213217: Copy database from wdq[345] to wdq7 and wdq8 - https://phabricator.wikimedia.org/T213217 [22:34:10] (03PS13) 10Gergő Tisza: Make password policy and logging code saner [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481115 [22:35:44] (03CR) 10Dzahn: [C: 03+1] profile::mediawiki::webserver: inline mediawiki::conftool [puppet] - 10https://gerrit.wikimedia.org/r/482791 (owner: 10Giuseppe Lavagetto) [22:39:12] !log deactivate policy-statement BGP_fundraising_aggregates term nat on pfw3-eqiad/codfw - T211028 [22:39:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:39:15] (03CR) 10Jforrester: [C: 03+1] Make password policy and logging code saner [mediawiki-config] - 10https://gerrit.wikimedia.org/r/481115 (owner: 10Gergő Tisza) [22:40:58] 10Operations, 10serviceops, 10Patch-For-Review, 10User-jijiki: Create a mediawiki::cronjob define - 
https://phabricator.wikimedia.org/T211250 (Dzahn) summarizing joe's work: 1) [[ https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/482790/ | introduction of systemd::timer::job ]] -> 2) [[https:/... [22:42:24] Operations, serviceops, Patch-For-Review, User-jijiki: Create a mediawiki::cronjob define - https://phabricator.wikimedia.org/T211250 (Dzahn) [22:57:49] Operations, fundraising-tech-ops, netops: Refresh Minfraud IP list - https://phabricator.wikimedia.org/T213100 (cwdent) @ayounsi the new rules are at 1546987554 [22:59:32] (PS1) Gergő Tisza: Improve list of privileged groups [mediawiki-config] - https://gerrit.wikimedia.org/r/483022 [23:00:03] Update pfw3-codfw/eqiad security policies - T213100 [23:00:03] T213100: Refresh Minfraud IP list - https://phabricator.wikimedia.org/T213100 [23:00:23] !log Update pfw3-codfw/eqiad security policies - T213100 [23:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:03:44] (CR) Dzahn: profile::mediawiki::maintenance: systemd-timer based periodic jobs (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:05:59] James_F: marxarelli yes i should have switched it from UBN to something else i guess. We left it open on our board so we could verify it in prod once deployed [23:06:19] addshore: Cool. :-) [23:06:23] I made sure i left a paper trail saying everything was fixed on the ticket though :P [23:06:32] dont want to confuse anyone too much [23:06:49] addshore: See also the last remaining blocker to deploying SDC on Thursday morning, pinged to you in -commons-sd. :-) [23:06:54] (No pressure.)
[23:07:00] will look in 5 mins [23:08:56] (CR) Dzahn: [C: -1] "typo in "conftoool::state"" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:11:23] (CR) Dzahn: [C: -1] profile::mediawiki::maintenance: systemd-timer based periodic jobs (2 comments) [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:11:43] (PS2) Dzahn: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:14:42] (CR) Dzahn: "amended to fix compiler issue, now there is a new one: Could not find data item profile::conftool::state::ensure in any Hiera data file " [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:20:25] (PS2) Dzahn: systemd: introduce timer::job define [puppet] - https://gerrit.wikimedia.org/r/482790 (owner: Giuseppe Lavagetto) [23:20:26] (PS2) Dzahn: profile::mediawiki::webserver: inline mediawiki::conftool [puppet] - https://gerrit.wikimedia.org/r/482791 (owner: Giuseppe Lavagetto) [23:20:29] (PS3) Dzahn: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:21:17] (CR) jerkins-bot: [V: -1] systemd: introduce timer::job define [puppet] - https://gerrit.wikimedia.org/r/482790 (owner: Giuseppe Lavagetto) [23:25:44] (PS4) Dzahn: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:28:13] (CR) Dzahn: "PS3: rebased because i renamed mediawiki_maintenance to
mediawiki::maintenance and need to use hieradata/role/common" [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:30:48] (CR) Dzahn: "Could not find data item profile::conftool::state::query_interval in any Hiera data file" [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:31:39] Operations, ops-eqiad: Interface errors on cr1-eqiad:xe-3/3/1 - https://phabricator.wikimedia.org/T212791 (ayounsi) Reply from Zayo: > The latest update is from 1/8/2018 10:53 AM CST : > Good morning, yes we do have good connectivity as our interface is with good light statistics. The BGP session is stu... [23:35:20] (PS5) Dzahn: profile::mediawiki::maintenance: systemd-timer based periodic jobs [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:40:03] (PS1) Smalyshev: Remove rules.log - don't think anything uses it anymore [puppet] - https://gerrit.wikimedia.org/r/483033 (https://phabricator.wikimedia.org/T144539) [23:40:36] (CR) jerkins-bot: [V: -1] Remove rules.log - don't think anything uses it anymore [puppet] - https://gerrit.wikimedia.org/r/483033 (https://phabricator.wikimedia.org/T144539) (owner: Smalyshev) [23:40:37] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) confirmed, WMF name servers listed as authority: ` dig wikiba.se | grep -A3 AUTHORITY .. ;; AUTHORITY SECTION: wikiba.se. 863... [23:42:06] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) Might wanna redirect or replace content on ?
http://89.31.143.100/ [23:44:02] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) @MasinAlDujailiWMDE could you work with @CRoslof on getting the domain name transferred to WMF? [23:44:32] !log repooled wdqs1004 [23:44:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:44:56] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Addshore) >>! In T99531#4864434, @Dzahn wrote: > Might wanna redirect or replace content on ? http://89.31.143.100/ I have no idea abo... [23:45:05] mutante: thanks for all of your help on that ticket [23:45:15] it has been a long running one, but i can start to see the light at the end of the tunnel [23:45:27] (PS2) Smalyshev: Remove rules.log - don't think anything uses it anymore [puppet] - https://gerrit.wikimedia.org/r/483033 (https://phabricator.wikimedia.org/T144539) [23:46:03] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) >>! In T99531#4820326, @BBlack wrote: > 1. Switch the nameservers for `wikiba.se` to ns[012].wikimedia.org with your current reg... [23:47:45] Operations, Traffic, Wikidata, wikiba.se website, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) >>! In T99531#4864464, @Addshore wrote: > But I guess not top priority as the hosting should be moving soonish ;) Yea, i realiz... [23:48:12] addshore: welcome:) and yes, indeed.
it's been a while but we can see it happening :) [23:49:17] (PS1) Ottomata: [WIP] Helm chart for eventgate-analytics deployment [deployment-charts] - https://gerrit.wikimedia.org/r/483035 (https://phabricator.wikimedia.org/T211247) [23:52:26] (CR) Dzahn: "it compiles now, these new resources are all due to conftool-state: https://puppet-compiler.wmflabs.org/compiler1002/14222/mwmaint1002.eq" [puppet] - https://gerrit.wikimedia.org/r/482792 (https://phabricator.wikimedia.org/T211250) (owner: Giuseppe Lavagetto) [23:53:50] Operations, Traffic, Wikidata, serviceops, and 3 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) [23:54:11] Operations, Traffic, Wikidata, serviceops, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) [23:56:09] Operations, Wikimedia-Mailing-lists: import old staff list archives ? - https://phabricator.wikimedia.org/T109395 (Dzahn) a:Dzahn→None [23:57:22] Operations, monitoring: Icinga check for ipv6 host reachability - https://phabricator.wikimedia.org/T163996 (Dzahn) a:Dzahn→None i noticed i had this comment that i started typing but never saved: --- currently not going to work on this and we should probably wait a bit closely watching performa... [23:59:22] Operations, ops-codfw: asw-c-codfw - FPC 1 PEM 1 is not powered - https://phabricator.wikimedia.org/T213233 (ayounsi) p:Triage→High