[00:06:47] RECOVERY - Disk space on an-coord1001 is OK: DISK OK
[00:07:21] !log an-coord1001 - apt-get clean to free disk space, reacting to Icinga alert for running out of disk
[00:07:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:08:38] Operations, Analytics, Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (Dzahn)
[00:09:06] Operations, Analytics, Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (Dzahn) After I ran apt-get clean it's back to: /dev/md0 46G 39G 5.1G 89% /
[00:15:54] Operations, Domains, Traffic: Redirecting incoming queries to non-existent subpages - https://phabricator.wikimedia.org/T212914 (Dzahn) @Thomas_Shafee Is it an option to stop using GoDaddy? It seems pretty bad to me that they are adding "random strings", maybe another registrar would be the best solu...
[00:17:22] Operations, Domains, Traffic: Redirecting incoming queries to non-existent subpages - https://phabricator.wikimedia.org/T212914 (Dzahn) https://www.godaddy.com/community/Managing-Domains/subdomain-forwarding-is-adding-extra-random-characters-to-the/td-p/61774
[00:29:46] (PS11) Dzahn: prometheus::ops: convert role to profile [puppet] - https://gerrit.wikimedia.org/r/400241
[00:57:56] (CR) Smalyshev: Allow format to be overridden in mediatype object (2 comments) [dumps/dcat] - https://gerrit.wikimedia.org/r/425993 (https://phabricator.wikimedia.org/T154914) (owner: Lokal Profil)
[01:01:07] (CR) EBernhardson: [C: +1] [cirrus] re-enable HHVM connection pooling [mediawiki-config] - https://gerrit.wikimedia.org/r/482082 (https://phabricator.wikimedia.org/T212768) (owner: DCausse)
[01:19:15] RECOVERY - MariaDB Slave Lag: s4 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 218.18 seconds
[01:45:03] PROBLEM - puppet last run on lvs5003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:46:45] Operations, monitoring, Discovery-Search (Current work), User-CDanis, User-fgiunchedi: Remove "prometheus" from elasticsearch grafana dashboard names - https://phabricator.wikimedia.org/T212839 (Mathew.onipe) a:Mathew.onipe
[01:52:37] (CR) Dzahn: [C: +2] prometheus::ops: convert role to profile [puppet] - https://gerrit.wikimedia.org/r/400241 (owner: Dzahn)
[01:55:54] Operations, monitoring, Patch-For-Review, User-CDanis, User-fgiunchedi: Better organization for SRE grafana dashboards - https://phabricator.wikimedia.org/T178690 (Mathew.onipe)
[01:55:56] Operations, monitoring, Discovery-Search (Current work), User-CDanis, User-fgiunchedi: Remove "prometheus" from elasticsearch grafana dashboard names - https://phabricator.wikimedia.org/T212839 (Mathew.onipe) Open→Resolved
[01:59:07] Operations, Elasticsearch, Maps, Discovery-Search (Current work): Review Elastic/maps Grafana dashboards - https://phabricator.wikimedia.org/T209812 (Mathew.onipe)
[01:59:09] Operations, Elasticsearch, Discovery-Search (Current work): fix broken visualizations in Elasticsearch Node comparison dashboard - https://phabricator.wikimedia.org/T212831 (Mathew.onipe)
[01:59:15] (CR) Dzahn: [C: +2] "@Filippo merged and ran puppet on prometheus2003 and bast3002 (for bastionhost::pop) and there was no change at all during the puppet run." [puppet] - https://gerrit.wikimedia.org/r/400241 (owner: Dzahn)
[02:16:21] RECOVERY - puppet last run on lvs5003 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures
[03:08:02] (CR) VolkerE: [C: -1] "The logos need to be further optimized, by a PNG optimization tool. Exemplified by TinyPNG elwikinews-2x.png comes down to 35.1 KiB instea" [mediawiki-config] - https://gerrit.wikimedia.org/r/478498 (owner: Robingan7)
[03:31:35] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 892.95 seconds
[03:37:24] (Abandoned) Gergő Tisza: Handle X-MediaWiki-Patrol-Status header in Varnish [puppet] - https://gerrit.wikimedia.org/r/402572 (https://phabricator.wikimedia.org/T167400) (owner: Gergő Tisza)
[03:38:05] (Abandoned) Gergő Tisza: Configure the Swift file backend to accept X-MediaWiki- headers. [mediawiki-config] - https://gerrit.wikimedia.org/r/402763 (owner: Gergő Tisza)
[04:02:41] (PS1) Wangql: Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919)
[04:02:57] (CR) jerkins-bot: [V: -1] Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[04:04:23] (PS1) Gergő Tisza: Revert "Whitelist X-MediaWiki-Patrol-Status header in Swift" [puppet] - https://gerrit.wikimedia.org/r/482262
[04:05:02] (PS2) Gergő Tisza: Revert "Whitelist X-MediaWiki-Patrol-Status header in Swift" [puppet] - https://gerrit.wikimedia.org/r/482262 (https://phabricator.wikimedia.org/T167400)
[04:20:49] (PS2) Wangql: Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919)
[04:21:05] (CR) jerkins-bot: [V: -1] Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[04:24:01] Operations, Wikimedia-Mailing-lists: recovering wikimedia-mx mailing list password - https://phabricator.wikimedia.org/T212920 (Magister_Mathematicae)
[04:26:03] (PS3) Wangql: Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919)
[04:34:05] (PS4) Wangql: Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919)
[04:43:11] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 287.05 seconds
[06:16:29] (PS1) Jayprakash12345: To lift a cap on account creation from IP for mrwiki community [mediawiki-config] - https://gerrit.wikimedia.org/r/482263 (https://phabricator.wikimedia.org/T212921)
[06:18:31] (CR) Jayprakash12345: "@Urbanecm and Framawiki, Can you review?" [mediawiki-config] - https://gerrit.wikimedia.org/r/482263 (https://phabricator.wikimedia.org/T212921) (owner: Jayprakash12345)
[06:22:35] (CR) Jayprakash12345: "UTC Offset for IST is +5:30" [mediawiki-config] - https://gerrit.wikimedia.org/r/482263 (https://phabricator.wikimedia.org/T212921) (owner: Jayprakash12345)
[06:49:57] Operations, Release-Engineering-Team, Scap: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (Joe) >>! In T203625#4778902, @Legoktm wrote: > Yeah, that seems sensible unless there's some significant reason (e.g. hardwar...
[07:24:08] (PS2) Elukey: Remove two Analytics Hadoop worker nodes for decom [puppet] - https://gerrit.wikimedia.org/r/482016 (https://phabricator.wikimedia.org/T209929)
[07:38:21] (CR) Elukey: [C: +2] Remove two Analytics Hadoop worker nodes for decom [puppet] - https://gerrit.wikimedia.org/r/482016 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[07:40:33] (PS2) Muehlenhoff: Re-add blacklist for /sbin in thumbor profile [puppet] - https://gerrit.wikimedia.org/r/481141
[07:46:44] (CR) Muehlenhoff: [C: -2] "Yeah, so I had a closer look and this won't work as intended, but I have a better solution instead: The Firejail profile for Mediawiki con" [puppet] - https://gerrit.wikimedia.org/r/481139 (owner: Muehlenhoff)
[07:47:08] (Abandoned) Muehlenhoff: Remove thumbor.profile.firejail [puppet] - https://gerrit.wikimedia.org/r/481143 (owner: Muehlenhoff)
[07:47:36] (Abandoned) Muehlenhoff: Switch /etc/firejail/thumbor.profile to the mediawiki profile [puppet] - https://gerrit.wikimedia.org/r/481142 (owner: Muehlenhoff)
[07:56:15] !log installing OpenSSL security updates
[07:56:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:00] !log restart eventlogging daemons on eventlog1002 to pick up openssl updates
[08:16:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:50] (PS1) Giuseppe Lavagetto: jobrunner: add php monitoring [puppet] - https://gerrit.wikimedia.org/r/482267
[08:22:52] (PS1) Giuseppe Lavagetto: site.pp: fold videoscalers into jobrunners [puppet] - https://gerrit.wikimedia.org/r/482268
[08:22:54] (PS1) Giuseppe Lavagetto: videoscaler: remove last references to videoscalers as a separate cluster. [puppet] - https://gerrit.wikimedia.org/r/482269
[08:30:00] (CR) Giuseppe Lavagetto: [C: +2] jobrunner: add php monitoring [puppet] - https://gerrit.wikimedia.org/r/482267 (owner: Giuseppe Lavagetto)
[08:34:23] PROBLEM - DPKG on ping2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:35:05] ^ that's me, should recover soonish
[08:35:37] RECOVERY - DPKG on ping2001 is OK: All packages OK
[08:39:19] PROBLEM - DPKG on francium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[08:40:49] PROBLEM - puppet last run on ping2001 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[tshark],Exec[set debconf flag seen for wireshark-common/install-setuid]
[08:41:45] RECOVERY - DPKG on francium is OK: All packages OK
[08:44:07] !log restarting nginx on francium to pick up new OpenSSL
[08:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:01] RECOVERY - puppet last run on ping2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:50:39] PROBLEM - PHP7 rendering on mw2160 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[08:50:41] PROBLEM - PHP7 rendering on mw2278 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[08:52:27] PROBLEM - PHP7 rendering on mw1299 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[08:54:15] PROBLEM - PHP7 rendering on mw1296 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[08:55:09] _joe_: ^
[08:55:29] <_joe_> uh interesting
[08:55:32] <_joe_> of course
[08:55:47] <_joe_> ok not a big deal, it's the fault of my puppet change
[08:56:18] do we need to revert?
[08:56:23] <_joe_> no
[08:56:34] <_joe_> we're just monitoring the wrong url
[08:56:39] lol
[08:56:52] <_joe_> I need to do a followup change :)
[08:59:45] PROBLEM - PHP7 rendering on mw1293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[08:59:45] PROBLEM - PHP7 rendering on mw1300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:01:35] PROBLEM - PHP7 rendering on mw1295 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:01:39] PROBLEM - PHP7 rendering on mw2279 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.074 second response time
[09:03:00] !log executing schema change on db1116 - T85757
[09:03:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:03:03] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[09:03:27] PROBLEM - PHP7 rendering on mw2155 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:03:27] PROBLEM - PHP7 rendering on mw2159 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:05:17] PROBLEM - PHP7 rendering on mw1338 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:05:19] PROBLEM - PHP7 rendering on mw2246 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:06:40] (PS1) Giuseppe Lavagetto: profile::mw::php::monitoring: make page check conditional [puppet] - https://gerrit.wikimedia.org/r/482272
[09:06:45] <_joe_> jijiki: ^ can you review? :)
[09:07:09] PROBLEM - PHP7 rendering on mw2250 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:07:09] PROBLEM - PHP7 rendering on mw2282 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:07:36] sure :)
[09:08:57] PROBLEM - PHP7 rendering on mw2153 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:08:59] PROBLEM - PHP7 rendering on mw2249 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:09:44] _joe_: we simply disable this test if I understand correctly?
[09:09:56] <_joe_> yes
[09:10:11] <_joe_> we will have a dedicated check once we merge another patch
[09:10:20] <_joe_> enabling the use of php7 from the jobrunners
[09:10:43] PROBLEM - PHP7 rendering on mw1294 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:10:43] PROBLEM - PHP7 rendering on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:10:43] PROBLEM - PHP7 rendering on mw1310 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:10:47] PROBLEM - PHP7 rendering on mw2259 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:13:58] !log restarting nginx on puppetdb hosts to pick up new OpenSSL
[09:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:16:47] <_joe_> that's going to cause puppet failures, ofc
[09:17:52] (CR) Filippo Giunchedi: "Sounds good, I'll merge on Mon" [puppet] - https://gerrit.wikimedia.org/r/482262 (https://phabricator.wikimedia.org/T167400) (owner: Gergő Tisza)
[09:18:45] (CR) Effie Mouzeli: [V: +1 C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/14155/mw1310.eqiad.wmnet/ looks alright" [puppet] - https://gerrit.wikimedia.org/r/482272 (owner: Giuseppe Lavagetto)
[09:20:32] PROBLEM - PHP7 rendering on mw1306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:20:32] PROBLEM - PHP7 rendering on mw1308 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:20:34] PROBLEM - PHP7 rendering on mw2152 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:20:34] PROBLEM - PHP7 rendering on mw2161 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:20:36] PROBLEM - PHP7 rendering on mw2265 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:22:20] PROBLEM - PHP7 rendering on mw1302 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:22:24] PROBLEM - PHP7 rendering on mw2243 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:22:24] PROBLEM - PHP7 rendering on mw2247 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:22:54] PROBLEM - DPKG on auth1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:23:06] (CR) Giuseppe Lavagetto: [C: +2] profile::mw::php::monitoring: make page check conditional [puppet] - https://gerrit.wikimedia.org/r/482272 (owner: Giuseppe Lavagetto)
[09:24:10] PROBLEM - PHP7 rendering on mw1318 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:24:12] PROBLEM - PHP7 rendering on mw1336 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:24:12] PROBLEM - PHP7 rendering on mw2157 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:24:14] PROBLEM - PHP7 rendering on mw2280 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:26:00] PROBLEM - PHP7 rendering on mw1311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:26:04] PROBLEM - PHP7 rendering on mw2263 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:27:30] PROBLEM - puppet last run on auth1002 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 3 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[tshark],Exec[set debconf flag seen for wireshark-common/install-setuid]
[09:27:50] PROBLEM - PHP7 rendering on mw1309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:27:54] PROBLEM - PHP7 rendering on mw2266 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.073 second response time
[09:29:22] RECOVERY - DPKG on auth1002 is OK: All packages OK
[09:29:31] (CR) Filippo Giunchedi: "Thanks for this! I'm adding cc'ing WMCS folks to check what they'd like to do with cluster name, so far we have:" [puppet] - https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[09:29:42] PROBLEM - PHP7 rendering on mw1303 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:29:42] PROBLEM - PHP7 rendering on mw1305 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:29:42] PROBLEM - PHP7 rendering on mw1334 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - header X-Powered-By: PHP/7. not found on http://en.wikipedia.org:80/wiki/Main_Page - 376 bytes in 0.001 second response time
[09:30:46] (CR) Jforrester: "Not sure why we'd want TestCommons to regress from where RealCommons is?" [mediawiki-config] - https://gerrit.wikimedia.org/r/482139 (https://phabricator.wikimedia.org/T197616) (owner: Reedy)
[09:32:08] (CR) Effie Mouzeli: [C: +2] Re-add blacklist for /sbin in thumbor profile [puppet] - https://gerrit.wikimedia.org/r/481141 (owner: Muehlenhoff)
[09:33:38] (PS3) Effie Mouzeli: Re-add blacklist for /sbin in thumbor profile [puppet] - https://gerrit.wikimedia.org/r/481141 (owner: Muehlenhoff)
[09:38:21] Reedy: Hmm. "error:Table 'testcommonswiki.blobs_cluster24' doesn't exist (10.64.32.184)". Is S4's ES magically sharded?
[09:39:29] (This means we're currently doing a perfect job of stopping spam, of course, but also actual edits.)
[09:40:55] !log executing schema change on dbstore1002 - T85757
[09:40:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:57] T85757: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757
[09:50:07] !log restarting nginx on all wdqs hosts
[09:50:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:57:15] !log restarting thumbor services to pick up 481141
[09:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:58:03] (PS1) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[09:58:14] RECOVERY - puppet last run on auth1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:09:56] (PS2) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[10:13:40] (PS3) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[10:27:06] (CR) GTirloni: [C: +2] toolforge: Update regex for parsing nginx logs [puppet] - https://gerrit.wikimedia.org/r/482236 (owner: BryanDavis)
[10:27:19] (PS2) GTirloni: toolforge: Update regex for parsing nginx logs [puppet] - https://gerrit.wikimedia.org/r/482236 (owner: BryanDavis)
[10:36:48] (CR) GTirloni: [C: +1] hiera: add wmcs cluster definition [puppet] - https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[10:40:00] (CR) GTirloni: "Would it make sense to have this in labs/private/hieradata/common/profile/toolforge/toolviews.pp instead?" [labs/private] - https://gerrit.wikimedia.org/r/482238 (https://phabricator.wikimedia.org/T87001) (owner: BryanDavis)
[10:41:26] (PS1) Arturo Borrero Gonzalez: cloudvirt1024: rebuild as stretch [puppet] - https://gerrit.wikimedia.org/r/482282 (https://phabricator.wikimedia.org/T212898)
[10:42:06] (CR) Arturo Borrero Gonzalez: [C: +2] cloudvirt1024: rebuild as stretch [puppet] - https://gerrit.wikimedia.org/r/482282 (https://phabricator.wikimedia.org/T212898) (owner: Arturo Borrero Gonzalez)
[10:46:07] !log rolling restart of swift proxies to pick up OpenSSL update
[10:46:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:17] !log T212898 reimaging cloudvirt1024 as stretch
[10:47:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:47:20] T212898: Rebuild cloudvirt1024 with Stretch - https://phabricator.wikimedia.org/T212898
[11:02:26] Operations, Citoid, Core Platform Team Backlog (Watching / External), Patch-For-Review, and 2 others: Decreased internationalisation of automatic citations as a result of switch to new translation-server - https://phabricator.wikimedia.org/T210806 (Mvolz)
[11:21:15] (CR) A2093064: [C: -1] Modifying configuration about Chinese Wikiversity: (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919) (owner: Wangql)
[11:27:16] (PS2) Arturo Borrero Gonzalez: site.pp: consolidate cloudvirt entries [puppet] - https://gerrit.wikimedia.org/r/482133 (owner: Andrew Bogott)
[11:28:06] (CR) Arturo Borrero Gonzalez: [C: +2] site.pp: consolidate cloudvirt entries [puppet] - https://gerrit.wikimedia.org/r/482133 (owner: Andrew Bogott)
[11:30:32] (CR) Elukey: [C: -1] "This doesn't work, removing people to refactor the change.." [puppet] - https://gerrit.wikimedia.org/r/482275 (owner: Elukey)
[11:31:13] Operations, DBA, Jade, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) Quick meta-point: @Halfak wrote > This conversation is getting very Producty but if this is what TechCom needs to in or...
[11:31:26] !log installing libdatetime-timezone-perl updates for recent tz changes
[11:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:32:31] (CR) Arturo Borrero Gonzalez: "I'm not sure about having trusty in d/changelog. This has bitten us in the past several times. I tried a git branching approach that bstor" [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: GTirloni)
[11:32:52] (PS4) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[11:33:32] !log installing jasper security updates
[11:33:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:54] (PS12) Arturo Borrero Gonzalez: wmcs::nfs::misc - Refactor into profile/role [puppet] - https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[11:34:37] (PS5) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[11:39:29] (PS6) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[11:40:17] (PS1) Muehlenhoff: Add library hint for jasper [puppet] - https://gerrit.wikimedia.org/r/482288
[11:41:15] (PS7) Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275
[11:42:10] (CR) Muehlenhoff: [C: +2] Add library hint for jasper [puppet] - https://gerrit.wikimedia.org/r/482288 (owner: Muehlenhoff)
[11:42:41] (CR) Elukey: [C: -1] [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - https://gerrit.wikimedia.org/r/482275 (owner: Elukey)
[11:48:29] (CR) Arturo Borrero Gonzalez: [C: +1] hiera: add wmcs cluster definition [puppet] - https://gerrit.wikimedia.org/r/482149 (https://phabricator.wikimedia.org/T210486) (owner: Cwhite)
[11:57:22] (CR) Arturo Borrero Gonzalez: "Good work! Some comments inline." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[11:58:24] !log installing libsndfile security updates
[11:58:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:58] (PS1) Muehlenhoff: Add library hint for libsndfile [puppet] - https://gerrit.wikimedia.org/r/482291
[12:05:37] (CR) Muehlenhoff: [C: +2] Add library hint for libsndfile [puppet] - https://gerrit.wikimedia.org/r/482291 (owner: Muehlenhoff)
[12:10:06] (CR) Fsero: [C: +2] k8s::flannel: remove upstart, use systemd::service instead (1 comment) [puppet] - https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[12:16:02] PROBLEM - puppet last run on cloudvirt1026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/10-puppet-agent.conf]
[12:17:54] PROBLEM - puppet last run on cloudvirt1027 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/10-puppet-agent.conf]
[12:21:38] PROBLEM - puppet last run on cloudvirt1028 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/rsyslog.d/10-puppet-agent.conf]
[12:28:22] RECOVERY - puppet last run on cloudvirt1027 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[12:31:40] RECOVERY - puppet last run on cloudvirt1026 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[12:32:02] RECOVERY - puppet last run on cloudvirt1028 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[12:34:27] (CR) GTirloni: wmcs::nfs::misc - Refactor into profile/role (3 comments) [puppet] - https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[12:44:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 627.32 seconds
[12:48:17] (PS5) Wangql: Modifying configuration about Chinese Wikiversity: [mediawiki-config] - https://gerrit.wikimedia.org/r/482261 (https://phabricator.wikimedia.org/T212919)
[12:48:33] (PS1) ArielGlenn: file truncation checks also now optionally check for last xml tag [dumps] - https://gerrit.wikimedia.org/r/482293 (https://phabricator.wikimedia.org/T212462)
[12:56:38] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 288.01 seconds
[12:57:29] (PS3) GTirloni: Limit manifest starts (max 10) [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878)
[12:57:54] !log rebooting kubernetes staging workers for kernel security update
[12:57:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:20] (CR) GTirloni: "> Patch Set 2:" [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: GTirloni)
[13:01:41] (PS1) Muehlenhoff: Remove obsolete Hiera files [puppet] - https://gerrit.wikimedia.org/r/482295
[13:11:14] (PS1) Mathew.onipe: Elasticsearch failed shard allocation check [puppet] - https://gerrit.wikimedia.org/r/482297 (https://phabricator.wikimedia.org/T212850)
[13:11:50] !log rebooting kubernetes staging master to pick up SSBD-enabled qemu
[13:11:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:16:00] (CR) Arturo Borrero Gonzalez: wmcs::nfs::misc - Refactor into profile/role (2 comments) [puppet] - https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: GTirloni)
[13:16:53] (PS4) Volans: phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884)
[13:18:40] (PS5) Volans: phabricator: add phabricator module [software/spicerack] - https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884)
[13:20:43] (CR) Arturo Borrero Gonzalez: "> > But having this d/changelog entry in the stretch package (new" [software/tools-manifest] - https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: GTirloni)
[13:28:44] Operations, Electron-PDFs, Proton, Epic, and 4 others: [EPIC] New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748 (Trizek-WMF) OK, I leave this ticket as "not ready to announce" until something that impacts users will happen. When ready, please add one sentence...
[13:30:06] 10Operations, 10Domains, 10Traffic: Redirecting incoming queries to non-existent subpages - https://phabricator.wikimedia.org/T212914 (10ema) p:05Triage→03Normal [13:33:06] !log rebooting kubernetes staging etcd hosts to pick up SSBD-enabled qemu [13:33:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:39:58] (03PS6) 10Volans: phabricator: add phabricator module [software/spicerack] - 10https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) [13:40:00] (03PS1) 10Volans: debmonitor: add debmonitor module [software/spicerack] - 10https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) [13:49:52] (03PS8) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [13:52:14] !log rebooting etcd1004-1006 to pick up SSBD-enabled qemu [13:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:50] (03PS9) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [14:08:23] !log rebooting etcd1001-1003 to pick up SSBD-enabled qemu [14:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:28] (03PS10) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [14:14:48] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:15:12] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={create,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:15:28] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb={GET,LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:15:42] PROBLEM - Request 
latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb={PATCH,POST} https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:16:56] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:17:38] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:17:54] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:18:26] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api [14:19:43] in case you are curious, these alerts are because of a kernel upgrade [14:20:39] IMO the threshold is pretty low; also, the fact that the etcd cluster lives on a VM with probably limited IOPS doesn't help when we stress etcd (in this case a node coming up and refreshing the state) [14:28:44] (03PS11) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [14:30:54] 10Operations, 10serviceops: SRE FY2019 Q3 goal: Increase reach of deployment pipeline - https://phabricator.wikimedia.org/T212935 (10fselles) [14:34:49] 10Operations, 10Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10ema) P7916, which worked for @Huji, does not work for me. The path in P7916 (/wiki/Main_Page?_=xxxx...) is 4698 characters long. In my tests, the following works (path length 4683): ` $ curl --http... 
[14:47:51] (03PS12) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [14:55:36] (03PS13) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [14:57:38] (03PS14) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [15:03:52] 10Operations, 10ops-eqiad: db1082 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212909 (10Marostegui) [15:04:14] PROBLEM - ensure kvm processes are running on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 [15:04:21] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging: db1107 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212910 (10Marostegui) [15:04:40] PROBLEM - Check systemd state on cloudvirt1024 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [15:06:47] 10Operations, 10Thumbor, 10serviceops, 10User-jijiki: Investigate systemd hardening to replace Firejail for Thumbor - https://phabricator.wikimedia.org/T212941 (10jijiki) p:05Triage→03Normal [15:08:36] (03PS1) 10Arturo Borrero Gonzalez: cloudvirt1024: disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/482306 [15:09:04] (03PS15) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [15:11:18] 10Operations, 10ops-eqiad: db1082 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212909 (10Marostegui) p:05Triage→03Normal @Cmjohnson might be a loose cable? ` /system1/log1/record7 Targets Properties number=7 severity=Caution date=01/03/2019 time=14:32 des... 
[15:11:44] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging: db1107 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212910 (10Marostegui) p:05Triage→03Normal This happened around the same time as {T212909} maybe there was some work being done over those racks and the cable... [15:12:09] (03PS1) 10Arturo Borrero Gonzalez: cloudvirt1024: use right iface name for neutron [puppet] - 10https://gerrit.wikimedia.org/r/482308 [15:12:13] (03PS16) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [15:12:42] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] cloudvirt1024: use right iface name for neutron [puppet] - 10https://gerrit.wikimedia.org/r/482308 (owner: 10Arturo Borrero Gonzalez) [15:14:28] RECOVERY - Check systemd state on cloudvirt1024 is OK: OK - running: The system is fully operational [15:15:26] (03Abandoned) 10Arturo Borrero Gonzalez: cloudvirt1024: disable notifications [puppet] - 10https://gerrit.wikimedia.org/r/482306 (owner: 10Arturo Borrero Gonzalez) [15:16:00] (03PS13) 10GTirloni: wmcs::nfs::misc - Refactor into profile/role [puppet] - 10https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) [15:17:33] marostegui banyek re: db PSU alerts, looks like duplicate of T212861 ? 
[15:17:33] T212861: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 [15:17:42] I'm going to adjust some spam controls for kowiki [15:18:49] godog: yes, that might be the same [15:19:22] (03PS17) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [15:20:18] yeah I'm quite sure that's the case [15:21:02] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (10Marostegui) The cause of the crash was apparently memory related ` /admin1/system1/logs1/log1-> show record1 properties CreationTimestamp = 20190103073754.000000-360 ElementNa... [15:21:04] 10Operations, 10ops-codfw, 10DBA: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702 (10Marostegui) [15:21:06] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (10Marostegui) [15:21:09] 10Operations, 10ops-eqiad: db1082 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212909 (10Banyek) [15:21:17] 10Operations, 10ops-eqiad, 10Analytics: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10Banyek) [15:21:19] 10Operations, 10ops-eqiad, 10Analytics, 10Analytics-EventLogging: db1107 has CRITICAL status in power supply - https://phabricator.wikimedia.org/T212910 (10Banyek) [15:21:21] 10Operations, 10ops-eqiad, 10Analytics: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 (10Banyek) [15:22:01] (03CR) 10Elukey: "First pcc that kinda works: https://puppet-compiler.wmflabs.org/compiler1002/14171/an-master1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/482275 (owner: 10Elukey) [15:22:20] 10Operations, 10ops-eqiad, 10Analytics: Rack A2's hosts alarm for PSU broken - https://phabricator.wikimedia.org/T212861 
(10Marostegui) [15:23:39] ACKNOWLEDGEMENT - ensure kvm processes are running on cloudvirt1024 is CRITICAL: PROCS CRITICAL: 0 processes with regex args qemu-system-x86_64 GTirloni Not in use [15:24:34] 10Operations, 10DBA, 10Patch-For-Review: es2019 crashed again - https://phabricator.wikimedia.org/T149526 (10Marostegui) [15:24:37] 10Operations, 10ops-codfw, 10DBA: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702 (10Marostegui) [15:24:40] (03PS18) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 [15:25:01] (03PS1) 10Muehlenhoff: Switch Thumbor hardening from Firejail to native systemd features (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/482309 [15:25:47] (03PS14) 10GTirloni: wmcs::nfs::misc - Refactor into profile/role [puppet] - 10https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) [15:26:31] (03PS15) 10GTirloni: wmcs::nfs::misc - Refactor into profile/role [puppet] - 10https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) [15:28:04] (03CR) 10GTirloni: wmcs::nfs::misc - Refactor into profile/role (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/482051 (https://phabricator.wikimedia.org/T209527) (owner: 10GTirloni) [15:42:14] !log bawolff@deploy1001 Synchronized private/PrivateSettings.php: T212667 - More aggressive anti-spam measures for account creation on kowiki (duration: 00m 48s) [15:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:17] T212667: Create mitigations for account creation spam attack [public task] - https://phabricator.wikimedia.org/T212667 [15:43:02] 10Operations, 10Recommendation-API, 10SRE-Access-Requests, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 
(10mobrovac) p:05Triage→03Normal [15:43:10] 10Operations, 10Analytics, 10Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10herron) Also... ` an-coord1001:~$ uptime 15:40:26 up 92 days, 1:05, 2 users, load average: 1177.70, 1166.05, 1131.88 ` Looks like loads of icinga check_disk processes in D sta... [15:45:05] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (10Banyek) On Cumin2001 I have a comparison screen running inside of a screen in `/home/banyek` The script is used the following: `#!/bin/bash for db in $(mysql.py -h es2018 -BN -e "... [15:45:17] 10Operations, 10Thumbor, 10serviceops, 10User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (10jijiki) [15:45:34] 10Operations, 10Thumbor, 10serviceops, 10User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (10jijiki) [15:45:38] 10Operations, 10Recommendation-API, 10SRE-Access-Requests, 10Core Platform Team Backlog (Watching / External), 10Services (watching): Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) @DarTar as Baha's manager, please review and approve this request. 
[15:46:14] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 2 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) [15:47:50] 10Operations, 10Recommendation-API, 10Research, 10SRE-Access-Requests, and 2 others: Add Baha as a deployer for Recommendation API - https://phabricator.wikimedia.org/T212945 (10mobrovac) [15:48:10] (03PS1) 10Mobrovac: Add bmansunov to deploy-service and recommendation-admin groups [puppet] - 10https://gerrit.wikimedia.org/r/482312 (https://phabricator.wikimedia.org/T212945) [15:48:49] 10Operations, 10Analytics, 10Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10herron) Also looks like /mnt/hdfs is hanging on this host, which would explain check_disk stacking up [15:51:21] (03PS2) 10Muehlenhoff: Switch Thumbor hardening from Firejail to native systemd features (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/482309 (https://phabricator.wikimedia.org/T212941) [15:56:30] 10Operations, 10Analytics, 10Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10elukey) Thanks a lot for the task, I didn't see this today :( So two things: 1) the disk fills up due to logs, sadly there is a chatty systemd timer (hdfs-balancer) that emits logs... [16:02:29] (03PS3) 10Muehlenhoff: Switch Thumbor hardening from Firejail to native systemd features (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/482309 (https://phabricator.wikimedia.org/T212941) [16:02:31] 10Operations, 10Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10Vgutierrez) Ok, so after capturing a crashing request, we can see the following stuff going on at HTTP2 level: `willikins:~ vgutierrez$ tshark -r ~/crash.pcap.pcapng -o ssl.keylog_file:/tmp/crash.ke... 
[16:10:30] 10Operations, 10Analytics, 10Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10herron) Thanks @elukey! Is INFO level logging from hdfs-balancer needed? If not we might also be able to turn down the verbosity there. Definitely open to optimizing the logging c... [16:14:34] 10Operations, 10Analytics, 10Analytics-Cluster: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10elukey) Completely ignorant about autofs but it looks a very viable option, I am all for trying it :) The verbosity could be lowered indeed, not sure if possible judging from the da... [16:15:01] 10Operations, 10Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10Huji) Thanks for working on this. I found that 420 message hilarious! I resisted making a 420 pun, so instead, I offer you this meme to thank you for all the hard work :) {F27790968} [16:15:10] (03CR) 10Muehlenhoff: "The errors like "node-abbrev (>= 1.1.1~) but 1.0.9-1 is to be installed" are caused by the fact that npm needs 1.1.1, but the package is a" [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [16:16:45] 10Operations, 10Analytics, 10Analytics-Kanban: Allow the deployment of users without SSH access - https://phabricator.wikimedia.org/T212949 (10elukey) p:05Triage→03Normal [16:18:01] (03PS19) 10Elukey: [WIP] admin: allow users to be deployed without ssh keys configured [puppet] - 10https://gerrit.wikimedia.org/r/482275 (https://phabricator.wikimedia.org/T212949) [16:26:40] 10Operations, 10Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10Anomie) Note it's ok if the server implements some limits. The problem here is that the client isn't getting the expected 4xx response (e.g. 414 or 431) when the limit is hit, all it sees is a droppe... 
[16:31:27] (03PS2) 10ArielGlenn: file truncation checks also now optionally check for last xml tag [dumps] - 10https://gerrit.wikimedia.org/r/482293 (https://phabricator.wikimedia.org/T212462) [16:36:55] 10Operations, 10DBA, 10Jade, 10TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (10Halfak) ^ Nevermind that. Was a copy-paste mistake [16:38:28] (03CR) 10CRusnov: [C: 03+1] "LGTM super minor inline." (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [16:42:09] 10Operations, 10Mail: More restrictive DMARC policy for the wikimedia.org domain - https://phabricator.wikimedia.org/T211404 (10herron) p:05Triage→03Normal [16:42:23] 10Operations, 10Mail: Domains of most projects do not have DMARC policy - https://phabricator.wikimedia.org/T211403 (10herron) p:05Triage→03Normal [16:42:37] 10Operations, 10Mail: Wikipedia.org DMARC "rua" and "ruf" email addresses need verification - https://phabricator.wikimedia.org/T211401 (10herron) p:05Triage→03Normal [17:00:30] 10Operations, 10Analytics, 10Analytics-Kanban: an-coord1001 almost out of disk - https://phabricator.wikimedia.org/T212915 (10elukey) p:05Triage→03High [17:01:46] (03PS3) 10ArielGlenn: file truncation checks also now optionally check for last xml tag [dumps] - 10https://gerrit.wikimedia.org/r/482293 (https://phabricator.wikimedia.org/T212462) [17:04:16] (03PS1) 10Elukey: systemd::syslog: allow to add the 'stop' rule when needed [puppet] - 10https://gerrit.wikimedia.org/r/482327 (https://phabricator.wikimedia.org/T212915) [17:05:22] PROBLEM - HHVM jobrunner on mw1301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time [17:06:34] RECOVERY - HHVM jobrunner on mw1301 is OK: HTTP OK: HTTP/1.1 200 OK - 270 bytes in 0.007 second response time [17:09:02] (03CR) 10Elukey: 
"https://puppet-compiler.wmflabs.org/compiler1002/14173/" [puppet] - 10https://gerrit.wikimedia.org/r/482327 (https://phabricator.wikimedia.org/T212915) (owner: 10Elukey) [17:22:39] 10Operations, 10Thumbor, 10Wikimedia-Logstash, 10serviceops, 10User-jijiki: Stream Thumbor logs to logstash - https://phabricator.wikimedia.org/T212946 (10herron) p:05Triage→03Normal Sure! In terms of figuring out how, since the logs are already present in syslog this should be a pretty straightforwa... [17:24:47] 10Operations, 10serviceops: SRE FY2019 Q3 goal: Increase reach of deployment pipeline - https://phabricator.wikimedia.org/T212935 (10herron) p:05Triage→03Normal [17:25:11] 10Operations, 10Release Pipeline (Blubber): blubber template for nodejs should allow defining configuration files to copy to the container - https://phabricator.wikimedia.org/T211580 (10herron) p:05Triage→03Normal [17:32:55] 10Operations, 10serviceops, 10Release-Engineering-Team (Watching / External): Increase mwdebugXXXX hosts CPU and memory - https://phabricator.wikimedia.org/T212955 (10greg) [17:33:31] 10Operations, 10Release-Engineering-Team, 10Scap: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (10greg) >>! In T203625#4854409, @Joe wrote: > The debug hosts sit idle all the time. Having 4 physical servers sitting idle in... [17:39:45] 10Operations, 10Scap, 10Release-Engineering-Team (Watching / External): mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (10greg) [17:41:07] 10Operations, 10Scap, 10Patch-For-Review, 10User-ArielGlenn, 10User-Joe: Make scap and opcache work consistently together - https://phabricator.wikimedia.org/T211964 (10greg) (we'll watch via the #scap project) [17:42:52] (03CR) 10Gehel: [C: 04-1] "minor comments inline." 
(037 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/482299 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [17:46:43] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission of restbase200[1-6] (lease return in December 2018) - https://phabricator.wikimedia.org/T211070 (10Papaul) [18:06:54] (03PS2) 10RobH: decom restbase200[1-6] production dns entries [dns] - 10https://gerrit.wikimedia.org/r/479560 (https://phabricator.wikimedia.org/T211070) [18:07:23] 10Operations, 10ops-codfw, 10DC-Ops, 10decommission, 10Patch-For-Review: decommission of restbase200[1-6] (lease return in December 2018) - https://phabricator.wikimedia.org/T211070 (10RobH) >>! In T211070#4853493, @Dzahn wrote: > @RobH @fgiunchedi I still see a bunch of production DNS records for these.... [18:07:32] (03CR) 10RobH: [V: 03+2 C: 03+2] decom restbase200[1-6] production dns entries [dns] - 10https://gerrit.wikimedia.org/r/479560 (https://phabricator.wikimedia.org/T211070) (owner: 10RobH) [18:08:37] (03CR) 10Gehel: [C: 04-1] "minor comment inline" (036 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/482018 (https://phabricator.wikimedia.org/T205884) (owner: 10Volans) [18:14:48] 10Operations, 10serviceops, 10Release-Engineering-Team (Watching / External): Increase mwdebugXXXX hosts CPU and memory - https://phabricator.wikimedia.org/T212955 (10hashar) /proc/cpuinfo reports a single cpu core. Though it is unknown whether it uses multiple cores when generating the bytecode cache, it is wor... [18:19:56] 10Operations, 10serviceops, 10Release-Engineering-Team (Watching / External): Increase mwdebugXXXX hosts CPU and memory - https://phabricator.wikimedia.org/T212955 (10hashar) IIRC the cdb files are generated by `rebuildLocalisationCache.php` which is CPU bound and runs with up to 30 parallel tasks. I previo... 
[18:21:14] 10Operations, 10serviceops, 10Release-Engineering-Team (Watching / External): Increase mwdebugXXXX hosts CPU and memory(?) - https://phabricator.wikimedia.org/T212955 (10greg) [18:22:11] 10Operations, 10Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (10Jayprakash12345) It will be good If you reset the list before 10 Jan. So we can use this upcoming #Wikimedia-Technical-Training-2019-India, which will happen in Feb. @Aklapper Can... [18:38:02] 10Operations, 10Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (10Aklapper) @Jayprakash12345: See https://wikitech.wikimedia.org/wiki/Ops_Clinic_Duty for the process [18:43:12] PROBLEM - puppet last run on oresrdb1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:48:04] (03PS1) 10Nuria: [WIP] Testing monthly default config [puppet] - 10https://gerrit.wikimedia.org/r/482336 (https://phabricator.wikimedia.org/T209103) [18:48:23] 10Operations, 10Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (10Jayprakash12345) [18:54:13] 10Operations, 10Wikimedia-Mailing-lists: Adminship of MediaWiki-India Mailing List - https://phabricator.wikimedia.org/T212957 (10Jayprakash12345) [19:04:11] (03CR) 10CRusnov: [C: 03+1] "LGTM (trivial change)" [software/cumin] - 10https://gerrit.wikimedia.org/r/481913 (owner: 10Volans) [19:09:14] RECOVERY - puppet last run on oresrdb1002 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [19:29:09] (03CR) 10Dzahn: "Thanks! 
It looks like it's fine to use wildcards with apt::pin package parameter (i see that being done in toolllabs classes), wondering i" [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [19:37:26] PROBLEM - HP RAID on db2047 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:1 - Controller: OK - Battery/Capacitor: OK [19:37:33] 10Operations, 10ops-codfw: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T212966 (10ops-monitoring-bot) [19:41:32] (03CR) 10CRusnov: "Looks good, I do like the overall solution however I feel that it groups a lot of different kinds of errors in one Exception. Thinking abo" (032 comments) [dns] - 10https://gerrit.wikimedia.org/r/481833 (owner: 10Volans) [19:41:36] (03CR) 10Ayounsi: "The nagiosplugin module does seem to make things more complicated, at least for such simple check." [puppet] - 10https://gerrit.wikimedia.org/r/481157 (owner: 10Faidon Liambotis) [19:42:30] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T212966 (10Marostegui) a:03Papaul Let's get it replaced when you can Thanks! [19:42:45] 10Operations, 10ops-codfw, 10DBA: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T212966 (10Marostegui) p:05Triage→03Normal [19:44:01] (03PS3) 10Dzahn: testreduce: if on stretch, use stretch-backports to get npm package [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) [19:44:51] 10Operations, 10DBA: Predictive failures on disk S.M.A.R.T. 
status - https://phabricator.wikimedia.org/T208323 (10Marostegui) [19:45:24] ACKNOWLEDGEMENT - HP RAID on db2047 is CRITICAL: CRITICAL: Slot 0: OK: 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Failed: 1I:1:1 - Controller: OK - Battery/Capacitor: OK Marostegui T212966 - The acknowledgement expires at: 2019-01-09 19:45:06. [19:45:32] (03CR) 10Dzahn: "Thank you Fsero! I added some more reviewers." [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [19:48:04] (03CR) 10Dzahn: [C: 03+1] "patch looks good to me, just needs to follow the normal process for access requests. added clinic duty person" [puppet] - 10https://gerrit.wikimedia.org/r/482312 (https://phabricator.wikimedia.org/T212945) (owner: 10Mobrovac) [20:02:37] 10Operations, 10monitoring: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (10Marostegui) [20:02:55] 10Operations, 10monitoring: Degraded RAID alert not acking notifications - https://phabricator.wikimedia.org/T212969 (10Marostegui) p:05Triage→03Normal [20:07:13] (03CR) 10BryanDavis: "> > (I'd prefer branches but looks like our tooling is not there" [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: 10GTirloni) [20:14:01] (03CR) 10GTirloni: "Although Toolforge's k8s cluster is on Jessie, the overlay network appears to be extended to the bastions, which are Trusty at the moment." [puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [20:15:21] (03CR) 10GTirloni: [C: 04-1] "Defensive -1 for now, sorry." 
[puppet] - 10https://gerrit.wikimedia.org/r/482118 (https://phabricator.wikimedia.org/T194724) (owner: 10Dzahn) [20:19:17] (03CR) 10BryanDavis: [C: 03+1] "> Would it make sense to have this in labs/private/hieradata/common/profile/toolforge/toolviews.pp" [labs/private] - 10https://gerrit.wikimedia.org/r/482238 (https://phabricator.wikimedia.org/T87001) (owner: 10BryanDavis) [20:20:13] (03CR) 10GTirloni: "I feel we're getting derailed by the discussions around branches, which wasn't the purpose of this change :)" [software/tools-manifest] - 10https://gerrit.wikimedia.org/r/479181 (https://phabricator.wikimedia.org/T107878) (owner: 10GTirloni) [20:21:42] (03CR) 10GTirloni: "> Patch Set 1:" [labs/private] - 10https://gerrit.wikimedia.org/r/482238 (https://phabricator.wikimedia.org/T87001) (owner: 10BryanDavis) [20:46:36] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214 (10Krinkle) [20:47:11] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214 (10Krinkle) [20:47:45] 10Operations, 10Commons, 10MediaWiki-File-management, 10Multimedia, and 11 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214 (10Krinkle) [20:58:55] (03PS1) 10Cwhite: varnish: enable statsd_exporter and add matching rules [puppet] - 10https://gerrit.wikimedia.org/r/482350 (https://phabricator.wikimedia.org/T205870) [21:01:03] 10Operations, 10Traffic: HTTP/2 requests fail with too-long URLs - https://phabricator.wikimedia.org/T209590 (10ema) >>! In T209590#4855448, @Anomie wrote: > Note it's ok if the server implements some limits. The problem here is that the client isn't getting the expected 4xx response (e.g. 414 or 431) when the... [21:10:57] (03CR) 10Krinkle: "What are these metric matchings based on? 
I would expect this to be a narrow subset of audited metrics affected teams are aware of and wan" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/481110 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [21:12:53] (03CR) 10Ayounsi: "As Icinga checks apply to a single device, I think the best way to implement this check is to define in "modules/netops/manifests/monitori" [puppet] - 10https://gerrit.wikimedia.org/r/481154 (https://phabricator.wikimedia.org/T150264) (owner: 10Faidon Liambotis) [21:36:46] PROBLEM - puppet last run on db1070 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:38:04] (03CR) 10Cwhite: ">" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/481110 (https://phabricator.wikimedia.org/T205870) (owner: 10Cwhite) [22:02:48] RECOVERY - puppet last run on db1070 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:13:53] 10Operations, 10ops-ulsfo, 10decommission: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592 (10brion) Note that bast4001 no longer works for login? 
[22:52:53] (03PS1) 10Catrope: Update GrowthExperiments config for proportion->percentage change [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 [22:53:36] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/14174/" [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [22:53:46] (03PS4) 10Dzahn: testreduce: if on stretch, use stretch-backports to get npm package [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) [22:54:08] (03CR) 10Dzahn: [C: 03+2] "noop on ruthenium, only affects the new version of a test system" [puppet] - 10https://gerrit.wikimedia.org/r/482150 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [22:55:14] (03CR) 10Catrope: [C: 04-2] "Do not deploy until after wmf.12 is deployed everywhere, otherwise the default config will kick in and the help panel will be given to 50%" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482372 (owner: 10Catrope) [22:58:58] (03PS1) 10Catrope: Enable GrowthExperiments help panel on cswiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482373 (https://phabricator.wikimedia.org/T211993) [22:59:00] (03PS1) 10Catrope: Enable GrowthExperiments help panel for 50% of new users on cswiki and kowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482374 (https://phabricator.wikimedia.org/T211993) [22:59:09] grrr "Unable to correct problems, you have held broken packages." 
[22:59:23] (just a new test box though) [22:59:33] that would give us npm on stretch for parsoid testing [23:07:31] !log scandium apt-get remove nodejs nodes-legacy ; puppet agent -tv - after merging gerrit:482150 this fixed the "you have held broken packages" issue, now we are at a puppet dependency cycle with apt::pin T201366 [23:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:07:34] T201366: rack/setup/install scandium.eqiad.wmnet (parsoid test box) - https://phabricator.wikimedia.org/T201366 [23:08:28] RECOVERY - puppet last run on scandium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:10:33] (03PS1) 10Catrope: Enable Flow beta feature on viwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/482376 (https://phabricator.wikimedia.org/T212929) [23:28:37] (03PS1) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:29:49] (03PS2) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:33:58] (03PS1) 10Dzahn: testreduce: break dependency cycle between apt::pin and require_package [puppet] - 10https://gerrit.wikimedia.org/r/482380 (https://phabricator.wikimedia.org/T201366) [23:34:48] (03PS2) 10Dzahn: testreduce: break dependency cycle between apt::pin and require_package [puppet] - 10https://gerrit.wikimedia.org/r/482380 (https://phabricator.wikimedia.org/T201366) [23:36:15] (03CR) 10Dzahn: [C: 03+2] testreduce: break dependency cycle between apt::pin and require_package [puppet] - 10https://gerrit.wikimedia.org/r/482380 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [23:37:34] (03PS3) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:39:55] 10Operations, 10Wikimedia-Site-requests, 
10Wikimedia-maintenance-script-run: Drop FlaggedRevs rights from users at srwikinews - https://phabricator.wikimedia.org/T212058 (10Zoranzoki21) >>! In T212058#4855320, @Marostegui wrote: > Ah, I see. That are not many rows :-) > Should be fine to run the script anyti... [23:41:05] (03PS4) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:42:07] (03PS5) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:42:17] (03PS6) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:46:00] (03PS7) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:47:30] (03PS1) 10Dzahn: testreduce: use regular package{} instead of require_package [puppet] - 10https://gerrit.wikimedia.org/r/482381 (https://phabricator.wikimedia.org/T201366) [23:47:59] (03CR) 10jerkins-bot: [V: 04-1] testreduce: use regular package{} instead of require_package [puppet] - 10https://gerrit.wikimedia.org/r/482381 (https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [23:48:16] (03PS8) 10Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - 10https://gerrit.wikimedia.org/r/482379 [23:49:14] (03PS2) 10Dzahn: testreduce: use regular package{} instead of require_package [puppet] - 10https://gerrit.wikimedia.org/r/482381 (https://phabricator.wikimedia.org/T201366) [23:49:59] (03PS3) 10Dzahn: testreduce: use regular package{} instead of require_package [puppet] - 10https://gerrit.wikimedia.org/r/482381 (https://phabricator.wikimedia.org/T201366) [23:51:07] (03CR) 10Dzahn: [C: 03+2] testreduce: use regular package{} instead of require_package [puppet] - 10https://gerrit.wikimedia.org/r/482381 
(https://phabricator.wikimedia.org/T201366) (owner: 10Dzahn) [23:53:45] and now Duplicate Declaration .. of course :/