[00:00:08] And also ready to test on enwiki [00:04:55] Cherry pick for wmf.26 is at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FlaggedRevs/+/585621/ [00:06:47] Ah thanks for creating the cherry-pick [00:13:55] cherry-pick was merged; let me know when I should test it (either on beta or prod) [00:20:41] RoanKattouw is this going to be deployed? [00:21:00] DannyS712: Yes, sorry, I don't get notified when the patch merges, and I got distracted [00:21:05] Thanks for reminding me [00:22:57] DannyS712: It's on the mwdebug servers now, please test [00:23:11] confirmed to work on beta cluster [00:24:26] RoanKattouw forgot to ping ^ [00:24:46] OK, could you test on mwdebug1001 just to be sure? [00:24:59] how? what is that? idk? [00:27:14] RoanKattouw - I don't know how to, sorry [00:27:28] Do you have the WikimediaDebug browser extension installed? [00:27:41] I just installed it, but I couldn't tell if it was doing anything [00:28:18] https://usercontent.irccloud-cdn.com/file/4Q5n2oMM/Screenshot%20from%202020-04-02%2017-28-02.png [00:28:27] Change the dropdown to mwdebug1001 and set the button to "on" [00:28:36] Its on, I just don't know if its doing anything [00:29:29] Even with it on, nothing changes - version is still from before the patch merged [00:29:49] RoanKattouw please advise; it the beta cluster enough? It works there [00:30:00] You can verify if it's working by going to any WMF wiki (e.g. a Wikipedia) and typing "wgHostname" into the browser console [00:30:15] If it says mwdebug1001, then your browser extension is working [00:31:09] Then you can test this patch in production directly. 
The extension ensures that your requests go to the debug server, where I have deployed your patch already [00:32:42] Sorry, I don't think its working for me - it says mwdebug1001, but the content is still from production - https://en.wikipedia.org/wiki/Special:Version still shows the old sha1 of the extension [00:33:19] But it works on the beta cluster [00:36:49] Yeah the SHA1 will be wrong, that's a known issue, sorry [00:37:02] Are you able to test whether the fix itself works? Or is that difficult to test in production? [00:37:45] I am unable to confirm that it works, but I believe its because of the mwdebug1001/prod thing not working fully [00:41:11] OK. If it works in beta then I guess I'll just deploy it [00:41:26] But let's test that it works afterwards then? [00:41:36] will do [00:42:44] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/FlaggedRevs/: Fix logic for determining if pending edits were null (T249277) (duration: 01m 00s) [00:42:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:42:52] T249277: Can't reject changes: improper error message about blank diff - https://phabricator.wikimedia.org/T249277 [00:43:29] confirmed to work [00:51:33] (03CR) 10Bstorm: [C: 03+2] "Tested this locally and already applied to the servers manually. Merging for consistency." [puppet] - 10https://gerrit.wikimedia.org/r/585523 (https://phabricator.wikimedia.org/T249010) (owner: 10Hoo man) [00:55:20] 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (10wiki_willy) [00:55:40] 10Operations, 10ops-eqiad, 10SRE-swift-storage: ms-be1023 crashed / Smart Storage Battery failure - https://phabricator.wikimedia.org/T249174 (10wiki_willy) T249296 created for @RobH to order a few spares. Thanks, Willy [02:30:18] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. 
https://wikitech.wikimedia.org/wiki/Mirrors [03:05:54] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [03:18:48] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is OK: HTTP OK: HTTP/1.0 200 OK - 22338 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [04:39:46] (03PS2) 10Samwilson: Enable password-reset-update on Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585371 (https://phabricator.wikimedia.org/T245791) [05:00:46] RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad [05:02:02] RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw [05:20:42] PROBLEM - Work requests waiting in Zuul Gearman server on contint1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [150.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [05:21:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1126 for schema change', diff saved to https://phabricator.wikimedia.org/P10869 and previous config saved to /var/cache/conftool/dbconfig/20200403-052115-marostegui.json [05:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:21:23] !log Deploy schema change on db1126 [05:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:26:18] RECOVERY - Work requests waiting in Zuul Gearman server on contint1001 is OK: OK: Less than 100.00% above the threshold [90.0] https://www.mediawiki.org/wiki/Continuous_integration/Zuul https://grafana.wikimedia.org/dashboard/db/zuul-gearman?panelId=10&fullscreen&orgId=1 [05:42:18] PROBLEM - Host mw1320.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:18] PROBLEM - Host mw1322.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:20] PROBLEM - Host mw1323.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:20] PROBLEM - Host mw1321.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:26] PROBLEM - Host mw1324.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:44] PROBLEM - Host bast1002.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:42:44] PROBLEM - Host ps1-c6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% [05:43:18] PROBLEM - Host db1134.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:43:52] woot? [05:43:59] mgmt switch down? 
[05:44:36] PROBLEM - Host mw1319.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:44:42] yeah, looks mgmt only [05:44:44] XioNoX: ^ [05:44:50] PROBLEM - Host mw1325.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:44:54] PROBLEM - Host wdqs1010.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:45:30] PROBLEM - Host db1121.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:11] I am going to wait a bit before creating a task [05:46:53] Looks like C6 [05:46:56] PROBLEM - Host mw1326.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:56] PROBLEM - Host mw1327.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:56] PROBLEM - Host mw1330.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:56] PROBLEM - Host mw1329.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:56] PROBLEM - Host mw1334.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:58] PROBLEM - Host mw1328.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:46:58] PROBLEM - Host mw1331.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:00] PROBLEM - Host mw1337.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:00] PROBLEM - Host mw1336.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:00] PROBLEM - Host mw1332.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:00] PROBLEM - Host mw1340.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:02] PROBLEM - Host mw1344.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:02] PROBLEM - Host mw1341.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:02] PROBLEM - Host mw1333.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:02] PROBLEM - Host mw1335.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1338.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1347.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1339.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1342.mgmt is DOWN: PING 
CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1345.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:06] PROBLEM - Host mw1348.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:11] PROBLEM - Host mw1343.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:47:11] PROBLEM - Host mw1346.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:49:02] 10Operations, 10netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (10Marostegui) [05:50:10] PROBLEM - Host ganeti1011.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [05:51:56] PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 60, down: 2, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:52:52] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 131, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:52:52] PROBLEM - Router interfaces on cr3-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 74, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:53:20] ACKNOWLEDGEMENT - Host wdqs1010.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Marostegui https://phabricator.wikimedia.org/T249309 [05:53:20] ACKNOWLEDGEMENT - Host ps1-c6-eqiad is DOWN: PING CRITICAL - Packet loss = 100% Marostegui https://phabricator.wikimedia.org/T249309 [05:53:20] ACKNOWLEDGEMENT - Host mw1348.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Marostegui https://phabricator.wikimedia.org/T249309 [05:53:20] ACKNOWLEDGEMENT - Host mw1347.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Marostegui https://phabricator.wikimedia.org/T249309 [05:53:20] ACKNOWLEDGEMENT - Host mw1346.mgmt is DOWN: PING CRITICAL - Packet loss = 100% Marostegui 
https://phabricator.wikimedia.org/T249309 [05:57:34] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:58:28] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 133, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:58:28] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [05:58:42] weird [06:11:22] (03PS1) 10Marostegui: labsdb: Prepare basedir for Buster and 10.4 [puppet] - 10https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) [06:11:25] (03CR) 10Muehlenhoff: [C: 03+2] Add s-nail to send mails after package imports [puppet] - 10https://gerrit.wikimedia.org/r/585528 (https://phabricator.wikimedia.org/T224576) (owner: 10Muehlenhoff) [06:14:23] 10Operations, 10netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (10elukey) On msw1 I see all events like the following, starting at 5:40 UTC: ` Apr 3 05:40:24 msw1-eqiad chassism[1399]: ifd_process_flaps IFD: ge-0/0/23, sent flap msg to RE, Downstate Apr 3 05:40:24 msw1-eq... [06:17:50] 10Operations, 10Traffic, 10User-DannyS712: 503 error on enwikinews - https://phabricator.wikimedia.org/T249280 (10Vgutierrez) 05Open→03Resolved a:03Vgutierrez Thanks for you report. a 503 error usually signals a transient issue. Please reopen this task if you experience this issue frequently. 
[06:18:03] (03CR) 10Marostegui: "NOOP as expected: https://puppet-compiler.wmflabs.org/compiler1001/21693/" [puppet] - 10https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) (owner: 10Marostegui) [06:18:30] (03PS1) 10Muehlenhoff: Use bsd-mailx, not s-nail [puppet] - 10https://gerrit.wikimedia.org/r/585636 [06:19:25] RECOVERY - Host ganeti1011.mgmt is UP: PING OK - Packet loss = 0%, RTA = 0.07 ms [06:20:50] (03CR) 10Muehlenhoff: [C: 03+2] Use bsd-mailx, not s-nail [puppet] - 10https://gerrit.wikimedia.org/r/585636 (owner: 10Muehlenhoff) [06:20:59] marostegui: looks like the eqiad msw1 mgmt switch sees the port to c6 down, not sure what is the cause :( [06:21:23] elukey: ganeti came back, so maybe they'll all come back? [06:21:44] is it in c6? [06:21:53] I'd expect all or nothing [06:22:08] it is [06:24:02] yeah, maybe we'll see a storm of recoveries in a bit? [06:25:30] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1126 after schema change', diff saved to https://phabricator.wikimedia.org/P10870 and previous config saved to /var/cache/conftool/dbconfig/20200403-062529-marostegui.json [06:25:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:25:43] s'up? [06:25:48] hola :) [06:25:57] mgmt network for eqiad c6 seems down [06:26:02] XioNoX: check https://phabricator.wikimedia.org/T249309 [06:26:02] manuel opened a task [06:27:38] Great breakfast outage [06:28:21] XioNoX: do we have access to say msw-c6-eqiad? 
I didn't find a way to check it
[06:28:26] that's for dcops
[06:28:34] it's an unmanaged switch
[06:28:40] also this seems like something low priority, finish your breakfast please :)
[06:28:43] ahhhh
[06:28:49] they need to check the cabling
[06:28:55] switch state
[06:29:02] yep yep makes sense
[06:29:09] and worst case replace it
[06:29:43] it is strange that ganeti1011's mgmt network recovered
[06:29:51] (that is in c6) but not the others
[06:30:29] uh
[06:30:44] Still waking up, I'll have a look
[06:31:09] (CR) Muehlenhoff: [C: +1] "Looks good, one nit inline." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) (owner: Marostegui)
[06:33:06] XioNoX: sure sure later on, I was just curious
[06:33:12] I'll add to the task
[06:33:34] Operations, ops-eqiad, netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (elukey)
[06:34:22] Operations, ops-eqiad, netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (elukey) Interesting that ganeti1011's mgmt interface recovered, but not the others. Adding dcops to see if we can schedule in the next days/weeks a check of `msw-c6-eqiad`.
[06:35:10] elukey: I can't ping mr1-eqiad> ping 10.65.5.106 [06:35:24] https://netbox.wikimedia.org/ipam/ip-addresses/1512/ [06:36:24] (03PS1) 10Raimond Spekking: Remove language 'smn', supported by core now [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585673 [06:37:35] XioNoX: ok I was trying to parse "I can't ping mr1" since I didn't get it :D [06:37:45] hahah [06:37:47] icinga1001:~$ ping ganeti1011.mgmt [06:37:51] also doesn't work [06:38:09] then icinga is crazy: RECOVERY - Host ganeti1011.mgmt is UP: PING OK [06:38:25] yeah it's still up in https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=ganeti1011.mgmt [06:38:46] godog: ^ [06:39:47] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (10ayounsi) p:05Triage→03High [06:42:21] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch glitch - https://phabricator.wikimedia.org/T249309 (10ayounsi) * Check msw-c6-eqiad's status * Check msw-c6-eqiad cabling to msw1-eqiad Replace either cable or switch depending on what's faulty. 
[06:43:36] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (10ayounsi) [06:44:34] (03PS1) 10Muehlenhoff: Remove apt.dockerproject.org from reprepro updates [puppet] - 10https://gerrit.wikimedia.org/r/585675 [06:45:38] (03PS2) 10Marostegui: labsdb: Prepare basedir for Buster and 10.4 [puppet] - 10https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) [06:47:16] (03CR) 10Muehlenhoff: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) (owner: 10Marostegui) [06:48:02] (03CR) 10Muehlenhoff: [C: 03+2] Remove apt.dockerproject.org from reprepro updates [puppet] - 10https://gerrit.wikimedia.org/r/585675 (owner: 10Muehlenhoff) [06:48:04] 10Operations, 10Performance-Team: Occasional NIC Tx bandwidth saturation for mc1027 - https://phabricator.wikimedia.org/T248962 (10elukey) @Krinkle @aaron Can you tell me some hints about how to figure out what is the key's purpose? It feels like a popular template change in nlwiki but I'd like to triple check... [06:48:24] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10ArielGlenn) It should get role(dumps::web::htmldumps) like francium. It doesn't hurt to have multiple hosts with this role. 
[06:49:46] (03CR) 10Marostegui: [C: 03+2] labsdb: Prepare basedir for Buster and 10.4 [puppet] - 10https://gerrit.wikimedia.org/r/585635 (https://phabricator.wikimedia.org/T249188) (owner: 10Marostegui) [06:55:05] !log add fastnetmon 1.1.4 to buster-wikimedia - T240658 [06:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:55:11] T240658: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 [06:57:56] (03PS3) 10DannyS712: Stop using $wgContentHandlerUseDB [mediawiki-config] - 10https://gerrit.wikimedia.org/r/583406 [06:58:01] 10Operations, 10Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (10MoritzMuehlenhoff) I successfully tested import and removals of a dummy build (hello) and fixed up the sending of status mails. I've also synched the public repo keys from install1002 to apt1001... [06:58:32] (03CR) 10Muehlenhoff: [C: 03+1] "Tests were fine: https://phabricator.wikimedia.org/T224576#6025506" [dns] - 10https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [06:59:09] (03CR) 10Muehlenhoff: [C: 03+1] "Tests were fine: https://phabricator.wikimedia.org/T224576#6025506" [puppet] - 10https://gerrit.wikimedia.org/r/585245 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [07:04:59] 10Operations, 10Performance-Team: Occasional NIC Tx bandwidth saturation for mc1027 - https://phabricator.wikimedia.org/T248962 (10elukey) 05Open→03Resolved a:03elukey [07:05:52] 10Operations, 10netops: fastnetmon spamming /var/log on netflow hosts leading to disk saturation - https://phabricator.wikimedia.org/T240658 (10ayounsi) 05Open→03Stalled p:05Medium→03Low All netflow hosts are now running FNM 1.1.4. Now waiting for upstream. 
[07:06:43] 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar), 10User-Elukey: Create an alert for high memcached bw usage - https://phabricator.wikimedia.org/T224454 (10elukey) @CDanis this is an old task that I opened, do you think that we could revamp it and use what you have in mind to detec... [07:13:01] 10Operations, 10netops, 10Patch-For-Review, 10User-Elukey: can aggregated netflow data include the router it was sampled from? - https://phabricator.wikimedia.org/T246186 (10ayounsi) 05Open→03Resolved Afaik, everything is done here, thanks! [07:14:56] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10jcrespo) [07:15:02] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10jcrespo) a:05aezell→03Nuria This is now seeking analytics team approval. CC @Nuria [07:18:19] (03PS1) 10ArielGlenn: don't whine about retries in dump text retrieval [puppet] - 10https://gerrit.wikimedia.org/r/585682 [07:23:18] 10Operations, 10LDAP-Access-Requests, 10observability, 10serviceops, 10Patch-For-Review: Grant Access to Logstash to Peter(peter.ovchyn@speedandfunction.com) - https://phabricator.wikimedia.org/T249037 (10jcrespo) a:03AMooney I am assigning this to you to sort out legal/NDA/agreement with legal, please... 
[07:27:44] (03PS3) 10Jcrespo: aklapper: access to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/584676 (https://phabricator.wikimedia.org/T248905) (owner: 10Aklapper) [07:29:43] (03CR) 10ArielGlenn: [C: 03+2] don't whine about retries in dump text retrieval [puppet] - 10https://gerrit.wikimedia.org/r/585682 (owner: 10ArielGlenn) [07:31:34] (03CR) 10Jcrespo: [C: 03+2] aklapper: access to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/584676 (https://phabricator.wikimedia.org/T248905) (owner: 10Aklapper) [07:32:31] 10Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (10MoritzMuehlenhoff) [07:34:10] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10jcrespo) @Aklapper server access has been deployed, in a few minutes (~30) you should have access to the stats machines.... [07:37:46] 10Operations, 10SRE-Access-Requests: Requesting access to mwmaint1002.eqiad.wmnet for holger - https://phabricator.wikimedia.org/T248922 (10jcrespo) a:05jcrespo→03holger.knust [07:41:56] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10elukey) @Aklapper I checked T213780 and I see that the user in question doesn't have a Kerberos account, how are you guys... 
[07:42:41] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (10jcrespo) p:05Triage→03Medium a:03jcrespo [07:45:40] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10MoritzMuehlenhoff) @elukey: To help SREs on Clinic Duty figure out whether adding someone to a group also needs a Kerberos... [07:49:55] 10Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (10MoritzMuehlenhoff) [07:54:07] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (10jcrespo) I will proceed with your request once I verified your credentials. Allow me to suggest to add your LDAP account (wikitech account) to your Phabricator profile on https://... [07:56:02] 10Operations, 10LDAP-Access-Requests: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (10jcrespo) [07:56:10] (03PS1) 10JMeybohm: icinga: let Janis Meybohm run commands on all hosts and services [puppet] - 10https://gerrit.wikimedia.org/r/585686 (https://phabricator.wikimedia.org/T249081) [07:57:02] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/585686 (https://phabricator.wikimedia.org/T249081) (owner: 10JMeybohm) [07:57:03] !log Deploy schema change on s7 codfw master, this will generate lag on codfw [07:57:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:57:48] (03CR) 10JMeybohm: [C: 03+2] icinga: let Janis Meybohm run commands on all hosts and services [puppet] - 10https://gerrit.wikimedia.org/r/585686 (https://phabricator.wikimedia.org/T249081) (owner: 10JMeybohm) [08:03:00] (03PS1) 10Jcrespo: admin: Add apaskulin to wmf ldap-only users [puppet] - 10https://gerrit.wikimedia.org/r/585687 
(https://phabricator.wikimedia.org/T249272) [08:06:09] (03PS1) 10Dzahn: site: add role(dumps::web::htmldumps) to htmldumper1001 [puppet] - 10https://gerrit.wikimedia.org/r/585688 (https://phabricator.wikimedia.org/T245567) [08:07:06] (03CR) 10Jcrespo: [C: 03+2] admin: Add apaskulin to wmf ldap-only users [puppet] - 10https://gerrit.wikimedia.org/r/585687 (https://phabricator.wikimedia.org/T249272) (owner: 10Jcrespo) [08:08:38] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/21694/" [puppet] - 10https://gerrit.wikimedia.org/r/585688 (https://phabricator.wikimedia.org/T245567) (owner: 10Dzahn) [08:09:10] (03PS2) 10Dzahn: site: add role(dumps::web::htmldumps) to htmldumper1001 [puppet] - 10https://gerrit.wikimedia.org/r/585688 (https://phabricator.wikimedia.org/T245567) [08:09:23] (03PS1) 10Marostegui: mariadb: Remove old basedir options [puppet] - 10https://gerrit.wikimedia.org/r/585689 [08:09:38] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (10jcrespo) 05Open→03Resolved Access is deployed, and should take effect immediately: https://tools.wmflabs.org/ldap/user/apaskulin Resolving ticket, but p... [08:10:15] (03CR) 10Muehlenhoff: [C: 03+1] mariadb: Remove old basedir options [puppet] - 10https://gerrit.wikimedia.org/r/585689 (owner: 10Marostegui) [08:10:59] XioNoX: ah? 
[08:11:17] (03PS2) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [08:11:44] godog: if you want a mystery for breakfast [08:11:52] (03CR) 10jerkins-bot: [V: 04-1] ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [08:11:52] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=ganeti1011.mgmt is up [08:12:32] but `icinga1001:~$ ping ganeti1011.mgmt` doesn't reply [08:12:39] eh... DNS CRITICAL - expected '0.0.0.0' ?? [08:13:14] weird, not long ago SSH was down but DNS was up with the good IP [08:13:53] so I guess for some reason ganeti1011.mgmt resolves to 0.0.0.0 in icinga [08:14:08] (03CR) 10Vgutierrez: [C: 04-1] ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [08:14:17] and ssh 0.0.0.0 works (goes to localhost) [08:14:24] (03CR) 10Marostegui: [C: 03+2] mariadb: Remove old basedir options [puppet] - 10https://gerrit.wikimedia.org/r/585689 (owner: 10Marostegui) [08:14:31] [icinga1001:~] $ host ganeti1011.mgmt.eqiad.wmnet [08:14:31] ganeti1011.mgmt.eqiad.wmnet has address 10.65.5.106 [08:15:16] (03PS1) 10Elukey: admin: clarify kerberos account creation for analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) [08:15:38] yeah [08:15:50] in /etc/icinga/objects/puppet_hosts.cfg https://www.irccloud.com/pastebin/zggSQHvP/ [08:15:55] (03CR) 10jerkins-bot: [V: 04-1] admin: clarify kerberos account creation for analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) (owner: 10Elukey) [08:15:57] fascinating! 
[08:16:01] ganeti1012.mgmt is fine: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=ganeti1012.mgmt
[08:16:08] looking for typos in DNS but don't see one yet
[08:16:09] thanks I'll pass for now, I can see the rabbit hole from here :)
[08:16:58] cat /etc/icinga/objects/puppet_hosts.cfg | grep 0.0.0.0 | wc -l = 1
[08:17:11] so it's only that one
[08:17:56] how about manually removing it from the file and running puppet to recreate it
[08:17:57] some race condition with Puppet, icinga and DNS (maybe the new Netbox driven one?) < volans|off
[08:19:07] I don't know enough about icinga to know if it would break more or less things so I'll let you decide :) It probably needs a task too
[08:19:37] XioNoX: interesting comment on that switch ticket: "Interesting that ganeti1011's mgmt interface recovered, but not the others."
[08:19:49] so the only one that recovered is now the special case again?
[08:19:50] so it did go down?
[08:19:56] I don't know
[08:20:23] but it's the same host being special in 2 cases
[08:21:29] mutante: it changed twice recently: https://puppetboard.wikimedia.org/node/icinga1001.wikimedia.org
[08:22:06] and the icinga hosts files https://puppetboard.wikimedia.org/report/icinga1001.wikimedia.org/1598ba8ba929f9c7d479da268f3ff8adbc1e93da
[08:22:45] ah no that one: https://puppetboard.wikimedia.org/report/icinga1001.wikimedia.org/015b42c281a8b504e089f1db5c96bf84b0bc8bf1
[08:22:51] ctrl+f 0.0.0.0
[08:22:58] (PS3) Giuseppe Lavagetto: profile::tlsprox::envoy: update request_timeout parameter [puppet] - https://gerrit.wikimedia.org/r/585517 (owner: Jbond)
[08:23:06] define host { - address 10.65.5.106 + address 0.0.0.0
[08:24:10] so we know when it happened, but not why yet, probably need to dig into how those IPs are created?
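Editor's sketch: the `grep 0.0.0.0` one-liner above counts matching lines but does not say which host they belong to, and the unescaped dots match any character. A slightly stricter check could parse the `define host { ... }` blocks and report host names whose `address` is exactly 0.0.0.0. This is only an illustration under stated assumptions: the sample config is made up (the 192.0.2.12 address for ganeti1012.mgmt is a documentation-range placeholder, not the real one), the function names are hypothetical, and the parser assumes host blocks contain no nested braces.

```python
import re

# One "define host { ... }" block; assumes no nested braces inside a block.
HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def parse_hosts(config_text):
    """Return one dict of directive -> value per 'define host' block."""
    hosts = []
    for block in HOST_BLOCK.finditer(config_text):
        directives = {}
        for line in block.group(1).splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        hosts.append(directives)
    return hosts


def hosts_with_address(config_text, address="0.0.0.0"):
    """List host_name values whose address directive equals `address` exactly."""
    return [
        h.get("host_name", "<unnamed>")
        for h in parse_hosts(config_text)
        if h.get("address") == address
    ]


# Hypothetical sample in the shape of an Icinga object file.
SAMPLE = """
define host {
    host_name   ganeti1011.mgmt
    address     0.0.0.0
}
define host {
    host_name   ganeti1012.mgmt
    address     192.0.2.12   ; placeholder address, not the real one
}
"""

if __name__ == "__main__":
    print(hosts_with_address(SAMPLE))
```

On a real monitoring host, the text of the generated hosts file would be passed in place of `SAMPLE`.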
[08:25:14] that IP was added to DNS already on 2019-08-14
[08:25:35] I don't know the status of autogenerated DNS
[08:26:13] (PS2) Jcrespo: admin: clarify kerberos account creation for analytics-privatedata [puppet] - https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) (owner: Elukey)
[08:27:22] (PS1) Vgutierrez: ATS: Enable inbound TLSv1.3 on the upload cluster [puppet] - https://gerrit.wikimedia.org/r/585697 (https://phabricator.wikimedia.org/T170567)
[08:29:56] XioNoX: the netbox dns snippets are not yet used in prod, just deployed but not included in the gdnsd zonefiles
[08:30:06] (CR) Jcrespo: [C: +1] admin: clarify kerberos account creation for analytics-privatedata [puppet] - https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) (owner: Elukey)
[08:30:10] ok
[08:30:23] * volans|off on mobile, didn't read backscroll
[08:30:24] thanks, so one less moving part :)
[08:30:34] what's the issue?
[08:31:16] volans|off: define host { - address 10.65.5.106 + address 0.0.0.0
[08:31:40] volans|off: are you working today?
[08:31:57] (03PS1) 10KartikMistry: apertium-eo-en: Fix FTBFS with apertium 3.6 [debs/contenttranslation/apertium-eo-en] - 10https://gerrit.wikimedia.org/r/585699 (https://phabricator.wikimedia.org/T247585) [08:32:30] if it's an issue I can ;) not that I can go anywhere anyway :-P [08:33:50] XioNoX: don't nerd snipe Riccardo please [08:33:55] I know it is easy [08:34:03] :D [08:34:16] (03CR) 10Muehlenhoff: [C: 03+1] admin: clarify kerberos account creation for analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) (owner: 10Elukey) [08:34:30] (03CR) 10Elukey: [C: 03+2] admin: clarify kerberos account creation for analytics-privatedata [puppet] - 10https://gerrit.wikimedia.org/r/585692 (https://phabricator.wikimedia.org/T248905) (owner: 10Elukey) [08:35:13] volans|off: play some sailing simulation game :-) [08:35:37] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10Dzahn) >>! In T245567#6025491, @ArielGlenn wrote: > It should get role(dumps::web::htmldumps) Done. Applied and puppet ran and no puppet failure. [08:35:37] lol [08:36:46] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10ArielGlenn) Thanks so much everyone! [08:37:48] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10Dzahn) 05Resolved→03Open [08:38:06] 10Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (10MoritzMuehlenhoff) [08:38:11] volans|off: it's a very low impact issue, so it can wait. 
But basically, for some reason in https://puppetboard.wikimedia.org/report/icinga1001.wikimedia.org/015b42c281a8b504e089f1db5c96bf84b0bc8bf1 Puppet decided that ganeti1011.mgmt was not 10.65.5.106 anymore but 0.0.0.0 [08:38:24] which is not very convenient for monitoring it [08:38:37] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10Dzahn) a:05Cmjohnson→03ArielGlenn [08:41:24] (03CR) 10Vgutierrez: "pcc shows the right number of DIFFs VS NOOPs: https://puppet-compiler.wmflabs.org/compiler1002/21695/" [puppet] - 10https://gerrit.wikimedia.org/r/585697 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [08:43:58] !log Deploy schema change on dbstore1003:3317 [08:44:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:46:07] XioNoX: ack I'll see if I can have a look in a bit [09:02:34] (03PS1) 10ArielGlenn: pass an actual tuple (dumps text pass log entries filtering) [puppet] - 10https://gerrit.wikimedia.org/r/585701 [09:04:47] (03CR) 10ArielGlenn: [C: 03+2] pass an actual tuple (dumps text pass log entries filtering) [puppet] - 10https://gerrit.wikimedia.org/r/585701 (owner: 10ArielGlenn) [09:09:10] 10Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (10MoritzMuehlenhoff) [09:16:44] XioNoX: ganeti1011, is there a task? [09:16:47] I know the answer [09:17:09] not going to answer then [09:17:36] :) [09:17:53] if you open the task I give you the answer :-P [09:18:50] PROBLEM - DNS on ganeti1011.mgmt is CRITICAL: DNS CRITICAL - expected 0.0.0.0 but got 10.65.5.106 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [09:19:09] okok [09:22:32] 10Operations: Puppet resolves wrong IP for Icinga host config - https://phabricator.wikimedia.org/T249314 (10ayounsi) p:05Triage→03Medium [09:22:46] volans|off: https://phabricator.wikimedia.org/T249314 [09:23:34] thanks! 
[09:25:45] (03PS5) 10Dzahn: phabricator: remove firewall holes for port 80 [puppet] - 10https://gerrit.wikimedia.org/r/569100 [09:25:47] (03PS3) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [09:27:06] (03CR) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [09:27:22] 10Operations, 10ops-eqiad: Puppet resolves wrong IP for Icinga host config - https://phabricator.wikimedia.org/T249314 (10Volans) so, it looks like puppet change is due because of the facts exported by ganeti1011: ` ganeti1011 ~$ sudo facter -p ipmi_lan { ipaddress => "0.0.0.0", macaddress => "4c:d9:8f:66... [09:27:26] XioNoX: {done} ^^^ [09:28:36] volans|off: thx [09:29:31] 10Operations, 10ops-eqiad: Puppet resolves wrong IP for Icinga host config - https://phabricator.wikimedia.org/T249314 (10ayounsi) Also a check somewhere that this IP is: 1/ unique 2/ not 0.0.0.0 would be useful [09:30:30] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [09:31:22] 10Operations, 10ops-eqiad: Puppet resolves wrong IP for Icinga host config - https://phabricator.wikimedia.org/T249314 (10Volans) >>! In T249314#6025771, @ayounsi wrote: > Also a check somewhere that this IP is: > 1/ unique > 2/ not 0.0.0.0 > would be useful That's https://gerrit.wikimedia.org/r/plugins/gitil... [09:32:24] 10Operations, 10Gerrit, 10Release-Engineering-Team-TODO (2020-04 to 2020-06 (Q4)): gerrit1002 running out of space - https://phabricator.wikimedia.org/T243808 (10jbond) >>! In T243808#6024187, @Dzahn wrote: > Alright. This is a one-time installation though to test the Gerrit upgrade to 2.16 and then remove i... 
[09:32:38] !log Deploy schema on db1116:3317 [09:32:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:05] (03PS4) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [09:36:33] 10Operations, 10Wikimedia-Mailing-lists: Creation of three Wikimedia CH mailing lists - https://phabricator.wikimedia.org/T248910 (10jcrespo) 05Open→03Resolved {F31725720} The 3 requested lists have been created and you set as administrator, as noted here: https://meta.wikimedia.org/w/index.php?title=Maili... [09:37:14] 10Operations, 10ops-eqiad, 10DC-Ops: ganeti1011.mgmt is un-configured (was: Puppet resolves wrong IP for Icinga host config) - https://phabricator.wikimedia.org/T249314 (10Dzahn) 05Open→03Stalled [09:37:48] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22406 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [09:38:40] 10Operations, 10ops-codfw, 10decommission, 10serviceops, 10Patch-For-Review: codfw: decom at least 15 appservers(mw2158 through mw2172) in codfw rack C3 to make room for new servers - https://phabricator.wikimedia.org/T247018 (10Dzahn) 05Open→03Stalled [09:44:26] 10Operations, 10Wikimedia-Mailing-lists: Request for new list - https://phabricator.wikimedia.org/T249281 (10jcrespo) p:05Triage→03High a:03jcrespo [09:51:25] (03PS1) 10Elukey: Revert "profile::analytics::refinery::job::refine: exclude TwoColConflictExit" [puppet] - 10https://gerrit.wikimedia.org/r/585713 [09:54:17] 10Operations, 10Wikimedia-Mailing-lists: Request for new list - https://phabricator.wikimedia.org/T249281 (10jcrespo) 05Open→03Resolved The list has been created, with no initial subscribers: https://lists.wikimedia.org/mailman/listinfo/covid-19-stats List options are by default (including public listing a... 
[09:54:56] (03CR) 10Elukey: [C: 03+2] Revert "profile::analytics::refinery::job::refine: exclude TwoColConflictExit" [puppet] - 10https://gerrit.wikimedia.org/r/585713 (owner: 10Elukey) [09:58:10] (03CR) 10Dzahn: "20after4, vgutierrez: Is this correct now? Including the part that one file has the /ws/ path for both target and replacement and the othe" [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [10:00:25] 10Operations, 10Wikimedia-Mailing-lists: Request for new list COVID-19-stats - https://phabricator.wikimedia.org/T249281 (10Aklapper) [10:02:26] (03CR) 10Ema: [C: 03+1] ATS: Enable inbound TLSv1.3 on the upload cluster [puppet] - 10https://gerrit.wikimedia.org/r/585697 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [10:09:27] (03PS3) 10Dzahn: hiera/apt.wikimedia.org: switch from install1002 to apt1001 [puppet] - 10https://gerrit.wikimedia.org/r/585245 (https://phabricator.wikimedia.org/T224576) [10:10:07] (03CR) 10Vgutierrez: [C: 04-1] ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [10:10:22] (03CR) 10Dzahn: [C: 03+2] hiera/apt.wikimedia.org: switch from install1002 to apt1001 [puppet] - 10https://gerrit.wikimedia.org/r/585245 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [10:12:41] (03CR) 10Dzahn: "This changed ferm rules and rsyncd setup for data rsync between APT servers." 
[puppet] - 10https://gerrit.wikimedia.org/r/585245 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [10:15:13] 10Operations, 10Traffic: Only retry failed requests for external traffic on cache frontends - https://phabricator.wikimedia.org/T249317 (10ema) [10:15:17] 10Operations, 10Traffic: Only retry failed requests for external traffic on cache frontends - https://phabricator.wikimedia.org/T249317 (10ema) p:05Triage→03Medium [10:24:04] 10Operations, 10Patch-For-Review: backup space is used unwisely - https://phabricator.wikimedia.org/T159524 (10jcrespo) 05Open→03Resolved a:03jcrespo Resolved long time ago with new backup setup initial refactoring: T229209 More improvements on monitoring will come soon, too. [10:26:44] XioNoX: earlier you imported fastnetmon to the APT repo. could you do that one more time on apt1001 please? [10:34:00] (03PS1) 10Urbanecm: Enable cswiki anniversary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585721 (https://phabricator.wikimedia.org/T249173) [10:35:15] (03PS1) 10Dzahn: aptrepo: add MOTD warning to not use old install servers for APT [puppet] - 10https://gerrit.wikimedia.org/r/585722 (https://phabricator.wikimedia.org/T224576) [10:35:30] (03CR) 10Urbanecm: [C: 03+2] Enable cswiki anniversary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585721 (https://phabricator.wikimedia.org/T249173) (owner: 10Urbanecm) [10:35:33] (03CR) 10Hnowlan: [C: 03+1] Changeprop: Listen to mediawiki.page-suppress topic [deployment-charts] - 10https://gerrit.wikimedia.org/r/584672 (https://phabricator.wikimedia.org/T242025) (owner: 10Ppchelko) [10:36:23] (03Merged) 10jenkins-bot: Enable cswiki anniversary logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585721 (https://phabricator.wikimedia.org/T249173) (owner: 10Urbanecm) [10:37:21] (03PS2) 10Dzahn: switch apt.wikimedia.org from install1002 to apt1001 [dns] - 10https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) [10:37:47] !log 
marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1101:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10872 and previous config saved to /var/cache/conftool/dbconfig/20200403-103746-marostegui.json [10:37:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:37:59] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: 861b267: Enable cswiki anniversary logo (T249173) (duration: 01m 02s) [10:38:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:04] T249173: Czech Wikipedia 450k milestone special logo - https://phabricator.wikimedia.org/T249173 [10:38:06] !log Deploy schema change on db1101:3317 [10:38:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:38:23] 10Operations, 10Traffic, 10good first task: Only retry failed requests for external traffic on cache frontends - https://phabricator.wikimedia.org/T249317 (10ema) [10:38:29] (03CR) 10jerkins-bot: [V: 04-1] aptrepo: add MOTD warning to not use old install servers for APT [puppet] - 10https://gerrit.wikimedia.org/r/585722 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [10:39:31] (03PS2) 10Dzahn: aptrepo: add MOTD warning to not use old install servers for APT [puppet] - 10https://gerrit.wikimedia.org/r/585722 (https://phabricator.wikimedia.org/T224576) [10:47:18] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1001/21697/apt1001.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/585722 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [10:47:22] (03PS1) 10Dzahn: aptrepo: remove reprepro package on inactive servers [puppet] - 10https://gerrit.wikimedia.org/r/585723 (https://phabricator.wikimedia.org/T224576) [10:55:04] (03CR) 10Vgutierrez: [C: 04-2] "To be merged on Monday" [puppet] - 10https://gerrit.wikimedia.org/r/585697 (https://phabricator.wikimedia.org/T170567) (owner: 10Vgutierrez) [10:57:36] PROBLEM - PHP opcache 
health on mw2306 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [11:01:32] (03CR) 10Dzahn: [C: 03+2] switch apt.wikimedia.org from install1002 to apt1001 [dns] - 10https://gerrit.wikimedia.org/r/575404 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [11:08:06] _joe_: ^ [11:08:47] mutante: I'm on my phone (out to the supermarket), can do it later on [11:08:48] <_joe_> XioNoX: yeah my change didn't fix every case [11:08:53] <_joe_> and I think I know why [11:09:34] XioNoX: thanks! the reason is it's the only package that was not already rsynced and just switched the backend [11:09:49] i did a diff of find /srv/ [11:10:03] ok, yeah saw your email [11:10:06] cool [11:15:31] (03PS1) 10Dzahn: monitoring::host: remove superflouous lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/585724 [11:19:30] (03PS2) 10Dzahn: aptrepo: remove reprepro package on inactive servers [puppet] - 10https://gerrit.wikimedia.org/r/585723 (https://phabricator.wikimedia.org/T224576) [11:19:34] (03CR) 10jerkins-bot: [V: 04-1] monitoring::host: remove superflouous lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/585724 (owner: 10Dzahn) [11:20:23] (03PS1) 10Mvolz: Update spec.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/585726 [11:21:32] RECOVERY - PHP opcache health on mw2306 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [11:28:24] (03PS2) 10Dzahn: remove multiple superfluous lint-ignore lines [puppet] - 10https://gerrit.wikimedia.org/r/585724 [11:29:10] 10Operations: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10Urbanecm) [11:31:59] (03CR) 10jerkins-bot: [V: 04-1] remove multiple superfluous lint-ignore lines [puppet] - 10https://gerrit.wikimedia.org/r/585724 (owner: 10Dzahn) [11:37:18] !log marostegui@cumin1001 dbctl 
commit (dc=all): 'Repool db1101:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10874 and previous config saved to /var/cache/conftool/dbconfig/20200403-113717-marostegui.json [11:37:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:04] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1098:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10875 and previous config saved to /var/cache/conftool/dbconfig/20200403-114004-marostegui.json [11:40:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:44] !log Deploy schema change on db1098:3317 [11:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:01] (03CR) 10Muehlenhoff: "I don't think removal is useful? Let's rather keep it unaround untouched in case we need to dig out old stuff (and then simply retire it " [puppet] - 10https://gerrit.wikimedia.org/r/585723 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [11:48:03] (03Abandoned) 10Dzahn: aptrepo: remove reprepro package on inactive servers [puppet] - 10https://gerrit.wikimedia.org/r/585723 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [11:48:23] (03Abandoned) 10Dzahn: remove multiple superfluous lint-ignore lines [puppet] - 10https://gerrit.wikimedia.org/r/585724 (owner: 10Dzahn) [11:51:54] (03PS5) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [11:55:12] (03CR) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [11:59:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1098:3317 after schema change', diff saved to Unable to send diff to phaste and previous config saved to 
/var/cache/conftool/dbconfig/20200403-115854-marostegui.json [11:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:59:18] (03CR) 10Dzahn: [C: 03+1] "thanks! and yes, i had to do the same on phabricator after buster upgrade" [puppet] - 10https://gerrit.wikimedia.org/r/585528 (https://phabricator.wikimedia.org/T224576) (owner: 10Muehlenhoff) [12:00:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1094 for schema change', diff saved to https://phabricator.wikimedia.org/P10877 and previous config saved to /var/cache/conftool/dbconfig/20200403-115959-marostegui.json [12:00:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:00:47] !log Deploy schema change on db1094 [12:00:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:18] 10Operations, 10Patch-For-Review: Upgrade install servers to Buster - https://phabricator.wikimedia.org/T224576 (10Dzahn) [12:08:42] (03CR) 10Vgutierrez: "I'd wait till Monday to merge this one" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [12:09:43] 10Operations: persistent cronspam from Cron Daemon - https://phabricator.wikimedia.org/T247608 (10Dzahn) Since https://gerrit.wikimedia.org/r/c/operations/puppet/+/585245 this should now be stopped. 
[12:12:40] 10Operations, 10observability, 10serviceops, 10Performance-Team (Radar), 10User-Elukey: Create an alert for high memcached bw usage - https://phabricator.wikimedia.org/T224454 (10CDanis) a:03CDanis [12:23:00] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1094 after schema change', diff saved to https://phabricator.wikimedia.org/P10878 and previous config saved to /var/cache/conftool/dbconfig/20200403-122259-marostegui.json [12:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:39] (03CR) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [12:27:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1136 for schema change', diff saved to https://phabricator.wikimedia.org/P10879 and previous config saved to /var/cache/conftool/dbconfig/20200403-122716-marostegui.json [12:27:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:31] !log Deploy schema change on db1136 [12:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:43] 10Operations, 10Patch-For-Review: Sort out plan for install* servers in edge sites - https://phabricator.wikimedia.org/T242602 (10Dzahn) install1002/2002 have now been replaced by install1003/2003 for DHCP/TFTP/webproxy and apt1001/2001 for apt.wikimedia.org / reprepro. [12:30:16] (03PS6) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [12:30:41] (03CR) 10Dzahn: "thanks for the reviews and I agree about doing it on Monday and not now." 
[puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [12:36:28] 10Operations, 10MediaWiki-Maintenance-scripts, 10MediaWiki-extensions-Maintenance, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) [12:36:44] (03CR) 1020after4: [C: 03+1] ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [12:37:22] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) [12:39:57] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10CDanis) Is `en.wikipedia.org` really the correct hostname to use for cswiki's logo? 
[12:44:58] !log dcausse@deploy1001 Started deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing T249196 [12:45:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:04] T249196: Test the impact of the wdqs updater performance by disabling values cleanup - https://phabricator.wikimedia.org/T249196 [12:45:41] !log dcausse@deploy1001 Finished deploy [wdqs/wdqs@23495ae]: deploying wdqs 0.3.17 to wdqs1007: testing T249196 (duration: 00m 43s) [12:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:28] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1136 after schema change', diff saved to https://phabricator.wikimedia.org/P10880 and previous config saved to /var/cache/conftool/dbconfig/20200403-124827-marostegui.json [12:48:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:08] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1090:3317 for schema change', diff saved to https://phabricator.wikimedia.org/P10881 and previous config saved to /var/cache/conftool/dbconfig/20200403-124908-marostegui.json [12:49:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:23] !log Deploy schema change on db1090:3317 [12:49:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:51:36] mutante: done [12:51:47] (03CR) 10Dzahn: [C: 04-2] contint: use package_from_component, stop using docker class (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/566383 (https://phabricator.wikimedia.org/T224591) (owner: 10Dzahn) [12:51:55] XioNoX: thanks! 
[12:54:24] PROBLEM - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is CRITICAL: CRITICAL - failed 51 probes of 547 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:00:20] RECOVERY - IPv6 ping to eqsin on ripe-atlas-eqsin IPv6 is OK: OK - failed 37 probes of 547 (alerts on 50) - https://atlas.ripe.net/measurements/11645088/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas [13:16:14] (03Abandoned) 10Hashar: zuul: Convert to using scap [puppet] - 10https://gerrit.wikimedia.org/r/489012 (https://phabricator.wikimedia.org/T215458) (owner: 10Paladox) [13:18:16] PROBLEM - PHP opcache health on mw2271 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:25:48] 10Operations, 10netops: IRR updates needed - https://phabricator.wikimedia.org/T235886 (10ayounsi) About: > We found that the prefixes 185.15.56.0/22 and 2a02:ec80::/29 are in use but not documented in the RIPE Database as assignments. After discussing it with John, the deeper issue might be that they are "... [13:29:14] RECOVERY - PHP opcache health on mw2271 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:30:15] (03CR) 10Muehlenhoff: "Some initial comments, I'll test this further on Monday." 
(034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/584638 (https://phabricator.wikimedia.org/T233950) (owner: 10Jbond) [13:46:45] 10Operations, 10Traffic: cp1075 + cp1081 being Pybal-depooled/repooled frequently - https://phabricator.wikimedia.org/T249335 (10CDanis) [13:48:56] 10Operations, 10Traffic: cp1075 + cp1081 being Pybal-depooled/repooled frequently - https://phabricator.wikimedia.org/T249335 (10Vgutierrez) p:05Triage→03Medium [13:50:44] (03CR) 10Lydia Pintscher: [C: 03+1] "Yay :) Thanks for keeping an eye on these, Raimond." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585673 (owner: 10Raimond Spekking) [13:55:57] !log restart ats-tls on cp1075 and cp1081 - T249335 [13:56:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:56:04] T249335: cp1075 + cp1081 being Pybal-depooled/repooled frequently - https://phabricator.wikimedia.org/T249335 [13:57:10] 10Operations, 10SRE-Access-Requests: Requesting access to mwmaint1002.eqiad.wmnet for holger - https://phabricator.wikimedia.org/T248922 (10holger.knust) 05Open→03Resolved [13:59:57] (03PS3) 10Jbond: tomcat: create new tomcat module intended for use with apereo cas [puppet] - 10https://gerrit.wikimedia.org/r/584638 (https://phabricator.wikimedia.org/T233950) [14:00:14] (03CR) 10Jbond: "updated thanks" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/584638 (https://phabricator.wikimedia.org/T233950) (owner: 10Jbond) [14:01:32] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1090:3317 after schema change', diff saved to https://phabricator.wikimedia.org/P10882 and previous config saved to /var/cache/conftool/dbconfig/20200403-140132-marostegui.json [14:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:02:20] 10Operations, 10MediaWiki-Parser, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), 10Wikimedia-Incident: API action=parse should be poolcounter-limited if a re-parse is necessary - 
https://phabricator.wikimedia.org/T243803 (10Peter.ovchyn) @Anomie, @nnikkhoui, Do I understand correctly... [14:03:47] (03PS1) 10Muehlenhoff: Add dh-python to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/585755 [14:07:21] !log restart ats-tls on cp1087 - T249335 [14:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:26] T249335: cp1075 + cp1081 being Pybal-depooled/repooled frequently - https://phabricator.wikimedia.org/T249335 [14:07:46] (03CR) 10Muehlenhoff: [C: 03+2] Add dh-python to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/585755 (owner: 10Muehlenhoff) [14:10:40] godog: moritzm: [14:10:44] sorry ignore that [14:11:58] * jbond42 fyi i use the irssi go.pl script which is why go.dog you are the unfortunate one to always get tagged sorry [14:12:54] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) According to documentation is: > This is because the cache for /static is shared between all wikis, and the canonical form int... [14:13:15] (03PS1) 10Jhedden: openstack: update cloudvirt2001-dev flat interface [puppet] - 10https://gerrit.wikimedia.org/r/585757 (https://phabricator.wikimedia.org/T248425) [14:15:02] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) @WDoranWMF Infrastructure side of things would fall probably under #traffic but who would be the right maintainer of Mediawiki... 
[14:15:05] (03CR) 10Jhedden: [C: 03+2] openstack: update cloudvirt2001-dev flat interface [puppet] - 10https://gerrit.wikimedia.org/r/585757 (https://phabricator.wikimedia.org/T248425) (owner: 10Jhedden) [14:17:36] (03PS1) 10Ssingh: Add python3-setuptools to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/585758 [14:19:58] 10Operations, 10Traffic: cp1075 + cp1081 being Pybal-depooled/repooled frequently - https://phabricator.wikimedia.org/T249335 (10Vgutierrez) as it can be seen in https://grafana.wikimedia.org/d/80zd3mjZk/t249335?orgId=1 it looks like there is a memory leak on ats-tls that at some point begins to hit negatively... [14:20:10] 10Operations, 10Core Platform Team, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10WDoranWMF) @jcrespo Excellent question <- which is what people say when they aren't positive what the answer is.... [14:25:40] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good, this is referenced by the setup.py of cescout at source generation time" [puppet] - 10https://gerrit.wikimedia.org/r/585758 (owner: 10Ssingh) [14:25:48] (03PS6) 10Jcrespo: jenkins: Adjust CSP header to allow inline CSS and video playback [puppet] - 10https://gerrit.wikimedia.org/r/585038 (https://phabricator.wikimedia.org/T245658) (owner: 10Hashar) [14:26:18] (03CR) 10Ssingh: [C: 03+2] Add python3-setuptools to package builder packages [puppet] - 10https://gerrit.wikimedia.org/r/585758 (owner: 10Ssingh) [14:26:27] (03CR) 10Jcrespo: [C: 03+1] jenkins: Adjust CSP header to allow inline CSS and video playback [puppet] - 10https://gerrit.wikimedia.org/r/585038 (https://phabricator.wikimedia.org/T245658) (owner: 10Hashar) [14:27:19] (03CR) 10Jcrespo: [C: 03+2] jenkins: Adjust CSP header to allow inline CSS and video playback [puppet] - 10https://gerrit.wikimedia.org/r/585038 (https://phabricator.wikimedia.org/T245658) (owner: 10Hashar) [14:30:18] 
(03PS1) 10Elukey: admin: allow analytics-privatedata-users to use GPUs by default [puppet] - 10https://gerrit.wikimedia.org/r/585760 [14:31:57] jbond42: haha! yeah I use go.pl myself, and andress myself sometimes too :)) [14:32:07] super useful [14:32:20] 10Operations, 10Core Platform Team, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) > It sounds like High but is it UBN? I've done a very superficial triage, just pinging some teams to k... [14:33:16] 10Operations, 10Core Platform Team, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10BBlack) Most likely the cause is that the Varnish rule for normalizing `/static/` to the enwiki hostname hasn't... [14:33:44] godog: lol :D [14:35:39] (03CR) 10Bstorm: "> Patch Set 4:" [puppet] - 10https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [14:36:49] 10Operations, 10Core Platform Team, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10jcrespo) @WDoranWMF if BBlack is right, this may not need mw code changes, we should wait for that. 
[14:36:55] (03PS1) 10Elukey: Revert "Revert "profile::analytics::refinery::job::refine: exclude TwoColConflictExit"" [puppet] - 10https://gerrit.wikimedia.org/r/585761 [14:36:59] sigh [14:37:10] !log Restarting Jenkins for a CSP parameter T245658 [14:37:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:37:16] T245658: .mp4 build artifacts not viewable due to CSP in Chrome - https://phabricator.wikimedia.org/T245658 [14:39:05] 10Operations, 10Core Platform Team, 10MediaWiki-Maintenance-scripts, 10Traffic: purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10BBlack) Yeah, the varnish (frontend) code for this is in `modules/varnish/templates/text-frontend.inc.vcl.erb`:... [14:40:37] (03CR) 10Elukey: [C: 03+2] Revert "Revert "profile::analytics::refinery::job::refine: exclude TwoColConflictExit"" [puppet] - 10https://gerrit.wikimedia.org/r/585761 (owner: 10Elukey) [14:43:16] (03CR) 10Ayounsi: "See inline comment, I have the module in the homer root dir then plugins/wmf-homer.py" (031 comment) [software/homer] - 10https://gerrit.wikimedia.org/r/584973 (owner: 10Volans) [14:44:39] (03PS1) 10Cparle: Revert "Revert "Enable WikibaseQualityConstraints on commons"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585762 [14:45:14] 10Operations, 10Traffic: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335 (10Vgutierrez) [14:45:29] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10Anomie) The maintenance script seems like it should be functioning, assuming any p... 
[14:47:37] (03CR) 10Kosta Harlan: [C: 04-1] "Question about the global flag for allowing homepage to be enabled, and also, we should wait for Benoit to reply about what's happening wi" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584183 (https://phabricator.wikimedia.org/T235964) (owner: 10Gergő Tisza) [14:47:39] 10Operations, 10MediaWiki-Maintenance-scripts, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10BBlack) ` bblack@cumin1001:~$ sudo cumin A:cp-text 'curl -s https://en.wikipedia.o... [14:52:59] 10Operations, 10MediaWiki-Cache, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10Krinkle) [14:53:18] (03CR) 10Muehlenhoff: [C: 03+1] "GPUs are risky per se (it's a userspace interface with a close connection to kernel drivers with privileged access to the kernel), but we " [puppet] - 10https://gerrit.wikimedia.org/r/585760 (owner: 10Elukey) [14:57:46] (03PS1) 10Ppchelko: ChangeProp: fix metrics mappings [deployment-charts] - 10https://gerrit.wikimedia.org/r/585764 [15:00:59] (03CR) 10Hnowlan: [C: 03+2] ChangeProp: fix metrics mappings [deployment-charts] - 10https://gerrit.wikimedia.org/r/585764 (owner: 10Ppchelko) [15:01:19] (03Merged) 10jenkins-bot: ChangeProp: fix metrics mappings [deployment-charts] - 10https://gerrit.wikimedia.org/r/585764 (owner: 10Ppchelko) [15:02:12] (03PS1) 10Dzahn: installserver: rsync home dir data for migration [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) [15:02:44] 10Operations, 10Wikimedia-Mailing-lists: Request for new mailing list Deutschschweiz - https://phabricator.wikimedia.org/T247737 (10Lantus) 05Open→03Resolved a:03Lantus thanks a lot. I am very happy with your work. For me everything is ready. 
[15:04:47] (03PS1) 10Cparle: Enable WikibaseQualityConstraints on test commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585766 (https://phabricator.wikimedia.org/T248117) [15:06:01] (03PS1) 10Ssingh: Update the Debian package for the v0.1.1 release [software/censorship-monitoring] - 10https://gerrit.wikimedia.org/r/585767 [15:06:59] (03PS2) 10Dzahn: installserver: rsync home dir data for migration [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) [15:11:26] (03CR) 10Lucas Werkmeister (WMDE): "Looks good to me; the order of wikis is a bit inconsistent, but I’m not sure how strictly that’s usually handled, so it’s probably fine." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585766 (https://phabricator.wikimedia.org/T248117) (owner: 10Cparle) [15:11:29] 10Operations, 10LDAP-Access-Requests, 10Patch-For-Review: LDAP access to the wmf group for Alex Paskulin - https://phabricator.wikimedia.org/T249272 (10apaskulin) Thank you! Accounts linked. [15:12:58] (03PS3) 10Dzahn: installserver: rsync home dir data for migration [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) [15:16:13] (03CR) 10Andrew Bogott: "It definitely won't actively change anything on those servers but I don't want to break the option of rebuilding." [puppet] - 10https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [15:16:17] (03CR) 10Ayounsi: [C: 03+1] "Some nitpicks inline." (032 comments) [software/homer] - 10https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) (owner: 10Volans) [15:16:59] 10Operations, 10MediaWiki-Cache, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10BBlack) So, the backend purging queues in esams are way behind. On the one node I'm staring at... 
[15:17:32] 10Operations, 10Traffic: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344 (10ema) [15:17:42] 10Operations, 10Traffic: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344 (10ema) p:05Triage→03Medium [15:18:31] !log cp3057: restart varnish-fe T249344 [15:18:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:37] T249344: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344 [15:18:40] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [15:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:49] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [15:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:52] RECOVERY - Varnish frontend child restarted on cp3057 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3057&var-datasource=esams+prometheus/ops [15:22:10] (03PS4) 10Dzahn: installserver: rsync home dir data for migration [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) [15:23:35] (03CR) 10Kosta Harlan: [C: 04-1] "Can be abandoned in favor of I9ee3252cbdc2e997e66b97cedb08a4d1926d45ee" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/565438 (https://phabricator.wikimedia.org/T238295) (owner: 10Catrope) [15:24:06] (03CR) 10Kosta Harlan: [C: 03+1] Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584135 (https://phabricator.wikimedia.org/T238295) (owner: 10Gergő Tisza) [15:24:42] (03CR) 10Ayounsi: [C: 03+1] "Code reviewed 
and tested." [software/homer] - 10https://gerrit.wikimedia.org/r/585511 (owner: 10Volans) [15:25:10] (03PS1) 10Hnowlan: changeprop: move chart into repo [deployment-charts] - 10https://gerrit.wikimedia.org/r/585771 [15:27:57] (03CR) 10Ayounsi: [C: 03+1] diff: use different exit code if there is a diff [software/homer] - 10https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [15:28:12] (03CR) 10Ppchelko: [C: 03+2] changeprop: move chart into repo [deployment-charts] - 10https://gerrit.wikimedia.org/r/585771 (owner: 10Hnowlan) [15:28:29] (03Merged) 10jenkins-bot: changeprop: move chart into repo [deployment-charts] - 10https://gerrit.wikimedia.org/r/585771 (owner: 10Hnowlan) [15:30:58] !log hnowlan@deploy1001 helmfile [STAGING] Ran 'sync' command on namespace 'changeprop' for release 'staging' . [15:31:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:31:21] (03PS5) 10Dzahn: installserver: rsync home dir data for migration [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) [15:31:31] (03CR) 10Kosta Harlan: [C: 04-1] Deploy GrowthExperiments on Serbian Wikipedia (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584133 (https://phabricator.wikimedia.org/T241181) (owner: 10Gergő Tisza) [15:31:46] (03CR) 10Herron: [C: 03+1] "LGTM! Will plan to deploy on Monday" [puppet] - 10https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [15:33:36] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10Nuria) > and would need a place to properly test potentially high-load queries with production... 
[15:33:50] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1002/21705/" [puppet] - 10https://gerrit.wikimedia.org/r/585765 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [15:33:59] (03CR) 10Ayounsi: [C: 03+1] "Tested and checked that there was no false positives." [software/homer] - 10https://gerrit.wikimedia.org/r/585536 (owner: 10Volans) [15:37:14] 10Operations, 10Traffic, 10observability: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date with reality - https://phabricator.wikimedia.org/T249346 (10CDanis) [15:43:30] !log cp3061: restart varnish-fe T249344 [15:43:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:35] T249344: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344 [15:45:04] RECOVERY - Varnish frontend child restarted on cp3061 is OK: (C)2 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Varnish https://grafana.wikimedia.org/dashboard/db/varnish-machine-stats?panelId=66&fullscreen&orgId=1&var-server=cp3061&var-datasource=esams+prometheus/ops [15:46:35] (03CR) 10Andrew Bogott: [C: 03+2] Remove hosts hiera for labstore1004 and labstore1005 [puppet] - 10https://gerrit.wikimedia.org/r/585576 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [15:46:48] (03CR) 10Andrew Bogott: [C: 03+2] Openstack client packages: define for queens/jessie [puppet] - 10https://gerrit.wikimedia.org/r/585601 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [15:47:29] (03CR) 10Andrew Bogott: [C: 03+2] Openstack Neutron: add neutron l3 hacks for Rocky [puppet] - 10https://gerrit.wikimedia.org/r/585034 (https://phabricator.wikimedia.org/T248635) (owner: 10Andrew Bogott) [15:48:29] (03PS2) 10Mvolz: Update spec.yaml [deployment-charts] - 10https://gerrit.wikimedia.org/r/585726 [15:52:01] 10Operations, 10Traffic, 10observability: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date 
with reality - https://phabricator.wikimedia.org/T249346 (10BBlack) I had a chat with the author to make sure we understand the meaning of the fields: First line: `start`: this is just the *nix... [15:58:38] 10Operations, 10MediaWiki-Cache, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10BBlack) Probably-related: T241232 [16:02:33] 10Operations, 10MediaWiki-Parser, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), 10Wikimedia-Incident: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (10Anomie) `PoolWorkArticleView` is specifically for article view... [16:03:32] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_cxserver_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:05:22] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [16:06:53] (03CR) 10Jforrester: zuul: provision the scap repository (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) (owner: 10Hashar) [16:21:23] (03PS1) 10Gergő Tisza: Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) [16:23:29] (03CR) 10jerkins-bot: [V: 04-1] Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [16:29:24] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 113.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37 [16:32:48] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (10wiki_willy) a:03Cmjohnson Assigning to @Cmjohnson, since he'll be onsite today [16:43:44] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO, 10observability, and 2 others: Export zuul metrics to Prometheus - https://phabricator.wikimedia.org/T233089 (10Jdforrester-WMF) Is there anything we in RelEng can do to help with this work? 
[16:49:47] (03PS2) 10Gergő Tisza: Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) [16:52:12] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (10Andrew) [16:52:33] 10Operations, 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team-TODO, 10observability, and 2 others: Export zuul metrics to Prometheus - https://phabricator.wikimedia.org/T233089 (10colewhite) @Jdforrester-WMF Absolutely! If someone would be willing to review and give a stamp of `(approv... [16:54:52] (03PS8) 10Gergő Tisza: Deploy GrowthExperiments on Serbian Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584133 (https://phabricator.wikimedia.org/T241181) [16:55:21] (03CR) 10Gergő Tisza: Deploy GrowthExperiments on Serbian Wikipedia (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584133 (https://phabricator.wikimedia.org/T241181) (owner: 10Gergő Tisza) [16:55:22] 10Operations, 10Performance-Team: Occasional NIC Tx bandwidth saturation for mc1027 - https://phabricator.wikimedia.org/T248962 (10aaron) It stores the serialized naive "top frame" (e.g. headings, paragraphs, template invocation parameters) of the wikitext of pages, as well as the "sub-frames" from template in... [16:55:50] 10Operations, 10Product-Infrastructure-Team-Backlog, 10Traffic: Elevated 503 responses between 2020-03-15 and 2020-03-19 - https://phabricator.wikimedia.org/T248132 (10Mholloway) 05Open→03Declined Checked again just now and it looks like the issue was transitory. Might as well close this. 
[17:01:10] (03PS8) 10Gergő Tisza: Enable GrowthExperiments on French Wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584183 (https://phabricator.wikimedia.org/T235964) [17:02:55] (03CR) 10Gergő Tisza: [C: 04-2] "> we should wait for Benoit to reply about what's happening with the welcome survey." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584183 (https://phabricator.wikimedia.org/T235964) (owner: 10Gergő Tisza) [17:04:51] (03PS2) 10Andrew Bogott: wmcs::nfs::secondary: use Queens OpenStack client packages [puppet] - 10https://gerrit.wikimedia.org/r/585605 (https://phabricator.wikimedia.org/T249058) [17:10:50] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (10Cmjohnson) @XioNoX the netgear switch does not have any power to it, I tried replacing the power cable and used a different power outlet and still nothing. These do not have redundant power and we... [17:12:16] (03CR) 10Andrew Bogott: [C: 03+2] wmcs::nfs::secondary: use Queens OpenStack client packages [puppet] - 10https://gerrit.wikimedia.org/r/585605 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [17:14:24] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (10Andrew) [17:14:35] 10Operations, 10ops-eqiad, 10DC-Ops: ganeti1011.mgmt is un-configured (was: Puppet resolves wrong IP for Icinga host config) - https://phabricator.wikimedia.org/T249314 (10Cmjohnson) @volans I am pretty sure this has something to do with the mgmt switch on C6 being down. 
T249309 [17:14:50] 10Operations, 10ops-eqiad, 10DC-Ops: ganeti1011.mgmt is un-configured (was: Puppet resolves wrong IP for Icinga host config) - https://phabricator.wikimedia.org/T249314 (10Cmjohnson) [17:14:52] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (10Cmjohnson) [17:15:35] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (10RLazarus) p:05Triage→03High [17:15:50] 10Operations, 10ops-eqiad, 10DC-Ops: druid1008 missing asset tag in netbox - https://phabricator.wikimedia.org/T249286 (10Cmjohnson) 05Open→03Resolved fixed [17:16:47] 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Onboarding Wolfgang Kandek - https://phabricator.wikimedia.org/T249352 (10RLazarus) Welcome Wolfgang! We've already been chatting about some of this stuff, but this Phab task will track progress as we get it taken care of. [17:17:36] 10Operations, 10ops-eqiad, 10decommission, 10cloud-services-team (Kanban): labsdb1002-array1: status clarification - https://phabricator.wikimedia.org/T214903 (10Cmjohnson) a:05Cmjohnson→03RobH @robh this is still in the rack and needs to be decom'd [17:18:32] 10Operations, 10ops-eqiad, 10DC-Ops: (Need by: 2020-03-01) rack/setup/install htmldumper1001.eqiad.wmnet. - https://phabricator.wikimedia.org/T245567 (10Cmjohnson) 05Open→03Resolved [17:19:55] 10Operations, 10Traffic, 10observability: vhtcpd prometheus metrics broken; prometheus-vhtcpd-stats.py out-of-date with reality - https://phabricator.wikimedia.org/T249346 (10BBlack) We're probably going to add multiple purger connections to fan out the per-thread load from T241232 to help with T249325 . I'... 
[17:34:18] 10Operations, 10ops-eqiad, 10netops: Eqiad: C6 mgmt switch down - https://phabricator.wikimedia.org/T249309 (10wiki_willy) [17:34:59] (03PS1) 10Andrew Bogott: cloud-vps: set cloud/eqiad1.yaml:profile::openstack::eqiad1::version: 'mitaka' [puppet] - 10https://gerrit.wikimedia.org/r/585788 (https://phabricator.wikimedia.org/T249058) [17:35:20] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37 [17:41:53] (03PS2) 10Andrew Bogott: cloud-vps: set cloud/eqiad1.yaml:profile::openstack::eqiad1::version: 'queens' [puppet] - 10https://gerrit.wikimedia.org/r/585788 (https://phabricator.wikimedia.org/T249058) [17:45:52] (03CR) 10Andrew Bogott: [C: 03+2] cloud-vps: set cloud/eqiad1.yaml:profile::openstack::eqiad1::version: 'queens' [puppet] - 10https://gerrit.wikimedia.org/r/585788 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [17:46:25] (03PS1) 10Dzahn: installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 [17:46:49] (03CR) 10jerkins-bot: [V: 04-1] installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 (owner: 10Dzahn) [17:48:14] (03PS2) 10Dzahn: installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 [17:48:37] (03CR) 10jerkins-bot: [V: 04-1] installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 (owner: 10Dzahn) [17:49:55] (03PS3) 10Dzahn: installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 [17:50:17] (03PS4) 10Dzahn: installserver::migration: fix rsync dir name [puppet] - 
10https://gerrit.wikimedia.org/r/585789 (https://phabricator.wikimedia.org/T224576) [17:51:06] (03CR) 10Anomie: Whitelist X-Wikimedia-Debug header for cross-wiki API requests (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [17:54:42] (03CR) 10Dzahn: [C: 03+2] installserver::migration: fix rsync dir name [puppet] - 10https://gerrit.wikimedia.org/r/585789 (https://phabricator.wikimedia.org/T224576) (owner: 10Dzahn) [17:58:33] !log rsync home dirs from install1002 to apt1001:/srv/home_install1002... [17:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:41] (03PS1) 10Andrew Bogott: Add class ::openstack::clientpackages::vms::queens::jessie [puppet] - 10https://gerrit.wikimedia.org/r/585791 (https://phabricator.wikimedia.org/T249058) [18:01:25] (03CR) 10jerkins-bot: [V: 04-1] Add class ::openstack::clientpackages::vms::queens::jessie [puppet] - 10https://gerrit.wikimedia.org/r/585791 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [18:02:10] (03PS2) 10Andrew Bogott: Add class ::openstack::clientpackages::vms::queens::jessie [puppet] - 10https://gerrit.wikimedia.org/r/585791 (https://phabricator.wikimedia.org/T249058) [18:03:12] (03CR) 10Andrew Bogott: [C: 03+2] Add class ::openstack::clientpackages::vms::queens::jessie [puppet] - 10https://gerrit.wikimedia.org/r/585791 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [18:43:17] (03CR) 10Andrew Bogott: "The host 'deployment-sentry.deployment-prep.eqiad.wmflabs' is Jessie and now errors out like so:" [puppet] - 10https://gerrit.wikimedia.org/r/581985 (owner: 10Muehlenhoff) [18:46:37] 10Operations, 10cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (10Andrew) [18:55:29] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020): Add aklapper to analytics-privatedata-users 
- https://phabricator.wikimedia.org/T248905 (10Aklapper) >>! In T248905#6025584, @elukey wrote: > I checked T213780 and I see that the user in question doesn't have a Kerberos account, how ar... [19:01:18] (03PS1) 10RLazarus: maintenance: Migrate updatetranslationstats to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/585795 (https://phabricator.wikimedia.org/T211250) [19:05:07] (03PS2) 10Gergő Tisza: Enable GrowthExperiments suggested edits on uk, hu, hy, eu wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585219 (https://phabricator.wikimedia.org/T247308) [19:16:58] 10Operations, 10MediaWiki-Cache, 10Traffic, 10Core Platform Team Workboards (Clinic Duty Team): purgeList.php/HTCP purge doesn't seem to invalidate cache correctly - https://phabricator.wikimedia.org/T249325 (10Urbanecm) >>! In T249325#6026427, @jcrespo wrote: >> It sounds like High but is it UBN? > > I'v... [19:19:21] (03PS5) 10Mahveotm: [bugfix] disable crosswiki upload till a solution is found for the broken images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491928 (https://phabricator.wikimedia.org/T214230) [19:20:20] (03PS1) 10RLazarus: maintenance: Migrate echo_mail_batch to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/585796 (https://phabricator.wikimedia.org/T211250) [19:20:35] (03CR) 10Mahveotm: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491928 (https://phabricator.wikimedia.org/T214230) (owner: 10Mahveotm) [19:37:32] (03CR) 10Jforrester: [C: 04-2] "This is not a bugfix." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491928 (https://phabricator.wikimedia.org/T214230) (owner: 10Mahveotm) [19:44:42] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10aezell) @Nuria Our team is in a position of needing to test some new queries or changes to quer... 
[19:44:57] (03Abandoned) 10Mahveotm: [bugfix] disable crosswiki upload till a solution is found for the broken images [mediawiki-config] - 10https://gerrit.wikimedia.org/r/491928 (https://phabricator.wikimedia.org/T214230) (owner: 10Mahveotm) [19:51:00] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10Catrope) It sounds like we might mean different things when we say "testing". @Nuria is talking... [19:57:05] 10Operations, 10Page Content Service, 10Product-Infrastructure-Team-Backlog, 10Traffic: Cache not consistently updated for PCS JS endpoint - https://phabricator.wikimedia.org/T249290 (10Pchelolo) I'm gonna retag this with #traffic and remove CPT for now, since CDN purges are not within our are of expertise... [20:05:34] (03PS1) 10QChris: Add .gitreview [debs/thanos] - 10https://gerrit.wikimedia.org/r/585799 [20:05:36] (03CR) 10QChris: [V: 03+2 C: 03+2] Add .gitreview [debs/thanos] - 10https://gerrit.wikimedia.org/r/585799 (owner: 10QChris) [20:31:57] (03PS3) 10Gergő Tisza: Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) [21:00:30] (03PS1) 10Andrew Bogott: OpenStack: remove config and manifests for version Mitaka [puppet] - 10https://gerrit.wikimedia.org/r/585814 (https://phabricator.wikimedia.org/T249058) [21:00:32] (03PS1) 10Andrew Bogott: OpenStack: remove config and manifests for version 'Newton' [puppet] - 10https://gerrit.wikimedia.org/r/585815 (https://phabricator.wikimedia.org/T249058) [21:03:10] (03CR) 10Anomie: [C: 03+1] "Looks good to deploy." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [21:03:13] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack: remove config and manifests for version Mitaka [puppet] - 10https://gerrit.wikimedia.org/r/585814 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [21:09:05] (03PS1) 10Andrew Bogott: OpenStack: remove config and manifests for version 'Ocata' [puppet] - 10https://gerrit.wikimedia.org/r/585817 (https://phabricator.wikimedia.org/T249058) [21:09:48] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack: remove config and manifests for version 'Newton' [puppet] - 10https://gerrit.wikimedia.org/r/585815 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [21:10:06] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 116.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37 [21:10:21] 10Operations, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove old OpenStack config and manifests - https://phabricator.wikimedia.org/T249058 (10Andrew) [21:12:26] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:17:24] !log upgraded wikitech-static to 1.34.1 [21:17:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:20:03] (03CR) 10Andrew Bogott: [C: 03+2] OpenStack: remove config and manifests for version 'Ocata' [puppet] - 10https://gerrit.wikimedia.org/r/585817 (https://phabricator.wikimedia.org/T249058) (owner: 10Andrew Bogott) [21:23:24] RECOVERY - Ensure traffic_exporter binds on port 9322 and 
responds to HTTP requests on cp1087 is OK: HTTP OK: HTTP/1.0 200 OK - 22335 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:55:42] 10Operations, 10Commons, 10MediaWiki-File-management, 10Parsoid, and 7 others: RFC: Use content hash based image / thumb URLs - https://phabricator.wikimedia.org/T149847 (10Krinkle) [21:59:58] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37 [22:08:41] (03PS1) 10Bstorm: toolforge-redis: fail over tools-redis to tools-redis-1003 [puppet] - 10https://gerrit.wikimedia.org/r/585830 (https://phabricator.wikimedia.org/T248929) [22:18:42] 10Operations, 10CommRel-Specialists-Support, 10Core Platform Team, 10Editing-team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10Krinkle) [22:21:47] 10Operations, 10CommRel-Specialists-Support, 10Core Platform Team, 10Editing-team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (10Krinkle) Code steward of the core feature TBD. It's a pretty minor feature, but worth double-checking... [22:27:20] (03CR) 10Bstorm: [C: 03+2] toolforge-redis: fail over tools-redis to tools-redis-1003 [puppet] - 10https://gerrit.wikimedia.org/r/585830 (https://phabricator.wikimedia.org/T248929) (owner: 10Bstorm) [23:54:50] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server