[00:15:34] (CR) DannyS712: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/579814 (owner: Krinkle)
[03:38:40] PROBLEM - Host an-presto1004 is DOWN: PING CRITICAL - Packet loss = 100%
[03:59:17] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Dinoguy1000)
[04:23:59] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (AntiCompositeNumber) Resolved→Open https://commons.wikimedia.org/wiki/File:Romulo_Espaldon.jpg...
[04:26:26] How'd we manage to un-fix that one so quickly
[05:55:44] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Mbch331) I have problems with https://commons.wikimedia.org/w/index.php?title=File:Baruch_Spinoza_-_Le...
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200523T0700)
[07:14:04] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 269, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:14:04] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:21:26] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 271, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[07:23:36] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:25:36] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 2 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:29:46] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Kanags) I have the same issue with the uploaded image: https://commons.wikimedia.org/wiki/File:S._S._A...
[07:36:26] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:36:34] RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[07:38:02] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[07:41:36] Question for any deployers: if a patch modifies multiple files, but the modifications are not dependent on each other, does each file need its own sync for deployment?
[07:44:08] specifically I want to have a patch deployed that modifies 10 files
[07:59:22] (CR) Elukey: "Definitely! Can you ping me on Monday (or anytime next week) when you are available on IRC? I'd like to do it when you are around to quick" [puppet] - https://gerrit.wikimedia.org/r/596221 (https://phabricator.wikimedia.org/T252365) (owner: Bearloga)
[08:04:50] !log powercycle an-presto1004 - unresponsive, racadm getsel shows CPU overheating alerts
[08:04:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:17:15] Operations, ops-eqiad, Analytics: an-presto1004 down - https://phabricator.wikimedia.org/T253438 (elukey)
[08:18:23] ACKNOWLEDGEMENT - Host an-presto1004 is DOWN: PING CRITICAL - Packet loss = 100% Elukey T253438
[09:22:34] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Path-x21) The same issue for another of my photos/files I uploaded: https://commons.wikimedia.org/wiki...
[09:47:35] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Path-x21) I have a potential idea why this bug occurs. I was editing the Wikipedia page and I had alre...
[10:30:39] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Stryn)
[10:54:13] (PS1) Volans: sre.hosts.decommission: use new spicerack.actions [cookbooks] - https://gerrit.wikimedia.org/r/598153
[11:00:04] (CR) Volans: [C: +1] "LGTM, I didn't inspect the tar.gz." [software/netbox-deploy] - https://gerrit.wikimedia.org/r/595717 (owner: CRusnov)
[14:37:17] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (CDanis) Of the 500 most recent uploads at the time I pulled the list, 49/500 were affected: | mw.org...
[14:46:50] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 2 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle)
[14:50:20] !log Testing mc.php changes on mwdebug1002
[14:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:44] !log scap-pull to reset state on mwdebug1002
[14:58:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:36] (PS1) Krinkle: Revert "Set coalesceKeys to non-global for commonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/598170
[15:27:27] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle)
[15:44:30] !log krinkle@deploy1001 Synchronized wmf-config/mc.php: I5ad8fe96b9098a8 - Disable coalesceKeys on commonswiki (duration: 01m 09s)
[15:44:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:48] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 1.159e+04 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:00:38] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 110 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:01:47] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (JohanahoJ) The usual way of displaying images, with [[File:Filename.jpg|...]], does not work on svwp f...
[16:11:50] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 3 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Krinkle) >>! In T253405#6159575, @gerritbot wrote: > Change 598118 **merged** by jenkins-bot: > [media...
[16:13:04] (CR) Krinkle: [C: +2] Revert "Set coalesceKeys to non-global for commonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/598170 (owner: Krinkle)
[16:13:52] (Merged) jenkins-bot: Revert "Set coalesceKeys to non-global for commonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/598170 (owner: Krinkle)
[16:22:36] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 8206 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[16:24:26] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 319 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[17:36:43] (PS2) Alexandros Kosiaris: mobileapps: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597788 (owner: Apakhomov)
[17:38:28] (CR) Alexandros Kosiaris: eventstreams: added support egress rules (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/597774 (owner: Apakhomov)
[17:38:38] (CR) Alexandros Kosiaris: [C: +2] mobileapps: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597788 (owner: Apakhomov)
[17:38:42] Operations, Commons, MediaWiki-File-management, MediaWiki-Uploading, and 4 others: Some (recent?) uploads to Commons are not available on other wikis - https://phabricator.wikimedia.org/T253405 (Bahnmoeller) https://commons.wikimedia.org/wiki/File:Christopher_Fiori.jpg is also not showing on wiki...
[17:39:06] (Merged) jenkins-bot: mobileapps: added support egress rules [deployment-charts] - https://gerrit.wikimedia.org/r/597788 (owner: Apakhomov)
[17:42:00] (CR) Alexandros Kosiaris: [C: -1] "Mostly LGTM, aside from the fact that values-canary.yaml files don't need to have the policy defined" (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/597774 (owner: Apakhomov)
[17:46:53] (CR) Alexandros Kosiaris: [C: -1] "Mostly LGTM, but I 'd skip the values-canary.yaml declarations. the values.yaml ones are going to work just fine." (13 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/597772 (owner: Apakhomov)
[17:54:32] * Krinkle staging on mwdebug1002
[17:59:36] (PS1) Alexandros Kosiaris: Package multiple charts for egress networkpolicy [deployment-charts] - https://gerrit.wikimedia.org/r/598187 (https://phabricator.wikimedia.org/T249927)
[17:59:38] (PS1) Alexandros Kosiaris: Enable networkpolicy across multiple services [deployment-charts] - https://gerrit.wikimedia.org/r/598188 (https://phabricator.wikimedia.org/T249927)
[18:05:01] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.31/includes/filerepo/: I31a9bb6672 (duration: 01m 10s)
[18:05:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:06:15] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/includes/filerepo/: I31a9bb6672 (duration: 01m 06s)
[18:06:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:29:49] (CR) jerkins-bot: [V: -1] Enable networkpolicy across multiple services [deployment-charts] - https://gerrit.wikimedia.org/r/598188 (https://phabricator.wikimedia.org/T249927) (owner: Alexandros Kosiaris)
[18:34:48] PROBLEM - MediaWiki memcached error rate on icinga1001 is CRITICAL: 6993 gt 5000 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:36:34] RECOVERY - MediaWiki memcached error rate on icinga1001 is OK: (C)5000 gt (W)1000 gt 22 https://wikitech.wikimedia.org/wiki/Memcached https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=1&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[18:44:08] PROBLEM - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[18:56:38] * Krinkle staging on mwdebug1002
[18:58:53] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.32/includes/filerepo/file/LocalFile.php: I0f7e885997d60 (duration: 01m 08s)
[18:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:04:34] !log krinkle@deploy1001 Synchronized php-1.35.0-wmf.31/includes/filerepo/file/LocalFile.php: I0f7e885997d60 (duration: 01m 06s)
[19:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:50] RECOVERY - Rate of JVM GC Old generation-s runs - elastic1052-production-search-psi-eqiad on elastic1052 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=production-search-psi-eqiad&var-instance=elastic1052&panelId=37
[22:46:18] (PS2) Alexandros Kosiaris: Enable networkpolicy across multiple services [deployment-charts] - https://gerrit.wikimedia.org/r/598188 (https://phabricator.wikimedia.org/T249927)
[22:46:21] (PS1) Alexandros Kosiaris: Rakefile: read stdout/stderr before waiting [deployment-charts] - https://gerrit.wikimedia.org/r/598209
[23:38:02] Operations, IDS-extension, Wikimedia Taiwan, Wikimedia-Extension-setup, and 2 others: Deploy IDS rendering engine to production - https://phabricator.wikimedia.org/T148693 (awight) a:awight→None