[00:00:32] But if you can Just Do It™, go for it. :-)
[00:00:39] James_F: I can skip the translate namespace and include talk pages. Are redirects in Translate needed?
[00:01:14] No.
[00:01:23] Just NS0 (and NS1).
[00:02:38] Great I'll just do those then for the errors in your page translation log.
[01:07:06] James_F: I didn't see any talk pages, but ns0 is done.
[01:07:30] Nice.
[01:07:33] Thank you.
[01:08:20] No problem
[02:09:11] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:09:23] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 229, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:10:27] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 71, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:10:39] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[02:11:11] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[02:11:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[02:11:57] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 73, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:12:21] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 231, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:19:21] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:19:23] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[02:23:01] RECOVERY - Esams HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[02:24:13] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[03:35:49] (PS1) JJMC89: Revert "Temporary make account creation limits more restrictive" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667)
[03:37:59] (CR) SBassett: [C: +1] "Per T212667#5275522" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[03:38:58] (CR) jerkins-bot: [V: -1] Revert "Temporary make account creation limits more restrictive" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[03:41:52] (CR) SBassett: [C: +1] "Note: the operations-mw-config-php70-composer-lint-docker failure (and some other jenkins failures) is due to UBN T226253" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[04:32:16] Operations, Performance-Team, Traffic, Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Pruem) Shall we set this to resolved then? The active discussion on how to prevent this in the future s...
[05:17:15] PROBLEM - puppet last run on dns4001 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[05:44:31] RECOVERY - puppet last run on dns4001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:20:51] (CR) Framawiki: [C: +1] Revert "Temporary make account creation limits more restrictive" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[06:33:37] PROBLEM - puppet last run on es1017 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle.
[07:00:45] RECOVERY - puppet last run on es1017 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:47:14] (CR) ArielGlenn: [C: +1] "Good by me, though it might be outdated by the time the month deadline rolls around :-)" [puppet] - https://gerrit.wikimedia.org/r/518108 (https://phabricator.wikimedia.org/T226153) (owner: Smalyshev)
[09:21:47] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 2 mailman queue(s) above limits (thresholds: bounces: 25 in: 25 virgin: 25) https://wikitech.wikimedia.org/wiki/Mailman
[09:33:23] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below the limits. https://wikitech.wikimedia.org/wiki/Mailman
[13:11:28] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[14:08:04] Operations, Performance-Team, Traffic, Performance: Sometimes pages load slowly for European users (due to some factor outside of Wikimedia cluster) - https://phabricator.wikimedia.org/T226048 (Krinkle) Open→Resolved a:ema
[14:38:10] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/518278 (https://phabricator.wikimedia.org/T225917) (owner: Urbanecm)
[14:38:55] (CR) jerkins-bot: [V: -1] Add hualab.nl to $wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/518278 (https://phabricator.wikimedia.org/T225917) (owner: Urbanecm)
[16:31:10] Puppet, Cloud-Services, cloud-services-team (Kanban): Consider ways to make puppetmaster CA changes smoother on the puppet client end - https://phabricator.wikimedia.org/T220268 (Krenair) Open→Resolved a:Krenair Wish I'd done this years ago. It seems to have worked and allows us to effort...
[16:36:53] Operations, Puppet, Cloud-Services, cloud-services-team (Kanban): Move the main WMCS puppetmaster into the Labs realm - https://phabricator.wikimedia.org/T171188 (Krenair)
[17:33:33] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/518278 (https://phabricator.wikimedia.org/T225917) (owner: Urbanecm)
[17:35:06] (CR) Urbanecm: [C: +1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/518350 (https://phabricator.wikimedia.org/T212667) (owner: JJMC89)
[17:35:08] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/518298 (https://phabricator.wikimedia.org/T204583) (owner: Urbanecm)
[17:36:22] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/518260 (https://phabricator.wikimedia.org/T225398) (owner: Petar.petkovic)
[19:11:32] (PS1) Zoranzoki21: Change name of Serbian Wikinews (part 1) [mediawiki-config] - https://gerrit.wikimedia.org/r/518390 (https://phabricator.wikimedia.org/T226315)
[19:17:22] (PS1) Zoranzoki21: Change name of Serbian Wikinews in InitialiseSettings.php (part 2) [mediawiki-config] - https://gerrit.wikimedia.org/r/518391 (https://phabricator.wikimedia.org/T226315)
[20:18:25] Operations, cloud-services-team: Migrate remaining cloudvirt hosts to Stretch/Mitaka - https://phabricator.wikimedia.org/T224561 (Krenair) the first of those is basically empty but the rest are busy machines: `cloudvirt1014.eqiad.wmnet: canary-1014-01.testlabs.eqiad.wmflabs cloudvirt1016.eqiad.wmnet:...
[20:34:42] (CR) DannyS712: [C: -1] "Should be combined with https://gerrit.wikimedia.org/r/518391/ into the same patch" [mediawiki-config] - https://gerrit.wikimedia.org/r/518390 (https://phabricator.wikimedia.org/T226315) (owner: Zoranzoki21)
[20:35:06] (CR) DannyS712: [C: -1] "Should be combined with https://gerrit.wikimedia.org/r/518390/ into the same patch" [mediawiki-config] - https://gerrit.wikimedia.org/r/518391 (https://phabricator.wikimedia.org/T226315) (owner: Zoranzoki21)
[20:40:39] hmm, i keep getting varnish errors on commons from europe
[20:40:43] https://commons.wikimedia.org/wiki/Category:360_panoramics_videos
[20:40:52] Request from 82.197.207.202 via cp1077 cp1077, Varnish XID 183713489
[20:40:55] Error: 503, Backend fetch failed at Sat, 22 Jun 2019 20:40:13 GMT
[20:41:46] rogue host in esams or something ?
[20:46:08] thedj, are the servers involved consistent when this occurs?
[20:46:35] what I mean to say is: is there always one particular host on the list when it errors?
[20:46:36] Krenair: yup, only the varnish xid and timestamp vary
[20:46:50] ooh I get it as well
[20:46:59] me too.
[20:47:45] James_F: check codehealth isn't working for https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/JsonData/+/518104/
[20:47:48] any ideas?
[20:47:58] it stopped now
[20:48:25] yeah stopped for me for a couple of hours too.. now its back though.
[20:48:31] RECOVERY - Memory correctable errors -EDAC- on mw1254 is OK: (C)4 ge (W)2 ge 0 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1254&var-datasource=eqiad+prometheus/ops
[20:49:39] RECOVERY - EDAC syslog messages on mw1254 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1254&var-datasource=eqiad+prometheus/ops
[20:56:52] don't see anything weird in the graphs
[21:00:35] although: https://grafana.wikimedia.org/d/000000352/varnish-failed-fetches?panelId=6&fullscreen&orgId=1&from=now%2Fw&to=now%2Fw&var-datasource=esams%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend
[21:20:50] there is definitely something up with text @ esams, that low level of failed fetches is not normal, and is probably affecting overall availability (it's hard to say how much because of the retries at the frontend layer)
[21:32:57] (PS1) Reedy: Move all FR config to an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/518396 (https://phabricator.wikimedia.org/T225144)
[21:34:06] (PS2) Reedy: Move all FR config to an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/518396 (https://phabricator.wikimedia.org/T225144)
[21:34:41] (CR) jerkins-bot: [V: -1] Move all FR config to an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/518396 (https://phabricator.wikimedia.org/T225144) (owner: Reedy)
[21:37:26] (PS3) Reedy: Move all FR config to an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/518396 (https://phabricator.wikimedia.org/T225144)
[21:38:02] (PS4) Reedy: Move all FR config to an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/518396 (https://phabricator.wikimedia.org/T225144)
[21:55:24] (CR) CDanis: "Looks pretty good! I have a few questions (mostly from my own Python ignorance) and also about 0.5 unit volans of nitpicks." (9 comments) [software/python-poolcounter] - https://gerrit.wikimedia.org/r/517828 (owner: Giuseppe Lavagetto)
[22:05:18] Operations, Wikidata, wikidata-tech-focus: Move dispatching of wikidata to a dedicated node - https://phabricator.wikimedia.org/T193733 (Addshore) Open→Stalled p:Normal→Low Going to mark this as stalled. Also we havn't had performance issues with dispatching for quite some time now a...
[22:06:19] Puppet, Beta-Cluster-Infrastructure, Wikidata, User-Addshore: mediawiki::maintenance::wikidata should not run crons for testwikidatawiki when used on labs / a testwikidatawiki doesnt exist - https://phabricator.wikimedia.org/T173357 (Addshore) p:Triage→Low
[22:17:39] Operations, Discovery, Traffic, WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875 (Addshore) It looks like this was the cause of the dashboard breaking again in T218710. It is a shame that we can not whitelist...
[22:28:41] Operations, MediaWiki-History-and-Diffs, Wikidata, wikidata-tech-focus, Performance: Request timeout when loading diffs on Wikidata - https://phabricator.wikimedia.org/T140879 (Addshore) Now we see a lovely. ` [XQ6rOQpAMEsAAH3jMpAAAABQ] 2019-06-22 22:28:05: Fatal exception of type "WMFTimeo...
[22:29:03] Operations, MediaWiki-History-and-Diffs, Wikidata, wikidata-tech-focus, Performance: WMFTimeoutException when loading some diffs on Wikidata - https://phabricator.wikimedia.org/T140879 (Addshore)