[00:00:11] PROBLEM - Maps - OSM synchronization lag - eqiad on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[00:00:30] PROBLEM - Maps - OSM synchronization lag - codfw on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[00:01:49] !log krinkle@tin Synchronized wmf-config/CommonSettings-labs.php: beta-only (duration: 01m 17s)
[00:01:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:12:02] (Abandoned) Krinkle: webperf: Collect Navigation Timing gaps [puppet] - https://gerrit.wikimedia.org/r/420831 (https://phabricator.wikimedia.org/T104902) (owner: Phedenskog)
[06:28:15] PROBLEM - puppet last run on cp1065 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/update-ocsp]
[06:49:14] RECOVERY - Disk space on labtestnet2001 is OK: DISK OK
[06:58:15] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:33:01] !log on tin rebasing wmf/1.31.0-wmf.29 for https://gerrit.wikimedia.org/r/#/c/428044/ https://gerrit.wikimedia.org/r/#/c/426884/
[07:33:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:33:53] !log (previous changes are solely for CI, noop in production)
[07:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:45] !log restarting Jenkins for a plugin update | T192660
[07:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:40:17] I have just rebased the mediawiki-staging area
[07:40:28] no actual deploy needed :] Have a good week-end
[08:13:05] Operations, Beta-Cluster-Infrastructure, MediaWiki-Configuration, Release-Engineering-Team, User-MarcoAurelio: Beta Cluster sends password reset mails with prod address - https://phabricator.wikimedia.org/T192686#4147426 (MarcoAurelio) Open>Resolved p:Triage>Normal a:MarcoA...
[09:12:04] PROBLEM - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1963 bytes in 0.112 second response time
[09:47:04] RECOVERY - wikidata.org dispatch lag is higher than 300s on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1977 bytes in 0.089 second response time
[10:52:47] (PS2) Urbanecm: Temp rate limit for arwiki due to mass vandalism [mediawiki-config] - https://gerrit.wikimedia.org/r/427956 (https://phabricator.wikimedia.org/T192668)
[10:53:30] (PS1) Urbanecm: Enable WikiLove on sawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/428051 (https://phabricator.wikimedia.org/T192212)
[10:54:00] (CR) Rxy: [C: 1] "LGTM; recheck for jenkins" [mediawiki-config] - https://gerrit.wikimedia.org/r/427956 (https://phabricator.wikimedia.org/T192668) (owner: Urbanecm)
[10:58:56] (CR) NehalDaveND: [C: 1] "Please go ahead. Thank you." [mediawiki-config] - https://gerrit.wikimedia.org/r/428051 (https://phabricator.wikimedia.org/T192212) (owner: Urbanecm)
[10:59:35] (CR) MarcoAurelio: "FIXME: namespaces, logos and other config was not properly handled in this patch @ InitialiseSettings.php." [mediawiki-config] - https://gerrit.wikimedia.org/r/400234 (https://phabricator.wikimedia.org/T183561) (owner: Urbanecm)
[11:02:47] Reedy: is it possible to do an unbreaking prod config scap?
[11:09:26] (Draft1) MarcoAurelio: Follow-up Ibd314fc2 [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561)
[11:09:29] (PS2) MarcoAurelio: Follow-up Ibd314fc2 [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561)
[11:14:41] (PS3) MarcoAurelio: Follow-up Ibd314fc2: lfnwiki: add logo path and missing namespace names [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561)
[11:15:04] (CR) MarcoAurelio: "follow-up patch: https://gerrit.wikimedia.org/r/#/c/428052/" [mediawiki-config] - https://gerrit.wikimedia.org/r/400234 (https://phabricator.wikimedia.org/T183561) (owner: Urbanecm)
[11:28:09] sigh
[11:28:29] Dereckson: the config patches for the wikis you created a couple of days ago are all missing data
[11:31:45] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[11:33:46] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[11:35:54] (Draft2) MarcoAurelio: gorwiki: add missing logo path and namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/428053 (https://phabricator.wikimedia.org/T189109)
[11:35:58] (Draft1) MarcoAurelio: gorwiki: add missing logo path and namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/428053 (https://phabricator.wikimedia.org/T189109)
[11:36:57] I didn't catch this in action. Not sure what that was cream but I wonder if dumps are still coming here?
[11:41:04] (Draft1) MarcoAurelio: euwikisource: add missing $wgMetaNamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/428055 (https://phabricator.wikimedia.org/T189465)
[11:41:08] (PS2) MarcoAurelio: euwikisource: add missing $wgMetaNamespace [mediawiki-config] - https://gerrit.wikimedia.org/r/428055 (https://phabricator.wikimedia.org/T189465)
[11:54:12] (PS4) MarcoAurelio: lfnwiki: add logo path and missing namespace names [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561)
[11:54:40] Krinkle: around?
[12:02:47] what a mess
[12:12:45] (PS3) Urbanecm: gorwiki: add missing namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/428053 (https://phabricator.wikimedia.org/T189109) (owner: MarcoAurelio)
[12:15:56] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[12:17:56] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[12:19:18] (CR) Jayprakash12345: [C: 1] "@SWAT member, Please Create shorturl table first before merge. By" [mediawiki-config] - https://gerrit.wikimedia.org/r/428051 (https://phabricator.wikimedia.org/T192212) (owner: Urbanecm)
[12:21:19] (CR) Ladsgroup: "It's a one-time thing but it takes around one month to fully finish so I made it a cronjob so it can easily get done with supervision." [puppet] - https://gerrit.wikimedia.org/r/424300 (https://phabricator.wikimedia.org/T189596) (owner: Ladsgroup)
[12:37:06] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [24.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[12:40:06] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0] https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[13:41:19] If someone sees Hauskatze tell them I'll be about later
[15:22:24] I feel like renames are slower (than yesterday) today
[16:46:44] PROBLEM - Router interfaces on cr1-eqsin is CRITICAL: CRITICAL: host 103.102.166.129, interfaces up: 73, down: 1, dormant: 0, excluded: 0, unused: 0
[16:49:24] PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 83 probes of 321 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[16:49:44] RECOVERY - Router interfaces on cr1-eqsin is OK: OK: host 103.102.166.129, interfaces up: 75, down: 0, dormant: 0, excluded: 0, unused: 0
[16:54:24] RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 0 probes of 321 (alerts on 19) - https://atlas.ripe.net/measurements/11645085/#!map
[17:28:19] (PS1) Andrew Bogott: Move labvirt1016, 1018, 1019, 1020 to Jessie [puppet] - https://gerrit.wikimedia.org/r/428070
[17:29:51] (CR) Andrew Bogott: [C: 2] Move labvirt1016, 1018, 1019, 1020 to Jessie [puppet] - https://gerrit.wikimedia.org/r/428070 (owner: Andrew Bogott)
[17:31:45] !log re-imaging labvirt1016 and labvirt1018 with Jessie
[17:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:29] (CR) Dereckson: [C: 1] lfnwiki: add logo path and missing namespace names [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561) (owner: MarcoAurelio)
[18:40:10] Operations, Datasets-General-or-Unknown: Provide a good download service of dumps from Wikimedia - https://phabricator.wikimedia.org/T122917#4147868 (Aklapper)
[18:40:15] Operations, Datasets-General-or-Unknown: Sometimes (at peak usage?), dumps.wikimedia.org becomes very slow for users (sometimes unresponsive) - https://phabricator.wikimedia.org/T45647#4147869 (Aklapper)
[18:40:18] Operations, Datasets-General-or-Unknown, netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#4147866 (Aklapper) Open>stalled >>! In T120425#3763986, @ayounsi wrote: > @Nemo_bis I see that the last comment is from more...
[19:47:22] PROBLEM - very high load average likely xfs on ms-be2034 is CRITICAL: CRITICAL - load average: 217.60, 115.18, 58.65
[19:57:23] PROBLEM - MD RAID on ms-be2034 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0
[19:57:24] ACKNOWLEDGEMENT - MD RAID on ms-be2034 is CRITICAL: CRITICAL: State: degraded, Active: 3, Working: 3, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T192721
[19:57:31] Operations, ops-codfw: Degraded RAID on ms-be2034 - https://phabricator.wikimedia.org/T192721#4147889 (ops-monitoring-bot)
[19:59:52] PROBLEM - Disk space on ms-be2034 is CRITICAL: DISK CRITICAL - /srv/swift-storage/sdb3 is not accessible: Input/output error
[20:01:42] PROBLEM - Check systemd state on ms-be2034 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:02:43] PROBLEM - swift-container-updater on ms-be2034 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[20:13:32] RECOVERY - very high load average likely xfs on ms-be2034 is OK: OK - load average: 23.02, 33.03, 76.65
[21:11:03] (PS1) Urbanecm: Revert "Revert "Add ruwikimedia to wikidataclient"" [mediawiki-config] - https://gerrit.wikimedia.org/r/428128 (https://phabricator.wikimedia.org/T188456)
[21:11:13] (PS2) Urbanecm: Revert "Revert "Add ruwikimedia to wikidataclient"" [mediawiki-config] - https://gerrit.wikimedia.org/r/428128 (https://phabricator.wikimedia.org/T188456)
[21:13:19] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/428053 (https://phabricator.wikimedia.org/T189109) (owner: MarcoAurelio)
[21:13:29] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/428055 (https://phabricator.wikimedia.org/T189465) (owner: MarcoAurelio)
[21:13:42] (CR) Urbanecm: [C: 1] "LGTM." [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561) (owner: MarcoAurelio)
[21:17:27] (CR) Urbanecm: [C: 1] "To SWATter: Run mwscript namespaceDupes.php --wiki=gorwiki --fix" [mediawiki-config] - https://gerrit.wikimedia.org/r/428053 (https://phabricator.wikimedia.org/T189109) (owner: MarcoAurelio)
[21:17:37] (CR) Urbanecm: [C: 1] "To SWATter: Run mwscript namespaceDupes.php --wiki=euwikisource --fix" [mediawiki-config] - https://gerrit.wikimedia.org/r/428055 (https://phabricator.wikimedia.org/T189465) (owner: MarcoAurelio)
[21:17:47] (CR) Urbanecm: [C: 1] "To SWATter: Run mwscript namespaceDupes.php --wiki=lfnwiki --fix" [mediawiki-config] - https://gerrit.wikimedia.org/r/428052 (https://phabricator.wikimedia.org/T183561) (owner: MarcoAurelio)
[22:36:31] hi
[22:39:56] hello
[22:41:24] now do +t and then +b duckgoose
[22:43:05] see? that would have been wasted time...