[00:05:19] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01012 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:08:33] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01012 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:14:59] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01012 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:26:21] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01012 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[00:32:49] RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0.003634 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[02:53:55] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 40558688 and 1 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:56:05] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:57:09] RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 8680 and 66 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:06:43] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[03:33:01] PROBLEM - Memory correctable errors -EDAC- on elastic1029 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=elastic1029&var-datasource=eqiad+prometheus/ops
[03:59:26] Operations, MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Patch-For-Review, User-Ladsgroup, and 2 others: Create Wikisource Hindi - https://phabricator.wikimedia.org/T218155 (Dcljr) @MF-Warburg and @jhsoby: another week has passed and nothing has been imported beyond [[ https://hi.wikisource.org/wik...
[04:02:22] Operations, Core Platform Team, Editing-team, Parsing-Team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Dcljr) >>! In T120085#5548718, @Bawolff wrote: > Not really. https://www.wikidata.org/ and https://www.wikidata.org...
[04:53:09] Operations, Core Platform Team, Editing-team, Parsing-Team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Bawolff) >>! In T120085#5548824, @Dcljr wrote: >>>! In T120085#5548718, @Bawolff wrote: >> Not really. https://www.w...
[05:15:45] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[05:26:23] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:48:17] PROBLEM - Host mr1-eqiad.oob is DOWN: PING CRITICAL - Packet loss = 100%
[06:48:51] !log force umount/remount of /mnt/hdfs on an-coord1001 - processes stuck in D state, fuser proc consuming a ton of memory
[06:48:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:50:03] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:50:53] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153)
[06:51:41] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:54:01] RECOVERY - Host mr1-eqiad.oob is UP: PING OK - Packet loss = 0%, RTA = 1.44 ms
[06:56:37] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.86 ms
[06:57:45] the interface flapped but can't see anything suspicious in the logs --^
[06:57:49] Cc: XioNoX
[07:37:55] Operations, Core Platform Team, Editing-team, Parsing-Team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Dcljr) >>! In T120085#5548827, @Bawolff wrote: >>>! In T120085#5548824, @Dcljr wrote: >> Yes, but what is the softwa...
[07:56:29] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:07:03] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:45:31] (CR) Mobrovac: [C: -1] "LGTM, but you also need to change the image version tag in helmfile.d/services/{cluster}/wikifeeds.values.yaml to pick up the new code. A " [deployment-charts] - https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455) (owner: Mholloway)
[08:53:14] (CR) Mobrovac: [C: -1] "I like the ideas of moving things around to be closer to the upstream config as well as being explicit about defaults." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/540948 (https://phabricator.wikimedia.org/T200803) (owner: Eevans)
[09:16:53] Operations, Core Platform Team, Editing-team, Parsing-Team, and 9 others: RFC: Serve Main Page of Wikimedia wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (Bawolff) Yes, but when you visit the site it will get removed (in the interface). To put it another way, the / is us...
[09:44:09] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:54:45] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:44:16] Operations, Phabricator, Traffic: Access Forbidden to Phabricator at WikiArabia 2019 (Morocco) - https://phabricator.wikimedia.org/T234598 (Aklapper)
[12:47:55] (PS1) Majavah: Enable partial blocks on nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/541008 (https://phabricator.wikimedia.org/T234685)
[13:34:22] (CR) Urbanecm: [C: +1] "LGTM!" [mediawiki-config] - https://gerrit.wikimedia.org/r/541008 (https://phabricator.wikimedia.org/T234685) (owner: Majavah)
[13:41:34] (PS1) Urbanecm: Enable NewUserMessage on sq.wikipedia and sq.wikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/541009 (https://phabricator.wikimedia.org/T234499)
[15:56:57] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:07:33] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[16:36:48] (CR) Ladsgroup: mediawiki: Use mediawiki::errorpage instead of a php7-fatal-error.php.erb (2 comments) [puppet] - https://gerrit.wikimedia.org/r/539203 (https://phabricator.wikimedia.org/T113114) (owner: Ladsgroup)
[16:36:50] (PS4) Ladsgroup: mediawiki: Use mediawiki::errorpage instead of a php7-fatal-error.php.erb [puppet] - https://gerrit.wikimedia.org/r/539203 (https://phabricator.wikimedia.org/T113114)
[17:02:50] (PS1) Daimona Eaytoy: Move the remaining wikis to AbuseFilterCachingParser [mediawiki-config] - https://gerrit.wikimedia.org/r/541026 (https://phabricator.wikimedia.org/T156095)
[17:05:45] (CR) Mholloway: "recheck" [deployment-charts] - https://gerrit.wikimedia.org/r/540967 (https://phabricator.wikimedia.org/T170455) (owner: Mholloway)
[18:07:53] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most-read articles for January 1, 2016 (with aggregated=true)) is CRITICAL: Test retrieve the most-read articles for January 1, 2016 (with aggregated=true) returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[18:11:11] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[18:27:16] (PS3) Alex Monk: openstack scripts: Use designateclient where possible [puppet] - https://gerrit.wikimedia.org/r/513910 (https://phabricator.wikimedia.org/T224708)
[18:27:18] (PS5) Alex Monk: openstack mwopenstackclients: Remove unused methods provided by designateclient [puppet] - https://gerrit.wikimedia.org/r/513911 (https://phabricator.wikimedia.org/T224708)
[18:28:15] (CR) jerkins-bot: [V: -1] openstack mwopenstackclients: Remove unused methods provided by designateclient [puppet] - https://gerrit.wikimedia.org/r/513911 (https://phabricator.wikimedia.org/T224708) (owner: Alex Monk)
[18:54:46] (PS2) Alex Monk: Use designateclient in ensure functions [puppet] - https://gerrit.wikimedia.org/r/522196 (https://phabricator.wikimedia.org/T227785) (owner: Andrew Bogott)
[18:54:48] (PS4) Alex Monk: openstack scripts: Use designateclient where possible [puppet] - https://gerrit.wikimedia.org/r/513910 (https://phabricator.wikimedia.org/T224708)
[18:54:50] (PS6) Alex Monk: openstack mwopenstackclients: Remove unused methods provided by designateclient [puppet] - https://gerrit.wikimedia.org/r/513911 (https://phabricator.wikimedia.org/T224708)
[18:56:35] (CR) Alex Monk: "So after I162647ab I think" [puppet] - https://gerrit.wikimedia.org/r/522196 (https://phabricator.wikimedia.org/T227785) (owner: Andrew Bogott)
[19:14:48] Operations, Puppet, Patch-For-Review: Migrate as much as possible from network::constants from network.pp to hiera - https://phabricator.wikimedia.org/T87519 (Krenair)
[19:14:51] Operations, Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (Krenair)
[19:15:40] Operations, Puppet, Patch-For-Review: Migrate as much as possible from network::constants from network.pp to hiera - https://phabricator.wikimedia.org/T87519 (Krenair) Reverse-duping this against {T220894}, anyone should feel free to reopen if they disagree
[19:16:02] Operations, Puppet, Patch-For-Review: Migrate as much as possible from network::constants from network.pp to hiera - https://phabricator.wikimedia.org/T87519 (Krenair)
[19:16:04] Operations, Patch-For-Review: Replacement of network::constant's special_hosts - https://phabricator.wikimedia.org/T220894 (Krenair)
[23:24:17] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[23:45:31] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers