[02:28:53] PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 18296376 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:37:03] RECOVERY - Postgres Replication Lag on maps2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 251480 and 65 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:24:01] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[03:35:23] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[04:35:28] (PS1) Jayprakash12345: Create Portal namespace for sawikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/544343 (https://phabricator.wikimedia.org/T235343)
[04:39:47] (CR) Jayprakash12345: "Hey Urbanecm, Can you take care of this for deployment?" [mediawiki-config] - https://gerrit.wikimedia.org/r/544343 (https://phabricator.wikimedia.org/T235343) (owner: Jayprakash12345)
[06:44:51] PROBLEM - Check the last execution of netbox_ganeti_eqiad_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[06:55:27] RECOVERY - Check the last execution of netbox_ganeti_eqiad_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_eqiad_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:23:17] Operations: Error 503 and slow loading on multiple wikis - https://phabricator.wikimedia.org/T235949 (jijiki)
[08:25:42] Operations: Error 503 and slow loading on multiple wikis (19th Oct 2019 21:28 - 21:36 UTC) - https://phabricator.wikimedia.org/T235949 (jijiki)
[10:40:51] (CR) Jforrester: [C: +1] "Ha." [mediawiki-config] - https://gerrit.wikimedia.org/r/544193 (owner: Reedy)
[10:51:12] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (MarcoAurelio)
[10:56:43] Operations: Error 503 and slow loading on multiple wikis (19th Oct 2019 21:28 - 21:36 UTC) - https://phabricator.wikimedia.org/T235949 (Marostegui) For what is worth, on the db layer there is a spike on connections as well: https://grafana.wikimedia.org/d/000000278/mysql-aggregated?panelId=9&fullscreen&orgId...
[10:58:10] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (MarcoAurelio) Apparently it's https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/543877/ by @Dzahn
[11:37:41] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (Paladox) All I did to fix this was set ‘profile::backup::ferm_directors: []’ in hiera.
[11:51:07] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (MarcoAurelio) >>! In T235968#5589365, @Paladox wrote: > All I did to fix this was set ‘profile::backup::ferm_directors: []’ in hiera. I re-ran puppet and I st...
[11:53:13] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (Paladox) @MarcoAurelio it takes a few minutes due to the cache. (It’s not instant once your edit that page)
[12:25:24] Operations, Analytics, Analytics-Kanban, LDAP-Access-Requests, and 2 others: Analytics Access for Grant (groups cn=wmf and analytics-privatedata-users) - https://phabricator.wikimedia.org/T235260 (gsingers) @Nuria I can access it. Thanks!
[12:34:49] (PS1) Ayounsi: Remove esams exclusion [software/netbox-reports] - https://gerrit.wikimedia.org/r/544389
[12:36:30] (CR) Ayounsi: "To be merged sometime this week, before the esams work is completed." [software/netbox-reports] - https://gerrit.wikimedia.org/r/544389 (owner: Ayounsi)
[13:08:13] Puppet, Beta-Cluster-Infrastructure: Puppet broken on Beta Cluster (profile::backup::ferm_directors) - https://phabricator.wikimedia.org/T235968 (Krenair) Open→Resolved a:Krenair https://wikitech.wikimedia.org/w/index.php?title=Hiera:Deployment-prep&diff=1841599&oldid=1837662
[16:22:14] (PS1) Faidon Liambotis: Remove esams from blacklists [software/netbox-reports] - https://gerrit.wikimedia.org/r/544399
[16:37:36] (PS1) Zoranzoki21: wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there [mediawiki-config] - https://gerrit.wikimedia.org/r/544400 (https://phabricator.wikimedia.org/T235904)
[16:38:28] (CR) jerkins-bot: [V: -1] wgCopyUploadDomains: Add iip.bu.uni.wroc.pl there [mediawiki-config] - https://gerrit.wikimedia.org/r/544400 (https://phabricator.wikimedia.org/T235904) (owner: Zoranzoki21)
[16:43:27] (CR) Zoranzoki21: "... What happening?" [mediawiki-config] - https://gerrit.wikimedia.org/r/544400 (https://phabricator.wikimedia.org/T235904) (owner: Zoranzoki21)
[16:43:42] (CR) Zoranzoki21: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/544400 (https://phabricator.wikimedia.org/T235904) (owner: Zoranzoki21)
[16:48:45] (CR) Ayounsi: [C: -1] "Duplicate of https://gerrit.wikimedia.org/r/c/operations/software/netbox-reports/+/544389 :)" [software/netbox-reports] - https://gerrit.wikimedia.org/r/544399 (owner: Faidon Liambotis)
[16:49:11] (Abandoned) Faidon Liambotis: Remove esams from blacklists [software/netbox-reports] - https://gerrit.wikimedia.org/r/544399 (owner: Faidon Liambotis)
[19:38:15] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1004 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[19:47:53] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1004 is OK: OK: Less than 20.00% above the threshold [500.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[20:32:45] Operations, ops-eqiad, DC-Ops, decommission, and 2 others: decommission frav1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T222109 (Papaul) ` papaul@fasw-c-eqiad# show | compare [edit interfaces interface-range vlan-administration] - member "ge-[0-1]/0/3"; [edit interfaces interfa...
[20:33:19] Operations, ops-eqiad, DC-Ops, decommission, and 2 others: decommission frav1001.frack.eqiad.wmnet - https://phabricator.wikimedia.org/T222109 (Papaul)
[20:40:34] Operations, Wikimedia-Mailing-lists: Lengthy delays in emails being recieved from mailing lists - https://phabricator.wikimedia.org/T235983 (Pine)
[21:06:29] PROBLEM - ElasticSearch unassigned shard check - 9243 on search.svc.eqiad.wmnet is CRITICAL: CRITICAL - dewiki_content_1566659363[5](2019-10-17T11:16:36.023Z), enwiki_content_1546970425[1](2019-10-17T11:16:49.266Z), enwiki_content_1546970425[6](2019-10-17T11:16:49.267Z) https://wikitech.wikimedia.org/wiki/Search%23Administration
[21:13:20] Operations, Wikimedia-Mailing-lists: Lengthy delays in emails being recieved from mailing lists - https://phabricator.wikimedia.org/T235983 (Pine) I checked the delays for a few emails that I get from non-WMF mailing lists, and the maximum delay that I saw was under 4 minutes, so I think that there is li...
[21:20:29] Operations, Wikimedia-Mailing-lists, Space (Jan-Mar-2020): Integrate mailing lists in Wikimedia Space - https://phabricator.wikimedia.org/T226727 (Pine) Under the current circumstances of Wikimedia Space, I oppose integrating the mailing lists into Wikimedia Space, particularly 1. extending WMF's cur...
[23:56:20] Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)
[23:57:58] Operations, Gerrit, Release-Engineering-Team-TODO, serviceops, and 2 others: Gerrit Hardware Upgrade (+ upgrade from jessie to stretch or buster) - https://phabricator.wikimedia.org/T222391 (Paladox)