[00:04:53] (CR) Huji: [C: +1] Revert "Change votewiki language temporarily to fa for fawiki elections" [mediawiki-config] - https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) (owner: Urbanecm)
[00:06:25] (PS1) Dave Pifke: arclamp: Use Python 3 [puppet] - https://gerrit.wikimedia.org/r/639885 (https://phabricator.wikimedia.org/T267269)
[00:28:52] PROBLEM - HTTPS-dbtree on dbmonitor1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[00:30:24] RECOVERY - HTTPS-dbtree on dbmonitor1001 is OK: HTTP OK: HTTP/1.1 200 OK - 92419 bytes in 0.603 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[00:34:26] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:45:52] RECOVERY - Check systemd state on an-launcher1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:22:48] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install logstash203[345] - https://phabricator.wikimedia.org/T267420 (Reedy)
[02:22:57] Operations, ops-codfw, DC-Ops: (Need By: TBD) rack/setup/install logstash203[345] - https://phabricator.wikimedia.org/T267420 (Reedy) More stash, less tash.
[02:23:25] servers getting in on movember
[04:31:25] (CR) DannyS712: [C: +1] Revert "Change votewiki language temporarily to fa for fawiki elections" [mediawiki-config] - https://gerrit.wikimedia.org/r/639650 (https://phabricator.wikimedia.org/T262689) (owner: Urbanecm)
[08:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20201107T0800)
[08:13:13] Operations, DBA, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (jijiki)
[10:04:36] Operations, Graphoid, serviceops, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Platform Engineering (Icebox): Undeploy graphoid - https://phabricator.wikimedia.org/T242855 (Aklapper)
[11:30:02] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 122 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:31:44] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 7 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:52:00] PROBLEM - proton LVS codfw on proton.svc.codfw.wmnet is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received https://wikitech.wikimedia.org/wiki/Proton
[11:53:40] RECOVERY - proton LVS codfw on proton.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Proton
[11:57:59] Operations, MediaWiki-skins-Foreground, Wikimedia-Mailing-lists: Move foreground skin to lists.wikimedia.org - https://phabricator.wikimedia.org/T141831 (Peachey88)
[13:21:00] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Update 2030.wikimedia.org redirect to new URI - https://phabricator.wikimedia.org/T264797 (Abbad) Hey. Sorry, I'm not aware of the procedures here but it seems to have been a little while. Any update or action needed from my side?
[13:28:19] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Update 2030.wikimedia.org redirect to new URI - https://phabricator.wikimedia.org/T264797 (RhinosF1) Someone from #operations needs to merge https://gerrit.wikimedia.org/r/c/632552 @Dzahn was requested for review
[13:51:50] PROBLEM - Check systemd state on elastic1063 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:53:21] (PS1) Ladsgroup: ores: Reduce number of uwsgi workers to 50 [puppet] - https://gerrit.wikimedia.org/r/639898 (https://phabricator.wikimedia.org/T263910)
[13:55:34] Operations, Product-Infrastructure-Team-Backlog, Proton, Traffic, and 2 others: PDF download generates invalid PDF files - https://phabricator.wikimedia.org/T266559 (Urbanecm) For the record, I just answered an user report of this issue sent to OTRS. @LGoto I disagree with this being "Low" prior...
[14:15:46] RECOVERY - Check systemd state on elastic1063 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:06:28] Operations, Wikimedia-Mailing-lists: Password reset request for wikimedia-nd mailing list - https://phabricator.wikimedia.org/T202247 (Geekdidi)
[17:00:00] PROBLEM - Host mr1-eqsin IPv6 is DOWN: PING CRITICAL - Packet loss = 100%
[17:00:26] PROBLEM - Host mr1-eqsin.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 70%, RTA = 3438.66 ms
[17:05:46] RECOVERY - Host mr1-eqsin IPv6 is UP: PING OK - Packet loss = 0%, RTA = 230.86 ms
[17:06:12] RECOVERY - Host mr1-eqsin.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 236.74 ms
[17:32:48] Operations, Scap (Scap3-MediaWiki-MVP): Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352 (Aklapper) >>! In T104352#6012893, @mmodell wrote: > I think this one should be resolved now? ping - do people agree?
[18:57:53] Operations, Wikimedia-Apache-configuration, Patch-For-Review: Update 2030.wikimedia.org redirect to new URI - https://phabricator.wikimedia.org/T264797 (Dzahn) I am currently on vacation. Please try to find other reviewers to speed this up.
[19:36:22] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/summary/{title} (Get summary from storage) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[19:37:56] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[20:21:40] twentyafterfour tgr_ Urbanecm around?
[20:21:51] yes, what's up DannyS712 ?
[20:22:12] https://phabricator.wikimedia.org/T256395#6610919
[20:22:41] ehm...
[20:22:50] I'll PM
[20:23:31] DannyS712: it's being investigated
[20:29:04] okay, just wanted to make sure people were aware
[21:05:34] PROBLEM - Restbase edge esams on text-lb.esams.wikimedia.org is CRITICAL: /api/rest_v1/page/mobile-sections/{title} (Get mobile-sections for a test page on enwiki) timed out before a response was received https://wikitech.wikimedia.org/wiki/RESTBase
[21:07:10] RECOVERY - Restbase edge esams on text-lb.esams.wikimedia.org is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/RESTBase
[21:34:15] Operations, LDAP-Access-Requests: Create a LDAP/Wikitech account for Perside Rosalie - https://phabricator.wikimedia.org/T220611 (Muchiri124)
[22:04:50] PROBLEM - Disk space on Hadoop worker on an-worker1103 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/e 15 GB (0% inode=99%): https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Administration
[22:31:17] (PS7) Krinkle: Update redirection of 2030.wikimedia.org with new URI [puppet] - https://gerrit.wikimedia.org/r/632552 (https://phabricator.wikimedia.org/T264797) (owner: Samuel (WMF))
[22:31:28] (CR) Krinkle: [C: +1] "Added T264797 to the doc comment" [puppet] - https://gerrit.wikimedia.org/r/632552 (https://phabricator.wikimedia.org/T264797) (owner: Samuel (WMF))