[00:50:59] Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Performance-Team (Radar): Add reedy to contint-docker group - https://phabricator.wikimedia.org/T213015 (Krinkle)
[00:51:07] Operations, Continuous-Integration-Infrastructure, SRE-Access-Requests, Performance-Team (Radar): Add krinkle to contint-docker group - https://phabricator.wikimedia.org/T213015 (Krinkle)
[00:51:15] (PS1) Krinkle: Add krinkle to contint-docker group [puppet] - https://gerrit.wikimedia.org/r/482483 (https://phabricator.wikimedia.org/T213015)
[01:03:50] PROBLEM - Request latencies on argon is CRITICAL: instance=10.64.32.133:6443 verb={LIST,PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:03:50] PROBLEM - etcd request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:04:28] PROBLEM - etcd request latencies on argon is CRITICAL: instance=10.64.32.133:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:04:28] PROBLEM - etcd request latencies on neon is CRITICAL: instance=10.64.0.40:6443 operation={compareAndSwap,get,list} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:04:46] PROBLEM - Request latencies on chlorine is CRITICAL: instance=10.64.0.45:6443 verb={PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[01:05:44] PROBLEM - Request latencies on neon is CRITICAL: instance=10.64.0.40:6443 verb={PATCH,PUT} https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:02:24] PROBLEM - puppet last run on restbase1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:07:12] RECOVERY - Request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:07:12] RECOVERY - etcd request latencies on neon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:07:42] RECOVERY - Request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:07:42] RECOVERY - etcd request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:08:22] RECOVERY - etcd request latencies on argon is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:08:40] RECOVERY - Request latencies on chlorine is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-api
[02:33:40] RECOVERY - puppet last run on restbase1011 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[03:32:54] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 931.23 seconds
[03:36:36] PROBLEM - puppet last run on analytics1053 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[04:02:40] RECOVERY - puppet last run on analytics1053 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[04:26:40] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 266.30 seconds
[04:32:22] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[04:33:30] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[04:50:56] PROBLEM - Citoid LVS codfw on citoid.svc.codfw.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[04:53:16] RECOVERY - Citoid LVS codfw on citoid.svc.codfw.wmnet is OK: All endpoints are healthy
[06:28:14] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:28:30] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.008 second response time
[06:29:52] PROBLEM - puppet last run on bast3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:01:08] RECOVERY - puppet last run on bast3002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[07:08:32] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[07:08:46] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.552 second response time
[07:53:40] (PS1) Elukey: Decommission an1034/5 from Hadoop Analytics [puppet] - https://gerrit.wikimedia.org/r/482489 (https://phabricator.wikimedia.org/T209929)
[07:54:15] (CR) Elukey: [C: +2] Decommission an1034/5 from Hadoop Analytics [puppet] - https://gerrit.wikimedia.org/r/482489 (https://phabricator.wikimedia.org/T209929) (owner: Elukey)
[07:56:17] (it takes ages for the process to complete, going to start it now and keep it monitored)
[08:01:23] (afk again)
[09:43:00] (PS1) MaxSem: Get rid of Zero landing support [puppet] - https://gerrit.wikimedia.org/r/482492 (https://phabricator.wikimedia.org/T187716)
[10:39:32] (PS1) MaxSem: Remove mobilelanding.php [mediawiki-config] - https://gerrit.wikimedia.org/r/482493 (https://phabricator.wikimedia.org/T187716)
[14:46:22] (PS1) Zoranzoki21: .gitignore: Add Visual Studio Code in editors [mediawiki-config] - https://gerrit.wikimedia.org/r/482501
[14:48:29] (CR) ArielGlenn: [C: +2] file truncation checks also now optionally check for last xml tag [dumps] - https://gerrit.wikimedia.org/r/482293 (https://phabricator.wikimedia.org/T212462) (owner: ArielGlenn)
[14:50:16] !log ariel@deploy1001 Started deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag
[14:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:22] !log ariel@deploy1001 Finished deploy [dumps/dumps@cb30b6c]: check xml files for closing mediawiki tag (duration: 00m 06s)
[14:50:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:51:30] (Abandoned) ArielGlenn: Check for truncated file content in certain circumstances [dumps] - https://gerrit.wikimedia.org/r/481893 (https://phabricator.wikimedia.org/T212462) (owner: ArielGlenn)
[14:58:16] (PS1) Zoranzoki21: IS.php: Add wgProofreadPagePageJoiner, set it per default on '-' and at zhwikisource on __PAGEJOIN__ [mediawiki-config] - https://gerrit.wikimedia.org/r/482502 (https://phabricator.wikimedia.org/T205826)
[15:53:20] (PS1) Zoranzoki21: Create Portal namespace on shn.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992)
[15:55:14] (CR) Zoranzoki21: "Note for deployer: After deployment of this, namespaceDupes.php should be executed." [mediawiki-config] - https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: Zoranzoki21)
[16:15:42] PROBLEM - Long running screen/tmux on an-coord1001 is CRITICAL: CRIT: Long running SCREEN process. (user: otto PID: 223128, 1731350s 1728000s).
[16:43:31] (CR) Tulsi Bhagat: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/482508 (https://phabricator.wikimedia.org/T212992) (owner: Zoranzoki21)
[17:21:12] A resource loader script is hanging again (the same as Friday). I'll report it to Phabricator
[17:39:15] (PS1) Zoranzoki21: Turn off main page special casing for svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/482516 (https://phabricator.wikimedia.org/T213018)
[17:42:43] Operations, Wikimedia-General-or-Unknown: load.php URL hanging sometimes - https://phabricator.wikimedia.org/T213030 (Ciencia_Al_Poder)
[17:54:28] PROBLEM - Check systemd state on ms-be1015 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:58:06] RECOVERY - Check systemd state on ms-be1015 is OK: OK - running: The system is fully operational
[18:06:43] Operations, WMF-Communications, Wikimedia-Mailing-lists, Design: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (Aklapper) Thanks everyone for chiming in. Proposing to decline this task per last comments.
[18:25:58] (PS9) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[18:27:59] (PS10) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[18:32:58] (PS11) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[18:34:48] (CR) Paladox: "@Krinkle done (changed the logo to white)" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[18:43:04] (CR) Paladox: "Here's some of the colours i have been testing with:" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[19:04:22] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:05:28] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:10:32] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:14:12] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[19:15:20] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[19:15:24] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[20:14:01] (CR) Krinkle: "Thanks. I think https://phabricator.wikimedia.org/F27806307 and https://phabricator.wikimedia.org/F27806325 are best." [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[20:14:42] (CR) Krinkle: "The second one has slightly better contrast for readability and accessibility, which would help." [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[20:15:48] (PS12) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[20:16:59] (PS13) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[20:18:04] (PS14) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[20:18:39] (CR) Paladox: "I have a live demo at https://gerrit-new.wmflabs.org/r/dashboard/self" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[20:18:44] (CR) Paladox: "* https://gerrit-new.wmflabs.org/r/" [puppet] - https://gerrit.wikimedia.org/r/482379 (owner: Paladox)
[20:25:11] paladox: that site just returns a blank page :)
[20:25:25] Hauskatze works for me :)
[20:25:45] Hauskatze try https://gerrit-new.wmflabs.org/r/
[20:26:12] that one works
[20:26:45] guess it's browser that I see Subject Status Owner ... crossed by the topmost table line
[20:27:16] yup, looks good on Chrome
[20:28:02] :)
[20:28:12] Apparently you can change things in dark mode too
[20:28:36] so it's easy to change it in dark mode and normal mode
[20:31:21] (PS15) Paladox: gerrit: Update PolyGerrit theme plugin to customise the header either more [puppet] - https://gerrit.wikimedia.org/r/482379
[21:15:47] Hi, how can I restrict moving pages in the category namespace in IS.php?
[21:20:47] (PS1) Zoranzoki21: Add category at wgGettingStartedExcludedCategories for srwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/482534
[21:30:34] (PS1) Zoranzoki21: Add categories for other Croatian projects at wmgBabelMainCategory [mediawiki-config] - https://gerrit.wikimedia.org/r/482548
[22:00:22] PROBLEM - MegaRAID on analytics1054 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[22:00:25] ACKNOWLEDGEMENT - MegaRAID on analytics1054 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T213038
[22:00:29] Operations, ops-eqiad: Degraded RAID on analytics1054 - https://phabricator.wikimedia.org/T213038 (ops-monitoring-bot)
[22:20:31] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:21:35] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 0.049 second response time
[22:26:39] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:35:07] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 2.819 second response time
[22:38:55] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:48:43] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.882 second response time
[22:52:33] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:15:43] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.420 second response time
[23:19:31] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:55:01] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.809 second response time
[23:58:51] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds