[02:14:36] Operations, CPT Initiatives (PHP7 (TEC4)), HHVM, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), and 3 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (Gilles) Congratulations to everyone involved in this migration, this is excellent work!
[02:19:35] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[02:21:11] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[04:14:49] PROBLEM - snapshot of s6 in eqiad on db1115 is CRITICAL: snapshot for s6 at eqiad taken more than 4 days ago: Most recent backup 2019-09-25 03:45:12 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[04:21:31] PROBLEM - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-09-25 04:10:29 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:16:32] Operations, DBA: snapshot for s6/s7 at eqiad taken more than 4 days ago - https://phabricator.wikimedia.org/T234152 (jijiki)
[06:17:33] ACKNOWLEDGEMENT - snapshot of s6 in eqiad on db1115 is CRITICAL: snapshot for s6 at eqiad taken more than 4 days ago: Most recent backup 2019-09-25 03:45:12 Effie Mouzeli Task: T234152 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:17:33] ACKNOWLEDGEMENT - snapshot of s7 in eqiad on db1115 is CRITICAL: snapshot for s7 at eqiad taken more than 4 days ago: Most recent backup 2019-09-25 04:10:29 Effie Mouzeli Task: T234152 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:21:33] Operations, ops-eqiad: Can't SSH to mw1290.mgmt - https://phabricator.wikimedia.org/T234153 (jijiki)
[06:22:03] ACKNOWLEDGEMENT - SSH mw1290.mgmt on mw1290.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds Effie Mouzeli Opened T234153 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:43:41] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:46:57] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[06:53:27] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[07:06:14] Operations, DBA: snapshot for s6/s7 at eqiad taken more than 4 days ago - https://phabricator.wikimedia.org/T234152 (Marostegui) Might clear itself in a few hours. But thanks for reporting it. We will take a look!
[08:02:34] (PS1) Elukey: profile::swap: automatically remove jupyter trash files [puppet] - https://gerrit.wikimedia.org/r/539694
[08:33:21] (PS1) Volans: tests: update requests_mock URI registration [software/cumin] - https://gerrit.wikimedia.org/r/539695
[08:43:06] (CR) Volans: [C: +2] "Self merging to unblock cumin CI" [software/cumin] - https://gerrit.wikimedia.org/r/539695 (owner: Volans)
[08:49:46] (Merged) jenkins-bot: tests: update requests_mock URI registration [software/cumin] - https://gerrit.wikimedia.org/r/539695 (owner: Volans)
[08:50:50] (CR) jenkins-bot: tests: update requests_mock URI registration [software/cumin] - https://gerrit.wikimedia.org/r/539695 (owner: Volans)
[08:58:39] (CR) Volans: "recheck" [software/cumin] - https://gerrit.wikimedia.org/r/514840 (https://phabricator.wikimedia.org/T205900) (owner: CRusnov)
[11:17:33] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/summary/{title} (Get summary for test page) is CRITICAL: Test Get summary for test page returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:19:11] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:28:59] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/media/image/featured/{year}/{month}/{day} (retrieve featured image data for April 29, 2016) is CRITICAL: Test retrieve featured image data for April 29, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[11:37:05] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:37:33] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/metadata/{title} (retrieve extended metadata for Video article on English Wikipedia) is CRITICAL: Test retrieve extended metadata for Video article on English Wikipedia returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:40:53] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:45:45] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/random/title (retrieve a random article title) is CRITICAL: Test retrieve a random article title returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:47:23] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[12:53:46] <[1997kB]> is there any issue with login. A user reported they are receiving There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Please resubmit the form. on multiple attempts.
[13:16:45] PROBLEM - mobileapps endpoints health on scb2006 is CRITICAL: /{domain}/v1/page/most-read/{year}/{month}/{day} (retrieve the most read articles for January 1, 2016) is CRITICAL: Test retrieve the most read articles for January 1, 2016 returned the unexpected status 504 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[13:18:27] RECOVERY - mobileapps endpoints health on scb2006 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[14:24:21] (CR) Nuria: [C: +1] profile::swap: automatically remove jupyter trash files [puppet] - https://gerrit.wikimedia.org/r/539694 (owner: Elukey)
[15:17:20] (PS1) Daimona Eaytoy: Remove afl_log_id from maintain-views [puppet] - https://gerrit.wikimedia.org/r/539710 (https://phabricator.wikimedia.org/T226851)
[15:19:31] (Abandoned) Daimona Eaytoy: Remove afl_log_id from maintain-views [puppet] - https://gerrit.wikimedia.org/r/539710 (https://phabricator.wikimedia.org/T226851) (owner: Daimona Eaytoy)
[15:29:22] (PS2) Elukey: profile::swap: automatically remove jupyter trash files [puppet] - https://gerrit.wikimedia.org/r/539694
[15:29:47] (CR) Elukey: "Simplified regex, more readable :)" [puppet] - https://gerrit.wikimedia.org/r/539694 (owner: Elukey)
[15:50:21] A user in Canada has been telling me the site is slow for him... Dunno if this is relevant
[15:55:54] Ah hmm seems like it's an issue with the user's computer, scratch that ^ and sorry for the bother
[17:20:50] (CR) Alex Monk: "Puppet is failing to run on deployment-logstash03 due to the lack of the elastalert class (which would come from I0f4cf1e7) - I was going " [puppet] - https://gerrit.wikimedia.org/r/505762 (https://phabricator.wikimedia.org/T213933) (owner: Filippo Giunchedi)
[17:33:16] (Abandoned) Urbanecm: [tests] Test wgRestrictionLevels entries [mediawiki-config] - https://gerrit.wikimedia.org/r/529047 (https://phabricator.wikimedia.org/T230103) (owner: Urbanecm)
[17:56:16] (CR) Paladox: [C: +1] gerrit: Fix renamed group name "Project and Group Creators" [puppet] - https://gerrit.wikimedia.org/r/539676 (owner: MarcoAurelio)
[17:57:32] (CR) Paladox: [C: +1] "This is a safe change, all it does is changes the default group assigned to new repos (it won't affect existing sites, not will it change " [puppet] - https://gerrit.wikimedia.org/r/539676 (owner: MarcoAurelio)
[21:01:23] (CR) Krinkle: [C: +1] "Can you enable it for -labs as well (use the '-' prefix to overwrite the key in its entirety, we'd set 'default' there)." [mediawiki-config] - https://gerrit.wikimedia.org/r/539674 (https://phabricator.wikimedia.org/T156095) (owner: Daimona Eaytoy)