[00:07:37] PROBLEM - Check systemd state on an-launcher1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:13:25] did something happen to log people out? Or just me...
[05:20:41] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[05:20:51] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.54 ms
[08:41:10] (CR) Legoktm: [C: +2] Add fiwiki 500k temporary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/652687 (https://phabricator.wikimedia.org/T270974) (owner: Majavah)
[08:41:57] (Merged) jenkins-bot: Add fiwiki 500k temporary logos [mediawiki-config] - https://gerrit.wikimedia.org/r/652687 (https://phabricator.wikimedia.org/T270974) (owner: Majavah)
[08:44:17] !log legoktm@deploy1001 Synchronized static/images/project-logos/fiwiki-500k.png: Add fiwiki 500k temporary logos (1/3) (duration: 00m 58s)
[08:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:45:36] !log legoktm@deploy1001 Synchronized static/images/project-logos/fiwiki-500k-1.5x.png: Add fiwiki 500k temporary logos (2/3) (duration: 00m 54s)
[08:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:49] !log legoktm@deploy1001 Synchronized static/images/project-logos/fiwiki-500k-2x.png: Add fiwiki 500k temporary logos (3/3) (duration: 00m 55s)
[08:46:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:59] (PS2) Legoktm: Switch fiwiki to their 500k temporary logo! [mediawiki-config] - https://gerrit.wikimedia.org/r/652688 (https://phabricator.wikimedia.org/T270974) (owner: Majavah)
[08:47:09] (CR) Legoktm: [C: +2] Switch fiwiki to their 500k temporary logo! [mediawiki-config] - https://gerrit.wikimedia.org/r/652688 (https://phabricator.wikimedia.org/T270974) (owner: Majavah)
[08:47:58] (Merged) jenkins-bot: Switch fiwiki to their 500k temporary logo! [mediawiki-config] - https://gerrit.wikimedia.org/r/652688 (https://phabricator.wikimedia.org/T270974) (owner: Majavah)
[08:48:29] Majavah: live on mwdebug1001 fyi
[08:48:38] cool, i'll test
[08:49:08] I think you need to do a full cache clear for it to show up
[08:49:08] legoktm: looks fine to me
[08:49:26] no, the file name changed so your browser doesn't have it cached
[08:50:14] hm ok
[08:51:20] syncing
[08:52:11] !log legoktm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Switch fiwiki to their 500k temporary logo! (T270974) (duration: 00m 55s)
[08:52:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:52:15] T270974: Temporarily change logo on the Finnish Wikipedia - https://phabricator.wikimedia.org/T270974
[08:52:56] I don't see it
[08:53:17] I do
[08:54:21] I see it logged out, I see it logged in w/ Vector, I don't see it logged in w/ Timeless (my default)
[08:54:55] if I navigate to a different page, it shows up, wild.
[08:55:00] I'm satisfied
[08:55:07] weird
[08:56:29] anyways, time for fiwiki and the rest of us to celebrate!
[08:56:53] Majavah: will you take care of coordinating when it should be taken down?
[08:57:22] I'll be around for another hour at least in case anything is wrong
[08:58:00] legoktm: yea, I'll report back locally
[08:58:56] i need to get something to eat but i'm also around if needed
[10:02:25] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:02:41] PROBLEM - Check systemd state on cumin2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:27:53] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 138 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:31:01] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 40 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:42:01] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 219 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:46:39] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 25 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:49:47] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 341 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:51:21] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 41 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:56:05] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 384 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[10:59:13] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 25 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:03:55] PROBLEM - MediaWiki exceptions and fatals per minute on alert1001 is CRITICAL: 137 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:05:29] RECOVERY - MediaWiki exceptions and fatals per minute on alert1001 is OK: (C)100 gt (W)50 gt 19 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops
[11:20:38] those look to largely be PHP Fatal error: Allowed memory size of 698351616 bytes exhausted (tried to allocate 20480 bytes) in /srv/mediawiki/php-1.36.0-wmf.22/includes/libs/rdbms/database/DatabaseMysqli.php on line 186 from /rpc/RunSingleJob.php jobrunner.discovery.wmnet in case anyone is curious.
[11:21:36] the number of exceptions is back down to the usual at this point for the last 20 mins or so though.
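The MediaWiki exceptions alerts above report their readings in a compact form. The following is a minimal sketch for reading them, assuming that `X gt Y` means the observed per-minute count X exceeded the critical threshold Y, and that `(C)100 gt (W)50 gt X` means the count X sits below both the critical (100) and warning (50) thresholds. The names `parse_alert`, `classify` and `MEMORY_LIMIT_MIB` are hypothetical and not part of any Icinga or Prometheus tooling; how a value exactly equal to a threshold is treated is also an assumption. The last constant converts the PHP memory limit quoted in the 11:20:38 message (698351616 bytes) to MiB.

```python
import re

# Hypothetical parsers for the two alert-text shapes seen in the log.
# Assumption: "X gt Y" = observed X exceeded critical threshold Y.
CRITICAL_RE = re.compile(r"^(\d+) gt (\d+)$")
# Assumption: "(C)C gt (W)W gt X" = observed X is below warning W and critical C.
OK_RE = re.compile(r"^\(C\)(\d+) gt \(W\)(\d+) gt (\d+)$")


def parse_alert(text: str) -> dict:
    """Return the observed value, the thresholds, and the implied state."""
    m = CRITICAL_RE.match(text)
    if m:
        value, critical = map(int, m.groups())
        return {"value": value, "critical": critical, "state": "CRITICAL"}
    m = OK_RE.match(text)
    if m:
        critical, warning, value = map(int, m.groups())
        return {"value": value, "critical": critical,
                "warning": warning, "state": "OK"}
    raise ValueError(f"unrecognised alert text: {text!r}")


def classify(value: int, warning: int = 50, critical: int = 100) -> str:
    """Classify a per-minute count against the thresholds seen in the log.

    Assumption: a value must be strictly greater than a threshold ("gt")
    to trip it.
    """
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# PHP memory_limit from the fatal error, expressed in MiB (2**20 bytes).
MEMORY_LIMIT_BYTES = 698351616
MEMORY_LIMIT_MIB = MEMORY_LIMIT_BYTES / 2**20

if __name__ == "__main__":
    for text in ["138 gt 100", "(C)100 gt (W)50 gt 40", "384 gt 100"]:
        print(parse_alert(text))
    print(MEMORY_LIMIT_MIB)
```

Read this way, every PROBLEM reading in the 10:27 to 11:05 window (137 to 384) is above the critical threshold of 100, and every RECOVERY reading (19 to 41) is below the warning threshold of 50. The memory limit works out to exactly 666 MiB.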
[14:38:46] !log milimetric@deploy1001 Started deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent
[14:38:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:46] !log milimetric@deploy1001 Finished deploy [analytics/refinery@f9281dd]: [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 10m 00s)
[14:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:41] !log milimetric@deploy1001 Started deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent
[14:49:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:49:48] !log milimetric@deploy1001 Finished deploy [analytics/refinery@f9281dd] (thin): [SAFE, IGNORE] Simple hotfix for a python bug, analytics refinery only, not urgent (duration: 00m 07s)
[14:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:29:07] PROBLEM - Host ms-be2050 is DOWN: PING CRITICAL - Packet loss = 100%
[15:29:57] RECOVERY - Host ms-be2050 is UP: PING OK - Packet loss = 0%, RTA = 33.37 ms
[16:42:14] Operations, SRE-swift-storage: Bug in "internal storage backends" in re-deleting a Commons file after undelete - https://phabricator.wikimedia.org/T270994 (Majavah)
[16:51:07] Operations, SRE-swift-storage: Bug in "internal storage backends" in re-deleting a Commons file after undelete - https://phabricator.wikimedia.org/T270994 (JGHowes)
[16:52:41] Operations, SRE-swift-storage: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (Aklapper)
[17:58:01] Operations, ops-eqiad, Analytics, Patch-For-Review: Degraded RAID on an-coord1002 - https://phabricator.wikimedia.org/T270768 (elukey) My great ignorance in sw-RAID setups forced me to step on a mine, namely T215183. The failed disk is the one containing the grub partition table, since it was not...
[20:27:49] (PS1) Urbanecm: ukwikisource: Add Archive namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653168 (https://phabricator.wikimedia.org/T270627)
[20:32:26] (PS1) Urbanecm: ukwikisource: Delete Translation namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653170 (https://phabricator.wikimedia.org/T270628)
[20:36:43] (PS1) Urbanecm: hrwiki: Restrict (apply)changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996)
[20:40:52] (PS1) Urbanecm: frwiktionary: Mark several namespaces as content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/653174 (https://phabricator.wikimedia.org/T270821)
[20:46:32] (CR) Urbanecm: [C: +1] Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651462 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[20:46:37] (CR) Urbanecm: [C: +1] Add localised logos for the Madurese Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/651465 (https://phabricator.wikimedia.org/T270693) (owner: Odder)
[20:48:22] (PS1) Urbanecm: hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688)
[20:49:06] (CR) jerkins-bot: [V: -1] hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688) (owner: Urbanecm)
[20:52:02] (PS2) Urbanecm: hrwiki: Enable visual editor in the draft (Nacrt) namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/653177 (https://phabricator.wikimedia.org/T270688)
[21:40:33] (PS2) Urbanecm: hrwiki: Restrict (apply)changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996)
[21:40:39] PROBLEM - mediawiki originals uploads -hourly- for eqiad on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[21:41:15] (CR) Luke081515: [C: +1] hrwiki: Restrict (apply)changetags permissions to sysop and bot group [mediawiki-config] - https://gerrit.wikimedia.org/r/653173 (https://phabricator.wikimedia.org/T270996) (owner: Urbanecm)
[21:51:17] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:43:59] (PS1) Luke081515: Enable abusefilter block at hrwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653206 (https://phabricator.wikimedia.org/T270997)
[22:48:19] (PS1) Luke081515: Enable abusefilter block at zh_yuewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/653207 (https://phabricator.wikimedia.org/T270567)
[22:51:25] PROBLEM - mediawiki originals uploads -hourly- for codfw on alert1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw
[22:52:49] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/653206 (https://phabricator.wikimedia.org/T270997) (owner: Luke081515)
[23:10:51] (PS1) Urbanecm: Add uz.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987)
[23:11:24] (CR) jerkins-bot: [V: -1] Add uz.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987) (owner: Urbanecm)
[23:11:57] (PS2) Urbanecm: Add uz.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/653212 (https://phabricator.wikimedia.org/T270987)
[23:12:05] RECOVERY - mediawiki originals uploads -hourly- for eqiad on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=eqiad
[23:51:03] RECOVERY - mediawiki originals uploads -hourly- for codfw on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To%23mediawiki_originals_uploads https://grafana.wikimedia.org/d/OPgmB1Eiz/swift?panelId=26&fullscreen&orgId=1&var-DC=codfw