[00:01:54] RECOVERY - cp3 Disk Space on cp3 is OK: DISK OK - free space: / 4061 MB (16% inode=93%); [00:03:36] PROBLEM - cloud1 Current Load on cloud1 is WARNING: WARNING - load average: 21.28, 21.26, 16.20 [00:04:32] PROBLEM - cloud2 Current Load on cloud2 is WARNING: WARNING - load average: 18.44, 21.65, 15.42 [00:05:37] RECOVERY - cloud1 Current Load on cloud1 is OK: OK - load average: 19.64, 20.29, 16.47 [00:06:31] RECOVERY - cloud2 Current Load on cloud2 is OK: OK - load average: 11.30, 18.42, 15.03 [00:14:39] [02puppet] 07paladox closed pull request 03#1313: Remove support for custom mobile domain - 13https://git.io/JvdiU [00:14:41] [02puppet] 07paladox deleted branch 03paladox-patch-5 - 13https://git.io/vbiAS [00:14:43] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-5 [00:15:14] [02puppet] 07paladox closed pull request 03#1317: Varnish: Set and unset mf_useformat if toggle_view_mobile/toggle_view_desktop is in the url - 13https://git.io/JvFKk [00:15:16] [02puppet] 07paladox deleted branch 03paladox-patch-6 - 13https://git.io/vbiAS [00:15:17] [02miraheze/puppet] 07paladox deleted branch 03paladox-patch-6 [00:45:11] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKVo [00:45:12] [02miraheze/services] 07MirahezeSSLBot 034864fb4 - BOT: Updating services config for wikis [02:27:12] PROBLEM - wiki.fourta.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.fourta.org' expires in 15 day(s) (Mon 15 Jun 2020 02:24:49 GMT +0000). [02:29:52] PROBLEM - wiki.wikidadds.org - LetsEncrypt on sslhost is WARNING: WARNING - Certificate 'wiki.wikidadds.org' expires in 15 day(s) (Mon 15 Jun 2020 02:26:53 GMT +0000). [02:30:59] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKoq [02:31:00] [02miraheze/ssl] 07MirahezeSSLBot 03bcff62d - Bot: Update SSL cert for wiki.fourta.org [02:32:45] [02miraheze/ssl] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKoY [02:32:47] [02miraheze/ssl] 07MirahezeSSLBot 03b142c40 - Bot: Update SSL cert for wiki.wikidadds.org [02:43:11] RECOVERY - wiki.fourta.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.fourta.org' will expire on Fri 28 Aug 2020 01:30:52 GMT +0000. [02:44:02] RECOVERY - wiki.wikidadds.org - LetsEncrypt on sslhost is OK: OK - Certificate 'wiki.wikidadds.org' will expire on Fri 28 Aug 2020 01:32:38 GMT +0000. [03:00:19] [02miraheze/services] 07MirahezeSSLBot pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKo6 [03:00:21] [02miraheze/services] 07MirahezeSSLBot 037b8fc9d - BOT: Updating services config for wikis [09:01:31] [ANNOUNCEMENT] On Monday, we will be performing maintenance on the ZppixBot instance. This will result in the unavailability of the Website, Wiki & Main bot (not -test). It will take place at 10 AM UTC+1 and last around half an hour. [10:51:04] PROBLEM - cp7 HTTP 4xx/5xx ERROR Rate on cp7 is WARNING: WARNING - NGINX Error Rate is 58% [10:51:06] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 76% [10:51:22] :( [10:51:29] PROBLEM - ns1 GDNSD Datacenters on ns1 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.89.160.142/cpweb, 2001:41d0:800:105a::10/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [10:51:41] PROBLEM - cp6 Varnish Backends on cp6 is CRITICAL: 4 backends are down. 
mw4 mw5 mw6 mw7 [10:52:04] paladox, Reception123: we’re down [10:52:09] PROBLEM - cp8 Varnish Backends on cp8 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [10:52:15] PuppyKun, SPF|Cloud: ^ [10:52:24] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.89.160.142/cpweb, 2001:41d0:800:105a::10/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [10:52:36] PROBLEM - cp7 Varnish Backends on cp7 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [10:52:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [10:52:50] PROBLEM - cp3 Varnish Backends on cp3 is CRITICAL: 4 backends are down. mw4 mw5 mw6 mw7 [10:53:03] PROBLEM - cp7 HTTP 4xx/5xx ERROR Rate on cp7 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [10:54:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 56% [10:55:06] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 55% [10:56:17] RhinosF1: on it [10:56:31] SPF|Cloud: thanks [10:56:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [10:57:08] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 62% [10:58:24] Examknow: sigma worked in -bots [10:58:46] SPF|Cloud: use !s or !status to know what works if you need to in -bots [10:59:48] huh? [11:00:06] this error doesn't make sense.. [11:00:23] SPF|Cloud: why? [11:02:38] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 45% [11:03:40] ERROR 2026 (HY000): SSL connection error: certificate has expired [11:04:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 80% [11:06:40] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 27% [11:07:54] SPF|Cloud: oh? [11:08:12] When? And why isn’t that tracked? 
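At this point the only concrete clue is the client-side "ERROR 2026 (HY000): SSL connection error: certificate has expired". A minimal sketch of how that claim can be checked against what the server actually presents, assuming OpenSSL 1.1.1+ for the in-protocol MySQL STARTTLS support; the hostname appears later in the log, but the port and everything else here are assumptions rather than the commands that were actually run:

    # Dump the certificate the MariaDB server presents during the TLS handshake:
    openssl s_client -connect dbt1.miraheze.org:3306 -starttls mysql -showcerts </dev/null 2>/dev/null \
      | openssl x509 -noout -subject -issuer -dates
    # A notAfter in the past, or "Verify return code: 10 (certificate has expired)" in the
    # full s_client output, points at the certificate chain rather than at client clock skew.

Note that the leaf shown this way can still be perfectly valid: an expired certificate further up the chain produces the same client error, which is what the rest of the log goes on to uncover.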
[11:08:35] because expiration date is five months away from now [11:08:45] That’s not making sense [11:09:18] !log start mysql restart process on dbt1, causing site outage, services using other database cluster seem fine [11:10:29] PROBLEM - dbt1 MySQL on dbt1 is CRITICAL: Can't connect to MySQL server on '51.77.109.151' (115) [11:10:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [11:12:40] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 39% [11:16:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 55% [11:20:42] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 60% [11:22:39] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 43% [11:23:05] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 57% [11:25:06] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [11:30:26] Tracked at https://phabricator.miraheze.org/T5675 [11:30:27] [ ⚓ T5675 Error 503 ] - phabricator.miraheze.org [11:30:35] JohnLewis: Live incident if you can’t tell [11:30:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [11:30:44] SPF|Cloud is on it [11:32:03] restart process finally finished [11:32:28] RECOVERY - dbt1 MySQL on dbt1 is OK: Uptime: 1089 Threads: 9 Questions: 16 Slow queries: 3 Opens: 19 Flush tables: 1 Open tables: 13 Queries per second avg: 0.014 [11:36:09] JohnLewis: thoughts? we're getting 'SSL certificate expired' errors from mariadb, but the certificate should be valid until October [11:36:58] !log "hwclock --hctosys" on dbt1 [11:37:36] Can you get the OpenSSL output for the certificate deployed on dbt1? [11:38:38] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 49% [11:38:41] https://www.irccloud.com/pastebin/DsTUJ9uQ/ [11:38:41] [ Snippet | IRCCloud ] - www.irccloud.com [11:38:43] SPF|Cloud, JohnLewis: can we disable https://phabricator.miraheze.org/T1979#110760? They’re just not listening [11:38:44] [ ⚓ T1979 Error: 503 Backend fetch failed ] - phabricator.miraheze.org [11:39:57] This only happens on dbt1 right? [11:40:04] but I don't get why mediawiki fails to connect whereas phabricator responds fine [11:40:38] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 61% [11:40:48] Thought Phab was on db6? [11:40:50] this error happens whenever using 'sudo -i mysql -h ' on any database server [11:41:34] it is, and the error occurs there as well [11:41:44] I don’t know without having access currently then [11:42:38] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 57% [11:43:36] JohnLewis: can you deal with https://phabricator.miraheze.org/T1979#110760 on phab quickly. They’re simply not listening and it’s time wasting. [11:43:37] [ ⚓ T1979 Error: 503 Backend fetch failed ] - phabricator.miraheze.org [11:44:39] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 78% [11:46:43] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 57% [11:48:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [11:49:29] JohnLewis, SPF|Cloud: Please disable the user. He’s just being rude. 
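The "hwclock --hctosys" attempt above treats clock drift as one suspect; the other is the certificate bundle itself, where an expired cross-signing certificate can fail validation even though the leaf is months away from its notAfter. A hedged sketch of both checks; the bundle path is hypothetical, not the actual location on dbt1:

    # 1. Rule out clock drift (the reason for the hwclock --hctosys above):
    timedatectl
    # 2. Split the deployed bundle and print validity for every certificate in it,
    #    not just the first one (the path is an assumption):
    cd /tmp
    csplit -z -f dbcert- /etc/mysql/ssl/ca-bundle.crt '/-----BEGIN CERTIFICATE-----/' '{*}'
    for f in dbcert-*; do echo "== $f"; openssl x509 -noout -subject -enddate -in "$f"; done

Any entry whose notAfter is already in the past is a candidate for the failure, whatever the leaf says.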
[11:50:27] and done [11:50:57] since the error is purely SSL related (unencrypted connections work just fine), I think I'm going to create ssh tunnels between the servers [11:51:11] Ok [11:52:01] Thanks for the disable [11:52:24] !log disable test2 puppet for testing [11:52:40] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 24% [11:53:24] PROBLEM - test2 Puppet on test2 is WARNING: WARNING: Puppet is currently disabled, message: reason not specified, last run 1 minute ago with 1 failures [11:55:45] SPF|Cloud: they’re back as https://phabricator.miraheze.org/T5675#110769 [11:55:46] [ ⚓ T5675 Error 503 ] - phabricator.miraheze.org [11:56:12] or just another pestering person [11:56:46] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 64% [11:57:33] * RhinosF1 wishes people would be nicer in outages and let us work [11:58:30] !log disable puppet on dbt1 and db6 - in the process of deploying stunnel between MW and mysql [11:58:43] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 53% [12:01:35] PROBLEM - db6 Puppet on db6 is WARNING: WARNING: Puppet is currently disabled, message: Workaround SPF, last run 10 minutes ago with 0 failures [12:01:49] PROBLEM - dbt1 Puppet on dbt1 is WARNING: WARNING: Puppet is currently disabled, message: Workaround SPF, last run 10 minutes ago with 0 failures [12:02:31] RhinosF1: I'm around, I can look into it however we have to expect unhappy people when there's downtime, and while yes that one was being rude you can't blame them [12:03:09] Reception123: rude still isn’t helpful [12:03:15] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 59% [12:03:55] Reception123: phab tickets don’t need commentary every minute. Maybe direct them to somewhere real time like IRC & Discord [12:03:59] RhinosF1: of course, but the second user you mentioned isn't being rude, they just want to be informed [12:04:06] well you've already done so [12:04:10] PROBLEM - cp8 HTTPS on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4290 bytes in 0.322 second response time [12:04:17] Reception123: I have. I meant for any more. [12:04:42] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [12:05:14] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [12:06:39] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 50% [12:08:51] !log install mariadb client on test2 [12:09:13] RECOVERY - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is OK: OK - NGINX Error Rate is 39% [12:10:45] wow, fuck [12:11:00] think I got it [12:11:24] https://support.sectigo.com/Com_KnowledgeDetailPage?Id=kA03l00000117LT [12:11:25] [ Sectigo Knowledge Base ] - support.sectigo.com [12:12:40] ffs [12:12:42] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 84% [12:12:46] Did they warn us? [12:13:12] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [12:13:30] RhinosF1: if they did I think we'd know :P [12:14:10] RhinosF1: there is nothing in my email from sectigo [12:15:45] Reception123: that was more a rhetorical rant [12:15:58] RhinosF1: ah, yeah... 
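Before the Sectigo advisory linked above pinned the cause on the expiring AddTrust External CA Root cross-sign, the interim plan logged here was to take TLS out of MariaDB's hands and carry the traffic through stunnel, with MediaWiki connecting in the clear to a local endpoint. A minimal sketch of that shape, assuming stunnel 5.x on Debian; service names, ports, certificate paths and addresses are all assumptions, not the configuration that was deployed:

    # Hypothetical server-side config on the database host: terminate TLS on 3307,
    # hand the connection to MariaDB locally in the clear.
    cat > /etc/stunnel/mysql-server.conf <<'EOF'
    [mysql-tls]
    accept  = 0.0.0.0:3307
    connect = 127.0.0.1:3306
    cert    = /etc/stunnel/db-host.pem
    EOF

    # Hypothetical client-side config on a MediaWiki host: accept plain connections on
    # localhost and carry them to the database host over TLS.
    cat > /etc/stunnel/mysql-client.conf <<'EOF'
    [mysql-tls]
    client  = yes
    accept  = 127.0.0.1:3307
    connect = dbt1.miraheze.org:3307
    CAfile  = /etc/ssl/certs/ca-certificates.crt
    verifyChain = yes
    EOF

    systemctl restart stunnel4

The log later shows this being unwound (the stunnel package is removed from test2) once the certificates themselves were reissued, so treat it as the stop-gap it was.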
[12:16:03] We can look at how to get warned after [12:16:31] RhinosF1: yeah, with the security repo task that was made public one or two weeks ago it seems obvious that many don't actually warn so we just need to check things weekly [12:16:47] Yep [12:17:31] RhinosF1: communication isn't great for many of them, they just post on their blogs or whatever and expect everyone to read them [12:20:41] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 35% [12:21:21] I need to go away for 15 minutes, I think [12:21:39] at least we now know what the cause is, so... [12:21:52] SPF|Cloud: ok, how soon can we get it resolved? [12:22:36] shouldn't take too much time technically, since our Sectigo cert has been cross-signed and thus we can rely on valid roots [12:22:53] but I find the documentation to be very poor [12:24:01] Reception123: if you know how to do that, I think you might want to get us a Let's Encrypt certificate for these servers [12:24:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 69% [12:24:44] at least for the hostnames dbt1.miraheze.org, db6.miraheze.org and db7.miraheze.org [12:24:49] and I gotta go now, back soon [12:26:22] I'm not sure how it would work for those I'm afraid, paladox knows how that's set up [12:26:37] And the docs on Meta are down obviously [12:26:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 48% [12:28:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 61% [12:29:47] PROBLEM - test2 MediaWiki Rendering on test2 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.033 second response time [12:29:52] PROBLEM - jobrunner1 MediaWiki Rendering on jobrunner1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.115 second response time [12:29:54] PROBLEM - mw4 MediaWiki Rendering on mw4 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.031 second response time [12:30:19] PROBLEM - mw6 MediaWiki Rendering on mw6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.032 second response time [12:30:22] PROBLEM - mw5 MediaWiki Rendering on mw5 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.018 second response time [12:31:13] PROBLEM - mw7 MediaWiki Rendering on mw7 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - 4348 bytes in 0.111 second response time [12:34:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 55% [12:35:19] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 54% [12:37:19] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 75% [12:39:38] back [12:40:10] great [12:40:16] * RhinosF1 just updated phab [12:40:39] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 38% [12:43:22] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 40% [12:44:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 63% [12:49:25] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 68% [12:50:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 45% [12:52:39] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 79% [12:54:38] PROBLEM - cp3 HTTP 4xx/5xx 
ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 53% [12:55:08] Reception123: as at now, everyone that’s tweeted us has had a reply. [12:55:16] That I saw and was in english [12:55:54] Great, thanks! [12:56:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 67% [12:57:04] Reception123: check facebook. I’ve not looked at that. [12:58:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 50% [13:01:07] There's been a notice there [13:02:07] Reception123: so you don't happen to know by chance how to get us an LE certificate for those domain anmes? [13:04:40] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 65% [13:05:31] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is WARNING: WARNING - NGINX Error Rate is 51% [13:07:30] PROBLEM - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is CRITICAL: CRITICAL - NGINX Error Rate is 66% [13:08:00] [02miraheze/dns] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfK7I [13:08:02] [02miraheze/dns] 07Southparkfan 03025fb31 - Add temp acme records [13:08:39] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 37% [13:08:51] PROBLEM - ns2 GDNSD Datacenters on ns2 is UNKNOWN: NRPE: Unable to read output [13:10:53] PROBLEM - ns2 GDNSD Datacenters on ns2 is CRITICAL: CRITICAL - 6 datacenters are down: 128.199.139.216/cpweb, 2400:6180:0:d0::403:f001/cpweb, 51.89.160.142/cpweb, 2001:41d0:800:105a::10/cpweb, 51.161.32.127/cpweb, 2607:5300:205:200::17f6/cpweb [13:13:17] finally got a working certificate [13:13:48] [02miraheze/mw-config] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfK7C [13:13:49] [02miraheze/mw-config] 07Southparkfan 03dd7be6e - Change CA file for database [13:13:52] Reception123: ^ please deploy that change [13:14:17] ok, running puppet [13:15:02] !log created a new certificate for dbt1/db6/db7.miraheze.org using LE, files present in /etc/letsencrypt/live/dbt1.miraheze.org on jobrunner1, dbt1 restarting now, going to do db6 now as well [13:15:41] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 54% [13:15:55] SPF|Cloud: puppet ran on mw* [13:16:05] thanks [13:16:30] PROBLEM - dbt1 MySQL on dbt1 is CRITICAL: Can't connect to MySQL server on '51.77.109.151' (115) [13:17:38] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 83% [13:17:39] PROBLEM - db6 MySQL on db6 is CRITICAL: Can't connect to MySQL server on '51.89.160.130' (115) [13:17:40] PROBLEM - phab1 phab.miraheze.wiki HTTPS on phab1 is CRITICAL: HTTP CRITICAL - Invalid HTTP response received from host on port 443: HTTP/1.1 500 Internal Server Error [13:19:09] PROBLEM - phab1 phabricator.miraheze.org HTTPS on phab1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 4189 bytes in 0.041 second response time [13:19:35] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is WARNING: WARNING - NGINX Error Rate is 59% [13:21:00] PROBLEM - cp7 HTTP 4xx/5xx ERROR Rate on cp7 is WARNING: WARNING - NGINX Error Rate is 59% [13:21:33] PROBLEM - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is CRITICAL: CRITICAL - NGINX Error Rate is 73% [13:22:18] RECOVERY - mw6 MediaWiki Rendering on mw6 is OK: HTTP OK: HTTP/1.1 200 OK - 18383 bytes in 0.278 second response time [13:22:29] RECOVERY - dbt1 MySQL on dbt1 is OK: Uptime: 565 Threads: 20 Questions: 18972 Slow queries: 592 Opens: 3207 Flush tables: 1 Open tables: 3201 Queries per second avg: 
33.578 [13:22:34] RECOVERY - mw5 MediaWiki Rendering on mw5 is OK: HTTP OK: HTTP/1.1 200 OK - 18374 bytes in 0.031 second response time [13:22:35] RECOVERY - cp8 HTTPS on cp8 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 1694 bytes in 0.401 second response time [13:22:38] RECOVERY - cp7 Varnish Backends on cp7 is OK: All 7 backends are healthy [13:22:49] RECOVERY - ns2 GDNSD Datacenters on ns2 is OK: OK - all datacenters are online [13:22:58] RECOVERY - cp7 HTTP 4xx/5xx ERROR Rate on cp7 is OK: OK - NGINX Error Rate is 15% [13:23:08] RECOVERY - cp6 Varnish Backends on cp6 is OK: All 7 backends are healthy [13:23:10] RECOVERY - mw7 MediaWiki Rendering on mw7 is OK: HTTP OK: HTTP/1.1 200 OK - 18374 bytes in 0.031 second response time [13:23:15] RECOVERY - phab1 phabricator.miraheze.org HTTPS on phab1 is OK: HTTP OK: HTTP/1.1 200 OK - 18982 bytes in 5.581 second response time [13:23:34] RECOVERY - cp8 HTTP 4xx/5xx ERROR Rate on cp8 is OK: OK - NGINX Error Rate is 9% [13:23:35] RECOVERY - cp3 HTTP 4xx/5xx ERROR Rate on cp3 is OK: OK - NGINX Error Rate is 6% [13:23:39] RECOVERY - db6 MySQL on db6 is OK: Uptime: 88 Threads: 154 Questions: 13850 Slow queries: 860 Opens: 1936 Flush tables: 1 Open tables: 1930 Queries per second avg: 157.386 [13:23:47] RECOVERY - cp8 Varnish Backends on cp8 is OK: All 7 backends are healthy [13:23:52] RECOVERY - phab1 phab.miraheze.wiki HTTPS on phab1 is OK: HTTP OK: Status line output matched "HTTP/1.1 200" - 17692 bytes in 0.281 second response time [13:23:52] RECOVERY - test2 MediaWiki Rendering on test2 is OK: HTTP OK: HTTP/1.1 200 OK - 18374 bytes in 0.018 second response time [13:23:53] RECOVERY - jobrunner1 MediaWiki Rendering on jobrunner1 is OK: HTTP OK: HTTP/1.1 200 OK - 18374 bytes in 0.032 second response time [13:24:02] RECOVERY - mw4 MediaWiki Rendering on mw4 is OK: HTTP OK: HTTP/1.1 200 OK - 18375 bytes in 0.019 second response time [13:24:04] SPF|Cloud: congrats [13:24:05] victory! [13:24:17] RECOVERY - cp3 Varnish Backends on cp3 is OK: All 7 backends are healthy [13:24:36] SPF|Cloud: well done [13:25:01] incident declared over at 15:23 CEST [13:25:31] RECOVERY - ns1 GDNSD Datacenters on ns1 is OK: OK - all datacenters are online [13:26:03] PROBLEM - db6 Current Load on db6 is CRITICAL: CRITICAL - load average: 8.56, 5.58, 2.44 [13:26:21] going to fix db7 now as well [13:26:58] !log disable puppet on db7 to apply patch there too [13:27:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:27:28] SPF|Cloud: yeah, test2 is down. [13:27:38] known [13:27:39] Everything else fine [13:27:46] !s [13:27:46] cleaning that up as well [13:27:48] Please wait while I check the status of Miraheze Services. [13:27:49] RhinosF1: Status report finished. There are currently 1 dead services and 5 alive services. To view the full report, say !status. [13:27:53] Weird bot [13:27:58] !log remove stunnel package from test2 and enable puppet [13:28:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:28:03] RECOVERY - db6 Current Load on db6 is OK: OK - load average: 3.95, 4.87, 2.56 [13:28:18] grafana [13:29:06] Reception123: can you tweet and facebook and discord the news? [13:29:23] PROBLEM - test2 Puppet on test2 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 38 seconds ago with 1 failures. 
Failed resources (up to 3 shown): Exec[git_pull_MediaWiki config] [13:29:35] ok [13:30:45] PROBLEM - db7 Puppet on db7 is WARNING: WARNING: Puppet is currently disabled, message: Workaround CA issue SPF, last run 9 minutes ago with 0 failures [13:33:47] PROBLEM - db7 MySQL on db7 is CRITICAL: Can't connect to MySQL server on '51.89.160.143' (115) [13:33:50] "certificate expired" yeah, what certificate MariaDB? [13:34:06] be more clear next time please [13:34:35] Let’s get an incident report and maybe look at being alerted about these things and better errors [13:34:45] Otherwise, well done everyone [13:37:49] RECOVERY - db7 MySQL on db7 is OK: Uptime: 119 Threads: 8 Questions: 203 Slow queries: 5 Opens: 97 Flush tables: 1 Open tables: 91 Queries per second avg: 1.705 [13:38:29] SPF|Cloud: are wiki creations supposed to fail with a db error? See 33ed574c08d07154f2673c71 on mw7 [13:43:02] !log change TLS settings for db7 replication to reflect changes: https://meta.miraheze.org/wiki/Tech:MariaDB#Replicate_from_master [13:43:02] [ Tech:MariaDB - Miraheze Meta ] - meta.miraheze.org [13:43:06] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:43:40] RhinosF1: 2020-05-30 13:37:42 mw7 metawiki: [33ed574c08d07154f2673c71] /wiki/Special:RequestWikiQueue/12334 Wikimedia\Rdbms\DBQueryError from line 1603 of /srv/mediawiki/w/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? Query: INSERT INTO `matomo` (matomo_id,matomo_wiki) VALUES (NULL,'fswahlenwiki') Function: [13:43:40] MatomoAnalytics::addSite Error: 1048 Column 'matomo_id' cannot be null (dbt1.miraheze.org) [13:44:21] SPF|Cloud: Is that supposed to happen or can we create wikis? [13:44:58] https://github.com/miraheze/MatomoAnalytics/blob/master/includes/MatomoAnalytics.php#L31 [13:44:59] [ MatomoAnalytics/MatomoAnalytics.php at master · miraheze/MatomoAnalytics · GitHub ] - github.com [13:45:06] ah, matomo is still broken :) [13:46:01] Urgh [13:46:41] SPF|Cloud: https://fswahlen.miraheze.org/wiki/Hauptseite is the wiki. Ping me when we’re fixed [13:46:43] [ Hauptseite – Foodsharing - AG Wahlen ] - fswahlen.miraheze.org [13:48:00] [02miraheze/puppet] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfK5n [13:48:02] [02miraheze/puppet] 07Southparkfan 03f34cd59 - Matomo: use LE as CA for database connection [13:48:14] [02miraheze/puppet] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfK5c [13:48:16] [02miraheze/puppet] 07Southparkfan 03bc09cce - Grafana: use LE as CA for database connection [13:48:41] I was just going to say grafana [13:49:03] SPF|Cloud: icinga web? [13:49:19] All configured authentication methods failed. Please check the system log or Icinga Web 2 log for more information. 
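For the replacement certificates, the log above records temporary acme DNS records, a new Let's Encrypt certificate covering dbt1/db6/db7.miraheze.org with files under /etc/letsencrypt/live/dbt1.miraheze.org, and a TLS settings change for db7 replication. A hedged sketch of what that sequence can look like; the ACME client, challenge type, config file names and CA paths below are illustrative assumptions, not what was actually run:

    # Issue one certificate covering all three hostnames (client and challenge assumed):
    certbot certonly --webroot -w /var/www/letsencrypt \
      -d dbt1.miraheze.org -d db6.miraheze.org -d db7.miraheze.org
    ls /etc/letsencrypt/live/dbt1.miraheze.org/
    # cert.pem  chain.pem  fullchain.pem  privkey.pem

    # Point MariaDB at the new material (the option names are standard; the drop-in file,
    # CA bundle choice and key permission handling are assumptions):
    cat > /etc/mysql/mariadb.conf.d/99-tls.cnf <<'EOF'
    [mysqld]
    ssl-cert = /etc/letsencrypt/live/dbt1.miraheze.org/fullchain.pem
    ssl-key  = /etc/letsencrypt/live/dbt1.miraheze.org/privkey.pem
    ssl-ca   = /etc/ssl/certs/ca-certificates.crt
    EOF
    systemctl restart mariadb

    # Replica side, matching the Tech:MariaDB "Replicate from master" change logged at 13:43
    # (values assumed):
    mysql -e "STOP SLAVE;
              CHANGE MASTER TO MASTER_SSL=1, MASTER_SSL_CA='/etc/ssl/certs/ca-certificates.crt';
              START SLAVE;"

MediaWiki's side of the same change is the "Change CA file for database" mw-config commit pushed at 13:13 and deployed with the puppet run on the mw* hosts.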
[13:49:29] that'd be useful [13:49:53] [02miraheze/puppet] 07Southparkfan pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfK58 [13:49:55] [02miraheze/puppet] 07Southparkfan 03eb04c8b - Icinga: use LE as CA for database connection [13:50:29] .in 15mins try to access icinga [13:50:31] RhinosF1: Okay, will remind at 2020-05-30 - 15:05:30BST [13:50:44] Hi Sario, you just missed a 2.5 hour outage [13:51:32] PROBLEM - cp6 Stunnel Http for mon1 on cp6 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.007 second response time [13:51:47] SPF|Cloud: ^ [13:51:55] seems fine with me [13:52:39] RhinosF1: I saw the logs [13:52:56] SPF|Cloud: ok [13:52:59] Sario: ah [13:53:10] Giod job everyone getting things back up [13:53:21] s/Giod/Good [13:53:21] Sario meant to say: Good job everyone getting things back up [13:53:22] PROBLEM - mon1 HTTPS on mon1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.004 second response time [13:53:28] PROBLEM - cp8 Stunnel Http for mon1 on cp8 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.237 second response time [13:53:40] PROBLEM - cp7 Stunnel Http for mon1 on cp7 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.002 second response time [13:53:42] Why’s things still failing [13:53:42] PROBLEM - mon1 grafana.miraheze.org HTTPS on mon1 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.004 second response time [13:53:47] RhinosF1: which wikis did you create? [13:53:49] SPF|Cloud: that can’t be good [13:53:55] SPF|Cloud: just the one I linked [13:53:57] PROBLEM - cp3 Stunnel Http for mon1 on cp3 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 311 bytes in 0.760 second response time [13:54:09] https://fswahlen.miraheze.org/wiki/Hauptseite [13:54:09] [ Hauptseite – Foodsharing - AG Wahlen ] - fswahlen.miraheze.org [13:54:11] !s [13:54:15] Please wait while I check the status of Miraheze Services. [13:54:16] RhinosF1: Status report finished. There are currently 1 dead services and 5 alive services. To view the full report, say !status. [13:54:35] PROBLEM - mon1 Puppet on mon1 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[php7.3-apcu] [13:55:28] !log ran MatomoAnalytics::addSite( 'fswahlenwiki' ); [13:55:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [13:56:19] SPF|Cloud: needs createMainPage and local rights checking the closing the request via sql [14:01:18] !log manually promoted requested for fswahlenwiki to sysop/bureaucrat, created main page [14:01:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [14:01:37] SPF|Cloud: and close request via sql [14:01:53] and adding a log record [14:02:22] RECOVERY - mon1 Puppet on mon1 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [14:03:13] SPF|Cloud: you can add to the farmer log via sql sanely? [14:03:45] Do we normally do that when stuff are weird [14:05:30] RhinosF1: try to access icinga [14:06:32] SPF|Cloud: icinga login still down [14:06:39] Ldap? 
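The MatomoAnalytics::addSite( 'fswahlenwiki' ) call logged at 13:55 is the kind of one-off that MediaWiki's interactive eval.php maintenance script is used for. A hedged sketch of the invocation; the install path matches the stack trace earlier in the log, but the wiki selected and the user the script runs as are assumptions:

    # Start the REPL against the target wiki (flags and values assumed):
    sudo -u www-data php /srv/mediawiki/w/maintenance/eval.php --wiki=metawiki
    # Then paste the call at the prompt and exit:
    #   > MatomoAnalytics::addSite( 'fswahlenwiki' );
    #   > exit

As the 14:07 entries below note, WikiManager's logEntry() could not be replayed the same way from eval.php because the request context is wrong, so that piece of the failed creation could not be backfilled this way.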
[14:06:48] I'm doings thousands of things at the same time [14:06:59] paladox ^ your area [14:07:04] I can see [14:07:21] !log ran the following due to partially failed creation: $wm = new WikiManager( 'fswahlenwiki' ); $wm->notificationsTrigger( 'creation', 'fswahlenwiki', [ 'siteName' => 'Foodsharing - AG Wahlen' ], 'Heinz' ); [14:07:25] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [14:07:59] !log manually inserting a log entry for fswahlen does not seem to be possible via logEntry() in WikiManager when executed from eval.php, due to wrong context [14:08:02] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [14:15:23] Hi Voidwalker [14:16:28] hey [15:07:40] RECOVERY - cp6 Stunnel Http for mon1 on cp6 is OK: HTTP OK: HTTP/1.1 200 OK - 30244 bytes in 0.050 second response time [15:08:15] [02miraheze/puppet] 07paladox pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKbk [15:08:17] [02miraheze/puppet] 07paladox 03df17737 - grafana: Temporarily change server_cert_name to db6.miraheze.org [15:08:21] RECOVERY - cp7 Stunnel Http for mon1 on cp7 is OK: HTTP OK: HTTP/1.1 200 OK - 30249 bytes in 0.015 second response time [15:08:35] RECOVERY - cp8 Stunnel Http for mon1 on cp8 is OK: HTTP OK: HTTP/1.1 200 OK - 30249 bytes in 0.338 second response time [15:09:04] RECOVERY - mon1 HTTPS on mon1 is OK: HTTP OK: HTTP/1.1 200 OK - 30280 bytes in 0.011 second response time [15:09:16] RECOVERY - cp3 Stunnel Http for mon1 on cp3 is OK: HTTP OK: HTTP/1.1 200 OK - 30280 bytes in 1.014 second response time [15:09:20] RECOVERY - mon1 grafana.miraheze.org HTTPS on mon1 is OK: HTTP OK: HTTP/1.1 200 OK - 30280 bytes in 0.013 second response time [15:10:14] PROBLEM - mon1 Puppet on mon1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 8 minutes ago with 0 failures [15:15:15] * hispano76 greetings [15:26:29] hi hispano76 [15:54:33] Hello RazorDick! If you have any questions, feel free to ask and someone should answer soon. [15:55:01] Hello longdong! If you have any questions, feel free to ask and someone should answer soon. [18:08:07] Hello, who's more familiar with the abuse filters? [18:08:40] Hey, what you exactly need? [18:13:06] [02mw-config] 07The-Voidwalker opened pull request 03#3091: cvt -> global sysop - 13https://git.io/JfKh7 [18:13:16] Reception123, ^ [18:14:55] Voidwalker: thanks! [18:15:30] [02mw-config] 07Reception123 closed pull request 03#3091: cvt -> global sysop - 13https://git.io/JfKh7 [18:15:32] [02miraheze/mw-config] 07Reception123 pushed 031 commit to 03master [+0/-0/±1] 13https://git.io/JfKhA [18:15:33] [02miraheze/mw-config] 07The-Voidwalker 03174546c - cvt -> global sysop (#3091) [18:18:51] Well, I'm seeing some filters from other Wikimedia wikis but as I don't know if they would work as I wish. Which one do you recommend? 
The filter would be to prevent ips from publishing the word "Diabetes" on discussion pages that use Flow (for CommonsWiki) [18:18:55] Excuse me [18:20:07] For example https://es.wikivoyage.org/wiki/Especial:FiltroAntiAbusos/9 I'm not sure about using this [18:20:08] [ Editar el filtro antiabusos - Wikiviajes ] - es.wikivoyage.org [18:28:00] Reception123: I said that [18:32:53] RhinosF1: oh, must've missed it then [18:32:58] I'm realizing strange about the translation :) [18:33:08] I mean I knew it wasn't in the local group but I didn't realize it's because it had to be changed in LS since MW prohibits that [18:35:30] Reception123: since when would centralauth-lock be in MW? [18:35:41] That’d be mayhem [18:36:04] RhinosF1: yeah didn't think about it for long enough :P [18:36:37] Reception123: I think we all need a holiday [18:37:04] RhinosF1: I agree [18:38:01] Reception123: if you’re one of the few of us that drink, you deserve one [18:38:32] RhinosF1: I don't really, you can give me a soda if you want :P [18:38:53] though every volunteer here deserves a drink [18:39:04] Reception123: that’s nearly all our volunteers [18:39:23] It’d be a cheap night out [18:41:04] yup :) [18:43:17] At least I’ll come out with a bigger wallet than with anyone else [18:45:05] RhinosF1: well make a volunteer welfare fund and we each contribute to it ;) [18:45:33] :) [20:32:46] RECOVERY - db7 Puppet on db7 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:36:45] [02miraheze/ssl] 07paladox pushed 031 commit to 03paladox-patch-1 [+0/-0/±1] 13https://git.io/Jf6JW [20:36:47] [02miraheze/ssl] 07paladox 039139520 - Update certificate [20:36:48] PROBLEM - db7 Puppet on db7 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 4 minutes ago with 0 failures [20:36:48] [02ssl] 07paladox created branch 03paladox-patch-1 - 13https://git.io/vxP9L [20:36:50] [02ssl] 07paladox opened pull request 03#313: Update certificate - 13https://git.io/Jf6Jl [20:37:50] paladox: make sure to document on incident report [20:37:55] And did you fix ldap [20:38:19] Icinga login isn’t working [20:39:13] nope [20:39:36] Ok [20:44:16] PROBLEM - ldap1 Puppet on ldap1 is WARNING: WARNING: Puppet is currently disabled, message: paladox, last run 3 minutes ago with 0 failures [21:05:10] PROBLEM - cp7 Current Load on cp7 is CRITICAL: CRITICAL - load average: 3.87, 4.11, 2.60 [21:07:10] RECOVERY - cp7 Current Load on cp7 is OK: OK - load average: 1.81, 3.23, 2.46 [21:52:19] .status login around [21:52:20] Examknow: Updating User:Examknow/Status to around! [21:52:28] Examknow: Updated! [22:08:49] hello everyone [22:10:14] Hello EK [22:25:25] [02miraheze/ssl] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/Jf6To [22:25:27] [02miraheze/ssl] 07paladox 039b72e79 - Update Sectigo ca & wildcard.miraheze.org This drops AddTrust External Root Certificate. 
Root certificates got from https://support.sectigo.com/articles/Knowledge/Sectigo-Intermediate-Certificates?retURL=%2Fapex%2FCom_KnowledgeWeb2Casepagesectigo&popup=false [22:25:27] [ Comodo Knowledge Base ] - support.sectigo.com [22:25:28] [02ssl] 07paladox created branch 03paladox-patch-2 - 13https://git.io/vxP9L [22:25:30] [02ssl] 07paladox opened pull request 03#314: Update Sectigo ca & wildcard.miraheze.org - 13https://git.io/Jf6TK [22:25:59] [02miraheze/ssl] 07paladox pushed 031 commit to 03paladox-patch-2 [+0/-0/±1] 13https://git.io/Jf6T6 [22:26:00] [02miraheze/ssl] 07paladox 034532720 - Update wildcard.miraheze.org.crt [22:26:02] [02ssl] 07paladox synchronize pull request 03#314: Update Sectigo ca & wildcard.miraheze.org - 13https://git.io/Jf6TK [22:31:53] [02ssl] 07paladox closed pull request 03#314: Update Sectigo ca & wildcard.miraheze.org - 13https://git.io/Jf6TK [22:31:54] [02miraheze/ssl] 07paladox pushed 031 commit to 03master [+0/-0/±2] 13https://git.io/Jf6Ty [22:31:56] [02miraheze/ssl] 07paladox 03f10ccea - Update Sectigo ca & wildcard.miraheze.org (#314) * Update Sectigo ca & wildcard.miraheze.org This drops AddTrust External Root Certificate. Root certificates got from https://support.sectigo.com/articles/Knowledge/Sectigo-Intermediate-Certificates?retURL=%2Fapex%2FCom_KnowledgeWeb2Casepagesectigo&popup=false * Update wildcard.miraheze.org.crt [22:31:56] [ Comodo Knowledge Base ] - support.sectigo.com [22:31:57] [02miraheze/ssl] 07paladox deleted branch 03paladox-patch-2 [22:31:59] [02ssl] 07paladox deleted branch 03paladox-patch-2 - 13https://git.io/vxP9L [22:32:25] paladox: what timezone are you in? [22:32:28] just curious [22:32:32] UK [22:32:53] k [22:35:50] !log nginx reload on cp7 - does not automatically reload on wildcard.miraheze.org.crt changes [22:35:53] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [22:38:59] !log nginx reload on cp[368] - does not automatically reload on wildcard.miraheze.org.crt changes [22:39:03] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [22:41:54] RECOVERY - db7 Puppet on db7 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:49:40] !log nginx reload on m[4567], jobrunner1 & test2 - does not automatically reload on wildcard.miraheze.org.crt changes [22:49:44] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [22:49:47] !log restart postfix on misc1 [22:49:50] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [22:50:30] !log restart dovecot on misc1 [22:50:34] Logged the message at https://meta.miraheze.org/wiki/Tech:Server_admin_log, Master [23:05:38] RECOVERY - ldap1 Puppet on ldap1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
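The #314 commit above rebuilds the wildcard.miraheze.org bundle from Sectigo's current intermediates and drops the expired AddTrust External CA Root, and the reloads that follow are needed because nginx only reads certificate files at startup or reload. A quick hedged check of what an edge host serves afterwards; the hostname used here is just an example:

    # Print subject/issuer for every certificate in the served chain:
    openssl s_client -connect meta.miraheze.org:443 -servername meta.miraheze.org -showcerts </dev/null 2>/dev/null \
      | grep -E '^[ 0-9]*[si]:'
    # "AddTrust External CA Root" should no longer appear anywhere in that output.

    # Pick up the replaced wildcard.miraheze.org.crt (nginx will not do this on its own,
    # hence the reloads logged above):
    sudo nginx -t && sudo systemctl reload nginx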