[01:04:05] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.214 second response time
[01:07:47] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:18:43] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.390 second response time
[01:22:29] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:43:19] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 8.436 second response time
[02:47:01] PROBLEM - pdfrender on scb1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:05:48] (PS1) Tulsi Bhagat: Enable 'extendedmover' user group at en.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/481538
[03:14:16] (PS2) Tulsi Bhagat: Enable 'extendedmover' user group at en.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/481538 (https://phabricator.wikimedia.org/T212662)
[03:31:23] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 933.63 seconds
[04:16:27] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 270.63 seconds
[06:28:03] PROBLEM - netbox HTTPS on netmon1002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 547 bytes in 0.011 second response time
[06:29:11] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:30:19] PROBLEM - Check systemd state on netmon2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:33:03] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ImageMagick-6/policy.xml]
[06:37:41] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[06:38:49] RECOVERY - Check systemd state on netmon2001 is OK: OK - running: The system is fully operational
[06:41:19] PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:59:05] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:38:31] RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational
[07:38:33] RECOVERY - netbox HTTPS on netmon1002 is OK: HTTP OK: HTTP/1.1 302 Found - 348 bytes in 0.555 second response time
[08:05:37] (PS1) Fomafix: Add 'sgs' as alias for 'bat-smg' [dns] - https://gerrit.wikimedia.org/r/481539 (https://phabricator.wikimedia.org/T204830)
[08:05:59] (PS1) Fomafix: Add redirects from 'sgs' to 'bat-smg' [puppet] - https://gerrit.wikimedia.org/r/481540 (https://phabricator.wikimedia.org/T204830)
[09:17:25] Operations, Wikimedia-Mailing-lists: Update Wikimedia logo on Mailman web pages from colored version to black and white version - https://phabricator.wikimedia.org/T212674 (jrbs) The black and white logo is just for the Foundation. The coloured logo is still used for community-related activity, which I w...
[09:21:37] !log restart pdfrender on scb1004
[09:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:03] RECOVERY - pdfrender on scb1004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[11:43:39] !log restarted pdfrender on scb1004
[11:43:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:40] !log manually recreated /dev/log symlink on kubernetes1001, restarting systemd-journald.socket didn't work (this should fix cron-spam emails from the host every hour)
[11:52:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:52] Operations: /dev/log symlink to /run/systemd/journal/dev-log disappeared on kubernetes1001 - https://phabricator.wikimedia.org/T212681 (Volans)
[12:03:06] Operations: /dev/log symlink to /run/systemd/journal/dev-log disappeared on kubernetes1001 - https://phabricator.wikimedia.org/T212681 (Volans) p:Triage→Normal
[12:03:20] I've created this ^^^ for the kubernetes1001 /dev/log issue
[12:31:38] Heads up, deploying a config change
[12:34:29] (Abandoned) MarcoAurelio: Revert "Temporary remove AbuseFilter autoshutoff for mediawikiwiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/481529 (owner: MarcoAurelio)
[12:34:57] !log bawolff@deploy1001 Synchronized private/PrivateSettings.php: T212667 - make spam mitigation global (duration: 00m 49s)
[12:35:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:35:01] T212667: Emergency measure: Set wgAccountCreationThrottle => 2 - https://phabricator.wikimedia.org/T212667
[12:43:04] (PS1) MarcoAurelio: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667)
[12:45:25] (PS2) MarcoAurelio: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667)
[12:46:05] (CR) MarcoAurelio: Amend mediawiki AbuseFilter configuration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[12:46:57] (PS3) MarcoAurelio: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667)
[13:30:23] !log bawolff@deploy1001 Synchronized private/PrivateSettings.php: T212667 - adjust spam block (duration: 00m 44s)
[13:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:30:27] T212667: Emergency measure: Set wgAccountCreationThrottle => 2 - https://phabricator.wikimedia.org/T212667
[13:32:16] frrwiki has some weird exceptions " [XCd0OgpAMEwAAGNets4AAAAK] /w/load.php?debug=false&lang=frr&modules=startup&only=scripts&skin=minerva&target=mobile Exception from line 1535 of /srv/mediawiki/php-1.33.0-wmf.9/includes/resourceloader/ResourceLoader.php: JSON serialization of config data failed. This usually means the config data is not valid UTF-8."
[13:32:29] never seen that error before, but it doesn't sound good
[13:34:20] Looks like it's happening at a rate of about 1 every 5 minutes
[13:34:42] also at commons when frr language is set
[13:35:05] Maybe something invalid in a language file
[13:36:33] (CR) Daimona Eaytoy: Amend mediawiki AbuseFilter configuration (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[13:44:21] Oh, from ULS apparently
[13:45:54] Hey bawolff, Hauskatze! I came across T212667, how's that going?
[13:45:54] T212667: Emergency measure: Set wgAccountCreationThrottle => 2 - https://phabricator.wikimedia.org/T212667
[13:46:58] Daimona: well, it's "on"going
[13:47:15] perhaps we should find a quieter place to discuss given that this is a public channel
[13:47:25] btw leaving for lunch, bbl
[13:47:41] Definitely yes, although the task is public
[13:48:03] that one ;)
[14:18:17] (CR) MarcoAurelio: Amend mediawiki AbuseFilter configuration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:18:51] (PS1) Brian Wolff: Temporary increase account creation [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667)
[14:19:50] (CR) jerkins-bot: [V: -1] Temporary increase account creation [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667) (owner: Brian Wolff)
[14:19:58] whoops
[14:20:58] phpcs will be the death of me
[14:21:42] (PS2) Brian Wolff: Temporary increase account creation [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667)
[14:22:48] (PS3) Brian Wolff: Temporary make account creation limits more restrictive [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667)
[14:22:55] Third time's the charm
[14:26:09] (CR) Rxy: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667) (owner: Brian Wolff)
[14:26:24] (CR) Brian Wolff: [C: +2] Temporary make account creation limits more restrictive [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667) (owner: Brian Wolff)
[14:27:34] (Merged) jenkins-bot: Temporary make account creation limits more restrictive [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667) (owner: Brian Wolff)
[14:30:23] !log bawolff@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T212667 fe72284c Adjust account throttle limits (duration: 00m 46s)
[14:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:27] T212667: Create mitigations for account creation spam attack [public task] - https://phabricator.wikimedia.org/T212667
[14:31:31] [{exception_id}] {exception_url} DomainException from line 353 of /srv/mediawiki/php-1.33.0-wmf.9/vendor/firebase/php-jwt/src/JWT.php: Unknown JSON error: 5 <-- Huh. I wonder if that's oauth related
[14:31:38] (PS4) MarcoAurelio: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667)
[14:31:47] (PS5) MarcoAurelio: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667)
[14:32:00] Hauskatze: Do you want to try the mass deletion on the 2nd?
[14:32:48] marostegui: we could, but if you can give me a minute or so, I am in the middle of something :)
[14:32:59] Daimona: does the patch look okay to you now?
[14:33:07] I'm mirroring Meta settings
[14:33:07] Hauskatze: Sure, no rush of course!
[14:38:44] (CR) jenkins-bot: Temporary make account creation limits more restrictive [mediawiki-config] - https://gerrit.wikimedia.org/r/481546 (https://phabricator.wikimedia.org/T212667) (owner: Brian Wolff)
[14:47:30] (CR) MarcoAurelio: Amend mediawiki AbuseFilter configuration (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:49:20] (CR) Rxy: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:50:18] (CR) Brian Wolff: [C: +2] Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:51:21] (Merged) jenkins-bot: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:52:04] (CR) jenkins-bot: Amend mediawiki AbuseFilter configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/481543 (https://phabricator.wikimedia.org/T212667) (owner: MarcoAurelio)
[14:53:50] !log bawolff@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T212667 218371fd35 - Adjust mw.org abusefilter emergency shutoff threshold down to 0.3 (duration: 00m 46s)
[14:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:54] T212667: Create mitigations for account creation spam attack [public task] - https://phabricator.wikimedia.org/T212667
[18:23:05] https://phabricator.wikimedia.org/T200820
[18:23:23] could someone look at this please? ^
[18:23:38] I am waiting for 2 weeks to upload the files
[18:24:10] and also https://phabricator.wikimedia.org/T212101
[19:28:00] Operations, Wikidata: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (abian)
[19:34:17] Operations, MediaWiki-Database, Wikidata: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (Marostegui)
[19:36:32] Operations, MediaWiki-Database, Wikidata: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (Marostegui) Does it happen all the time or was it a punctual error?
[19:41:52] Operations, MediaWiki-Database, Wikidata: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (abian) It happens every time I leave the pattern field blank, even when the maximum number of pages is, for example, 5.
[20:01:05] Operations, MediaWiki-Database, Wikidata, Wikimedia-production-error: DBQueryTimeoutError on Wikidata's Special:Nuke - https://phabricator.wikimedia.org/T212690 (Peachey88)
[20:53:33] PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100%
[20:54:35] RECOVERY - Host backup2001 is UP: PING OK - Packet loss = 0%, RTA = 36.17 ms
[22:05:53] Operations, CheckUser, MediaWiki-Database, Wikimedia-production-error: DBQueryTimeoutError on Wikimedia Login's Special:CheckUser - https://phabricator.wikimedia.org/T212692 (alanajjar)
[22:21:47] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received
[22:25:19] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy