[00:01:40] Be advised: Jimbo's account is compromised; suggest read-only mode? [00:02:34] Thanks foks and Steven [00:02:43] hopefully this won't become a media story [00:02:45] -_- [00:03:32] As a note, Ourmine's classic operation is that they give you your account back after you email/talk to them [00:04:54] Zppix: hmmmm ? [00:06:06] Dereckson: Jimbo Wales's account on enwiki was hacked [00:06:25] assuming it's also globally linked, so ya [00:06:38] Zppix: that's something to solve at the steward level on #wikimedia-stewards [00:07:24] Zppix: Jimbo Wales doesn't have access to the servers according to modules/admin/data/data.yaml [00:27:29] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:27:39] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:28:29] PROBLEM - cxserver endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:29:29] RECOVERY - cxserver endpoints health on scb1003 is OK: All endpoints are healthy [00:30:29] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [00:42:59] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [00:51:39] Hi [00:51:55] hi [00:51:59] What's the public-facing statement re the various claims flying around? [00:52:06] what claims?
[00:52:19] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 860535 msg (=800000 warning): ocg_render_job_queue 3042 msg (=3000 critical) [00:52:19] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 860539 msg (=800000 warning): ocg_render_job_queue 3041 msg (=3000 critical) [00:52:19] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 860546 msg (=800000 warning): ocg_render_job_queue 3048 msg (=3000 critical) [00:52:52] I'm hearing that there's been a possible compromise of a small number of senior accounts on English Wikipedia, one of which was used to vandalise Trump's BLP [00:53:24] the affected accounts have been locked [00:54:32] Okay... My understanding was that, as far as is known, this wasn't a wider issue at present [00:55:13] ShakespeareFan00: No one can really comment at the moment. It's still under investigation [00:55:20] I'm sure a statement will be made at an appropriate time [00:55:33] It would be nice, however, if someone posted an interim statement in -en, if only to calm things down... [00:55:48] But I fully understand, there might not be much that can be said right now [00:55:51] * ShakespeareFan00 out [00:56:08] to add to the above, it would also be nice to know what all was compromised [00:57:39] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [00:57:45] just sit back and relax, at the end of the day, accounts can be locked, edits reverted and hidden if needed.
[00:58:35] And if you know you've got a crappy password, why not fix that, and make sure it's unique for Wikipedia [01:01:10] we've already passed along that message to the good citizens of -en [01:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [01:07:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [01:22:19] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [02:01:26] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789686 (matmarex) I tried to bisect between IM 6.8.3-1... [02:02:00] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [02:03:19] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [02:03:59] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:05:29] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 731 MB (2% inode=99%) [02:06:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:06:29] RECOVERY - Disk space on ocg1002 is OK: DISK OK [02:06:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [02:09:19] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [02:10:59] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold
[250.0] [02:11:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [02:17:11] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.2) (duration: 05m 30s) [02:17:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:21:46] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Nov 12 02:21:40 UTC 2016 (duration 4m 29s) [02:21:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:22:37] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789719 (matmarex) Looks like this is a part of a chain... [02:56:59] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:02:52] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789727 (matmarex) IM 7 seems to have diverged from IM... [03:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [03:06:04] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789728 (matmarex) For future reference, because there... [03:06:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [03:18:29] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [03:23:19] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 610.75 seconds [03:24:59] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [03:28:19] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 621.94 seconds [03:44:19] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 296.94 seconds [03:46:29] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [04:11:29] PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 288 bytes in 0.011 second response time [04:40:19] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:59:00] Operations, Traffic: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561#2789754 (BBlack) Just to double-check things, I've also confirmed that by adding extra copies of the intermediate cert, I can induce the extra RTT without using stapling at all. During these tests, the range of... [05:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [05:08:19] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [05:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [05:32:59] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [05:59:59] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [06:06:14] (CR) BryanDavis: "This does not work. The dynamic mapping will only be picked up if the field name is unknown in the mapping. Put another way, this would wo" [puppet] - https://gerrit.wikimedia.org/r/320441 (https://phabricator.wikimedia.org/T150106) (owner: BryanDavis) [06:55:19] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:09:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [07:10:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [07:13:20] (PS1) Ladsgroup: Ban ten monst popular passwords from fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) [07:15:09] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:19:15] (PS2) Ladsgroup: Ban ten most popular passwords from fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) [07:25:19] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [07:44:09] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [07:54:13] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789845 (matmarex) By the way, looking for alternatives...
[08:07:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [08:08:03] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789846 (matmarex) @MoritzMuehlenhoff Here's the best p... [08:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [08:09:20] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789847 (matmarex) Extended table from T141739#2785991,... [08:12:22] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789848 (matmarex) We should probably report this upstr...
[08:24:02] Operations, Commons, Multimedia: Deploy some fixed version of ImageMagick from apt.wikimedia.org - https://phabricator.wikimedia.org/T150432#2789850 (matmarex) [08:24:55] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789853 (matmarex) [09:00:11] (PS1) Ladsgroup: ores: Send logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) [09:01:16] (CR) jenkins-bot: [V: -1] ores: Send logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: Ladsgroup) [09:04:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [09:05:11] (PS2) Ladsgroup: ores: Send logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) [09:05:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [09:08:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [09:09:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [09:13:39] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [09:14:29] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3052795 keys, up 12 days 52 minutes - replication_delay is 0 [09:32:29] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [09:38:39] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=1360.30 Read Requests/Sec=1168.10 Write Requests/Sec=0.20 KBytes Read/Sec=43984.00 KBytes_Written/Sec=10.00 [09:53:39] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=201.50 Read Requests/Sec=133.20 Write Requests/Sec=119.80 KBytes Read/Sec=5420.80 KBytes_Written/Sec=769.20 [10:00:29] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:06:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [10:07:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [10:25:29] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [10:26:29] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3055314 keys, up 12 days 2 hours - replication_delay is 0 [10:46:20] (CR) Luke081515: [C: 1] Ban ten most popular passwords from fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) (owner: Ladsgroup) [10:49:49] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [10:49:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [10:50:29] RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.019 second response time [10:51:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [10:52:19] PROBLEM
- Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [10:57:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [10:58:49] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [11:13:39] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=5008.80 Read Requests/Sec=1820.18 Write Requests/Sec=0.20 KBytes Read/Sec=33192.81 KBytes_Written/Sec=2.00 [11:25:39] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=159.90 Read Requests/Sec=134.60 Write Requests/Sec=0.30 KBytes Read/Sec=3795.20 KBytes_Written/Sec=18.00 [11:26:32] Operations, Commons, MediaWiki-File-management, Multimedia: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2789998 (matmarex) Filed upstream as: https://github.co... [11:57:09] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:02:02] (PS1) Arseny1992: Enable RevisionSlider (non betafeature) on hewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321103 (https://phabricator.wikimedia.org/T150573) [12:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [12:07:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [12:25:09] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [12:55:19] PROBLEM - MD RAID on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[12:56:09] RECOVERY - MD RAID on thumbor1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [13:05:03] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2790038 (Josve05a) [13:07:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [13:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [13:27:55] Hi, any staff online? [13:28:16] someone fix https://wikimediafoundation.org/wiki/Home [13:28:22] and block legoktm [13:30:17] robh: [13:30:19] hi [13:34:09] Looking, Shanmugamp7 [13:34:22] thanks Krenair [13:35:31] crap, hang on [13:36:35] I have locked the global account [13:37:39] !log `mwscript createAndPromote.php foundationwiki --sysop "Alex Monk (WMF)" --force` temporarily [13:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:14] again :/ [13:42:29] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:43:29] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [13:44:12] Thanks Krenair, not sure how many got hacked; is anyone investigating? [13:50:29] PROBLEM - mobileapps endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:52:29] RECOVERY - mobileapps endpoints health on scb1003 is OK: All endpoints are healthy [13:54:19] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:57:49] PROBLEM - puppet last run on ms-fe1004 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [14:02:52] "About me : I write bots and work on stuff. See also User:Legoktm (WMF)." [14:02:53] Krenair ^ [14:03:12] what is that from? [14:03:31] his user page [14:03:49] though I'm not sure what arseny92 thinks is wrong with it? [14:04:04] [15:44] Thanks Krenair , not sure how many got hacked, is anyone investigating ? [14:04:06] this [14:04:33] ask wmf communications [14:04:40] where? [14:04:44] history shows the page is ok [14:05:13] yes, contribs say he doesn't use his work account tbh [14:06:34] however, in case that also got compromised, better to lock everything until he regains control [14:06:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [14:06:59] arseny92: this looks like the same problem that happened last night [14:08:33] what happened last night? I wasn't around [14:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [14:09:26] you mean the other foundationwiki user as seen on recentchanges? [14:22:27] #CHATT [14:22:29] asdasd [14:23:19] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [14:23:19] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:25:49] RECOVERY - puppet last run on ms-fe1004 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [14:26:12] !log Created OATHAuth tables on all fishbowl wikis [14:26:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:01] Reedy, so do we just set wmgOATHAuthDisableRight default=false as part of T150577? Or invert the variable?
[14:30:02] T150577: Enable OATHAuth for all users - https://phabricator.wikimedia.org/T150577 [14:30:24] arseny92: Don't make a patch for it please [14:30:42] (PS1) Reedy: Enable OATHAuth on fishbowl wikis. Bump password requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/321108 [14:31:01] (CR) Reedy: [C: 2] Enable OATHAuth on fishbowl wikis. Bump password requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/321108 (owner: Reedy) [14:31:45] (Merged) jenkins-bot: Enable OATHAuth on fishbowl wikis. Bump password requirements [mediawiki-config] - https://gerrit.wikimedia.org/r/321108 (owner: Reedy) [14:32:00] (CR) Ladsgroup: "It seems ores in beta cluster is not happy about it. Let me test it thoroughly." [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: Ladsgroup) [14:33:11] Reedy: as you wish, just wanted to know how we'll deal with it [14:33:27] arseny92: That would likely be the correct answer though, yes :) [14:33:39] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Enable OATHAuth on fishbowl wikis, bump password requirements (duration: 00m 50s) [14:33:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:05] it looks weird tho, [14:34:26] recursive bool [14:34:57] I guess it would be better to invert it [14:35:22] well, at some point, we just remove it [14:35:32] because it serves no purpose [14:35:38] it will be available to everyone [14:37:44] I mean set the right definition to true, but set the var in IS to default false and true for the current groups; then, when we enable it for all, just set default true and remove all custom config [14:39:00] CS ln3146 [14:39:03] if ( $wmgOATHAuthDisableRight ) { [14:39:03] $wgGroupPermissions['*']['oathauth-enable'] = false; [14:39:03] } [14:40:46] Reedy: one question, why is 2FA not enabled for admins, CUs, etc.? [14:40:51] what is the blocker?
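[editor's sketch] The inversion arseny92 proposes at 14:37:44 might look roughly like the fragment below. It mirrors the CommonSettings.php lines quoted at 14:39 with the flag's sense flipped. The name `$wmgOATHAuthEnableRight` is an assumption made here for illustration (the deployed variable is `$wmgOATHAuthDisableRight`), and the literal assignment stands in for a value that would normally come from InitialiseSettings.php.

```php
<?php
// Hypothetical: $wmgOATHAuthEnableRight is an assumed name for the inverted
// flag; it is not an existing wmf-config setting. In practice its value would
// be supplied per wiki by InitialiseSettings.php ('default' => false, with
// true overrides for the wikis and groups already rolled out).
$wmgOATHAuthEnableRight = false;

// Same effect as the quoted block, without the "disable = true" double
// negative: the right is withheld from '*' only while the flag is off.
if ( !$wmgOATHAuthEnableRight ) {
	$wgGroupPermissions['*']['oathauth-enable'] = false;
}
```

Under this naming, the eventual global rollout would amount to flipping the default to true and then deleting the variable, which matches Reedy's point at 14:35 that the flag eventually serves no purpose.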
[14:41:05] Amir1: testing, crappy UI, missing features etc [14:41:47] I completely agree on the crappy UI. I hope I can help out. [14:42:05] Amir1: There are some LTR/RTL issues I'd love some help on (not now obviously) [14:42:19] On a site as big as Wikimedia, I am surprised it never had it [14:42:27] honestly... [14:42:33] yeah, I actually asked about that bug and got some ideas on how to solve it [14:44:44] ^ to rename the var to EnableRight, set true, but override in IS with 'default' => false and true for the individual current groups and the individual projects rollout. Then when we enable it for all, remove the var and put the right definition into UseOATHAuth [14:45:15] arseny92: That's the least of our worries [14:46:05] but don't you agree that the recursive bool is quite weird from a manageability point of view? [14:46:43] Yes, and we have many of them in MW [14:51:19] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [14:53:31] (PS1) Reedy: Enable OATHAuth for sysop, 'crat, oversight and checkuser [mediawiki-config] - https://gerrit.wikimedia.org/r/321113 [14:54:28] (CR) Reedy: [C: 2] Enable OATHAuth for sysop, 'crat, oversight and checkuser [mediawiki-config] - https://gerrit.wikimedia.org/r/321113 (owner: Reedy) [14:55:16] (Merged) jenkins-bot: Enable OATHAuth for sysop, 'crat, oversight and checkuser [mediawiki-config] - https://gerrit.wikimedia.org/r/321113 (owner: Reedy) [14:55:42] Steward is included in that, correct?^ [14:56:23] Bsadowski1: Nope [14:56:36] Dang. [14:56:41] Hey Reedy. [14:56:42] Bsadowski1: We should do that by GlobalGroup things on meta [14:56:55] Indeed. [14:56:57] Either someone else can, or I can with my staff account [14:57:17] I can [14:57:20] Please do [14:57:43] Bsadowski1: And any other similar elevated group [14:57:52] Which right? [14:58:02] oathauth-enable [14:58:13] Please do sys admin, Founders...
Global sysop etc etc [14:58:44] What should the reason be? [14:58:54] "people have shit passwords" [14:58:58] :-D [14:59:15] please provide a link here to the change so I can add it to phab? [14:59:26] !log reedy@tin Synchronized wmf-config/CommonSettings.php: Enable OATHAuth for all sysop, crat, oversight and checkuser (duration: 00m 47s) [14:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:32] Bsadowski1: Put per my request. Or by WMF Security Team request [14:59:42] 321113 suffers from the exact config problem I was trying to describe above [15:00:27] Reedy: I'll make a post on WP:AN [15:00:42] arseny92: as I said, I don't care [15:00:45] We can clean up later [15:01:02] If it has the desired effect, that WFM [15:01:05] because the dbgroups are false, if you have rights on wikis in those groups, you don't have oath [15:01:41] Enable it on WMF CA wiki, it's on them all [15:04:51] Bsadowski1: Can you post to the stewards list too to encourage them to 2FA their accounts? [15:17:41] Bsadowski1: how is that going? [15:18:03] Oh [15:18:08] Righto... [15:21:09] PROBLEM - puppet last run on tungsten is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:14] er, so, Bsadowski1, did that happen and can I link to something in a phab comment that I will write? :-) [15:38:23] Done [15:38:27] not nagging, just have itchy fingers [15:38:45] if there's a log entry I can get to, say [15:38:59] otherwise just tell me wiki and groups and I'll paste it in [15:39:03] also, thanks!! [15:40:46] Did it for: stewards, founder, and sysadmin [15:41:09] interface editors and global sysops? [15:41:51] Global sysops done as well [15:42:04] is this across all wikis? [15:42:15] I did it for the global group [15:42:15] or...? [15:42:19] ok [15:42:27] * apergos goes to update the ticket [15:47:36] done [15:47:44] Which ticket? [15:48:38] Oh https://phabricator.wikimedia.org/T150577 ?
[15:49:06] private (sorry) [15:49:22] it's got IPs in there for example [15:49:31] Ah okay :) [15:49:46] otherwise I woulda nagged you to add it yerself [15:49:46] No problem. [15:50:09] RECOVERY - puppet last run on tungsten is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [15:52:37] I did global interface editor also [15:52:40] Bsadowski1: What other global groups still need it? [15:52:44] Krenair: ^ [15:52:46] Bsadowski1: Thanks [15:53:14] Omb [15:53:29] ombudsman [15:53:49] TBH, feel free to apply common sense here [15:53:52] Did that one. [15:53:56] Yep :) [15:54:12] Add it to any groups that benefit [16:00:17] Operations, Commons, Multimedia: Deploy some fixed version of ImageMagick from apt.wikimedia.org - https://phabricator.wikimedia.org/T150432#2790160 (Dereckson) So for the initial version, 6.8.3 has been identified by @matmarex as the last 6.x before the bug introduction. This version couldn't be th... [16:04:19] PROBLEM - puppet last run on db1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [16:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [16:09:01] (PS1) Reedy: Log all failed login attempts [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 [16:14:12] (CR) Alex Monk: Log all failed login attempts (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 (owner: Reedy) [16:32:19] RECOVERY - puppet last run on db1030 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:47:19] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [16:50:19] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:01:59] PROBLEM - puppet last run on francium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:04:31] Operations, Labs, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2790205 (madhuvishy) [17:04:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [17:05:31] (CR) Gergő Tisza: Log all failed login attempts (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 (owner: Reedy) [17:05:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [17:16:19] RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [17:18:19] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:28:59] RECOVERY - puppet last run on francium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [17:31:42] (CR) Brian Wolff: "Just as an aside, AuthManager does already log all failed login attempts (Although it doesn't record related context like IP)" [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 (owner: Reedy) [17:32:38] Reedy/Legoktm here? [17:32:45] I am, but heading out shortly [17:32:51] Lego is driving [17:32:53] Steinsplitter: What's up? [17:33:10] Reedy: Where do I log in with oath? At Special:Login? [17:33:19] PROBLEM - puppet last run on elastic1031 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [17:33:25] Steinsplitter: you log in as normal [17:33:31] If it's enabled, it'll ask you for an OTP [17:33:36] To enable it, visit Special:Preferences [17:33:50] Reedy: I enabled it already. I get "Incorrect password or confirmation code entered. Please try again. " [17:34:01] there is no field to enter a confirmation code [17:34:07] It appears on the next screen [17:34:21] Steinsplitter: username + password -> login [17:34:31] then the next screen appears [17:34:36] asking you for the token [17:35:20] oh, yes. error in pw. sorry for disturbing :( thanks! [17:35:30] :) [17:35:58] heh [17:44:01] (CR) Gergő Tisza: "> Just as an aside, AuthManager does already log all failed login" [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 (owner: Reedy) [17:47:05] (PS1) Madhuvishy: labstore: Make secondary backup script fail if already running [puppet] - https://gerrit.wikimedia.org/r/321117 (https://phabricator.wikimedia.org/T144633) [17:50:51] (PS2) Madhuvishy: labstore: Make secondary backup script fail if already running [puppet] - https://gerrit.wikimedia.org/r/321117 (https://phabricator.wikimedia.org/T144633) [17:51:00] (CR) Madhuvishy: [C: 2 V: 2] labstore: Make secondary backup script fail if already running [puppet] - https://gerrit.wikimedia.org/r/321117 (https://phabricator.wikimedia.org/T144633) (owner: Madhuvishy) [17:52:45] (CR) Brian Wolff: "Hmm, it was actually CentralAuth that logs it, but the log is not as useful - https://logstash.wikimedia.org/goto/c472cc7238a82c18dc68a3ae" [mediawiki-config] - https://gerrit.wikimedia.org/r/321114 (owner: Reedy) [17:54:48] Operations, Labs, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2790246 (madhuvishy) [18:01:19] RECOVERY - puppet last run on elastic1031 is OK: OK: Puppet is currently enabled, last run 18 seconds
ago with 0 failures [18:02:51] (PS1) Gergő Tisza: Log Throttler events [mediawiki-config] - https://gerrit.wikimedia.org/r/321118 [18:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [18:06:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [18:42:01] !log done with my shell-granted sysop flag on foundationwiki, have removed it [18:42:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:04:50] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [19:06:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [19:53:07] !log deployed patch for T150554 [19:53:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [20:06:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [20:16:59] Hello I need some assistance please. [20:17:29] hi Chad__, what do you need help with? [20:17:58] assistance is on -tech [20:18:14] -operations is serious stuff [20:18:16] I was treated very badly on the english wikipedia today by the Administrators and I need to get them fired. [20:18:48] Chad__: yeah, no. I just looked at your edits. [20:19:23] You see there were extremley rude to me. [20:19:40] I would like action taken against them [20:19:48] Operations is not the place to deal with it [20:19:50] they' [20:20:08] Ok well where do i report them then? 
[20:21:09] Follow https://en.wikipedia.org/wiki/Wikipedia:Dispute_resolution [20:48:59] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [20:49:39] PROBLEM - puppet last run on maps1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:52:48] Operations, MediaWiki-General-or-Unknown, Traffic: Failure to save recent changes - https://phabricator.wikimedia.org/T150503#2790359 (elukey) >>! In T150503#2788909, @Joe wrote: > Still, what is puzzling about apache is: > > # It returns a 503 and not a 400 as I would expect per the HTTP standard >... [21:06:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [21:07:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [21:11:55] (Draft2) XXN: Fixes for namespace definitions for some Romanian (ro) projects [mediawiki-config] - https://gerrit.wikimedia.org/r/321121 [21:15:17] (PS3) XXN: Fixes for namespace definitions for some Romanian (ro) projects. [mediawiki-config] - https://gerrit.wikimedia.org/r/321121 [21:17:39] RECOVERY - puppet last run on maps1004 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:33:10] Operations, MediaWiki-General-or-Unknown, Traffic: Failure to save recent changes - https://phabricator.wikimedia.org/T150503#2790391 (Marshallsumter) Just FYI: Changes smaller than about 6kB will be saved but larger ones trip the error message every time. 
[21:51:57] (CR) Brian Wolff: [C: 1] Log Throttler events [mediawiki-config] - https://gerrit.wikimedia.org/r/321118 (owner: Gergő Tisza) [22:00:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [22:01:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:03:29] (PS1) Brian Wolff: Allow 2FA for the abusefilter group if enabled on wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 [22:06:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [22:08:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:53:19] RECOVERY - OCG health on ocg1003 is OK: OK: ocg_job_status 793017 msg: ocg_render_job_queue 0 msg [22:53:19] RECOVERY - OCG health on ocg1002 is OK: OK: ocg_job_status 793023 msg: ocg_render_job_queue 0 msg [22:53:19] RECOVERY - OCG health on ocg1001 is OK: OK: ocg_job_status 793024 msg: ocg_render_job_queue 0 msg [23:05:49] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [23:07:49] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [23:08:48] (CR) Luke081515: [C: 1] Allow 2FA for the abusefilter group if enabled on wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 (owner: Brian Wolff) [23:19:48] (CR) Brian Wolff: "The abusefilter group doesnt seem super critical at the moment so i was going to wait for monday for this" [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 (owner: Brian Wolff) [23:21:39] legoktm: ping [23:23:19] PROBLEM - puppet last run on krypton is CRITICAL: CRITICAL: Catalog 
fetch fail. Either compilation failed or puppetmaster has issues [23:30:29] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 601 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3098538 keys, up 12 days 15 hours - replication_delay is 601 [23:36:29] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3098080 keys, up 12 days 15 hours - replication_delay is 0 [23:51:19] RECOVERY - puppet last run on krypton is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [23:53:17] matanya: pong [23:53:41] legoktm: check you bot permission as well please [23:54:19] matanya: password has been changed since this morning, I just haven't updated my scripts yet [23:54:43] legoktm: ok, they are failing all over the place :) [23:55:22] yeah, I kinda rushed in the morning as I already had plans for most of today [23:56:00] The person who really needs to fix their bot is labslogbot :P [23:59:43] might as well use this moment to switch to bot passwords...