[00:04:16] (03CR) 10Andrew Bogott: [C: 04-1] "I like this, but I'd like this better if it /added/ Task rather than changing 'Bug' for 'Task'. It's easier to make the software flexible" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:07:19] (03PS1) 10BryanDavis: Switch beta udp2log host to deployment-fluorine [tools/scap] - 10https://gerrit.wikimedia.org/r/209830 (https://phabricator.wikimedia.org/T98289) [00:08:38] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [00:10:37] (03CR) 10BryanDavis: "cherry-picked and deployed in beta" [tools/scap] - 10https://gerrit.wikimedia.org/r/209830 (https://phabricator.wikimedia.org/T98289) (owner: 10BryanDavis) [00:13:26] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [00:13:55] 6operations, 10Wikidata, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 5Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1273938 (10Dzahn) I think the ops part in this was only adding it to DNS which is done and the mwconfig part is more -releng., or? [00:17:51] 6operations, 10Wikidata, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 5Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1273955 (10Krenair) I was planning to just do the deployment side (config, maintenance script, etc.) once the language is actually importe... [00:19:11] greg-g, is there supposed to be some sort of official releng process there? [00:19:31] where? new wiki? [00:19:46] yes [00:20:21] Krenair: nothing #releng specific, really, other than making sure community consensus is there and all that. [00:20:42] I mean, we'll help with it (the actual pushing the buttons part) [00:20:45] It's a new wiki creation, so there is the language committee [00:21:15] which reports to the board, so... [00:21:39] so i can trust them? 
:) [00:21:44] yes [00:21:51] greg-g, should there be a deployment window for this sort of thing? [00:22:02] it used to be platform / team reedy [00:22:18] Krenair: it's not really a swat thing since (I forget, but I believe) it's not a quick operation [00:22:24] yeah [00:22:35] so yeah, we probably want to get mukunda/chad on point with it [00:22:41] I usually do it in a few stages [00:22:52] Reedy: wanna do it? :) [00:23:07] DNS first if necessary, apache if necessary, then mw config and creating the wiki [00:23:14] Reedy: actually, do you know if there's documentation on what you'd do anywhere? [00:23:16] I don't think it's possible yet? the language wasn't yet imported into mw apparently [00:23:21] https://wikitech.wikimedia.org/wiki/Add_a_wiki [00:23:32] DNS was done [00:23:35] apache seems unnecessary [00:23:37] (I didn't mean right now ;) ) [00:23:39] mw config seems ready [00:23:40] adding ops for DNS made sense [00:23:44] https://wikitech.wikimedia.org/wiki/Add_a_wiki is mostly right [00:23:48] (03CR) 10Yuvipanda: [C: 031] Switch beta udp2log host to deployment-fluorine [tools/scap] - 10https://gerrit.wikimedia.org/r/209830 (https://phabricator.wikimedia.org/T98289) (owner: 10BryanDavis) [00:23:48] so we added it but then we are done, or? [00:23:49] "mostly" [00:24:01] I keep hearing that it's "outdated" or "mostly right" or things along those lines [00:24:11] vim /home/wikipedia/conf/httpd/wikimedia.conf [00:24:15] But unless you can identify specific issues with it, it's the only documentation for the process I'm aware of [00:24:15] That's pretty wrong [00:24:43] there are some old paths in there for chapter/special wikis, true [00:25:02] those would now be... somewhere in puppet perhaps? [00:25:12] didn't we have an apache-config repo at some stage? [00:25:16] Yeah [00:25:25] Secondly, add Apache configuration for the new wiki by submitting it to gerrit first. They are now located at /modules/mediawiki/files/apache/sites/ in operations/puppet.git.
[00:25:29] Someone already updated that [00:25:30] Krenair: it got moved to the mediawiki module [00:25:51] Language wikis are generally easier, apache is already listening on *.wikipedia [00:26:08] DNS templates are all generated, adding a new language is 1 line [00:28:28] Looks like a chunk of that page can just be removed [00:29:10] (03CR) 10Jforrester: Re-enable OAuth on wikitech (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209744 (https://phabricator.wikimedia.org/T98567) (owner: 10Alex Monk) [00:29:55] (03PS2) 10Jforrester: Re-enable OAuth on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209744 (https://phabricator.wikimedia.org/T98567) (owner: 10Alex Monk) [00:30:04] (03PS3) 10Jforrester: Re-enable OAuth on wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209744 (https://phabricator.wikimedia.org/T98567) (owner: 10Alex Monk) [00:30:56] (cur | prev) 00:30, 9 May 2015‎ Reedy (Talk | contribs)‎ . . (12,196 bytes) (-1,406)‎ . . (Remove outdated crap) (undo) [00:31:54] haha [00:33:37] (03CR) 10Dzahn: "i don't know if we can and want to really apply all those roles just like on tin. for example role::labsdb::manager , is this really a fea" [puppet] - 10https://gerrit.wikimedia.org/r/208723 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. Lewis) [00:34:05] (03CR) 10Dzahn: "maybe better to apply one role at a time and go from there instead of a single patch" [puppet] - 10https://gerrit.wikimedia.org/r/208723 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. Lewis) [00:34:59] I just noticed the $wmgUseDualLicense comment [00:35:03] It does indeed seem orphaned [00:35:28] aude: Don't suppose you've any idea about wmgUseDualLicense now? 
[00:36:36] I thought the code used to be in CommonSettings or WikimediaMessages [00:36:41] Can't see any obvious sign of it [00:39:41] I think when looking through the config change we decided it'd need a schema change for echo, and run something for wikidata [00:40:19] Wikidata is documented at https://wikitech.wikimedia.org/wiki/Add_a_wiki#Wikidata [00:40:33] yep [00:40:40] Echo is just the same as it would be for any wiki enabling the extension [00:40:45] yeah [00:40:59] (03PS1) 10Dzahn: add firewall to mira - codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209837 [00:41:19] (03PS2) 10Dzahn: add firewall to mira - codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209837 [00:44:27] (03PS3) 10Dzahn: add firewall to mira - codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209837 (https://phabricator.wikimedia.org/T95436) [00:45:51] (03PS1) 10Dzahn: backup home dirs on codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209838 (https://phabricator.wikimedia.org/T95436) [00:46:00] (03CR) 10Paladox: "Hi do you mean remove the Bug: prefix from footer and instead use Task: prefix" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:46:58] (03CR) 10Dzahn: [C: 032] "not in use yet, so nothing to break" [puppet] - 10https://gerrit.wikimedia.org/r/209837 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [00:47:01] (03CR) 10Paladox: "Or how would it go." 
[puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:47:11] (03PS9) 10Paladox: Adding task support instead of using Bug: which was for bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/209741 [00:47:45] (03PS9) 10Alex Monk: Create Wikipedia Konkani [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206300 (https://phabricator.wikimedia.org/T96468) (owner: 10Dzahn) [00:48:38] (03PS2) 10Dzahn: backup home dirs on codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209838 (https://phabricator.wikimedia.org/T95436) [00:48:42] (03CR) 10Paladox: "Do you mean something like this" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:49:31] (03CR) 10Andrew Bogott: "Yes, the last thing, I think :) I mean -- please support use of /either/ "Task xxx" or "Bug xxx" in the footer. The behavior should be t" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:49:42] ori, did you do any compression on those static logos? [00:49:56] (03PS1) 10Reedy: Remove wmgDualLicense, orphaned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209840 [00:49:57] because https://gerrit.wikimedia.org/r/#/c/206300/ - PS9 had to rebase over your change [00:49:59] https://github.com/wikimedia/operations-mediawiki-config/commit/3463cd6e0499841ef40a2682fbd6f9dccb2d80e2 [00:50:01] (03CR) 10Dzahn: [C: 032] backup home dirs on codfw deployment host [puppet] - 10https://gerrit.wikimedia.org/r/209838 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [00:50:10] "use optipng to make sure each image is maximally optimized." [00:50:15] (03CR) 10Paladox: "And then remove" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [00:50:21] (03PS10) 10Paladox: Adding task support instead of using Bug: which was for bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/209741 [00:54:33] ori, any particular arguments you gave to optipng? 
[00:56:17] Krenair: optipng -o7 [00:56:44] (slowest method, best compression, still lossless) [00:58:38] (03PS11) 10Paladox: Adding task support instead of using Bug: which was for bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/209741 [00:59:34] (03PS10) 10Alex Monk: Create Wikipedia Konkani [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206300 (https://phabricator.wikimedia.org/T96468) (owner: 10Dzahn) [01:00:28] (03CR) 10Andrew Bogott: [C: 031] "Thanks, looks good! Hopefully Chase will have the final word here, since I believe he's more familiar with the code." [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [01:00:58] (03CR) 10Paladox: "Should I remove" [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [01:01:35] (03PS1) 10Dzahn: add IPv6 to codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/209842 (https://phabricator.wikimedia.org/T95436) [01:01:37] (03PS1) 10Dzahn: add deployer admin groups to codfw deploy server [puppet] - 10https://gerrit.wikimedia.org/r/209843 (https://phabricator.wikimedia.org/T95436) [01:01:40] (03CR) 10jenkins-bot: [V: 04-1] add IPv6 to codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/209842 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [01:01:44] (03CR) 10jenkins-bot: [V: 04-1] add deployer admin groups to codfw deploy server [puppet] - 10https://gerrit.wikimedia.org/r/209843 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [01:02:08] (03PS2) 10Dzahn: add IPv6 to codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/209842 (https://phabricator.wikimedia.org/T95436) [01:02:17] (03CR) 10Alex Monk: Adding task support instead of using Bug: which was for bugzilla (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [01:02:50] (03CR) 10jenkins-bot: [V: 04-1] add IPv6 to codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/209842 
(https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [01:03:12] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/209842 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [01:03:21] (03PS2) 10Dzahn: add deployer admin groups to codfw deploy server [puppet] - 10https://gerrit.wikimedia.org/r/209843 (https://phabricator.wikimedia.org/T95436) [01:04:06] (03PS1) 10Andrew Bogott: Several improvements to the cold-migrate script. [puppet] - 10https://gerrit.wikimedia.org/r/209844 [01:04:53] (03CR) 10Dzahn: [C: 032] add IPv6 to codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/209842 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [01:07:22] (03CR) 10Paladox: Adding task support instead of using Bug: which was for bugzilla (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [01:08:39] (03CR) 10Dzahn: "adding existing admin groups to new hosts in codfw that are supposed to be like existing hosts in eqiad. 
that's not really an access reque" [puppet] - 10https://gerrit.wikimedia.org/r/209843 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [06:16:07] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [06:30:27] PROBLEM - puppet last run on mw2082 is CRITICAL Puppet has 1 failures [06:30:47] PROBLEM - puppet last run on elastic1030 is CRITICAL Puppet has 1 failures [06:31:27] PROBLEM - puppet last run on db2065 is CRITICAL Puppet has 1 failures [06:31:36] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 1 failures [06:31:37] PROBLEM - puppet last run on logstash1006 is CRITICAL Puppet has 1 failures [06:31:56] PROBLEM - puppet last run on cp3008 is CRITICAL Puppet has 1 failures [06:31:57] PROBLEM - puppet last run on ms-fe2001 is CRITICAL Puppet has 1 failures [06:32:07] PROBLEM - puppet last run on db1018 is CRITICAL Puppet has 1 failures [06:32:37] PROBLEM - puppet last run on mw2113 is CRITICAL Puppet has 1 failures [06:32:57] PROBLEM - puppet last run on mw2023 is CRITICAL Puppet has 1 failures [06:33:36] PROBLEM - puppet last run on db2036 is CRITICAL Puppet has 1 failures [06:33:37] PROBLEM - puppet last run on mw1025 is CRITICAL Puppet has 1 failures [06:33:56] PROBLEM - puppet last run on mw1054 is CRITICAL Puppet has 1 failures [06:34:17] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 1 failures [06:34:17] PROBLEM - puppet last run on mw2093 is CRITICAL Puppet has 1 failures [06:34:17] PROBLEM - puppet last run on mw2212 is CRITICAL Puppet has 1 failures [06:34:27] PROBLEM - puppet last run on mw2059 is CRITICAL Puppet has 1 failures [06:34:36] PROBLEM - puppet last run on mw2073 is CRITICAL Puppet has 1 failures [06:34:37] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 2 failures [06:45:07] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [06:45:47] RECOVERY - puppet last run on mw2023 is OK Puppet is currently 
enabled, last run 10 seconds ago with 0 failures [06:46:07] RECOVERY - puppet last run on logstash1006 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:46:27] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [06:46:27] RECOVERY - puppet last run on db2036 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [06:46:27] RECOVERY - puppet last run on ms-fe2001 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:46:28] RECOVERY - puppet last run on mw1025 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [06:46:36] RECOVERY - puppet last run on mw2082 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:36] RECOVERY - puppet last run on db1018 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:46:56] RECOVERY - puppet last run on elastic1030 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [06:47:07] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:47:07] RECOVERY - puppet last run on mw2212 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [06:47:07] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:47:27] RECOVERY - puppet last run on mw2073 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [06:47:27] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:36] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:37] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [06:48:17] RECOVERY - puppet last run on mw1054 is OK Puppet is currently enabled, last run 1 minute 
ago with 0 failures [06:48:47] RECOVERY - puppet last run on mw2093 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [06:48:57] RECOVERY - puppet last run on mw2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [07:44:45] (03PS2) 10KartikMistry: Install new Apertium packages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/209739 [07:49:56] (03PS3) 10KartikMistry: Install new Apertium packages for ContentTranslation [puppet] - 10https://gerrit.wikimedia.org/r/209739 [07:51:08] (03CR) 10KartikMistry: [C: 031] "Can be merge anytime now." [puppet] - 10https://gerrit.wikimedia.org/r/209739 (owner: 10KartikMistry) [08:24:27] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 62.50% of data above the critical threshold [35.0] [08:26:37] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [08:32:58] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [08:44:27] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [09:01:24] (03CR) 10John F. Lewis: [C: 031] "It is not and can be merged." 
[puppet] - 10https://gerrit.wikimedia.org/r/209843 (https://phabricator.wikimedia.org/T95436) (owner: 10Dzahn) [09:03:56] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [09:22:37] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [09:28:07] PROBLEM - High load average on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0] [09:34:36] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [09:46:12] (03PS2) 10ArielGlenn: update multiple args format to publish.runner, bump version [software/deployment/trebuchet-trigger] - 10https://gerrit.wikimedia.org/r/209775 [10:13:46] PROBLEM - High load average on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [10:25:04] API request failed (internal_api_error_DBQueryError): [7e224081] Database query error [10:25:07] hmmm... [10:25:58] again redis issues? [10:30:16] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [10:46:53] 6operations, 10Mathoid, 6Services, 5Patch-For-Review: Standardise Mathoid's deployment - https://phabricator.wikimedia.org/T97124#1274260 (10Physikerwelt) >>! In T97124#1272382, @mobrovac wrote: > @akosiaris, yup, I am aware of that. The deploy repo has been created (yet need to structure it and import the... 
[10:47:57] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [10:54:37] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [12:04:06] PROBLEM - puppet last run on snapshot1003 is CRITICAL Puppet last ran 4 hours ago [12:07:46] PROBLEM - High load average on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [12:12:28] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [13:53:52] (03CR) 10Paladox: [C: 031] Adding task support instead of using Bug: which was for bugzilla [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [15:06:27] PROBLEM - High load average on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0] [15:17:37] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [15:32:03] (03Abandoned) 10John F. Lewis: Deploy mira as codfw deployment server [puppet] - 10https://gerrit.wikimedia.org/r/208723 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. Lewis) [15:33:37] (03PS5) 10Glaisher: Enable Echo on Wikimedia wikis by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: 10Withoutaname) [15:33:43] (03CR) 10jenkins-bot: [V: 04-1] Enable Echo on Wikimedia wikis by default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: 10Withoutaname) [15:42:51] (03PS1) 10John F. Lewis: Add deployment server role to mira [puppet] - 10https://gerrit.wikimedia.org/r/209874 (https://phabricator.wikimedia.org/T95436) [15:43:51] (03CR) 10jenkins-bot: [V: 04-1] Add deployment server role to mira [puppet] - 10https://gerrit.wikimedia.org/r/209874 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. 
Lewis) [15:44:00] (03CR) 10Alex Monk: [C: 04-1] "* Needs rebase" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/139326 (https://phabricator.wikimedia.org/T97760) (owner: 10Withoutaname) [15:44:24] (03CR) 10John F. Lewis: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/209874 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. Lewis) [15:49:44] 6operations, 10wikitech.wikimedia.org: labswiki DB is inaccessible from tin, terbium, etc. - https://phabricator.wikimedia.org/T98682#1274443 (10Krenair) 3NEW [15:51:27] (03PS1) 10John F. Lewis: Add mira to deployment network rule [puppet] - 10https://gerrit.wikimedia.org/r/209875 (https://phabricator.wikimedia.org/T95436) [15:51:31] 6operations, 6Labs, 10wikitech.wikimedia.org: labswiki DB does not appear to be accessible in labs replicas - https://phabricator.wikimedia.org/T98683#1274454 (10Krenair) 3NEW [15:52:27] (03CR) 10John F. Lewis: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/209874 (https://phabricator.wikimedia.org/T95436) (owner: 10John F. Lewis) [16:00:35] (03PS2) 10John F. Lewis: Add deployment server role to mira [puppet] - 10https://gerrit.wikimedia.org/r/209874 (https://phabricator.wikimedia.org/T95436) [16:09:48] 6operations, 10MediaWiki-JobQueue, 10MediaWiki-JobRunner: enwiki's job is about 23m atm and increasing - https://phabricator.wikimedia.org/T98621#1274508 (10Krenair) p:5High>3Unbreak! [16:16:07] PROBLEM - puppet last run on ms-be1018 is CRITICAL Puppet has 1 failures [16:20:27] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [16:26:29] (03CR) 10Alex Monk: "Note that control over this moved away from the local file with the changes in T98640" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209126 (owner: 10John F. Lewis) [16:28:37] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [16:29:32] (03CR) 10John F. 
Lewis: "Emailing Ellie to see if the current logo is the final one in which case this is fine." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/209126 (owner: 10John F. Lewis) [16:31:12] 6operations, 6Labs, 10wikitech.wikimedia.org: labswiki DB does not appear to be accessible in labs replicas - https://phabricator.wikimedia.org/T98683#1274512 (10scfc) [16:32:17] RECOVERY - puppet last run on ms-be1018 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [17:27:52] (03CR) 10QChris: [C: 04-1] "I think to achieve what you seem to want to achieve, you" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/209741 (owner: 10Paladox) [17:32:47] PROBLEM - puppet last run on cp3009 is CRITICAL puppet fail [17:50:27] RECOVERY - puppet last run on cp3009 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [18:13:57] (03PS5) 10Aaron Schulz: Increase jobrunner::runners_basic [puppet] - 10https://gerrit.wikimedia.org/r/209719 (https://phabricator.wikimedia.org/T98621) (owner: 10Nemo bis) [19:10:17] PROBLEM - High load average on labstore1001 is CRITICAL 75.00% of data above the critical threshold [24.0] [19:21:04] 3 Warning: Failed connecting to redis server at fluorine.eqiad.wmnet: Connection timed out [19:21:11] that's an interesting error. fluorine runs redis? [19:23:17] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [19:29:57] PROBLEM - puppet last run on db1018 is CRITICAL Puppet has 1 failures [19:30:21] "role::xenon" [19:30:23] # Aggregates and graphs stack trace snapshots from MediaWiki [19:30:23] # application servers, showing where time is spent. [19:30:29] (fluorine and redis) [19:30:37] Krenair: [19:31:40] it does that via redis? [19:31:44] ok, didn't know that.. 
[19:32:34] me neither til I looked it up [19:34:37] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [19:36:08] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [19:46:07] RECOVERY - puppet last run on db1018 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [20:10:18] PROBLEM - High load average on labstore1001 is CRITICAL 100.00% of data above the critical threshold [24.0] [20:37:47] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [20:53:16] !log krenair Synchronized php-1.26wmf5/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWEditModeTool.js: https://gerrit.wikimedia.org/r/#/c/209949/ (duration: 00m 11s) [20:53:23] Logged the message, Master [20:54:59] !log krenair Synchronized php-1.26wmf4/extensions/VisualEditor/modules/ve-mw/ui/tools/ve.ui.MWEditModeTool.js: https://gerrit.wikimedia.org/r/#/c/209950/ (duration: 00m 12s) [20:55:04] Logged the message, Master [21:10:08] PROBLEM - High load average on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [21:11:47] RECOVERY - High load average on labstore1001 is OK Less than 50.00% above the threshold [16.0] [21:13:17] 6operations, 10MediaWiki-JobQueue, 10MediaWiki-JobRunner, 5Patch-For-Review: enwiki's job is about 24m atm and increasing - https://phabricator.wikimedia.org/T98621#1274625 (10EoRdE6) [22:11:36] PROBLEM - puppet last run on db2016 is CRITICAL puppet fail [22:29:17] RECOVERY - puppet last run on db2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [22:49:45] apergos, is dumps breaking a relatively common issue over time? [23:02:21] Shit breaks with dumps on a semi regular basis [23:03:37] why are there no alerts going off in this channel when it breaks? [23:05:36] Reedy: feel like doing a server side upload ? 
[23:05:49] Krenair: No monitoring [23:07:10] do we have a ticket to add monitoring? do we write incident documentation when it breaks? [23:07:52] Probably not. And no [23:08:01] It's a bit of a weird thing to try and monitor [23:08:39] Not impossible, just not so simple [23:24:56] RECOVERY - puppet last run on snapshot1003 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [23:31:36] (03PS1) 10Ori.livneh: Update my (=ori) dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/209953 [23:32:20] (03CR) 10Ori.livneh: [C: 032] Update my (=ori) dotfiles [puppet] - 10https://gerrit.wikimedia.org/r/209953 (owner: 10Ori.livneh) [23:50:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4013 is CRITICAL 11.11% of data above the critical threshold [20000.0] [23:55:28] RECOVERY - Varnishkafka Delivery Errors per minute on cp4013 is OK Less than 1.00% above the threshold [0.0]