[00:00:30] (swat is over, fwiw)
[00:00:34] hey superm401, is https://gerrit.wikimedia.org/r/#/c/187869 still relevant?
[00:00:44] (PS2) Dzahn: cassandra: add ferm rules [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680)
[00:01:17] Krenair, yeah.
[00:01:32] I'm just going to do it
[00:01:32] It can be merged anytime, I should remember next time we do a deploy.
[00:01:34] Okay
[00:01:37] (PS2) Alex Monk: Update Flow Parsoid config comment [mediawiki-config] - https://gerrit.wikimedia.org/r/187869 (https://phabricator.wikimedia.org/T86920) (owner: Mattflaschen)
[00:01:45] it's just a comment
[00:01:56] (CR) Alex Monk: [C: 2] Update Flow Parsoid config comment [mediawiki-config] - https://gerrit.wikimedia.org/r/187869 (https://phabricator.wikimedia.org/T86920) (owner: Mattflaschen)
[00:02:44] hey awight is https://gerrit.wikimedia.org/r/#/c/160494/ still relevant?
[00:04:43] (Merged) jenkins-bot: Update Flow Parsoid config comment [mediawiki-config] - https://gerrit.wikimedia.org/r/187869 (https://phabricator.wikimedia.org/T86920) (owner: Mattflaschen)
[00:07:26] !log krenair Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/187869 - comment change only, noop (duration: 00m 07s)
[00:07:30] Logged the message, Master
[00:07:58] Krenair: good question, lemme check...
[00:08:02] (PS4) Awight: Revert "Enable FundraisingTranslateWorkflow on the beta cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/160494
[00:08:07] yes :)
[00:08:22] Mario: Respected plumber, time to swat: http://1.media.dorkly.cvcdn.com/78/33/c11582dcc864b4b270ab9b0cdcf4af61-super-mario-swat.jpg
[00:08:55] Krenair: it's just cruft at this point, removing will make betalabs more stable...
[00:10:35] (CR) Alex Monk: [C: 2] Revert "Enable FundraisingTranslateWorkflow on the beta cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/160494 (owner: Awight)
[00:10:40] (Merged) jenkins-bot: Revert "Enable FundraisingTranslateWorkflow on the beta cluster" [mediawiki-config] - https://gerrit.wikimedia.org/r/160494 (owner: Awight)
[00:10:58] Krenair: thanks, I probably should have self-merged months ago...
[00:11:51] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/160494/ - labs changes only (duration: 00m 09s)
[00:11:54] Logged the message, Master
[00:11:58] done
[00:12:35] !log restarting cassandra on restbase1003
[00:12:38] Logged the message, Master
[00:12:51] awight, yeah I've just been going through the queue and made a list of trivial looking things: https://phabricator.wikimedia.org/P405
[00:13:27] operations, Labs, hardware-requests: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1134165 (RobH)
[00:13:41] operations, Labs, hardware-requests: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1044517 (RobH) Order placed, lead time is 2-3 weeks for delivery.
[00:14:05] operations, Labs, hardware-requests: eqiad: (6) labs virt nodes - https://phabricator.wikimedia.org/T89752#1134167 (RobH) p:Normal>Low
[00:23:59] Are all SWAT/emergency-related patches done and over with?
[00:24:19] (i.e., can Roan and I break CI for a few minutes with the weekly /vendor circular dependency problem?)
[00:32:36] James_F, yep
[00:34:20] OK.
[00:34:26] Or Krenair can do it. ;-)
[00:34:53] * Krenair rolls eyes
[00:35:08] What is it exactly?
[00:47:12] !log restarting cassandra on restbase1002
[00:47:19] Logged the message, Master
[00:50:28] (PS6) BBlack: nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110
[00:51:22] (CR) jenkins-bot: [V: -1] nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110 (owner: BBlack)
[00:52:28] (PS7) BBlack: nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110
[00:52:41] (PS1) Tim Landscheidt: Disable class_inherits_from_params_class puppet-lint checks [puppet] - https://gerrit.wikimedia.org/r/198170 (https://phabricator.wikimedia.org/T87132)
[00:53:24] (CR) jenkins-bot: [V: -1] nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110 (owner: BBlack)
[00:54:04] (PS8) BBlack: nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110
[01:09:50] !log restarting cassandra on restbase1001
[01:09:54] operations, Datasets-Archiving, Datasets-General-or-Unknown: Image tarball dumps are not being generated - https://phabricator.wikimedia.org/T53001#1134272 (Hydriz)
[01:09:55] Logged the message, Master
[01:14:35] (PS1) Thcipriani: Allow override of sync_common config [puppet] - https://gerrit.wikimedia.org/r/198173 (https://phabricator.wikimedia.org/T91548)
[01:24:26] operations, Datasets-Archiving: Publish a full SVN dump - https://phabricator.wikimedia.org/T93179#1134295 (Hydriz)
[01:29:28] (PS3) Dzahn: (WIP) cassandra: add ferm rules [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680)
[01:36:23] PROBLEM - puppet last run on amssq55 is CRITICAL: CRITICAL: puppet fail
[01:45:43] PROBLEM - Parsoid on wtp1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:46:40] !log deployed parsoid sha 99d1b214
[01:46:43] PROBLEM - Parsoid on wtp1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:46:46] Logged the message, Master
[01:47:21] pybal not keeping up with the pace of the restart?
[01:49:24] gwicke, i think wtp1011 and wtp1017 might need root restart.. can you?
[01:50:06] there might be stuck processes on them.
[01:50:44] RECOVERY - Parsoid on wtp1011 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.021 second response time
[01:50:56] there you go ..
[01:51:32] still needed for wtp1017?
[01:52:41] probably? it is still not recovered.
[01:52:52] must be from the template update .. some processes may have been stuck.
[01:52:53] Ops-Access-Requests, operations: Add @Krenair to #WMF-NDA - https://phabricator.wikimedia.org/T93318#1134305 (Jdforrester-WMF) NEW
[01:53:05] subbu: ok, let me restart it
[01:53:09] Ops-Access-Requests, operations: Add @Krenair to #WMF-NDA - https://phabricator.wikimedia.org/T93318#1134313 (Jdforrester-WMF)
[01:53:59] subbu: don't you have sudo on those boxes too?
[01:54:08] maybe i do.
[01:54:09] let me try.
[01:54:11] it looks like it restarted okay
[01:54:14] RECOVERY - Parsoid on wtp1017 is OK: HTTP OK: HTTP/1.1 200 OK - 1108 bytes in 0.011 second response time
[01:54:23] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[01:54:41] oh, it did.
[01:54:45] all it needed is for me to login.
[01:54:47] :)
[01:55:17] it looked longer than 5 min .. but at least it recovered.
[01:55:18] haha
[01:56:08] * gwicke imagines a parsoid process hiding and smiling while waiting for subbu to log in
[01:56:33] :)
[01:58:44] (PS4) Dzahn: cassandra: add ferm rules [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680)
[02:06:30] Ops-Access-Requests, operations, Phabricator: Add @Krenair to #WMF-NDA - https://phabricator.wikimedia.org/T93318#1134328 (Dzahn)
[02:06:58] Ops-Access-Requests, operations, Phabricator, WMF-NDA-Requests: Add @Krenair to #WMF-NDA - https://phabricator.wikimedia.org/T93318#1134305 (Dzahn)
[02:10:29] (PS5) Dzahn: WIP cassandra: add ferm rules [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680)
[02:28:28] !log l10nupdate Synchronized php-1.25wmf21/cache/l10n: (no message) (duration: 04m 56s)
[02:28:34] Logged the message, Master
[02:31:53] !log LocalisationUpdate completed (1.25wmf21) at 2015-03-20 02:30:50+00:00
[02:31:57] Logged the message, Master
[02:39:58] operations, Commons, Multimedia, HHVM, and 2 others: Create an HHVM 3.6.0 package, adding Tim's streaming patch - https://phabricator.wikimedia.org/T93194#1134368 (Eloquence) This has been added to #roadmap, per roadmap policy please move it to the appropriate column in the roadmap within the next co...
[02:40:31] superm401: Yeah, clearly Phabricator should let me assign it to both you and Greg as having fixed it.
[02:41:28] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 06m 36s)
[02:41:36] Logged the message, Master
[02:41:56] James_F, Erik and Greg. ;) Although, Erik also ran the script on Beta Labs directly, so I'm not sure if the 45 minutes played a role or not.
[02:42:31] superm401: Ha. :-)
[02:43:00] operations, Continuous-Integration, Labs, Wikimedia-Labs-Infrastructure: dnsmasq returns SERVFAIL for (some?)
names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1134376 (Krinkle)
[02:44:54] operations, Commons, Multimedia, HHVM, and 3 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1134378 (Eloquence) This was scheduled for March, is this happening next week or should we bump it to April?
[02:46:04] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-20 02:45:00+00:00
[02:46:08] Logged the message, Master
[04:16:13] !log set email for Hmscott@global, attached enwiki account
[04:16:20] Logged the message, Master
[04:27:44] (CR) Dzahn: [C: -2] "eh, yea, got it --> https://phabricator.wikimedia.org/T87519 :)" [puppet] - https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680) (owner: Dzahn)
[04:33:07] (Abandoned) Dzahn: add restbase hosts to network.pp [puppet] - https://gerrit.wikimedia.org/r/197839 (https://phabricator.wikimedia.org/T92680) (owner: Dzahn)
[05:37:21] (PS1) KartikMistry: Beta: Enable Apertium MT for few languages in Beta [puppet] - https://gerrit.wikimedia.org/r/198186
[05:53:22] https://tools.wmflabs.org/magnustools/multistatus.html the quick intersection service on labs is down
[05:53:27] can someone be of service ?
[05:53:38] Thanks ..
Gerard
[06:28:34] PROBLEM - puppet last run on mw2036 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:44] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:04] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:13] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:34] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: /srv 14133 MB (3% inode=99%):
[06:30:05] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:05] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:33] RECOVERY - Disk space on vanadium is OK: DISK OK
[06:32:34] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:43] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:34] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:43] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:44] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:54] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:04] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:44] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:54] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:55] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:39:05] (PS9) BBlack: nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110
[06:39:59] (CR) jenkins-bot: [V: -1] nginx ssl_stapling_file support [puppet] - https://gerrit.wikimedia.org/r/198110 (owner:
BBlack)
[06:40:48] (PS1) Yuvipanda: tools: Don't put a - before servername for webservice2 [puppet] - https://gerrit.wikimedia.org/r/198193
[06:43:16] GerardM-: done
[06:44:26] fixing the underlying cause of that being down as well
[06:45:10] (PS1) Yuvipanda: tools: Make bigbrother restart a tool upto 10 times [puppet] - https://gerrit.wikimedia.org/r/198194
[06:45:29] (PS2) Yuvipanda: tools: Don't put a - before servername for webservice2 [puppet] - https://gerrit.wikimedia.org/r/198193
[06:45:34] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:45:39] (CR) Yuvipanda: [C: 2 V: 2] tools: Don't put a - before servername for webservice2 [puppet] - https://gerrit.wikimedia.org/r/198193 (owner: Yuvipanda)
[06:45:43] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[06:45:54] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:45:59] (PS2) Yuvipanda: tools: Make bigbrother restart a tool upto 10 times [puppet] - https://gerrit.wikimedia.org/r/198194
[06:46:11] (CR) Yuvipanda: [C: 2 V: 2] tools: Make bigbrother restart a tool upto 10 times [puppet] - https://gerrit.wikimedia.org/r/198194 (owner: Yuvipanda)
[06:46:24] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:46:43] RECOVERY - puppet last run on mw2036 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:46:44] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:54] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago
with 0 failures
[06:46:54] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:47:04] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:44] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:47:44] Yeah
[06:47:53] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:47:53] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:47:54] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:54] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[06:47:55] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:24] RECOVERY - puppet last run on mw1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:06:58] operations: 3 words that make her horny - https://phabricator.wikimedia.org/T93329#1134642 (emailbot)
[07:08:30] springle: around?
[07:19:35] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Mar 20 07:18:31 UTC 2015 (duration 18m 30s)
[07:19:41] Logged the message, Master
[07:24:56] (PS1) Kaldari: Removing old author WikiGrok campaign [mediawiki-config] - https://gerrit.wikimedia.org/r/198195
[08:02:01] twice in the last hour when adding a new section to a mw.org talk page I've gotten "Exception encountered, of type "BadMethodCallException". When I reload I get an edit conflict, and it turns out my contribution was successful.
[08:03:15] spagewmf: 2015-03-20 07:54:02 mw1212 mediawikiwiki: [4b5683b8] /w/index.php?title=User_talk:Waldir&action=submit BadMethodCallException from line 221 of /srv/mediawiki/php-1.25wmf22/extensions/Echo/Hooks.php: Call to a member function isMinor() on a non-object (NULL)
[08:03:28] spagewmf: was that you?
[08:03:56] yup, should I phab #Operations
[08:04:23] spagewmf: er, it should be probably filed under Echo and Wikimedia-log-errors
[08:06:02] legoktm: will do, long night. Do you use logstash/kibana to find this so fast, or ssh fluorine and grep ?
[08:06:20] spagewmf: fluorine and grep :)
[08:06:43] spagewmf: he just finds it fast by dint of being legoktm :)
[08:07:07] I suggested bd808 give a tech talk on logstash/kibana
[08:07:38] don't make me choose which of you is more awesome :)
[08:09:35] I just did legoktm@fluorine:/a/mw-log$ grep "BadMethodCallException" exception.log | grep mediawikiwiki
[08:10:00] and bam, two identical exceptions :P
[08:11:37] legoktm: yup, as https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Test_and_monitor_your_live_code says. But I hear logstash is better.
[08:12:14] yeah, I've found logstash pretty useful, but there are some operational issues with it right now
[08:12:22] https://phabricator.wikimedia.org/T88732
[08:12:57] bbl, sleep time :)
[08:13:11] the logstash link in How to deploy code paints darkness.
Thanks again! goodnight
[08:22:35] (PS4) Mobrovac: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102)
[08:30:07] (PS5) Mobrovac: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102)
[09:07:10] (CR) Kelson: [C: 1] Add 'Kurs' (106) to $wgContentNamespaces at dewikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/197938 (https://phabricator.wikimedia.org/T93071) (owner: Glaisher)
[09:14:42] (PS4) Filippo Giunchedi: Make deploy2graphite use mw-deployment-vars.sh [puppet] - https://gerrit.wikimedia.org/r/183568 (https://phabricator.wikimedia.org/T1387) (owner: Reedy)
[09:14:47] (CR) Filippo Giunchedi: [C: 2 V: 2] Make deploy2graphite use mw-deployment-vars.sh [puppet] - https://gerrit.wikimedia.org/r/183568 (https://phabricator.wikimedia.org/T1387) (owner: Reedy)
[09:20:13] (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: Mobrovac)
[09:20:51] godog: thnx, am testing it now on the betacluster
[09:21:09] godog: i'm not really happy with having two conf files, but i don't see a way around it really
[09:25:24] (CR) Filippo Giunchedi: WIP cassandra: add ferm rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: Dzahn)
[09:26:09] mobrovac: me neither TBH, but if they diverge again then that'd mean another if/else and so on :(
[09:26:34] yeah, i know
[09:27:24] godog: my first thought was to build arrays in hiera and iterate over them in the config, but that is not a long-term solution in cases where some domains diverge from the template all others have
[09:27:35] so this seemed like a good-enough compromise
[09:28:27] mobrovac: *nod* good enough is good enough
[09:28:53] <_joe_> YuviPanda: neon is failing its icinga test again, I'd bet
[09:31:18] (CR) Alexandros Kosiaris: [C: 2] Beta: Enable Apertium MT for few languages in Beta [puppet] - https://gerrit.wikimedia.org/r/198186 (owner: KartikMistry)
[09:32:41] (PS3) Filippo Giunchedi: swift: provision ms-be101[678] [puppet] - https://gerrit.wikimedia.org/r/197526 (https://phabricator.wikimedia.org/T90922)
[09:32:48] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: provision ms-be101[678] [puppet] - https://gerrit.wikimedia.org/r/197526 (https://phabricator.wikimedia.org/T90922) (owner: Filippo Giunchedi)
[09:56:06] (CR) Alexandros Kosiaris: [C: -1] WIP cassandra: add ferm rules (1 comment) [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: Dzahn)
[10:18:09] (CR) Alexandros Kosiaris: WIP cassandra: add ferm rules (2 comments) [puppet] - https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: Dzahn)
[11:01:14] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[11:01:14] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[11:07:09] (CR) Mobrovac: "Confirmed to work in the beta cluster" [puppet] - https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: Mobrovac)
[11:15:56] operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1135049 (Aklapper) [off-topic; only to @Chip: ] >>! In T91269#1132301, @Chip wrote: > Sorry all, missed these, didn't have notifications set up properly and didn't have a proper Phabricator account. For Phabricator notification se...
[11:16:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:16:04] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:21:04] (PS1) Giuseppe Lavagetto: Modify build files for 3.6.0 [debs/hhvm] - https://gerrit.wikimedia.org/r/198211
[11:22:58] (PS6) Mobrovac: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102)
[11:25:11] operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1135052 (QChris) >>>! In T92340#1132783, @leila wrote: >> @kevinator, where do we keep track of the definition of each entry in the gzip files? > > In T92340#1132795, @Ottomata wrote: > Keep track of? h...
[11:29:41] operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1135053 (QChris) >>! In T92340#1133109, @leila wrote: > It looks like IP address and user_agent is the problem. It's harder from my point of view. For example even without IP address and user_agent, th...
[11:43:33] (CR) Hoo man: [C: -1] "I'm fairly sure this contains oversighted user names and would need a view that joins against the local databases (via utr_wiki) to determ" [software] - https://gerrit.wikimedia.org/r/197507 (owner: Nemo bis)
[12:04:16] operations: phpBB 3.1.0 new version - https://phabricator.wikimedia.org/T93349#1135078 (emailbot)
[12:06:00] Do we have a phpbb install somewhere? :P
[12:10:48] operations: phpBB 3.1.0 new version - https://phabricator.wikimedia.org/T93349#1135094 (Dereckson) Open>Invalid a:Dereckson Closed as spam.
[12:21:37] <_joe_> lol
[12:21:58] <_joe_> hoo: if we had, it would be in a segregate network I guess
[12:33:33] (CR) Faidon Liambotis: [C: -1] "Why the catch-all?
Local addresses that do not exist locally should just fail, not mailed to root." [puppet] - https://gerrit.wikimedia.org/r/198114 (owner: Rush)
[12:46:49] (CR) Faidon Liambotis: [C: -1] nginx ssl_stapling_file support (4 comments) [puppet] - https://gerrit.wikimedia.org/r/198110 (owner: BBlack)
[13:02:33] (PS1) 01tonythomas: Install bouncehandler extension everywhere ( including wikipedias ) [mediawiki-config] - https://gerrit.wikimedia.org/r/198220 (https://phabricator.wikimedia.org/T92877)
[13:03:41] (PS2) 01tonythomas: Install bouncehandler extension everywhere ( including wikipedias ) [mediawiki-config] - https://gerrit.wikimedia.org/r/198220 (https://phabricator.wikimedia.org/T92877)
[13:06:15] PROBLEM - HTTP 5xx req/min on graphite2001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0]
[13:06:23] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.69% of data above the critical threshold [500.0]
[13:15:56] operations, Continuous-Integration, Labs: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135199 (akosiaris) Just making a point here so we avoid future "puppet failed" threads. Puppet itself did not fail in this specific case. A specific resource failed...
[13:17:04] RECOVERY - HTTP 5xx req/min on graphite2001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:17:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:23:08] (PS1) Mobrovac: Activate RESTBase in the Beta Cluster [mediawiki-config] - https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102)
[13:24:33] (PS1) Filippo Giunchedi: swift: update partman allocation for HP [puppet] - https://gerrit.wikimedia.org/r/198222 (https://phabricator.wikimedia.org/T90922)
[13:24:52] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: update partman allocation for HP [puppet] - https://gerrit.wikimedia.org/r/198222 (https://phabricator.wikimedia.org/T90922) (owner: Filippo Giunchedi)
[13:35:54] operations: Provide dh-virtualenv 0.9 package on apt.wikimedia.org Precise and Trusty distributions - https://phabricator.wikimedia.org/T91631#1135226 (akosiaris) Hello, tried rebuilding the package and I ran into a couple of problems with pdebuilder In order to make it easier to use pbuilder, could you pleas...
[13:45:02] (CR) Alexandros Kosiaris: [C: 2 V: 2] "LGTM" [debs/contenttranslation/apertium-hbs-mkd] - https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) (owner: KartikMistry)
[13:46:48] !log uploaded apertium-hbs-mkd_0.1.0~r57554-1 on apt.wikimedia.org component: trusty-wikimedia
[13:46:53] Logged the message, Master
[13:57:23] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail
[14:14:21] (PS1) Filippo Giunchedi: swift: fix typo in label_filesystem [puppet] - https://gerrit.wikimedia.org/r/198226
[14:14:23] (PS1) Filippo Giunchedi: install-server: upgrade kernel on swift HP machines [puppet] - https://gerrit.wikimedia.org/r/198227 (https://phabricator.wikimedia.org/T90922)
[14:14:43] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[14:14:58] (CR) Filippo Giunchedi: [C: 2 V: 2] swift: fix typo in label_filesystem [puppet] - https://gerrit.wikimedia.org/r/198226 (owner: Filippo Giunchedi)
[14:18:03] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning.
[14:19:56] operations, Continuous-Integration, Labs: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135275 (scfc) Note that this task relates to Labs instances where Icinga can't be used. @yuvipanda had to jump through a lot of hoops to make the existing alerts po...
[14:22:03] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running.
[14:33:23] operations, RESTBase: (nodetool )cleanup needed on restbase1006 - https://phabricator.wikimedia.org/T93079#1135310 (Eevans) A late update: This cleanup was aborted after it failed to make any headway, and T93140 was discovered. We can try again once compaction has caught up.
[14:34:03] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: Cassandra compaction is getting behind - https://phabricator.wikimedia.org/T93140#1130277 (Eevans)
[14:34:04] operations, RESTBase: (nodetool )cleanup needed on restbase1006 - https://phabricator.wikimedia.org/T93079#1135313 (Eevans)
[14:35:24] (PS22) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:39:30] (PS1) Giuseppe Lavagetto: ganglia: add views for a few of codfw clusters [puppet] - https://gerrit.wikimedia.org/r/198231
[14:39:32] (PS23) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:39:59] (CR) Giuseppe Lavagetto: [C: 2 V: 2] ganglia: add views for a few of codfw clusters [puppet] - https://gerrit.wikimedia.org/r/198231 (owner: Giuseppe Lavagetto)
[14:40:10] operations, RESTBase: (nodetool) cleanup needed on restbase1006 - https://phabricator.wikimedia.org/T93079#1135340 (Aklapper)
[14:40:32] (PS24) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:46:02] operations: Put archiva.wikimedia.org behind misc-web-lb and force https - https://phabricator.wikimedia.org/T88139#1135376 (Ottomata) The main issue is this: https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/archiva.pp#L34 Archiva has to deal with large objects being upload and...
[14:50:37] (PS2) Ottomata: Mark limn's apache configs as managed by puppet [puppet] - https://gerrit.wikimedia.org/r/198117 (owner: QChris)
[14:52:29] (PS2) Ottomata: Allow limn instances to specify Apache config for proxy [puppet] - https://gerrit.wikimedia.org/r/198118 (owner: QChris)
[14:52:37] (CR) Ottomata: [C: 2] Mark limn's apache configs as managed by puppet [puppet] - https://gerrit.wikimedia.org/r/198117 (owner: QChris)
[14:52:40] (PS2) Ottomata: Add limn proxy template that handles taken down Wikipedia Zero dashboards [puppet] - https://gerrit.wikimedia.org/r/198119 (https://phabricator.wikimedia.org/T92920) (owner: QChris)
[14:52:44] (PS25) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:54:12] (CR) jenkins-bot: [V: -1] Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: JanZerebecki)
[14:54:29] (PS26) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:56:32] (PS27) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[14:57:54] (CR) Ottomata: [C: 2] Allow limn instances to specify Apache config for proxy [puppet] - https://gerrit.wikimedia.org/r/198118 (owner: QChris)
[14:57:59] (CR) Ottomata: [C: 2] Add limn proxy template that handles taken down Wikipedia Zero dashboards [puppet] - https://gerrit.wikimedia.org/r/198119 (https://phabricator.wikimedia.org/T92920) (owner: QChris)
[14:59:08] Puppet, Multimedia, Release-Engineering, Scrum-of-Scrums, Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1135430 (Gilles) I know this isn't the main direction we've decided on, but I've tried my hand at packaging the django-* things with
debianize...
[14:59:18] Puppet, Multimedia, Release-Engineering, Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1135431 (Gilles)
[15:00:52] (PS28) JanZerebecki: Wikidata builder [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567)
[15:08:55] operations, ops-eqiad: Failed Raid Analytics1010 - https://phabricator.wikimedia.org/T92957#1135476 (Ottomata) I wouldn't mind keeping it around for some realtime framework trials, but if it breaks I don't really care much about it. Should I turn off alarms? Maybe I should just remove it from puppet and...
[15:09:30] Hm… if the api sends an edit but the new page is identical to the old page, is that edit discarded or does the db end up with a new identical record?
[15:10:09] (CR) JanZerebecki: "Used in: https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikidata-build" [puppet] - https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: JanZerebecki)
[15:12:22] andrewbogott: no new revision is created, but updates for link tables are still queued, it's called a "null edit"
[15:12:57] operations: fix up log retention on log collection/storage hosts - https://phabricator.wikimedia.org/T92839#1135492 (Ottomata) > on oxygen, all logs in /a/log/webrequest/archive/ of the form zero-*.tsv.log-*gz e.g. zero-digi-malaysia.tsv.log-20130711.gz will have to be removed manually, logrot doesn't see tho...
[15:13:03] legoktm: ok, I think that’s what I want. I’m going to set up a polling update for a page, didn’t want to fill up my drive :)
[15:13:07] well, for many pages :/
[15:13:34] legoktm: will the date for the latest revision change?
[15:13:57] andrewbogott: no, but I think the page_touched field will update [15:14:04] ok [15:14:09] sounds fine, I will try it [15:17:00] (03CR) 10Eevans: WIP cassandra: add ferm rules (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/197840 (https://phabricator.wikimedia.org/T92680) (owner: 10Dzahn) [15:17:06] (03PS1) 10Giuseppe Lavagetto: dhcp: add mw2098 [puppet] - 10https://gerrit.wikimedia.org/r/198236 [15:17:26] (03PS2) 10Giuseppe Lavagetto: dhcp: add mw2098 [puppet] - 10https://gerrit.wikimedia.org/r/198236 [15:17:34] (03CR) 10Andrew Bogott: "One inline comment, but overall this looks fine to me." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: 10JanZerebecki) [15:17:47] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] dhcp: add mw2098 [puppet] - 10https://gerrit.wikimedia.org/r/198236 (owner: 10Giuseppe Lavagetto) [15:18:59] (03CR) 10Andrew Bogott: "I'd also like a bit more context. Is the issue that wikidata source is stored in other (multiple) repos, and the extension is built by mi" [puppet] - 10https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: 10JanZerebecki) [15:19:51] (03CR) 10Eevans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [15:21:38] I desperately need to catch up on all that [15:21:41] gah [15:24:41] (03PS1) 10Andrew Bogott: Turn on the nova auditing facility. [puppet] - 10https://gerrit.wikimedia.org/r/198237 (https://phabricator.wikimedia.org/T52620) [15:25:03] 6operations, 10Continuous-Integration, 6Labs: Evaluate options to make puppet errors more visible - https://phabricator.wikimedia.org/T92710#1135516 (10akosiaris) >>! In T92710#1135275, @scfc wrote: > Note that this task relates to Labs instances where Icinga can't be used. @yuvipanda had to jump through a... [15:26:29] (03PS2) 10Andrew Bogott: Turn on the nova auditing facility. 
[puppet] - 10https://gerrit.wikimedia.org/r/198237 (https://phabricator.wikimedia.org/T52620) [15:30:19] (03CR) 10Andrew Bogott: [C: 032] Turn on the nova auditing facility. [puppet] - 10https://gerrit.wikimedia.org/r/198237 (https://phabricator.wikimedia.org/T52620) (owner: 10Andrew Bogott) [15:38:35] PROBLEM - puppet last run on amssq37 is CRITICAL: CRITICAL: puppet fail [15:48:51] (03PS10) 10BBlack: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 [15:49:40] (03CR) 10jenkins-bot: [V: 04-1] nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 (owner: 10BBlack) [15:50:13] fu jenkins, you're never supportive :P [15:50:33] (03PS7) 10Yuvipanda: Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [15:50:40] (03PS1) 10Giuseppe Lavagetto: dhcp: correct entry for mw2179 [puppet] - 10https://gerrit.wikimedia.org/r/198240 [15:50:42] (03CR) 10Yuvipanda: [C: 032 V: 032] Adjust RESTBase / Cassandra settings for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/197662 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [15:50:46] (03PS5) 10Alexandros Kosiaris: Package builder module [puppet] - 10https://gerrit.wikimedia.org/r/194471 [15:51:40] cheers YuviPanda! 
[15:51:58] mobrovac: :) [15:52:08] !log Deployed patch for T93365 [15:52:10] (03CR) 10Alex Monk: Activate RESTBase in the Beta Cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [15:52:11] Logged the message, Master [15:52:21] (03PS2) 10Giuseppe Lavagetto: dhcp: correct entry for mw2179 [puppet] - 10https://gerrit.wikimedia.org/r/198240 [15:52:23] (03PS1) 10Giuseppe Lavagetto: dhcp: correct entry for mw2184 [puppet] - 10https://gerrit.wikimedia.org/r/198241 [15:54:45] (03CR) 10Alex Monk: Activate RESTBase in the Beta Cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [15:55:03] (03Abandoned) 10Yuvipanda: Revert "Revert "Hiera-ize most of the Elasticsearch config"" [puppet] - 10https://gerrit.wikimedia.org/r/197522 (owner: 10Yuvipanda) [15:56:26] (03PS1) 10Steinsplitter: Whitelisting domain for Nordiska museet to allow GWT upload [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198242 (https://phabricator.wikimedia.org/T93104) [15:56:53] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [15:58:29] 6operations, 10ops-codfw: mw2088 has a faulty RAM - https://phabricator.wikimedia.org/T93370#1135584 (10Joe) 3NEW [15:58:56] (03CR) 10JanZerebecki: "This is the output that was generated when I last ran the cron job manually: https://gerrit.wikimedia.org/r/#/c/198234/ (Which then someon" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: 10JanZerebecki) [16:00:54] RECOVERY - Host mw2003 is UP: PING OK - Packet loss = 0%, RTA = 43.97 ms [16:03:02] (03CR) 10Andrew Bogott: [C: 031] Wikidata builder (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195567 (https://phabricator.wikimedia.org/T90567) (owner: 10JanZerebecki) [16:03:07] (03PS2) 
10Filippo Giunchedi: install-server: upgrade kernel on swift HP machines [puppet] - 10https://gerrit.wikimedia.org/r/198227 (https://phabricator.wikimedia.org/T90922) [16:03:30] (03PS1) 10Giuseppe Lavagetto: dhcp: correct entry for mw2195 [puppet] - 10https://gerrit.wikimedia.org/r/198244 [16:04:38] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] dhcp: correct entry for mw2179 [puppet] - 10https://gerrit.wikimedia.org/r/198240 (owner: 10Giuseppe Lavagetto) [16:05:33] (03CR) 10Giuseppe Lavagetto: [C: 032] dhcp: correct entry for mw2184 [puppet] - 10https://gerrit.wikimedia.org/r/198241 (owner: 10Giuseppe Lavagetto) [16:06:06] (03CR) 10Giuseppe Lavagetto: [C: 032] dhcp: correct entry for mw2195 [puppet] - 10https://gerrit.wikimedia.org/r/198244 (owner: 10Giuseppe Lavagetto) [16:06:29] phabricator is highly slow .... [16:06:34] from europe [16:08:51] 6operations: Add Yana to contracts@ - https://phabricator.wikimedia.org/T91269#1135613 (10Chip) @Aklapper No, there aren't. [16:09:43] greg-g: this is sort of related to my panic yesterday (minus the panic)… when is silver likely to be switched over to 1.25wmf22? Does that group lag by a particular amount of time? [16:10:34] (03CR) 1001tonythomas: "Holding it till JGreen is back from vacs so that we can test the change after rollout" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198220 (https://phabricator.wikimedia.org/T92877) (owner: 1001tonythomas) [16:10:40] andrewbogott: we determined it (silver) was in group1 (non-wikipedias) right? [16:10:52] I think so [16:10:53] andrewbogott: if so, then it's Tuesday at 11am pacific [16:10:58] cool [16:11:18] andrewbogott: see mentions of "group1" here: https://wikitech.wikimedia.org/wiki/Deployments [16:11:34] I’m looking through old bugs, a few bugs have pending patches in 22 [16:11:45] which means it’ll probably break, I guess :( [16:11:49] * andrewbogott makes a calendar note [16:12:50] bugtracker should be one of the most stable things. 
[16:13:41] greg-g, andrewbogott: it was group1, yes [16:14:15] Steinsplitter: It's fast for me... maybe network related? [16:14:15] 10Ops-Access-Requests, 6operations, 6Phabricator, 6Release-Engineering, 5Patch-For-Review: Mukunda needs sudo on iridium (phab host) - https://phabricator.wikimedia.org/T93151#1135629 (10Dzahn) a:5mmodell>3Gage assigning to Gage because he is on Clinic Duty from Monday [16:14:43] andrewbogott, it should be easy enough to defer it to group2... which is like the next day or something, IIRC [16:14:58] Krenair: nope, Tuesday is fine, I’ll just keep an eye out [16:15:10] hoo: strange, it is only phab. might be possible [16:15:11] it gets updated in these commits for group1: https://gerrit.wikimedia.org/r/#/c/197382/1/wikiversions.json [16:15:17] (ctrl+f 'labswiki') [16:16:52] https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#1._Create_group1_wikis_to_VERSION_patch [16:18:31] 6operations, 10ops-codfw, 6Labs: rack and connect labstore-array4-codfw in codfw - https://phabricator.wikimedia.org/T93215#1135632 (10mark) [16:19:38] putting wikitech in group2 would be kind of a pain for whoever is running the train [16:19:48] group1 is easiest [16:20:03] because group1 == all - wikipedias [16:21:29] (03PS2) 10Alex Monk: Use elseif instead of else if [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192894 (owner: 10Southparkfan) [16:25:07] 6operations, 10ops-codfw: suhail/wmf5817 - relabel system / relocate / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93284#1135654 (10Papaul) 5Open>3Resolved Relocation from B5 to A5 complete Rack table update physical label in place mgmt. and Bios settings complete Test complete s... 
[16:25:07] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1135656 (10Papaul) [16:29:11] 6operations, 10ops-codfw: subra/wmf5816 - relabel system / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93272#1135665 (10Papaul) 5Open>3Resolved Rack table update physical label in place mgmt. and Bios settings complete Test complete subra 10.193.2.163 ge-5/0/11 B5 [16:29:12] 6operations: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1135667 (10Papaul) [16:31:24] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [16:35:02] (03PS1) 10Dzahn: add hosts subra and suhail to DHCP (poolcounter) [puppet] - 10https://gerrit.wikimedia.org/r/198245 (https://phabricator.wikimedia.org/T93261) [16:35:58] papaul: thanks! [16:37:05] mutante: you welcome [16:39:01] (03PS3) 10Alex Monk: Remove unused variables and commented-out code from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/156078 (https://bugzilla.wikimedia.org/29902) (owner: 10Withoutaname) [16:44:46] (03PS2) 10Dzahn: add hosts subra and suhail to DHCP (poolcounter) [puppet] - 10https://gerrit.wikimedia.org/r/198245 (https://phabricator.wikimedia.org/T93261) [16:47:25] (03CR) 10Dzahn: [C: 032] add hosts subra and suhail to DHCP (poolcounter) [puppet] - 10https://gerrit.wikimedia.org/r/198245 (https://phabricator.wikimedia.org/T93261) (owner: 10Dzahn) [16:50:55] Hm… is it generally preferred to put service monitoring in the role class or the service class? 
[16:51:04] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [16:51:35] (03PS5) 10Nuria: Adding a Last-Access cookie to text and mobile requests [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) [16:52:18] andrewbogott: I've seen it mostly in the service unless its role specific [16:52:19] (03CR) 10BryanDavis: [C: 031] "This seems like a good way to start." [puppet] - 10https://gerrit.wikimedia.org/r/197885 (owner: 10Giuseppe Lavagetto) [16:55:10] (03PS1) 10Andrew Bogott: Icinga monitoring for nova-compute process. [puppet] - 10https://gerrit.wikimedia.org/r/198249 [16:55:22] JohnFLewis: Since you were the first to respond… ^ [16:55:28] (03PS1) 10Dzahn: add subra,suhail to netboot. partman/raid1-1part [puppet] - 10https://gerrit.wikimedia.org/r/198250 (https://phabricator.wikimedia.org/T93261) [16:55:54] (03PS2) 10Andrew Bogott: Icinga monitoring for nova-compute process. [puppet] - 10https://gerrit.wikimedia.org/r/198249 (https://phabricator.wikimedia.org/T90784) [16:56:06] 6operations, 7HTTPS, 7Performance, 7notice, 7user-notice: Support SPDY - https://phabricator.wikimedia.org/T35890#1135743 (10Nemo_bis) [16:56:45] (03CR) 10Dzahn: [C: 032] add subra,suhail to netboot. partman/raid1-1part [puppet] - 10https://gerrit.wikimedia.org/r/198250 (https://phabricator.wikimedia.org/T93261) (owner: 10Dzahn) [16:56:52] (03CR) 10John F. Lewis: [C: 031] "Look sane" [puppet] - 10https://gerrit.wikimedia.org/r/198249 (https://phabricator.wikimedia.org/T90784) (owner: 10Andrew Bogott) [16:59:07] YuviPanda: feel sane looking at an Apache change? [16:59:19] JohnFLewis: about to head off for then ight :( sorry! [16:59:22] (03CR) 10Andrew Bogott: [C: 032] Icinga monitoring for nova-compute process. 
[puppet] - 10https://gerrit.wikimedia.org/r/198249 (https://phabricator.wikimedia.org/T90784) (owner: 10Andrew Bogott) [16:59:26] JohnFLewis: I didn’t look at your other patches either... [16:59:41] sorry! Too much stuff happening, am moving continents in a week :) [17:00:12] YuviPanda: bah, it's alright. Who's next week's unlucky opsen? [17:00:21] not sure :) [17:00:47] Again, bah :p [17:01:35] (03PS11) 10BBlack: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 [17:02:03] (03CR) 10John F. Lewis: "Nope, will try and find an opsen to get this deployed when one feels like it. The only issue with this is that it's Apache :)" [puppet] - 10https://gerrit.wikimedia.org/r/185474 (https://phabricator.wikimedia.org/T87039) (owner: 10Glaisher) [17:02:05] JohnFLewis: gage is up for next week [17:02:18] (03CR) 10jenkins-bot: [V: 04-1] nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 (owner: 10BBlack) [17:02:54] andrewbogott: thanks, if only the ops meetings weren't shoved to the private cove I could have searched that myself ;) [17:03:46] In theory they are listed here, but the page is out of date. I will fix [17:03:51] https://wikitech.wikimedia.org/wiki/Ops_Clinic_Duty [17:04:01] (03PS12) 10BBlack: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 [17:04:13] I never saw it as an up-to-date source tbh [17:07:39] mutante: neon is erroring out on the team_services entry from this: https://gerrit.wikimedia.org/r/#/c/197717/ [17:07:47] Should i just yank that entry, or is that a typo of some sort? [17:08:42] andrewbogott: that should be a contact name that exists in the private repo [17:08:55] what's the error? says it doesn't know the contact? [17:09:11] maybe underscore is bad.. i'm checking [17:09:28] mutante: pre existing contact or new with that patch? 
[17:09:54] JohnFLewis: added a little before that patch [17:10:05] right [17:11:13] mutante: imho make it a hyphen as underscores look ugly :) [17:11:22] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [17:12:04] And from what I see, other cases use hyphens (irc-Wikidata etc.) [17:12:08] JohnFLewis: agreed, i just wanna know if that's the actual error [17:12:12] fixing [17:12:20] mutante: thanks [17:12:21] PROBLEM - puppet last run on mw2106 is CRITICAL: CRITICAL: Puppet has 1 failures [17:12:22] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: Puppet has 1 failures [17:12:25] kk [17:12:36] I hotfixed the file on neon until the next puppet run [17:12:54] andrewbogott: i ran puppet to see what the error is :p [17:13:14] well, my hotfix lasted long enough for me to test my patch so I’m happy [17:13:47] andrewbogott: i see puppet adding your stuff.. nova monitoring.. a bunch of it [17:14:08] should be around 10 things, one check per compute node [17:14:40] which, btw, i’m still interested in if you like/hate the way I made that icinga check, this is the first one I’ve done in ages. https://gerrit.wikimedia.org/r/#/c/198249/ [17:14:51] despite my having merged it out of curiosity [17:15:32] RECOVERY - Check status of defined EventLogging jobs on vanadium is OK: OK: All defined EventLogging jobs are runnning. [17:16:02] andrewbogott: I still don't get the role vs module definition exactly apart from I use the general rule 'is it specific or general' and let people correct me from there really :p [17:16:08] hold on, let me fix this first. 
i did in the private repo [17:18:06] (03PS1) 10Dzahn: icinga: fix team-services contact name [puppet] - 10https://gerrit.wikimedia.org/r/198253 [17:19:48] (03CR) 10Dzahn: [C: 032] icinga: fix team-services contact name [puppet] - 10https://gerrit.wikimedia.org/r/198253 (owner: 10Dzahn) [17:20:10] andrewbogott: you probably don't see your changes yet because the service reload failed even though it's in the configs [17:20:19] it should soon though [17:21:07] 6operations, 6Commons, 6Multimedia, 7HHVM, 5Patch-For-Review: Create an HHVM 3.6.0 package, adding Tim's streaming patch - https://phabricator.wikimedia.org/T93194#1135785 (10Eloquence) [17:21:12] mutante: yeah, I forced one through, I can wait for the others. [17:21:22] 6operations, 6Commons, 6Multimedia, 7HHVM, 5Patch-For-Review: Create an HHVM 3.6.0 package, adding Tim's streaming patch - https://phabricator.wikimedia.org/T93194#1131609 (10Eloquence) (Tag was inherited and not intentionally added, removing.) [17:22:02] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [17:25:19] andrewbogott: Total Errors: 0 . reloaded icinga and here they are [17:25:22] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=nova [17:25:39] mutante: splendid. [17:25:48] After lunch I’m going to add a bunch more like that [17:26:40] 7Puppet, 6Multimedia, 6Release-Engineering, 6Scrum-of-Scrums, and 2 others: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1135798 (10Tgr) >>! In T84956#1135430, @Gilles wrote: > If this is a route worth pursuing, even if we don't manage to package them all, should I create... [17:27:03] Although it’s one thing to test if nova processes are running, quite another to verify that they aren’t running but broken :( [17:27:37] !log re-inserted log users_to_rename rows [17:27:43] Logged the message, Master [17:28:07] err, I meant from logs* [17:28:18] andrewbogott: about the way you did them, it's good. 
the --ereg-argument-array etc. avoids that the check process counts itself, which happened in other process checks and leads to fake warnings where it detects 2 processes [17:28:41] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Puppet has 1 failures [17:28:57] and yes, nrpe::monitor_service .. *nod* [17:30:47] (03PS1) 10Filippo Giunchedi: eqiad-prod: add ms-be101[678] [software/swift-ring] - 10https://gerrit.wikimedia.org/r/198256 (https://phabricator.wikimedia.org/T1268) [17:31:32] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:32:13] andrewbogott: maybe use check_log to check for errors in logs. the plugin you already have. /usr/lib/nagios/plugins/check_log [17:32:59] i'm not sure though if they should be on the virt hosts or rather on the logserver [17:35:59] check_logfiles (http://labs.consol.de/nagios/check_logfiles/) is more advanced than the default check_log [17:38:05] that on the actual logserver could be used a for a ton of different checks [17:39:56] 6operations, 7HTTPS, 3HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1135851 (10MC8) [17:40:07] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1135852 (10Dzahn) [17:41:01] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10Dzahn) [17:41:13] (03CR) 10Eloquence: "Please add a Phabricator task & #roadmap item. Thank you!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196984 (owner: 10Jforrester) [17:44:01] (03CR) 10Greg Grossmeier: "Can you create a task for this and add to the #roadmap project? 
kthx" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196984 (owner: 10Jforrester) [17:45:12] (03PS2) 10Alex Monk: Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:45:17] !log subra,suhail - powercycling to BIOS [17:45:20] Logged the message, Master [17:46:50] (03PS3) 10Alex Monk: Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:47:52] 6operations, 10hardware-requests: codfw: (2) poolcounter systems - https://phabricator.wikimedia.org/T93266#1135884 (10Dzahn) when i ssh to suhail mgmt, i see it is still installed as "tola" and up and running. ? ``` ssh root@suhail.mgmt.codfw.wmnet root@suhail.mgmt.codfw.wmnet's password: /admin1-> console... [17:47:58] (03CR) 10GWicke: Activate RESTBase in the Beta Cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:50:21] (03CR) 10Mobrovac: Activate RESTBase in the Beta Cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:51:41] (03CR) 10GWicke: Activate RESTBase in the Beta Cluster (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [17:52:10] (03PS2) 10Jforrester: Enable VisualEditor by default on "phase 5" Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/196984 (https://phabricator.wikimedia.org/T93386) [17:54:44] !log killing tola, reinstall as suhail [17:54:50] Logged the message, Master [17:57:23] (03PS4) 10Alex Monk: Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [18:00:52] (03CR) 
10Mobrovac: [C: 031] Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [18:01:20] hey greg-g, can we deploy that? ^ [18:01:23] it should be a no-op for production [18:02:22] it touches a couple of files to abstract out a production IP, and sets up the beta cluster IP in the -labs edit [18:02:47] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1135967 (10Dzahn) on subra: ``` PXE-E61: Media test failure, check cable PXE-M0F: Exiting Broadcom PXE ROM. No boot device available. ``` on suhail: no DHCP offers... [18:07:07] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1135999 (10Dzahn) [18:07:08] 6operations, 10ops-codfw: subra/wmf5816 - relabel system / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93272#1135997 (10Dzahn) 5Resolved>3Open could you check the cable? i'm getting ``` PXE-E61: Media test failure, check cable PXE-M0F: Exiting Broadcom PXE ROM. No boot device ava... [18:13:57] 6operations, 10ops-codfw: suhail/wmf5817 - relabel system / relocate / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93284#1136071 (10Dzahn) server was still installed as "tola". confirmed IP address matches. i can't DHCP/PXEBOOT though. trying to send requests to carbon but carbon never... 
[18:14:25] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136076 (10Dzahn) [18:14:25] 6operations, 10ops-codfw: suhail/wmf5817 - relabel system / relocate / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93284#1136075 (10Dzahn) 5Resolved>3Open [18:16:11] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10Dzahn) [18:16:12] 6operations, 10hardware-requests: codfw: (2) poolcounter systems - https://phabricator.wikimedia.org/T93266#1136088 (10Dzahn) 5Resolved>3Open i cant PXE boot into an installer on either of the 2 boxes. one seems to be a cabling issue (reopened dc blocking task), the other might be switch vlan config. is th... [18:18:26] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136106 (10Dzahn) on suhail: carbon never receives any requests from suhail's MAC. suhail then fails with "no offers". is the VLAN config on the switch correct? [18:26:14] (03PS1) 10John F. Lewis: gdash: force ssl protocol [puppet] - 10https://gerrit.wikimedia.org/r/198268 [18:26:47] 6operations, 10ops-codfw: suhail/wmf5817 - relabel system / relocate / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93284#1136142 (10RobH) 5Open>3Resolved You couldn't pxe boot because the parent task T93261 hasn't had the network setup step done. this task doesnt cover, resolving [18:26:48] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10RobH) [18:29:43] 6operations, 10Analytics-EventLogging, 6Analytics-Kanban: EventLogging query strings are truncated to 1014 bytes by ?(varnishncsa? or udp packet size?) - https://phabricator.wikimedia.org/T91347#1136154 (10Ottomata) So, we have implemented a varnishkafka eventlogging endpoint that works well and dandy. But!... 
[18:29:53] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10RobH) [18:29:54] 6operations, 10ops-codfw: subra/wmf5816 - relabel system / setup mgmt / update racktables - https://phabricator.wikimedia.org/T93272#1136155 (10RobH) 5Open>3Resolved This may be due to the network port not being setup yet per parent task T93261. I'll set that up now, so resolving this task. (If it still... [18:31:05] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1133244 (10RobH) [18:31:30] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136165 (10RobH) a:3Dzahn I just finished setting up the network ports, so I'm now assignign this task to @dzahn for install. [18:34:22] 6operations, 10hardware-requests: codfw: (2) poolcounter systems - https://phabricator.wikimedia.org/T93266#1136174 (10RobH) 5Open>3Resolved I'm not sure why install issues would reopen this ticket. The hardware is allocated, any install items shoudl be addressed on T93261. resolving. [18:34:23] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136176 (10RobH) [18:37:49] RECOVERY - Graphite Carbon on graphite2001 is OK: OK: All defined Carbon jobs are runnning. [18:39:50] 6operations, 6WMF-Design, 10Wikimedia-General-or-Unknown, 7Design, 7Varnish: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1136199 (10greg) [18:42:07] (03CR) 10GWicke: [C: 031] Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [18:42:19] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [18:42:46] Krenair: did you reach greg-g re https://gerrit.wikimedia.org/r/#/c/198221/ ? 
[18:44:35] (03PS1) 10BBlack: depool cp1058 be [puppet] - 10https://gerrit.wikimedia.org/r/198272 [18:45:17] (03CR) 10BBlack: [C: 032 V: 032] depool cp1058 be [puppet] - 10https://gerrit.wikimedia.org/r/198272 (owner: 10BBlack) [18:46:55] (03CR) 10Odder: [C: 031] Whitelisting domain for Nordiska museet to allow GWT upload [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198242 (https://phabricator.wikimedia.org/T93104) (owner: 10Steinsplitter) [18:47:41] !log reinstalling cp1057 [18:47:46] Logged the message, Master [18:47:55] bad fingers :P [18:48:19] !log reinstalling cp1058 (ignore cp1057 message above, it's a typo!) [18:48:23] Logged the message, Master [18:56:41] 6operations: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#1136244 (10leila) I'm with you @QChris. The best way is for us to understand what kind of analytics the GLAM community needs. Once we know that, we can figure out how to aggregate this data in a way that c... [18:58:16] gwicke, nope [18:58:41] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136250 (10Dzahn) thanks. retried. confirmed, both subra and suhail now talk to carbon unlike before. But now: Mar 20 18:55:39 carbon dhcpd: DHCPDISCOVER from d4:ae:52:ad:62:75 via 10.192.16.2: netw... 
[19:02:57] I see him in the office, let me use that opportunity [19:04:14] Krenair: green light from greg [19:16:34] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: Puppet has 1 failures [19:18:11] (03PS1) 10RobH: wrong reverse entry for subra [dns] - 10https://gerrit.wikimedia.org/r/198282 [19:19:05] (03PS13) 10BBlack: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 [19:19:07] (03PS1) 10BBlack: repool cp1058 [puppet] - 10https://gerrit.wikimedia.org/r/198283 [19:19:20] (03CR) 10RobH: [C: 032] wrong reverse entry for subra [dns] - 10https://gerrit.wikimedia.org/r/198282 (owner: 10RobH) [19:19:22] (03CR) 10BBlack: [C: 032 V: 032] repool cp1058 [puppet] - 10https://gerrit.wikimedia.org/r/198283 (owner: 10BBlack) [19:21:20] (03PS2) 10BBlack: repool cp1058 [puppet] - 10https://gerrit.wikimedia.org/r/198283 [19:21:34] (03CR) 10BBlack: [C: 032 V: 032] repool cp1058 [puppet] - 10https://gerrit.wikimedia.org/r/198283 (owner: 10BBlack) [19:21:54] heh wrong hash in rebase there, that was confusing :) [19:23:25] (03CR) 10Alex Monk: [C: 032] Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [19:23:32] (03Merged) 10jenkins-bot: Activate RESTBase in the Beta Cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [19:24:13] (03PS1) 10Andrew Bogott: Add process monitoring for nova services. [puppet] - 10https://gerrit.wikimedia.org/r/198285 [19:24:26] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:15] (03PS2) 10Andrew Bogott: Add process monitoring for nova services. 
[puppet] - 10https://gerrit.wikimedia.org/r/198285 (https://phabricator.wikimedia.org/T90784) [19:26:57] (03PS1) 10RobH: My initial patch had subra/suhail rows reversed from reality [dns] - 10https://gerrit.wikimedia.org/r/198286 (https://phabricator.wikimedia.org/T93261) [19:27:23] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/198221/ for beta - should be a noop in prod (duration: 00m 08s) [19:27:26] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:27:28] Logged the message, Master [19:27:48] (03CR) 10Andrew Bogott: [C: 032] Add process monitoring for nova services. [puppet] - 10https://gerrit.wikimedia.org/r/198285 (https://phabricator.wikimedia.org/T90784) (owner: 10Andrew Bogott) [19:27:59] (03CR) 10RobH: [C: 032] My initial patch had subra/suhail rows reversed from reality [dns] - 10https://gerrit.wikimedia.org/r/198286 (https://phabricator.wikimedia.org/T93261) (owner: 10RobH) [19:28:07] !log krenair Synchronized wmf-config: rv (duration: 00m 07s) [19:28:10] Logged the message, Master [19:28:12] gwicke, hmm, wtf [19:28:19] (03CR) 10Dzahn: "+1, should work just like https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=nova works already" [puppet] - 10https://gerrit.wikimedia.org/r/198285 (https://phabricator.wikimedia.org/T90784) (owner: 10Andrew Bogott) [19:28:42] Krenair: hmm? [19:29:43] gwicke, it didn't seem cause any issues that I could see, but "Undefined variable: wmgRestbaseServer" started appearing in fatalmonitor [19:29:48] even though we set it [19:29:54] (03PS1) 10BBlack: get rid of cache jessie conditionals [puppet] - 10https://gerrit.wikimedia.org/r/198289 [19:30:28] Krenair: did you revert? [19:30:31] yes [19:30:43] okay [19:30:50] by checking out head^ and syncing [19:30:56] but why did that happen? :/ [19:32:28] $wgRestbaseServer or $wmgRestbaseServer? 
[19:33:06] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:33:14] gwicke, $wmgRestbaseServer [19:35:35] (03CR) 10BBlack: [C: 032] get rid of cache jessie conditionals [puppet] - 10https://gerrit.wikimedia.org/r/198289 (owner: 10BBlack) [19:36:07] (03PS1) 10Hoo man: Replicate centralauth table in dbstores [puppet] - 10https://gerrit.wikimedia.org/r/198292 [19:36:37] * tables, doh [19:37:17] (03PS2) 10Hoo man: Replicate centralauth tables in dbstores [puppet] - 10https://gerrit.wikimedia.org/r/198292 [19:40:34] Krenair: to me it looks like wmgRestbaseServer was only set for labs [19:41:08] gwicke, no... https://gerrit.wikimedia.org/r/#/c/198221/4/wmf-config/InitialiseSettings.php [19:41:13] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136499 (10Dzahn) suhail works now and boots into the installer, the next problem it faces is the partman recipe: ``` ┌────────────────────┤ [!!] Partition disks ├─────────────────────┐... [19:42:59] !log krenair Synchronized wmf-config: retry, think that was a caching issue? (duration: 00m 07s) [19:43:04] Logged the message, Master [19:45:24] !log krenair Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s) [19:45:28] Logged the message, Master [19:46:11] gwicke, wtf, I just touched that and it worked? [19:46:14] weird [19:46:35] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1136502 (10Dzahn) ``` /lib/partman/init.d/25md-devices: ******************************************************* /lib/partman/init.d/30parted: ******************************************************* pa... [19:47:04] Krenair: that is indeed weird [19:47:16] the error is not showing up in new log entries [19:51:50] gwicke, is that actually running in beta? [19:52:03] Krenair: what is 'that'? 
[19:52:04] I'm not able to curl "http://10.68.17.227:7231" from deployment-bastion [19:52:21] I am able to do the production equivalent from tin [19:53:59] it looks stuck [19:54:49] The last Puppet run was at Fri Mar 20 12:29:04 UTC 2015 (444 minutes ago). [19:54:56] before https://gerrit.wikimedia.org/r/#/c/197662/ [19:55:49] Krenair: fixed it [19:56:53] (03PS1) 10Dzahn: netboot: raid1-lvm partman recipe for subra/suhail [puppet] - 10https://gerrit.wikimedia.org/r/198295 (https://phabricator.wikimedia.org/T93261) [19:57:01] gwicke, apparently VE receives an HTTP 500 from it [19:57:49] (03CR) 10Dzahn: [C: 032] netboot: raid1-lvm partman recipe for subra/suhail [puppet] - 10https://gerrit.wikimedia.org/r/198295 (https://phabricator.wikimedia.org/T93261) (owner: 10Dzahn) [19:59:57] gwicke, [20:00:01] krenair@deployment-bastion:~$ curl "http://10.68.17.227:7231/en.wikipedia.beta.wmflabs.org/v1/page/html/Main_Page" [20:00:01] {"type":"https://restbase.org/errors/internal_error","method":"get","detail":"HTTPError: 500: internal_http_error"} [20:01:25] Krenair: that means that the request to parsoid might not work correctly [20:01:48] (03CR) 10Anomie: "I note the privacy implications of this are still being debated at T92977." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:02:50] nothing in deployment-restbase01:/var/log/restbase [20:03:00] the logging is all to logstash [20:03:00] krenair@deployment-restbase01:~$ curl "http://deployment-parsoid05.eqiad.wmflabs" [20:03:00] curl: (7) Failed to connect to deployment-parsoid05.eqiad.wmflabs port 80: Connection refused [20:03:03] yeah :( [20:03:12] hey, that's a good thing! [20:03:22] tis listening on port 8080 isn't it ? [20:04:18] 0.0.0.0:8000 14859/nodejs <-- Krenair [20:04:19] (03CR) 10Nuria: "I think -per ticket- all privacy concerns have been addressed, if not please so kind as to specify concerns on ticket. 
As far as I can see" [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:04:23] (03PS1) 10Eevans: disable nightly repairs [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/198297 (https://phabricator.wikimedia.org/T92355) [20:04:32] that's it, 8000 [20:04:45] so restbase is configured wrongly in deployment-prep? [20:04:53] possibly [20:04:59] (03CR) 10Nuria: "Also please look at: https://phabricator.wikimedia.org/T88813." [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:05:01] (03CR) 10Alex Monk: "Ran into some weirdness getting this synced. Production errors started coming up saying wmgRestbaseServer was not defined, even though thi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/198221 (https://phabricator.wikimedia.org/T91102) (owner: 10Mobrovac) [20:05:10] I havent looked at it, it might be using the production conf [20:05:36] Krenair: it looks like it is trying to use the production parsoid rather than the labs one [20:05:53] and the prod parsoid probably doesn't know about labs domains [20:06:21] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: automated invocation of Cassandra repair jobs - https://phabricator.wikimedia.org/T92355#1136561 (10Eevans) [20:06:37] if it was going to try prod parsoid it'd try using the internal IP though right? which wouldn't work from labs? [20:07:11] but why would it do that with "restbase::parsoid_uri": http://deployment-parsoid05.eqiad.wmflabs:8000 in hieradata? 
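The refused connection on port 80 above is what you would expect when a URL carries no explicit port. A minimal sketch of that defaulting, assuming the client follows the usual scheme defaults; `effective_port` and `DEFAULT_PORTS` are hypothetical names for illustration, not part of RESTBase or Parsoid:

```python
from urllib.parse import urlsplit

# Conventional default ports per scheme (assumption: the client uses these).
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit ":port" component.
    return parts.port or DEFAULT_PORTS[parts.scheme]

print(effective_port("http://deployment-parsoid05.eqiad.wmflabs"))
print(effective_port("http://deployment-parsoid05.eqiad.wmflabs:8000"))
```

With the host names from the log, the first URL resolves to port 80 and the second to port 8000, which is where the nodejs process was listening.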
[20:07:37] I don't see that in hieradata/labs/restbase/common.yaml [20:07:49] https://gerrit.wikimedia.org/r/#/c/197662/7/hieradata/labs/deployment-prep/common.yaml [20:08:49] hm, yeah [20:09:18] (03CR) 10Anomie: "> I think -per ticket- all privacy concerns have been addressed, if not please so kind as to specify concerns on ticket" [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:10:04] Krenair: do you remember the logstash url for labs? [20:10:14] http://logstash.wmflabs.org/ doesn't work [20:10:37] https://logstash-beta.wmflabs.org/ [20:10:50] bd808: ah, thx! [20:11:08] I notice you seem to have a watch on 'logstash' ;) [20:11:40] aye. :) [20:12:19] (03CR) 10GWicke: [C: 031] disable nightly repairs [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/198297 (https://phabricator.wikimedia.org/T92355) (owner: 10Eevans) [20:12:25] (03CR) 10Nuria: "@Anomie" [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:12:28] godog: still around? [20:12:51] Krenair: the logs say 'connection refused' [20:13:38] 6operations, 10RESTBase, 10RESTBase-Cassandra, 5Patch-For-Review: Cassandra compaction is getting behind - https://phabricator.wikimedia.org/T93140#1136610 (10Eevans) Update: pending compactions are now all trending downward. Monitoring will continue. {F102095} [20:14:52] Krenair: the config file only has parsoidHost: http://deployment-parsoid05.eqiad.wmflabs [20:14:54] no port [20:15:17] gwicke, okay. do we know where it's trying to connect to? [20:15:43] port 80 by default [20:16:25] isn't the port part of parsoidHost? [20:16:38] trying a puppet run [20:16:46] ok [20:17:51] ah, puppet was disabled, with a message to wait until https://gerrit.wikimedia.org/r/#/c/197662/ was merged (which it is now) [20:18:10] (03CR) 10Anomie: "Why don't you wait for csteipp to reply there, since it's his concern rather than mine?
I'm just pointing out that this patch is not as un" [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [20:18:20] (03PS1) 10Dzahn: netboot: use lvm partman recipe for subra/suhail [puppet] - 10https://gerrit.wikimedia.org/r/198304 [20:18:58] Krenair: try now [20:19:45] gwicke, looks good! [20:19:48] I was able to edit with it [20:19:53] awesome ;) [20:19:58] thanks for your help [20:20:17] * gwicke high-fives Krenair [20:25:42] Glaisher, am wondering if I should've used etherpad for P405 [20:26:11] oh well [20:27:54] (03CR) 10Dzahn: [C: 032] netboot: use lvm partman recipe for subra/suhail [puppet] - 10https://gerrit.wikimedia.org/r/198304 (owner: 10Dzahn) [20:34:48] (03PS1) 10John F. Lewis: misc: remove radon.eqiad.wmnet from cache.pp [puppet] - 10https://gerrit.wikimedia.org/r/198341 [20:35:23] mutante: ^ mind looking at that quickly? [20:37:53] PROBLEM - puppet last run on carbon is CRITICAL: CRITICAL: puppet fail [20:40:53] RECOVERY - puppet last run on carbon is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [20:43:41] (03CR) 10Ottomata: [C: 032 V: 032] disable nightly repairs [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/198297 (https://phabricator.wikimedia.org/T92355) (owner: 10Eevans) [20:45:53] (03PS2) 10Dzahn: misc: remove radon.eqiad.wmnet from cache.pp [puppet] - 10https://gerrit.wikimedia.org/r/198341 (owner: 10John F. 
Lewis) [20:47:13] (03PS1) 10Dzahn: add a .bash_profile for myself [puppet] - 10https://gerrit.wikimedia.org/r/198380 [20:48:28] (03PS1) 10GWicke: Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/198383 [20:48:36] ori: ^^ [20:49:04] (03PS2) 10Ori.livneh: Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/198383 (owner: 10GWicke) [20:49:15] (03CR) 10Ori.livneh: [C: 032 V: 032] Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/198383 (owner: 10GWicke) [20:49:20] (03CR) 10Dzahn: [C: 032] misc: remove radon.eqiad.wmnet from cache.pp [puppet] - 10https://gerrit.wikimedia.org/r/198341 (owner: 10John F. Lewis) [20:49:55] (03CR) 10Dzahn: [C: 032] add a .bash_profile for myself [puppet] - 10https://gerrit.wikimedia.org/r/198380 (owner: 10Dzahn) [20:50:07] (03CR) 10GWicke: "Prod update in https://gerrit.wikimedia.org/r/198383." [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/198297 (https://phabricator.wikimedia.org/T92355) (owner: 10Eevans) [20:50:18] mutante: merged your changes [20:50:32] ori: thanks re cassandra change! [20:50:59] ori: thanks [20:51:50] gwicke: no problem. i'd merge the patch to de-sub-modulify cassandra, but i'm a bit worried about it. does puppet manage cassandra (the service)? [20:52:22] no, we disabled automatic restarts [20:53:01] before we can merge the de-submodulification we'd have to update the patch to the latest submodule state [20:54:03] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures [20:54:32] (03PS6) 10Alexandros Kosiaris: Package builder module [puppet] - 10https://gerrit.wikimedia.org/r/194471 [20:56:36] (03PS1) 10John F. Lewis: misc: remove tungsten [puppet] - 10https://gerrit.wikimedia.org/r/198385 [20:56:49] (03PS2) 10John F. 
Lewis: misc: remove tungsten [puppet] - 10https://gerrit.wikimedia.org/r/198385 [20:57:03] mutante: ^ found another :) [20:57:19] (no more from what I can see so don't expect this 100 times over) [21:00:47] gwicke: i'm up for it if you want to do it [21:00:54] (03CR) 10Dzahn: [C: 032] "yep.it was: "gdash: move from tungsten to graphite1001"" [puppet] - 10https://gerrit.wikimedia.org/r/198385 (owner: 10John F. Lewis) [21:02:02] (03CR) 10Dzahn: [C: 032] phabricator: delete legalpad.yaml [puppet] - 10https://gerrit.wikimedia.org/r/197320 (owner: 10Dzahn) [21:02:23] ori, urandom: maybe Monday? [21:02:40] got some other things I want to get done first [21:02:44] sure [21:02:46] (03PS14) 10BBlack: protoproxy/sslcert/cache: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 (https://phabricator.wikimedia.org/T86666) [21:02:48] (03PS1) 10BBlack: add generic nrpe script check-fresh-files-in-dir.py [puppet] - 10https://gerrit.wikimedia.org/r/198387 [21:02:50] (03PS1) 10BBlack: test OCSP Stapling on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/198388 (https://phabricator.wikimedia.org/T86666) [21:03:01] (03PS1) 10Ori.livneh: remove ori weekend commit check [puppet] - 10https://gerrit.wikimedia.org/r/198389 [21:03:07] ori, gwicke: monday is fine [21:03:30] (03PS2) 10Ori.livneh: remove ori weekend commit check [puppet] - 10https://gerrit.wikimedia.org/r/198389 [21:03:37] (03CR) 10Ori.livneh: [C: 032 V: 032] remove ori weekend commit check [puppet] - 10https://gerrit.wikimedia.org/r/198389 (owner: 10Ori.livneh) [21:03:38] ori: you have to wait to merge that until saturday night [21:03:43] haha [21:03:45] too late :P [21:04:01] lol [21:04:12] (03CR) 10jenkins-bot: [V: 04-1] add generic nrpe script check-fresh-files-in-dir.py [puppet] - 10https://gerrit.wikimedia.org/r/198387 (owner: 10BBlack) [21:04:34] ;) [21:05:19] (CR) bblack: [C: -1] jenkins sucks [jenkins-bot] [21:05:21] maybe that should be a jenkins thing istead [21:05:26] the "it's 
weekend" [21:07:14] it's nitpicking me on pep8 now :P [21:07:43] PROBLEM - Ori committing changes on the weekend on palladium is CRITICAL: CRITICAL: Ori committed a change on a weekend [21:08:57] (03PS2) 10BBlack: add generic nrpe script check-fresh-files-in-dir.py [puppet] - 10https://gerrit.wikimedia.org/r/198387 [21:08:59] (03PS2) 10BBlack: test OCSP Stapling on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/198388 (https://phabricator.wikimedia.org/T86666) [21:09:01] (03PS15) 10BBlack: protoproxy/sslcert/cache: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 (https://phabricator.wikimedia.org/T86666) [21:10:01] (03PS1) 10Dzahn: exlude check-fresh-files-in-dir.py from pep8 checks [puppet] - 10https://gerrit.wikimedia.org/r/198390 [21:10:04] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:10:05] bblack: ^ [21:10:58] mutante: enh? why not just make the pep8 fixes? [21:11:17] which bblack just did, apparently [21:12:30] lol [21:12:35] (03Abandoned) 10Dzahn: exlude check-fresh-files-in-dir.py from pep8 checks [puppet] - 10https://gerrit.wikimedia.org/r/198390 (owner: 10Dzahn) [21:12:38] and so he did [21:14:54] so what's the deal with how to get automatic Patch-for-Review into a phab task? 
[21:15:15] put Task: 12345 in the commit message [21:15:25] I've noticed it seems to do it if I put the TXXXX in the commit title, and maybe with "Fixes: TXXXX" [21:15:33] but not "Bug: " or no-prefix [21:15:50] we should doc what it parses for somewhere (or maybe we do and I just never found it) [21:16:27] (03PS3) 10BBlack: test OCSP Stapling on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/198388 [21:16:30] (03PS16) 10BBlack: protoproxy/sslcert/cache: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 [21:17:44] still nada for https://gerrit.wikimedia.org/r/#/c/198110/ + https://phabricator.wikimedia.org/T86666 [21:18:00] (with Task:) [21:19:05] (03PS17) 10John F. Lewis: protoproxy/sslcert/cache: nginx ssl_stapling_file support [puppet] - 10https://gerrit.wikimedia.org/r/198110 (https://phabricator.wikimedia.org/T86666) (owner: 10BBlack) [21:19:16] and actually, with "Bug: ", grrrit-wm noted the phab task on the end of the PS15 message above, whereas with "Task: ", it didn't on PS16 [21:19:50] bblack: I changed it to bug (hope you don't mind) [21:20:04] that's fine, that's what it was one PS earlier too [21:20:11] heh [21:20:33] bug should work though :/ [21:20:35] either way, there's nothing posted to the actual Task, although the Bug: variant does affect grrrit-wm [21:20:39] it's Bug:T1234 [21:20:49] have to get rid of the space you mean? 
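A minimal sketch of the kind of commit-footer matching being discussed here. The actual rules the bot applies are not documented in this log, so the pattern below is an assumption: it accepts a `Bug:` footer with or without a space before the task ID and ignores a `Task:` footer, which matches what was observed on PS15 and PS16. `FOOTER_RE` and `task_ids` are hypothetical names.

```python
import re

# Hypothetical pattern, not the bot's real one: a line of the form
# "Bug: T1234" or "Bug:T1234", with nothing else on the line.
FOOTER_RE = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)

def task_ids(commit_message):
    """Return the task IDs named in Bug: footers of a commit message."""
    return FOOTER_RE.findall(commit_message)

message = "nginx ssl_stapling_file support\n\nBug: T86666\n"
print(task_ids(message))
```

Under this assumed pattern, a task ID that appears only in the commit title, or behind a `Task:` prefix, yields an empty list.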
[21:21:02] the space shouldn't affect it [21:21:23] in fact it should be with the space as opposed to not as that just looks ugly :) [21:21:25] i just meant the bot only gets it if you do "Bug:" [21:21:48] not if it's just a T1234 which will link to phab but that's it [21:21:50] maybe it's something to do with the settings for my phab project/task itself or the ops/puppet repo [21:22:08] anyways, not that important, just annoying [21:22:11] mutante: what the commit has right now is correct though [21:22:47] it may not work if you edit only the commit message after it's already uploaded [21:22:54] maybe it doesn't like topic branches, too [21:23:07] I guess [21:23:09] who knows [21:23:24] topic branches i use all the time, but i usually edit it in gerrit web ui [21:23:37] bblack: it's not your responsibility to make it work so ignore it ;) [21:23:50] it should work now when you merge it [21:23:57] the ticket should get an update on merge [21:23:58] I use branches via cli and it works there [21:24:36] 6operations, 7HTTPS, 3HTTPS-by-default, 7Performance: HTTPS performance tuning - https://phabricator.wikimedia.org/T86666#1136874 (10BBlack) OCSP Stapling commits are here now: https://gerrit.wikimedia.org/r/#/q/status:open+project:operations/puppet+branch:production+topic:ocsp-stapling,n,z Holding for th... [21:25:11] afaict it updates the ticket on initial upload and on merge, but not when just new patchsets are uploaded, so you have to have the "Bug:T1234" in there in PS1 [21:25:57] yeah that makes a sort-of sense [21:26:24] i suppose that is a feature because an update for each PS would be too much on the ticket [21:26:24] you wouldn't want it to spam the ticket on repeated PSX updates.
It would be nice if it would pick up the initial addition of the TXXXXX in PS>1 though [21:26:30] yes, that [21:28:56] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1136912 (10GWicke) [21:34:08] ori: thanks! [21:35:22] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1136938 (10GWicke) 3NEW [21:37:12] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1136945 (10GWicke) [21:38:30] <_joe_> ... [21:46:19] <^d> Krenair: Did we ever file a task about the eval() crap? [21:46:34] <^d> (regardless of people's opinions on the eval(), the errors aren't acceptable) [21:46:53] ^d don't think so [21:47:43] 388 Invalid argument: function: not string, closure, or array in /srv/mediawiki/php-1.25wmf21/includes/TemplateParser.php on line 203 [21:47:44] 202 error: syntax error, unexpected T_STRING in /srv/mediawiki/php-1.25wmf21/includes/TemplateParser.php(136) : eval()'d code on line 1 [21:48:53] The AbuseFilter 'article' thing is still further down the log and I have no idea how it happens. Maybe I was missing something obvious last time I checked that code. Might live hack some logging for it if I can't make sense of it otherwise. [21:49:11] kaldari, ^^^ [21:50:39] <^d> Krenair: https://phabricator.wikimedia.org/T93436 [21:51:37] yeah, saw. ty [21:54:55] ^d, do you know more about that error then? [21:55:05] like what code is actually the root of the issue? 
[21:55:12] <^d> No, but we could add some logging and find out [21:55:37] * ^d preps a livehack [21:56:48] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1137013 (10GWicke) 3NEW [21:56:56] Krenair: I’m looking into it [21:57:13] AF or TemplateParser? [21:57:34] TemplateParser [21:59:15] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 11s) [21:59:20] !log demon Synchronized php-1.25wmf21/includes/TemplateParser.php: (no message) (duration: 00m 05s) [21:59:23] Logged the message, Master [21:59:25] Logged the message, Master [22:00:37] <^d> Krenair, kaldari: I only see one template being eval()'d, it's the checkbox.mustache from MF [22:01:08] ^d: yeah and it seems to actually be working on en.wiki [22:01:28] the template for that is super simple [22:02:05] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137054 (10JohnLewis) 3NEW a:3Dzahn [22:02:10] ^d: I’m adding some more debugging code [22:02:20] <^d> mmk [22:02:58] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137062 (10jeremyb) [22:03:53] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137054 (10jeremyb) [22:03:57] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1137065 (10GWicke) [22:04:30] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1136887 (10GWicke) [22:05:16] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Streamline our service development and deployment process - 
https://phabricator.wikimedia.org/T93428#1136887 (10GWicke) [22:05:54] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Streamline our service development and deployment process - https://phabricator.wikimedia.org/T93428#1136887 (10GWicke) [22:06:13] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137073 (10Aklapper) That goes together with the Watchmouse notification on ops@ ("ALERT! Phabricator: Not matched"; Date: Fri, 13 Mar 2015 00:06UTC) which did not see a "[Ops] OKAY Pha... [22:06:59] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137077 (10Dzahn) This is Watchmouse. It checks https://phabricator.wikimedia.org/T2001 and looks for the string "Reference to bz1". It was a random test string that i expected was good... [22:07:58] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1137079 (10GWicke) [22:08:12] ^d, Krenair: https://gerrit.wikimedia.org/r/#/c/198159/6 [22:09:00] <^d> That should stop the one error and make it an exception :) [22:09:04] <^d> Nicer failure [22:09:19] true [22:09:24] we'd get dbname, stack trace, etc. then [22:09:42] ^d: I’m not sure how to debug the eval error other than precompiling it locally and taking a look. [22:09:57] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137084 (10Aklapper) I still see "Reference bz1" in that task. Got more info how that's exactly queried by Watchmouse? 
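A minimal sketch of why a content-match check of this kind can report "down" while the page loads fine. How Watchmouse actually queries the page is not shown in this log; the plain substring test and the `content_check` helper below are assumptions for illustration only.

```python
def content_check(body, expected):
    """Return True when the expected marker string appears in the page body."""
    return expected in body

# Stand-in for the fetched page text; the real page content is not in the log.
page_body = "T2001 ... Reference bz1 ..."

# The configured marker differs from the page text by one word.
print(content_check(page_body, "Reference to bz1"))
print(content_check(page_body, "Reference bz1"))
```

The first check fails and the second passes, so a marker string that drifts from the page text by a single word is enough to trip the alert.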
[22:10:12] <^d> kaldari: $someone the other day suggested writing a unit test that compiles the template and make sure it doesn't boom [22:10:27] ^d: that’s a good idea [22:10:30] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137085 (10Dzahn) I changed the test string to "docs are teh suck". This should fix it :) [22:12:41] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137086 (10Dzahn) Andre, subtle difference between "Reference to bz1" and "Reference bz1" since a recent version upgrade? [22:13:13] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137088 (10jeremyb) >>! In T93443#1137084, @Aklapper wrote: > I still see "Reference bz1" in that task. me too. and it doesn't matter if logged in. $ curl -vsSL 2>&1 'https://phabri... [22:13:43] gwicke: seriously? [22:14:56] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1137090 (10GWicke) [22:16:07] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137110 (10JohnLewis) >>! In T93443#1137088, @jeremyb wrote: >>>! In T93443#1137084, @Aklapper wrote: >> I still see "Reference bz1" in that task. > > me too. and it doesn't matter if... [22:16:25] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137111 (10Dzahn) {F102172} [22:16:25] paravoid: what prompts your strong reaction? [22:17:50] ^d, do we have a fluorine equivalent in deployment-prep? [22:20:19] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137144 (10jeremyb) >>! Re T93443#1137110 (@JohnLewis): ugh ?! 
:-( :P [22:22:15] kaldari, ^d, Krenair: https://gerrit.wikimedia.org/r/#/c/198409/ [22:23:00] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137147 (10Dzahn) {F102178} ^ recovering , takes a bit because there are checks from many locations [22:23:06] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137148 (10jeremyb) @chasemp or @mmodell should decide on the string. (or another method. e.g. checking a path for a specific HTTP status) [22:23:11] <^d> Krenair: They get written to /a/project/something [22:23:20] <^d> Or /project/data rather [22:24:57] ^d: thanks [22:25:03] that seems to have a lot of crap in it [22:25:18] Invalid host name (www.es.wikipedia.beta.wmflabs.org) from MWMultiVersion [22:25:51] 6operations, 10Deployment-Systems, 6Release-Engineering, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1137154 (10GWicke) [22:26:37] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137155 (10Dzahn) checking actual content of an actual page is better than just checking for HTTP 200 imho. it can also be 200 and broken in surprising ways. [22:29:02] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137165 (10jeremyb) well there could be a path that does a health check instead of just loading one page. anyway, I don't know phab, I want to defer to the experts as I said. :) [22:30:53] ^d, I'm gonna deploy the throws patch [22:30:54] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137167 (10JohnLewis) I just accidentally found https://phabricator.wikimedia.org/status/ - not sure how that works but... [22:31:45] <^d> MaxSem: mmk. 
Let me revert out my hacks real quick too [22:33:24] !log demon Synchronized wmf-config/InitialiseSettings.php: (no message) (duration: 00m 07s) [22:33:27] Logged the message, Master [22:33:31] !log demon Synchronized php-1.25wmf21/includes/TemplateParser.php: (no message) (duration: 00m 07s) [22:33:34] Logged the message, Master [22:33:39] <^d> MaxSem: Ok out of your way [22:34:21] :) [22:37:33] !log maxsem Synchronized php-1.25wmf21/includes/TemplateParser.php: https://gerrit.wikimedia.org/r/#/c/198409/ (duration: 00m 06s) [22:37:39] Logged the message, Master [22:38:43] aha [22:38:49] kaldari, RuntimeException from line 138 of /srv/mediawiki/php-1.25wmf21/includes/TemplateParser.php: Requested template, checkbox, is not callable [22:39:26] MaxSem: crazy. I’ll take a look [22:39:44] 015-03-20 22:39:04 mw1074 igwiki: [67aff3e1] /w/index.php?title=Ih%C3%BC_k%C3%A1r%C3%ADr%C3%AD:MobileOptions&returnto=Ih%C3%BC+k%C3%A1r%C3%ADr%C3%AD%3AMobileMenu [22:39:57] that's where it comes from [22:40:19] (03PS2) 10Dzahn: gdash: add protocol redirect to https [puppet] - 10https://gerrit.wikimedia.org/r/198268 (owner: 10John F. Lewis) [22:41:46] (03CR) 10Dzahn: "yea, this is the standard snippet we used for http->https when behind misc-web" [puppet] - 10https://gerrit.wikimedia.org/r/198268 (owner: 10John F. Lewis) [22:43:07] paravoid: Hmm, can you subscribe me to https://phabricator.wikimedia.org/T92872 ? [22:44:55] Which machine is phab-01 running on? [22:45:31] <^d> Negative24: It's a labs vm [22:45:31] Negative24: a labs instance called phab-01 [22:45:57] Thanks. Its not showing up in Wikitech search [22:46:33] Do you think it would be hard to get ssh access to it if I requested it on phab? [22:46:52] the -01 was supposed to mean we can have many, and easily make -02 and -03 btw [22:47:10] mutante: Figured as much [22:47:30] Negative24: shouldn't be that hard [22:48:04] kaldari, can't repro locally, wft [22:48:34] mutante: Alright. 
I would like to be able to test something with the security extension (would that need sudo?) [22:48:50] MaxSem: I’m going to test more right after this meeting [22:50:19] Negative24: i dunno the sudo part but it should be possible because that's the purpose of labs to be able to test things. it could be that people ask you to use a phab-02 but hopefully that just means becoming a project member, firing up new instance and applying puppet role [22:51:27] (03PS1) 10Dzahn: gdash: mod_rewrite, mod_headers for proto redirect [puppet] - 10https://gerrit.wikimedia.org/r/198418 [22:51:28] mutante: it seems phab-01 goes under quite a bit of development wear and tear since it was just nuked last week and rebuilt [22:51:57] !log maxsem Synchronized php-1.25wmf21/extensions/MobileFrontend: touch (duration: 00m 09s) [22:52:03] Logged the message, Master [22:52:11] Negative24: yea, maybe you really just want to make phab-02, that's still asking to become a member of the project, but using a separate instance [22:52:35] Negative24: that way you know you can do what you want and not influence other people [22:53:11] it's the whole "treat instances like cattle, not like pets"-mantra labs has [22:53:30] but i'm afraid phab-01 is already a pet [22:53:44] mutante: Sure. What project is it [22:53:58] it should just be "Phabricator" [22:54:17] I'm sometimes getting the feeling that I'm hogging up WMF resources :) [22:54:33] how many instances just go unused? 
[22:55:16] good question but sometimes i see labs people cleanup and delete old instances [22:55:32] theoretically you are supposed to create , run tests and delete [22:55:38] PROBLEM - Apache HTTP on mw1244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:55:49] mutante: will do [22:56:29] PROBLEM - HHVM rendering on mw1244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:59:58] RECOVERY - Apache HTTP on mw1244 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.037 second response time [23:00:49] RECOVERY - HHVM rendering on mw1244 is OK: HTTP OK: HTTP/1.1 200 OK - 69409 bytes in 0.105 second response time [23:03:16] (03CR) 10Dzahn: [C: 032] gdash: mod_rewrite, mod_headers for proto redirect [puppet] - 10https://gerrit.wikimedia.org/r/198418 (owner: 10Dzahn) [23:05:57] (03CR) 10Dzahn: "@John the change is alright, but we also need to ensure the Apache modules are loaded that are used here. mod_rewrite and mod_headers -->" [puppet] - 10https://gerrit.wikimedia.org/r/198268 (owner: 10John F. Lewis) [23:08:11] !log gdash.wikimedia.org now enforcing protocol redirect to https [23:08:16] Logged the message, Master [23:18:43] (03PS1) 10Dzahn: color root shell in red [puppet] - 10https://gerrit.wikimedia.org/r/198425 [23:20:39] RECOVERY - gdash.wikimedia.org on graphite2001 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 473 bytes in 0.092 second response time [23:26:27] 6operations, 10Wikimedia-Shop, 5Patch-For-Review: Changing the URL for the Wikimedia Shop - https://phabricator.wikimedia.org/T92438#1137328 (10Dzahn) Shopify said: "I've started the process. I'll let you know when to point your CNAME." [23:28:15] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137334 (10Dzahn) It's reported as UP again on http://status.wikimedia.org/ Can we close it or do you want to discuss how to further improve it? 
@chase [23:29:14] 6operations, 10Deployment-Systems, 6Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1137345 (10greg) [23:29:25] 6operations, 6Phabricator, 7Monitoring: Phabricator reported down on status.wm.o - https://phabricator.wikimedia.org/T93443#1137353 (10Dzahn) p:5Normal>3Low Lowering priority because it's unbroken for now. [23:29:58] 6operations, 10Deployment-Systems, 6Services: Evaluate Docker as a container deployment tool - https://phabricator.wikimedia.org/T93439#1137361 (10greg) [23:33:52] 6operations, 6WMF-Design, 10Wikimedia-General-or-Unknown, 7Design, 7Varnish: Better WMF error pages - https://phabricator.wikimedia.org/T76560#1137418 (10Dzahn) also see https://gerrit.wikimedia.org/r/#/c/97190/ [23:39:28] (03PS1) 10GWicke: Enable group1 wikis in RESTBase [puppet] - 10https://gerrit.wikimedia.org/r/198433 [23:39:39] In wmf/1.25wmf20 I dont see .gitmodules anymore. I only see composer.json and it seem rather very thin compared to wmf/1.24wmfXX how do you guys manage extendions now on the wmf branch? [23:39:56] bd808, yt? [23:40:21] they're still submodules... [23:40:41] ok, i’ll look again [23:41:38] 6operations, 5Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1137454 (10Dzahn) Tried different partman recipes because they failed with different errors. https://gerrit.wikimedia.org/r/#/c/198304/ this is the one that should have worked on the identical hardw... [23:42:36] oh, right legoktm, I made the wrong click sequence in Gerrit. I clicked on a branch, then tree on top —which switched back to master branch. [23:42:46] (03CR) 10GWicke: "We should coordinate on merge & deploy for this. 
The keyspaces and columnfamilies (tables) for the new projects will be auto-created on st" [puppet] - https://gerrit.wikimedia.org/r/198433 (owner: GWicke)
[23:45:05] (PS2) GWicke: Enable group1 wikis in RESTBase [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452)
[23:46:23] gwicke, you're leaving out just aawik*?
[23:47:20] Krenair: yeah, mostly because it was squatting the very prominent pole position in the docs, but didn't have any content
[23:47:45] could do a pass to weed out other closed wikis
[23:48:13] (PS1) Dzahn: add subra/suhail to site.pp as codfw poolcounters [puppet] - https://gerrit.wikimedia.org/r/198437 (https://phabricator.wikimedia.org/T93261)
[23:48:17] (CR) Subramanya Sastry: Enable group1 wikis in RESTBase (1 comment) [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[23:48:35] so Abkhaz Wikipedia is in front instead?
[23:49:02] seems weird to decide arbitrarily that only aawik* will be excluded, but not the other closed wikis
[23:49:05] (CR) GWicke: Enable group1 wikis in RESTBase (1 comment) [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[23:49:13] (PS2) Dzahn: add subra/suhail to site.pp as codfw poolcounters [puppet] - https://gerrit.wikimedia.org/r/198437 (https://phabricator.wikimedia.org/T93261)
[23:51:43] (CR) GWicke: Enable group1 wikis in RESTBase (1 comment) [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[23:52:31] operations, Patch-For-Review: Setup poolcounter servers for codfw - https://phabricator.wikimedia.org/T93261#1137495 (Dzahn)
[23:53:44] (CR) Subramanya Sastry: Enable group1 wikis in RESTBase (1 comment) [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[23:56:08] !log suhail - new
install, signing puppet cert, initial run
[23:56:11] Logged the message, Master
[23:58:51] gwicke, private wikis are excluded from this?
[23:59:04] and *.wikimedia.org
[23:59:36] Krenair: all 'special' wikis are excluded