[00:00:05] RoanKattouw, ^d, Krenair, twkozlowski, Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150227T0000). Please do the needful.
[00:00:15] kaldari, ping for swat
[00:00:22] jouncebot: two Krenairs?
[00:00:32] Krenair: howdy
[00:00:46] yeah I'm listed for twwozlowski's stuff as well, MatmaRex
[00:01:00] i see, but it's still silly that it reports you double
[00:01:07] anyway. want me to submit a bump to master?
[00:01:30] MatmaRex, what branches should this be on? just 1.25wmf19?
[00:01:34] yes
[00:02:57] PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 1 down 1
[00:04:19] (CR) Alex Monk: [C: 2] Updating WikiGrok configs for new global vars [mediawiki-config] - https://gerrit.wikimedia.org/r/193283 (owner: Kaldari)
[00:05:17] ^demon|away: When you're not away, MatmaRex needs to be added to wmf ldap group so he can do things like merge submodule updates in branches. k thx bye :)
[00:05:20] (Merged) jenkins-bot: Updating WikiGrok configs for new global vars [mediawiki-config] - https://gerrit.wikimedia.org/r/193283 (owner: Kaldari)
[00:05:52] noooo
[00:05:58] bd808, I think that'd be the wmf deployment group?
[00:06:24] hmm... maybe
[00:06:37] MatmaRex: why don't you have deploy rights yet?
[00:06:50] it will make your life better I promise
[00:06:54] never asked for them, i guess
[00:06:59] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/#/c/193283/ (duration: 00m 06s)
[00:07:01] i break the site often enough without them, thanks
[00:07:04] the sites*
[00:07:05] Logged the message, Master
[00:07:06] kaldari, ^
[00:07:11] please check
[00:07:17] checking….
[00:07:26] MatmaRex: but with rights you can fix the site too. Very important
[00:07:43] doubly so
[00:07:45] my cherry-pick: https://gerrit.wikimedia.org/r/#/c/193300/
[00:08:02] can't just fast-forward, there's been a l10n update in the meantime
[00:08:19] that's mine :|
[00:08:22] oh, i'm late.
[00:08:37] Krenair: shit, looks like I caused a fatal: https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor
[00:08:59] just for mobile at least
[00:09:27] !log krenair Synchronized wmf-config: rv (duration: 00m 06s)
[00:09:31] Logged the message, Master
[00:10:27] RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 0 down 0
[00:10:44] Krenair: Did initializesettings also get synced?
[00:10:47] IRC just died :|
[00:10:56] (my IRC bouncer)
[00:11:22] looks like he synced the whole wmf-config dir
[00:12:01] yeah, should have got all 4 files under wmf-config
[00:12:50] weird, mobile.php is saying a bunch of wmg vars aren’t defined, but they were added to IntializeSettings.php at the same time: https://gerrit.wikimedia.org/r/#/c/193283/2/wmf-config/InitialiseSettings.php
[00:12:53] MaxSem: ^
[00:12:57] reverting
[00:13:06] * Krenair pokes logmsgbot
[00:13:33] ... good job znc.
[00:14:40] and still happening
[00:14:46] (PS1) BBlack: removing discard on jessie cache SSDs [puppet] - https://gerrit.wikimedia.org/r/193301
[00:14:48] Why is it still happening?!
[00:14:58] Notice: Undefined variable: wmgMFEnableWikiGrokForAnons in /srv/mediawiki/wmf-config/mobile.php on line 14
[00:15:20] try touching InitialiseSettings.php and resyncing
[00:15:38] the cache on the mw servers is based on timestamp.
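The remark above that the cache on the mw servers is based on timestamp can be shown with a toy model. The sketch below is a hypothetical, simplified cache written for illustration only; it is not HHVM's actual code cache. It re-reads a file only when the file's modification time changes. So if new bytes arrive and the mtime ends up where it was, the stale copy keeps being served until something like `touch` moves the timestamp. That is one plausible way a sync can "not take", assumed here rather than confirmed by the log.

```python
import os
import tempfile


class MtimeCache:
    """Toy cache that re-reads a file only when its mtime changes.

    Hypothetical model for illustration; not HHVM's real code cache.
    """

    def __init__(self):
        self._entries = {}  # path -> (mtime in ns, file contents)

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._entries.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # timestamp unchanged: serve the cached copy
        with open(path) as f:
            contents = f.read()
        self._entries[path] = (mtime, contents)
        return contents


def demo_stale_cache():
    old_ts, new_ts = 1_000_000_000, 2_000_000_000  # 1 s and 2 s after the epoch
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "InitialiseSettings.php")
        cache = MtimeCache()

        with open(path, "w") as f:
            f.write("old")
        os.utime(path, ns=(old_ts, old_ts))
        first = cache.load(path)

        # New contents arrive, but the mtime ends up where it was
        # (e.g. a copy that preserves the source timestamp).
        with open(path, "w") as f:
            f.write("new")
        os.utime(path, ns=(old_ts, old_ts))
        after_sync = cache.load(path)

        # Touching the file moves the mtime, so the next load re-reads it.
        os.utime(path, ns=(new_ts, new_ts))
        after_touch = cache.load(path)
    return first, after_sync, after_touch


print(demo_stale_cache())  # ('old', 'old', 'new')
```

The only input to the cache decision is the timestamp. That is why "touch and resync" works as a remedy even when the file's contents are already correct on disk.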
[00:15:54] !log krenair Synchronized wmf-config: touched initialsettings (duration: 00m 07s)
[00:16:01] Logged the message, Master
[00:16:12] (CR) BBlack: [C: 2] removing discard on jessie cache SSDs [puppet] - https://gerrit.wikimedia.org/r/193301 (owner: BBlack)
[00:16:14] sync-* and scap try to account for that but sometimes mysteriously it doesn't take
[00:16:28] I think that did it
[00:16:36] yeah looks better
[00:16:43] silly hhvm is silly
[00:17:07] maybe, the whole initial commit was right and it's just caching that screwed up?
[00:17:10] (PS1) Alex Monk: Revert "Updating WikiGrok configs for new global vars" [mediawiki-config] - https://gerrit.wikimedia.org/r/193302
[00:17:24] (CR) Alex Monk: [C: 2] Revert "Updating WikiGrok configs for new global vars" [mediawiki-config] - https://gerrit.wikimedia.org/r/193302 (owner: Alex Monk)
[00:17:25] MaxSem: Here’s the commit: https://gerrit.wikimedia.org/r/#/c/193283/
[00:17:28] (Merged) jenkins-bot: Revert "Updating WikiGrok configs for new global vars" [mediawiki-config] - https://gerrit.wikimedia.org/r/193302 (owner: Alex Monk)
[00:17:42] MaxSem: Do you see anything wrong with it?
[00:17:43] it's possible
[00:17:46] yeah. that's what i'm looking at
[00:18:12] you don't even need domain knowledge
[00:18:46] just a bunch of vars renamed, seemingly correctly
[00:19:20] git grep on the patch looks ok to me
[00:19:46] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/).
[00:19:52] MaxSem: I guess I’ll need to do it as a 2-part process to avoid any caching issues
[00:20:17] (PS1) Krinkle: contint: Create /srv/deployment as 755 instead of 700 [puppet] - https://gerrit.wikimedia.org/r/193303 (https://phabricator.wikimedia.org/T87843)
[00:20:19] I'd try again with the sync-dir followed basically immediately by a touch and sync-file of InitialiseSettings
[00:20:42] even though that stuff shows in fatalmonitor it's not really fatal
[00:20:59] it's just a notice whine that an undef variable was used
[00:21:04] Krenair: Do you want to try bd808’s suggestion or should I rewrite the patch as a 2-step patch?
[00:21:57] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:22:33] ok...
[00:22:35] * MaxSem repeats his old mantra of find -type f -exec touch {} \;
[00:22:36] (PS1) Alex Monk: Revert "Revert "Updating WikiGrok configs for new global vars"" [mediawiki-config] - https://gerrit.wikimedia.org/r/193305
[00:22:53] (CR) Alex Monk: [C: 2] Revert "Revert "Updating WikiGrok configs for new global vars"" [mediawiki-config] - https://gerrit.wikimedia.org/r/193305 (owner: Alex Monk)
[00:22:57] (Merged) jenkins-bot: Revert "Revert "Updating WikiGrok configs for new global vars"" [mediawiki-config] - https://gerrit.wikimedia.org/r/193305 (owner: Alex Monk)
[00:23:06] * MaxSem grabs popcorn
[00:23:58] kaldari, hang on
[00:24:07] when did you do this rename?
[00:24:33] The patch is from today
[00:25:08] Because I did a quick check of MobileFrontend wmf18 and it appears to have the old names ($wgMFEnableWikiGrok)
[00:25:55] (PS3) Ori.livneh: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255
[00:26:50] Krenair|temp: ah, I guess the old code is still on enwiki. Forgot about that. I’ll rewrite it to keep the old vars for now as well…
[00:27:23] Ok, I'll re-revert the change and we'll leave it for now?
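MaxSem's `find -type f -exec touch {} \;` sets the mtime of every regular file under the current directory to the current time. Against a timestamp-keyed cache, that forces a re-read of the whole tree, not only InitialiseSettings.php. Below is a rough Python equivalent, for illustration only. The function and demo names are hypothetical. The function skips symlinks because `find -type f` without `-L` does not match them.

```python
import os
import tempfile


def touch_all_files(root):
    r"""Set the mtime of every regular file under root to now.

    Rough equivalent of `find ROOT -type f -exec touch {} \;`.
    Symlinks are skipped, since `find -type f` (without -L) does not
    match them. Returns the number of files touched.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            os.utime(path, None)  # None: set atime and mtime to the current time
            count += 1
    return count


def demo_touch_all():
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "wmf-config"))
        files = [os.path.join(d, "wmf-config", name)
                 for name in ("InitialiseSettings.php", "mobile.php")]
        for path in files:
            open(path, "w").close()
            os.utime(path, ns=(0, 0))  # start both files at the epoch
        os.symlink(files[0], os.path.join(d, "link.php"))
        touched = touch_all_files(d)
        bumped = all(os.stat(p).st_mtime_ns > 0 for p in files)
    return touched, bumped


print(demo_touch_all())  # (2, True)
```

Touching everything is blunt but cheap. The narrower alternative suggested at 00:20:19 is a sync-dir followed by a touch and sync-file of just InitialiseSettings.php.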
[00:27:25] (PS1) MaxSem: Revert "Revert "Revert "Updating WikiGrok configs for new global vars""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193307
[00:27:31] Krenair|temp: Doesn’t matter too much since the feature is currently disabled, but would be good to avoid the warnings and such
[00:27:38] (CR) MaxSem: [C: 2] Revert "Revert "Revert "Updating WikiGrok configs for new global vars""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193307 (owner: MaxSem)
[00:27:43] (Merged) jenkins-bot: Revert "Revert "Revert "Updating WikiGrok configs for new global vars""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193307 (owner: MaxSem)
[00:27:57] Revert “Revert “Revert “Revert “Revert “Revert… :)
[00:28:02] you putting that on tin, MaxSem?
[00:28:05] (CR) BBlack: [C: 2] contint: Create /srv/deployment as 755 instead of 700 [puppet] - https://gerrit.wikimedia.org/r/193303 (https://phabricator.wikimedia.org/T87843) (owner: Krinkle)
[00:28:09] mmm, still Asher used longer revert strings
[00:28:29] (PS2) Ori.livneh: Switch EventLogging's MariaDB consumer to m4-master [puppet] - https://gerrit.wikimedia.org/r/193135
[00:28:46] just pulled it, no actual changes
[00:28:57] yeah, that was expected
[00:29:09] I didn't merge it onto tin
[00:30:34] bblack: Hm.. so if I would want to do a manual pupet run the same way the automatic cron does it. How would I do that?
[00:31:13] umask 002 sudo pupet agent -t? Or is there a way I can make it use the same envrionment setup that cron/puppet will use?
[00:31:27] MatmaRex, will do the vector change
[00:32:02] aight
[00:32:03] (PS1) Kaldari: Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308
[00:32:12] Krinkle: /usr/local/sbin/puppet-run
[00:32:14] bd808: Perhaps sudo -i ?
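The Gerrit titles in this stretch nest `Revert "..."` around the original change "Updating WikiGrok configs for new global vars" (193283). Assume each change cleanly reverts the one before it. Then the net effect follows parity: an even number of `Revert` prefixes leaves the original change applied, and an odd number leaves it undone. A small sketch of that rule is below. The helper names are hypothetical, and the regex accepts both straight and curly quotes so it can also handle the 00:27:57 joke.

```python
import re

# One leading `Revert "` prefix; accepts straight or curly opening quotes.
_REVERT_PREFIX = re.compile('Revert ["\u201c]')


def revert_depth(title):
    """Count how many `Revert "...` prefixes wrap a change title."""
    depth = 0
    while True:
        match = _REVERT_PREFIX.match(title)
        if match is None:
            return depth
        depth += 1
        title = title[match.end():]


def original_applied(title):
    """True when the nested reverts cancel out (even depth)."""
    return revert_depth(title) % 2 == 0


ORIGINAL = "Updating WikiGrok configs for new global vars"
for change, depth in (("193283", 0), ("193302", 1), ("193305", 2),
                      ("193307", 3), ("193308", 4)):
    title = 'Revert "' * depth + ORIGINAL + '"' * depth
    print(change, revert_depth(title), original_applied(title))
```

By this rule 193302 and 193307 undo the rename, while 193305 and 193308 re-apply it. The title only records intent, though. 193308 went through PS2 and PS3 after the 00:26:50 plan to keep the old variables as well, so its content presumably differs from 193283.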
[00:32:21] and then puppet agent -t
[00:32:25] (CR) Ori.livneh: [C: 2 V: 2] Switch EventLogging's MariaDB consumer to m4-master [puppet] - https://gerrit.wikimedia.org/r/193135 (owner: Ori.livneh)
[00:32:26] bblack: Ah, interesting
[00:32:28] (is what the cron uses, which sets umask)
[00:32:30] <^demon|away> bd808: Somebody beat me to it?
[00:33:22] ^demon|away: I think it turns out that he's not a deployer
[00:33:22] bblack: thx
[00:34:57] (PS2) Kaldari: Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308
[00:35:51] (PS3) Kaldari: Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308
[00:36:20] Krenair, MaxSem: OK, should be ready to go now: https://gerrit.wikimedia.org/r/#/c/193308/
[00:36:56] I am doing the vector change, will look at that next
[00:37:41] Krenair: NP
[00:37:49] (CR) MaxSem: [C: 1] Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308 (owner: Kaldari)
[00:37:55] Puppet, Continuous-Integration, Patch-For-Review: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1072221 (Krinkle) Open>Resolved
[00:43:24] !log krenair Synchronized php-1.25wmf19/skins/Vector/skinStyles/mediawiki.sectionAnchor.less: https://gerrit.wikimedia.org/r/#/c/193310/ (duration: 00m 05s)
[00:43:27] MatmaRex, ^
[00:43:29] Logged the message, Master
[00:43:56] Krenair: thanks, verified
[00:44:56] (CR) Alex Monk: [C: 2] Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308 (owner: Kaldari)
[00:46:08] (Merged) jenkins-bot: Revert "Revert "Revert "Revert "Updating WikiGrok configs for new global vars"""" [mediawiki-config] - https://gerrit.wikimedia.org/r/193308 (owner: Kaldari)
[00:47:25] !log krenair Synchronized wmf-config: https://gerrit.wikimedia.org/r/193308 (duration: 00m 10s)
[00:47:27] kaldari, MaxSem: ^
[00:47:33] Logged the message, Master
[00:47:37] Krenair: I imagine we may get warnings again from mobile.php, but hopefully explicitily touching and syncing InitializeSettings.php after will fix it. It’s probably a cache race condition.
[00:47:47] there was a failure in the sync
[00:48:27] wee explosion
[00:48:45] !log krenair Synchronized wmf-config: touch (duration: 00m 06s)
[00:48:49] ok...
[00:48:49] Logged the message, Master
[00:49:29] worked :P
[00:49:34] Krenair: looks like there was a spike in notices, but now it’s back to normal :)
[00:49:39] yeah
[00:49:43] yay
[00:49:54] stabstabstab
[00:50:13] Krenair: thanks for weathering the storm :P
[00:50:31] I hope the touch sync fixed the original failures
[00:52:31] (CR) Alex Monk: [C: 2] Add rollbacker user group to Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193273 (https://phabricator.wikimedia.org/T90888) (owner: Odder)
[00:52:35] (CR) Alex Monk: [C: 2] Add autopatrolled user group to Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193274 (https://phabricator.wikimedia.org/T89816) (owner: Odder)
[00:55:09] (CR) MaxSem: [C: -1] Initial commit of statsdlb (1 comment) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh)
[00:56:45] (CR) MaxSem: Initial commit of statsdlb (1 comment) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh)
[00:57:11] (Merged) jenkins-bot: Add rollbacker user group to Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193273 (https://phabricator.wikimedia.org/T90888) (owner: Odder)
[00:57:14] (Merged) jenkins-bot: Add autopatrolled user group to Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/193274 (https://phabricator.wikimedia.org/T89816) (owner: Odder)
[00:58:54] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/193273/ and https://gerrit.wikimedia.org/r/#/c/193274/ (duration: 00m 06s)
[00:58:58] (CR) Ori.livneh: Initial commit of statsdlb (1 comment) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh)
[00:59:03] Logged the message, Master
[00:59:38] so why did https://gerrit.wikimedia.org/r/#/c/193281/ not merge?
[00:59:39] hm
[00:59:45] (CR) MaxSem: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh)
[00:59:57] "Error creating thumbnail: Unable to save thumbnail to destination"
[00:59:58] urgh
[01:00:09] is an integration slave out of space again Krinkle?
[01:00:27] Krenair: No, I don't think that's a space issue
[01:01:26] Krinkle, do you think if I made it try again, it'd work?
[01:02:47] Krenair: Don't know.
[01:03:16] Krenair: file a bug :) Any details that help narrow it down would be useful.
[01:04:20] This is for something I'm trying to deploy, Krinkle
[01:05:27] Krenair: Link?
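Two items above turn on how the umask masks the mode a program asks for. One is the 00:30:34 question about running puppet by hand with `umask 002`. The other is T87843 (files on integration slaves ending up root read-only), which 193303 addressed by creating /srv/deployment as 755 instead of 700. The sketch below assumes the common requested defaults of 0o666 for files and 0o777 for directories. It does not say which umask `/usr/local/sbin/puppet-run` actually sets, because the log does not state that.

```python
# Modes a program typically requests before the umask is applied
# (assumption: the common defaults used by open(2)/mkdir(2) callers).
FILE_DEFAULT = 0o666
DIR_DEFAULT = 0o777


def effective_mode(umask, is_dir=False):
    """Permission bits left after clearing every bit set in umask."""
    requested = DIR_DEFAULT if is_dir else FILE_DEFAULT
    return requested & ~umask & 0o777


for mask in (0o022, 0o002, 0o077):
    print(f"umask {mask:03o}: files {effective_mode(mask):03o}, "
          f"dirs {effective_mode(mask, is_dir=True):03o}")
```

A umask of 077 would produce exactly the 700 directories that 193303 replaces. Whether that was the actual cause on the integration slaves is not established here.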
[01:05:38] https://gerrit.wikimedia.org/r/#/c/193281/
[01:09:01] Krenair: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/3442 https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/3442/artifact/log/mw-dberror.log/*view*/
[01:09:07] Fri Feb 27 0:56:30 UTC 2015 gallium build3442-unittest_ DatabaseSqlite::replace/single-row 19 unittest_module_deps.md_module may not be NULL REPLACE INTO unittest_module_deps (md_module,md_skin,md_deps) VALUES (NULL,'vector','["/srv/ssd/jenkins-slave/workspace/mediawiki-phpunit-zend@2/src/tests/phpunit/data/less/module/styles.less","/srv/ssd/jenkins-slave/workspace/mediawiki-phpunit-zend@2/src/
[01:09:07] tests/phpunit/data/less/module/dependency.less","/srv/ssd/jenkins-slave/workspace/mediawiki-phpunit-zend@2/src/tests/phpunit/data/less/common/test.common.mixins.less"]')
[01:10:02] weird stuff
[01:10:14] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/3442/artifact/log/mw-debug-cli.log/*view*/
[01:12:15] Krenair: Looks like a genuine failure
[01:12:31] File::transform: Doing stat for mwstore://local-backend/local-thumb/3/3a/Foobar.jpg/1941px-Foobar.jpg
[01:12:31] FileBackendStore::getFileStat: File mwstore://local-backend/local-thumb/3/3a/Foobar.jpg/1941px-Foobar.jpg does not exist.
[01:12:47] Perhaps mwcore behaviour changed to output an error instead of default thumbnail
[01:12:55] not sure how that got merged in that case
[01:13:02] test failure https://integration.wikimedia.org/ci/job/mediawiki-phpunit-zend/3442/console
[01:13:17] Krenair: try again
[01:16:55] Krinkle, ok it worked
[01:16:59] Krenair: Filed as https://phabricator.wikimedia.org/T91016
[01:17:02] going to bed now :)
[01:17:12] thanks
[01:18:54] !log krenair Synchronized php-1.25wmf19/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: https://gerrit.wikimedia.org/r/#/c/193313/ (duration: 00m 06s)
[01:18:57] ok
[01:18:59] Logged the message, Master
[01:25:36] (PS1) BBlack: kernel/disk stuff for jessie cache installs [puppet] - https://gerrit.wikimedia.org/r/193314
[01:38:32] (PS1) Springle: Promote db1046 to m4 master [puppet] - https://gerrit.wikimedia.org/r/193317
[01:39:44] (CR) Springle: [C: 2] Promote db1046 to m4 master [puppet] - https://gerrit.wikimedia.org/r/193317 (owner: Springle)
[01:40:50] !log switch db1046 to master of m4 (eventlogging). deployed dbproxy1004 with m4-master CNAME
[01:40:56] Logged the message, Master
[01:43:10] operations, hardware-requests: codfw: virtualization servers for misc services - https://phabricator.wikimedia.org/T89161#1072681 (Aklapper)
[02:16:11] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 02s)
[02:16:16] Logged the message, Master
[02:17:18] !log LocalisationUpdate completed (1.25wmf18) at 2015-02-27 02:16:14+00:00
[02:17:23] Logged the message, Master
[02:18:15] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:18:20] Logged the message, Master
[02:19:22] !log LocalisationUpdate completed (1.25wmf19) at 2015-02-27 02:18:19+00:00
[02:19:28] Logged the message, Master
[02:48:37] PROBLEM - Host virt1012 is DOWN: PING CRITICAL - Packet loss = 100%
[02:52:27] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 0.42 ms
[03:45:09] hello
[03:45:36] hello!?
[03:46:52] yo hello!!!!!!!!!!!!!!!!\
[04:04:50] operations, ops-codfw, wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1072840 (Papaul) mc2018 is at the stage where it needs validation. on mc2017 I was able to detect the External 10G Nic and try to boot from both ports with no response when to the message no...
[04:13:21] operations, ops-codfw: Please rack & connect the Tampa MX80s in row D - https://phabricator.wikimedia.org/T84658#1072843 (Papaul) @ Faidon. Do you want me to close this task. I didn't because Mark supposed to do some configuration on both devices before we shipping them to sites.
[04:44:36] (PS1) Andrew Bogott: Modify cold-migrate script: [puppet] - https://gerrit.wikimedia.org/r/193326
[04:48:39] (CR) Andrew Bogott: [C: 2] Modify cold-migrate script: [puppet] - https://gerrit.wikimedia.org/r/193326 (owner: Andrew Bogott)
[04:59:16] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[04:59:47] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[05:05:17] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[05:05:48] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[05:41:38] !log upgrading virt1012 to Trusty because labs networking failed twice in two hours, and how could it be worse?
[05:41:48] Logged the message, Master
[05:42:30] ok, I’m taking drastic measures with virt1012, upgrading to Trusty in hopes of bypassing whatever bug is killing the network
[05:43:04] good luck
[05:48:07] PROBLEM - SSH on virt1012 is CRITICAL: Connection timed out
[05:48:17] PROBLEM - salt-minion processes on virt1012 is CRITICAL: Timeout while attempting connection
[05:48:27] PROBLEM - puppet last run on virt1012 is CRITICAL: Timeout while attempting connection
[05:50:06] PROBLEM - Host virt1012 is DOWN: PING CRITICAL - Packet loss = 100%
[05:51:48] ACKNOWLEDGEMENT - DPKG on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - Disk space on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - NTP on virt1012 is CRITICAL: NTP CRITICAL: No response from NTP server andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - RAID on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - SSH on virt1012 is CRITICAL: Connection timed out andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - configured eth on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:48] ACKNOWLEDGEMENT - dhclient process on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:49] ACKNOWLEDGEMENT - puppet last run on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:51:49] ACKNOWLEDGEMENT - salt-minion processes on virt1012 is CRITICAL: Timeout while attempting connection andrew bogott Im fighting a network issue, will be upgrading and rebooting
[05:53:06] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 1.19 ms
[05:53:27] RECOVERY - SSH on virt1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)
[05:53:46] RECOVERY - salt-minion processes on virt1012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[06:03:07] PROBLEM - puppet last run on mw1085 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:08:26] PROBLEM - DPKG on virt1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[06:10:22] (PS1) Andrew Bogott: Fix the final reboot to work without project name [puppet] - https://gerrit.wikimedia.org/r/193333
[06:11:37] RECOVERY - DPKG on virt1012 is OK: All packages OK
[06:17:05] (CR) Ori.livneh: [C: 1] Fix the final reboot to work without project name [puppet] - https://gerrit.wikimedia.org/r/193333 (owner: Andrew Bogott)
[06:17:36] PROBLEM - Host virt1012 is DOWN: PING CRITICAL - Packet loss = 100%
[06:18:27] RECOVERY - puppet last run on mw1085 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:21:07] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 3.67 ms
[06:25:04] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Feb 27 06:24:00 UTC 2015 (duration 23m 59s)
[06:25:10] Logged the message, Master
[06:28:07] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: puppet fail
[06:29:27] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:37] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:37] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:47] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:47] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:47] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:57] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:57] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:06] PROBLEM - Host virt1012 is DOWN: PING CRITICAL - Packet loss = 100%
[06:30:07] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:07] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:17] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:41] (CR) Andrew Bogott: [C: 2] Fix the final reboot to work without project name [puppet] - https://gerrit.wikimedia.org/r/193333 (owner: Andrew Bogott)
[06:33:37] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[06:40:07] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:41:08] YuviPanda: (or ori) I upgraded virt1005 to Trusty yesterday and virt1012 to trusty today… I’m getting totally different kernel versions on the two boxes.
[06:41:36] Any guess why that is? Is there some hardware-detecton phase involved in picking a kernel?
[06:43:00] (PS1) Yuvipanda: tools: Make portgrabber also ping two additional webproxies [puppet] - https://gerrit.wikimedia.org/r/193334 (https://phabricator.wikimedia.org/T89995)
[06:43:21] (PS2) Yuvipanda: tools: Make portgrabber also ping two additional webproxies [puppet] - https://gerrit.wikimedia.org/r/193334 (https://phabricator.wikimedia.org/T89995)
[06:44:42] (CR) Yuvipanda: [C: 2] tools: Make portgrabber also ping two additional webproxies [puppet] - https://gerrit.wikimedia.org/r/193334 (https://phabricator.wikimedia.org/T89995) (owner: Yuvipanda)
[06:45:35] andrewbogott: hmm, not sure.
[06:45:42] andrewbogott: also, I still can’t reach tools-webproxy :(
[06:45:46] <_joe_> mmmh what'sup
[06:45:48] ok
[06:45:53] ah, _joe_ might know?
[06:46:00] <_joe_> what?
[06:46:02] andrewbogott: also this was an upgrade, right? not a re-image?
[06:46:05] <_joe_> good morning
[06:46:10] YuviPanda: (or ori) I upgraded virt1005 to Trusty yesterday and virt1012 to trusty today… I’m getting totally different kernel versions on the two boxes.
[06:46:11] ‘morning!
[06:46:18] Any guess why that is? Is there some hardware-detecton phase involved in picking a kernel?
[06:46:29] <_joe_> andrewbogott: there is a USN that went out today
[06:46:52] <_joe_> andrewbogott: http://www.ubuntu.com/usn/usn-2516-1/
[06:47:04] <_joe_> ATTENTION: Due to an unavoidable ABI change the kernel updates have
[06:47:07] <_joe_> been given a new version number, which requires you to recompile and
[06:47:11] <_joe_> reinstall all third party kernel modules you might have installed.
[06:47:47] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59691 bytes in 0.230 second response time
[06:47:54] <_joe_> andrewbogott: is that explaining it?
[06:48:07] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:48:26] _joe_: I don’t think that’s the reason, but reading that page may have explained my mistake. Checking...
[06:49:01] (PS1) Yuvipanda: tools: Fix perl syntax issue, maybe? [puppet] - https://gerrit.wikimedia.org/r/193335
[06:49:09] (CR) jenkins-bot: [V: -1] tools: Fix perl syntax issue, maybe? [puppet] - https://gerrit.wikimedia.org/r/193335 (owner: Yuvipanda)
[06:49:12] right
[06:49:15] WARNING: The following packages cannot be authenticated!
[06:49:17] linux-image-3.13.0-46-generic linux-image-extra-3.13.0-46-generic
[06:49:18] linux-headers-3.13.0-46 linux-headers-3.13.0-46-generic
[06:49:19] I didn’t expect /that/
[06:49:25] (PS2) Yuvipanda: tools: Fix perl syntax issue, maybe? [puppet] - https://gerrit.wikimedia.org/r/193335
[06:49:52] Hm, and second try, no complaints
[06:50:08] <_joe_> andrewbogott: did you re-run apt-get update?
[06:50:32] I think my mistake was dumber than that, I did ‘upgrade’ instead of ‘dist-upgrade’. I wondered why it finished so quickly :)
[06:50:58] _joe_: can you look at https://gerrit.wikimedia.org/r/193335
[06:50:59] ?
[06:51:12] I think I fucked up perl there (in the previous patch), and am attempting to fix
[06:51:14] Oh, regarding the verification issue? No, I literally just did ‘dist-upgrade’ twice in a row, and it warned me the first time and not the second
[06:51:17] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:51:27] <_joe_> YuviPanda:is this urgent?
[06:51:29] (the underlying idea is also somewhat terrible, but eh)
[06:51:37] _joe_: kinda. toollabs is down again
[06:51:43] Anyway, you can ignore me for now, I’m unstuck
[06:52:29] merging this would let us bring it back partially (new webproxies on different machines)
[06:52:29] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:52:29] it’s also only 3 lines.
[06:52:29] err
[06:52:29] 6
[06:52:29] <_joe_> YuviPanda: I don't really understand the context
[06:52:43] <_joe_> but yeah the old version is horrible
[06:53:02] <_joe_> (and use functions dude)
[06:53:10] (CR) Giuseppe Lavagetto: [C: 1] tools: Fix perl syntax issue, maybe? [puppet] - https://gerrit.wikimedia.org/r/193335 (owner: Yuvipanda)
[06:53:18] _joe_: thanks.
[06:53:24] _joe_: the entire file must go.
[06:53:34] * YuviPanda is juggling between beta and tools
[06:53:46] <_joe_> can I now go and brush my teeth?
[06:53:56] (CR) Yuvipanda: [C: 2] tools: Fix perl syntax issue, maybe? [puppet] - https://gerrit.wikimedia.org/r/193335 (owner: Yuvipanda)
[06:54:54] _joe_: you have my permission, yes :)
[06:54:57] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[06:55:15] <_joe_> go on strontium and do puppet-merge ^^
[06:55:26] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[06:58:17] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[06:58:20] YuviPanda: I merged your patches on palladium
[06:58:26] andrewbogott: gah, thanks
[06:58:45] I had typed ‘yes’ again and tabbed out, and not realized that it’s just going ‘y’ all over
[06:58:46] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
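The 06:50:32 remark ("I did ‘upgrade’ instead of ‘dist-upgrade’") probably connects to the USN's ABI bump, which ships the kernel under new package names (linux-image-3.13.0-46-generic and friends). By default, `apt-get upgrade` does not install packages that are not already installed. Suppose, as is usual on Ubuntu, that a metapackage such as linux-image-generic depends on the versioned kernel package. Then plain `upgrade` keeps the metapackage back and the old kernel stays. The toy model below illustrates only that rule. It ignores version ordering, pinning and removals, and the 3.13.0-45 package and all version strings are hypothetical.

```python
def plan(installed, candidates, allow_new):
    """Toy model of apt-get upgrade (allow_new=False) vs dist-upgrade (True).

    installed:  {package: version} currently on the host
    candidates: {package: (version, [dependencies])} offered by the archive
    Returns (upgraded, newly_installed, kept_back) as sorted lists.
    Versions are compared for equality only (real apt orders them), and
    dependencies of newly installed packages are not followed.
    """
    upgraded, new, kept_back = [], set(), []
    for pkg, version in installed.items():
        cand_version, deps = candidates.get(pkg, (version, []))
        if cand_version == version:
            continue  # nothing newer offered
        missing = [d for d in deps if d not in installed]
        if missing and not allow_new:
            kept_back.append(pkg)  # upgrading would pull in a new package
            continue
        upgraded.append(pkg)
        new.update(missing)
    return sorted(upgraded), sorted(new), sorted(kept_back)


# Hypothetical host state: the -45 kernel and all version strings are invented.
INSTALLED = {
    "linux-image-generic": "3.13.0.45",
    "linux-image-3.13.0-45-generic": "3.13.0-45",
}
CANDIDATES = {
    # After the ABI bump the metapackage points at a differently named package.
    "linux-image-generic": ("3.13.0.46", ["linux-image-3.13.0-46-generic"]),
    "linux-image-3.13.0-45-generic": ("3.13.0-45", []),
}

print("upgrade:     ", plan(INSTALLED, CANDIDATES, allow_new=False))
print("dist-upgrade:", plan(INSTALLED, CANDIDATES, allow_new=True))
```

In this model `upgrade` reports the metapackage as kept back and installs nothing new. `dist-upgrade` pulls in the renamed kernel package. That is consistent with a plain upgrade "finishing so quickly", though the log does not confirm this was the whole story behind the two hosts' differing kernels.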
[07:02:38] Hm, thanks to that USN the whole world is trying to download a new kernel so my dist-upgrade is going to time out :(
[07:05:22] <_joe_> btw, debian patches were out 3 days ago
[07:16:07] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 222, down: 0, dormant: 0, excluded: 0, unused: 0
[07:26:48] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[07:26:56] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:26:57] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[07:27:07] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[07:27:27] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[07:27:36] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[07:27:58] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:27:59] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[07:28:18] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[07:28:26] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[07:30:17] PROBLEM - nutcracker port on silver is CRITICAL: CRITICAL - Socket timeout after 2 seconds
[07:31:27] RECOVERY - nutcracker port on silver is OK: TCP OK - 0.000 second response time on port 11212
[07:31:28] what. silver’s having problems now?!
[07:33:28] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:34:11] I can’t tell what’s happening… DOS on silver maybe? The nova api is super busy [07:34:11] ... [07:34:50] interesting [07:35:16] andrewbogott: not really DOS, from what I could tell. very little requests coming through (tailing access.log) [07:35:27] yeah [07:36:14] [647250.843265] init: nutcracker main process (12123) killed by TERM signal [07:36:49] springle: I just did a reload [07:36:56] !log reload nutcracker on silver [07:37:02] ah ok :) [07:37:24] not sure if that’s enough or if I should do a restart [07:37:40] !log reload apache on virt1000 [07:38:20] <_joe_> hey need help? [07:38:29] <_joe_> sorry I was desk cleaning [07:38:47] <_joe_> I do that once a week, when maggots start to crawl around [07:38:56] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.076 second response time [07:39:04] you get maggots once a week _joe_? [07:39:44] <_joe_> werdna: eheh joking of course [07:39:47] PROBLEM - DPKG on virt1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [07:39:54] !log restarted nova-api on virt1000 [07:40:05] <_joe_> YuviPanda: silver is super-ok [07:40:46] _joe_: is the nutcracker alert normal? [07:40:48] <_joe_> werdna: nonetheless, my cat sleeps almost the whole day on my desk, so it's pretty filled with cat hair [07:40:51] hmm, does it use nutcracker at all? [07:40:59] <_joe_> YuviPanda: first of all, yes it does [07:41:00] isn’t it using a memc on itself? [07:41:05] <_joe_> really? [07:41:09] <_joe_> meh [07:41:20] <_joe_> anyway, before going to restart the thing [07:41:26] yeah, i see a memcached process running. [07:41:28] <_joe_> try to assess if any damage is ongoing [07:41:35] <_joe_> YuviPanda: netstat! [07:41:47] <_joe_> that's how you figure what is connecting to what [07:42:12] right. 
let me first bring back tools (puppetmaster is back up now, so I’m unblocked)
[07:42:24] <_joe_> I was saying - if no outage is ongoing, try to figure out what's wrong
[07:42:27] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:42:27] right.
[07:42:32] bah.
[07:42:46] <_joe_> hey guys what's up with the whole labs thing?
[07:43:08] _joe_: I’m still in the process of upgrading virt1012, it’s taking forever to download stuff
[07:43:09] <_joe_> andrewbogott: what exactly did you change tonight?
[07:43:14] Virt1000 is freaking out, seemingly unrelated.
[07:43:28] <_joe_> ok I'll take a look at virt1000
[07:43:34] I migraded some instances off of virt1012 to other virt hosts. That’s it.
[07:43:36] thanks
[07:44:00] *migrated
[07:44:41] wasn't sure if _joe_ and YuviPanda were talking about cats or servers
[07:46:47] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.104 second response time
[07:54:28] _joe_: hmm, interesting. 564 connections to nutcracker, all in close_wait
[07:54:37] <_joe_> not strange
[07:54:48] <_joe_> YuviPanda: I'll take a look shortly
[07:54:53] _joe_: thanks!
[07:59:51] (CR) Nemo bis: "Brion sync'd this yesterday, but I still don't see any effect on the wikis..." [mediawiki-config] - https://gerrit.wikimedia.org/r/193090 (https://phabricator.wikimedia.org/T59732) (owner: Nemo bis)
[08:03:56] PROBLEM - Host virt1012 is DOWN: PING CRITICAL - Packet loss = 100%
[08:04:38] Is wikimedia uses xen in server farms?
[08:07:37] RECOVERY - Host virt1012 is UP: PING OK - Packet loss = 0%, RTA = 6.99 ms
[08:07:38] RECOVERY - DPKG on virt1012 is OK: All packages OK
[08:08:24] devunt: only for labs
[08:08:45] not xen, kvm
[08:08:47] As far as i know labs are using kvm with openstack
[08:09:25] ah, right.
[08:09:40] hmm, I haven’t eaten all day, and perhaps my brain is complaining
[08:11:50] so there's no xen on wikimedia servers
[08:12:15] sounds good
[08:16:31] cause wikimedia should reboot all servers if it was powered with xen :/
[08:18:03] (PS1) Giuseppe Lavagetto: wikitech: remove nutcracker [puppet] - https://gerrit.wikimedia.org/r/193341
[08:18:44] <_joe_> here it is guys
[08:19:05] <_joe_> devunt: we still don't know what those CVEs are about
[08:19:33] yeah it's embargoed
[08:20:06] <_joe_> so if you do have info, share them :P
[08:20:23] <_joe_> not here, the channel is logged pubicly of course
[08:20:24] but rackspace announced that they are going to reboot all servers
[08:20:24] https://community.rackspace.com/general/f/53/t/4978
[08:20:45] <_joe_> I expect amazon to have the same need
[08:28:30] andrewbogott: how’s virt1012 doing?
[08:28:48] pretty close
[08:28:56] Would be closer if it didn’t take me 10 minutes to run ‘git review'
[08:29:54] operations, ops-codfw, wikis-in-codfw: mc2004 console is unreadable remotely - https://phabricator.wikimedia.org/T90883#1072958 (Joe) a:Joe>None
[08:30:05] andrewbogott: :D ok
[08:30:25] (PS1) Andrew Bogott: Use the proper location for the libvirt driver on Trusty. [puppet] - https://gerrit.wikimedia.org/r/193342
[08:30:39] finally
[08:31:40] (CR) Andrew Bogott: [C: 2] Use the proper location for the libvirt driver on Trusty.
[puppet] - https://gerrit.wikimedia.org/r/193342 (owner: Andrew Bogott)
[08:34:18] (PS1) Yuvipanda: tools: Make uwsgi & nodejs services also ping additional proxies [puppet] - https://gerrit.wikimedia.org/r/193343 (https://phabricator.wikimedia.org/T89995)
[08:35:18] (PS2) Yuvipanda: tools: Make uwsgi & nodejs services also ping additional proxies [puppet] - https://gerrit.wikimedia.org/r/193343 (https://phabricator.wikimedia.org/T89995)
[08:36:14] (CR) Yuvipanda: [C: 2 V: 2] tools: Make uwsgi & nodejs services also ping additional proxies [puppet] - https://gerrit.wikimedia.org/r/193343 (https://phabricator.wikimedia.org/T89995) (owner: Yuvipanda)
[08:36:37] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 59691 bytes in 0.292 second response time
[08:37:27] RECOVERY - puppet last run on virt1012 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[08:37:30] well /that/ took long enough
[08:40:43] morebots: working?
[08:40:43] I am a logbot running on tools-exec-10.
[08:40:43] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[08:40:43] To log a message, type !log .
[08:40:58] !log upgraded virt1012 to Trusty; starting all instances
[08:41:06] Logged the message, Master
[08:50:31] (PS1) Yuvipanda: tools: Keep proxylistener socket open [puppet] - https://gerrit.wikimedia.org/r/193345 (https://phabricator.wikimedia.org/T89995)
[08:50:39] (CR) jenkins-bot: [V: -1] tools: Keep proxylistener socket open [puppet] - https://gerrit.wikimedia.org/r/193345 (https://phabricator.wikimedia.org/T89995) (owner: Yuvipanda)
[08:50:50] (PS2) Yuvipanda: tools: Keep proxylistener socket open [puppet] - https://gerrit.wikimedia.org/r/193345 (https://phabricator.wikimedia.org/T89995)
[08:51:42] (CR) Yuvipanda: [C: 2 V: 2] tools: Keep proxylistener socket open [puppet] - https://gerrit.wikimedia.org/r/193345 (https://phabricator.wikimedia.org/T89995) (owner: Yuvipanda)
[09:02:38] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[09:08:32] sigh strontium
[09:10:27] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[09:11:54] <_joe_> YuviPanda: what happens on strontium?
[09:12:15] _joe_: I merged on palladium, but somehow merge on strontium failed
[09:12:18] I had to manually merge
[09:12:37] <_joe_> why it did fail?
[09:12:50] <_joe_> the puppet-merge log tells you when it fails remotely
[09:13:11] _joe_: oh, I never noticed...
[09:13:12] !
92b4e2a..5df81e0 production -> origin/production (unable to update local ref)
[09:13:17] right
[09:13:21] should colorify it at some point
[09:13:31] and also make it accept ‘y'
[09:13:35] anyway, haven’t eaten any food
[09:13:37] brb
[09:13:39] labs outage is back
[09:13:39] <_joe_> YuviPanda: you have to wait ~ 1 minute from gerrit to puppet-merge
[09:14:37] err
[09:14:39] labs is back
[09:14:41] not labs outage
[09:14:43] labs outage is gone
[09:14:48] <_joe_> ahah ok
[09:15:15] my brain isn’t worrkking
[09:15:16] fooodd
[09:15:17] and tea
[09:15:18] brb
[09:42:49] yeah.. yesterday I discovered a whole new world of gerrit with hidden and read only projects that are not based on permissions but rather some weird flag
[09:42:59] and guess what. Hidden means inactive... not hidden
[09:43:09] <_joe_> lol
[09:43:17] <_joe_> why did you bother with that?
[09:43:43] https://phabricator.wikimedia.org/T89640
[09:43:48] aka hidden
[09:44:06] I did manage to actually turn it to private
[09:44:33] after meddling with gerrit rules for like an hour and stumping on a gerrit web UI bug that did not let me add groups
[09:44:42] stambling*
[09:45:04] and I ended up doing the git fetch remotes/meta/config trick
[09:45:28] which is quite useful actually.
I would prefer it to gerrits webUI if only it was a bit better documented
[09:46:04] I was guessing around permission names for like 5 minutes before understanding the naming scheme
[09:47:20] (CR) Alexandros Kosiaris: [C: 2] Revert "Disable cloning of TransparencyReport until the repo is public again" [puppet] - https://gerrit.wikimedia.org/r/193150 (https://phabricator.wikimedia.org/T89640) (owner: Alexandros Kosiaris)
[09:48:14] operations, Wikimedia-Git-or-Gerrit, Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1073007 (akosiaris) Open>Resolved
[09:52:53] (CR) Odder: [C: 1] Change templateeditor user group rights on fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/189594 (https://phabricator.wikimedia.org/T89040) (owner: Calak)
[10:16:02] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1073035 (phuedx) Please be aware that ssmith could also mean @pizzzacat (**S**herah **Smith**). … We get this //a lot//.
[11:07:43] hey hashar
[11:07:43] hashar: last bit of ::beta roles is https://gerrit.wikimedia.org/r/193082
[11:08:23] hashar: do you know how jenkins ‘deploys’ parsoid?
[11:08:36] YuviPanda: do you know http://wikidata.beta.wmflabs.org/ is broken?
[11:08:45] is anyone doing maintenance on beta?
[11:09:12] http://en.wikipedia.beta.wmflabs.org/wiki/0.19393398195127265
[11:09:17] also broken
[11:10:17] aude: hey!
[11:10:36] i could try to investigate though not really sure what the issue is
[11:11:35] aude: try now?
[11:12:05] what did you do?
[11:12:11] works now
[11:12:25] aude: logged in -labs, mysql was stopped on deployment-db1
[11:12:28] started it, investigating now
[11:12:33] ah
[11:12:55] thanks
[11:13:04] aude: looks like there might have been some data loss
[11:13:08] :(
[11:13:14] it's just beta though
[11:13:53] aude: https://phabricator.wikimedia.org/T91055?
[11:14:11] :(
[11:25:00] * YuviPanda pokes hashar
[11:25:43] YuviPanda: i am looking at your cleanup change
[11:25:51] it is not going to work :D
[11:25:51] hashar: whee. thanks
[11:25:57] hashar: why not.
[11:26:15] hashar: deployment-parsoid01-test has a working parsoid instance with no puppet errors. it doesn’ts eem to be deploying the correct config, tho
[11:26:39] cause you remove all beta specific thing that let us get it running and auto update it via Jenkins ?
[11:26:47] of course there is no puppet error
[11:26:52] you removed the interesting bits :D
[11:27:00] and there is probably no parsoid installed
[11:28:04] (CR) Odder: [C: 1] Abusefilter config change for ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base)
[11:28:40] YuviPanda: I am commenting on the change
[11:29:02] hashar: there is a parsoid installed. curl localhost:8000 gives me parsoid.
[11:29:24] hashar: the jenkins bits have been moved to role::ci::jenkins_access, a different role, in a previous commit
[11:29:38] role::parsoid should only have parsoid related things :)
[11:37:26] (CR) Hashar: [C: -1] "Removing all the beta cluster specific logic would not let us attach the instance to Jenkins."
(10 comments) [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: Yuvipanda)
[11:37:38] YuviPanda: ^
[11:38:18] hashar: the jenkins code has been moved into another module in https://gerrit.wikimedia.org/r/#/c/193084/
[11:38:28] operations, Wikimedia-Hackathon-2015, Wikimedia-Site-requests, I18n, Tracking: Wikis waiting to be renamed (tracking) - https://phabricator.wikimedia.org/T21986#1073181 (Qgil)
[11:40:29] (CR) Yuvipanda: "https://gerrit.wikimedia.org/r/#/c/193084/ abstracts away the way to grant jenkins access to an instance into a role under role::ci, and i" [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: Yuvipanda)
[11:41:39] hashar: I think I’ve addressed all your comments there.
[11:42:16] (CR) Yuvipanda: "localsettings.js doesn't actually seem to be set up by puppet anymore, so I'm unsure where the beta version should be set. Outside of that" [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633) (owner: Yuvipanda)
[11:47:42] hashar: so what I want help with is to test if jenkins works properly with the new set of code. the ::beta role is unnecessary, and I’m killing them all
[11:47:49] and this is the last one
[11:55:29] operations, ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1073203 (fgiunchedi) installing, status so far: # uefi boot must be disabled in favor of legacy boot # the raid array is not configured, and launching hp SSA utility from bios results in this: ``` Welcome to G...
[11:57:03] (CR) Odder: [C: 1] Enable NewUserMessage extension for fawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/193016 (https://phabricator.wikimedia.org/T90831) (owner: Mjbmr)
[12:00:48] (PS1) Yuvipanda: shinken: Remove yuvipanda from analytics alert group [puppet] - https://gerrit.wikimedia.org/r/193347
[12:01:20] (PS2) Yuvipanda: shinken: Remove yuvipanda from analytics alert group [puppet] - https://gerrit.wikimedia.org/r/193347
[12:08:50] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073225 (ArielGlenn) According to this: https://rt.wikimedia.org/Ticket/Display.html?id=8632 he was given access to fix a specific bug and access was intended to be r...
[12:13:54] * YuviPanda pokes hashar again
[12:15:14] (PS1) ArielGlenn: remove edenhill access and account [puppet] - https://gerrit.wikimedia.org/r/193348 (https://phabricator.wikimedia.org/T90953)
[12:16:09] hi, powerful ops tribe. /me humbly provides a link to https://phabricator.wikimedia.org/T90860 . (I have a meeting in some hours and might need to discuss the list then. Sorry for the deadline :/ )
[12:16:26] (CR) ArielGlenn: [C: 2] remove edenhill access and account [puppet] - https://gerrit.wikimedia.org/r/193348 (https://phabricator.wikimedia.org/T90953) (owner: ArielGlenn)
[12:16:28] YuviPanda: well I am not going to follow up on that today
[12:16:50] hashar: alright. I’ll merge and see if someone complains whenever I have time. Probably not today.
[12:17:01] YuviPanda: we can probably get the Parsoid manifest cleaned up next week though
[12:17:06] well
[12:17:09] it is broken
[12:17:23] will probably not let us setup a new instance
[12:17:27] deployment-parsoid01-test, running this version definitely isn’t broken.
[12:17:29] I tested it.
[12:17:44] is it properly deploying from jenkins and reloading the service ?
[12:17:46] on postmerge?
[12:17:56] that was what I was hoping you would let me test.
[12:18:07] I commented about the jenkins code
[12:18:11] it’s all been moved to the ci module
[12:18:13] and merged
[12:18:17] and used by mathoid, citoid, etc.
[12:18:23] since that code was exactly the same
[12:18:30] so whatever access jenkins had before, it should have now as well
[12:18:41] role::ci::jenkins_access
[12:18:42] yeah that is the "should" that would need to be carefully inspected
[12:18:49] I cant work on that today for sure
[12:18:53] fair enough.
[12:18:56] :(
[12:19:13] I am sure we can get the /srv stuff dropped with the new instances
[12:19:16] the ci code has been that way for weeks now, however. so I suspect it works, at least for mathoid, citoid, etc. Otherwise people would have complained.
[12:19:23] probably need to recreate a bunch of beta cluster instances and migrate :(
[12:19:34] well
[12:19:35] that’s not hard at all. That’s what I’ve been doing all the time now :)
[12:19:39] the stuff you removed has already been applied
[12:19:46] hashar: no, I re-created instances.
[12:19:47] the problem will occur when one create a new instance
[12:20:01] it might then end up missing some bits that have been cleaned up
[12:20:03] ah
[12:20:12] not for all of them, but definitely at least for one
[12:20:16] anyway, next week.
[12:20:21] recreated and readded in Jenkins + operations/mediawiki-config.git has been updated ?
[12:20:31] hashar: I re-created with the same name...
[12:20:33] :D
[12:20:43] yeah but the IP change
[12:20:46] right
[12:20:46] anyway
[12:20:50] I highlys suspect it works
[12:20:53] and Jenkins connect to them via IP
[12:20:54] but would be good to confirm next week
[12:21:00] oh, hmm
[12:21:05] some puppet parts refer to the instance IP as well
[12:21:08] Elitre: apergos is our on duty person as per topic, might be able to help
[12:22:12] hashar: either way, let’s work on this sometime next week.
This is the last ::beta role, and with that gone there’ll be no ::beta roles left!
[12:22:19] and then we can keep it that way :)
[12:22:27] also for parsoid we got to update the parsoidcache varnish backend to point to the new IP :D
[12:23:11] hashar: right.
[12:23:59] YuviPanda: lets pair that next week in your afternoon / my morning :)
[12:24:15] I’m not so fond of that terminology, but sure.
[12:24:15] maybe we can even switch to git-deploy from deployment-bastion
[12:24:26] I’d rather not do too many things in one go :D
[12:24:30] I just want to get rid of the ::beta role
[12:24:32] first
[12:26:01] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1073241 (ArielGlenn)
[12:26:03] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073238 (ArielGlenn) Open>Resolved a:ArielGlenn of course this doesn't actually remove the account anywhere but it won't be created on a...
[12:27:30] Mighty apergos, oh, please hear my laments! I shall make you an offering, if that pleases you. (that usually comes in the form of chocolate.)
[13:06:27] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail
[13:18:17] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:24:27] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[13:33:44] (PS1) Yuvipanda: shinken: Add coren and andrew to toollabs alert group [puppet] - https://gerrit.wikimedia.org/r/193353
[13:33:46] (PS1) Yuvipanda: shinken: Add coren to labs infrastructure alerts [puppet] - https://gerrit.wikimedia.org/r/193354
[13:34:20] (PS3) Yuvipanda: shinken: Remove yuvipanda from analytics alert group [puppet] - https://gerrit.wikimedia.org/r/193347
[13:34:34] (CR) coren: [C: 1] shinken: Add coren to labs infrastructure alerts [puppet] - https://gerrit.wikimedia.org/r/193354 (owner: Yuvipanda)
[13:34:54] (CR) Yuvipanda: [C: 2 V: 2] shinken: Remove yuvipanda from analytics alert group [puppet] - https://gerrit.wikimedia.org/r/193347 (owner: Yuvipanda)
[13:35:01] (CR) coren: [C: 1] shinken: Add coren and andrew to toollabs alert group [puppet] - https://gerrit.wikimedia.org/r/193353 (owner: Yuvipanda)
[13:35:05] (PS2) Yuvipanda: shinken: Add coren and andrew to toollabs alert group [puppet] - https://gerrit.wikimedia.org/r/193353
[13:35:07] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[13:35:14] (CR) Yuvipanda: [C: 2 V: 2] shinken: Add coren and andrew to toollabs alert group [puppet] - https://gerrit.wikimedia.org/r/193353 (owner: Yuvipanda)
[13:35:28] (PS2) Yuvipanda: shinken: Add coren to labs infrastructure alerts [puppet] - https://gerrit.wikimedia.org/r/193354
[13:35:36] (CR) Yuvipanda: [V: 2] shinken: Add coren to labs infrastructure alerts [puppet] - https://gerrit.wikimedia.org/r/193354 (owner: Yuvipanda)
[13:35:47] (CR) Yuvipanda: [C: 2] shinken: Add coren to labs
infrastructure alerts [puppet] - https://gerrit.wikimedia.org/r/193354 (owner: Yuvipanda)
[13:38:32] (PS1) Yuvipanda: puppet: Allow puppet merge to be answered with 'y' [puppet] - https://gerrit.wikimedia.org/r/193355
[13:39:13] (PS2) Yuvipanda: puppet: Allow puppet merge to be answered with 'y' [puppet] - https://gerrit.wikimedia.org/r/193355
[13:44:19] Reedy: How large was the PDF that broke image scaling?
[13:47:39] operations, Security: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1073328 (MarkAHershberger) Dzahn writes: > erbium.eqiad.wmnet > terbium.eqiad.wmnet: > oxygen.wikimedia.org > hooft.esams.wikimedia.org > fluorine.eqiad.wmnet > > If you could provide...
[14:09:22] operations, ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1073366 (fgiunchedi) tried booting SSA offline mode via virtual media however we need an ilo license ``` hpiLO-> vm cdrom insert http://208.80.154.151:9999/hpssaoffline-1.50-4.0.iso...
[14:10:56] PROBLEM - Host tellurium is DOWN: PING CRITICAL - Packet loss = 100%
[14:11:32] tellurium ?
[14:12:03] Jeff_Green: ^
[14:12:16] wrong tz for him I guess
[14:12:23] but he would get the page
[14:12:55] Jeff_Green was around on the other channel a while ago
[14:13:01] mentioned something about more libc patches?
[14:13:01] operations, Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1073383 (fgiunchedi) both seem fine to me as an approach to get us out of the woods in the short term
[14:13:03] i wonder if he’s restarting
[14:13:40] godog: what did Elitre want?
I somehow missed the ping in this window
[14:13:53] YuviPanda: yeah, sorry, forgot to silence alerts
[14:14:02] ok
[14:14:22] apergos: she was inquiring about https://phabricator.wikimedia.org/T90860
[14:14:50] ok, thanks
[14:14:58] (I found it finally in the backread)
[14:15:26] RECOVERY - Host tellurium is UP: PING OK - Packet loss = 0%, RTA = 1.60 ms
[14:16:34] apergos: thanks for fixing the NFS issues on enwiki
[14:16:46] it wasn't even nfs
[14:16:53] ewk, I can still sudo -u apache on terbium :S
[14:16:55] but you;re welcome
[14:16:57] right.
[14:24:48] operations: Please generate a list of task IDs and number of their subscribers, ordered by number of subscribers, for the "top 100" tasks in the VisualEditor project - https://phabricator.wikimedia.org/T90860#1073399 (ArielGlenn) here ya go.{F48894}
[14:28:14] apergos: any news on the salt upgrade?
[14:28:26] it's going to be a while
[14:28:32] i.e. not in 2 days or anything
[14:29:04] apergos: ok
[14:29:28] I have to go through the whole test across multiple platforms in a cluster with the upgrade rigamarole
[14:29:41] yeah, it’s a fair bit of work..
[14:29:51] it won't be months either though
[14:29:55] right
[14:30:02] should we wait till the upgrade to get salt-syndic?
[14:30:10] or can we just backport the matching package?
[14:30:18] right now our version of salt-syndic doesn’t match the version of salt..
[14:30:31] and there won’t be any testing needed, since we don’t actually use salt-syndic anywhere...
[14:34:12] um no I plan to get that in much sooner
[14:34:49] wheee. sweet.
thanks
[14:34:55] noprob
[14:45:33] (CR) Ottomata: [C: -1] Changing permits on agreggator depot once downloaded (1 comment) [puppet] - https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) (owner: Nuria)
[14:47:00] operations: Please generate a list of task IDs and number of their subscribers, ordered by number of subscribers, for the "top 100" tasks in the VisualEditor project - https://phabricator.wikimedia.org/T90860#1073410 (Aklapper) Open>Resolved a:Aklapper Thank you @ArielGlenn! @Elitre: Tasks with 12 s...
[14:47:10] operations, HTTPS, HTTPS-by-default: HTTPS RFC5077 session tickets encryption key rollovers - https://phabricator.wikimedia.org/T86671#1073413 (BBlack) Open>Resolved a:BBlack For the time being, we've decided to simply disable RFC5077 session tickets in the new jessie setup, as we're using clie...
[14:47:53] tnx apergos and andre_!
[14:48:09] +1
[14:48:13] yw!
[14:50:41] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073416 (Ottomata) Ja, he does not need an account. I thought accounts were removed when they were not in any groups included on a node.
[14:50:44] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1073417 (ArielGlenn) OK, I no longer feel as stupid. The number of items with the 'entity' format is small in comparison to the total number of qualities, we...
[14:51:23] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1073418 (ArielGlenn) Um, "with this format" means new redirects are dumped with {"entity" ... etc.
[14:56:05] godog: quick question :P
[14:56:35] in operations/puppet/cassandra, i see ensure package cassandra installed
[14:56:46] so where is the pkg coming from?
[14:57:22] mobrovac: our internal apt repository
[14:57:32] apt.wikimedia?
[14:57:48] PROBLEM - puppet last run on db2010 is CRITICAL: CRITICAL: puppet fail
[14:58:21] i'd like to set up a mw-vagrant role for cassandra, but there is no cassandra pkg available there
[14:58:50] mobrovac: yep that one, the packages themselves are imported from http://www.apache.org/dist/cassandra/debian
[14:58:58] !log Ran mysql:wikiadmin@db1033 [metawiki]> UPDATE ipblocks SET ipb_deleted = 1 WHERE ipb_id = 16659; to actually suppress a suppressed name
[14:59:05] Logged the message, Master
[14:59:56] operations, Security: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1073421 (Dzahn) a:Dzahn
[15:00:54] godog: the repo is in mw-vagrant: deb http://apt.wikimedia.org/wikimedia trusty-wikimedia
[15:00:59] but apt-cache search cassandra gives nothing
[15:01:30] maybe not available for trusty?
[15:03:24] mobrovac: indeed, just jessie
[15:03:30] damn
[15:04:04] we should switch mw-vagrant to jessie anyhow
[15:09:59] mobrovac: eh. maybe. I don't want to switch mw-vagrant to jessie until there is a solid plan for migrating the cluster MW servers to it
[15:10:33] we haven't tested hhvm on jessie AFAIK
[15:10:44] double damn, i was about to throw a phab ticket on you
[15:10:45] :P
[15:10:52] thnx for the info though
[15:10:57] you can open all the tickets you want :)
[15:11:14] I have thought of starting an experimental branch to see what's broken
[15:11:19] but time...
[15:11:25] yeah, i prefer to invest that time into seeing if i can get cassandra to work on trusty
[15:11:37] ah i see
[15:11:42] we're having the same problem
[15:11:58] we'd need a time-stopping machine
[15:13:25] mobrovac: it looks like datastax has a PPA with cassandra in it.
[15:13:41] yep they do
[15:13:54] but, if possible, i'd like to use our (jessie) deb
[15:15:11] if you change your mind, mw-vagrant has ::apt::ppa to easily add a new repo
[15:15:33] cool
[15:15:38] good to know!
[15:15:42] thnx for the tip bd808
[15:17:05] RECOVERY - puppet last run on db2010 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[15:23:58] ok changing my mind about using the jessie pkg
[15:24:07] will use the ppa fo cassandra
[15:26:02] bd808: have you perhaps got any numbers as to how actively people use/test VE on mw-vagrant (or labs-vagrant, for that matter)?
[15:28:04] mobrovac: we don't have any adoption numbers. WMF is allergic to user/use tracking (for reasonable reasons)
[15:28:20] ok, then s/numbers/idea ?
[15:28:21] :P
[15:28:54] in other words, would breaking VE in mw-vagrant be considered bad behaviour?
[15:29:02] yes
[15:29:12] thought so
[15:29:22] breaking anything in mw-vagrant is considered bad behavior
[15:29:58] but you could make a parallel role couldn't you?
[15:30:11] operations, Security: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1073534 (Dzahn) Hi Mark, thanks. I think it happened due to some technical mistake or you have been in a role of "release engineers" or something similar and then the role was removed but it didn...
[15:30:39] ::role::ve_restbase or something?
[15:31:08] or even just have the restbase role reconfigure ve
[15:32:00] there is functionality to specify the load order of the generated config files so you would probably just need to load after the normal ve config
[15:33:06] yeah well the trick is that what i would to do is for VirtualRESTService from core to use restbase, and not parsoid any more, which means that no config will help me there
[15:33:18] i'll delay that
[15:33:44] i have to create the restbase and cassandra roles in vagrant sooner or later, so not a big deal really
[15:34:20] "no config will help me there" meaning it's going to be an all of nothing change?
[15:34:40] all VE installs will require restbase at some point?
[15:34:49] that's the idea, yes
[15:34:58] bd808: https://phabricator.wikimedia.org/T89066
[15:35:34] ah no, you have a good point
[15:35:35] damn
[15:36:38] mobrovac: welcome to "legacy follows you for life" land :)
[15:37:54] yey yey
[15:38:17] (PS2) BBlack: kernel/disk stuff for jessie cache installs [puppet] - https://gerrit.wikimedia.org/r/193314
[15:39:19] (Abandoned) BBlack: mkfs for ext4 varnish filesystems on jessie [puppet] - https://gerrit.wikimedia.org/r/192833 (owner: BBlack)
[15:39:33] (CR) BBlack: [C: 2] kernel/disk stuff for jessie cache installs [puppet] - https://gerrit.wikimedia.org/r/193314 (owner: BBlack)
[15:46:51] (Abandoned) Nuria: Changing permits on agreggator depot once downloaded [puppet] - https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) (owner: Nuria)
[15:51:06] (PS1) coren: Tool Labs: fix test for creating known_hosts [puppet] - https://gerrit.wikimedia.org/r/193381
[15:51:15] YuviPanda|food: ^^
[15:59:06] (PS3) Ottomata: Alert critical if important Hadoop service processes are down [puppet] - https://gerrit.wikimedia.org/r/192701 (https://phabricator.wikimedia.org/T89730)
[15:59:22] (CR) Ottomata: [C: 2 V: 2] Alert critical if important Hadoop service processes are
down [puppet] - https://gerrit.wikimedia.org/r/192701 (https://phabricator.wikimedia.org/T89730) (owner: Ottomata)
[16:00:22] operations, Security: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1073633 (MarkAHershberger) My volunteer efforts will be mostly on wmflabs if I need any infrastructure. Much of what I'm doing doesn't require access to the servers. ----- Original Message -----...
[16:01:44] ottomata: all of those are going to page aren't they?
[16:03:10] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073660 (ArielGlenn) nope, at least right now the account sticks around. this might be a feature, as this means no orphaned files lying around o...
[16:04:48] ACKNOWLEDGEMENT - RAID on restbase1006 is CRITICAL: CRITICAL: Active: 8, Working: 8, Failed: 1, Spare: 0 Filippo Giunchedi T89639
[16:10:31] godog, yes, i was asked to make more paging hadoop stuff
[16:10:45] https://phabricator.wikimedia.org/T89730
[16:11:12] those don't go down often, so hpefully you won't notice :)
[16:12:58] (PS1) BBlack: jessie cache disks: no trim on M160 anyways [puppet] - https://gerrit.wikimedia.org/r/193382
[16:14:10] (CR) BBlack: [C: 2] jessie cache disks: no trim on M160 anyways [puppet] - https://gerrit.wikimedia.org/r/193382 (owner: BBlack)
[16:14:53] ottomata: indeed, looking forward to today' session
[16:18:31] Coren: lgtm. Merge and babysit?
[16:18:37] Am on phone at restaurant
[16:19:17] (CR) coren: [C: 2] "Yuvi +1'd virtually over phone.
:-)" [puppet] - https://gerrit.wikimedia.org/r/193381 (owner: coren)
[16:36:40] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Puppet has 1 failures
[16:38:53] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1073722 (RobH) @Milimetric, I'm not 100% which group we'd put you in then, sorry for the back and forth. On stat1001 I see a group for statistics-web-users, which...
[16:41:43] Could some nice opsen (bblack ?) purge https://www.wikimedia.org/ from varnish to get the "now with mustache" logo out of the cached varnish result?
[16:42:14] The page has been updated in https://meta.wikimedia.org/w/index.php?diff=11405959&oldid=5820158&rcid=5989447
[16:44:18] heh, the varnish page on wikitech stil reads :Don't do this. Consult a varnish specialist first.
[16:44:31] bd808: so you have a very limited window and already pinged the right person ;D
[16:44:41] (i just felt bad cuz you asked for a nice opsen ;)
[16:44:43] operations, Beta-Cluster, Labs: Backport new salt-syndic packages - https://phabricator.wikimedia.org/T85442#1073725 (ArielGlenn) I've imported salt-syndic_2014.1.11 into our lucid/precise/rtrusty repos. All dependencies should be there already. Let me know if it wfy.
[16:44:59] robh: evil ;)
[16:45:17] my first reaction was 'bwahahahahah, there are no nice opsen'
[16:45:19] heh
[16:46:20] an opsen is curtious, kind, cheerful, thrifty, brave, clean and reverent
[16:46:55] hahahahaha
[16:47:13] pretty sure when two opsen laugh at that, its not true ;D
[16:47:44] let me see: courteous, no. kind: what kind? cheerful: fat chance. thrifty.. maybe so. brave: depends.. clean: got a mouth filthy as a sailor. reverent...
*cough cough cough cough* [16:47:45] operations, ops-codfw: label/update mgmt & settings/test eventlog2001 - https://phabricator.wikimedia.org/T90909#1073729 (Papaul) a:Papaul>RobH Rack table update physical label update mgmt settings and Bios settings complete Test complete eventlog2001 10.193.2.119 ge-5/0/9 A5 [16:48:09] We need to have some 'any opsen can do this' varnish purge info, but i assume its just not to that point of stability and hence doesnt have it yet? [16:48:26] who knows (brandon knows!) [16:48:59] it's just potentially scary if you twiddle the wrong bits (like banning all enwiki cache hits) [16:49:09] (CR) Hashar: [C: 1] "\o/" [puppet] - https://gerrit.wikimedia.org/r/193078 (https://phabricator.wikimedia.org/T76392) (owner: Yuvipanda) [16:49:59] apergos: it was the parts of the cub scout motto I remembered. Apparently I forgot trustworthy, loyal, helpful, friendly, and obedient [16:50:30] i had long forgotten scouting stuff [16:50:33] it's wikimedia.org, you really need to mess up to ban all enwiki [16:50:38] * robh was a cub, but not boy scout [16:50:46] they got all strange and religious, i bailed. [16:50:51] anyway, I don't see the mustache I suppose it was done ? [16:51:05] indeed its gone now [16:51:07] woot yeah seems fixed [16:51:10] perhaps cache expired on its own [16:51:11] heh [16:51:28] * apergos breaks into song "Be prepared... that's the boy scouts' marching song..." [16:51:42] well, wiki.png is now moustacheless again anyway [16:51:53] be prepared to hold your liquor pretty well; don't write naughty words on walls if you can't spell [16:52:14] re: banning, there is standard varnish documentation on the web for how to format and execute ban commands [16:52:41] bblack: did you expire it or did it just expire normally? (just curious) [16:52:44] and I could sort-of write a short set of docs about how to handle common scenarios, maybe.
but every time I do, there ends up being a long tail of important considerations and exceptions [16:52:49] I didn't touch it [16:53:22] somebody took the mustache off of the image that was cached [16:53:27] basically, in practice, banning objects out of our cache layer is a fucking minefield of crazy things people would never think of that are critical [16:53:31] it's not mw.o any more either [16:53:57] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:54:50] (aside from accidentally wiping out the wrong stuff / too much: keep in mind every request passes through 2-3 layers of caches, and any given URL lives in ~30+ machines in at least 6-8 distinct sites/layers [16:54:54] as in: https://wikitech.wikimedia.org/wiki/LVS_and_Varnish [16:55:06] so you have to be wary of where and in what order the ban commands go out [16:55:46] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1073738 (Milimetric) @RobH, yes, the statistics-web-users group is perfectly fine. I'm the one who's sorry I didn't know to tell you in the first place. Thanks ag... [16:56:14] given that every ban request is for some unique new situation, the bottom line is there will never be a simple "run this command and all is well" for this [16:56:34] operations, Wikimedia-Fundraising: Need access to PHP error logs on lutetium - https://phabricator.wikimedia.org/T89992#1073740 (Jgreen) [16:56:42] godog: still here?
I could use some packaging help if you have a bit of time [16:56:45] PROBLEM - Varnishkafka log producer on cp1008 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka [16:56:52] andrewbogott: sure [16:56:56] ^ cp1008 isn't really production [16:57:55] godog: So, there are all these ready-made designate packages here: http://packages.ubuntu.com/search?keywords=designate&searchon=names But strangely they are provided for Utopic and not for Trusty. [16:58:04] bblack: We should pastebin this explanation because I'm sure I've read it from you several times now. Or maybe just make it !purge [16:58:44] I suspect that this is just laziness on the maintainer’s part… but those exact packages don’t install on Trusty due to some probably-not-necessary dependencies [16:58:56] RECOVERY - Varnishkafka log producer on cp1008 is OK: PROCS OK: 1 process with command name varnishkafka [16:59:11] godog: would it be possible for you to help me with a backport? (or better yet just do one without me) [16:59:16] the two long-term meta-answers to this problem are: (1) ops needs headcount so we can have more people that understand the cache layer deeply and (2) write tools to automate away at least some of the craziness involved in pushing the ban around to the right clusters/sites/layers in order [17:00:06] operations, Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1073745 (fgiunchedi) correction, statsd proxy won't split on `\n` differently than statsdlb. I think we're good to go with statsdlb @ori (context is that I'll be back from vacation on March 16th so if... [17:00:12] godog: I’m still poking designate people on IRC for existing packages, but so far there’s no response [17:00:17] (2) is like 50 items deep in my mental todo list of things we never have time for [17:01:44] andrewbogott: no joy on openstack's repositories themselves?
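bblack's second long-term answer above, tooling that pushes a ban out to the right clusters, sites, and layers in order, could be sketched roughly as follows. This is a hypothetical illustration, not Wikimedia's tooling. The tier names, depths, and hostnames are made up, and the sketch assumes tiers are numbered from the layer nearest the application servers (depth 0) outward. Banning inner layers first means that when an outer layer misses after its own ban, it refetches from a layer that no longer holds the stale object. Actually delivering each expression to a host (for example with varnishadm) is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Tier:
    """One cache layer; depth 0 talks to the app servers (hypothetical model)."""
    name: str
    depth: int
    hosts: List[str] = field(default_factory=list)


def ban_expression(host_header: str, url_regex: str) -> str:
    """Build a Varnish ban expression matching one Host header and a URL regex."""
    return f'req.http.host == "{host_header}" && req.url ~ "{url_regex}"'


def ban_plan(tiers: List[Tier], expression: str) -> List[List[Tuple[str, str]]]:
    """Group (host, expression) pairs into one batch per tier, innermost first.

    Each batch should complete on every host before the next batch starts;
    otherwise an outer cache can refill from an inner one that still holds
    the stale object.
    """
    batches = []
    for tier in sorted(tiers, key=lambda t: t.depth):
        batches.append([(host, expression) for host in tier.hosts])
    return batches


if __name__ == "__main__":
    # Hypothetical topology: names and hosts are illustrative only.
    tiers = [
        Tier("frontend", depth=1, hosts=["cache-fe-1", "cache-fe-2"]),
        Tier("backend", depth=0, hosts=["cache-be-1"]),
    ]
    expr = ban_expression("www.wikimedia.org", "^/$")
    for step, batch in enumerate(ban_plan(tiers, expr), start=1):
        for host, e in batch:
            print(step, host, e)
```

A narrow expression like the one in the example, tied to a single Host header and an anchored URL, also guards against the other risk raised above: accidentally banning far more than intended.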
[17:02:38] godog: Could be I’m just bad at the googles… does OS have a debian repo outside of the ubuntu cloud archive? [17:03:37] (PS4) Ori.livneh: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 [17:03:53] andrewbogott: I don't think so, double checking [17:04:16] bblack, got time for some C review? [17:04:21] https://gerrit.wikimedia.org/r/193255 [17:05:25] -1 use python :) [17:05:28] done! [17:05:33] not fast enough sadly [17:05:37] lol [17:05:41] no, seriously [17:05:56] it chokes on the volume of udp metrics [17:06:03] that's partly the reason why this is needed in the first place [17:06:21] looking... [17:07:06] !log running CentralAuth's migratePass0.php on all wikis [17:07:10] Logged the message, Master [17:07:33] godog: I can provide you with a git link and a branch; alternatively there are debian source packages that might be a good place to start. I’ve attempted both of those things myself but wound up quite confused [17:07:53] I’ve built lots of debs from python in the past, but this is... different [17:08:17] andrewbogott: I take it a simple build of the utopic package in trusty doesn't work? what about the vivid version? [17:08:39] the vivid version is the wrong Designate version for my purposes (I need 14.1 at the moment) [17:08:50] I’m not sure I know what you mean by ‘a simple build’ [17:09:16] hey, just a random question. how impossible would it be to have ICU 54 on production wikis? it comes with the libicu52 package, which has ICU 52 as you can guess. [17:09:22] simply building the source package on trusty -- trying that at the moment andrewbogott [17:09:44] oh, I think I missed the fact that there were source packages available [17:09:48] thank you! [17:10:21] andrewbogott: np, I tried with http://archive.ubuntu.com/ubuntu/pool/universe/d/designate/designate_2014.1-9.dsc [17:14:29] ori: what's this thing for anyways? it loadbalances over local ports on the same machine?
[17:14:36] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: Puppet has 1 failures [17:15:07] bblack: we're using txstatsd to process metrics, but it's python, so it saturates a single CPU core and starts dropping metrics [17:15:45] bblack: we're going to switch to a more efficient implementation (likely statsite), but that's a complicated migration since the naming scheme and set of aggregated metrics generated by statsite is slightly different [17:15:56] so that's only going to happen slowly, after godog gets back from vacation [17:16:09] in the interim, we can just spin up a few additional txstatsd instances [17:16:20] and put this thing in front [17:16:28] would round-robin suffice or is the hashing over metric names useful for txstatsd locality? [17:16:44] hashing is needed otherwise the backends will clobber each other's aggregates [17:16:49] ah! [17:17:39] godog: discussion in #openstack-dns resulted in this bug: https://bugs.launchpad.net/cloud-archive/+bug/1426464 [17:17:47] IOW, we are on our own! [17:19:16] andrewbogott: sigh, hopefully it isn't too deep of a rabbit hole, I'm taking a quick look [17:20:06] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1073777 (MaxSem) This particular S is you: ssmith: ensure: present gid: 500 name: ssmith realname: Sam Smith ssh_keys: [ssh-rsa AAAAB3NzaC1yc2EAA... [17:20:18] I am hoping/expecting that it’s a trivial rebuild if done by someone competent. [17:22:04] heh, I hope too!
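ori's point above is that hashing is needed because otherwise the backends would clobber each other's aggregates. In practice that means every line for a given metric name must always go to the same txstatsd instance. A minimal Python sketch of that routing step follows. It is not statsdlb's actual C code: the backend addresses and the use of CRC32 as the hash are assumptions for illustration. Splitting each datagram on newlines reflects godog's note that statsd packets can carry several newline-separated metrics.

```python
import zlib
from typing import Dict, List, Tuple

Backend = Tuple[str, int]


def metric_name(line: str) -> str:
    """Return the metric name from a statsd line like 'api.req:1|c'."""
    # statsd lines look like <name>:<value>|<type>[|@<rate>]
    return line.split(":", 1)[0]


def pick_backend(line: str, backends: List[Backend]) -> Backend:
    """Choose a backend deterministically from the metric name alone."""
    # crc32 gives the same result in every process (Python's built-in
    # str hash is randomized per process), so routing is stable.
    digest = zlib.crc32(metric_name(line).encode("utf-8"))
    return backends[digest % len(backends)]


def route_packet(packet: bytes, backends: List[Backend]) -> Dict[Backend, List[str]]:
    """Split one datagram into lines and group them by destination backend."""
    routed: Dict[Backend, List[str]] = {}
    for line in packet.decode("utf-8", "replace").split("\n"):
        if line:  # skip empty trailing segments
            routed.setdefault(pick_backend(line, backends), []).append(line)
    return routed


if __name__ == "__main__":
    # Hypothetical local txstatsd instances on consecutive ports.
    backends = [("127.0.0.1", 8126), ("127.0.0.1", 8127), ("127.0.0.1", 8128)]
    print(route_packet(b"api.req:1|c\napi.req:3|c\ndb.latency:42|ms\n", backends))
```

With plain round-robin, "api.req" counts would be spread over several instances, and each would flush a partial total under the same name. Hashing on the name avoids that. The trade-offs: load is uneven if a few metrics dominate, and with simple modulo hashing, adding or removing a backend remaps most names.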
[17:25:50] operations, ops-codfw: prepare 'shipment' for eqdfw - https://phabricator.wikimedia.org/T91077#1073784 (RobH) NEW a:Papaul [17:26:15] operations, ops-codfw: prepare equipment list for eqdfw - https://phabricator.wikimedia.org/T91077#1073794 (RobH) [17:26:40] operations, network: Set up cr1-eqord & cr1-eqdfw - https://phabricator.wikimedia.org/T89227#1030334 (RobH) [17:26:41] operations, ops-codfw: prepare equipment list for eqdfw - https://phabricator.wikimedia.org/T91077#1073784 (RobH) [17:32:26] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [17:33:16] godog: I’m going to make breakfast but will be nearby [17:33:29] andrewbogott: ack, I think it'll need 3/4 additional packages [17:34:15] (CR) BBlack: [C: -1] "Some nits inlined. I still haven't really checked it over for buffer size / string lens / string ops type stuff." (5 comments) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh) [17:35:00] that’s fine as long as they don’t clash with existing ones… I anticipate that whatever host runs this will be running stock Trusty. It’d be nice if it could coexist with the nova-api and nova-network services but I can use a dedicated system if I have to. [17:36:08] (PS1) Yuvipanda: tools: Move tomcat tools to generic node [puppet] - https://gerrit.wikimedia.org/r/193390 (https://phabricator.wikimedia.org/T91066) [17:36:12] Coren: ^ [17:36:19] welcome back andrewbogott!
[17:37:31] (PS2) Yuvipanda: tools: Move tomcat tools to generic node [puppet] - https://gerrit.wikimedia.org/r/193390 (https://phabricator.wikimedia.org/T91066) [17:37:43] (PS3) Yuvipanda: puppet: Allow puppet merge to be answered with 'y' [puppet] - https://gerrit.wikimedia.org/r/193355 [17:38:00] (CR) Yuvipanda: [C: 2 V: 2] puppet: Allow puppet merge to be answered with 'y' [puppet] - https://gerrit.wikimedia.org/r/193355 (owner: Yuvipanda) [17:38:42] operations, ops-codfw: prepare equipment list for eqord - https://phabricator.wikimedia.org/T91079#1073831 (RobH) NEW a:Papaul [17:38:49] (PS3) Yuvipanda: tools: Move tomcat tools to generic node [puppet] - https://gerrit.wikimedia.org/r/193390 (https://phabricator.wikimedia.org/T91066) [17:38:56] operations, ops-codfw: prepare equipment list for eqord - https://phabricator.wikimedia.org/T91079#1073841 (RobH) [17:38:57] operations, network: Set up cr1-eqord & cr1-eqdfw - https://phabricator.wikimedia.org/T89227#1030334 (RobH) [17:39:10] andrewbogott: yeah it is four packages, designate openstack-pkg-tools python-kajiki python-pecan [17:39:10] operations, ops-codfw: prepare equipment list for eqdfw - https://phabricator.wikimedia.org/T91077#1073843 (RobH) [17:39:42] (CR) Yuvipanda: [C: 2 V: 2] tools: Move tomcat tools to generic node [puppet] - https://gerrit.wikimedia.org/r/193390 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda) [17:39:45] godog: so, I [17:40:02] I’ll need all of those designate-* packages. You’re just working on one of them at the moment? [17:40:21] (well, not -docs I guess :) ) [17:41:07] andrewbogott: btw, tools webproxy hot spare failover works perfectly. Tested and documented at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Documentation/Admin [17:41:38] andrewbogott: which also includes gridmaster failover, which also works fine when you're testing the correct thing.
[17:41:40] :-) [17:41:56] Coren: you should mention that also there :D [17:42:14] Coren: so, tools-submit is an SPOF now. [17:42:19] for bigbrother and cron [17:42:33] I really want to rewrite bigbrother into not-perl [17:43:31] YuviPanda: Sure, but that's medium priority at most - while cron is annoying having -submit gone for a little while has not a large impact. [17:43:47] true, but I’d want bigbrother to be up, I’d think. [17:44:32] bigbrother is easier to make redundant than cron; there is literally no harm done if two run because the worst that can happen is that two jstart* are started for one failed job. [17:44:44] (One of which would fail) [17:45:22] cron jobs, otoh, are not idempotent. [17:45:26] andrewbogott: check /data/scratch/filippo/designate [17:48:16] (PS2) 01tonythomas: Added BounceHandler extension to group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 [17:48:55] (PS1) Glaisher: Enable WikiLove extension at newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) [17:49:35] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073892 (Ottomata) Oh hm, I guess it is just their ssh key that is removed then? [17:52:02] godog: Thank you! I will give those a try. [17:52:36] godog: I’m not focussed on this at the moment, though, so don’t wait around for me. I’ll email a followup. [17:53:08] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1073910 (RobH) [17:53:11] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073908 (RobH) Resolved>Open Did anyone actually manually remove the account off the non-puppetized hosts?
[17:53:20] andrewbogott: cool, I have no idea whether they work or conflict, they install though, standard disclaimer (I'm going on vacation and will be back on the 15th) [17:53:45] godog: good to know! Thanks [17:54:12] (PS3) 01tonythomas: Added BounceHandler extension to group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) [17:56:32] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1073924 (RobH) [17:56:33] Ops-Access-Requests, operations, Security, Patch-For-Review: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1073922 (RobH) Open>Resolved i removed the user manually off bast1001 (since it wasnt puppet added, it wouldnt have been removed by puppet,... [17:56:57] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH) [17:59:55] (PS1) Yuvipanda: tools: Add tomcat starter & required packages to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193394 (https://phabricator.wikimedia.org/T91066) [18:00:04] (CR) jenkins-bot: [V: -1] tools: Add tomcat starter & required packages to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193394 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda) [18:00:06] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1073947 (RobH) a:RobH>Tnegrin @Tnegrin, Would you please comment on this task to approve that @Milimetric can be added to statistics-web-users group, which f...
[18:00:12] (PS2) Yuvipanda: tools: Add tomcat starter & required packages to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193394 (https://phabricator.wikimedia.org/T91066) [18:00:27] operations, Analytics-Cluster, Analytics-Kanban: Upgrade Analytics Cluster to Trusty, and then to CDH 5.3 - https://phabricator.wikimedia.org/T1200#1073949 (kevinator) [18:00:42] (CR) Yuvipanda: [C: 2 V: 2] tools: Add tomcat starter & required packages to generic nodes [puppet] - https://gerrit.wikimedia.org/r/193394 (https://phabricator.wikimedia.org/T91066) (owner: Yuvipanda) [18:01:59] (PS1) RobH: formalizing milimetric's access to stat1001 [puppet] - https://gerrit.wikimedia.org/r/193396 [18:04:41] (CR) RobH: "My change looks awesome! BUT this cannot merge until we have Toby's managerial approval on https://phabricator.wikimedia.org/T90956" [puppet] - https://gerrit.wikimedia.org/r/193396 (owner: RobH) [18:05:13] andrewbogott: Coren I’m heading to bed now. Anything from me before I go? [18:05:35] YuviPanda: I'm all good. Sleep. [18:05:37] hotspare stuff is documented, and I did a round of seeing which things are on which hosts, and outside of tools-submit (bigbrother + cron) we don’t actually have a SPOF [18:05:40] so that’s good [18:05:52] 1003 and 12 and 12 are a bit overcrowded with toollabs hosts, however. [18:05:56] YuviPanda: Can I get that doc link again? [18:06:00] andrewbogott: sure. moment [18:06:04] andrewbogott: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Documentation/Admin [18:06:08] thanks [18:06:44] (CR) Odder: [C: 1] Enable WikiLove extension at newiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193392 (https://phabricator.wikimedia.org/T89818) (owner: Glaisher) [18:06:59] andrewbogott: I just moved it. is at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Admin now. [18:07:06] still left a redirect at the old place, tho.
[18:07:19] andrewbogott: can you follow up on labs-l with incident report, etc? [18:07:49] YuviPanda: yes, or Coren will (Coren: Either way is fine with me, just don’t want to step on your toes if you’ve already started) [18:08:00] right. thanks! [18:08:34] andrewbogott: Your call; might want to fold the small outage with the one that sneakily followed after I went to bed or not? [18:08:58] OK, I will do it after the gage/otto thing [18:09:41] bye everyone [18:10:08] (PS4) 01tonythomas: Added BounceHandler extension to group1 wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/191937 (https://phabricator.wikimedia.org/T48640) [18:13:34] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1074055 (RobH) https://gerrit.wikimedia.org/r/#/c/193396/ is the patchset for this change, once we have Toby's approval. [18:19:22] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1074073 (RobH) [18:23:56] (PS1) Dzahn: disable hexmode's shell account [puppet] - https://gerrit.wikimedia.org/r/193402 (https://phabricator.wikimedia.org/T90944) [18:25:32] (PS2) Dzahn: disable hexmode's shell account [puppet] - https://gerrit.wikimedia.org/r/193402 (https://phabricator.wikimedia.org/T90944) [18:25:43] greg-g: Can I get a deploy slot today for an OOjs UI bug fix? [18:26:03] RoanKattouw: what is it? for which wikis? [18:26:08] It's a small JS/CSS change that fixes a bug where all OOUI dialogs are totally broken [18:26:14] phase0 only [18:26:23] ah, I think I know that issue, sure thing, doit [18:26:30] (wmf19 now, I think?
Whatever the branch was that got cut on Wednesday) [18:26:32] OK cool thanks [18:26:49] We are still wrangling Jenkins so it won't be right now, but probably some time in the next 30 mins [18:26:57] how about also deploying a logo change for wikiquote [18:27:03] https://gerrit.wikimedia.org/r/#/c/192978/ [18:27:53] (CR) Dzahn: [C: 2] disable hexmode's shell account [puppet] - https://gerrit.wikimedia.org/r/193402 (https://phabricator.wikimedia.org/T90944) (owner: Dzahn) [18:30:31] bblack: thanks for the review! that's super-useful [18:33:20] greg-g: Never mind, we were living in the future and thought wmf20 already existed. The bug isn't on any production wiki, so we don't need a deploy or cherry-pick, we're just merging things into master and that's all we need to do [18:33:25] greg-g: Sorry for the confusion [18:33:32] ori: np. I'm stuck in meetings and such for the next few hours, but I want to go back and look at string/mem stuff too [18:33:38] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1074154 (RobH) [18:34:11] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH) [18:34:20] (CR) Ori.livneh: Initial commit of statsdlb (5 comments) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh) [18:37:05] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 1 failures [18:38:58] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074164 (RobH) As @MaxSem states, this particular ssmith is indeed Sam Smith (@phuedx). So is access to these systems required? If so, we'll need to immediately start...
[18:39:20] operations, ops-codfw: prepare equipment list for eqdfw - https://phabricator.wikimedia.org/T91077#1074165 (Reedy) [18:42:49] operations, ops-codfw: rack mw2135 through mw2215 - https://phabricator.wikimedia.org/T86806#1074171 (Papaul) a:Papaul>RobH rack table update physical label in place mgmt settings and BIOS settings complete Test complete mw2135 10.193.2.35 ge-4/0/27 B4 mw2136 10.193.2.36 ge-4/0/28 B4 mw2137 10.... [18:44:36] PROBLEM - puppet last run on caesium is CRITICAL: CRITICAL: Puppet has 1 failures [18:45:04] operations, ops-codfw: prepare equipment list for eqdfw - https://phabricator.wikimedia.org/T91077#1074176 (Papaul) @ RobH shipping will work for me. Thanks [18:50:19] operations: order onsite tools for eqdfw/eqord - https://phabricator.wikimedia.org/T91095#1074192 (RobH) NEW a:RobH [18:53:49] (PS1) Dzahn: remove mah from admin groups but ensure present [puppet] - https://gerrit.wikimedia.org/r/193408 (https://phabricator.wikimedia.org/T90944) [18:54:43] operations: order onsite tools for eqdfw/eqord - https://phabricator.wikimedia.org/T91095#1074221 (RobH) [18:55:15] (PS2) Dzahn: remove mah from admin groups but ensure present [puppet] - https://gerrit.wikimedia.org/r/193408 (https://phabricator.wikimedia.org/T90944) [18:56:21] (CR) Dzahn: [C: 2] remove mah from admin groups but ensure present [puppet] - https://gerrit.wikimedia.org/r/193408 (https://phabricator.wikimedia.org/T90944) (owner: Dzahn) [19:00:01] robh: ^ that should make puppet happy again. expecting recovery on bast1001, caesium, ... [19:00:16] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:01:36] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074260 (phuedx) @RobH: all traces of ssmith should be removed. I try to go by @phuedx wherever I can. Thanks!
[19:02:26] RECOVERY - puppet last run on caesium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:04:44] (CR) Dzahn: "< icinga-wm> RECOVERY - puppet last run on bast1001 is OK" [puppet] - https://gerrit.wikimedia.org/r/193408 (https://phabricator.wikimedia.org/T90944) (owner: Dzahn) [19:07:32] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074294 (RobH) a:phuedx>RobH @phuedx: Thanks! I'll take care of disabling the ssmith account, leaving your active phuedx account untouched. [19:07:52] (PS1) RobH: Sam Smith uses phuedx, no longer uses ssmith login [puppet] - https://gerrit.wikimedia.org/r/193414 [19:07:57] PROBLEM - uWSGI web apps on graphite2001 is CRITICAL: CRITICAL: Not all configured uWSGI apps are running. [19:08:01] <_joe_> ottomata: https://phabricator.wikimedia.org/T83580 [19:08:06] PROBLEM - Graphite Carbon on graphite2001 is CRITICAL: CRITICAL: Not all configured Carbon instances are running. [19:08:46] (CR) RobH: [C: 2] Sam Smith uses phuedx, no longer uses ssmith login [puppet] - https://gerrit.wikimedia.org/r/193414 (owner: RobH) [19:09:54] thanks [19:10:17] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074311 (RobH) I just merged https://gerrit.wikimedia.org/r/#/c/193414/ Once it has time to run across the cluster, we can go back and clean any rogue accounts puppet...
[19:11:26] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1074312 (RobH) [19:11:47] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH) [19:13:39] thanks robh [19:14:19] glad to get things cleaned up =] [19:16:59] (PS5) Ori.livneh: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 [19:17:10] operations, RESTBase, Monitoring, Patch-For-Review: Detailed cassandra monitoring - https://phabricator.wikimedia.org/T78514#1074348 (GWicke) A dashboard to track heap metrics is now set up at http://grafana.wikimedia.org/#/dashboard/db/cassandra-heap This is fairly performance-oriented. We should... [19:19:36] (PS2) Dzahn: 10.in-addr.arpa - indentation fixes [dns] - https://gerrit.wikimedia.org/r/193160 [19:22:49] (PS2) Filippo Giunchedi: Update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/193169 (owner: GWicke) [19:23:04] (CR) Filippo Giunchedi: [C: 2 V: 2] Update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/193169 (owner: GWicke) [19:37:06] (PS6) Ori.livneh: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 [19:44:05] (PS1) Dzahn: add user mah to absented user group [puppet] - https://gerrit.wikimedia.org/r/193420 (https://phabricator.wikimedia.org/T90944) [19:45:10] (PS2) Dzahn: add user mah to absented user group [puppet] - https://gerrit.wikimedia.org/r/193420 (https://phabricator.wikimedia.org/T90944) [19:45:30] <^d> mutante: What's absent for? Does it just ensure => absent automatically? [19:45:51] <^d> (in which case, why do we keep the ssh key?)
[19:45:53] (CR) Dzahn: [C: 2] add user mah to absented user group [puppet] - https://gerrit.wikimedia.org/r/193420 (https://phabricator.wikimedia.org/T90944) (owner: Dzahn) [19:46:11] ^d: we don't keep the key, needed rebase, see PS2 [19:46:26] i had to do this one in 2 steps to make puppet happy [19:46:41] because we had the user deleted before puppet had a chance to [19:46:47] and it didn't like that [19:46:50] <^d> Ah ok [19:47:15] and about the special group for absented user [19:47:25] i'm not really sure either [19:47:39] <^d> Yeah that doesn't make a ton of sense to me, but w/e [19:47:45] but copied what we do for other old users [19:56:23] operations, ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1074526 (fgiunchedi) @cmjohnson was able to set LD m and LD n as primary/secondary from onsite, rebooting yields this now after "booting in legacy bios mode" ``` 304-Keyboard or System Unit Error ``` [19:57:48] (PS5) Odder: Set $wgBabelCategoryNames true at outreachwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/190686 (https://phabricator.wikimedia.org/T89484) (owner: Gerardduenas) [19:59:38] (PS1) Ori.livneh: Add Debian packaging [software/statsdlb] - https://gerrit.wikimedia.org/r/193422 [19:59:40] godog: ^ [20:00:21] (CR) BBlack: Initial commit of statsdlb (1 comment) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh) [20:01:02] bblack: meeting?
[20:01:34] bblack: but it's UDP, not TCP [20:01:55] (PS1) RobH: cleaning up absented users group in admins data.yml [puppet] - https://gerrit.wikimedia.org/r/193423 [20:04:48] (CR) RobH: [C: 2] cleaning up absented users group in admins data.yml [puppet] - https://gerrit.wikimedia.org/r/193423 (owner: RobH) [20:04:53] (CR) Ori.livneh: Initial commit of statsdlb (1 comment) [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 (owner: Ori.livneh) [20:09:16] operations, ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1074571 (Cmjohnson) I removed the keyboard from the server and the error cleared. However when I tried to install I am getting an error installing grub on /dev/sdm ┌┤ [!!] Install the GRUB boot loader o... [20:13:17] (PS1) Alexandros Kosiaris: Add akosiaris to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/193425 [20:18:07] (CR) Dzahn: [C: 1] "i think he signed an NDA :) /me hides" [puppet] - https://gerrit.wikimedia.org/r/193425 (owner: Alexandros Kosiaris) [20:18:49] ori: the values I quoted are at the socket layer, not udp/tcp-specific. UDP may be different and be in /proc/sys/net/ipv4/udp_mem settings, etc [20:19:02] but either way, I suspect you want more than 64K and that the default is also more than 64K [20:19:46] bblack: hm. what would be sensible? i'd rather not check /proc if i can [20:20:40] you can decide on a sensible minimum (e.g. expected-avg-pktsize * 8 or something?), call getsockopt to query current, and set it only if current < desired, or something [20:20:46] I don't know how complex you want to get for this really [20:21:31] https://github.com/gdnsd/gdnsd/blob/master/src/dnsio_udp.c#L162 <- too complicated [20:25:14] bblack: perhaps i should just not touch either socket option? [20:26:26] bblack: let me know if you found yuri's phab ticket [20:28:59] ori: yeah, leaving it at default is a reasonable option.
or offering a cmdline param to explicitly set, and leaving OS default if not set. then it can be tuned if we're seeing loss. [20:29:51] nuria: this is the ticket: https://phabricator.wikimedia.org/T89177 [20:30:11] but I was thinking I left a longer comment there about it, which I didn't. I think the comment I'm thinking of is buried in some other ticket I can't find at the moment... [20:30:56] operations, Engineering-Community, ECT-February-2015, ECT-March-2015: date/budget proposal for 2015 Ops Offsite - https://phabricator.wikimedia.org/T89023#1074650 (Rfarrand) [20:30:59] bblack: you didn't set PHAB_COMMENT_MAXLEN, so it got truncated [20:31:06] :P [20:31:20] (PS7) Ori.livneh: Initial commit of statsdlb [software/statsdlb] - https://gerrit.wikimedia.org/r/193255 [20:31:22] (PS2) Ori.livneh: Add Debian packaging [software/statsdlb] - https://gerrit.wikimedia.org/r/193422 [20:32:07] bblack, milimetric, ottomata: meeting notes from uniques-VCL meeting we just had: https://wikitech.wikimedia.org/wiki/Analytics/Unique_clients/Last_visit_solution#2015-02-27 [20:32:32] thx [20:33:37] nuria: maybe I was thinking of this ticket too, I donno anymore: https://phabricator.wikimedia.org/T89688 [20:33:53] (PS3) Odder: AbuseFilter config change for ukwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192246 (https://phabricator.wikimedia.org/T89379) (owner: Base) [20:34:09] but my point is, there's already some backlogged work that all revolves around "unify the various analytics-y things we're gathering for all the cache clusters to be sane and coherent" [20:34:50] which is all kinda backlogged on me right now.
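bblack offers two options above. One is to query the current receive buffer with getsockopt and raise it only if it is below a chosen minimum. The other is to leave the OS default alone unless a command-line option asks for a size. Both can be sketched in Python as follows. The `--rcvbuf` option and the helper names are hypothetical, not statsdlb's interface. Choosing the minimum is left to the caller; bblack floats something like expected average packet size times 8.

```python
import argparse
import socket
from typing import List, Optional


def should_grow(current: int, desired: int) -> bool:
    """Only ever raise the buffer; never shrink a larger OS default."""
    return current < desired


def ensure_rcvbuf(sock: socket.socket, desired: Optional[int]) -> int:
    """Raise SO_RCVBUF to at least `desired` bytes; return the effective size.

    With desired=None the OS default is left untouched.
    """
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    if desired is not None and should_grow(current, desired):
        # On Linux the request is capped at net.core.rmem_max and then
        # doubled for bookkeeping overhead, so re-read rather than assume.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, desired)
        current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return current


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse the hypothetical --rcvbuf option (bytes); absent means OS default."""
    parser = argparse.ArgumentParser(description="UDP listener (sketch)")
    parser.add_argument("--rcvbuf", type=int, default=None,
                        help="minimum SO_RCVBUF in bytes; default: leave OS value")
    return parser.parse_args(argv)
```

A listener would call `ensure_rcvbuf(sock, parse_args(sys.argv[1:]).rcvbuf)` once after creating its UDP socket. If loss shows up, on Linux the `RcvbufErrors` counter on the Udp line of /proc/net/snmp is one place to check before raising the value.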
the cookie thing is simpler, maybe can be done separately, but definitely interacts with it all [20:36:11] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1074663 (10Krenair) [20:36:35] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1072222 (10Krenair) Who is actually able to grant this? Does it need WMF management approval? [20:37:52] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1074673 (10Reedy) And for any specific site/project etc? Adding EVERY site/project combination takes a long time. If you need specific wiki(s), it'd be better to list them as such :) [20:38:39] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1074678 (10Jalexander) >>! In T90980#1074673, @Reedy wrote: > And for any specific site/project etc? Adding EVERY site/project combination takes a long time. If you need specific wiki(s), it... [20:39:05] Reedy, I would just assume wikimediafoundation.org only and leave the rest :) [20:39:16] bblack: ok, read them both. 
I think they are not related per se but it is similar work [20:40:24] they'll only be related in that they're all modifying X-Analytics in some way that should work across mobile+non-mobile [20:42:59] (03CR) 10Alexandros Kosiaris: [C: 032] Add akosiaris to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/193425 (owner: 10Alexandros Kosiaris) [20:43:05] (03CR) 10Alexandros Kosiaris: ":P" [puppet] - 10https://gerrit.wikimedia.org/r/193425 (owner: 10Alexandros Kosiaris) [20:50:22] snicker [20:51:17] {{coi}} [20:51:27] 6operations, 6Security, 5Patch-For-Review: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1074684 (10Dzahn) confirmed with salt that the user is gone from all hosts except labstore1001 (LDAP groups for labs projects) ( salt '*' cmd.run 'id mah') deleted home directo... [20:54:37] (03CR) 10Alex Monk: "Might be OK now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/172012 (https://phabricator.wikimedia.org/T75164) (owner: 10Dereckson) [20:57:00] 6operations, 6Security: define in Puppet or remove user account - amire80 - https://phabricator.wikimedia.org/T90950#1074686 (10Dzahn) a:3Amire80 Hi Amir, it seems very likely that you were not even aware of having this access and it just happened by mistake we made in the past, just making sure that you ar... [20:58:29] <_joe_> do we really want those tickets to be logged here? [20:59:12] 6operations, 6Security: define in Puppet or remove user account - mglaser - https://phabricator.wikimedia.org/T90947#1074688 (10Dzahn) a:3Mglaser Hi Markus, could you briefly comment on which servers exactly you need access to? It seems a mistake was made there in the past. Are any...
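bblack's suggestion at 20:20:40 (pick a sensible minimum, query the current size with getsockopt, and only raise it if it is smaller), together with his 20:28:59 alternative of an optional command-line override that otherwise leaves the OS default alone, could look roughly like the Python sketch below. The function name `ensure_rcvbuf` and the `--rcvbuf` flag are hypothetical; nothing in this log says statsdlb is implemented this way. On Linux the kernel clamps the requested SO_RCVBUF value to net.core.rmem_max and then doubles it for bookkeeping overhead, so the size read back will usually not equal the size requested.

```python
import argparse
import socket


def ensure_rcvbuf(sock, desired):
    """Raise SO_RCVBUF to at least `desired` bytes; never lower it.

    Returns the receive buffer size the kernel reports afterwards.
    """
    current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    if current < desired:
        # Linux clamps the request to net.core.rmem_max and then doubles it,
        # so the value read back may differ from `desired`.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, desired)
        current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return current


def main(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical flag: when omitted, the OS default buffer size is kept.
    parser.add_argument("--rcvbuf", type=int, default=None,
                        help="minimum UDP receive buffer size in bytes")
    args = parser.parse_args(argv)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if args.rcvbuf is not None:
            size = ensure_rcvbuf(sock, args.rcvbuf)
        else:
            # Leave the socket untouched and just report the default.
            size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        print("SO_RCVBUF =", size)
        return size
    finally:
        sock.close()


if __name__ == "__main__":
    main()
```

Only calling getsockopt when no flag is given keeps the default that bblack called reasonable, while the flag leaves a knob to turn if packet loss shows up later; the minimum passed to it could be derived from an expected average packet size as he suggested.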
[21:03:08] (03CR) 10Ori.livneh: [C: 032 V: 032] Initial commit of statsdlb [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193255 (owner: 10Ori.livneh) [21:03:25] (03PS1) 10Ori.livneh: Add statsdlb module [puppet] - 10https://gerrit.wikimedia.org/r/193483 [21:03:27] (03CR) 10Ori.livneh: [C: 032 V: 032] Add Debian packaging [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193422 (owner: 10Ori.livneh) [21:06:27] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1074694 (10Dzahn) It's not the only way to share the master password, it's just tedious work to add a delegated user to every single project because the UI makes you do it one by one. The pa... [21:09:24] 6operations: Give Google webmaster tools access to jon katz (Read only is fine) - https://phabricator.wikimedia.org/T90980#1074697 (10Dzahn) ``` Google account for Google Webmaster Tools Make sure you know what you're doing when using Google Webmaster Tools. In order to have individual accountability, *... [21:09:56] PROBLEM - DPKG on osmium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:16:26] RECOVERY - DPKG on osmium is OK: All packages OK [21:26:41] (03PS1) 10Dzahn: initial commit - moved from operations/software [software/dbtree] - 10https://gerrit.wikimedia.org/r/193488 (https://phabricator.wikimedia.org/T90837) [21:26:48] springle: ^ [21:34:40] lol [21:34:45] mutante: what about mediawiki-config [21:34:54] 6operations, 7network: set up switch port for second ethernet interface for ms1001 - https://phabricator.wikimedia.org/T89833#1074731 (10RobH) a:3RobH So I worked with @arielglenn on this, paste of work https://phabricator.wikimedia.org/P340 For some reason I get that error attempting to commit my change, a... 
[21:35:57] Reedy: that has docroot/noc/ but not docroot/noc/dbtree [21:36:39] Reedy: eh, no, it's https://phabricator.wikimedia.org/T90837 [21:36:58] i'm solving the problem that it's in 2 separate locations by moving it to a third place :p [21:37:16] (03PS1) 10Ori.livneh: Fix for packaging [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193489 [21:37:20] i'm uploading another one to delete it [21:37:26] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix for packaging [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193489 (owner: 10Ori.livneh) [21:37:56] (03PS1) 10Dzahn: move dbtree into its own subrepository, rm here [software] - 10https://gerrit.wikimedia.org/r/193491 (https://phabricator.wikimedia.org/T90837) [21:38:22] https://github.com/wikimedia/operations-mediawiki-config/tree/master/docroot/noc/dbtree [21:38:51] yes, one more change to remove that [21:39:15] and another one to make puppet clone it then from new place [21:40:03] Reedy: https://gerrit.wikimedia.org/r/#/c/193143/ [21:40:51] (03CR) 10Ori.livneh: [C: 032] Add statsdlb module [puppet] - 10https://gerrit.wikimedia.org/r/193483 (owner: 10Ori.livneh) [21:41:22] (03CR) 10Dzahn: "also see https://gerrit.wikimedia.org/r/193488 , https://gerrit.wikimedia.org/r/#/c/193491/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [21:42:28] (03CR) 10Dzahn: "and https://gerrit.wikimedia.org/r/#/c/192771/ this is where code diverged now because springle fixed dbtree" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [21:43:50] (03CR) 10Reedy: "Where's puppet going to be checking it out to?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [21:43:56] mutante: ^ possible -1 [21:47:20] https://github.com/wikimedia/operations-software/blob/master/dbtree/inc/sanity.php#L34 [21:47:23] * Reedy cries a little inside [21:48:43] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074751 (10RobH) 5Open>3Resolved It looks like the absent cleanup of the user ssmith has removed it off all hosts (as intended), no manual cleanup is required. I'm r... [21:48:44] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1074753 (10RobH) [21:49:21] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10RobH) [21:50:25] (03PS1) 10Dzahn: let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 [21:50:27] Reedy: ^ [21:50:34] (03CR) 10jenkins-bot: [V: 04-1] let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 (owner: 10Dzahn) [21:50:42] (03PS2) 10Reedy: let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 (owner: 10Dzahn) [21:50:53] woo, autorebase worked [21:51:10] :) [21:51:34] mutante: is this then being cloned directly onto each apache? [21:51:47] iirc noc is served from anywhere now? [21:51:57] oh, wait, it's misc [21:52:00] just terbium? 
[21:52:15] (03PS3) 10Dzahn: let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 (https://phabricator.wikimedia.org/T90837) [21:52:25] Reedy: yea, that's the thing, it's just on terbium [21:52:30] and the noc role as well [21:52:37] I was about to say it needed to be mediawiki-staging [21:52:42] but if it's just on terbium [21:52:47] don't need the .gitignore entry either then [21:53:01] yea, just that. doing that because springle fixed dbtree [21:53:07] and then asked me how it gets deployed [21:53:16] copy paste, recommit! [21:53:17] and then we saw it's in more than one place and .. eh yeah [21:53:17] ;D [21:53:27] I'm sure we'd discussed this mess before :) [21:53:36] haha, i think so [21:53:57] your proposed solution seems reasonable though [21:54:34] i moved it into the third place, because otherwise i would have to use sparse checkouts of subdirectory in git::clone :p [21:54:53] or puppet would pull all the other stuff in ops/software [21:55:41] (03PS4) 10Dzahn: let puppet clone dbtree into noc/docroot/dbtree [puppet] - 10https://gerrit.wikimedia.org/r/193492 (https://phabricator.wikimedia.org/T90837) [21:57:01] 6operations: order onsite tools for eqdfw/eqord - https://phabricator.wikimedia.org/T91095#1074767 (10RobH) [21:57:58] Reedy: now just need to figure out why there is more stuff on terbium than in the repo .hah :P [21:58:17] ? [21:59:15] contents in operations/software: https://gerrit.wikimedia.org/r/#/c/193491/ on terbium: css images index.php js [22:01:40] Reedy: ooh, man , you know what happened? [22:01:43] 7Puppet, 10Beta-Cluster, 5Patch-For-Review: Puppet failures on deployment-bastion - https://phabricator.wikimedia.org/T75520#1074769 (10Krinkle) Is this still an issue? 
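Dzahn's check at 21:59:15 (the dbtree contents in operations/software versus what is actually on terbium) is the kind of comparison Python's standard filecmp module can automate. This is a minimal sketch; the helper name `compare_trees` is illustrative and the directory arguments are placeholders, since the log does not give the full deployed path. `filecmp.dircmp` skips `.git` and a few other VCS directories by default, and compares common files shallowly by `os.stat()` signature before falling back to contents.

```python
import filecmp
import os


def compare_trees(repo_dir, deployed_dir, prefix=""):
    """Recursively report how a deployed directory differs from a checkout.

    Returns a dict of sorted relative paths: entries only in the repo,
    entries only in the deployed copy, and common files whose contents differ.
    """
    cmp = filecmp.dircmp(repo_dir, deployed_dir)
    report = {
        "only_in_repo": [os.path.join(prefix, n) for n in cmp.left_only],
        "only_deployed": [os.path.join(prefix, n) for n in cmp.right_only],
        "differing": [os.path.join(prefix, n) for n in cmp.diff_files],
    }
    # Descend into directories present on both sides.
    for sub in cmp.common_dirs:
        child = compare_trees(os.path.join(repo_dir, sub),
                              os.path.join(deployed_dir, sub),
                              os.path.join(prefix, sub))
        for key in report:
            report[key].extend(child[key])
    for key in report:
        report[key].sort()
    return report
```

A call such as `compare_trees("operations-software/dbtree", "/path/to/deployed/dbtree")` (both paths hypothetical) would list stray files on either side and any file that a later deployment silently overwrote.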
[22:02:00] springle had fixed dbtree in operations/software and manually git cloned [22:02:08] then there was a deployment [22:02:12] and now it's back to old dbtree [22:02:24] from mediawiki-config [22:02:44] yesterday dbtree was different [22:03:47] hahaha [22:04:00] we could fix it temporarily by updating mediawiki-config [22:04:22] seems sensible short term, rather than deploying new stuff on friday [22:04:52] true [22:06:26] PROBLEM - puppet last run on db2002 is CRITICAL: CRITICAL: puppet fail [22:07:08] (03PS1) 10Reedy: Update dbtree from operations-software upstream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193495 [22:08:03] mutante: want me to jfdi? ;) [22:08:07] s/do/deploy/ [22:09:20] (03CR) 10Dzahn: "yes, please. see these where Sean recently updated it" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193495 (owner: 10Reedy) [22:09:38] (03CR) 10Reedy: [C: 032] Update dbtree from operations-software upstream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193495 (owner: 10Reedy) [22:09:43] (03Merged) 10jenkins-bot: Update dbtree from operations-software upstream [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193495 (owner: 10Reedy) [22:10:04] mutante: Ah, one thing though [22:10:06] Is the config [22:10:36] (03CR) 10Dzahn: "thanks. can be confirmed by looking at inc/tree line 40. it asks tendril now. ('https://tendril.wikimedia.org/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193495 (owner: 10Reedy) [22:10:54] Reedy: yes? [22:10:59] 6operations: bond eth intefaces on ms1001 - https://phabricator.wikimedia.org/T89829#1074805 (10RobH) [22:11:00] 6operations, 7network: set up switch port for second ethernet interface for ms1001 - https://phabricator.wikimedia.org/T89833#1074803 (10RobH) 5Open>3Resolved I figured it out. I had left ge-1/0/0 in vlan-public1-eqiad, so unit 0 was being defined for it there, and then again in the ae7 setup, hence the c... 
[22:11:36] https://github.com/wikimedia/operations-software/blob/master/dbtree/inc/config.template.php [22:11:40] (03CR) 10Reedy: "Need to do something to ensure the config file is in place and populated..." [puppet] - 10https://gerrit.wikimedia.org/r/193492 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [22:11:40] 6operations, 5Patch-For-Review: dbtree - duplicated code in 2 locations - clean up - https://phabricator.wikimedia.org/T90837#1074806 (10Dzahn) also see https://gerrit.wikimedia.org/r/#/c/193492/ https://gerrit.wikimedia.org/r/#/c/193495/ [22:12:04] https://github.com/wikimedia/operations-software/blob/master/dbtree/index.php#L3 [22:12:30] mutante: Guess you're gonna have to write a puppet template :P [22:12:36] Reedy: hrmm.. so the one config needs to go into puppet ?:P arg [22:12:38] yes [22:12:50] Should be roughly similar to PrivateSettings.php [22:13:06] So my commit above fixes the code, but it won't have any config to use as of yet [22:13:44] yea, i can add a template and the password to private repo .. but [22:13:47] lol, the .gitignore that is there is useless too [22:13:51] i don't know the password [22:13:59] because the file has been overwritten by mw deploy [22:14:13] oh, is that tendril login, not db host stuff? 
[22:14:37] i think it is, because it now asks tendril [22:15:41] Reedy: https://gerrit.wikimedia.org/r/#/c/193036/ [22:16:07] (03PS1) 10Reedy: Update .gitignore for dbtree [software] - 10https://gerrit.wikimedia.org/r/193498 [22:17:58] (03PS1) 10Reedy: Revert "Update dbtree from operations-software upstream" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193499 [22:18:08] (03CR) 10Reedy: [C: 032] Revert "Update dbtree from operations-software upstream" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193499 (owner: 10Reedy) [22:18:13] (03Merged) 10jenkins-bot: Revert "Update dbtree from operations-software upstream" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193499 (owner: 10Reedy) [22:19:04] !log reedy Synchronized docroot and w: nooop for dbtree ( already reverted by prior deploy ) (duration: 00m 05s) [22:19:11] Logged the message, Master [22:19:11] mutante: want to merge https://gerrit.wikimedia.org/r/193498 ? [22:19:53] (03PS2) 10Dzahn: Update .gitignore for dbtree [software] - 10https://gerrit.wikimedia.org/r/193498 (https://phabricator.wikimedia.org/T90837) (owner: 10Reedy) [22:20:05] (03CR) 10Dzahn: [C: 032] Update .gitignore for dbtree [software] - 10https://gerrit.wikimedia.org/r/193498 (https://phabricator.wikimedia.org/T90837) (owner: 10Reedy) [22:21:02] does jenkins work on that repo? :) [22:21:14] i dont think it does [22:21:34] "Random software tools for ops tasks (svn2git, udpprofile, etc)" [22:21:40] too many different languages to have checks ?:p [22:22:37] lol [22:23:20] there is a class passwords::tendril :) [22:24:03] aha [22:24:10] we may be in luck then [22:25:06] RECOVERY - puppet last run on db2002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:25:07] mutante: V2 it :P [22:31:30] hrmm.. now just the db name [22:31:52] and host, without using an actual db* name [22:32:12] tendril.wm.org is neon, that's not the db though [22:32:30] let me check dbtree.. 
oh wait :) [22:35:06] db1011 ok. and we have 2 users, tendril and tendril_web [22:35:36] tendril_web is used in tendril config on neon, tendril is in puppet [22:37:32] (03PS1) 10Ottomata: Render statistics-private mysql.conf credentials on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/193503 [22:39:26] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet has 1 failures [22:44:30] (03PS1) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 [22:46:04] 10Ops-Access-Requests, 6operations: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1074880 (10GWicke) 3NEW [22:49:49] 10Ops-Access-Requests, 6operations: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1074908 (10GWicke) [22:50:33] (03PS2) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 [22:51:06] (03PS3) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [22:53:21] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1074920 (10chasemp) >>! In T90946#1074751, @RobH wrote: > It looks like the absent cleanup of the user ssmith has removed it off all hosts (as intended), no manual cleanu... 
[22:57:20] (03PS4) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [22:58:06] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:59:09] (03PS5) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [23:00:10] (03PS6) 10Dzahn: puppetize dbtree config file to connect to tendril [puppet] - 10https://gerrit.wikimedia.org/r/193505 (https://phabricator.wikimedia.org/T90837) [23:00:36] (03CR) 10Dzahn: "@Reedy: see https://gerrit.wikimedia.org/r/#/c/193505/ for that" [puppet] - 10https://gerrit.wikimedia.org/r/193492 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [23:17:54] 6operations, 5Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1074971 (10Dzahn) [23:26:52] How is it that https://wikitech.wikimedia.org/wiki/Special:Version shows 1.25wmf18 but /srv/mediawiki/php -> php-1.25wmf17 ? [23:27:01] bd808: any idea? [23:27:13] Is that php link not coupled with the actual live version of the wiki? [23:27:18] nope [23:27:26] what’s it for then? [23:27:32] And who creates it, &c. [23:27:35] the php link always points at the branch that enwiki is running [23:27:50] it is setup in the weekly branch cut process [23:27:55] and /not/ the branch that the current machine is running? [23:27:58] nope [23:28:02] That’s… stupid, right? [23:28:04] (03PS1) 10Ori.livneh: Add statsdlb role and provision on graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/193514 [23:28:26] as far as I know it is only referenced by some old cron jobs [23:28:37] (yes, it's stupid.) 
[23:28:40] it's only stupid that it exists [23:28:44] it should die [23:28:52] ok :) That at least explains why my debug lines aren’t showing up [23:28:58] as should the p -> php symlink [23:28:59] I will disregard hereafter [23:30:42] andrewbogott: grep labswiki wikiversions.json will tell you which branch is active for wikitech [23:30:49] (03PS2) 10Ori.livneh: Add statsdlb role and provision on graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/193514 [23:31:13] bd808: yep, makes sense. I just assumed that the php link was added by scap for my convenience. [23:31:59] on the main cluster that wouldn't make sense because both branches are live for different vhosts at the same time [23:32:13] Ah, true. [23:32:28] my assumption is that the symlink is from days long past when the only code was in that location [23:32:53] and when multiple active branches were added it was left behind "just in case" [23:33:03] and is still there many years later [23:33:38] because ... reasons and fear of breaking /something/ [23:33:56] !log pushing a config change to txstatsd on graphite1001, the service may complain briefly [23:34:00] Logged the message, Master [23:34:09] (03CR) 10Ori.livneh: [C: 032] Add statsdlb role and provision on graphite1001 [puppet] - 10https://gerrit.wikimedia.org/r/193514 (owner: 10Ori.livneh) [23:39:07] (03PS1) 10Ori.livneh: statsdlb: quote $DAEMON_ARGS [puppet] - 10https://gerrit.wikimedia.org/r/193516 [23:39:19] (03CR) 10Ori.livneh: [C: 032 V: 032] statsdlb: quote $DAEMON_ARGS [puppet] - 10https://gerrit.wikimedia.org/r/193516 (owner: 10Ori.livneh) [23:40:06] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Puppet has 1 failures [23:40:56] (03PS1) 10Ori.livneh: statsdlb: correct job name [puppet] - 10https://gerrit.wikimedia.org/r/193517 [23:41:07] (03CR) 10Ori.livneh: [C: 032 V: 032] statsdlb: correct job name [puppet] - 10https://gerrit.wikimedia.org/r/193517 (owner: 10Ori.livneh) [23:41:16] RECOVERY - puppet last run on
graphite1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:45:59] (03PS1) 10Ori.livneh: Set $wgUDPProfilerPort back to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193518 [23:46:18] (03CR) 10Ori.livneh: [C: 032] Set $wgUDPProfilerPort back to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193518 (owner: 10Ori.livneh) [23:46:23] (03Merged) 10jenkins-bot: Set $wgUDPProfilerPort back to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193518 (owner: 10Ori.livneh) [23:47:04] !log ori Synchronized wmf-config/CommonSettings.php: I8fa0649ab: Set $wgUDPProfilerPort back to 8125 (duration: 00m 06s) [23:47:10] Logged the message, Master [23:54:36] (03PS1) 10Ori.livneh: statsdlb: add two additional txstatsd instances [puppet] - 10https://gerrit.wikimedia.org/r/193519 [23:54:53] (03CR) 10Ori.livneh: [C: 032 V: 032] statsdlb: add two additional txstatsd instances [puppet] - 10https://gerrit.wikimedia.org/r/193519 (owner: 10Ori.livneh)
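bd808's explanation between 23:27 and 23:30 (the /srv/mediawiki/php symlink tracks the branch enwiki runs, while wikiversions.json records what each wiki actually runs) can be checked with a few lines of Python. This sketch assumes wikiversions.json sits in the same directory as the `php` symlink and is a flat JSON object mapping a wiki database name such as `labswiki` to a branch directory name such as `php-1.25wmf18`; the helper names are illustrative, not part of scap or mediawiki-config.

```python
import json
import os


def active_branch(wikiversions_path, dbname):
    """Return the branch directory a wiki runs, according to wikiversions.json.

    Assumes a flat object such as {"enwiki": "php-1.25wmf17", ...}.
    """
    with open(wikiversions_path) as f:
        return json.load(f)[dbname]


def php_link_matches(mediawiki_dir, dbname):
    """Compare the legacy `php` symlink target with a wiki's live branch.

    Returns (symlink target, live branch, whether they agree).
    """
    # basename() copes with both relative and absolute link targets.
    link_target = os.path.basename(os.readlink(os.path.join(mediawiki_dir, "php")))
    live = active_branch(os.path.join(mediawiki_dir, "wikiversions.json"), dbname)
    return link_target, live, link_target == live
```

Under those assumptions, `php_link_matches("/srv/mediawiki", "labswiki")` would have been expected to report a mismatch at the time of andrewbogott's question, which is exactly why debug lines added under the symlinked tree never showed up on wikitech.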