[00:00:04] RoanKattouw, ^d, Krenair, Jdlrobson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150226T0000).
[00:00:21] Were you going to do it, MaxSem?
[00:00:51] Krenair, there was a slight bug in my stuff, can't do right now
[00:00:55] PROBLEM - Host virt1005 is DOWN: PING CRITICAL - Packet loss = 100%
[00:01:02] (and don't deploy it!) :P :(
[00:01:24] mine should be safe to deploy, does that mean i'm the only one
[00:01:27] ok
[00:01:27] i can just push it myself
[00:01:41] Jon and Marius also had stuff
[00:02:08] hoo, ping for swat
[00:02:19] o/
[00:03:17] ebernhardson, sounds good
[00:04:20] Krenair: for me to ship the patches? sure
[00:04:25] Krenair: or if you are, either way :)
[00:04:25] RECOVERY - Host virt1005 is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms
[00:04:37] you can do your patches
[00:05:14] * ebernhardson twiddles thumbs while jenkins works its core bump magic
[00:07:52] core update rejected by jenkins, 2015-02-26 00:05:46 integration-slave1007 build4614: [65374970] [no req] ErrorException from line 671 of /mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/includes/Setup.php: PHP Warning: Invalid argument: function: method 'efGatherExtensionSetup' not found
[00:08:48] i don't have that method anywhere in my local clone of 1.25wmf18+extensions
[00:11:05] Krenair: are you able to merge anything to core? looks like someone broke it
[00:11:57] it doesn't appear on tin either...
[00:12:13] That's from the Gather extension I think?
[00:12:18] Some mobile thing or other?
[00:12:23] MaxSem: ----^^ ?
[00:12:24] i suppose thats just a warning, theres also this fatal: 2015-02-26 00:05:46 integration-slave1007 build4614: [65374970] [no req] ErrorException from line 671 of /mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm/src/includes/Setup.php: PHP Warning: Invalid argument: function: method 'efGatherExtensionSetup' not found
[00:12:31] bah wrong paste...
[00:12:42] Fatal error: Uncaught exception 'MWException' with message 'Invalid callback WikiGrok\Hooks::onEventLoggingRegisterSchemas in hooks for EventLoggingRegisterSchemas
[00:13:19] RoanKattouw, poked jdlrobson
[00:13:28] RoanKattouw: i can take a quick look
[00:13:34] yeah, already trying to push a fix for that
[00:13:48] MaxSem: thanks
[00:15:10] hi, please merge https://gerrit.wikimedia.org/r/#/c/192888/
[00:15:20] Reedy ^
[00:15:32] ebernhardson: RoanKattouw https://gerrit.wikimedia.org/r/193002
[00:15:35] should take care of it
[00:16:05] MaxSem, ^^
[00:16:20] i'm shutting down, won't be able to comment
[00:16:31] jdlrobson: thanks, i'll pick it forward to 18. probably 19 as well?
[00:18:02] (CR) Yurik: "Please go ahead and merge. Disconnecting ))" [puppet] - https://gerrit.wikimedia.org/r/192888 (owner: Yurik)
[00:19:44] i guess so ebernhardson ... i've been on vacation so i'm not the best person to ask. - it's only deployed on betalabs as far as i'm aware.
[00:19:49] ebernhardson, try now
[00:21:57] ahha, then no cherry pick necessary. but leaves me wondering why we are running master branch tests when merging to wmf18
[00:22:56] MaxSem: passed the part it failed before, should merge once all the tests finish
[00:24:02] MaxSem: That was a bug from the extension.json stuff that Timo fixed really recently
[00:24:11] efGatherExtensionSetup
[00:26:23] (PS1) GWicke: Use logstash1001 instead of logstash1002 for restbase [puppet] - https://gerrit.wikimedia.org/r/193006
[00:26:39] bd808: ^^
[00:27:05] (CR) BryanDavis: [C: 1] Use logstash1001 instead of logstash1002 for restbase [puppet] - https://gerrit.wikimedia.org/r/193006 (owner: GWicke)
[00:27:22] gwicke: lgtm. find a root to merge it for you
[00:27:35] mee
[00:27:36] bd808: yup, looking..
[00:27:41] nice ;)
[00:27:47] :)
[00:27:56] (CR) Gage: [C: 2] Use logstash1001 instead of logstash1002 for restbase [puppet] - https://gerrit.wikimedia.org/r/193006 (owner: GWicke)
[00:28:24] jgage: grazie!
[00:28:33] !log ebernhardson Synchronized php-1.25wmf18/extensions/Flow: Bump flow submodule in 1.25wmf18 for infinite scroll fix (duration: 00m 09s)
[00:28:39] Logged the message, Master
[00:28:45] Krenair: all done
[00:28:48] ok
[00:29:21] gwike: prego
[00:29:31] jgage: can you look at https://gerrit.wikimedia.org/r/#/c/192888/
[00:29:35] sure
[00:29:37] I'm out and it's a weird deal
[00:29:50] no one is especially thrilled it came across today but there it is
[00:29:52] ... is wikitech okay?
[00:30:14] oh, there it is
[00:30:16] up for me ok, gotta run
[00:31:04] (PS2) Alex Monk: Correctly configure human infobox [mediawiki-config] - https://gerrit.wikimedia.org/r/192709 (owner: Jdlrobson)
[00:31:13] jdlrobson, will do your patch next
[00:31:14] (CR) Gage: [C: 2] MERGE no earlier than 6pm PST: Revoked my own key [puppet] - https://gerrit.wikimedia.org/r/192888 (owner: Yurik)
[00:31:22] (CR) Alex Monk: [C: 2] Correctly configure human infobox [mediawiki-config] - https://gerrit.wikimedia.org/r/192709 (owner: Jdlrobson)
[00:31:27] for labs, should be no-op in prod
[00:31:27] (Merged) jenkins-bot: Correctly configure human infobox [mediawiki-config] - https://gerrit.wikimedia.org/r/192709 (owner: Jdlrobson)
[00:32:17] !log krenair Synchronized wmf-config/InitialiseSettings-labs.php: https://gerrit.wikimedia.org/r/#/c/192709/ (duration: 00m 05s)
[00:32:24] Logged the message, Master
[00:32:33] jdlrobson, please check
[00:35:29] um
[00:35:40] Krenair: fchecking
[00:35:53] is beta ok? it's trying to redirect me to https...
[00:35:58] Krenair: mm. it doesn't seem to be working
[00:35:59] prod looks ok
[00:36:09] jdlrobson, beta or your patch?
[00:36:36] beta labs... let me try a few other things
[00:36:43] (patch on beta labs)
[00:37:16] still seeing old value..
[00:38:02] jdlrobson, ah, yes, it failed to deploy to beta. sigh
[00:38:07] Krenair: :)
[00:38:18] 00:31:31 fatal: Unable to create '/srv/mediawiki-staging/.git/refs/remotes/origin/master.lock': File exists.
[00:38:37] I think superm401 is/was doing something there
[00:38:44] Krenair, mid-air collision
[00:40:04] with what though?
[00:43:20] $ ls -l /srv/mediawiki-staging/.git/refs/remotes/origin/master.lock
[00:43:20] ls: cannot access /srv/mediawiki-staging/.git/refs/remotes/origin/master.lock: No such file or directory
[00:43:29] should be fine Krenair
[00:43:41] on deployment-bastion?
[00:43:59] ahh, beta?
[00:44:04] yes
[00:44:11] yyyyy this channel? :P
[00:44:27] because we were trying to do a config change in swat
[00:44:32] to a -labs file
[00:44:48] huh
[00:45:02] anyway, can I push my fixes now?
[00:45:37] yeah
[00:45:47] (CR) Tim Landscheidt: "I've merged robots.txt and htmlpurifier on labs/toollabs, so this change is now good to go." [puppet] - https://gerrit.wikimedia.org/r/148172 (owner: Tim Landscheidt)
[00:53:52] !log maxsem Synchronized php-1.25wmf19/extensions/WikiGrok/: (no message) (duration: 00m 07s)
[00:53:56] Logged the message, Master
[00:57:07] !log maxsem Synchronized php-1.25wmf18/extensions/WikiGrok/: (no message) (duration: 00m 07s)
[00:57:11] Logged the message, Master
[00:58:49] (CR) MaxSem: [C: 2] Enable WikiGrok in repo mode on testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/192987 (owner: MaxSem)
[00:58:57] (Merged) jenkins-bot: Enable WikiGrok in repo mode on testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/192987 (owner: MaxSem)
[00:59:54] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/192987/ (duration: 00m 06s)
[00:59:59] Logged the message, Master
[01:02:17] still awake hoo?
[01:02:35] Sadly, yes :D
[01:02:59] hoo, WG works now: https://test.wikidata.org/w/index.php?title=Q1013&diff=10300&oldid=10299
[01:03:32] (PS2) Springle: Unbreak dbtree [software] - https://gerrit.wikimedia.org/r/192771
[01:03:32] hoo, are you going to do the wikidata change?
[01:03:34] MaxSem: Cool
[01:03:59] yeah, I can do it
[01:04:31] ok
[01:04:46] (PS1) Mjbmr: Enable NewUserMessage extension with for fawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/193016 (https://phabricator.wikimedia.org/T90831)
[01:04:50] MaxSem, are you done then?
[01:05:04] no but go ahead
[01:05:14] investigating so far
[01:05:47] (PS2) Mjbmr: Enable NewUserMessage extension for fawiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/193016 (https://phabricator.wikimedia.org/T90831)
[01:06:49] Krenair, ^^
[01:07:21] hoo is the last person left
[01:07:35] waiting for jenkins
[01:07:38] Krenair: what's status of the config change?
[01:07:47] it looks like it's working now :)
[01:07:50] jdlrobson, see -mobile :)
[01:07:57] ah was watching wrong channel :)
[01:08:03] cool anyway it works so thanks a bunch!
[01:08:07] ok, interesting
[01:08:38] (PS1) MaxSem: Enable WikiGrok on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193018
[01:13:32] !log hoo Synchronized php-1.25wmf19/extensions/Wikidata/: Update Wikidata to fix EntityViewPlaceholderExpander (duration: 00m 12s)
[01:13:38] Logged the message, Master
[01:13:41] {{done}}
[01:14:06] that's everything then
[01:14:18] (I'm still battling beta.)
[01:14:43] operations, hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1068770 (RobH) @Ori, Understood on wanting identical specifications, and if vanadium is indeed now not fast enough for what is needed, then we can allocate something else. As such, taking a look at wha...
[01:16:17] operations, hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1068772 (RobH) I forgot to add we would then reclaim and decommission vanadium, but not until after the new systems are deployed. (It wouldn't have any urgency, just general housekeeping and freeing up...
[01:16:31] (CR) MaxSem: [C: 2] Enable WikiGrok on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193018 (owner: MaxSem)
[01:16:39] (Merged) jenkins-bot: Enable WikiGrok on wikidatawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193018 (owner: MaxSem)
[01:18:39] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/193018/ (duration: 00m 07s)
[01:18:45] Logged the message, Master
[01:21:15] hoo, aude: didn't break this time:P sorry for the trouble
[01:21:36] Yeah, looks good to me :)
[01:42:46] operations, hardware-requests: codfw: (1) eventlogging node - https://phabricator.wikimedia.org/T90747#1068811 (ori) >>! In T90747#1068770, @RobH wrote: > @Ori, > > Understood on wanting identical specifications, and if vanadium is indeed now not fast enough for what is needed, then we can allocate somet...
[01:48:55] operations, hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1068815 (RobH)
[01:49:13] operations, hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1066479 (RobH) Sounds good, I'll move forward with allocation tomorrow.
[02:09:24] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 02s)
[02:09:30] Logged the message, Master
[02:10:31] !log LocalisationUpdate completed (1.25wmf18) at 2015-02-26 02:09:28+00:00
[02:10:37] Logged the message, Master
[02:14:53] (CR) Springle: [C: 2] Unbreak dbtree [software] - https://gerrit.wikimedia.org/r/192771 (owner: Springle)
[02:21:32] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:21:38] Logged the message, Master
[02:22:42] !log LocalisationUpdate completed (1.25wmf19) at 2015-02-26 02:21:38+00:00
[02:22:47] Logged the message, Master
[02:36:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[02:39:35] (PS1) Springle: Dbtree connects to the tendril database, so drop the ganglia job. [puppet] - https://gerrit.wikimedia.org/r/193036
[02:39:56] mutante: ^
[02:42:12] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Feb 26 02:41:08 UTC 2015 (duration 41m 7s)
[02:42:16] Logged the message, Master
[02:43:21] (CR) Dzahn: [C: 1] Dbtree connects to the tendril database, so drop the ganglia job. [puppet] - https://gerrit.wikimedia.org/r/193036 (owner: Springle)
[02:58:55] (CR) Springle: [C: 2] Dbtree connects to the tendril database, so drop the ganglia job. [puppet] - https://gerrit.wikimedia.org/r/193036 (owner: Springle)
[03:11:26] operations: dbtree - duplicated code in 2 locations - clean up - https://phabricator.wikimedia.org/T90837#1068927 (Dzahn) p:Triage>Normal
[03:11:35] operations: dbtree - duplicated code in 2 locations - clean up - https://phabricator.wikimedia.org/T90837#1068931 (Dzahn) a:Dzahn
[03:20:44] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[03:40:00] (PS1) Springle: Dbtree link hosts to tendril [software] - https://gerrit.wikimedia.org/r/193038
[03:40:46] (CR) Springle: [C: 2] Dbtree link hosts to tendril [software] - https://gerrit.wikimedia.org/r/193038 (owner: Springle)
[03:47:54] operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1068947 (Dzahn) I can't seem to find the string "db1001" in the docroot of contacts. Do you know where it is already?
[03:52:51] operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1068948 (Springle) It has already been manually changed to dbproxy1001. ``` $ sudo grep -nr dbproxy1001 * | awk -F ':' '{print $1 ":" $2}' sites/default/civicrm.settings.php:59 sites/default/civicrm.settings.php...
[03:53:25] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[03:55:34] operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1068949 (Dzahn) ah, gotcha. thanks. yea, i'm not sure how much to puppetize here since we might get rid of it as you already pointed out, but for now i'm at least going to do the main config files and put the pas...
[03:55:58] operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1068950 (Dzahn) a:Dzahn
[04:02:29] operations, ops-eqiad: mw1062 needs a disk replacement - https://phabricator.wikimedia.org/T86542#970960 (Dzahn)
[04:03:20] operations: pybal issue? - https://phabricator.wikimedia.org/T90839#1068965 (Dzahn)
[04:15:15] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[05:02:00] what can be possible reason that I can't do git pull on tin?
[05:02:51] Not tried again since last week though
[05:03:30] (using ssh -A tin, as usual)
[05:08:42] (CR) Santhosh: [C: 1] Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764 (owner: KartikMistry)
[05:10:32] kart_: what error did you encounter?
[05:26:44] PROBLEM - MySQL Idle Transactions on db1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:27:35] RECOVERY - MySQL Idle Transactions on db1018 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[05:29:40] Puppet: dynamicproxy: Move list of blocked user agents to hiera - https://phabricator.wikimedia.org/T90844#1069056 (scfc) NEW
[05:45:01] ori: Permission denied..
[05:48:25] PROBLEM - Disk space on db2011 is CRITICAL: DISK CRITICAL - free space: /srv 57999 MB (3% inode=99%):
[05:49:06] wut
[05:51:35] RECOVERY - Disk space on db2011 is OK: DISK OK
[05:51:35] RECOVERY - MariaDB disk space on db2011 is OK: DISK OK
[05:52:16] !log pre-empt m3/m4 shard split and reclaim disk space on db2011
[05:52:24] Logged the message, Master
[06:00:14] operations, Labs: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1069090 (Dragons_flight) @chasemp: You mention changing where such requests are sent. Do I need to do anything else to ensure that this request is seen by the appropriate people? I'm guessing that it is...
[06:07:00] (CR) Tim Landscheidt: "This introduced syntax errors:" [puppet] - https://gerrit.wikimedia.org/r/187949 (owner: Yuvipanda)
[06:13:17] (PS1) Tim Landscheidt: Tools: Restart bigbrother when source changes [puppet] - https://gerrit.wikimedia.org/r/193055
[06:14:48] (PS1) Tim Landscheidt: Tools: Fix syntax error in bigbrother [puppet] - https://gerrit.wikimedia.org/r/193056
[06:15:35] (CR) Tim Landscheidt: "Fixed in https://gerrit.wikimedia.org/r/#/c/193056/." [puppet] - https://gerrit.wikimedia.org/r/187949 (owner: Yuvipanda)
[06:17:37] operations, Labs: Wikitech registration for prior SVN user - https://phabricator.wikimedia.org/T90658#1069103 (chasemp) The request ended up in the right place, so no worries.
[06:22:53] !log on mw1088 restarting hhvm
[06:22:58] Logged the message, Master
[06:28:06] PROBLEM - Apache HTTP on mw1088 is CRITICAL: Connection refused
[06:28:35] PROBLEM - puppet last run on db1067 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:36] PROBLEM - HHVM rendering on mw1088 is CRITICAL: Connection refused
[06:29:25] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:25] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:25] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:44] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:16] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:16] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:25] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.063 second response time
[06:31:54] RECOVERY - HHVM rendering on mw1088 is OK: HTTP OK: HTTP/1.1 200 OK - 67488 bytes in 0.188 second response time
[06:40:31] (PS1) GWicke: Next step on the road to understanding JVM GC dynamics [puppet/cassandra] - https://gerrit.wikimedia.org/r/193060
[06:47:04] RECOVERY - puppet last run on db1067 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:48:15] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:49:55] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[07:03:39] (CR) Yuvipanda: [C: 2] Tools: Restart bigbrother when source changes [puppet] - https://gerrit.wikimedia.org/r/193055 (owner: Tim Landscheidt)
[07:04:11] (CR) Yuvipanda: [C: 2] "Ugh, sorry. I can't actually write perl, and assumed +1s were good enough..." [puppet] - https://gerrit.wikimedia.org/r/193056 (owner: Tim Landscheidt)
[07:06:35] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[07:07:25] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[07:08:55] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[07:09:45] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[07:18:54] (PS1) Yuvipanda: tools: Monitor toollabs home page [puppet] - https://gerrit.wikimedia.org/r/193061 (https://phabricator.wikimedia.org/T90847)
[07:19:12] (PS2) Yuvipanda: tools: Monitor toollabs home page [puppet] - https://gerrit.wikimedia.org/r/193061 (https://phabricator.wikimedia.org/T90847)
[07:20:40] (CR) Yuvipanda: [C: 2] tools: Monitor toollabs home page [puppet] - https://gerrit.wikimedia.org/r/193061 (https://phabricator.wikimedia.org/T90847) (owner: Yuvipanda)
[07:22:26] (PS4) Giuseppe Lavagetto: wmf-reimage: perform the first puppet run [puppet] - https://gerrit.wikimedia.org/r/192793
[07:23:05] PROBLEM - Host mc2007 is DOWN: PING CRITICAL - Packet loss = 100%
[07:24:35] RECOVERY - Host mc2007 is UP: PING OK - Packet loss = 0%, RTA = 43.91 ms
[07:46:34] _joe_: just a short update on the cert issues from yesterday, seems like a local wifi issue for all of them, not sure why though, maybe new firmware pushed by the vendor by ISP yesterday
[07:46:59] <_joe_> matanya: if I were in them, I'd start worrying
[07:47:27] yes, i conveyed that message
[07:48:17] it doesn't happened on wired to any of them.
[07:48:25] *happen
[07:50:08] <_joe_> matanya: the message is "someone owns you. If you're lucky, it's hackers"
[07:50:50] :) yes, that
[08:08:09] (PS5) Giuseppe Lavagetto: wmf-reimage: perform the first puppet run [puppet] - https://gerrit.wikimedia.org/r/192793
[08:08:20] (CR) Giuseppe Lavagetto: [C: 2] wmf-reimage: perform the first puppet run [puppet] - https://gerrit.wikimedia.org/r/192793 (owner: Giuseppe Lavagetto)
[08:15:30] operations, Citoid: Configure citoid to use outbound proxy - https://phabricator.wikimedia.org/T89875#1069205 (ArielGlenn) p:Triage>High
[08:15:49] operations, Citoid: Configure citoid to use the new zotero service - https://phabricator.wikimedia.org/T89873#1069207 (ArielGlenn) p:Triage>High
[08:16:10] operations, Citoid: Update the citoid/deploy branch to not contain zotero deploy - https://phabricator.wikimedia.org/T89872#1069209 (ArielGlenn) p:Triage>High
[08:16:25] operations, Citoid: Configure zotero to use an outbound proxy - https://phabricator.wikimedia.org/T89874#1069211 (ArielGlenn) p:Triage>High
[08:17:07] operations, Citoid: Backport and using zotero-standalone for the zotero service - https://phabricator.wikimedia.org/T89866#1069213 (ArielGlenn) p:Triage>High
[08:18:07] (CR) Tim Landscheidt: "The syntax isn't checked by Jenkins (IIRC also not for Python & Co. :-)). What's a bit more disturbing: I only noticed it by going to too" [puppet] - https://gerrit.wikimedia.org/r/193056 (owner: Tim Landscheidt)
[08:18:52] operations, Citoid: Assign hardware for the zotero service - https://phabricator.wikimedia.org/T89869#1069221 (ArielGlenn) p:Triage>High
[08:19:05] operations, Citoid: Puppetize zotero - https://phabricator.wikimedia.org/T89867#1069223 (ArielGlenn) p:Triage>High
[08:27:10] (PS1) Giuseppe Lavagetto: mediawiki: make HHVM listen on localhost only [puppet] - https://gerrit.wikimedia.org/r/193062
[08:52:59] (CR) Odder: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: Odder)
[08:53:22] (PS3) Odder: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689)
[08:53:49] (CR) Giuseppe Lavagetto: [C: 2] "As suggested by Tim/ talked about on IRC" [puppet] - https://gerrit.wikimedia.org/r/193062 (owner: Giuseppe Lavagetto)
[09:00:15] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: puppet fail
[09:01:24] RECOVERY - puppet last run on mw1088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:07:36] (PS1) Giuseppe Lavagetto: wmf-reimage: fixup for enable_and_run_puppet [puppet] - https://gerrit.wikimedia.org/r/193064
[09:08:40] (CR) Nikerabbit: [C: 1] Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764 (owner: KartikMistry)
[09:08:57] (CR) Nikerabbit: [C: 1] Set $wgTranslationNotificationsAlwaysHttpsInEmail to true [mediawiki-config] - https://gerrit.wikimedia.org/r/190203 (owner: Se4598)
[09:09:17] (CR) Giuseppe Lavagetto: [C: 2] wmf-reimage: fixup for enable_and_run_puppet [puppet] - https://gerrit.wikimedia.org/r/193064 (owner: Giuseppe Lavagetto)
[09:15:06] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 5 below the confidence bounds
[09:23:03] PROBLEM - Host mc2008 is DOWN: PING CRITICAL - Packet loss = 100%
[09:23:24] RECOVERY - Apache HTTP on mw1145 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.377 second response time
[09:23:24] RECOVERY - HHVM rendering on mw1145 is OK: HTTP OK: HTTP/1.1 200 OK - 67496 bytes in 0.743 second response time
[09:23:50] (CR) Mobrovac: "While setting MaxTenuringThreshold to a higher value does reduce the pressure on the old space, it also prolongs the time SSTables stay th" [puppet/cassandra] - https://gerrit.wikimedia.org/r/193060 (owner: GWicke)
[09:26:14] RECOVERY - Host mc2008 is UP: PING OK - Packet loss = 0%, RTA = 44.29 ms
[09:31:04] RECOVERY - HHVM queue size on mw1145 is OK: OK: Less than 30.00% above the threshold [10.0]
[09:31:23] RECOVERY - HHVM busy threads on mw1145 is OK: OK: Less than 30.00% above the threshold [57.6]
[09:33:45] (PS5) Filippo Giunchedi: es-tool: support IPv6 addresses in (un)ban-node [puppet] - https://gerrit.wikimedia.org/r/191357 (owner: Chad)
[09:33:52] (CR) Filippo Giunchedi: [C: 2 V: 2] es-tool: support IPv6 addresses in (un)ban-node [puppet] - https://gerrit.wikimedia.org/r/191357 (owner: Chad)
[09:42:46] operations, Beta-Cluster, Labs, Tool-Labs: Investigate and do incident report for strange virt1012 issues - https://phabricator.wikimedia.org/T90566#1069340 (hashar) Open>Resolved The incident report has been written and published. Thank you @Andrew!
[09:42:47] operations, Beta-Cluster, Labs, Tool-Labs: A virt host seems down, taking down all instances with it - https://phabricator.wikimedia.org/T90530#1069342 (hashar)
[09:53:12] apergos: hey! any thoughts on https://phabricator.wikimedia.org/T89537?
[09:53:13] https://phabricator.wikimedia.org/T89537
[09:54:09] nope, going to have to dig around and find the script that does it etc
[09:54:40] dec the last dump? that's a while all right mphf
[09:54:44] yeaah
[09:54:52] I’ve set it to UBN!
[09:54:55] all right on today's todo list
[09:55:09] where 'now' = 'today'
[09:55:15] ty!
[09:55:19] yw
[09:56:00] operations, Datasets-General-or-Unknown, Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069357 (yuvipanda) a:coren>ArielGlenn
[10:09:36] (PS1) Giuseppe Lavagetto: wmf-reimage: remove stricthostchecking everywhere [puppet] - https://gerrit.wikimedia.org/r/193069
[10:17:31] (PS2) Giuseppe Lavagetto: wmf-reimage: remove stricthostchecking everywhere [puppet] - https://gerrit.wikimedia.org/r/193069
[10:18:13] (CR) Giuseppe Lavagetto: [C: 2] wmf-reimage: remove stricthostchecking everywhere [puppet] - https://gerrit.wikimedia.org/r/193069 (owner: Giuseppe Lavagetto)
[10:23:50] (Abandoned) Giuseppe Lavagetto: memcached: add puppet resources so that the role can be applied in codfw [puppet] - https://gerrit.wikimedia.org/r/188822 (owner: Giuseppe Lavagetto)
[10:25:09] PROBLEM - Host mc2009 is DOWN: PING CRITICAL - Packet loss = 100%
[10:28:30] RECOVERY - Host mc2009 is UP: PING OK - Packet loss = 0%, RTA = 42.85 ms
[10:47:41] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[11:15:26] (PS1) Yuvipanda: beta: Complain if there have been *any* cherry-picks for 48h [puppet] - https://gerrit.wikimedia.org/r/193078 (https://phabricator.wikimedia.org/T76392)
[11:15:51] (PS2) Yuvipanda: beta: Complain if there have been *any* cherry-picks for 48h [puppet] - https://gerrit.wikimedia.org/r/193078 (https://phabricator.wikimedia.org/T76392)
[11:26:47] operations, Beta-Cluster: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1069501 (yuvipanda) @ArielGlenn: Yes, parsoid is the only one left.
[11:27:20] (PS1) Giuseppe Lavagetto: wmf-reimage: correct another typo [puppet] - https://gerrit.wikimedia.org/r/193080
[11:28:15] (CR) Giuseppe Lavagetto: [C: 2] wmf-reimage: correct another typo [puppet] - https://gerrit.wikimedia.org/r/193080 (owner: Giuseppe Lavagetto)
[11:28:41] operations, Beta-Cluster: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1069502 (yuvipanda) the parsoid code seems a fair bit more hairy - is it being deployed via different methods in prod vs labs (trebuchet vs jenkins) exclusively? I'm not fully sure. @GWicke thou...
[11:30:00] operations, Beta-Cluster: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1069504 (yuvipanda) Also lots of /var/lib vs /srv issues.
[11:30:57] PROBLEM - RAID on restbase1006 is CRITICAL: CRITICAL: Active: 8, Working: 8, Failed: 1, Spare: 0
[11:32:57] PROBLEM - Host mc2010 is DOWN: PING CRITICAL - Packet loss = 100%
[11:35:57] RECOVERY - Host mc2010 is UP: PING OK - Packet loss = 0%, RTA = 42.90 ms
[11:40:49] (PS1) Yuvipanda: parsoid: Attempt to unify prod and beta roles [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633)
[11:43:09] (PS2) Yuvipanda: parsoid: Attempt to unify prod and beta roles [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633)
[11:52:33] operations: Please generate a list of task IDs and number of their subscribers, ordered by number of subscribers, for the "top 100" tasks in the VisualEditor project - https://phabricator.wikimedia.org/T90860#1069534 (Elitre) NEW
[11:53:31] operations: Please generate a list of task IDs and number of their subscribers, ordered by number of subscribers, for the "top 100" tasks in the VisualEditor project - https://phabricator.wikimedia.org/T90860#1069540 (Aklapper) p:Triage>High
[12:00:44] Puppet, operations, Labs: Values from mwyaml backend don't override values from ops/pupppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069551 (yuvipanda)
[12:05:49] Puppet, operations, Labs: Values from mwyaml backend don't override values from ops/pupppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069563 (Joe) Uhm, this is the case because you specifically asked the values in puppet/hieradata to be authoritative :) We can change that...
[12:09:10] Puppet, operations, Labs: Values from mwyaml backend don't override values from ops/pupppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069564 (yuvipanda) oh, right. I guess that's a miscommunication somewhere. I'd definitely want wikitech values to override ops/puppet values.
[12:09:21] Puppet, operations, Labs: Values from mwyaml backend don't override values from ops/pupppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1069565 (yuvipanda) p:Triage>Normal
[12:20:46] (PS3) Yuvipanda: parsoid: Remove parsoid beta role [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633)
[12:20:48] (PS1) Yuvipanda: ci: Move jenkins access granting code into a role [puppet] - https://gerrit.wikimedia.org/r/193084
[12:27:01] (CR) Yuvipanda: [C: 2] ci: Move jenkins access granting code into a role [puppet] - https://gerrit.wikimedia.org/r/193084 (owner: Yuvipanda)
[12:35:50] operations, Beta-Cluster, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1069583 (yuvipanda) https://gerrit.wikimedia.org/r/#/c/193082/ has basically killed the current beta role and just picked up the prod role, and testing on deployment-parsoid...
[12:37:34] operations, Beta-Cluster, Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1069584 (yuvipanda) And https://github.com/search?q=betalabs.localsettings.js&type=Code&utf8=%E2%9C%93 tells me it isn't really mentioned anywhere other than the role. Is th...
[12:46:23] (CR) Krinkle: [C: 1] "Confirmed. Branch 'wmf-deploy' exists and points to the current master, which has been unmodified since v0.1.0." [puppet] - https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) (owner: Legoktm)
[12:48:22] YuviPanda: have some time if you can check issue with my access on tin? 'git pull' for deployment branches returns permission denied.
[12:50:02] PROBLEM - Host mc2014 is DOWN: PING CRITICAL - Packet loss = 100%
[12:52:50] (CR) Krinkle: "A tag is also supported there, but we could also name the branch like "legacy" or something like that. Anyway, we're removing it soon afte" [puppet] - https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) (owner: Legoktm)
[12:54:49] operations, Datasets-General-or-Unknown, Tool-Labs: enwiki database dumps missing - https://phabricator.wikimedia.org/T89537#1069611 (ArielGlenn) Open>Resolved The copy script is running regularly (as evidenced by the fact that it's picking up the other dumps). I guess that at some point the en...
[12:58:53] RECOVERY - Host mc2014 is UP: PING OK - Packet loss = 0%, RTA = 42.96 ms
[13:03:54] PROBLEM - Host mc2013 is DOWN: PING CRITICAL - Packet loss = 100%
[13:04:44] (PS1) Nemo bis: Set $wgDismissableSiteNoticeForAnon to true [mediawiki-config] - https://gerrit.wikimedia.org/r/193090 (https://phabricator.wikimedia.org/T59732)
[13:06:48] operations, Phabricator, Wikimedia-Bugzilla, Patch-For-Review: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1069641 (Aklapper) >>! In T85140#1055057, @Dzahn wrote: > well.. can we add JohnLewis to the BZ security group (--> T89781) See my reply in T89781#1058...
[13:07:03] RECOVERY - Host mc2013 is UP: PING OK - Packet loss = 0%, RTA = 43.26 ms
[13:10:49] (CR) Krinkle: "Where is git-buildpackage used?" [puppet] - https://gerrit.wikimedia.org/r/191677 (owner: Hashar)
[13:11:47] (CR) Krinkle: [C: 1] "nvm, we use it for lots of stuff. E.g. operations-debs-ruby-dimensions, operations-debs-ruby-jsduck etc." [puppet] - https://gerrit.wikimedia.org/r/191677 (owner: Hashar)
[13:19:44] (Draft2) Filippo Giunchedi: import upstream 0.7.0 [debs/statsite] - https://gerrit.wikimedia.org/r/193095 (https://phabricator.wikimedia.org/T90111)
[13:19:46] (Draft2) Filippo Giunchedi: import debian directory [debs/statsite] - https://gerrit.wikimedia.org/r/193096 (https://phabricator.wikimedia.org/T90111)
[13:19:49] (Draft2) Filippo Giunchedi: add debian patches to upstream [debs/statsite] - https://gerrit.wikimedia.org/r/193097 (https://phabricator.wikimedia.org/T90111)
[13:20:28] (CR) Filippo Giunchedi: "reported upstream too, see https://github.com/armon/statsite/issues/110" [debs/statsite] - https://gerrit.wikimedia.org/r/193097 (https://phabricator.wikimedia.org/T90111) (owner: Filippo Giunchedi)
[13:22:01] operations, Graphite, Patch-For-Review: replace txstatsd - https://phabricator.wikimedia.org/T90111#1069651 (fgiunchedi) I've published a first iteration on statsite debian package in the last three gerrit changes above
[13:28:45] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1069658 (ArielGlenn) right. this is what you want; the old style 'entity' is gone, the new style 'descriptions' is present. or am I missing something?
[13:58:06] 7Puppet, 6operations: dynamicproxy: Move list of blocked user agents to hiera - https://phabricator.wikimedia.org/T90844#1069752 (10Aklapper) I can only assume that this is for operations (or Labs)? Well, adding project... [14:09:08] 6operations, 10Datasets-General-or-Unknown, 10Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1069776 (10hoo) >>! In T74348#1069658, @ArielGlenn wrote: > right. this is what you want; the old style 'entity' is gone, the new style 'descriptions' is prese... [14:40:22] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [14:46:04] 6operations, 10ops-codfw, 3wikis-in-codfw: Move network cable to the other port on codfw memcached hosts - https://phabricator.wikimedia.org/T90456#1069804 (10Joe) 5Open>3Resolved [14:46:05] 6operations, 3wikis-in-codfw: Setup memcached cluster in codfw - https://phabricator.wikimedia.org/T86888#1069806 (10Joe) [14:50:31] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [14:52:51] ops, do we haz any public wikis with Special:JavaScriptTest fully enabled? [14:53:11] what does fully mean? [14:53:50] Actually runs tests! Here it's enabled but I can't get any actual tests going (?!): https://test2.wikipedia.org/wiki/Special:JavaScriptTest [14:53:52] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1069811 (10fgiunchedi) btw my mental model poorly described above can be summarized as: * stop having every machine push single clients independently to a single central point * instead, expose those m... 
[14:53:58] Nikerabbit: ^ [14:55:09] I got this perverse idea of making a Cucumber browser test that checks the QUnit test result and running that on SauceLabs on different platforms [14:55:41] But I'd need a public wiki that I can point the browsers to [14:56:01] If we don't have any I could create a labs instance, but it'd be more fun if there were one already [14:58:09] (03PS1) 10Ottomata: Add hash to set extra properties on yarn-site.xml file [puppet/cdh] - 10https://gerrit.wikimedia.org/r/193105 [14:58:27] (03CR) 10Ottomata: [C: 032] Add hash to set extra properties on yarn-site.xml file [puppet/cdh] - 10https://gerrit.wikimedia.org/r/193105 (owner: 10Ottomata) [14:59:31] AndyRussG: curious... [15:00:09] heheh it was a showerthought [15:00:34] that's the only wiki where it is supposed to be enabled on wmf production [15:00:40] But I think it makes sense! Just make Cucumber check that tests were run and that 0 failed [15:01:32] Hmmm indeed, I wonder why there are no modules to run...? [15:02:12] there is JS error in console... [15:05:03] Ah [15:05:25] indeed [15:06:33] Looks like some dependency error maybe [15:07:03] since it sez "Error: Unknown dependency" heheh [15:07:54] Lemme try the same verison locally [15:12:38] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1069855 (10Joe) on mc2017: No PXE-capable device available. Strike F1 to retry boot, F2 for system setup. [15:12:48] 6operations, 10ops-codfw, 3wikis-in-codfw: PXE doesn't work on mc2017-18 - https://phabricator.wikimedia.org/T90586#1069857 (10Joe) 5Resolved>3Open [15:12:49] 6operations, 3wikis-in-codfw: Setup memcached cluster in codfw - https://phabricator.wikimedia.org/T86888#1069859 (10Joe) [15:14:04] Hmm it works locally on the same release (1.25wmf19) [15:14:30] Nikerabbit: whom do you think I should ping about this? I guess it's _supposed_ to work... 
[15:14:46] !log demon Synchronized php-1.25wmf18/extensions/RestBaseUpdateJobs: (no message) (duration: 00m 06s) [15:14:53] Logged the message, Master [15:14:53] <^d> gwicke: You're live on wmf18 too [15:15:40] AndyRussG: I would try MatmaRex or Krinkle [15:15:59] they can probably at least give further pointers [15:16:00] hm [15:16:05] thanks! [15:16:44] MatmaRex: https://test2.wikipedia.org/wiki/Special:JavaScriptTest/qunit not working 8p [15:17:23] woot. Error: Unknown dependency: jquery.ui.EditableTemplatedWidget [15:17:28] what in god's name *is* EditableTemplatedWidget? [15:17:30] ^d: ohh [15:17:45] ^d: nm then - I'll just abandon that patch [15:17:54] <^d> Nooo, I mean I merged your patch :) [15:17:57] <^d> And sync'd it [15:18:00] * Nikerabbit creates EditableTemplateWidgetCollectionFactory [15:18:03] I have an evil plan of pointing cross-browser tests at qunit tests [15:18:15] ^d: a-ok ;) [15:18:18] thank you! [15:18:33] AndyRussG: no idea what this EditableTemplatedWidget is, but it looks like the tests for it are not set up correctly [15:19:04] has jquery ui been updated recently? [15:19:10] AndyRussG: oh, it's from Wikibase [15:19:17] I don't remember seeing anything related to it lately [15:19:26] it's not part of jQuery UI [15:19:32] that explains [15:19:47] Wikibase's tests are broken and are breaking QUnit on wikis there it is installed [15:20:00] Hrmf! [15:20:14] where* [15:20:44] (03PS1) 10Ottomata: Update cdh module and tweak FairScheduler settings to give more priority for essential jobs [puppet] - 10https://gerrit.wikimedia.org/r/193109 [15:21:11] Oh well... 
Sounds like I should just set up a dedicated instance [15:21:55] (03CR) 10jenkins-bot: [V: 04-1] Update cdh module and tweak FairScheduler settings to give more priority for essential jobs [puppet] - 10https://gerrit.wikimedia.org/r/193109 (owner: 10Ottomata) [15:22:12] oop -1 [15:23:01] (03PS2) 10Ottomata: Update cdh module and tweak FairScheduler settings to give more priority for essential jobs [puppet] - 10https://gerrit.wikimedia.org/r/193109 [15:24:52] (03CR) 10QChris: [C: 031] Update cdh module and tweak FairScheduler settings to give more priority for essential jobs [puppet] - 10https://gerrit.wikimedia.org/r/193109 (owner: 10Ottomata) [15:26:25] 6operations, 10ops-codfw, 3wikis-in-codfw: Console on mc2001 is unresponsive - https://phabricator.wikimedia.org/T90559#1069925 (10Joe) The console gets stuck in Strike the F1 key to continue, F2 to run the system setup program [15:32:59] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1069939 (10fgiunchedi) also @chasemp was wondering if our current usage of (tx)statsd makes sense, I collected one minute of statsd traffic on graphite1001 and these are a few of the top senders ```... [15:33:22] gwicke: ^ I think restbase is sending too much traffic to the central txstatsd [15:34:19] (03CR) 10Ottomata: [C: 032] Update cdh module and tweak FairScheduler settings to give more priority for essential jobs [puppet] - 10https://gerrit.wikimedia.org/r/193109 (owner: 10Ottomata) [15:34:39] godog: ok.. how can we scale this? [15:34:43] (03PS1) 10Odder: Lift IP rate limit for Santiago edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193110 (https://phabricator.wikimedia.org/T90778) [15:34:47] looking at your notes [15:35:09] gwicke: one bandaid we've been using is running txstatsd locally on the machine [15:35:45] what's the timeline on the statsite install? 
[15:36:45] I would think that a single statsd should be able to handle all restbase traffic [15:37:37] so if we can't load balance statsds, have a couple of them on different ports & hit each of them from a different cluster? [15:38:47] does restbase send a timer for each request? [15:38:48] one statsd per machine feeding directly into graphite would have all the same problems as load balanced universal statsds [15:39:26] godog: yes, and also for internal requests, backend requests etc [15:39:38] yeah I don't think that's the sensible thing to do [15:39:48] we could set sampling intervals, but the trouble is that some of them are very low volume [15:39:55] and the instrumentation is generic right now [15:39:55] (PS1) Glaisher: Enable EducationProgram extension on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/193112 (https://phabricator.wikimedia.org/T89588) [15:40:10] we'd need to keep stats on frequency & dynamically adjust sampling rate [15:41:04] it's mostly timings [15:41:34] with txstatsd we also need to send two udp packets for each request to increment one per-status timer & one ALL timer [15:41:46] statsd would let us batch that in a single packet [15:42:03] which saves cpu time in both client & server [15:43:34] that's true [15:46:31] so my impression of push options is a) stacking statsds is troublesome, and b) there are some aggregation details to get right when feeding a single graphite metric from different statsd instances, but it might be possible [15:46:48] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1069989 (ArielGlenn) ugh, I stare at it for an hour and I'm still blind. Let me look at it for another hour... sorry. 
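The batching point made above (one UDP packet carrying both the per-status timer and the ALL timer, optionally with a sampling rate) can be sketched as follows. This is a minimal illustration, not restbase's actual client code: the metric names, the `format_timer`, `batch_packet` and `send` helpers, and the host and port defaults are all assumptions made for the example. It relies only on the plain statsd line format (`name:value|ms`, optional `|@rate`) and on newline-separated multi-metric packets.

```python
import socket


def format_timer(name, millis, sample_rate=1.0):
    """Render one statsd timer line, e.g. 'restbase.ALL:12|ms'."""
    line = f"{name}:{millis}|ms"
    if sample_rate < 1.0:
        # The '|@rate' suffix tells the server the client only sent a sample.
        line += f"|@{sample_rate}"
    return line


def batch_packet(lines):
    """Join several metric lines into one newline-separated UDP payload."""
    return "\n".join(lines).encode("ascii")


def send(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send (host and port are assumed defaults)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


# Hypothetical metric names: one per-status timer and one ALL timer,
# sent together instead of as two separate packets.
payload = batch_packet([
    format_timer("restbase.GET.200", 12),
    format_timer("restbase.ALL", 12),
])
```

Calling `send(payload)` would then emit a single datagram for the request, which is where the CPU saving on both client and server comes from.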
[15:47:13] if b) is indeed possible, then we could do either a load balanced statsd cluster, or one statsd per client directly feeding into graphite [15:47:53] I'm leaning towards the load balanced cluster for its simplicity [15:48:34] and the ability to keep the number of statsd instances moderate, which in turn limits the load on graphite [15:49:12] assuming b) I see no reason not to run a local statsd on the machines [15:49:24] (PS1) BBlack: stop unmanaged default nginx config on install [puppet/nginx] - https://gerrit.wikimedia.org/r/193116 [15:50:10] godog: main reason would be operational simplicity & number of statsd feeding into graphite [15:50:36] marktraceur, ^d: Who wants to SWAT this morning? [15:51:00] kart_, Krenair: Ping for SWAT in 9 minutes. [15:51:19] having hundreds of statsds flushing a lot of metrics every 10s would create more load on graphite than a dozen or so in a cluster [15:51:24] twkozlowski: Ping for SWAT in 9 minutes, although I suspect you just joined for exactly that purpose. [15:51:25] I can swat [15:51:31] Krenair: Oh yeah. Go for it. [15:51:43] Glaisher: Ping for SWAT in 8.5 minutes. [15:52:06] operations, wikis-in-codfw: mc2004 console is unreadable remotely - https://phabricator.wikimedia.org/T90883#1070000 (Joe) NEW a:Joe [15:52:19] operations, ops-codfw, wikis-in-codfw: mc2004 console is unreadable remotely - https://phabricator.wikimedia.org/T90883#1070000 (Joe) [15:52:21] * marktraceur +1s Krenair [15:52:23] (PS1) Filippo Giunchedi: txstatsd: collect udp stats [puppet] - https://gerrit.wikimedia.org/r/193117 (https://phabricator.wikimedia.org/T90111) [15:52:26] pong for swat in 7 minutes [15:52:41] Glaisher, that educationprogram installation is going to need schema changes I think? 
[15:52:43] gwicke: without load balancers though [15:52:49] yes [15:53:18] * ^d is about [15:53:19] gwicke: anyways I don't think pushing metrics is a good strategy overall [15:53:26] (PS2) BBlack: stop unmanaged default nginx config on install [puppet/nginx] - https://gerrit.wikimedia.org/r/193116 [15:53:51] (CR) Filippo Giunchedi: [C: 2 V: 2] txstatsd: collect udp stats [puppet] - https://gerrit.wikimedia.org/r/193117 (https://phabricator.wikimedia.org/T90111) (owner: Filippo Giunchedi) [15:53:53] godog: it scales well [15:53:57] kart_, yt? [15:54:01] and degrades well too [15:54:47] gwicke: it doesn't seem the case ATM [15:54:54] the simplicity of setting up new metrics is very important to if we want to encourage more instrumentation [15:54:56] Krenair: yep. How can I test it though? :) [15:54:59] *too [15:55:24] I'm not sure what setting up new metrics has to do with push vs pull [15:55:26] oh. That was about SWAT. I'm ready. [15:55:31] Krenair: sorry :/ [15:55:49] (I thought it was about tin's git pull :)) [15:56:01] godog: right now it's just that it's a single python process for everything (which is probably one of the slowest statsd implementations around), and we haven't scaled it horizontally yet [15:56:56] akosiaris: around? [15:57:00] godog: we could also use https://github.com/etsy/statsd/blob/master/docs/cluster_proxy.md [15:57:12] sends each metric to a specific statsd backend [15:57:17] by hash [15:57:23] (CR) KartikMistry: [C: 1] "Can go in few minutes." [puppet] - https://gerrit.wikimedia.org/r/192769 (owner: KartikMistry) [15:57:39] !log restarted resourcemanager on analytics1001 to load new fairscheduler settings [15:57:45] Logged the message, Master [15:57:59] gwicke: what do we do when a single proxy isn't enough? [15:58:15] godog: lets get there first ;) [15:58:18] for installing educationprogram, we'll just need to run sql.php on extensions/EducationProgram/sql/EducationProgram.sql, I guess? 
[15:58:29] the proxy is multi-threaded and based on nodejs [15:58:54] so each thread should perform well, and it can use all cores on a beefy machine [15:59:05] gwicke: sure, but what's the answer to my question? :) [15:59:14] it should also be possible to scale the proxy horizontally [15:59:31] if the config is in sync, different instances should send traffic to the same backend [16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150226T1600). [16:00:14] ok [16:00:35] godog: at that point the remaining scaling limit would be if a single metric can't be handled by a single statsd backend any more [16:00:43] * twkozlowski takes a blueberry muffin and watches [16:01:08] godog: but I think we are not quite at that scale yet [16:01:30] (PS4) Alex Monk: Enable Content Translation in minwiki and uzwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/192764 (owner: KartikMistry) [16:01:50] kart_, are we going to need a schema update for this? [16:02:22] gwicke: yep that's certainly true [16:02:25] kart_? [16:02:29] kart_: yeah, I am around [16:02:56] Krenair: ah. I'm not sure. It is installing ext to new wiki. [16:03:46] akosiaris: you can merge, https://gerrit.wikimedia.org/r/#/c/192769 in few minutes. Thanks! [16:04:11] Nikerabbit: do you know if we need schema updates for new CX enablement? [16:04:24] (CR) Alexandros Kosiaris: [C: 2] CX: Enable ru in source, min and uz in target wikis [puppet] - https://gerrit.wikimedia.org/r/192769 (owner: KartikMistry) [16:04:41] doesn't look like it [16:05:10] Krenair: Thanks. Go ahead. I'll check if everything is OK. [16:05:12] kart_: done [16:05:23] akosiaris: Thanks! 
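The routing idea discussed above (send each metric to a specific statsd backend by hash, so that proxy instances with the same config agree on the destination) can be illustrated with a small sketch. It is not the etsy proxy's implementation: the real proxy may use a different hashing scheme, and plain modulo hashing over a CRC32 is swapped in here for brevity. The backend addresses and the `metric_name`, `pick_backend` and `route_packet` helpers are hypothetical.

```python
import zlib

# Hypothetical backend list; every proxy instance must share the same
# list in the same order for the routing to agree across instances.
BACKENDS = ["statsd-a:8125", "statsd-b:8125", "statsd-c:8125"]


def metric_name(line):
    """Extract the metric name from a statsd line such as 'a.b:12|ms'."""
    return line.split(":", 1)[0]


def pick_backend(name, backends=BACKENDS):
    """Map a metric name to one backend, deterministically.

    CRC32 is used instead of Python's built-in hash(), which is
    randomized per process and would break cross-instance agreement.
    """
    index = zlib.crc32(name.encode("utf-8")) % len(backends)
    return backends[index]


def route_packet(payload, backends=BACKENDS):
    """Split a newline-batched packet and group its lines by backend."""
    routed = {}
    for line in payload.decode("ascii").split("\n"):
        if not line:
            continue
        target = pick_backend(metric_name(line), backends)
        routed.setdefault(target, []).append(line)
    return routed
```

Because the packet has to be split per metric before hashing anyway, each backend only ever sees whole metrics, which is why all samples of one metric are aggregated in one place. One known weakness of modulo hashing is that adding or removing a backend remaps most metrics.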
[16:05:24] (03CR) 10Alex Monk: [C: 032] Enable Content Translation in minwiki and uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192764 (owner: 10KartikMistry) [16:05:29] (03Merged) 10jenkins-bot: Enable Content Translation in minwiki and uzwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192764 (owner: 10KartikMistry) [16:05:32] godog: IIRC we have a puppetization for plain statsd already? [16:06:04] Krenair: Thanks! [16:06:12] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/192764/ (duration: 00m 07s) [16:06:17] kart_, ^ [16:06:18] Logged the message, Master [16:06:30] kart_: there are no other schema changes currently other than index addition waitin for review [16:06:54] gwicke: yep we do [16:07:14] Nikerabbit: thanks. [16:08:07] godog: the proxy might work with statsite too; I would think that it could even add batch support to it [16:08:32] because it needs to split the packet for hashing on the metric name anyway [16:08:50] so the backend doesn't need to directly support batching [16:09:33] but from a scalability pov it probably doesn't make that much of a difference whether we use statsite or etsy statsd [16:09:48] does it look OK, kart_? [16:10:05] would expect single-thread perf to differ by 2x to 4x [16:10:33] Krenair: yes [16:10:57] ok, good [16:11:06] just waiting on jenkins for my patch now [16:11:42] gwicke: what do you think of the notes I have in T89857 ? [16:12:35] why do we run zend tests against wmf/ branches? do we still have that in prod? [16:12:38] <^d> YuviPanda: Any chance we could get all of ES config into hiera? [16:12:41] Oh, I guess silver... 
:/ [16:13:02] godog: I'm sceptical about anything that involves polling or http for simple metrics [16:13:35] you'd lose the simplicity of statsd, and get a lot of scaling issues in the aggregator [16:14:12] <^d> godog: Speaking of statsd, we should eventually look at using the statsd plugin in prod elastic as well rather than sending those metrics to ganglia [16:14:16] Why do I have an email saying someone has created a '-private' repository on the github org? [16:14:23] <^d> At some point. [16:14:25] for example, the aggregator would need to know all hosts it should contact [16:14:27] ^d [16:14:29] ^d: *nod* definitely [16:14:53] <^d> Lemme phile a Phab task [16:15:46] gwicke: we know the list of hosts already, what scaling issues specifically btw? [16:16:20] keeping the list of hosts up to date isn't so easy [16:16:39] at least compared to how statsd works right now [16:16:47] the list of hosts is effectively what puppet has certs signed for btw [16:17:01] so that's hundreds of them [16:17:09] some of which would be down at any given time [16:17:35] that's always true, not sure what's your point [16:17:38] !log krenair Synchronized php-1.25wmf19/extensions/VisualEditor/modules/ve-mw/init/ve.init.mw.Target.js: https://gerrit.wikimedia.org/r/#/c/193024/ (duration: 00m 05s) [16:17:42] Logged the message, Master [16:18:07] godog: handling timeouts, host status etc is a lot of complexity [16:18:34] 6operations, 10Wikimedia-Logstash, 7Elasticsearch: Deploy statsd plugin for production elasticsearch & logstash - https://phabricator.wikimedia.org/T90889#1070119 (10Chad) 3NEW [16:18:56] (tested, seems ok) [16:19:04] twkozlowski, ping [16:19:14] Krenair: pong [16:19:25] (03PS4) 10Alex Monk: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: 10Odder) [16:20:49] (03CR) 10Alex Monk: [C: 032] Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: 10Odder) [16:21:11] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1070129 (10GWicke) Another option: - Use [the statsd proxy](https://github.com/etsy/statsd/blob/master/docs/cluster_proxy.md) to hash packets to specific backend instances, so that each backend statsd... [16:22:11] gwicke: I agree it should handle hosts being down, in return you get hosts status which we need to know anyway (atm via icinga for example) and a reliable way to fetch metrics and make sure they make it to storage and know which metrics aren't currently making it [16:22:35] gwicke: btw happy news, statsite supports batching via '\n' [16:22:44] (03Merged) 10jenkins-bot: Set $wgCategoryCollation to 'uca-hsb' on hsbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/192803 (https://phabricator.wikimedia.org/T90689) (owner: 10Odder) [16:22:45] * Krenair kicks zuul [16:22:49] godog: ahh, nice! [16:23:27] statsd reporting currently takes up around 5% of cpu time in restbase; that'll roughly cut that in half [16:23:50] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/192803 (duration: 00m 07s) [16:23:55] Logged the message, Master [16:24:41] twkozlowski, ok... I think we also need to run a maintenance script [16:24:56] Krenair: Yep. [16:25:08] 6operations, 10Wikimedia-Logstash, 7Elasticsearch, 7Graphite: Deploy statsd plugin for production elasticsearch & logstash - https://phabricator.wikimedia.org/T90889#1070135 (10Chad) [16:25:20] !log Running updateCollation.php on hsbwiki [16:25:24] Logged the message, Master [16:25:38] twkozlowski, done, please test [16:26:12] Krenair: Yes, looks fine. 
[16:26:17] https://hsb.wikipedia.org/wiki/Kategorija:Zapados%C5%82owjanske_r%C4%9B%C4%8De [16:26:44] Unfortunately Google Translate doesn't seem to like Upper Sorbian :) [16:27:36] ^d: ugh, let me merge that patch of yours now [16:27:39] godog: I agree that having uptime info is great too, but statsd stuff is also latency-sensitive [16:27:45] <^d> YuviPanda: Step in the right direction :) [16:27:47] (03PS2) 10Alex Monk: Lift IP rate limit for Santiago edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193110 (https://phabricator.wikimedia.org/T90778) (owner: 10Odder) [16:27:53] <^d> We need to slim down the rest of that crud too [16:28:37] twkozlowski, hmm... problem [16:28:46] oh, no, it's fine [16:28:52] you addressed it already on the task :) [16:29:07] ^d: yeah. I’m getting rid of the last ::beta role today, which is for parsoid [16:29:15] (03CR) 10Alex Monk: [C: 032] Lift IP rate limit for Santiago edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193110 (https://phabricator.wikimedia.org/T90778) (owner: 10Odder) [16:29:24] (03Merged) 10jenkins-bot: Lift IP rate limit for Santiago edit-a-thon [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193110 (https://phabricator.wikimedia.org/T90778) (owner: 10Odder) [16:30:20] !log krenair Synchronized wmf-config/throttle.php: https://gerrit.wikimedia.org/r/#/c/193110/ (duration: 00m 08s) [16:30:25] Logged the message, Master [16:30:46] ok... [16:30:46] Thanks, Krenair [16:31:06] That's just a throttle change, so nothing for you to really verify [16:31:23] Glaisher, ping [16:31:34] pong [16:31:41] gwicke: there are more info to what I was thinking about in the two links I posted too if you want to take a look [16:32:04] it does sound like I'll need to do a schema change to deploy this [16:33:02] godog: so are you proposing to use prometheus? 
[16:33:06] (03CR) 10BBlack: [C: 032] stop unmanaged default nginx config on install [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193116 (owner: 10BBlack) [16:33:44] !log ran sql.php --wiki=ruwiki php-1.25wmf18/extensions/EducationProgram/sql/EducationProgram.sql [16:33:49] Logged the message, Master [16:33:54] gwicke: I want to be in a position to be able to try whichever and then decide [16:34:17] certainly seems OK... will send out the patch [16:34:59] (03PS2) 10Alex Monk: Enable EducationProgram extension on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193112 (https://phabricator.wikimedia.org/T89588) (owner: 10Glaisher) [16:35:09] (03CR) 10Alex Monk: [C: 032] Enable EducationProgram extension on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193112 (https://phabricator.wikimedia.org/T89588) (owner: 10Glaisher) [16:35:25] (03PS4) 10BBlack: mkfs for ext4 varnish filesystems on jessie [puppet] - 10https://gerrit.wikimedia.org/r/192833 [16:35:27] (03PS1) 10BBlack: cp1008 varnish storage size correction [puppet] - 10https://gerrit.wikimedia.org/r/193126 [16:35:29] (03PS1) 10BBlack: bump nginx for unmanaged-stop-on-install [puppet] - 10https://gerrit.wikimedia.org/r/193127 [16:35:42] godog: what do you think about giving the statsd proxy a try? 
[16:36:11] (03Merged) 10jenkins-bot: Enable EducationProgram extension on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193112 (https://phabricator.wikimedia.org/T89588) (owner: 10Glaisher) [16:36:23] (03PS6) 10Chad: Move beta elasticsearch config into hiera [puppet] - 10https://gerrit.wikimedia.org/r/188702 [16:36:30] <^d> YuviPanda: Rebased [16:36:53] !log krenair Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/193112/ (duration: 00m 07s) [16:36:54] Glaisher, please check [16:36:59] Logged the message, Master [16:37:18] gwicke: I think we should swap statsite first, though I'll be on VAC the next weeks [16:37:22] two weeks [16:37:58] Krenair: Special pages are showing up fine. [16:38:02] Thanks [16:38:42] godog: another option could be to try the proxy just with restbase data [16:38:57] ok, swat done [16:39:31] godog: statsite sounds good to me too, but imho scaling horizontally is more urgent [16:40:08] (03CR) 10BBlack: [C: 032] cp1008 varnish storage size correction [puppet] - 10https://gerrit.wikimedia.org/r/193126 (owner: 10BBlack) [16:40:17] (03CR) 10BBlack: [C: 032] bump nginx for unmanaged-stop-on-install [puppet] - 10https://gerrit.wikimedia.org/r/193127 (owner: 10BBlack) [16:44:58] gwicke: I agree [16:49:39] 6operations, 10Beta-Cluster, 5Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1070257 (10marcoil) >>! In T86633#1069583, @yuvipanda wrote: > https://gerrit.wikimedia.org/r/#/c/193082/ has basically killed the current beta role and just picked up the pro... [16:50:38] If nobody has anything in SWAT now, I can start cxserver deployment [16:50:46] Krenair: we are done. Right? [16:50:54] yep [16:51:05] cool. 
[16:51:12] swat is finished [16:52:11] (03CR) 10Brion VIBBER: [C: 031] "lgtm but I'm not familiar with the current commit/deployment rules for the config repo; will poke someone in IRC about +2/deploy :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193090 (https://phabricator.wikimedia.org/T59732) (owner: 10Nemo bis) [16:55:26] !log Updated cxserver to 4e09ee8 [16:55:32] Logged the message, Master [16:57:23] (03PS1) 10BBlack: fix nginx config/package/service ordering [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193129 [16:57:25] ^d: cool. in meeting, will merge after [16:57:31] <^d> mmk [16:58:31] ^d: around? [16:59:11] (03PS2) 10BBlack: fix nginx config/package/service ordering [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193129 [17:00:04] kart_, ^d: Dear anthropoid, the time has come. Please deploy Content Translation/cxserver (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150226T1700). [17:00:14] (03CR) 10BBlack: [C: 032] fix nginx config/package/service ordering [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193129 (owner: 10BBlack) [17:00:14] <^d> I am [17:00:33] (03CR) 10Brion VIBBER: [C: 032] "ok let's try this" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193090 (https://phabricator.wikimedia.org/T59732) (owner: 10Nemo bis) [17:01:02] ^d: thanks. 
I'm starting, may need your superpower :) [17:01:24] (03PS2) 10Ottomata: Synchronize mediacounts files to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192988 (owner: 10QChris) [17:01:39] (03PS1) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193132 [17:02:04] (03CR) 10BBlack: [C: 032 V: 032] bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193132 (owner: 10BBlack) [17:02:43] (03PS2) 10ArielGlenn: add thcipriani to deployers admin group [puppet] - 10https://gerrit.wikimedia.org/r/192552 (https://phabricator.wikimedia.org/T90467) (owner: 10Dzahn) [17:03:32] !log demon Synchronized README: look ma, no key forwarding (duration: 00m 05s) [17:03:37] Logged the message, Master [17:03:46] !log brion Synchronized wmf-config/CommonSettings.php: (no message) (duration: 00m 05s) [17:03:50] Logged the message, Master [17:06:31] (03CR) 10ArielGlenn: [C: 032] add thcipriani to deployers admin group [puppet] - 10https://gerrit.wikimedia.org/r/192552 (https://phabricator.wikimedia.org/T90467) (owner: 10Dzahn) [17:07:41] Nikerabbit: or ^d: git fetch on wmf18 please. [17:09:32] <^d> done [17:11:13] 10Ops-Access-Requests, 6operations: Requesting access to tin.eqiad.wmnet for thcipriani - https://phabricator.wikimedia.org/T90467#1070346 (10ArielGlenn) 5Open>3Resolved a:3ArielGlenn He should be all set up on tin. Closing. [17:12:52] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1070355 (10Tnegrin) Rob -- if you need a department to charge these against, please use Analytics. Thanks for the help. -Toby [17:12:57] 6operations, 10Wikimedia-Logstash, 7Elasticsearch, 7Graphite: Deploy statsd plugin for production elasticsearch & logstash - https://phabricator.wikimedia.org/T90889#1070356 (10ArielGlenn) p:5Triage>3Normal [17:14:50] ^d: and in wmf19 please. 
[17:16:09] <^d> done [17:18:25] !log kartik Started scap: Update ContentTranslation [17:18:32] Logged the message, Master [17:18:42] ^d: thanks! [17:20:19] ^d: so, it is git fetch still puzzling me :/ [17:20:31] ori: it was git fetch, not git pull. Sorry. [17:20:43] Should I email to Ops list? [17:22:17] <^d> Plenty of lurkers here should know [17:27:05] (03PS7) 10Yuvipanda: Move beta elasticsearch config into hiera [puppet] - 10https://gerrit.wikimedia.org/r/188702 (owner: 10Chad) [17:27:38] (03CR) 10Yuvipanda: [C: 032] Move beta elasticsearch config into hiera [puppet] - 10https://gerrit.wikimedia.org/r/188702 (owner: 10Chad) [17:28:15] (03PS1) 10Ori.livneh: Switch EventLogging's MariaDB consumer to m4-master [puppet] - 10https://gerrit.wikimedia.org/r/193135 [17:28:51] (03CR) 10Ori.livneh: [C: 04-2] "To be deployed at 00:00 UTC by Ori / Sean" [puppet] - 10https://gerrit.wikimedia.org/r/193135 (owner: 10Ori.livneh) [17:28:57] ^d: merged. [17:29:10] ^d: do we want to support ES on non deployment-prep with the same config? [17:29:55] ^d: ugh, just realized that won’t really work [17:30:19] does that clobber logstash in beta? 
[17:32:22] (03PS1) 10Yuvipanda: Revert "Move beta elasticsearch config into hiera" [puppet] - 10https://gerrit.wikimedia.org/r/193137 [17:32:31] (03PS3) 10Ottomata: Synchronize mediacounts files to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192988 (owner: 10QChris) [17:32:46] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "Move beta elasticsearch config into hiera" [puppet] - 10https://gerrit.wikimedia.org/r/193137 (owner: 10Yuvipanda) [17:33:01] (03PS4) 10Ottomata: Synchronize mediacounts files to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192988 (owner: 10QChris) [17:33:04] hah [17:33:09] (03CR) 10Ottomata: [C: 032 V: 032] Synchronize mediacounts files to dumps.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/192988 (owner: 10QChris) [17:33:14] ^d: that was fail on my part [17:33:20] <^d> :( [17:33:55] ^d: I’ll redo it tomorrow, too sleepy now :( [17:34:03] bd808: it could’ve, if it ran. I hopefully reverted it quickly enough [17:34:22] *nod* [17:34:23] !log kartik Finished scap: Update ContentTranslation (duration: 15m 58s) [17:34:30] bd808: did it? [17:34:30] Logged the message, Master [17:34:39] I'll check [17:35:19] YuviPanda: looks ok to me [17:35:32] bd808: sweet, thanks [17:35:58] ^d: so the problem with that patch is that values from hiera will pick up only if it isn’t explicitly set in the puppet code [17:36:04] and in this case it is. just having them be undef won’t work [17:36:10] <^d> grr [17:36:25] (03PS1) 10Glaisher: Enable EducationProgram extension on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193138 (https://phabricator.wikimedia.org/T89898) [17:36:59] 6operations, 10Beta-Cluster, 5Patch-For-Review: Unify ::production / ::beta roles for *oid - https://phabricator.wikimedia.org/T86633#1070419 (10yuvipanda) Right, so the file itself is in the parsoid deploy repository, but unsure why the deployment host actually gets the betalabs file. 
[17:37:00] (03PS2) 10Glaisher: Enable EducationProgram extension on lvwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193138 (https://phabricator.wikimedia.org/T89898) [17:37:41] with that, I’ve to go. cya guys [17:38:00] YuviPanda: will ping you for some beta stuff. [17:38:03] YuviPanda|zzz: ^^ [17:38:06] :) [17:39:20] (03PS1) 10BBlack: Explicit, separated dep ordering for nginx only [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193139 [17:39:39] (03PS1) 10Rush: admin enforce-users-groups exclude nobody from reports [puppet] - 10https://gerrit.wikimedia.org/r/193140 [17:40:32] (03CR) 10BBlack: [C: 032] Explicit, separated dep ordering for nginx only [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193139 (owner: 10BBlack) [17:41:05] (03PS1) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193142 [17:41:06] ^d: The way to move that stuff into hiera and still let cirrus and logstash co-exist might be create_resources() [17:41:07] 6operations, 6Multimedia, 7HHVM: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1070434 (10Bawolff) Ping on this? [17:41:20] (03CR) 10BBlack: [C: 032 V: 032] bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193142 (owner: 10BBlack) [17:41:25] (03PS1) 10Dzahn: noc/dbtree: remove the entire dbtree directory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) [17:41:45] <^d> bd808: never used it [17:41:47] <^d> :) [17:41:55] <^d> (so have no opinions) [17:41:58] me neither but I've read about it :) [17:42:01] ^d: http://puppetlunch.com/puppet/hiera.html#create_resources:%20The%20General%20Solution [17:42:49] <^d> I thought that's what role did for us [17:43:53] You'd use that inside the role to pick the right bag of config for the define [17:44:34] although... as long as logstash fills in all the blanks it shouldn't fall back to hiera [17:45:00] <_joe_> what? 
[17:45:10] so it could be the special flower and hiera should be able to define the "normal" case of setting up for cirrus [17:45:29] _joe_: https://gerrit.wikimedia.org/r/188702 [17:45:59] trying to guess how we can specify different elasticsearch configs in hiera for cirrus and logstash [17:46:29] <_joe_> bd808: that's wrong [17:46:42] <_joe_> bd808: only class parameters are automatically looked up in hiera [17:47:01] 6operations, 6Release-Engineering, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests: nb subdomain redirects - https://phabricator.wikimedia.org/T86924#1070464 (10Glaisher) Rather than doing this by an apache config, I think we should let missing.php handle this. We won't have to make changes whenever... [17:47:03] <_joe_> so all those params defined with if-else should become class parameters [17:48:13] <_joe_> bd808: so yes you can [17:48:21] <_joe_> but not like yuvi did there [17:48:53] <_joe_> bd808: so everything that may vary should be a class parameter [17:49:09] <^d> Yeah trying to get there :) [17:50:58] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1070471 (10ori) >>! In T89857#1070129, @GWicke wrote: > Another option: > > - Use [the statsd proxy](https://github.com/etsy/statsd/blob/master/docs/cluster_proxy.md) to hash packets to specific backen... [17:54:05] (03PS2) 10Rush: admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 [17:55:04] i want to let puppet git clone something from operations/software repo, but i don't want the entire repo, just that one directory inside it. is there a way ? 
[17:55:53] or should every tool be a separate repo for this [17:56:21] sparse checkouts from git are kind of black magic I think [17:56:55] they are supported at some level but not like they are in svn [17:56:59] (03CR) 10Ori.livneh: [C: 031] import upstream 0.7.0 [debs/statsite] - 10https://gerrit.wikimedia.org/r/193095 (https://phabricator.wikimedia.org/T90111) (owner: 10Filippo Giunchedi) [17:57:32] _joe_: I reverted that patch. [17:57:34] bd808: thanks for giving me the right term to look for though [17:57:38] Immediately.brainfart [17:57:51] I'll check up tomorrow [17:57:54] Night [17:58:42] git config core.sparsecheckout true ... hmm [17:59:17] (03CR) 10Glaisher: [C: 031] "It's used in SiteMatrix as well but it already skips these codes. https://github.com/wikimedia/mediawiki-extensions-SiteMatrix/blob/69c4ae" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697) (owner: 10TTO) [18:00:42] 6operations, 5Patch-For-Review: dbtree - duplicated code in 2 locations - clean up - https://phabricator.wikimedia.org/T90837#1070502 (10Dzahn) need to have puppet git clone the tool from operations/software but not get the entire repo, only ./dbtree/ inside it. either need to use git sparse checkouts (http:/... [18:01:33] 7Puppet, 6operations: removing admin::groups from hiera doesn't revoke permissions - https://phabricator.wikimedia.org/T89961#1070507 (10ArielGlenn) at a minimum this should be documented under 'removing access' on the ops clinic duty wikitech page and maybe in the various yaml files. 
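Note on the sparse checkout idea raised at 17:56-18:00: a rough Python sketch of the usual git steps for checking out a single directory such as `dbtree/`. The helper names and the repository URL are assumptions for illustration; the git subcommands and the `.git/info/sparse-checkout` pattern file are standard git. Nothing here reflects what was eventually deployed.

```python
import subprocess
from pathlib import Path


def sparse_clone_steps(repo_url, dest, branch="master"):
    """Build the git commands for a sparse checkout (hypothetical helper)."""
    git = ["git", "-C", dest]
    setup = [
        ["git", "init", dest],
        git + ["remote", "add", "origin", repo_url],
        git + ["config", "core.sparseCheckout", "true"],
    ]
    # A shallow pull keeps history small; the working tree is limited
    # by the pattern file written in sparse_clone() below.
    fetch = git + ["pull", "--depth=1", "origin", branch]
    return setup, fetch


def sparse_clone(repo_url, dest, subdir, branch="master"):
    """Check out only `subdir` of the repository into `dest`."""
    setup, fetch = sparse_clone_steps(repo_url, dest, branch)
    for cmd in setup:
        subprocess.run(cmd, check=True)
    # Tell git which paths to materialise in the working tree.
    pattern_file = Path(dest) / ".git" / "info" / "sparse-checkout"
    pattern_file.parent.mkdir(parents=True, exist_ok=True)
    pattern_file.write_text(subdir.rstrip("/") + "/\n")
    subprocess.run(fetch, check=True)


# Example call (URL is a placeholder, not the real repository address):
# sparse_clone("https://example.org/operations/software.git",
#              "/srv/dbtree", "dbtree")
```

A sparse checkout only limits which files appear in the working tree; the objects for the fetched commit are still downloaded for the whole repository. That is one reason a separate repository, as mutante later prefers, can be the simpler option.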
[18:05:21] (03PS3) 10Rush: admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 [18:07:34] 6operations, 5Patch-For-Review: dbtree - duplicated code in 2 locations - clean up - https://phabricator.wikimedia.org/T90837#1070524 (10Dzahn) I think i'd prefer a separate repo instead of getting into subdirectory checkouts, requested on https://www.mediawiki.org/wiki/Git/New_repositories/Requests [18:08:38] (03CR) 10Dzahn: [C: 04-2] noc/dbtree: remove the entire dbtree directory [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: 10Dzahn) [18:11:46] ori: how do you feel about https://gerrit.wikimedia.org/r/#/c/168999/1 nowadays? i think it's good as long as we manually delete stuff before merging it and i commented to also add "rmdirs => true". we have /tmp/mw-cache-1.23wmf6/ and so on [18:12:33] mutante: agree with both points [18:12:59] mutante: but don't have the time to update the patch atm. if you feel up to it, you could do it yourself, and if not, i'll get to it at some point [18:13:32] ori: ok, sounds good. thx [18:19:44] 7Blocked-on-Operations, 6operations, 10Analytics, 6Mobile-Apps, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1063059 (10KLans_WMF) [18:23:35] (03CR) 10CSteipp: "In Iran and China, we don't force users to use https, because a high percentage of users are blocked from using https." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/190203 (owner: 10Se4598) [18:25:29] (03PS2) 10Dzahn: mediawiki: tidy /tmp [puppet] - 10https://gerrit.wikimedia.org/r/168999 (owner: 10Ori.livneh) [18:26:40] (03PS3) 10Dzahn: mediawiki: tidy /tmp [puppet] - 10https://gerrit.wikimedia.org/r/168999 (owner: 10Ori.livneh) [18:32:06] apergos: how about https://gerrit.wikimedia.org/r/#/c/162860/ it's been a while and you have +1s [18:32:17] 6operations, 10Wikimedia-Git-or-Gerrit, 5Patch-For-Review: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1070652 (10akosiaris) Hello, sorry for the long wait, I just managed to work on this today. I have create the gerrit repo wikimedia/Tran... [18:33:08] (03PS1) 10Alexandros Kosiaris: Revert "Disable cloning of TransparencyReport until the repo is public again" [puppet] - 10https://gerrit.wikimedia.org/r/193150 (https://phabricator.wikimedia.org/T89640) [18:33:10] huh I let that slip [18:33:19] have tab open, will shove out tomorrow [18:33:23] :) [18:33:23] (a bit late to babysit now) [18:33:45] cool, was just going through gerrit queue [18:33:54] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1070665 (10RobH) @tnegrin I appreciate the info, but I don't think we'll have to charge this against anything. We're using hardware that is not only currently spare, but was originally p... [18:33:57] sweet [18:42:13] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/604/change/181952/html/" [puppet] - 10https://gerrit.wikimedia.org/r/181952 (owner: 10JanZerebecki) [18:43:06] Error: The txstatsd pupppet module does not like your init system! 
at /opt/wmf/software/compare-puppet-catalogs/external/puppet/modules/txstatsd/manifests/init.pp:69 on node tungsten.eqiad.wmnet [18:43:15] (03PS1) 10Cmjohnson: Adding DNS for ms-be1016-1018 [dns] - 10https://gerrit.wikimedia.org/r/193152 [18:43:25] 6operations, 10hardware-requests: deploy logstash1001 - https://phabricator.wikimedia.org/T90904#1070739 (10RobH) 3NEW a:3RobH [18:44:44] (03CR) 10Cmjohnson: [C: 032] Adding DNS for ms-be1016-1018 [dns] - 10https://gerrit.wikimedia.org/r/193152 (owner: 10Cmjohnson) [18:45:04] 6operations, 10hardware-requests: deploy eventlog1001 - https://phabricator.wikimedia.org/T90904#1070739 (10RobH) [18:54:01] 6operations, 10ops-eqiad: label eventlog1001 - https://phabricator.wikimedia.org/T90905#1070838 (10RobH) 3NEW a:3Cmjohnson [18:55:34] (03CR) 10Dzahn: [C: 031] "ignore the tungsten error, not related to this change, compiler doesn't see $::initsystem and puppet runs on tungstn are disabled and it w" [puppet] - 10https://gerrit.wikimedia.org/r/181952 (owner: 10JanZerebecki) [18:56:42] (03CR) 10Dzahn: [C: 032] Delete unused classes and templates. 
[puppet] - 10https://gerrit.wikimedia.org/r/181952 (owner: 10JanZerebecki) [18:56:57] 6operations, 10hardware-requests: deploy eventlog1001 - https://phabricator.wikimedia.org/T90904#1070860 (10RobH) network side done (port labeled/enabled, vlan set) todo: installer module updates, os install, handoff [18:59:04] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1070870 (10RobH) 3NEW a:3RobH [19:03:31] (03PS1) 10RobH: setting eventlog2001 mgmt ip [dns] - 10https://gerrit.wikimedia.org/r/193155 [19:05:13] (03CR) 10RobH: [C: 032] setting eventlog2001 mgmt ip [dns] - 10https://gerrit.wikimedia.org/r/193155 (owner: 10RobH) [19:06:42] 6operations: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1070905 (10Dzahn) p:5Triage>3Normal [19:08:14] 6operations, 10ops-codfw: label/update mgmt & settings/test eventlog2001 - https://phabricator.wikimedia.org/T90909#1070907 (10RobH) 3NEW a:3Papaul [19:08:45] 6operations, 10ops-codfw: label/update mgmt & settings/test eventlog2001 - https://phabricator.wikimedia.org/T90909#1070907 (10RobH) [19:08:46] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1070915 (10RobH) [19:09:34] (03PS6) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [19:09:44] (03CR) 10jenkins-bot: [V: 04-1] rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [19:10:08] 6operations, 10ops-codfw: Please rack & connect the Tampa MX80s in row D - https://phabricator.wikimedia.org/T84658#1070917 (10faidon) a:5mark>3Papaul @Papaul, see above. [19:12:50] (03PS7) 10Dzahn: rm module apachesync [puppet] - 10https://gerrit.wikimedia.org/r/177080 [19:13:47] (03CR) 10Mobrovac: [C: 031] "After discussing it with gwicke, I'm good with these settings, especially with MaxTenuringThreshold. Let's see their effect in action!" 
[puppet/cassandra] - 10https://gerrit.wikimedia.org/r/193060 (owner: 10GWicke) [19:19:27] (03PS1) 10RobH: setting eventlog1001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/193157 [19:20:30] (03CR) 10RobH: [C: 032] setting eventlog1001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/193157 (owner: 10RobH) [19:24:28] (03PS1) 10RobH: setting eventlog1001 mgmt entry [dns] - 10https://gerrit.wikimedia.org/r/193158 [19:25:19] (03CR) 10RobH: [C: 032] setting eventlog1001 mgmt entry [dns] - 10https://gerrit.wikimedia.org/r/193158 (owner: 10RobH) [19:31:15] 6operations, 10ops-codfw: Please rack & connect the Tampa MX80s in row D - https://phabricator.wikimedia.org/T84658#1070960 (10Papaul) yes sir this is has been complete [19:31:23] (03PS4) 10Rush: admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 [19:32:04] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]BR [19:32:35] (03PS1) 10Dzahn: 10.in-addr.arps - indentation fixes [dns] - 10https://gerrit.wikimedia.org/r/193160 [19:32:39] <^d> cr1-eqiad? 
^ [19:33:10] (03PS1) 10RobH: setting eventlog1001 install params [puppet] - 10https://gerrit.wikimedia.org/r/193161 [19:34:07] (03CR) 10RobH: [C: 032] setting eventlog1001 install params [puppet] - 10https://gerrit.wikimedia.org/r/193161 (owner: 10RobH) [19:37:45] ^d: handled in internal channel [19:37:53] <^d> I was there ;-) [19:38:39] ACKNOWLEDGEMENT - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 220, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235) (#2648) [10Gbps wave]BR cpettet notified faidon who said our redundancy is in place and he will file a ticket later [19:45:04] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: Puppet has 1 failures [19:48:35] (03PS1) 10Chad: Additional debugging for "invalid host" error in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193164 [19:48:45] (03CR) 10jenkins-bot: [V: 04-1] Additional debugging for "invalid host" error in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193164 (owner: 10Chad) [19:52:01] (03PS1) 10Thcipriani: Change regex to match labs hiera config [puppet] - 10https://gerrit.wikimedia.org/r/193165 [19:52:22] (03PS2) 10Chad: Additional debugging for "invalid host" error in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193164 [19:52:25] (03PS5) 10Rush: admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 [19:53:21] (03CR) 10Chad: [C: 032] Additional debugging for "invalid host" error in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193164 (owner: 10Chad) [19:55:07] (03PS6) 10Rush: admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 [19:56:41] (03CR) 10Rush: [C: 032] admin enforce-users-groups exclude known accounts [puppet] - 10https://gerrit.wikimedia.org/r/193140 (owner: 10Rush) [19:57:04] RECOVERY - puppet last run on palladium is OK: OK: Puppet is 
currently enabled, last run 33 seconds ago with 0 failures [19:57:56] 7Puppet, 6operations, 6Labs: Values from mwyaml backend don't override values from ops/pupppet yaml files in hieradata/labs - https://phabricator.wikimedia.org/T90466#1071085 (10thcipriani) Since Ifa5c79bc10f1147518fbf352d75c9fcd3019bd72 I don't think mwyaml will read from Hiera: since the regex spe... [19:59:07] (03Merged) 10jenkins-bot: Additional debugging for "invalid host" error in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193164 (owner: 10Chad) [19:59:39] (03PS2) 10Greg Grossmeier: Change regex to match labs hiera config [puppet] - 10https://gerrit.wikimedia.org/r/193165 (https://phabricator.wikimedia.org/T90466) (owner: 10Thcipriani) [20:00:02] just a minor touch up of the commit message, thcipriani ^ [20:00:19] gotcha. Thanks! [20:00:30] np :) [20:03:30] !log demon Synchronized multiversion/MWMultiVersion.php: debuggg (duration: 00m 06s) [20:03:35] Logged the message, Master [20:04:44] (03PS1) 10BBlack: Test explicit tag to workaround PUP-2689 [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193167 [20:06:52] (03CR) 10BBlack: [C: 032] Test explicit tag to workaround PUP-2689 [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193167 (owner: 10BBlack) [20:07:36] (03PS1) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193168 [20:07:57] (03PS2) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193168 [20:07:59] (03CR) 10Gage: [C: 032] Next step on the road to understanding JVM GC dynamics [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/193060 (owner: 10GWicke) [20:08:02] (03CR) 10BBlack: [C: 032 V: 032] bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193168 (owner: 10BBlack) [20:09:02] (03CR) 10Gage: [V: 032] Next step on the road to understanding JVM GC dynamics [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/193060 (owner: 10GWicke) [20:09:13] PROBLEM - puppet last run on palladium is CRITICAL: 
CRITICAL: Puppet has 1 failures [20:09:50] (03PS1) 10GWicke: Update cassandra submodule [puppet] - 10https://gerrit.wikimedia.org/r/193169 [20:10:15] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [20:11:14] mutante: mind doing a quick submodule update? https://gerrit.wikimedia.org/r/193169 [20:11:24] gage already merged the change there, but had to run [20:12:15] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [20:15:28] (03PS1) 10BBlack: document nginx tag workarounds for PUP-2689 [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193171 [20:15:47] (03CR) 10BBlack: [C: 032] document nginx tag workarounds for PUP-2689 [puppet/nginx] - 10https://gerrit.wikimedia.org/r/193171 (owner: 10BBlack) [20:16:14] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [20:16:32] !log disabled puppet on xenon to test bulk db creation with restbase [20:16:41] Logged the message, Master [20:22:51] (03PS1) 10BBlack: bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193173 [20:22:53] (03PS1) 10BBlack: workaround PUP-2689 for localssl nginx files [puppet] - 10https://gerrit.wikimedia.org/r/193174 [20:23:15] (03CR) 10BBlack: [C: 032 V: 032] bump nginx module [puppet] - 10https://gerrit.wikimedia.org/r/193173 (owner: 10BBlack) [20:23:27] (03CR) 10BBlack: [C: 032 V: 032] workaround PUP-2689 for localssl nginx files [puppet] - 10https://gerrit.wikimedia.org/r/193174 (owner: 10BBlack) [20:26:21] (03PS1) 10Cmjohnson: Adding dhcpd entries for ms-be1016-18 [puppet] - 10https://gerrit.wikimedia.org/r/193175 [20:27:11] (03CR) 10Cmjohnson: [C: 032] Adding dhcpd entries for ms-be1016-18 [puppet] - 10https://gerrit.wikimedia.org/r/193175 (owner: 10Cmjohnson) [20:30:50] phuedx, you appear to have 2 shell accounts - phuedx and ssmith. 
which of them do you actually use? [20:32:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:34:53] (03PS1) 10BBlack: order nginx->varnish explicitly [puppet] - 10https://gerrit.wikimedia.org/r/193206 [20:35:37] (03PS1) 10Ori.livneh: Add .gitreview [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193213 [20:35:48] 6operations, 10ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1071198 (10Cmjohnson) 3NEW a:3Cmjohnson [20:36:24] 6operations, 10ops-eqiad: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1071206 (10Cmjohnson) a:5Cmjohnson>3fgiunchedi Reassigning to @filippo for install [20:36:40] /me looks around for an opsen with +2 powers for https://gerrit.wikimedia.org/r/#/c/193169/ [20:37:22] (cassandra JVM GC settings back close to defaults) [20:38:03] PROBLEM - Cassandra database on xenon is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon [20:38:17] !log cassandra on test cluster seems to be broken, probably missing the graphite logger [20:38:23] Logged the message, Master [20:39:46] (03CR) 10BBlack: [C: 032] order nginx->varnish explicitly [puppet] - 10https://gerrit.wikimedia.org/r/193206 (owner: 10BBlack) [20:40:53] !log issue with cassandra test cluster is actually that it's still running cassandra 2.1.2, which is incompatible with the current puppet config; should probably update the test cluster to jessie soon [20:40:57] Logged the message, Master [20:42:45] (03CR) 10Ori.livneh: [C: 032 V: 032] Add .gitreview [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193213 (owner: 10Ori.livneh) [20:42:50] (03PS1) 10Ori.livneh: Initial commit of statsdlb [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193255 [20:42:56] gwicke, godog ^ [20:43:13] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - 
https://phabricator.wikimedia.org/T90923#1071221 (10chasemp) [20:44:07] ori: why write our own? [20:44:19] service supervisor, you mean? [20:44:22] we'd have to add heartbeats and health tracking as well [20:44:51] i think statsite will handle the load, but i'm not sure fillipo will have time to deploy it before he goes on vacation [20:45:02] so this could help us continue to use txstatsd for the next couple of weeks [20:45:11] why not https://github.com/etsy/statsd/blob/master/docs/cluster_proxy.md ? [20:45:26] because there's no time to do a sensible migration [20:45:31] to either that, or to statsite [20:45:36] i also find the statsd codebase quite poor [20:45:55] the proxy has more robustness than yours [20:45:58] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071224 (10chasemp) [20:46:13] I would also think that it'll work with txstatsd [20:46:35] OK [20:47:03] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90924#1071228 (10Dzahn) 3NEW [20:47:16] ori: up for trying it? 
[20:47:35] no [20:47:44] but you can suggest it and i'll reply [20:48:22] (03PS1) 10Nuria: Changing permits on agreggator depot once downloaded [puppet] - 10https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) [20:48:32] ori: consider it suggested [20:48:38] ;) [20:48:56] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071254 (10Dzahn) 3NEW [20:49:09] (03CR) 10jenkins-bot: [V: 04-1] Changing permits on agreggator depot once downloaded [puppet] - 10https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) (owner: 10Nuria) [20:50:27] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071254 (10Dzahn) [20:50:27] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90924#1071262 (10Dzahn) [20:50:38] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071265 (10Legoktm) I have no idea why I have access to those hosts and don't think I've ever used them. 
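Note on the statsd proxy suggestion above: the point of hashing packets to specific backends is that every sample for a given metric reaches the same aggregator, so its counters and timers are not split across processes. The following is a simplified Python sketch of that idea using a plain modulo hash. It is not the code of the etsy proxy or of statsdlb, whose schemes may differ (for example, a hash ring). The backend addresses are made up.

```python
import zlib


def metric_key(packet: bytes) -> bytes:
    """Extract the metric name from a statsd line such as b'foo.bar:1|c'."""
    return packet.split(b":", 1)[0]


def pick_backend(packet: bytes, backends: list) -> str:
    """Map a packet to a backend so one metric always lands on one backend."""
    index = zlib.crc32(metric_key(packet)) % len(backends)
    return backends[index]


# Hypothetical backend list, for illustration only.
BACKENDS = ["statsd-a:8125", "statsd-b:8125", "statsd-c:8125"]

first = pick_backend(b"MediaWiki.edits:1|c", BACKENDS)
second = pick_backend(b"MediaWiki.edits:5|c", BACKENDS)
# first and second name the same backend because the metric name is the same.
```

A plain modulo hash remaps most metrics whenever the backend list changes, which is why a production proxy would more likely use consistent hashing plus the health tracking mentioned at 20:44:22.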
[20:50:40] (03PS2) 10Nuria: Changing permits on agreggator depot once downloaded [puppet] - 10https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) [20:51:00] 6operations: define tfinc shell access in puppet - https://phabricator.wikimedia.org/T90927#1071266 (10RobH) 3NEW a:3Tfinc [20:51:34] (03CR) 10jenkins-bot: [V: 04-1] Changing permits on agreggator depot once downloaded [puppet] - 10https://gerrit.wikimedia.org/r/193256 (https://phabricator.wikimedia.org/T90742) (owner: 10Nuria) [20:51:42] 6operations, 6Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1071273 (10Dzahn) 3NEW [20:52:35] (03PS1) 10BBlack: order nginx->varnish explicitly, take 2 [puppet] - 10https://gerrit.wikimedia.org/r/193258 [20:52:45] 10Ops-Access-Requests, 6operations: define in Puppet or remove user account - tfinc - https://phabricator.wikimedia.org/T90927#1071291 (10RobH) [20:52:50] (03CR) 10BBlack: [C: 032 V: 032] order nginx->varnish explicitly, take 2 [puppet] - 10https://gerrit.wikimedia.org/r/193258 (owner: 10BBlack) [20:53:39] ori: does txstatsd have an adminport too? 
[20:57:39] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071304 (10chasemp) [20:58:01] 6operations, 6Security: define in Puppet or remove user account - jamesur - https://phabricator.wikimedia.org/T90930#1071305 (10Dzahn) 3NEW [20:58:22] !log demon Synchronized multiversion/MWMultiVersion.php: moar debug (duration: 00m 06s) [20:58:29] Logged the message, Master [20:59:27] 6operations, 6Security: define in Puppet or remove user account - ironholds - https://phabricator.wikimedia.org/T90931#1071311 (10Dzahn) 3NEW [21:00:16] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tnegrin - https://phabricator.wikimedia.org/T90932#1071319 (10RobH) 3NEW a:3Tnegrin [21:00:26] <^d> !log mw1161 is complaining about permissions on setting mtime during rsync [21:00:31] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tfinc - https://phabricator.wikimedia.org/T90927#1071326 (10RobH) [21:00:31] Logged the message, Master [21:01:23] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071327 (10Smalyshev) I (smalyshev) don't need account on cerium right now (probably left over from Titan work). If there are questions about any other host... [21:02:27] 6operations, 6Security: define in Puppet or remove user account - legoktm - https://phabricator.wikimedia.org/T90933#1071329 (10Dzahn) 3NEW [21:02:34] 7Puppet, 6operations: removing admin::groups from hiera doesn't revoke permissions - https://phabricator.wikimedia.org/T89961#1071336 (10chasemp) When we sussed this out the determination basically that we weren't too concerned with leaving behind groups. users, yes. groups, no. So the right thing in terms... 
[21:03:03] 6operations, 6Security: define in Puppet or remove user account - legoktm - https://phabricator.wikimedia.org/T90933#1071329 (10Dzahn) Legoktm.Via Web · Thu, Feb 26, 12:50 PM I have no idea why I have access to those hosts and don't think I've ever used them. [21:04:30] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071347 (10Jalexander) Yeah, it looks like both of the ones I'm on are log collectors so I may well have used them in the past with udp2log etc during the f... [21:05:07] ori: I have the statsd proxy running on ruthenium, with statsd.eqiad.wmnet as a backend -- seems to be working fine [21:05:09] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - cscott - https://phabricator.wikimedia.org/T90935#1071348 (10RobH) 3NEW a:3cscott [21:05:59] robh: what? [21:06:04] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071356 (10chasemp) [21:06:09] 6operations, 6Security: define in Puppet or remove user account - jamesur - https://phabricator.wikimedia.org/T90930#1071357 (10Dzahn) James, We're in the process of auditing and cleaning up our access lists to servers. During this audit, we found that your account is enabled on two systems, but not accounted... [21:06:18] cscott: dont panic! [21:06:26] its cleanup, you arent in trouble =] [21:07:04] we found some user entris for access on systems where our admin module doesn't have corresponding data [21:07:11] likely from before it was in general use. [21:07:26] 6operations, 6Security: define in Puppet or remove user account - jamesur - https://phabricator.wikimedia.org/T90930#1071365 (10Jalexander) >>! In T90930#1071357, @Dzahn wrote: > James, > > We're in the process of auditing and cleaning up our access lists to servers. 
During this audit, we found that your acco... [21:07:27] so we just need to clean up your access to match, so we list off the items we dont have accounting for, and let you account for them [21:07:29] 6operations, 6Security: define in Puppet or remove user account - ironholds - https://phabricator.wikimedia.org/T90931#1071366 (10Dzahn) Oliver, We're in the process of auditing and cleaning up our access lists to servers. During this audit, we found that your account is enabled on two systems, but not accoun... [21:08:19] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071367 (10Dzahn) Katie, We're in the process of auditing and cleaning up our access lists to servers. During this audit, we found that your account is enabled on two systems, but not accounted f... [21:08:26] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - cscott - https://phabricator.wikimedia.org/T90935#1071368 (10cscott) I have no idea what oxygen, erbium, and gadolinium do, so I'm pretty sure I don't need accounts there. [21:10:00] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071369 (10chasemp) >>! In T90923#1071347, @Jalexander wrote: > Yeah, it looks like both of the ones I'm on are log collectors so I may well have used them... 
[21:10:37] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - santhosh - https://phabricator.wikimedia.org/T90937#1071372 (10RobH) 3NEW a:3santhosh [21:11:03] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071387 (10chasemp) [21:11:15] 6operations, 6Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1071388 (10Dzahn) Jon, We're in the process of auditing and cleaning up our access lists to servers. During this audit, we found that your account is enabled on two systems, but not accounted... [21:11:34] 6operations, 7Graphite: scale statsd reporting/aggregation (plan) - https://phabricator.wikimedia.org/T89857#1071389 (10GWicke) Fwiw, I have [the statsd proxy](https://github.com/etsy/statsd/blob/master/docs/cluster_proxy.md) running on ruthenium, with txstatsd as a backend, and it seems to be working fine. I... [21:13:09] you just had accounts sitting around on random systems that weren't controlled by puppet? :| [21:13:27] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - smalyshev - https://phabricator.wikimedia.org/T90939#1071390 (10RobH) 3NEW a:3Smalyshev [21:14:24] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - smalyshev - https://phabricator.wikimedia.org/T90939#1071401 (10Smalyshev) I don't currently need access on those three. 
[21:14:29] 6operations, 6Security: T90937: define in Puppet or remove user account - hoo - https://phabricator.wikimedia.org/T90940#1071402 (10Dzahn) 3NEW [21:15:06] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - smalyshev - https://phabricator.wikimedia.org/T90939#1071409 (10Smalyshev) a:5Smalyshev>3RobH [21:17:19] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071420 (10RobH) [21:17:20] 6operations: Puppet should actively purge sudo and access rights not enumerated by the admins module - https://phabricator.wikimedia.org/T88826#1071419 (10RobH) [21:18:39] 6operations, 6Security: define in Puppet or remove user account - hoo - https://phabricator.wikimedia.org/T90940#1071424 (10RobH) [21:19:25] (03PS1) 10BBlack: 3.0.6plus-wm6: do not start varnishd post-install [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/193263 [21:19:31] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssastry - https://phabricator.wikimedia.org/T90941#1071427 (10RobH) 3NEW a:3ssastry [21:19:33] some of these systems predate puppet truly [21:19:47] or meaningful puppet [21:22:04] 6operations, 6Security: define in Puppet or remove user account - mah - https://phabricator.wikimedia.org/T90944#1071452 (10Dzahn) 3NEW [21:22:18] yeah, a lot of people seem to have been on oxygen and gadolinium [21:22:27] but wasn't hoo's account relatively recent? 
[21:24:51] (03PS2) 10BBlack: varnish (3.0.6plus-wm6) jessie-wikimedia; urgency=low [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/193263 [21:24:51] 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1071473 (10RobH) 3NEW a:3Tfinc [21:25:00] 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1071473 (10RobH) [21:25:10] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1071473 (10RobH) [21:26:19] Krenair: Apr 3 2012 [21:26:22] best i can tell [21:26:30] for oxygen [21:26:34] 6operations, 6Security: define in Puppet or remove user account - mglaser - https://phabricator.wikimedia.org/T90947#1071485 (10Dzahn) 3NEW [21:27:09] chasemp, is when hoo's account was created on oxygen? [21:27:32] best guess, yes [21:27:44] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssastry - https://phabricator.wikimedia.org/T90941#1071491 (10ssastry) I didn't even know I had access to these servers till now. So, safe to boot me from those servers. [21:28:37] chasemp, wtf? [21:28:53] not sure what you are wtf'ing [21:29:05] he only got shell access about a year ago [21:29:15] https://gerrit.wikimedia.org/r/#/c/112168/ [21:29:29] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - dartar - https://phabricator.wikimedia.org/T90949#1071501 (10RobH) 3NEW a:3DarTar [21:29:58] so there was actually a system his (and many others'...) account existed on almost two years prior? 
[21:30:02] my guess is based on the stamp on this .profile in /home/hoo [21:30:04] hoo wikidev 675 Apr 3 2012 .profile [21:30:25] which may be misleading [21:30:32] hoo wikidev 4096 Feb 10 2014 .ssh [21:31:20] I would have to poke about to really know [21:33:45] 6operations, 6Security: define in Puppet or remove user account - amire80 - https://phabricator.wikimedia.org/T90950#1071521 (10Dzahn) 3NEW [21:34:40] Krenair: ah so feb 10 it is, the skeleton files appear to be copied with timestamps intact [21:34:51] forgive my brief wtf'ing [21:34:56] ok [21:35:08] that date seems more reasonable [21:37:07] (03CR) 10BBlack: [C: 032] varnish (3.0.6plus-wm6) jessie-wikimedia; urgency=low [debs/varnish] (3.0.6-plus-wm) - 10https://gerrit.wikimedia.org/r/193263 (owner: 10BBlack) [21:38:40] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - diederik - https://phabricator.wikimedia.org/T90951#1071535 (10RobH) 3NEW a:3drdee [21:41:13] any ops want to merge a no-op puppet patch for CI? https://gerrit.wikimedia.org/r/#/c/192466/ :) [21:41:48] (03PS2) 10Rush: contint: Use 'wmf-deploy' branch for cloning mediawiki/tools/codesniffer [puppet] - 10https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) (owner: 10Legoktm) [21:41:50] (03CR) 10Andrew Bogott: [C: 032] contint: Use 'wmf-deploy' branch for cloning mediawiki/tools/codesniffer [puppet] - 10https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) (owner: 10Legoktm) [21:42:18] heh [21:43:53] andrewbogott: I think you'll have to re+2 PS2? [21:44:12] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [21:44:25] (03CR) 10Andrew Bogott: [C: 032] contint: Use 'wmf-deploy' branch for cloning mediawiki/tools/codesniffer [puppet] - 10https://gerrit.wikimedia.org/r/192466 (https://phabricator.wikimedia.org/T90495) (owner: 10Legoktm) [21:44:38] andrewbogott: thanks! 
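On the `.profile` date discussed above: a copy that preserves metadata carries the source file's modification time with it, so a skeleton file's date says nothing about when the account was created. A minimal Python sketch of that effect, assuming temporary files stand in for the skeleton directory and the home directory (the file names and the 2012 date are illustrative only, not taken from the hosts in question):

```python
import os
import shutil
import tempfile
from datetime import datetime


def copy_mtimes(old_mtime):
    """Copy one file twice; return (mtime of metadata-preserving copy, mtime of plain copy)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a skeleton file such as .profile.
        skel = os.path.join(tmp, "skel_profile")
        with open(skel, "w") as fh:
            fh.write("# default profile\n")
        # Backdate the skeleton file so it looks years old.
        os.utime(skel, (old_mtime, old_mtime))

        preserved = os.path.join(tmp, "preserved")
        fresh = os.path.join(tmp, "fresh")
        shutil.copy2(skel, preserved)  # copies data plus metadata, including mtime
        shutil.copy(skel, fresh)       # copies data and mode bits; mtime is "now"

        return os.path.getmtime(preserved), os.path.getmtime(fresh)


old = datetime(2012, 4, 3).timestamp()
preserved_mtime, fresh_mtime = copy_mtimes(old)
print(datetime.fromtimestamp(preserved_mtime).date())
print(datetime.fromtimestamp(fresh_mtime).date())
```

The first date printed is the backdated one and the second is today's, which is why a directory created at account setup (such as `.ssh` in the listing above) is likely a better hint than a copied dotfile.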
[21:46:17] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1071559 (10RobH) 3NEW [21:48:58] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - leila - https://phabricator.wikimedia.org/T90954#1071569 (10RobH) 3NEW a:3leila [21:49:29] 6operations: Upgrade xenon, cerium and praseodymium to jessie - https://phabricator.wikimedia.org/T90955#1071577 (10GWicke) 3NEW [21:49:38] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1071586 (10Cmjohnson) [21:49:38] 6operations, 10ops-eqiad: label eventlog1001 - https://phabricator.wikimedia.org/T90905#1071585 (10Cmjohnson) 5Open>3Resolved [21:50:00] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1071590 (10chasemp) ```robh: ottomata: you added him to something last october robh: this person gone now?``` @ottomata [21:50:50] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - edenhill - https://phabricator.wikimedia.org/T90953#1071594 (10RobH) I noticed that @ottomata also then reverted the change a month later. [21:51:10] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1071598 (10RobH) 3NEW a:3Milimetric [21:52:30] 6operations, 6Security: define in Puppet or remove user account - nuria - https://phabricator.wikimedia.org/T90957#1071615 (10RobH) 3NEW a:3Nuria [21:53:01] RECOVERY - Cassandra database on xenon is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon [21:54:57] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1071630 (10Milimetric) Thanks for the ping @RobH. 
I only need basic access to stat1001, no sudo or special access. I am a member of the analytics team and thus need... [21:55:17] !log disabled puppet on cassandra test hosts cerium and praseodymium as well (in addition to xenon) to manually fix incompatible puppet config & re-initialize cluster after cluster name change [21:55:22] Logged the message, Master [21:55:41] 6operations, 6Security: define in Puppet or remove user account - nuria - https://phabricator.wikimedia.org/T90957#1071633 (10Nuria) I do not need access to: oxygen.wikimedia.org gadolinium.wikimedia.org Access has been granted recently for me to tin, hafnium and vanadium (sudo in all) and that should be... [21:56:47] 6operations, 10hardware-requests: deploy eventlog1001 - https://phabricator.wikimedia.org/T90904#1071643 (10RobH) 5Open>3Resolved @Ori, System eventlog1001 has puppet/salt keys accepted and is calling in. Resolving this task (the parent of both eventlog1001 and eventlog2001 is still open.) [21:56:48] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - https://phabricator.wikimedia.org/T90747#1071645 (10RobH) [22:01:43] (03PS1) 10Odder: Add rollbacker user group to Nepali Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193273 (https://phabricator.wikimedia.org/T90888) [22:02:17] (03PS1) 10Odder: Add autopatrolled user group to Nepali Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193274 (https://phabricator.wikimedia.org/T89816) [22:02:47] 6operations, 6Security: define in Puppet or remove user account - qchris - https://phabricator.wikimedia.org/T90959#1071681 (10Dzahn) 3NEW [22:03:37] 6operations, 10hardware-requests: codfw: virtulization servers for misc services - https://phabricator.wikimedia.org/T89161#1071688 (10RobH) 5Open>3declined rejecting this out, since we are handling the specs in rt [22:03:38] 6operations: Introduce Virtualization in our infrastructure - https://phabricator.wikimedia.org/T87258#1071690 (10RobH) 
[22:03:52] PROBLEM - Cassandra database on praseodymium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon [22:04:02] 6operations, 6Security: define in Puppet or remove user account - qchris - https://phabricator.wikimedia.org/T90959#1071681 (10Dzahn) vanadium.eqiad.wmnet: qchris:x:2153:500:Christian Aistleitner:/home/qchris:/bin/bash -- CONFIRMED BY QCHRIS. It's ok to remove the account from that host (2015-02-26) oxygen... [22:04:55] 6operations, 6Security: define in Puppet or remove user account - qchris - https://phabricator.wikimedia.org/T90959#1071697 (10Dzahn) a:3Dzahn [22:05:01] RECOVERY - Cassandra database on praseodymium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon [22:05:34] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1071698 (10MaxSem) a:5Tfinc>3phuedx [22:05:45] 6operations, 6Labs, 10hardware-requests: Buy at least one more virt server for eqiad - https://phabricator.wikimedia.org/T90783#1071699 (10RobH) [22:05:51] 6operations, 6Labs, 10hardware-requests: New hp servers for labs - https://phabricator.wikimedia.org/T89752#1071702 (10RobH) [22:06:21] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:08:03] the last run check doesn't trigger until you happen to run again? [22:08:31] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:08:39] ^d: Mind if I add https://gerrit.wikimedia.org/r/#/c/193273/ and https://gerrit.wikimedia.org/r/#/c/193274/ for the coming SWAT window but /do not/ appear here when it's deployed? 
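The account audit in these tasks boils down to comparing the accounts present on a host with the accounts that configuration management declares. A rough Python sketch of that comparison, under stated assumptions: the `managed` set is a hypothetical stand-in for whatever admin/data.yaml defines (its real structure is not shown in this log), `exampleuser` is invented for illustration, and the UID cutoff of 1000 is an assumption rather than the rule used in the actual audit. Only the `qchris` passwd entry comes from the log.

```python
def unmanaged_accounts(passwd_text, managed, min_uid=1000):
    """Return login names with UID >= min_uid that are not in `managed`."""
    found = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # not a well-formed passwd entry
        name, uid = fields[0], int(fields[2])
        if uid >= min_uid and name not in managed:
            found.append(name)
    return sorted(found)


# Sample passwd content; only the qchris entry appears in the log.
sample = """\
root:x:0:0:root:/root:/bin/bash
qchris:x:2153:500:Christian Aistleitner:/home/qchris:/bin/bash
exampleuser:x:2999:500:Example User:/home/exampleuser:/bin/bash
"""

# Hypothetical set of accounts declared in configuration management.
managed = {"exampleuser"}

leftovers = unmanaged_accounts(sample, managed)
print(leftovers)  # prints ['qchris']
```

On a real host the input would come from `/etc/passwd` (or `getent passwd`), and each leftover name would still need a human decision, as the per-user tasks here show.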
[22:09:12] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:22] PROBLEM - puppet last run on cp1067 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:22] PROBLEM - puppet last run on cp1045 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:22] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:23] PROBLEM - puppet last run on cp1060 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1070 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1052 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp1062 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:32] PROBLEM - puppet last run on cp4017 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:33] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:33] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:34] PROBLEM - puppet last run on cp4013 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:34] PROBLEM - puppet last run on cp3012 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:35] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:40] guessing that's you bblack? 
[22:09:41] :) [22:09:44] yup [22:09:51] PROBLEM - puppet last run on cp1055 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:51] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:52] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:52] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:52] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:52] PROBLEM - puppet last run on amssq45 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:52] PROBLEM - puppet last run on cp3009 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:53] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:53] PROBLEM - puppet last run on amssq49 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:54] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:54] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:55] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:55] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:09:56] PROBLEM - puppet last run on cp4006 is CRITICAL: CRITICAL: Puppet last ran 5 hours ago [22:10:04] kind of a retarded check. 
they have been offline for 5 hours, and now they're alerting as I re-enable and run them for the first time [22:11:16] I guess it must be that it suppresses "last run" check when disabled, and now they're alerting in the lag time between mass-re-enable + slowly running on each [22:11:41] RECOVERY - puppet last run on cp1060 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:11:42] RECOVERY - puppet last run on cp4017 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:11:42] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [22:11:51] RECOVERY - puppet last run on cp4013 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:12:01] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:12:01] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:12:02] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:12:12] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [22:12:12] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:12:12] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:12:12] RECOVERY - puppet last run on cp4016 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [22:12:12] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:12:26] (03PS2) 10Ori.livneh: Initial commit of statsdlb [software/statsdlb] - 10https://gerrit.wikimedia.org/r/193255 [22:12:53] RECOVERY - puppet last run on cp3012 is OK: OK: Puppet 
is currently enabled, last run 37 seconds ago with 0 failures [22:12:53] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:13:11] RECOVERY - puppet last run on cp3009 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:13:22] 6operations, 6Security: define in Puppet or remove user account - qchris - https://phabricator.wikimedia.org/T90959#1071717 (10Dzahn) I ran "deluser qchris" and all 4 hosts above and also deleted the home directories. Closing as resolved. Chris said all that he still needs is in puppet and yep he is in a coup... [22:13:39] 6operations, 6Security: define in Puppet or remove user account - qchris - https://phabricator.wikimedia.org/T90959#1071719 (10Dzahn) 5Open>3Resolved [22:13:40] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071720 (10Dzahn) [22:13:52] RECOVERY - puppet last run on cp1052 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:14:01] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [22:14:09] or RoanKattouw or Krenair ^^ [22:14:12] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:14:12] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [22:14:12] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp4006 is OK: OK: Puppet is 
currently enabled, last run 54 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:14:22] RECOVERY - puppet last run on cp3019 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:14:23] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:14:23] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:14:24] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:14:35] ^d: Need some help from the SWAT team. https://gerrit.wikimedia.org/r/#/c/193271/ fixes a bug in the current version of Mediawiki on mediawiki.org. [22:14:52] RECOVERY - puppet last run on cp1067 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:14:52] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:15:12] RECOVERY - puppet last run on cp1053 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:15:22] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [22:15:22] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:15:22] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:15:29] twkozlowski, lgtm. 
you can probably put my name next to it as well [22:15:31] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:15:32] RECOVERY - puppet last run on amssq39 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:15:32] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [22:15:32] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:15:32] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [22:15:52] RECOVERY - puppet last run on cp1045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:16:12] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:16:12] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [22:16:12] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:16:33] RECOVERY - puppet last run on cp3013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:16:33] RECOVERY - puppet last run on amssq37 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:16:33] RECOVERY - puppet last run on cp3015 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:16:43] RECOVERY - puppet last run on cp1065 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:16:43] RECOVERY - puppet last run on cp1068 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [22:16:55] Krenair: Great, I'll be heading for bed, then. 
Hope everything goes fine (it should :-) [22:17:03] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [22:17:03] RECOVERY - puppet last run on cp1070 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:17:13] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [22:17:13] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:17:22] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:17:32] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:17:42] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [22:18:02] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [22:18:22] RECOVERY - puppet last run on cp1062 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [22:18:22] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [22:18:22] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:18:33] RECOVERY - puppet last run on cp1055 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [22:18:41] RECOVERY - puppet last run on amssq45 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:18:42] RECOVERY - puppet last run on cp1057 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:18:52] RECOVERY - puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:18:52] RECOVERY - puppet last run 
on cp1061 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [22:19:08] 6operations, 6Labs, 10hardware-requests: New hp servers for labs - https://phabricator.wikimedia.org/T89752#1071759 (10RobH) [22:19:12] RECOVERY - puppet last run on cp1059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:19:18] 6operations, 6Labs, 10hardware-requests: Buy at least one more virt server for eqiad - https://phabricator.wikimedia.org/T90783#1071762 (10RobH) [22:19:22] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [22:19:32] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [22:19:33] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [22:19:39] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/605/change/177080/html/tin.eqiad.wmnet.html" [puppet] - 10https://gerrit.wikimedia.org/r/177080 (owner: 10Dzahn) [22:19:42] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [22:19:42] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:19:52] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:20:42] RECOVERY - puppet last run on amssq60 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:21:02] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [22:21:11] RECOVERY - puppet last run on amssq62 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:21:12] RECOVERY - puppet last run on amssq58 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:21:12] RECOVERY - puppet 
last run on amssq59 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:21:12] RECOVERY - puppet last run on amssq57 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [22:23:12] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:23:21] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq55 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq43 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:23:22] RECOVERY - puppet last run on amssq44 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [22:23:23] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:23:23] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [22:23:50] 6operations, 6Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1071784 (10Jdlrobson) I have no idea my friend. To my knowledge I do not use oxygen or gadolinium ... 
:) [22:24:23] RECOVERY - puppet last run on amssq42 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [22:24:52] 6operations, 6Labs, 10hardware-requests: eqiad: (4) virt nodes - https://phabricator.wikimedia.org/T89752#1071798 (10RobH) 5Open>3stalled a:3RobH [22:25:02] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:25:17] 6operations, 6Labs, 10hardware-requests: eqiad: (1) virt node - https://phabricator.wikimedia.org/T90783#1071803 (10RobH) 5Open>3stalled [22:25:32] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [22:25:32] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [22:25:32] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:25:32] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:25:32] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [22:27:06] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - smalyshev - https://phabricator.wikimedia.org/T90939#1071814 (10RobH) 5Open>3Resolved Resolving as the parent ticket is now updated to reflect these can be removed. 
[22:27:08] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071816 (10RobH) [22:34:18] 6operations: Replace apache-graceful-all with something that leverages Salt - https://phabricator.wikimedia.org/T83880#1071866 (10Dzahn) https://gerrit.wikimedia.org/r/#/c/177080/ http://puppet-compiler.wmflabs.org/605/change/177080/html/tin.eqiad.wmnet.html [22:35:18] 6operations, 6Security: define in Puppet or remove user account - legoktm - https://phabricator.wikimedia.org/T90933#1071871 (10Dzahn) a:3Dzahn [22:36:13] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071876 (10RobH) [22:36:58] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10RobH) [22:36:59] 6operations, 6Security: define in Puppet or remove user account - nuria - https://phabricator.wikimedia.org/T90957#1071879 (10RobH) 5Open>3Resolved resolving this task as the parent task has been updated to reflect nuria doesn't need access to the non-puppetized access hosts of oxygen or gadolinium [22:37:36] 6operations, 6Security: define in Puppet or remove user account - legoktm - https://phabricator.wikimedia.org/T90933#1071883 (10Dzahn) affected hosts here are: oxygen.wikimedia.org gadolinium.wikimedia.org: [22:39:32] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071887 (10RobH) [22:40:17] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - cscott - https://phabricator.wikimedia.org/T90935#1071891 (10RobH) 5Open>3Resolved @cscott: thanks for the quick reply! I'm resolving this task, as I've updated the parent task with the results of this. Thanks! 
[22:40:18] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10RobH) [22:41:55] (03PS1) 10Springle: Add an m4-master CNAME, to split off Eventlogging from m2 [dns] - 10https://gerrit.wikimedia.org/r/193276 [22:43:02] (03CR) 10Springle: [C: 032] Add an m4-master CNAME, to split off Eventlogging from m2 [dns] - 10https://gerrit.wikimedia.org/r/193276 (owner: 10Springle) [22:43:10] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071905 (10RobH) [22:43:10] 6operations, 6Security: define in Puppet or remove user account - jamesur - https://phabricator.wikimedia.org/T90930#1071902 (10RobH) 5Open>3Resolved a:3RobH I'm resolving this task, as the parent task summary has been updated to reflect that James doesn't need the access and can have it removed from oxy... [22:43:29] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071908 (10Dzahn) [22:43:30] 6operations, 6Security: define in Puppet or remove user account - legoktm - https://phabricator.wikimedia.org/T90933#1071906 (10Dzahn) 5Open>3Resolved "deluser legoktm" and rm -rf'ed the home dir on both hosts. 
resolving [22:43:46] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:45:46] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071917 (10RobH) [22:46:30] 6operations, 6Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1071919 (10RobH) 5Open>3Resolved a:3RobH I'm setting this task to resolved, as I've updated the parent task with the result (ok to remove from oxygen/gadolinium). Thanks @jdlrobson for... [22:46:31] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (10RobH) [22:49:07] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071924 (10RobH) [22:54:44] 6operations, 6Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071927 (10K4-713) You know, I'm not sure what I had those for either. Considering that, it sounds pretty safe to go ahead and cut me off. [22:58:19] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071941 (10Dzahn) [22:58:20] 6operations, 6Security: define in Puppet or remove user account - ironholds - https://phabricator.wikimedia.org/T90931#1071938 (10Dzahn) 5Open>3Resolved a:3Dzahn 14:50 < mutante> fine to remove you from those 2 boxes? ok, thanks, i will do that now 14:50 <+Ironholds> sure! deluser ironholds on both hos... 
[22:58:34] operations, Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071945 (Dzahn) a:Dzahn
[22:59:41] !log git-deploy: Deploying integration/slave-scripts b532a9a..05a5593
[22:59:45] Logged the message, Master
[23:00:48] operations, Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071954 (Dzahn) Katie, thank you for the quick reply. I just deleted your accounts on both hosts and also removed the empty home directories. resolving the ticket, Best, Daniel
[23:01:35] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071959 (Dzahn)
[23:01:36] operations, Security: define in Puppet or remove user account - khorn - https://phabricator.wikimedia.org/T90926#1071958 (Dzahn) Open>Resolved
[23:03:06] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssmith - https://phabricator.wikimedia.org/T90946#1071963 (MaxSem) Note that Sam appears to have 2 accounts: phuedx and ssmith. There can be only one!
[23:03:32] (PS1) Kaldari: Updating WikiGrok configs for new global vars [mediawiki-config] - https://gerrit.wikimedia.org/r/193283
[23:05:17] (PS2) Kaldari: Updating WikiGrok configs for new global vars [mediawiki-config] - https://gerrit.wikimedia.org/r/193283
[23:16:12] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - ssastry - https://phabricator.wikimedia.org/T90941#1072005 (Dzahn) a:ssastry>Dzahn
[23:18:14] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1072009 (Dzahn) a:Milimetric>RobH
[23:19:39] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072013 (Dzahn)
[23:19:39] operations, Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1072011 (Dzahn) Resolved>Open @RobH well, but the user still exists, we should delete it before resolving
[23:21:04] (PS18) Andrew Bogott: Add class and role for Openstack Horizon [puppet] - https://gerrit.wikimedia.org/r/170340
[23:22:27] (PS1) Springle: Deploy dbproxy1004 to m4 [puppet] - https://gerrit.wikimedia.org/r/193284
[23:26:13] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072025 (RobH) note: i simply moved folks from the unknown to the known good to delete section, i did NOT manually delete anyone at this time. (Im fine w...
[23:26:22] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072026 (Dzahn)
[23:28:11] operations, Wikimedia-Blog, Patch-For-Review: add techblog.wikimedia.org redirection to blog.wikimedia.org to redirects - https://phabricator.wikimedia.org/T90638#1072035 (RobH) Open>Resolved The changes are live and the techblog > blog rewrite is working just fine. resolving task
[23:28:49] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072038 (Dzahn) note: i edited the task description and added: 1.5 If the user replies and says they don't need the access, manually remove the user. be...
[23:29:39] operations, ops-codfw, hardware-requests, wikis-in-codfw: Procure and setup rdb2001-2004 - ETA 2015-03-09 - https://phabricator.wikimedia.org/T86896#1072059 (RobH)
[23:31:33] damned phab dash makes it so every single ticket guilts me when i willfully ignore it ;D
[23:33:26] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072070 (Dzahn)
[23:33:27] operations, Security: define in Puppet or remove user account - jdlrobson - https://phabricator.wikimedia.org/T90928#1072068 (Dzahn) Open>Resolved deleted from gadolinium and oxygen
[23:34:07] (CR) Springle: [C: 2] Deploy dbproxy1004 to m4 [puppet] - https://gerrit.wikimedia.org/r/193284 (owner: Springle)
[23:34:40] operations, Wikimedia-Logstash, hardware-requests: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1072071 (RobH) There is currently ongoing discussion on https://rt.wikimedia.org/Ticket/Display.html?id=9199 in regards to the disks being SATA or SAS.
[23:35:52] operations, Wikimedia-Logstash, hardware-requests: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1072072 (RobH) stalled>Resolved Argh, this is kind of redundant to its parent task, so resolving this for tracking on that.
[23:35:53] operations, Wikimedia-Logstash, hardware-requests: Production hardware for Logstash service - https://phabricator.wikimedia.org/T84958#1072074 (RobH)
[23:36:19] operations, Wikimedia-Logstash, hardware-requests: Production hardware for Logstash service - https://phabricator.wikimedia.org/T84958#934590 (RobH) There is currently ongoing discussion on https://rt.wikimedia.org/Ticket/Display.html?id=9199 in regards to the disks being SATA or SAS.
[23:36:56] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072079 (Dzahn)
[23:38:02] operations, hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - eqiad done, codfw in progress - https://phabricator.wikimedia.org/T90747#1072083 (RobH) Open>stalled p:High>Normal
[23:38:46] operations: Procure and setup rbf2001-2002 - https://phabricator.wikimedia.org/T86897#1072090 (RobH)
[23:39:36] operations, Wikimedia-Logstash, hardware-requests: eqiad: (3) servers for logstash service - https://phabricator.wikimedia.org/T84958#1072102 (RobH) Open>stalled a:RobH
[23:45:58] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072112 (RobH)
[23:47:07] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - cscott - https://phabricator.wikimedia.org/T90935#1072113 (RobH) manually deleted user off oxygen, erbium, and gadolinium
[23:48:15] operations, Security: define in Puppet or remove user account - nuria - https://phabricator.wikimedia.org/T90957#1072114 (RobH)
manually deluser nuria off of oxygen and gadolinium
[23:48:17] (PS1) BBlack: Add $template_name param to service_unit [puppet] - https://gerrit.wikimedia.org/r/193288
[23:48:38] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072115 (RobH)
[23:49:16] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH)
[23:50:35] (PS3) Andrew Bogott: Roughed in designate class [puppet] - https://gerrit.wikimedia.org/r/191471
[23:51:25] (CR) jenkins-bot: [V: -1] Roughed in designate class [puppet] - https://gerrit.wikimedia.org/r/191471 (owner: Andrew Bogott)
[23:51:41] Ops-Access-Requests, operations, Security: define in Puppet or remove user account - smalyshev - https://phabricator.wikimedia.org/T90939#1072125 (RobH) manually deluser off xenon.eqiad.wmnet: praseodymium.eqiad.wmnet: cerium.eqiad.wmnet:
[23:51:57] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072126 (RobH)
[23:52:33] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH)
[23:54:00] operations, Security: define in Puppet or remove user account - jamesur - https://phabricator.wikimedia.org/T90930#1072135 (RobH) user manually removed from oxygen and gadolinium
[23:54:17] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1072138 (RobH)
[23:55:38] operations, Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1071208 (RobH)
[23:56:46] (PS1) Springle: Use dbproxy1004 for m4. [dns] - https://gerrit.wikimedia.org/r/193295
[23:57:01] (PS4) Andrew Bogott: Roughed in designate class [puppet] - https://gerrit.wikimedia.org/r/191471
[23:57:09] (CR) Springle: [C: 2] Use dbproxy1004 for m4. [dns] - https://gerrit.wikimedia.org/r/193295 (owner: Springle)
[23:58:04] jouncebot, next
[23:58:05] In 0 hour(s) and 1 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150227T0000)
[23:58:28] Do we have polybuildr?
[23:59:05] Krenair: i can stand in for him, if that's okay
[23:59:17] Puppet, Continuous-Integration, Patch-For-Review: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1072143 (Krinkle) Today I re-created all integration slaves and the same problem pops u...
[23:59:20] I think that change needs a submodule update
[23:59:24] Puppet, Continuous-Integration, Patch-For-Review: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1072144 (Krinkle) Resolved>Open
[23:59:25] yes
[23:59:34] do you want to make that?
[23:59:42] i can't merge in wmf branches
[23:59:45] Puppet, Continuous-Integration: Puppet is causing changed/added files in 'slave-scripts' git::clone on integration slaves in labs to become root read-only - https://phabricator.wikimedia.org/T87843#1000457 (Krinkle)
[23:59:49] unless you want me to submit a bump to master
[23:59:53] (of Vector)