[00:01:09] <grrrit-wm>	 (CR) Dzahn: [C: 2] "no change, just about the proper role structure" [puppet] - https://gerrit.wikimedia.org/r/260607 (owner: Dzahn)
[00:07:06] <logmsgbot>	 !log bd808@tin Synchronized php-1.27.0-wmf.11/includes/session/CookieSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265871/ (duration: 00m 25s)
[00:07:11] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:08:02] <logmsgbot>	 !log bd808@tin Synchronized php-1.27.0-wmf.11/extensions/CentralAuth/includes/session/CentralAuthSessionProvider.php: https://gerrit.wikimedia.org/r/#/c/265872/ (duration: 00m 25s)
[00:08:07] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:11:51] <grrrit-wm>	 (PS2) Dzahn: debdeploy: move role to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260609
[00:12:19] <greg-g>	 so, I have to run to the bus shortly, what's the status bd808 ? That graph still seems to be climbing
[00:12:23] <grrrit-wm>	 (PS3) Dzahn: debdeploy: move role to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260609
[00:12:49] <bd808>	 greg-g: this one? -- https://grafana.wikimedia.org/dashboard/db/authentication-metrics?panelId=13&fullscreen
[00:12:57] <greg-g>	 yah
[00:13:04] <bd808>	 it looks to have dropped to 0 to me
[00:13:13] <greg-g>	 give it a minute
[00:13:17] <greg-g>	 it does that, right?
[00:13:28] <bd808>	 bah
[00:13:28] <greg-g>	 oh, it actually might have?
[00:13:36] <grrrit-wm>	 (PS1) Andrew Bogott: Save fewer apache logs on silver. [puppet] - https://gerrit.wikimedia.org/r/265877
[00:14:08] * greg-g waits
[00:14:17] <bd808>	 no I think it's fubar
[00:14:31] <bd808>	 why would it keep growing faster and faster?
[00:14:57] <greg-g>	 https://grafana.wikimedia.org/dashboard/db/authentication-metrics?from=1453507181910&to=1453508081910&var-entrypoint=*
[00:15:07] <greg-g>	 it might have worked?
[00:15:07] <grrrit-wm>	 (CR) Dzahn: [C: 1] "besides disk space this is also cool for https://phabricator.wikimedia.org/tag/audits-data-retention/" [puppet] - https://gerrit.wikimedia.org/r/265877 (owner: Andrew Bogott)
[00:16:19] * bd808 keeps hitting reload
[00:16:26] <greg-g>	 ditto and it keeps staying down
[00:16:47] <greg-g>	 the last three time markers on my view are all 0
[00:17:40] <greg-g>	 ok, I have to run, but, I'm going to assume this addressed the vast majority of issues (and didn't just mask them).
[00:18:16] <grrrit-wm>	 (CR) Dzahn: [C: 2] "no change, just renaming class" [puppet] - https://gerrit.wikimedia.org/r/260609 (owner: Dzahn)
[00:18:23] <grrrit-wm>	 (PS5) Yuvipanda: toollabs: Move bigbrother to services nodes [puppet] - https://gerrit.wikimedia.org/r/265193 (https://phabricator.wikimedia.org/T123873)
[00:21:51] <grrrit-wm>	 (PS2) Dzahn: graphite: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260941
[00:22:51] <bd808>	 fuck.
[00:23:24] <bd808>	 it just jumped back to the rate. graphite is just lagging like hell getting the data
[00:24:33] <grrrit-wm>	 (CR) Alex Monk: [C: 1] Save fewer apache logs on silver. [puppet] - https://gerrit.wikimedia.org/r/265877 (owner: Andrew Bogott)
[00:26:37] <mobrovac>	 hm, people seem to be complaining about losing edits after previewing and pressing the back button
[00:26:38] <mobrovac>	 https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)#Losing_edits
[00:28:38] <tgr>	 bd808 i don't think the graph is a big deal
[00:28:56] <tgr>	 it's driven by ten bots
[00:29:11] <grrrit-wm>	 (PS3) Dzahn: graphite: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260941
[00:29:32] <tgr>	 people failing to log in is the scary issue
[00:31:46] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] toollabs: Move bigbrother to services nodes [puppet] - https://gerrit.wikimedia.org/r/265193 (https://phabricator.wikimedia.org/T123873) (owner: Yuvipanda)
[00:31:56] <grrrit-wm>	 (CR) Dzahn: [C: 2] "no diff on all 3 hosts http://puppet-compiler.wmflabs.org/1652/" [puppet] - https://gerrit.wikimedia.org/r/260941 (owner: Dzahn)
[00:32:09] <grrrit-wm>	 (PS4) Dzahn: graphite: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260941
[00:35:43] <bd808>	 tgr: the gross rate of logins looks nominally fine
[00:39:46] <grrrit-wm>	 (PS4) Dzahn: mha: move roles to module/role/ [puppet] - https://gerrit.wikimedia.org/r/260694
[00:40:31] <grrrit-wm>	 (PS1) Yuvipanda: toollabs: Install package that bigbrother needs [puppet] - https://gerrit.wikimedia.org/r/265881
[00:42:37] <grrrit-wm>	 (CR) Dzahn: [C: 2] "no change, but also: nothing uses this. not in prod and also not in labs (checked site.pp and "watroles")" [puppet] - https://gerrit.wikimedia.org/r/260694 (owner: Dzahn)
[00:45:02] <YuviPanda>	 mutante: it's used by role/coredb
[00:45:18] <YuviPanda>	 shouldn't affect it I guess, but definitely not nothing uses it
[00:45:19] <mutante>	 YuviPanda: yea, i just looked at the same thing
[00:45:23] <YuviPanda>	 yeah
[00:45:26] <YuviPanda>	 seems unaffected by this change
[00:45:35] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Install package that bigbrother needs [puppet] - https://gerrit.wikimedia.org/r/265881
[00:45:36] <mutante>	 yes, i'm making double sure
[00:45:42] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Install package that bigbrother needs [puppet] - https://gerrit.wikimedia.org/r/265881 (owner: Yuvipanda)
[00:45:45] <mutante>	 thanks for the heads up
[00:45:46] <YuviPanda>	 mutante: cool
[00:47:04] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Move updatetools to run on services host [puppet] - https://gerrit.wikimedia.org/r/265195 (https://phabricator.wikimedia.org/T123873)
[00:47:15] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Move updatetools to run on services host [puppet] - https://gerrit.wikimedia.org/r/265195 (https://phabricator.wikimedia.org/T123873) (owner: Yuvipanda)
[00:48:05] <grrrit-wm>	 (CR) Tim Landscheidt: "Eh, /you/ introduced that failure just shy of three /days/ ago with Ieb5f4ac69150a944ef12cd74a22076500f456768in :-)." [puppet] - https://gerrit.wikimedia.org/r/265881 (owner: Yuvipanda)
[00:48:35] <grrrit-wm>	 (CR) Dzahn: "nevermind, actually mha::node is used by role coredb via the "common" class, but it's still noop .. double checked on db1024" [puppet] - https://gerrit.wikimedia.org/r/260694 (owner: Dzahn)
[00:49:47] <grrrit-wm>	 (CR) Yuvipanda: "Well, installing the whole world instead of just the package needed.... :)" [puppet] - https://gerrit.wikimedia.org/r/265881 (owner: Yuvipanda)
[00:50:57] <mutante>	 do you know about "ceilometer" ?
[00:51:13] <grrrit-wm>	 (PS1) Yuvipanda: toollabs: Fix missing parameter for updatetools [puppet] - https://gerrit.wikimedia.org/r/265883
[00:51:28] <YuviPanda>	 mutante: I know andrewbogott was playing with it back in the day
[00:51:30] <mutante>	 openstack::ceilometer::controller is that on an actual host?
[00:51:38] <mutante>	 ok, thanks
[00:52:49] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] toollabs: Fix missing parameter for updatetools [puppet] - https://gerrit.wikimedia.org/r/265883 (owner: Yuvipanda)
[00:55:24] <grrrit-wm>	 (PS3) Dzahn: mattermost: move role to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260605
[00:56:19] <grrrit-wm>	 (PS1) Yuvipanda: toollabs: Add package needed for updatetools [puppet] - https://gerrit.wikimedia.org/r/265884
[00:56:26] <mutante>	 YuviPanda: i'd like to change the mattermost role slightly, just the name of the class, not the contents. i would also fix the instance config , already found it :)
[00:56:44] <YuviPanda>	 mutante: \o/ feel free to :D
[00:56:48] <YuviPanda>	 thank you :)
[00:56:54] <mutante>	 ok cool
[00:57:01] <YuviPanda>	 mutante: another thing to change
[00:57:04] <YuviPanda>	 if changing class names
[00:57:08] <YuviPanda>	 is to look for hiera configs
[00:57:15] <YuviPanda>	 by searching the Hiera: namespace on wikitech
[00:57:18] <YuviPanda>	 people might be using it
[00:57:31] <YuviPanda>	 the change from wdq-mm to wdq_mm broke the load balancer because the hiera config wasn't changed, for example
[00:57:35] <mutante>	 that's a good point, i checked hieradata but not the wiki page
[00:57:39] <grrrit-wm>	 (CR) Yuvipanda: [C: 2] toollabs: Add package needed for updatetools [puppet] - https://gerrit.wikimedia.org/r/265884 (owner: Yuvipanda)
[00:57:46] <mutante>	 ok, i'll check 
[00:57:55] <YuviPanda>	 thanks
[01:01:10] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove unneeded inheritance in checker role [puppet] - https://gerrit.wikimedia.org/r/265197
[01:01:17] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove unneeded inheritance in checker role [puppet] - https://gerrit.wikimedia.org/r/265197 (owner: Yuvipanda)
[01:01:55] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove inheritance from services role [puppet] - https://gerrit.wikimedia.org/r/265198
[01:02:01] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove inheritance from services role [puppet] - https://gerrit.wikimedia.org/r/265198 (owner: Yuvipanda)
[01:04:37] <mutante>	 searched hiera namespace, nothing i changed affected. the graphite classes are used but those are not renamed, just moved around to the new place
[01:05:01] <mutante>	 yay for killing inheritance
[01:05:41] <grrrit-wm>	 (PS4) Dzahn: mattermost: move role to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260605
[01:07:17] <greg-g>	 FYI: we're rolling back to wmf.10
[01:07:19] <YuviPanda>	 mutante: yeah, I've a series of patches that kill it from the roles
[01:09:42] <wikibugs>	 Blocked-on-Operations, operations, Wikimedia-General-or-Unknown: Invalidate all users sessions - https://phabricator.wikimedia.org/T124440#1958541 (Legoktm)
[01:09:45] <grrrit-wm>	 (CR) Dzahn: [C: 2] "i checked which instance is using this (mattermost-02) and adjusting the instance config. added new puppet group" [puppet] - https://gerrit.wikimedia.org/r/260605 (owner: Dzahn)
[01:10:08] <grrrit-wm>	 (PS1) Dereckson: Namespace configuration on cu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265885 (https://phabricator.wikimedia.org/T123654)
[01:11:26] <grrrit-wm>	 (PS1) BryanDavis: Revert all wikis to 1.27.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/265886
[01:12:35] <mutante>	 YuviPanda: done. the instance is configured to use the new name. no change.. it's called "mattermost::server" now
[01:12:45] <YuviPanda>	 awesome
[01:12:48] <YuviPanda>	 thanks
[01:13:07] <bd808>	 Reedy: can you give https://gerrit.wikimedia.org/r/#/c/265886/ a look to make sure I didn't mess something up?
[01:14:08] <greg-g>	 eyeballing it it looks right :P
[01:14:31] <grrrit-wm>	 (CR) BryanDavis: [C: 2] "sadness" [mediawiki-config] - https://gerrit.wikimedia.org/r/265886 (owner: BryanDavis)
[01:14:43] <OuKB>	 bd808, are you reverting due to that preview bug?
[01:14:54] <grrrit-wm>	 (Merged) jenkins-bot: Revert all wikis to 1.27.0-wmf.10 [mediawiki-config] - https://gerrit.wikimedia.org/r/265886 (owner: BryanDavis)
[01:14:56] <Reedy>	 nope, people are being logged in again
[01:15:05] <OuKB>	 hrm
[01:15:10] <OuKB>	 also...
[01:16:20] <logmsgbot>	 !log bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Revert all wikis to 1.27.0-wmf.10
[01:16:26] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:16:41] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove role inheritance from gridengine shadow [puppet] - https://gerrit.wikimedia.org/r/265208
[01:16:43] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove inheritance in role from compute [puppet] - https://gerrit.wikimedia.org/r/265204
[01:16:45] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove inheritance from mailrelay [puppet] - https://gerrit.wikimedia.org/r/265205
[01:16:47] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Move toolwatcher to services [puppet] - https://gerrit.wikimedia.org/r/265206 (https://phabricator.wikimedia.org/T123873)
[01:16:49] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove inheritance from gridengine master role [puppet] - https://gerrit.wikimedia.org/r/265207
[01:16:51] <grrrit-wm>	 (PS2) Yuvipanda: tools: Remove role inheritance from static hosts [puppet] - https://gerrit.wikimedia.org/r/265202
[01:16:53] <grrrit-wm>	 (PS2) Yuvipanda: toollabs: Remove inheritance from bastion role [puppet] - https://gerrit.wikimedia.org/r/265203
[01:17:06] <wikibugs>	 operations, Wikimedia-General-or-Unknown: Connection to Wikimedia projects slow/timing out for some users - https://phabricator.wikimedia.org/T124417#1958565 (TTO)
[01:18:53] <greg-g>	 alright, tgr, legoktm, bd808, I'm going afk now. Thanks for all the effort on this, sorry it ended this way.
[01:18:58] <icinga-wm>	 PROBLEM - Apache HTTP on mw2020 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 689 bytes in 0.103 second response time
[01:19:23] <bd808>	 the fight isn't over. we're just going to take a break
[01:19:27] <bd808>	 :)
[01:20:08] <icinga-wm>	 PROBLEM - HHVM rendering on mw2020 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 689 bytes in 0.114 second response time
[01:21:04] <andrewbogott>	 YuviPanda, mutante, I never actually rolled ceilometer out anyplace.  If the class is in the way you can remove it.
[01:22:16] <mutante>	 andrewbogott: not re-move, just move :)
[01:22:18] <mutante>	 thanks
[01:22:29] <grrrit-wm>	 (PS2) Andrew Bogott: Save fewer apache logs on silver. [puppet] - https://gerrit.wikimedia.org/r/265877
[01:22:47] <greg-g>	 bd808: :) a deserved break, but yes
[01:23:02] * greg-g waves
[01:23:10] <bd808>	 night greg-g 
[01:23:25] * bd808 keeps starting at fatalmonitor for a while longer
[01:23:47] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove role inheritance from static hosts [puppet] - https://gerrit.wikimedia.org/r/265202
[01:23:48] <greg-g>	 -t
[01:23:49] <grrrit-wm>	 (CR) Andrew Bogott: [C: 2] Save fewer apache logs on silver. [puppet] - https://gerrit.wikimedia.org/r/265877 (owner: Andrew Bogott)
[01:23:58] <grrrit-wm>	 (PS2) Dzahn: ceilometer: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260608
[01:24:00] <grrrit-wm>	 (PS4) Yuvipanda: toollabs: Remove role inheritance from static hosts [puppet] - https://gerrit.wikimedia.org/r/265202
[01:24:06] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove role inheritance from static hosts [puppet] - https://gerrit.wikimedia.org/r/265202 (owner: Yuvipanda)
[01:24:19] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove inheritance from bastion role [puppet] - https://gerrit.wikimedia.org/r/265203
[01:24:27] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove inheritance from bastion role [puppet] - https://gerrit.wikimedia.org/r/265203 (owner: Yuvipanda)
[01:24:40] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove inheritance in role from compute [puppet] - https://gerrit.wikimedia.org/r/265204
[01:24:47] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove inheritance in role from compute [puppet] - https://gerrit.wikimedia.org/r/265204 (owner: Yuvipanda)
[01:24:58] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove inheritance from mailrelay [puppet] - https://gerrit.wikimedia.org/r/265205
[01:25:04] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove inheritance from mailrelay [puppet] - https://gerrit.wikimedia.org/r/265205 (owner: Yuvipanda)
[01:25:16] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Move toolwatcher to services [puppet] - https://gerrit.wikimedia.org/r/265206 (https://phabricator.wikimedia.org/T123873)
[01:25:46] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Move toolwatcher to services [puppet] - https://gerrit.wikimedia.org/r/265206 (https://phabricator.wikimedia.org/T123873) (owner: Yuvipanda)
[01:25:55] <bd808>	 shit shit shit
[01:26:00] <bd808>	 need to scap
[01:26:29] * YuviPanda is here (and pageable if not) in case anything needs people with root
[01:27:25] <bd808>	 Reedy: l10ncache is gone for 10 on at least some servers. Should I roll back to 11 quickly, rebuild l10n on 10 and then roll again?
[01:27:35] <Reedy>	 bd808: ugh
[01:27:39] <bd808>	 No idea how long a full scap will take
[01:27:43] <Reedy>	 I'd be inclined to say yes
[01:28:34] * bd808 hot patches that shit
[01:28:38] <logmsgbot>	 !log bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Temporarily back to 1.27.0-wmf11; need to rebuild l10n cache
[01:28:43] <MaxSem>	 shall we keep an oldcrapwiki around specifically for this kind of reverts? :P
[01:28:43] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:29:01] <Reedy>	 Considering we don't seem to delete old l10n caches for ages...
[01:29:03] <Reedy>	 Usually
[01:29:09] <Reedy>	 I'm surprised .10 has gone so quickly
[01:29:25] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove inheritance from gridengine master role [puppet] - https://gerrit.wikimedia.org/r/265207
[01:29:26] <bd808>	 the error rate isn't high enough to be everywhere
[01:29:36] <bd808>	 must just be some servers
[01:29:37] <icinga-wm>	 RECOVERY - Apache HTTP on mw2020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 454 bytes in 0.114 second response time
[01:29:45] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove inheritance from gridengine master role [puppet] - https://gerrit.wikimedia.org/r/265207 (owner: Yuvipanda)
[01:29:54] <Reedy>	 hmm
[01:29:57] <bd808>	 ah, only mw2020
[01:29:59] <grrrit-wm>	 (PS3) Yuvipanda: toollabs: Remove role inheritance from gridengine shadow [puppet] - https://gerrit.wikimedia.org/r/265208
[01:30:01] <Reedy>	 the upstream files *are* on tin
[01:30:01] <bd808>	 ffs
[01:30:07] <Reedy>	 which was down earlier?
[01:30:07] <grrrit-wm>	 (CR) Yuvipanda: [C: 2 V: 2] toollabs: Remove role inheritance from gridengine shadow [puppet] - https://gerrit.wikimedia.org/r/265208 (owner: Yuvipanda)
[01:30:12] <Reedy>	 did no one run sync-common before repooling?
[01:30:57] <icinga-wm>	 RECOVERY - HHVM rendering on mw2020 is OK: HTTP OK: HTTP/1.1 200 OK - 65203 bytes in 0.267 second response time
[01:31:15] <Reedy>	 or, I guess... if the version wasn't in use, it wouldn't be rebuilt
[01:31:26] <bd808>	 why would a codfw server be getting enough traffic for me to notice?
[01:31:40] <Reedy>	 I think they moved more traffic to it
[01:32:50] <Reedy>	 sync-common and scap-rebuild-cdbs?
[01:33:21] <logmsgbot>	 !log bd808@tin rebuilt wikiversions.php and synchronized wikiversions files: Back to 1.27.0-wmf10 again after fixing l10n cache problems
[01:33:26] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:33:31] <bd808>	 Reedy: I just ran sync-common on mw2020
[01:33:38] <bd808>	 the right files showed up
[01:33:47] <Reedy>	 heh
[01:34:04] * bd808 does it again to be certain
[01:34:11] <grrrit-wm>	 (PS3) Dzahn: ceilometer: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260608
[01:34:59] * Reedy wonders why there is still a scap-1skins on tin
[01:35:14] <bd808>	 we never purged the old scripts
[01:35:16] <Reedy>	 must be some artefact, not on mira
[01:35:23] <bd808>	 they will die with the rebuild
[01:35:39] <icinga-wm>	 PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[01:36:20] <grrrit-wm>	 (PS1) Dereckson: Namespace configuration on ur.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265888
[01:36:42] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Namespace configuration on ur.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265888 (owner: Dereckson)
[01:39:26] <wikibugs>	 operations, Labs, Labs-Infrastructure: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1958596 (Dzahn) NEW
[01:40:23] <grrrit-wm>	 (CR) Dzahn: [C: 2] ceilometer: move roles to modules/role/ [puppet] - https://gerrit.wikimedia.org/r/260608 (owner: Dzahn)
[01:41:25] <grrrit-wm>	 (PS2) Dereckson: Namespace configuration on ur.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265888 (https://phabricator.wikimedia.org/T122045)
[01:45:01] <wikibugs>	 operations, Patch-For-Review, Technical-Debt: Retire Torrus - https://phabricator.wikimedia.org/T87840#1958610 (Dzahn) @Danny_B This is not actually ready to be retired yet.
[01:46:45] <grrrit-wm>	 (CR) Dzahn: "heh, i already removed myself from this very patch back in 2014.. i'm out :)" [puppet] - https://gerrit.wikimedia.org/r/130296 (owner: ArielGlenn)
[01:49:07] <grrrit-wm>	 (CR) Dzahn: "@_joe_ can we have your opinion here if this makes sense or should use etcd ?" [puppet] - https://gerrit.wikimedia.org/r/247324 (https://phabricator.wikimedia.org/T86644) (owner: Chad)
[01:49:49] <grrrit-wm>	 (CR) Dzahn: [C: -1] "formal -1 because of the last comment above" [puppet] - https://gerrit.wikimedia.org/r/220085 (owner: Alexandros Kosiaris)
[01:50:07] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 3 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[01:50:49] <icinga-wm>	 PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 3 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[01:51:24] <YuviPanda>	 mutante: I picked up yours too
[01:52:03] <mutante>	 YuviPanda: oh, yes please, did the same earlier with yours
[01:52:17] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[01:52:58] <icinga-wm>	 RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[01:54:50] <YuviPanda>	 mutante: thanks :D
[01:54:59] <YuviPanda>	 I think I'm done merging stuff for today
[01:55:01] <YuviPanda>	 hmm
[01:59:09] <mutante>	 YuviPanda: me too, i think 6pm Friday is deadline :)
[01:59:13] <mutante>	 have a nice weekend
[01:59:56] <grrrit-wm>	 (PS1) Dereckson: Namespace configuration for wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389)
[02:00:05] <mutante>	  /quit beer'o'clock
[02:00:17] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Namespace configuration for wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389) (owner: Dereckson)
[02:01:12] <wikibugs>	 operations, Mail: consolidate mailman redirects in exim aliases file - https://phabricator.wikimedia.org/T123581#1958627 (Dzahn) >>! In T123581#1934528, @faidon wrote: > - I've never heard of sec-ops  Me neither, removed that one.  > I don't think anyone has used ops@wikimedia.org, ops-private@wikimedia.or...
[02:05:33] <grrrit-wm>	 (PS2) Dereckson: Namespace configuration for wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389)
[02:05:57] <grrrit-wm>	 (CR) Dereckson: "PS: NS_MODULE → 828" [mediawiki-config] - https://gerrit.wikimedia.org/r/265891 (https://phabricator.wikimedia.org/T124389) (owner: Dereckson)
[02:09:39] <grrrit-wm>	 (PS1) Dereckson: Remove Tranwiki namespace on wuu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265892 (https://phabricator.wikimedia.org/T124389)
[02:14:06] <grrrit-wm>	 (PS1) Dereckson: Add Portal namespace on wwu.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389)
[02:18:08] <icinga-wm>	 RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[02:25:12] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.10) (duration: 09m 09s)
[02:25:18] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:32:16] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 23 02:32:15 UTC 2016 (duration 7m 3s)
[02:32:21] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:33:35] <wikibugs>	 operations, Wiki-Loves-Monuments-General, Wikimedia-DNS, Patch-For-Review, domains: point wikilovesmonument.org ns to wmf - https://phabricator.wikimedia.org/T118468#1958689 (JanZerebecki) Yes, @faidon convinced me during the dev summit that it is much less work to have these on selected registr...
[02:38:26] <grrrit-wm>	 (PS1) Dereckson: Namespaces configuration on sk.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/265896 (https://phabricator.wikimedia.org/T122175)
[02:48:22] <grrrit-wm>	 (PS2) Subramanya Sastry: parsoid-testing: rename classes with dashes [puppet] - https://gerrit.wikimedia.org/r/265873 (https://phabricator.wikimedia.org/T93645) (owner: Dzahn)
[02:48:24] <grrrit-wm>	 (PS2) Subramanya Sastry: Clone the 'ruthenium' branch of testreduce and visualdiff [puppet] - https://gerrit.wikimedia.org/r/265856
[02:48:26] <grrrit-wm>	 (PS2) Subramanya Sastry: nginx conf that routes requests to different services on ruthenium [puppet] - https://gerrit.wikimedia.org/r/265863
[02:56:38] <icinga-wm>	 PROBLEM - puppet last run on ms-be2016 is CRITICAL: CRITICAL: puppet fail
[03:22:17] <icinga-wm>	 RECOVERY - puppet last run on ms-be2016 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[03:56:37] <icinga-wm>	 PROBLEM - puppet last run on mw2053 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:22:08] <icinga-wm>	 RECOVERY - puppet last run on mw2053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:49:18] <grrrit-wm>	 (CR) Yuvipanda: "Needs manual rebase." [puppet] - https://gerrit.wikimedia.org/r/263230 (owner: Tim Landscheidt)
[04:50:26] <grrrit-wm>	 (CR) Yuvipanda: "hmm, says this too needs a manual rebase?" [puppet] - https://gerrit.wikimedia.org/r/263380 (owner: Tim Landscheidt)
[05:39:21] <ebernhardson>	 do lvs's have some sort of firewall on them? Having an odd issue where stat1002 can hit elastic1001.eqiad.wmnet:9200/_cat but not search.svc.eqiad.wmnet:9200
[05:39:35] <ebernhardson>	 search.svc.eqiad.wmnet just times out
[05:44:50] <YuviPanda>	 ebernhardson: I can't hit /_cat either
[05:44:53] <YuviPanda>	 times out for me
[05:45:01] <YuviPanda>	 which is consistent with the analytics VLAN firewall being present
[05:45:07] <YuviPanda>	 unless that hole already was opened up
[05:45:13] <ebernhardson>	 the thing is, akosiaris opened the hole for me
[05:45:23] <ebernhardson>	 but maybe he didn't open it to the LVS as well? actually that would make sense
[05:45:48] <YuviPanda>	 that's possible
[05:45:49] <ebernhardson>	 (the hole being open is why elastic1001:9200/_cat works directly)
[05:45:53] <YuviPanda>	 that the hole wasn't opened fully
[05:45:55] <YuviPanda>	 ah, right
[05:46:00] <YuviPanda>	 I didn't see your original question properly
[05:46:03] <YuviPanda>	 and yeah that'd make sense
[05:46:05] <ebernhardson>	 i'll note it on the ticket
[05:46:08] <YuviPanda>	 yeah
[05:47:31] <wikibugs>	 Blocked-on-Operations, operations, Discovery, Discovery-Search-Sprint: Make elasticsearch cluster accessible from analytics hadoop workers - https://phabricator.wikimedia.org/T120281#1958810 (EBernhardson) One other issue: The analytics servers can talk to elastic10{01..31}.eqiad.wmnet directly just...
[06:31:38] <icinga-wm>	 PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:38] <icinga-wm>	 PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:58] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:58] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:59] <icinga-wm>	 PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: puppet fail
[06:31:59] <icinga-wm>	 PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:29] <icinga-wm>	 PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:38] <icinga-wm>	 PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:48] <icinga-wm>	 PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:17] <icinga-wm>	 PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:47:27] <grrrit-wm>	 (PS1) Alex Monk: admin: Replace my prod yubikey SSH key [puppet] - https://gerrit.wikimedia.org/r/265907
[06:56:07] <icinga-wm>	 RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[06:56:28] <icinga-wm>	 RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:56:58] <icinga-wm>	 RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:59] <icinga-wm>	 RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:18] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:18] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:27] <icinga-wm>	 RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[06:57:28] <icinga-wm>	 RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:57] <icinga-wm>	 RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:07] <icinga-wm>	 RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:20:47] <icinga-wm>	 PROBLEM - puppet last run on ms-be3003 is CRITICAL: CRITICAL: puppet fail
[07:48:07] <icinga-wm>	 PROBLEM - Host mw2173 is DOWN: PING CRITICAL - Packet loss = 100%
[07:48:28] <icinga-wm>	 RECOVERY - puppet last run on ms-be3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:14:04] <wikibugs>	 operations, OTRS, user-notice: Upgrade OTRS to a more recent stable release - https://phabricator.wikimedia.org/T74109#1958843 (akosiaris) >>! In T74109#1956667, @Rjd0060 wrote: >>>! In T74109#1956329, @Ata wrote: >> OTRS is being [[ https://www.transifex.com/otrs/ | localised ]] on Transifex.  >> Am I...
[08:23:02] <wikibugs>	 operations, Wikimedia-General-or-Unknown: Connection to Wikimedia projects slow/timing out for some users - https://phabricator.wikimedia.org/T124417#1958850 (Cpiral) In T124510 bd808 has backed out of wmf11 per these several incidents.
[08:33:48] <icinga-wm>	 PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 226, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]
[08:48:38] <icinga-wm>	 RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 228, down: 0, dormant: 0, excluded: 0, unused: 0
[10:35:17] <icinga-wm>	 PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 760
[10:40:17] <icinga-wm>	 RECOVERY - check_mysql on db1008 is OK: Uptime: 327715 Threads: 2 Questions: 2804567 Slow queries: 2182 Opens: 1291 Flush tables: 2 Open tables: 401 Queries per second avg: 8.557 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[12:38:18] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [24.0]
[12:44:38] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[12:50:47] <icinga-wm>	 PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: puppet fail
[13:13:05] <jynus>	 !log db1046 maintenance finished- restarting mysql to apply latest configuration
[13:13:08] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:14:28] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [24.0]
[13:18:27] <icinga-wm>	 RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:19:37] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[13:25:44] <jynus>	 !log upgrading and restarting db1046
[13:25:47] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:29:18] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[13:36:38] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0
[13:53:28] <icinga-wm>	 PROBLEM - haproxy failover on dbproxy1004 is CRITICAL: CRITICAL check_failover servers up 2 down 1
[13:53:54] <grrrit-wm>	 (03CR) 10Luke081515: [C: 031] Add Portal namespace on wwu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[13:54:47] <grrrit-wm>	 (03CR) 10Luke081515: [C: 031] Remove Tranwiki namespace on wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265892 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[13:55:48] <grrrit-wm>	 (03CR) 10Luke081515: [C: 031] Namespaces configuration on sk.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265896 (https://phabricator.wikimedia.org/T122175) (owner: 10Dereckson)
[14:10:37] <icinga-wm>	 RECOVERY - haproxy failover on dbproxy1004 is OK: OK check_failover servers up 2 down 0
[14:33:10] <grrrit-wm>	 (03CR) 10Florianschmidtwelzow: [C: 04-1] Add Portal namespace on wwu.wikipedia (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[14:34:07] <icinga-wm>	 PROBLEM - DPKG on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:34:27] <icinga-wm>	 PROBLEM - configured eth on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:34:28] <icinga-wm>	 PROBLEM - dhclient process on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:34:38] <icinga-wm>	 PROBLEM - RAID on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:34:39] <icinga-wm>	 PROBLEM - Disk space on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:35:08] <icinga-wm>	 PROBLEM - salt-minion processes on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:35:47] <icinga-wm>	 PROBLEM - puppet last run on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[14:52:41] <wikibugs>	 6operations, 6Labs, 10Labs-Infrastructure: mail from testlabs to ops list - https://phabricator.wikimedia.org/T124516#1959003 (10scfc) `/usr/local/sbin/puppetalert.py` sends mail to the project administrators.  One of those is (always?) [[https://wikitech.wikimedia.org/wiki/User:Novaadmin|`novaadmin`]] which...
[15:04:07] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[15:06:09] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[15:59:17] <grrrit-wm>	 (03PS2) 10Dereckson: Add Portal namespace on wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) 
[16:09:10] <Luke081515>	 FlorianSW_: I guess you can remove your -1 now at https://gerrit.wikimedia.org/r/265893 ;)
[16:09:37] <grrrit-wm>	 (03CR) 10Florianschmidtwelzow: Add Portal namespace on wuu.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265893 (https://phabricator.wikimedia.org/T124389) (owner: 10Dereckson)
[16:09:40] <FlorianSW_>	 Luke081515: done :)
[16:15:08] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[16:17:09] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[16:38:17] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[16:41:18] <icinga-wm>	 PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 10.53% of data above the critical threshold [100000000.0]
[16:43:18] <icinga-wm>	 RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[16:44:27] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[16:47:01] <Krinkle>	 !log mwscript deleteEqualMessages.php --wiki wowiki
[16:47:03] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:03:44] <Luke081515>	 ori: The job queue is growing again
[17:05:28] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[17:07:50] <ori>	 Luke081515: it looks OK to me, it's at 105k items
[17:08:22] <ori>	 some fluctuation is normal
[17:08:32] <ori>	 the underlying issue is https://phabricator.wikimedia.org/T124418
[17:10:20] <wikibugs>	 6operations, 10MediaWiki-Cache, 10MediaWiki-JobQueue, 10MediaWiki-JobRunner, 10Traffic: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan - https://phabricator.wikimedia.org/T124418#1959059 (10Luke081515)
[17:20:30] <apergos>	 only 105k?  that's outstanding
[17:21:59] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[17:27:42] <Nemo_bis>	 Wow that was a quick recovery. :) https://grafana.wikimedia.org/dashboard/db/job-queue-health
[17:30:28] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[17:33:57] <icinga-wm>	 PROBLEM - NTP on alsafi is CRITICAL: NTP CRITICAL: No response from NTP server
[17:34:20] <Luke081515|AFK>	 apergos: Now 128k :P
[17:34:34] <apergos>	 still virtually nothing though
[17:34:51] <Luke081515>	 yeah, that's right
[17:36:23] <wikibugs>	 6operations, 6Commons, 10MassMessage, 10MediaWiki-JobQueue, 5Patch-For-Review: Not all MassMessage sent - https://phabricator.wikimedia.org/T124441#1959068 (10Steinsplitter) >>! In T124441#1956519, @Legoktm wrote: > Eh, this is probably different: >  > legoktm@terbium:~$ ./jobs.sh commonswiki > MassMessa...
[17:45:08] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[17:51:18] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[17:55:28] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[18:04:07] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[18:06:08] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[18:09:44] <wikibugs>	 6operations: setup YubiHSM and laptop at office - https://phabricator.wikimedia.org/T123818#1959105 (10JKrauska) added you as a cc -- we can provide a older laptop this week.
[18:10:17] <grrrit-wm>	 (03PS1) 10EBernhardson: Add popularity_score field to cirrussearch indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265927 
[18:12:14] <wikibugs>	 6operations: setup YubiHSM and laptop at office - https://phabricator.wikimedia.org/T123818#1959107 (10JKrauska) Can we keep the laptop locked up in the IT den?  This means it will only be able to be accessed during normal office hours, but also means that there's less chance of it being stolen.
[18:16:46] <Luke081515>	 andre__: Can you take a look at the daemons? Seems like the task daemon is not running
[18:19:45] <andre__>	 Luke081515, https://phabricator.wikimedia.org/daemon/
[18:20:18] <Luke081515>	 andre__ That's the problem, seems like only admins can see this. I just get "No data"
[18:20:29] <Luke081515>	 But when I edit a repo I get: "Task Daemon Not Running. Use bin/phd start to start daemons. See Managing Daemons with phd."
[18:32:35] <andre__>	 Luke081515, oh, sorry, I didn't know that only admins can see that :(
[18:33:24] * andre__ wonders if he has sufficient permissions
[18:33:27] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[18:34:16] <Luke081515>	 I don't know who has root access to the instance where phabricator is installed, that's the problem...
[18:34:37] <andre__>	 Luke081515: says that PhabricatorRepositoryPullLocalDaemon, PhabricatorTriggerDaemon and PhabricatorTaskmasterDaemon are running
[18:34:41] <andre__>	 on iridium
[18:34:56] <andre__>	 on the shell.
[18:35:17] <andre__>	 And ironically, https://phabricator.wikimedia.org/config/issue/ says they are not.
[18:35:27] <Luke081515>	 andre__: Strange. For example this shows other data: https://phabricator.wikimedia.org/diffusion/EMAI/edit/
[18:35:28] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[18:35:46] <andre__>	 Heh, now I don't have permissions to see that page :)
[18:35:47] <Luke081515>	 But the repos say, pull daemon is running, but not task daemon
[18:35:55] <Luke081515>	 You are not a repo admin?
[18:36:10] <andre__>	 https://phabricator.wikimedia.org/daemon/ does not show the task daemon either for me
[18:36:11] <Luke081515>	 Didn't notice that yet
[18:36:35] <andre__>	 Ah. I'm sorry. Realizing that the first column on the shell is empty for PhabricatorTaskmasterDaemon
[18:36:37] <andre__>	 so indeed
[18:36:48] <Luke081515>	 I guess the "task daemon" is the same as PhabricatorTaskmasterDaemon
[18:37:08] <Luke081515>	 because at my private instance these three daemons are running, and repo-edit says, all ok
[18:37:35] <andre__>	 https://phabricator.wikimedia.org/daemon/log/ says that "PhabricatorTaskmasterDaemon exited cleanly" 64min ago
[18:37:46] <andre__>	 no idea what's wrong here :(
[18:39:14] <andre__>	 Luke081515, I now ran explicitly:  sudo ./bin/phd launch PhabricatorTaskmasterDaemon
[18:39:24] <Luke081515>	 great
[18:39:33] <Luke081515>	 "Task Daemon Running"
[18:40:08] <andre__>	 Hmm. But with a different "Overseer" ID.
[18:41:47] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[18:42:25] <Luke081515>	 andre__: I guess phab has this problem more than once. Some time ago, the repo-edit shows "not running - running - not running - running"...
[18:43:19] <Luke081515>	 andre__: Maybe this is the job "PhabricatorApplicationTransactionPublishWorker". It has 14582 failures...
[18:45:57] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[18:47:59] <wikibugs>	 6operations, 10Wikimedia-General-or-Unknown: Connection to Wikimedia projects slow/timing out for some users - https://phabricator.wikimedia.org/T124417#1959135 (10Tgr) Probably a side effect of T124406?
[18:58:27] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[18:59:10] <grrrit-wm>	 (03PS2) 10EBernhardson: Add popularity_score field to cirrussearch indices [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265927 
[18:59:12] <grrrit-wm>	 (03PS1) 10EBernhardson: [cirrus] Repoint more like this queries to codfw cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265934 
[18:59:23] <grrrit-wm>	 (03PS2) 10EBernhardson: [cirrus] Repoint more like this queries to codfw cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265934 
[18:59:49] <grrrit-wm>	 (03CR) 10EBernhardson: [C: 032] "50th percentile spiking from 80ms up to 500ms ... repointing some expensive queries to idle cluster" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265934 (owner: 10EBernhardson)
[19:00:29] <ebernhardson>	 !log repoint most expensive search queries (morelike) at codfw cluster to reduce load. 1/2 of eqiad cluster maxed on cpu
[19:00:32] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:01:22] <grrrit-wm>	 (03Merged) 10jenkins-bot: [cirrus] Repoint more like this queries to codfw cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265934 (owner: 10EBernhardson)
[19:02:24] <logmsgbot>	 !log ebernhardson@tin Synchronized php-1.27.0-wmf.10/extensions/CirrusSearch/: Support code for repointing morelike queries from eqiad to codfw (duration: 00m 30s)
[19:02:27] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:03:18] <wikibugs>	 6operations, 10Beta-Cluster-Infrastructure, 6Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1959150 (10Tgr) >>! In T50501#1951388, @faidon wrote: > Can someone repeat why we can't just flatten the beta hostnames and just...
[19:03:30] <logmsgbot>	 !log ebernhardson@tin Synchronized wmf-config/CirrusSearch-production.php: config change to repoint morelike search from eqiad to codfw (duration: 00m 26s)
[19:03:33] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:08:58] <icinga-wm>	 RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[19:25:38] <icinga-wm>	 PROBLEM - SSH on alsafi is CRITICAL: Server answer
[20:01:43] <grrrit-wm>	 (03PS3) 10EBernhardson: Create new puppet group analytics-search-users [puppet] - 10https://gerrit.wikimedia.org/r/265795 (https://phabricator.wikimedia.org/T122620) 
[20:01:45] <grrrit-wm>	 (03PS1) 10EBernhardson: Add alert for elasticsearch 50th percentile prefix search time [puppet] - 10https://gerrit.wikimedia.org/r/265942 
[20:03:32] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Add alert for elasticsearch 50th percentile prefix search time [puppet] - 10https://gerrit.wikimedia.org/r/265942 (owner: 10EBernhardson)
[20:03:41] <grrrit-wm>	 (03PS2) 10EBernhardson: Add alert for elasticsearch 50th percentile prefix search time [puppet] - 10https://gerrit.wikimedia.org/r/265942 
[20:04:20] <wikibugs>	 6operations, 10OTRS, 7user-notice: Upgrade OTRS to a more recent stable release - https://phabricator.wikimedia.org/T74109#1959212 (10Base) Is there/Would there be a way to updating just translations? Just like Mediawiki gets its translations updated from TWN. It would be rather a pain to have to report here...
[20:04:47] <grrrit-wm>	 (03CR) 10jenkins-bot: [V: 04-1] Add alert for elasticsearch 50th percentile prefix search time [puppet] - 10https://gerrit.wikimedia.org/r/265942 (owner: 10EBernhardson)
[20:07:20] <grrrit-wm>	 (03PS3) 10EBernhardson: Add alert for elasticsearch 50th percentile prefix search time [puppet] - 10https://gerrit.wikimedia.org/r/265942 
[20:37:19] <wikibugs>	 6operations, 10OTRS, 7user-notice: Upgrade OTRS to a more recent stable release - https://phabricator.wikimedia.org/T74109#1959235 (10pajz) To be honest, I don't see how that suddenly is a problem. How often did we have reports of minor translation mistakes? Off the top of my head, I'd guess maybe three, fou...
[20:57:18] <icinga-wm>	 PROBLEM - puppet last run on aqs1001 is CRITICAL: CRITICAL: puppet fail
[21:22:28] <icinga-wm>	 RECOVERY - puppet last run on aqs1001 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[21:38:18] <icinga-wm>	 PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: puppet fail
[22:05:37] <icinga-wm>	 RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:19:37] <grrrit-wm>	 (03CR) 10Hashar: [C: 031] "Puppet part is fine. They might have it applied on labs instances though, havent checked." [puppet] - 10https://gerrit.wikimedia.org/r/265873 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn)
[23:49:49] <wikibugs>	 7Blocked-on-Operations, 6operations: Re-pool restbase1007 - https://phabricator.wikimedia.org/T124565#1959465 (10GWicke) 3NEW