[00:00:18] RoanKattouw: looks good on testwiki
[00:03:24] Awesome
[00:03:27] Thanks for looking out for that
[00:05:47] RoanKattouw: done?
[00:06:20] Yup
[00:06:22] Sorry
[00:06:36] for going over
[00:06:39] We noticed a patch we'd forgotten about
[00:09:46] RoanKattouw: Did you deploy that config change for mobile?
[00:09:57] kaldari: That was labs-only, right?
[00:10:05] RoanKattouw: Yes
[00:10:15] I +2ed it so it should have gone out automatically
[00:10:20] cool
[00:11:16] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:13:03] !log ori Synchronized wmf-config: Iae2e55a11: wmgUseBits: false for itwiki (duration: 00m 19s)
[00:13:12] Logged the message, Master
[00:13:51] RoanKattouw: err, you didn't sync wmf3?
[00:16:07] ori: are you still deploying anything?
[00:16:59] Ugh dang it sorry
[00:17:01] I'll do wmf3
[00:17:19] He just wanted to do the one patch
[00:17:24] legoPanda, RoanKattouw: I'm done; you can do whatever
[00:18:02] !log catrope Synchronized php-1.26wmf3/extensions/PageTriage/: SWAT (duration: 00m 30s)
[00:18:09] Logged the message, Master
[00:19:12] looks good :)
[00:26:03] (PS1) Andrew Bogott: Eureka! This should really disable snapshotting. [puppet] - https://gerrit.wikimedia.org/r/208047
[00:27:52] (CR) Andrew Bogott: [C: 2] Eureka! This should really disable snapshotting. [puppet] - https://gerrit.wikimedia.org/r/208047 (owner: Andrew Bogott)
[00:29:33] RoanKattouw, I did the Flow ones at 3 Pacific like the schedule says.
[00:38:14] Sorry, I guess the schedule didn't say that. I had the wrong time on the wiki, but I did it when greg-g and I decided.
[00:49:22] (PS1) Yuvipanda: tools: Set charset of tools-static to utf-8 properly [puppet] - https://gerrit.wikimedia.org/r/208050
[00:49:31] (PS2) Yuvipanda: tools: Set charset of tools-static to utf-8 properly [puppet] - https://gerrit.wikimedia.org/r/208050
[00:49:51] (CR) Yuvipanda: [C: 2 V: 2] tools: Set charset of tools-static to utf-8 properly [puppet] - https://gerrit.wikimedia.org/r/208050 (owner: Yuvipanda)
[00:53:06] PROBLEM - DPKG on labmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[00:54:37] RECOVERY - DPKG on labmon1001 is OK: All packages OK
[01:27:27] (PS1) Yuvipanda: Revert "tools: Set charset of tools-static to utf-8 properly" [puppet] - https://gerrit.wikimedia.org/r/208056
[01:27:52] (CR) Yuvipanda: [C: 2 V: 2] Revert "tools: Set charset of tools-static to utf-8 properly" [puppet] - https://gerrit.wikimedia.org/r/208056 (owner: Yuvipanda)
[01:36:02] Ops-Access-Requests, operations: Give me access to stats1003.eqiad.wmnet - https://phabricator.wikimedia.org/T97746#1251292 (Neil_P._Quinn_WMF) NEW
[01:38:26] PROBLEM - puppet last run on cp4002 is CRITICAL puppet fail
[01:50:07] PROBLEM - puppet last run on stat1002 is CRITICAL Puppet last ran 4 hours ago
[01:56:27] RECOVERY - puppet last run on cp4002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:01:55] Ops-Access-Requests, operations: Give me access to stats1003.eqiad.wmnet - https://phabricator.wikimedia.org/T97746#1251312 (Jdforrester-WMF) Manager confirmation of approval of granting Neil `researchers` and `statistics-users` (and thus `bastiononly`?) access.
[02:08:16] PROBLEM - puppet last run on ms-fe2001 is CRITICAL puppet fail
[02:27:56] RECOVERY - puppet last run on ms-fe2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[02:31:16] !log l10nupdate Synchronized php-1.26wmf3/cache/l10n: (no message) (duration: 09m 46s)
[02:31:17] PROBLEM - RAID on snapshot1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:31:31] Logged the message, Master
[02:37:46] RECOVERY - RAID on snapshot1004 is OK no RAID installed
[02:38:23] !log LocalisationUpdate completed (1.26wmf3) at 2015-05-01 02:37:20+00:00
[02:38:31] Logged the message, Master
[03:01:22] !log l10nupdate Synchronized php-1.26wmf4/cache/l10n: (no message) (duration: 06m 45s)
[03:01:30] Logged the message, Master
[03:05:24] !log LocalisationUpdate completed (1.26wmf4) at 2015-05-01 03:04:21+00:00
[03:05:32] Logged the message, Master
[04:05:33] (PS1) Andrew Bogott: Disable instance pausing in Horizon. [puppet] - https://gerrit.wikimedia.org/r/208066
[04:29:16] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL 13.79% of data above the critical threshold [100000000.0]
[04:30:50] (PS1) Yuvipanda: tools: Add a toolschecker role / module / endpoint [puppet] - https://gerrit.wikimedia.org/r/208067 (https://phabricator.wikimedia.org/T97748)
[04:31:37] PROBLEM - puppet last run on cp3005 is CRITICAL puppet fail
[04:36:41] (PS2) Yuvipanda: tools: Add a toolschecker role / module / endpoint [puppet] - https://gerrit.wikimedia.org/r/208067 (https://phabricator.wikimedia.org/T97748)
[04:37:44] (PS3) Yuvipanda: tools: Add a toolschecker role / module / endpoint [puppet] - https://gerrit.wikimedia.org/r/208067 (https://phabricator.wikimedia.org/T97748)
[04:37:58] (CR) Yuvipanda: [C: 2 V: 2] tools: Add a toolschecker role / module / endpoint [puppet] - https://gerrit.wikimedia.org/r/208067 (https://phabricator.wikimedia.org/T97748) (owner: Yuvipanda)
[04:41:22] (PS1) Yuvipanda: tools: Fix permission string on toolschecker.py [puppet] - https://gerrit.wikimedia.org/r/208068
[04:41:27] (CR) jenkins-bot: [V: -1] tools: Fix permission string on toolschecker.py [puppet] - https://gerrit.wikimedia.org/r/208068 (owner: Yuvipanda)
[04:41:33] (PS2) Yuvipanda: tools: Fix permission string on toolschecker.py [puppet] - https://gerrit.wikimedia.org/r/208068
[04:41:50] (CR) Yuvipanda: [C: 2] tools: Fix permission string on toolschecker.py [puppet] - https://gerrit.wikimedia.org/r/208068 (owner: Yuvipanda)
[04:44:40] (PS1) Yuvipanda: tools: Use require_package for package definitions [puppet] - https://gerrit.wikimedia.org/r/208069
[04:44:57] (CR) Yuvipanda: [C: 2 V: 2] tools: Use require_package for package definitions [puppet] - https://gerrit.wikimedia.org/r/208069 (owner: Yuvipanda)
[04:48:15] (PS1) Yuvipanda: tools: Fix typo in path of toolschecker uwsgi socket [puppet] - https://gerrit.wikimedia.org/r/208070
[04:48:48] (CR) Yuvipanda: [C: 2 V: 2] tools: Fix typo in path of toolschecker uwsgi socket [puppet] - https://gerrit.wikimedia.org/r/208070 (owner: Yuvipanda)
[04:49:47] RECOVERY - puppet last run on cp3005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:57:16] RECOVERY - Outgoing network saturation on labstore1001 is OK Less than 10.00% above the threshold [75000000.0]
[05:03:04] operations, Analytics-Cluster, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251385 (awight) Excellent, thanks @Ottomata! Could you let us know the expected lag time for this new pipe? We would like if banner impression logs become av...
[05:03:26] operations, Analytics-Cluster, Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251387 (awight)
[05:06:47] (PS1) Yuvipanda: tools: Use uid rather than user to specify uwsgi user [puppet] - https://gerrit.wikimedia.org/r/208072
[05:07:11] (CR) Yuvipanda: [C: 2 V: 2] tools: Use uid rather than user to specify uwsgi user [puppet] - https://gerrit.wikimedia.org/r/208072 (owner: Yuvipanda)
[05:15:46] PROBLEM - puppet last run on mw2209 is CRITICAL puppet fail
[05:21:21] (PS1) Yuvipanda: tools: Return 200 vs 503 for toolschecker success / fail [puppet] - https://gerrit.wikimedia.org/r/208073
[05:21:37] (CR) Yuvipanda: [C: 2 V: 2] tools: Return 200 vs 503 for toolschecker success / fail [puppet] - https://gerrit.wikimedia.org/r/208073 (owner: Yuvipanda)
[05:24:27] (PS1) Yuvipanda: tools: Return exceptions from toolschecker [puppet] - https://gerrit.wikimedia.org/r/208074
[05:24:40] (CR) Yuvipanda: [C: 2 V: 2] tools: Return exceptions from toolschecker [puppet] - https://gerrit.wikimedia.org/r/208074 (owner: Yuvipanda)
[05:33:56] RECOVERY - puppet last run on mw2209 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures
[05:45:27] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri May 1 05:44:23 UTC 2015 (duration 44m 22s)
[05:45:33] Logged the message, Master
[05:59:07] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.69% of data above the critical threshold [500.0]
[06:02:55] (PS1) BBlack: fix static asset paths in vcl_hash [puppet] - https://gerrit.wikimedia.org/r/208076
[06:05:50] (CR) Ori.livneh: [C: 1] fix static asset paths in vcl_hash [puppet] - https://gerrit.wikimedia.org/r/208076 (owner: BBlack)
[06:07:21] (CR) BBlack: [C: 2] fix static asset paths in vcl_hash [puppet] - https://gerrit.wikimedia.org/r/208076 (owner: BBlack)
[06:10:37] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[06:29:57] PROBLEM - puppet last run on mw2016 is CRITICAL Puppet has 1 failures
[06:30:08] PROBLEM - puppet last run on cp4003 is CRITICAL Puppet has 1 failures
[06:30:46] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 1 failures
[06:31:46] PROBLEM - puppet last run on mw1119 is CRITICAL Puppet has 1 failures
[06:34:56] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 1 failures
[06:35:46] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 1 failures
[06:36:28] PROBLEM - puppet last run on mw2212 is CRITICAL Puppet has 1 failures
[06:36:28] PROBLEM - puppet last run on mw2045 is CRITICAL Puppet has 1 failures
[06:43:53] this error is new to me: Lost parent, LightProcess exiting
[06:43:58] in the fatalmonitor
[06:46:27] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:28] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:46:38] RECOVERY - puppet last run on cp4003 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:47:16] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:16] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:48:06] RECOVERY - puppet last run on mw2212 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:06] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:48:06] RECOVERY - puppet last run on mw2206 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:50:56] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL 10.34% of data above the critical threshold [100000000.0]
[07:13:56] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL 10.34% of data above the critical threshold [100000000.0]
[07:15:39] operations, Wikimedia-DNS: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1251439 (Nemo_bis) NEW
[07:20:28] PROBLEM - Outgoing network saturation on labstore1001 is CRITICAL 10.34% of data above the critical threshold [100000000.0]
[07:20:56] PROBLEM - puppet last run on mw2207 is CRITICAL Puppet has 1 failures
[07:28:17] operations, Security, Wikimedia-General-or-Unknown, Mail: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#1251454 (TheDJ) Because I was trying to figure out what needed to be done to fix this ticket, I realized that it is rather messy are...
[07:37:26] RECOVERY - puppet last run on mw2207 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures
[07:40:17] RECOVERY - Outgoing network saturation on labstore1001 is OK Less than 10.00% above the threshold [75000000.0]
[07:51:34] <_joe_> twentyafterfour: mh seems like it refers to hhvm lightprocesses
[07:53:40] operations, Traffic, Performance: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1251468 (BBlack) a: ori
[07:55:56] operations, Traffic, Performance: wmgUseBits = false still loads favicon from bits.wikimedia.org - https://phabricator.wikimedia.org/T97750#1251473 (BBlack) Yeah the favicon paths are all hardcoded here: https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php;5f...
[08:03:32] yeah, no idea what triggered that but it's gone now. there were like 50 instances of that message in the log when I was looking at it earlier
[08:13:50] (PS1) BBlack: include favicons in static hashing stuff [puppet] - https://gerrit.wikimedia.org/r/208078
[08:16:35] operations, Phabricator, Patch-For-Review: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#1251480 (mmodell)
[09:28:24] operations, Security, Wikimedia-General-or-Unknown, Mail: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#1251517 (Technical13) @TheDJ does that ticket now block this one?
[10:01:06] operations, Monitoring: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1251535 (fgiunchedi) that'd work for counting things but what to do with timings? also we'd calculate individually I guess, that'd be one for each txstatus/rxstatus hundreds * frontend/backend, plus verbs (4?) * frontend/bac...
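The reqstats comment above weighs counting requests per status code against handling timings. As a rough sketch only (not the actual reqstats pipeline, and with illustrative metric names and a local statsd address assumed), the Python statsd client keeps the two separate: counters are incremented per bucket, while timings are sent raw and aggregated into means and percentiles on the statsd server side.

    # Illustrative sketch: counters vs. timers with the Python `statsd` client.
    # The prefix, metric names and localhost daemon are assumptions for the example.
    import statsd

    stats = statsd.StatsClient('localhost', 8125, prefix='reqstats.frontend')

    def record(method, status, duration_ms):
        # one counter bucket per verb/status pair, e.g. reqstats.frontend.GET.200
        stats.incr('{}.{}'.format(method, status))
        # raw timing; the statsd server derives mean/upper/percentiles itself,
        # so latencies do not have to be pre-bucketed by status code
        stats.timing('{}.latency'.format(method), duration_ms)

    # example: a GET answered with 200 in 38 ms
    record('GET', 200, 38)

With a split like this, the per-status-code explosion only affects cheap counters, while timings can stay per verb or per cluster.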
[10:21:36] (03CR) 10Filippo Giunchedi: [C: 031] "waiting for gwicke/eevans to wake up before merging" [puppet] - 10https://gerrit.wikimedia.org/r/207989 (owner: 10GWicke) [10:39:28] PROBLEM - puppet last run on mw1037 is CRITICAL Puppet has 5 failures [10:39:36] PROBLEM - puppet last run on mw1073 is CRITICAL puppet fail [10:39:37] PROBLEM - puppet last run on mw2158 is CRITICAL Puppet has 9 failures [10:39:46] PROBLEM - puppet last run on wtp2016 is CRITICAL Puppet has 13 failures [10:39:47] PROBLEM - puppet last run on mw1018 is CRITICAL puppet fail [10:39:47] PROBLEM - puppet last run on mw1021 is CRITICAL Puppet has 2 failures [10:39:47] PROBLEM - puppet last run on mw2019 is CRITICAL Puppet has 14 failures [10:40:06] PROBLEM - puppet last run on mw1253 is CRITICAL Puppet has 44 failures [10:40:07] PROBLEM - puppet last run on mw1155 is CRITICAL Puppet has 11 failures [10:40:16] PROBLEM - puppet last run on mw1103 is CRITICAL puppet fail [10:40:16] PROBLEM - puppet last run on cp4015 is CRITICAL puppet fail [10:40:16] PROBLEM - puppet last run on cp3038 is CRITICAL Puppet has 22 failures [10:40:17] PROBLEM - puppet last run on analytics1021 is CRITICAL Puppet has 13 failures [10:40:17] PROBLEM - puppet last run on cp4016 is CRITICAL Puppet has 19 failures [10:40:27] PROBLEM - puppet last run on mw1102 is CRITICAL puppet fail [10:40:28] PROBLEM - puppet last run on mw1199 is CRITICAL Puppet has 43 failures [10:40:36] PROBLEM - puppet last run on db2044 is CRITICAL Puppet has 8 failures [10:40:36] PROBLEM - puppet last run on db2047 is CRITICAL Puppet has 15 failures [10:40:36] PROBLEM - puppet last run on mw2176 is CRITICAL Puppet has 22 failures [10:40:37] PROBLEM - puppet last run on mc1015 is CRITICAL Puppet has 24 failures [10:40:37] PROBLEM - puppet last run on db2017 is CRITICAL Puppet has 20 failures [10:40:37] PROBLEM - puppet last run on db2010 is CRITICAL Puppet has 12 failures [10:40:37] PROBLEM - puppet last run on mw2018 is CRITICAL Puppet has 62 failures [10:40:38] PROBLEM - puppet last run on mw1207 is CRITICAL Puppet has 56 failures [10:40:38] PROBLEM - puppet last run on mw1015 is CRITICAL puppet fail [10:40:39] PROBLEM - puppet last run on mw1255 is CRITICAL Puppet has 43 failures [10:40:39] PROBLEM - puppet last run on mw1019 is CRITICAL puppet fail [10:40:40] PROBLEM - puppet last run on bast2001 is CRITICAL Puppet has 19 failures [10:40:40] PROBLEM - puppet last run on mw1047 is CRITICAL Puppet has 33 failures [10:40:41] PROBLEM - puppet last run on mw1113 is CRITICAL Puppet has 50 failures [10:40:57] PROBLEM - puppet last run on analytics1031 is CRITICAL Puppet has 17 failures [10:40:57] PROBLEM - puppet last run on mw1078 is CRITICAL Puppet has 73 failures [10:40:57] PROBLEM - puppet last run on restbase1006 is CRITICAL puppet fail [10:40:57] PROBLEM - puppet last run on mw1104 is CRITICAL Puppet has 46 failures [10:40:57] PROBLEM - puppet last run on db2056 is CRITICAL Puppet has 13 failures [10:40:57] PROBLEM - puppet last run on db2055 is CRITICAL Puppet has 14 failures [10:40:58] PROBLEM - puppet last run on wtp2004 is CRITICAL Puppet has 10 failures [10:40:58] PROBLEM - puppet last run on mw2195 is CRITICAL Puppet has 39 failures [10:40:59] PROBLEM - puppet last run on mw2138 is CRITICAL puppet fail [10:40:59] PROBLEM - puppet last run on mw1110 is CRITICAL Puppet has 38 failures [10:41:00] PROBLEM - puppet last run on cp1064 is CRITICAL Puppet has 15 failures [10:41:06] PROBLEM - puppet last run on mw2129 is CRITICAL Puppet has 45 failures [10:41:06] PROBLEM 
- puppet last run on mw2135 is CRITICAL puppet fail [10:41:07] PROBLEM - puppet last run on eventlog2001 is CRITICAL puppet fail [10:41:07] PROBLEM - puppet last run on elastic1029 is CRITICAL puppet fail [10:41:07] PROBLEM - puppet last run on mw1157 is CRITICAL Puppet has 27 failures [10:41:07] PROBLEM - puppet last run on mw1230 is CRITICAL puppet fail [10:41:16] PROBLEM - puppet last run on tmh1001 is CRITICAL Puppet has 58 failures [10:41:16] PROBLEM - puppet last run on mw1131 is CRITICAL Puppet has 36 failures [10:41:17] PROBLEM - puppet last run on praseodymium is CRITICAL Puppet has 13 failures [10:41:17] PROBLEM - puppet last run on db1049 is CRITICAL puppet fail [10:41:17] PROBLEM - puppet last run on pollux is CRITICAL puppet fail [10:41:17] PROBLEM - puppet last run on lvs2002 is CRITICAL Puppet has 14 failures [10:41:17] PROBLEM - puppet last run on mw2182 is CRITICAL Puppet has 70 failures [10:41:18] PROBLEM - puppet last run on analytics1029 is CRITICAL puppet fail [10:41:18] PROBLEM - puppet last run on mw2172 is CRITICAL puppet fail [10:41:19] PROBLEM - puppet last run on mw1070 is CRITICAL puppet fail [10:41:26] PROBLEM - puppet last run on mw2207 is CRITICAL Puppet has 57 failures [10:41:26] PROBLEM - puppet last run on mw2203 is CRITICAL puppet fail [10:41:26] PROBLEM - puppet last run on mw2131 is CRITICAL Puppet has 25 failures [10:41:26] PROBLEM - puppet last run on mw2098 is CRITICAL puppet fail [10:41:26] PROBLEM - puppet last run on mw2120 is CRITICAL Puppet has 31 failures [10:41:27] PROBLEM - puppet last run on es1003 is CRITICAL puppet fail [10:41:27] PROBLEM - puppet last run on mw2084 is CRITICAL Puppet has 22 failures [10:42:42] PROBLEM - puppet last run on cp1073 is CRITICAL puppet fail [10:42:42] PROBLEM - puppet last run on db1019 is CRITICAL puppet fail [10:42:42] PROBLEM - puppet last run on zirconium is CRITICAL Puppet has 35 failures [10:42:42] PROBLEM - puppet last run on mw1127 is CRITICAL Puppet has 34 failures [10:42:42] PROBLEM - puppet last run on mw2165 is CRITICAL puppet fail [10:42:46] PROBLEM - puppet last run on mw1169 is CRITICAL Puppet has 74 failures [10:42:46] PROBLEM - puppet last run on lvs2005 is CRITICAL Puppet has 13 failures [10:42:47] PROBLEM - puppet last run on mw2055 is CRITICAL Puppet has 75 failures [10:42:47] PROBLEM - puppet last run on mw2035 is CRITICAL puppet fail [10:42:47] PROBLEM - puppet last run on mw2053 is CRITICAL puppet fail [10:42:48] PROBLEM - puppet last run on lvs4001 is CRITICAL Puppet has 15 failures [10:42:56] PROBLEM - puppet last run on mw1013 is CRITICAL puppet fail [10:42:56] PROBLEM - puppet last run on wtp1014 is CRITICAL puppet fail [10:42:57] PROBLEM - puppet last run on db2061 is CRITICAL puppet fail [10:42:58] PROBLEM - puppet last run on db2049 is CRITICAL Puppet has 22 failures [10:43:06] PROBLEM - puppet last run on db2048 is CRITICAL Puppet has 18 failures [10:43:06] PROBLEM - puppet last run on wtp2014 is CRITICAL Puppet has 26 failures [10:43:06] PROBLEM - puppet last run on mw2177 is CRITICAL Puppet has 29 failures [10:43:07] PROBLEM - puppet last run on mw2101 is CRITICAL Puppet has 69 failures [10:43:07] PROBLEM - puppet last run on mw1017 is CRITICAL Puppet has 70 failures [10:43:07] PROBLEM - puppet last run on db1024 is CRITICAL Puppet has 12 failures [10:43:07] PROBLEM - puppet last run on mw2089 is CRITICAL Puppet has 41 failures [10:43:08] PROBLEM - puppet last run on mw1196 is CRITICAL puppet fail [10:43:08] PROBLEM - puppet last run on mw2111 is CRITICAL Puppet has 76 failures 
[10:43:09] PROBLEM - puppet last run on mw2119 is CRITICAL Puppet has 37 failures [10:43:09] PROBLEM - puppet last run on ytterbium is CRITICAL puppet fail [10:43:10] PROBLEM - puppet last run on ms-be2015 is CRITICAL puppet fail [10:43:27] PROBLEM - puppet last run on mw1244 is CRITICAL puppet fail [10:43:27] PROBLEM - puppet last run on ganeti2005 is CRITICAL puppet fail [10:43:27] PROBLEM - puppet last run on labvirt1005 is CRITICAL puppet fail [10:43:28] PROBLEM - puppet last run on elastic1028 is CRITICAL puppet fail [10:43:36] PROBLEM - puppet last run on virt1012 is CRITICAL Puppet has 20 failures [10:43:37] PROBLEM - puppet last run on mw1132 is CRITICAL puppet fail [10:43:37] PROBLEM - puppet last run on mw1232 is CRITICAL puppet fail [10:43:37] PROBLEM - puppet last run on analytics1039 is CRITICAL puppet fail [10:43:37] PROBLEM - puppet last run on cp1065 is CRITICAL puppet fail [10:43:37] PROBLEM - puppet last run on analytics1019 is CRITICAL Puppet has 22 failures [10:43:38] PROBLEM - puppet last run on mw1257 is CRITICAL puppet fail [10:43:38] PROBLEM - puppet last run on cp4013 is CRITICAL Puppet has 24 failures [10:43:46] PROBLEM - puppet last run on cp4017 is CRITICAL Puppet has 29 failures [10:43:46] PROBLEM - puppet last run on db1041 is CRITICAL Puppet has 13 failures [10:43:46] PROBLEM - puppet last run on analytics1034 is CRITICAL puppet fail [10:43:47] PROBLEM - puppet last run on wtp1024 is CRITICAL Puppet has 14 failures [10:43:47] PROBLEM - puppet last run on ms-be1016 is CRITICAL puppet fail [10:43:47] PROBLEM - puppet last run on mc1010 is CRITICAL puppet fail [10:43:47] PROBLEM - puppet last run on mw1036 is CRITICAL puppet fail [10:43:48] PROBLEM - puppet last run on logstash1003 is CRITICAL puppet fail [10:43:48] PROBLEM - puppet last run on virt1002 is CRITICAL puppet fail [10:43:56] PROBLEM - puppet last run on mw1138 is CRITICAL Puppet has 66 failures [10:43:57] PROBLEM - puppet last run on mc1008 is CRITICAL Puppet has 30 failures [10:43:57] PROBLEM - puppet last run on db2041 is CRITICAL Puppet has 21 failures [10:43:57] PROBLEM - puppet last run on mw1246 is CRITICAL Puppet has 84 failures [10:43:57] PROBLEM - puppet last run on mw2179 is CRITICAL puppet fail [10:43:57] PROBLEM - puppet last run on mw2214 is CRITICAL puppet fail [10:43:58] PROBLEM - puppet last run on db1010 is CRITICAL puppet fail [10:43:58] PROBLEM - puppet last run on wtp1021 is CRITICAL Puppet has 12 failures [10:43:59] PROBLEM - puppet last run on mw1245 is CRITICAL Puppet has 73 failures [10:44:06] PROBLEM - puppet last run on cp3032 is CRITICAL puppet fail [10:44:06] PROBLEM - puppet last run on cp3013 is CRITICAL puppet fail [10:44:07] PROBLEM - puppet last run on mw1124 is CRITICAL Puppet has 28 failures [10:44:07] PROBLEM - puppet last run on cp1051 is CRITICAL puppet fail [10:44:16] PROBLEM - puppet last run on mw1256 is CRITICAL puppet fail [10:44:17] PROBLEM - puppet last run on es1009 is CRITICAL Puppet has 17 failures [10:44:17] PROBLEM - puppet last run on cp1043 is CRITICAL Puppet has 17 failures [10:44:17] PROBLEM - puppet last run on analytics1036 is CRITICAL Puppet has 24 failures [10:44:17] PROBLEM - puppet last run on db2051 is CRITICAL Puppet has 10 failures [10:44:25] the hell? 
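The flood of "puppet last run ... CRITICAL" alerts above comes from a per-host check of the agent's most recent Puppet run. As an illustration only (this is not the actual Icinga/NRPE plugin used on these hosts), such a check can read the agent's last-run summary, which on a typical Puppet 3 agent lives at /var/lib/puppet/state/last_run_summary.yaml.

    # Rough sketch of a "puppet last run" style check; assumed to approximate,
    # not reproduce, the real monitoring plugin behind the alerts above.
    import sys
    import time
    import yaml

    SUMMARY = '/var/lib/puppet/state/last_run_summary.yaml'  # typical Puppet 3 agent path

    def check(max_age_hours=4):
        with open(SUMMARY) as f:
            summary = yaml.safe_load(f) or {}
        failed = summary.get('events', {}).get('failure', 0)
        last_run = summary.get('time', {}).get('last_run', 0)
        age_hours = (time.time() - last_run) / 3600.0
        if failed:
            print('CRITICAL: Puppet has {} failures'.format(failed))
            return 2
        if age_hours > max_age_hours:
            print('CRITICAL: Puppet last ran {:.0f} hours ago'.format(age_hours))
            return 2
        print('OK: last run {:.0f} minutes ago with 0 failures'.format(age_hours * 60))
        return 0

    if __name__ == '__main__':
        sys.exit(check())

A failing puppetmaster makes every agent's next run fail, which is why hundreds of these checks flip to CRITICAL at once below.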
[10:44:26] PROBLEM - puppet last run on mw1040 is CRITICAL Puppet has 72 failures [10:44:26] PROBLEM - puppet last run on db2035 is CRITICAL puppet fail [10:44:26] PROBLEM - puppet last run on mw1221 is CRITICAL puppet fail [10:44:26] PROBLEM - puppet last run on mw1096 is CRITICAL Puppet has 32 failures [10:44:26] PROBLEM - puppet last run on mw1062 is CRITICAL Puppet has 32 failures [10:44:27] PROBLEM - puppet last run on cp1059 is CRITICAL Puppet has 17 failures [10:44:27] PROBLEM - puppet last run on mw1216 is CRITICAL puppet fail [10:44:28] PROBLEM - puppet last run on mw1234 is CRITICAL Puppet has 69 failures [10:44:28] PROBLEM - puppet last run on stat1001 is CRITICAL Puppet has 12 failures [10:44:29] PROBLEM - puppet last run on mw1161 is CRITICAL Puppet has 35 failures [10:44:29] PROBLEM - puppet last run on es1006 is CRITICAL Puppet has 21 failures [10:44:30] PROBLEM - puppet last run on mc1011 is CRITICAL puppet fail [10:44:31] 6operations, 7Graphite, 5Patch-For-Review: enable statsd extended counters - https://phabricator.wikimedia.org/T95703#1251555 (10fgiunchedi) [10:44:47] PROBLEM - puppet last run on mw1252 is CRITICAL Puppet has 79 failures [10:44:47] PROBLEM - puppet last run on mw2103 is CRITICAL Puppet has 37 failures [10:44:47] PROBLEM - puppet last run on mw1147 is CRITICAL Puppet has 34 failures [10:44:48] PROBLEM - puppet last run on mw2133 is CRITICAL Puppet has 75 failures [10:44:48] PROBLEM - puppet last run on mw2102 is CRITICAL Puppet has 31 failures [10:44:48] PROBLEM - puppet last run on mw1080 is CRITICAL Puppet has 77 failures [10:44:48] PROBLEM - puppet last run on mw2106 is CRITICAL Puppet has 27 failures [10:44:49] PROBLEM - puppet last run on mw2112 is CRITICAL puppet fail [10:44:49] PROBLEM - puppet last run on mw2032 is CRITICAL puppet fail [10:44:56] PROBLEM - puppet last run on es2003 is CRITICAL Puppet has 8 failures [10:44:56] PROBLEM - puppet last run on mw2028 is CRITICAL puppet fail [10:44:56] PROBLEM - puppet last run on mw2034 is CRITICAL puppet fail [10:44:56] PROBLEM - puppet last run on mc1009 is CRITICAL Puppet has 15 failures [10:44:57] PROBLEM - puppet last run on eventlog1001 is CRITICAL Puppet has 24 failures [10:44:57] PROBLEM - puppet last run on ms-fe1003 is CRITICAL puppet fail [10:44:57] PROBLEM - puppet last run on uranium is CRITICAL Puppet has 27 failures [10:44:58] PROBLEM - puppet last run on labnodepool1001 is CRITICAL puppet fail [10:44:58] PROBLEM - puppet last run on wtp1017 is CRITICAL Puppet has 26 failures [10:45:08] PROBLEM - puppet last run on labsdb1001 is CRITICAL puppet fail [10:45:08] PROBLEM - puppet last run on mw1178 is CRITICAL puppet fail [10:45:08] PROBLEM - puppet last run on dbproxy1002 is CRITICAL puppet fail [10:45:08] PROBLEM - puppet last run on mw2183 is CRITICAL puppet fail [10:45:08] PROBLEM - puppet last run on cp1072 is CRITICAL Puppet has 30 failures [10:45:08] PROBLEM - puppet last run on ms-be2002 is CRITICAL puppet fail [10:45:08] PROBLEM - puppet last run on db1007 is CRITICAL puppet fail [10:45:09] PROBLEM - puppet last run on elastic1026 is CRITICAL puppet fail [10:45:09] PROBLEM - puppet last run on californium is CRITICAL puppet fail [10:45:10] PROBLEM - puppet last run on mw1134 is CRITICAL puppet fail [10:45:10] PROBLEM - puppet last run on mc1004 is CRITICAL puppet fail [10:45:11] PROBLEM - puppet last run on rdb1004 is CRITICAL Puppet has 18 failures [10:45:17] puppet master fail, looking [10:45:26] PROBLEM - puppet last run on ms-be1013 is CRITICAL Puppet has 27 failures [10:45:27] 
PROBLEM - puppet last run on mw1031 is CRITICAL Puppet has 30 failures [10:45:27] PROBLEM - puppet last run on cp4007 is CRITICAL Puppet has 25 failures [10:45:27] PROBLEM - puppet last run on mw1106 is CRITICAL puppet fail [10:45:28] PROBLEM - puppet last run on mw1109 is CRITICAL Puppet has 42 failures [10:45:36] PROBLEM - puppet last run on dbstore1001 is CRITICAL puppet fail [10:45:36] PROBLEM - puppet last run on mw1045 is CRITICAL puppet fail [10:45:36] PROBLEM - puppet last run on cp1049 is CRITICAL puppet fail [10:45:37] PROBLEM - puppet last run on neptunium is CRITICAL puppet fail [10:45:37] PROBLEM - puppet last run on mw1141 is CRITICAL puppet fail [10:45:38] PROBLEM - puppet last run on mw2197 is CRITICAL Puppet has 30 failures [10:45:38] PROBLEM - puppet last run on mw2108 is CRITICAL puppet fail [10:45:38] PROBLEM - puppet last run on mw2046 is CRITICAL Puppet has 73 failures [10:45:38] PROBLEM - puppet last run on mw1063 is CRITICAL puppet fail [10:45:39] PROBLEM - puppet last run on wtp1019 is CRITICAL puppet fail [10:45:47] PROBLEM - puppet last run on mw1130 is CRITICAL Puppet has 37 failures [10:45:47] PROBLEM - puppet last run on cp3045 is CRITICAL Puppet has 20 failures [10:45:47] PROBLEM - puppet last run on cp3046 is CRITICAL puppet fail [10:45:47] PROBLEM - puppet last run on cp3015 is CRITICAL Puppet has 14 failures [10:45:47] PROBLEM - puppet last run on mw1218 is CRITICAL Puppet has 41 failures [10:45:48] PROBLEM - puppet last run on mw1140 is CRITICAL puppet fail [10:45:48] PROBLEM - puppet last run on carbon is CRITICAL puppet fail [10:45:49] PROBLEM - puppet last run on mw1048 is CRITICAL puppet fail [10:45:57] PROBLEM - puppet last run on db1065 is CRITICAL puppet fail [10:45:57] PROBLEM - puppet last run on mw1059 is CRITICAL puppet fail [10:45:57] PROBLEM - puppet last run on ocg1001 is CRITICAL Puppet has 24 failures [10:45:57] PROBLEM - puppet last run on cp1070 is CRITICAL puppet fail [10:45:58] PROBLEM - puppet last run on xenon is CRITICAL puppet fail [10:46:07] PROBLEM - puppet last run on analytics1041 is CRITICAL puppet fail [10:46:08] PROBLEM - puppet last run on mw1007 is CRITICAL puppet fail [10:46:08] PROBLEM - puppet last run on logstash1001 is CRITICAL Puppet has 22 failures [10:46:08] PROBLEM - puppet last run on ms-be1017 is CRITICAL Puppet has 27 failures [10:46:09] PROBLEM - puppet last run on ms-be2013 is CRITICAL Puppet has 28 failures [10:46:09] PROBLEM - puppet last run on magnesium is CRITICAL Puppet has 31 failures [10:46:16] PROBLEM - puppet last run on wtp1020 is CRITICAL puppet fail [10:46:16] PROBLEM - puppet last run on elastic1007 is CRITICAL puppet fail [10:46:16] PROBLEM - puppet last run on pc1001 is CRITICAL Puppet has 20 failures [10:46:16] PROBLEM - puppet last run on analytics1033 is CRITICAL Puppet has 12 failures [10:46:17] PROBLEM - puppet last run on mw1067 is CRITICAL Puppet has 33 failures [10:46:17] PROBLEM - puppet last run on mw1197 is CRITICAL puppet fail [10:46:26] PROBLEM - puppet last run on cerium is CRITICAL Puppet has 31 failures [10:46:26] PROBLEM - puppet last run on mw1200 is CRITICAL Puppet has 32 failures [10:46:26] PROBLEM - puppet last run on mw1072 is CRITICAL puppet fail [10:46:26] PROBLEM - puppet last run on db2046 is CRITICAL Puppet has 13 failures [10:46:27] PROBLEM - puppet last run on db1053 is CRITICAL puppet fail [10:46:27] PROBLEM - puppet last run on db2070 is CRITICAL Puppet has 16 failures [10:46:27] PROBLEM - puppet last run on mw2169 is CRITICAL Puppet has 77 failures [10:46:28] 
PROBLEM - puppet last run on dbstore2001 is CRITICAL puppet fail [10:46:28] PROBLEM - puppet last run on mw2122 is CRITICAL Puppet has 65 failures [10:46:29] PROBLEM - puppet last run on mw2100 is CRITICAL puppet fail [10:46:29] PROBLEM - puppet last run on netmon1001 is CRITICAL Puppet has 20 failures [10:46:30] PROBLEM - puppet last run on mw2118 is CRITICAL puppet fail [10:46:46] PROBLEM - puppet last run on mc1016 is CRITICAL Puppet has 28 failures [10:46:46] PROBLEM - puppet last run on analytics1035 is CRITICAL puppet fail [10:46:47] PROBLEM - puppet last run on db1029 is CRITICAL Puppet has 23 failures [10:46:47] PROBLEM - puppet last run on mw2170 is CRITICAL Puppet has 38 failures [10:46:47] PROBLEM - puppet last run on es1001 is CRITICAL Puppet has 14 failures [10:46:47] PROBLEM - puppet last run on mc2004 is CRITICAL puppet fail [10:46:47] PROBLEM - puppet last run on mw2121 is CRITICAL puppet fail [10:46:48] PROBLEM - puppet last run on ms-be2003 is CRITICAL Puppet has 20 failures [10:46:48] PROBLEM - puppet last run on labstore2001 is CRITICAL Puppet has 13 failures [10:46:49] PROBLEM - puppet last run on lvs4002 is CRITICAL Puppet has 14 failures [10:46:49] PROBLEM - puppet last run on mw1250 is CRITICAL puppet fail [10:46:49] PROBLEM - puppet last run on snapshot1003 is CRITICAL puppet fail [10:47:06] PROBLEM - puppet last run on db1031 is CRITICAL puppet fail [10:47:07] PROBLEM - puppet last run on caesium is CRITICAL Puppet has 33 failures [10:47:07] PROBLEM - puppet last run on cp4006 is CRITICAL Puppet has 29 failures [10:47:07] PROBLEM - puppet last run on mw1120 is CRITICAL puppet fail [10:47:07] PROBLEM - puppet last run on lvs1002 is CRITICAL Puppet has 22 failures [10:47:07] PROBLEM - puppet last run on db1050 is CRITICAL puppet fail [10:47:16] PROBLEM - puppet last run on mw1176 is CRITICAL puppet fail [10:47:16] PROBLEM - puppet last run on db1022 is CRITICAL puppet fail [10:47:16] PROBLEM - puppet last run on wtp1006 is CRITICAL puppet fail [10:47:16] PROBLEM - puppet last run on mw1164 is CRITICAL puppet fail [10:47:17] PROBLEM - puppet last run on db2067 is CRITICAL Puppet has 10 failures [10:47:17] PROBLEM - puppet last run on db1073 is CRITICAL puppet fail [10:47:17] PROBLEM - puppet last run on lead is CRITICAL puppet fail [10:47:18] PROBLEM - puppet last run on potassium is CRITICAL puppet fail [10:47:18] PROBLEM - puppet last run on mw2160 is CRITICAL Puppet has 72 failures [10:47:19] PROBLEM - puppet last run on es2005 is CRITICAL Puppet has 21 failures [10:47:19] PROBLEM - puppet last run on mw2173 is CRITICAL puppet fail [10:47:20] PROBLEM - puppet last run on mw2164 is CRITICAL Puppet has 33 failures [10:47:26] !log bounce apache2 on palladium, mod_passenger died [10:47:33] !log bounce apache2 on strontium [10:47:35] Logged the message, Master [10:47:36] PROBLEM - puppet last run on cp1055 is CRITICAL Puppet has 15 failures [10:47:37] PROBLEM - puppet last run on analytics1025 is CRITICAL puppet fail [10:47:37] PROBLEM - puppet last run on wtp1016 is CRITICAL Puppet has 31 failures [10:47:37] RECOVERY - puppetmaster backend https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 6.368 second response time [10:47:37] PROBLEM - puppet last run on mw1242 is CRITICAL puppet fail [10:47:37] PROBLEM - puppet last run on labstore1003 is CRITICAL puppet fail [10:47:37] PROBLEM - puppet last run on mw1222 is CRITICAL puppet fail [10:47:38] PROBLEM - puppet last run on elastic1001 is CRITICAL puppet fail [10:47:38] PROBLEM - puppet last 
run on ms-be1003 is CRITICAL Puppet has 19 failures [10:47:39] PROBLEM - puppet last run on ms-be1004 is CRITICAL Puppet has 24 failures [10:47:40] Logged the message, Master [10:47:46] PROBLEM - puppet last run on mw1006 is CRITICAL Puppet has 25 failures [10:47:46] PROBLEM - puppet last run on db2039 is CRITICAL Puppet has 19 failures [10:47:46] PROBLEM - puppet last run on wtp2018 is CRITICAL puppet fail [10:47:46] PROBLEM - puppet last run on wtp2005 is CRITICAL Puppet has 22 failures [10:47:46] PROBLEM - puppet last run on db2034 is CRITICAL Puppet has 11 failures [10:47:47] PROBLEM - puppet last run on mw1226 is CRITICAL Puppet has 31 failures [10:47:47] PROBLEM - puppet last run on mw1068 is CRITICAL Puppet has 70 failures [10:47:48] PROBLEM - puppet last run on mw1069 is CRITICAL Puppet has 88 failures [10:47:48] PROBLEM - puppet last run on elastic1012 is CRITICAL Puppet has 10 failures [10:47:49] PROBLEM - puppet last run on elastic1004 is CRITICAL puppet fail [10:47:56] PROBLEM - puppet last run on analytics1040 is CRITICAL Puppet has 14 failures [10:47:56] PROBLEM - puppet last run on mw1145 is CRITICAL Puppet has 76 failures [10:47:57] PROBLEM - puppet last run on db2019 is CRITICAL Puppet has 19 failures [10:47:57] PROBLEM - puppet last run on mw1174 is CRITICAL puppet fail [10:47:57] PROBLEM - puppet last run on db2002 is CRITICAL Puppet has 10 failures [10:47:57] PROBLEM - puppet last run on ms-be2004 is CRITICAL puppet fail [10:47:57] PROBLEM - puppet last run on db2005 is CRITICAL puppet fail [10:47:58] PROBLEM - puppet last run on es2009 is CRITICAL Puppet has 11 failures [10:47:58] PROBLEM - puppet last run on mw1187 is CRITICAL Puppet has 29 failures [10:48:06] PROBLEM - puppet last run on db1040 is CRITICAL puppet fail [10:48:06] PROBLEM - puppet last run on analytics1020 is CRITICAL Puppet has 15 failures [10:48:06] PROBLEM - puppet last run on mw1008 is CRITICAL Puppet has 72 failures [10:48:06] PROBLEM - puppet last run on ms-fe2004 is CRITICAL Puppet has 35 failures [10:48:07] PROBLEM - puppet last run on mw1012 is CRITICAL Puppet has 74 failures [10:48:07] PROBLEM - puppet last run on mw1189 is CRITICAL Puppet has 31 failures [10:48:07] PROBLEM - puppet last run on mw1046 is CRITICAL puppet fail [10:48:08] PROBLEM - puppet last run on mw1082 is CRITICAL Puppet has 36 failures [10:48:08] PROBLEM - puppet last run on db2045 is CRITICAL puppet fail [10:48:11] PROBLEM - puppet last run on elastic1008 is CRITICAL puppet fail [10:48:11] PROBLEM - puppet last run on mw2161 is CRITICAL Puppet has 70 failures [10:48:11] PROBLEM - puppet last run on mw2163 is CRITICAL puppet fail [10:48:21] PROBLEM - puppet last run on mw2036 is CRITICAL puppet fail [10:48:21] PROBLEM - puppet last run on ms-fe1001 is CRITICAL puppet fail [10:48:22] PROBLEM - puppet last run on db2009 is CRITICAL Puppet has 21 failures [10:48:22] PROBLEM - puppet last run on mc2010 is CRITICAL Puppet has 20 failures [10:48:23] PROBLEM - puppet last run on es2001 is CRITICAL puppet fail [10:48:23] PROBLEM - puppet last run on cp1047 is CRITICAL Puppet has 14 failures [10:48:24] PROBLEM - puppet last run on mw2080 is CRITICAL Puppet has 23 failures [10:48:24] PROBLEM - puppet last run on mw1160 is CRITICAL Puppet has 39 failures [10:48:26] PROBLEM - puppet last run on ms-fe1004 is CRITICAL puppet fail [10:48:26] PROBLEM - puppet last run on lvs1005 is CRITICAL Puppet has 13 failures [10:48:26] PROBLEM - puppet last run on db1002 is CRITICAL Puppet has 13 failures [10:48:27] PROBLEM - puppet last run on 
mc1002 is CRITICAL Puppet has 9 failures [10:48:27] PROBLEM - puppet last run on wtp2008 is CRITICAL puppet fail [10:48:27] PROBLEM - puppet last run on mw2145 is CRITICAL Puppet has 39 failures [10:48:28] PROBLEM - puppet last run on mw2109 is CRITICAL Puppet has 33 failures [10:48:28] PROBLEM - puppet last run on mw2073 is CRITICAL puppet fail [10:48:29] PROBLEM - puppet last run on mc2001 is CRITICAL Puppet has 12 failures [10:48:36] PROBLEM - puppet last run on db1018 is CRITICAL Puppet has 24 failures [10:48:36] PROBLEM - puppet last run on mw1150 is CRITICAL Puppet has 75 failures [10:48:36] PROBLEM - puppet last run on mw1088 is CRITICAL Puppet has 29 failures [10:48:37] PROBLEM - puppet last run on mw1205 is CRITICAL puppet fail [10:48:37] PROBLEM - puppet last run on db1015 is CRITICAL puppet fail [10:48:37] PROBLEM - puppet last run on mw1228 is CRITICAL puppet fail [10:48:37] PROBLEM - puppet last run on mw1217 is CRITICAL Puppet has 75 failures [10:48:38] PROBLEM - puppet last run on holmium is CRITICAL puppet fail [10:48:38] PROBLEM - puppet last run on mc1003 is CRITICAL puppet fail [10:48:39] PROBLEM - puppet last run on logstash1004 is CRITICAL puppet fail [10:48:46] PROBLEM - puppet last run on cp4003 is CRITICAL puppet fail [10:48:46] PROBLEM - puppet last run on cp3042 is CRITICAL Puppet has 16 failures [10:48:46] PROBLEM - puppet last run on mw1099 is CRITICAL Puppet has 36 failures [10:48:47] PROBLEM - puppet last run on mw1061 is CRITICAL Puppet has 29 failures [10:48:47] PROBLEM - puppet last run on mw1100 is CRITICAL Puppet has 31 failures [10:48:56] PROBLEM - puppet last run on mw1060 is CRITICAL Puppet has 72 failures [10:48:57] PROBLEM - puppet last run on mw1117 is CRITICAL Puppet has 85 failures [10:48:57] PROBLEM - puppet last run on mw1173 is CRITICAL Puppet has 33 failures [10:48:57] PROBLEM - puppet last run on virt1006 is CRITICAL puppet fail [10:48:58] PROBLEM - puppet last run on mw1126 is CRITICAL puppet fail [10:49:06] PROBLEM - puppet last run on wtp2001 is CRITICAL Puppet has 11 failures [10:49:06] PROBLEM - puppet last run on mw1002 is CRITICAL puppet fail [10:49:06] PROBLEM - puppet last run on wtp2015 is CRITICAL puppet fail [10:49:07] PROBLEM - puppet last run on mw1003 is CRITICAL Puppet has 57 failures [10:49:07] PROBLEM - puppet last run on mw2113 is CRITICAL puppet fail [10:49:07] PROBLEM - puppet last run on mw1235 is CRITICAL puppet fail [10:49:07] PROBLEM - puppet last run on elastic1027 is CRITICAL puppet fail [10:49:08] PROBLEM - puppet last run on mw1042 is CRITICAL Puppet has 67 failures [10:49:08] PROBLEM - puppet last run on cp3040 is CRITICAL puppet fail [10:49:09] PROBLEM - puppet last run on cp3037 is CRITICAL Puppet has 26 failures [10:49:09] PROBLEM - puppet last run on elastic1021 is CRITICAL puppet fail [10:49:10] PROBLEM - puppet last run on labsdb1003 is CRITICAL Puppet has 10 failures [10:49:16] PROBLEM - puppet last run on mw1177 is CRITICAL puppet fail [10:49:16] PROBLEM - puppet last run on ruthenium is CRITICAL Puppet has 22 failures [10:49:17] PROBLEM - puppet last run on mw1092 is CRITICAL puppet fail [10:49:17] PROBLEM - puppet last run on iron is CRITICAL puppet fail [10:49:26] PROBLEM - puppet last run on cp1056 is CRITICAL puppet fail [10:49:26] PROBLEM - puppet last run on db2065 is CRITICAL puppet fail [10:49:26] PROBLEM - puppet last run on db1023 is CRITICAL puppet fail [10:49:27] PROBLEM - puppet last run on mw1144 is CRITICAL puppet fail [10:49:27] PROBLEM - puppet last run on mw1166 is CRITICAL puppet fail 
[10:49:27] PROBLEM - puppet last run on logstash1006 is CRITICAL puppet fail [10:49:27] PROBLEM - puppet last run on ms-fe2001 is CRITICAL Puppet has 22 failures [10:49:28] PROBLEM - puppet last run on analytics1030 is CRITICAL puppet fail [10:49:36] PROBLEM - puppet last run on mw1009 is CRITICAL Puppet has 28 failures [10:49:36] PROBLEM - puppet last run on mw1119 is CRITICAL Puppet has 59 failures [10:49:36] PROBLEM - puppet last run on mw2076 is CRITICAL Puppet has 70 failures [10:49:37] PROBLEM - puppet last run on mw2123 is CRITICAL puppet fail [10:49:37] PROBLEM - puppet last run on analytics1010 is CRITICAL Puppet has 40 failures [10:49:46] PROBLEM - puppet last run on db1059 is CRITICAL Puppet has 23 failures [10:49:47] PROBLEM - puppet last run on lvs2004 is CRITICAL Puppet has 12 failures [10:49:47] PROBLEM - puppet last run on db2040 is CRITICAL puppet fail [10:49:47] PROBLEM - puppet last run on db2036 is CRITICAL puppet fail [10:49:47] PROBLEM - puppet last run on mw2143 is CRITICAL Puppet has 40 failures [10:49:47] PROBLEM - puppet last run on db1042 is CRITICAL puppet fail [10:49:48] PROBLEM - puppet last run on mw1118 is CRITICAL Puppet has 71 failures [10:49:48] PROBLEM - puppet last run on mw2184 is CRITICAL Puppet has 35 failures [10:49:49] PROBLEM - puppet last run on db2064 is CRITICAL Puppet has 26 failures [10:49:56] PROBLEM - puppet last run on db1052 is CRITICAL puppet fail [10:49:56] PROBLEM - puppet last run on mw2097 is CRITICAL puppet fail [10:49:56] PROBLEM - puppet last run on labvirt1003 is CRITICAL Puppet has 13 failures [10:49:56] PROBLEM - puppet last run on mw2079 is CRITICAL Puppet has 33 failures [10:49:57] PROBLEM - puppet last run on mw1065 is CRITICAL puppet fail [10:49:57] PROBLEM - puppet last run on mw2212 is CRITICAL Puppet has 33 failures [10:49:57] PROBLEM - puppet last run on mw2083 is CRITICAL Puppet has 23 failures [10:49:58] PROBLEM - puppet last run on mw2206 is CRITICAL Puppet has 76 failures [10:49:58] PROBLEM - puppet last run on mw2043 is CRITICAL Puppet has 67 failures [10:49:59] PROBLEM - puppet last run on logstash1002 is CRITICAL puppet fail [10:49:59] PROBLEM - puppet last run on mw2003 is CRITICAL puppet fail [10:50:00] PROBLEM - puppet last run on mw2134 is CRITICAL puppet fail [10:50:16] PROBLEM - puppet last run on mw2190 is CRITICAL puppet fail [10:50:16] 6operations, 10Deployment-Systems, 7Graphite: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1251557 (10fgiunchedi) this should be working now in graphite at least https://graphite.wikimedia.org/render/?width=586&height=308&_salt=1430476877.683&target=dra... 
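The scap deploy-events comment above verifies the data through the Graphite render API (the target parameter is truncated in the log). As a sketch, the same kind of series can be pulled as JSON; the target name below is a placeholder, not the real metric.

    # Sketch of querying the Graphite render API for a series as JSON.
    # The target used in the example call is a placeholder assumption.
    import requests

    GRAPHITE = 'https://graphite.wikimedia.org/render/'

    def fetch_series(target, hours=24):
        resp = requests.get(GRAPHITE, params={
            'target': target,
            'from': '-{}h'.format(hours),
            'format': 'json',
        })
        resp.raise_for_status()
        # each element: {"target": ..., "datapoints": [[value, timestamp], ...]}
        return resp.json()

    # example with a placeholder target name:
    # for series in fetch_series('deploy.sync-file.count'):
    #     print(series['target'], len(series['datapoints']))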
[10:50:16] PROBLEM - puppet last run on mw2090 is CRITICAL puppet fail [10:50:16] PROBLEM - puppet last run on mw2030 is CRITICAL Puppet has 37 failures [10:50:16] PROBLEM - puppet last run on mw1170 is CRITICAL Puppet has 36 failures [10:50:16] PROBLEM - puppet last run on db1028 is CRITICAL puppet fail [10:50:17] PROBLEM - puppet last run on mw1052 is CRITICAL Puppet has 37 failures [10:50:17] PROBLEM - puppet last run on mw1213 is CRITICAL Puppet has 43 failures [10:50:18] PROBLEM - puppet last run on dbproxy1001 is CRITICAL Puppet has 20 failures [10:50:18] PROBLEM - puppet last run on db1034 is CRITICAL Puppet has 26 failures [10:50:26] PROBLEM - puppet last run on mw1211 is CRITICAL puppet fail [10:50:26] PROBLEM - puppet last run on db1016 is CRITICAL puppet fail [10:50:26] PROBLEM - puppet last run on cp3006 is CRITICAL puppet fail [10:50:27] PROBLEM - puppet last run on pc1002 is CRITICAL puppet fail [10:50:27] PROBLEM - puppet last run on db1067 is CRITICAL Puppet has 13 failures [10:50:27] PROBLEM - puppet last run on cp4008 is CRITICAL Puppet has 19 failures [10:50:27] PROBLEM - puppet last run on db1021 is CRITICAL puppet fail [10:50:28] PROBLEM - puppet last run on mw1055 is CRITICAL puppet fail [10:50:28] PROBLEM - puppet last run on db1046 is CRITICAL Puppet has 22 failures [10:50:36] PROBLEM - puppet last run on mw1025 is CRITICAL Puppet has 40 failures [10:50:36] PROBLEM - puppet last run on mw1054 is CRITICAL Puppet has 34 failures [10:50:37] PROBLEM - puppet last run on mw1123 is CRITICAL Puppet has 35 failures [10:50:37] PROBLEM - puppet last run on mw1114 is CRITICAL puppet fail [10:50:37] PROBLEM - puppet last run on mw1175 is CRITICAL puppet fail [10:50:37] PROBLEM - puppet last run on virt1001 is CRITICAL Puppet has 17 failures [10:50:37] PROBLEM - puppet last run on analytics1038 is CRITICAL puppet fail [10:50:38] PROBLEM - puppet last run on mw1044 is CRITICAL puppet fail [10:50:38] PROBLEM - puppet last run on mw1039 is CRITICAL puppet fail [10:50:39] PROBLEM - puppet last run on db1051 is CRITICAL Puppet has 10 failures [10:50:46] PROBLEM - puppet last run on tin is CRITICAL puppet fail [10:50:47] PROBLEM - puppet last run on db2038 is CRITICAL puppet fail [10:50:47] PROBLEM - puppet last run on wtp2019 is CRITICAL puppet fail [10:50:47] PROBLEM - puppet last run on wtp2012 is CRITICAL puppet fail [10:50:47] PROBLEM - puppet last run on mw2126 is CRITICAL Puppet has 79 failures [10:50:47] PROBLEM - puppet last run on mw2062 is CRITICAL puppet fail [10:50:48] PROBLEM - puppet last run on mw2022 is CRITICAL Puppet has 80 failures [10:50:48] PROBLEM - puppet last run on gallium is CRITICAL Puppet has 39 failures [10:50:49] PROBLEM - puppet last run on cp4004 is CRITICAL Puppet has 26 failures [10:50:49] PROBLEM - puppet last run on cp3008 is CRITICAL puppet fail [10:50:50] PROBLEM - puppet last run on cp3014 is CRITICAL Puppet has 17 failures [10:50:50] PROBLEM - puppet last run on multatuli is CRITICAL puppet fail [10:50:56] PROBLEM - puppet last run on mw1172 is CRITICAL Puppet has 28 failures [10:50:56] PROBLEM - puppet last run on cp1071 is CRITICAL puppet fail [10:50:57] PROBLEM - puppet last run on wtp1012 is CRITICAL Puppet has 15 failures [10:50:57] PROBLEM - puppet last run on mw1149 is CRITICAL puppet fail [10:50:57] PROBLEM - puppet last run on mc1012 is CRITICAL puppet fail [10:51:06] PROBLEM - puppet last run on analytics1016 is CRITICAL puppet fail [10:51:06] PROBLEM - puppet last run on mw1251 is CRITICAL puppet fail [10:51:06] PROBLEM - puppet last 
run on lithium is CRITICAL puppet fail [10:51:07] PROBLEM - puppet last run on mw1129 is CRITICAL puppet fail [10:51:07] PROBLEM - puppet last run on ms-fe2003 is CRITICAL Puppet has 25 failures [10:51:17] PROBLEM - puppet last run on mw1249 is CRITICAL puppet fail [10:51:17] PROBLEM - puppet last run on mw2146 is CRITICAL Puppet has 35 failures [10:51:17] PROBLEM - puppet last run on mw2095 is CRITICAL Puppet has 33 failures [10:51:17] PROBLEM - puppet last run on mw2096 is CRITICAL Puppet has 29 failures [10:51:26] PROBLEM - puppet last run on labnet1001 is CRITICAL Puppet has 22 failures [10:51:27] PROBLEM - puppet last run on lvs2001 is CRITICAL Puppet has 21 failures [10:51:27] PROBLEM - puppet last run on polonium is CRITICAL Puppet has 11 failures [10:51:36] PROBLEM - puppet last run on labcontrol2001 is CRITICAL Puppet has 20 failures [10:51:37] PROBLEM - puppet last run on es1007 is CRITICAL Puppet has 28 failures [10:51:37] PROBLEM - puppet last run on mw1011 is CRITICAL Puppet has 21 failures [10:51:37] PROBLEM - puppet last run on mw2017 is CRITICAL Puppet has 72 failures [10:51:37] PROBLEM - puppet last run on db2007 is CRITICAL Puppet has 9 failures [10:51:37] PROBLEM - puppet last run on mw2002 is CRITICAL puppet fail [10:51:38] PROBLEM - puppet last run on nembus is CRITICAL puppet fail [10:52:06] PROBLEM - puppet last run on cp4014 is CRITICAL Puppet has 21 failures [10:52:06] PROBLEM - puppet last run on virt1003 is CRITICAL puppet fail [10:52:07] PROBLEM - puppet last run on stat1003 is CRITICAL puppet fail [10:52:16] PROBLEM - puppet last run on wtp1005 is CRITICAL puppet fail [10:54:28] (03PS1) 10Filippo Giunchedi: graphite: split alerts role [puppet] - 10https://gerrit.wikimedia.org/r/208083 (https://phabricator.wikimedia.org/T97754) [10:55:16] PROBLEM - puppet last run on ms-be2011 is CRITICAL Puppet has 13 failures [10:57:36] RECOVERY - puppet last run on cp4010 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:57:36] RECOVERY - puppet last run on cp3048 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:57:37] RECOVERY - puppet last run on cp1066 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:57:57] PROBLEM - puppet last run on mw2092 is CRITICAL Puppet has 46 failures [10:58:06] RECOVERY - puppet last run on praseodymium is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [10:58:16] RECOVERY - puppet last run on lvs2002 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [10:58:17] PROBLEM - puppet last run on mw2136 is CRITICAL Puppet has 43 failures [10:58:17] RECOVERY - puppet last run on wtp2016 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [10:58:18] RECOVERY - puppet last run on es1004 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:58:47] RECOVERY - puppet last run on db1009 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [10:58:47] RECOVERY - puppet last run on analytics1021 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:58:47] RECOVERY - puppet last run on cp4015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:48] RECOVERY - puppet last run on cp3033 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:58:48] RECOVERY - puppet last run on cp3044 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [10:58:48] RECOVERY - puppet last run 
on cp3038 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:48] RECOVERY - puppet last run on cp3043 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:58:49] RECOVERY - puppet last run on cp3036 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:49] RECOVERY - puppet last run on cp4016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:58:56] RECOVERY - puppet last run on cp4017 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [10:59:07] RECOVERY - puppet last run on mc1015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:07] RECOVERY - puppet last run on db2044 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:59:07] RECOVERY - puppet last run on db2047 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [10:59:07] RECOVERY - puppet last run on db2017 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:59:08] RECOVERY - puppet last run on db2010 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [10:59:08] RECOVERY - puppet last run on db1068 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:59:08] RECOVERY - puppet last run on cp1074 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:59:09] RECOVERY - puppet last run on elastic1010 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [10:59:09] RECOVERY - puppet last run on palladium is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:59:10] RECOVERY - puppet last run on bast2001 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:59:10] RECOVERY - puppet last run on hydrogen is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:59:16] RECOVERY - puppet last run on cp3020 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [10:59:16] RECOVERY - puppet last run on cp3007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:16] RECOVERY - puppet last run on cp3022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:16] RECOVERY - puppet last run on cp3017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:16] RECOVERY - puppet last run on cp4011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:17] RECOVERY - puppet last run on analytics1012 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:59:17] RECOVERY - puppet last run on cp1057 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [10:59:18] RECOVERY - puppet last run on cp1073 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:26] RECOVERY - puppet last run on analytics1031 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:26] RECOVERY - puppet last run on restbase1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:27] RECOVERY - puppet last run on db1019 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [10:59:27] RECOVERY - puppet last run on db2056 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:27] RECOVERY - puppet last run on db2055 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:28] RECOVERY - 
puppet last run on cp1064 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:28] RECOVERY - puppet last run on wtp2004 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:59:37] RECOVERY - puppet last run on lvs2005 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [10:59:37] RECOVERY - puppet last run on elastic1029 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:37] RECOVERY - puppet last run on mw1157 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:59:37] RECOVERY - puppet last run on eventlog2001 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:59:46] RECOVERY - puppet last run on wtp1014 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [10:59:47] RECOVERY - puppet last run on db1049 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:47] RECOVERY - puppet last run on db2061 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:59:47] RECOVERY - puppet last run on analytics1029 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [10:59:47] RECOVERY - puppet last run on pollux is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [10:59:56] RECOVERY - puppet last run on db2048 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [10:59:56] RECOVERY - puppet last run on wtp2014 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [10:59:56] RECOVERY - puppet last run on es1003 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [10:59:56] RECOVERY - puppet last run on db1024 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [10:59:57] RECOVERY - puppet last run on ms-be1014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:57] RECOVERY - puppet last run on erbium is OK Puppet is currently enabled, last run 1 second ago with 0 failures [10:59:57] RECOVERY - puppet last run on elastic1016 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [10:59:58] RECOVERY - puppet last run on labsdb1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [10:59:58] RECOVERY - puppet last run on lvs1006 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:59:59] RECOVERY - puppet last run on ms-be2015 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:59:59] RECOVERY - puppet last run on virt1009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:00] RECOVERY - puppet last run on ms-be2009 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [11:00:17] RECOVERY - puppet last run on mw1155 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:17] RECOVERY - puppet last run on db1058 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:17] RECOVERY - puppet last run on elastic1013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:27] RECOVERY - puppet last run on cp1065 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:27] RECOVERY - puppet last run on labsdb1007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:27] RECOVERY - puppet last run on analytics1024 is OK Puppet is currently enabled, last run 58 
seconds ago with 0 failures [11:00:27] RECOVERY - puppet last run on analytics1019 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:00:28] RECOVERY - puppet last run on cp4013 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:36] RECOVERY - puppet last run on db1041 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [11:00:36] RECOVERY - puppet last run on analytics1034 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:00:36] RECOVERY - puppet last run on wtp1024 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:00:37] RECOVERY - puppet last run on ms-be1016 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [11:00:37] RECOVERY - puppet last run on ms-be1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:37] RECOVERY - puppet last run on dbproxy1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:37] RECOVERY - puppet last run on elastic1009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:38] RECOVERY - puppet last run on lanthanum is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:46] RECOVERY - puppet last run on mw1199 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on mw1058 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on mc1008 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on tmh1002 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on mw1255 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on mw1207 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:00:47] RECOVERY - puppet last run on mw1179 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:00:48] RECOVERY - puppet last run on mw1015 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [11:00:48] RECOVERY - puppet last run on db2041 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:00:49] RECOVERY - puppet last run on db1010 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:00:49] RECOVERY - puppet last run on mw2176 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:00:50] RECOVERY - puppet last run on wtp1021 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:01:06] RECOVERY - puppet last run on mw1078 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:01:06] RECOVERY - puppet last run on es1009 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:01:07] RECOVERY - puppet last run on cp1043 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:01:07] RECOVERY - puppet last run on mw1104 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:01:07] RECOVERY - puppet last run on analytics1036 is OK Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:01:07] RECOVERY - puppet last run on zirconium is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [11:01:07] RECOVERY - puppet last run on mw1127 is OK 
Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:01:08] RECOVERY - puppet last run on mw1040 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:01:08] RECOVERY - puppet last run on mw1110 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:01:09] RECOVERY - puppet last run on db2051 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:01:09] RECOVERY - puppet last run on mw1169 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:01:16] RECOVERY - puppet last run on db2035 is OK Puppet is currently enabled, last run 23 seconds ago with 0 failures [11:01:16] RECOVERY - puppet last run on mw2195 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:01:16] RECOVERY - puppet last run on mw2138 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:01:17] RECOVERY - puppet last run on mw2165 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:01:17] RECOVERY - puppet last run on mw1234 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:01:17] RECOVERY - puppet last run on stat1001 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [11:01:17] RECOVERY - puppet last run on es1006 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:01:18] RECOVERY - puppet last run on mc1011 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:01:18] RECOVERY - puppet last run on pc1001 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:01:19] RECOVERY - puppet last run on mw2129 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [11:01:19] RECOVERY - puppet last run on mw2135 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:01:36] RECOVERY - puppet last run on mw2182 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:01:36] RECOVERY - puppet last run on mw2158 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:01:36] RECOVERY - puppet last run on mw2172 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:01:36] RECOVERY - puppet last run on mw1017 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:01:37] RECOVERY - puppet last run on mw2207 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:01:37] RECOVERY - puppet last run on mw2177 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:01:37] RECOVERY - puppet last run on mw1196 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [11:01:38] RECOVERY - puppet last run on mw1021 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:01:38] RECOVERY - puppet last run on mw2203 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:02:57] RECOVERY - puppet last run on mw1161 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:02:57] RECOVERY - puppet last run on ms-be1017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:02:58] RECOVERY - puppet last run on mw1038 is OK Puppet is currently enabled, last run 18 seconds ago with 0 failures [11:02:58] RECOVERY - puppet last run on magnesium is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [11:02:59] RECOVERY - puppet last run on 
wtp1020 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:02:59] RECOVERY - puppet last run on mw2099 is OK Puppet is currently enabled, last run 26 seconds ago with 0 failures [11:03:00] RECOVERY - puppet last run on ms-be2013 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:03:00] RECOVERY - puppet last run on mw1145 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:03:01] RECOVERY - puppet last run on analytics1033 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:03:01] RECOVERY - puppet last run on mw1067 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:03:02] RECOVERY - puppet last run on mw2035 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:02] RECOVERY - puppet last run on mw2053 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:07] RECOVERY - puppet last run on mw1072 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:03:07] RECOVERY - puppet last run on cerium is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:03:07] RECOVERY - puppet last run on db1053 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:03:07] RECOVERY - puppet last run on mw1012 is OK Puppet is currently enabled, last run 10 seconds ago with 0 failures [11:03:07] RECOVERY - puppet last run on db2046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:16] RECOVERY - puppet last run on db2070 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:03:16] RECOVERY - puppet last run on mw1147 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:16] RECOVERY - puppet last run on mw2169 is OK Puppet is currently enabled, last run 25 seconds ago with 0 failures [11:03:17] RECOVERY - puppet last run on mw1080 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:03:18] RECOVERY - puppet last run on netmon1001 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:03:18] RECOVERY - puppet last run on mw2103 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [11:03:18] RECOVERY - puppet last run on ms-be1006 is OK Puppet is currently enabled, last run 7 seconds ago with 0 failures [11:03:18] RECOVERY - puppet last run on mw2102 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:18] RECOVERY - puppet last run on helium is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:03:18] RECOVERY - puppet last run on mw2106 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:18] RECOVERY - puppet last run on mw2112 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [11:03:19] RECOVERY - puppet last run on mw2032 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [11:03:36] RECOVERY - puppet last run on dbproxy1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:36] RECOVERY - puppet last run on es1001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:36] RECOVERY - puppet last run on cp1072 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:36] RECOVERY - puppet last run on mw1134 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:36] RECOVERY - puppet 
last run on mw2183 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [11:03:36] RECOVERY - puppet last run on mw2170 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:03:36] RECOVERY - puppet last run on snapshot1003 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [11:03:37] RECOVERY - puppet last run on mc2004 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:03:37] RECOVERY - puppet last run on mw1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:38] RECOVERY - puppet last run on ms-be2003 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:03:39] RECOVERY - puppet last run on ms-be2002 is OK Puppet is currently enabled, last run 59 seconds ago with 0 failures [11:03:39] RECOVERY - puppet last run on labstore2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:49] mozhet vykluchu poka [11:03:56] RECOVERY - puppet last run on cp4006 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [11:03:56] sorry, wrong channel [11:03:57] RECOVERY - puppet last run on lvs1002 is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [11:03:57] RECOVERY - puppet last run on mw1106 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:03:57] RECOVERY - puppet last run on wtp1006 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:03:57] RECOVERY - puppet last run on mw1045 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:03:58] RECOVERY - puppet last run on cp1049 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:06] RECOVERY - puppet last run on neptunium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:07] RECOVERY - puppet last run on mw1141 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:07] RECOVERY - puppet last run on potassium is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [11:04:07] RECOVERY - puppet last run on db1073 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [11:04:07] RECOVERY - puppet last run on lead is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:04:07] RECOVERY - puppet last run on mw1063 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:04:07] RECOVERY - puppet last run on db2067 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:04:07] RECOVERY - puppet last run on wtp2001 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [11:04:08] RECOVERY - puppet last run on mw2160 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:08] RECOVERY - puppet last run on mw2164 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:09] RECOVERY - puppet last run on mw1254 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:04:09] RECOVERY - puppet last run on mw2108 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:04:10] RECOVERY - puppet last run on es2008 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:04:10] RECOVERY - puppet last run on es2002 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:04:26] RECOVERY - puppet last run on labstore1003 is OK Puppet 
is currently enabled, last run 18 seconds ago with 0 failures [11:04:26] RECOVERY - puppet last run on mw1059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:26] RECOVERY - puppet last run on ms-be1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:26] RECOVERY - puppet last run on elastic1001 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:04:27] RECOVERY - puppet last run on cp1070 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:27] RECOVERY - puppet last run on xenon is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:27] RECOVERY - puppet last run on analytics1041 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:04:27] RECOVERY - puppet last run on mw1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:36] RECOVERY - puppet last run on wtp2018 is OK Puppet is currently enabled, last run 19 seconds ago with 0 failures [11:04:36] RECOVERY - puppet last run on db2039 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:36] RECOVERY - puppet last run on db2034 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:37] RECOVERY - puppet last run on wtp2005 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:04:37] RECOVERY - puppet last run on mw1007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:37] RECOVERY - puppet last run on elastic1012 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [11:04:37] RECOVERY - puppet last run on elastic1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:37] RECOVERY - puppet last run on analytics1040 is OK Puppet is currently enabled, last run 54 seconds ago with 0 failures [11:04:38] RECOVERY - puppet last run on elastic1007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:38] RECOVERY - puppet last run on mw2076 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:04:39] RECOVERY - puppet last run on mw1174 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:39] RECOVERY - puppet last run on db2019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:40] RECOVERY - puppet last run on db2002 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:04:40] RECOVERY - puppet last run on db2005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:46] RECOVERY - puppet last run on mw1187 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:46] RECOVERY - puppet last run on mw1197 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:46] RECOVERY - puppet last run on analytics1020 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:47] RECOVERY - puppet last run on mw1200 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:47] RECOVERY - puppet last run on mw1189 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:04:47] RECOVERY - puppet last run on mw1082 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:04:56] RECOVERY - puppet last run on elastic1008 is OK Puppet is currently enabled, last run 45 seconds ago with 0 failures [11:04:56] RECOVERY - puppet last 
run on ms-fe2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:56] RECOVERY - puppet last run on db2045 is OK Puppet is currently enabled, last run 42 seconds ago with 0 failures [11:04:56] RECOVERY - puppet last run on mw1224 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:57] RECOVERY - puppet last run on mw2161 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:57] RECOVERY - puppet last run on db2059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:57] RECOVERY - puppet last run on mw2213 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [11:04:57] RECOVERY - puppet last run on dbstore2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:57] RECOVERY - puppet last run on mw2124 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:58] RECOVERY - puppet last run on mc1006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:58] RECOVERY - puppet last run on mw2122 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:04:59] RECOVERY - puppet last run on mw2100 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:16] RECOVERY - puppet last run on mw1250 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:16] RECOVERY - puppet last run on db1018 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:05:16] RECOVERY - puppet last run on wtp2008 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:05:17] RECOVERY - puppet last run on mw2109 is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:05:17] RECOVERY - puppet last run on mw1150 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:05:17] RECOVERY - puppet last run on mw1088 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:05:17] RECOVERY - puppet last run on mw1026 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:17] RECOVERY - puppet last run on mw2121 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:18] RECOVERY - puppet last run on mc2001 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:05:18] RECOVERY - puppet last run on mw1205 is OK Puppet is currently enabled, last run 34 seconds ago with 0 failures [11:05:19] RECOVERY - puppet last run on lvs3001 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:05:19] RECOVERY - puppet last run on db1015 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [11:05:26] RECOVERY - puppet last run on mw1153 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:05:26] RECOVERY - puppet last run on mw1228 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:05:26] RECOVERY - puppet last run on mw1217 is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [11:05:26] RECOVERY - puppet last run on mc1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:27] RECOVERY - puppet last run on logstash1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:27] RECOVERY - puppet last run on mw1099 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [11:05:27] RECOVERY - puppet last run 
on cp4003 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:05:27] RECOVERY - puppet last run on cp3042 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:05:38] RECOVERY - puppet last run on mw1173 is OK Puppet is currently enabled, last run 51 seconds ago with 0 failures [11:05:38] RECOVERY - puppet last run on mw1164 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:05:46] RECOVERY - puppet last run on virt1006 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [11:05:47] RECOVERY - puppet last run on mw1235 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [11:05:47] RECOVERY - puppet last run on mw1003 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:05:47] RECOVERY - puppet last run on elastic1027 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:05:47] RECOVERY - puppet last run on mw2173 is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:05:57] RECOVERY - puppet last run on cp3040 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:57] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 52 seconds ago with 0 failures [11:05:57] RECOVERY - puppet last run on labsdb1003 is OK Puppet is currently enabled, last run 50 seconds ago with 0 failures [11:05:57] RECOVERY - puppet last run on cp3010 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:05:57] RECOVERY - puppet last run on cp3016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:05:57] RECOVERY - puppet last run on ruthenium is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [11:06:06] RECOVERY - puppet last run on mw1242 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:06:06] RECOVERY - puppet last run on mw1222 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:07] RECOVERY - puppet last run on iron is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:06:07] RECOVERY - puppet last run on cp1056 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [11:06:08] RECOVERY - puppet last run on mw1144 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:06:08] RECOVERY - puppet last run on mw1166 is OK Puppet is currently enabled, last run 6 seconds ago with 0 failures [11:06:13] (03PS1) 10Filippo Giunchedi: gdash: adjust deploy metrics [puppet] - 10https://gerrit.wikimedia.org/r/208085 (https://phabricator.wikimedia.org/T64667) [11:06:16] RECOVERY - puppet last run on db2065 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:06:16] RECOVERY - puppet last run on logstash1006 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:06:16] RECOVERY - puppet last run on analytics1030 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:06:16] RECOVERY - puppet last run on ms-fe2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:17] RECOVERY - puppet last run on mw1226 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:17] RECOVERY - puppet last run on mw1009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:17] RECOVERY - puppet last run on mw1068 is OK Puppet is currently enabled, last run 1 
minute ago with 0 failures [11:06:17] RECOVERY - puppet last run on mw1119 is OK Puppet is currently enabled, last run 5 seconds ago with 0 failures [11:06:17] RECOVERY - puppet last run on mw1069 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:18] RECOVERY - puppet last run on mw2123 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:06:18] RECOVERY - puppet last run on analytics1010 is OK Puppet is currently enabled, last run 11 seconds ago with 0 failures [11:06:26] RECOVERY - puppet last run on ms-be2004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:26] RECOVERY - puppet last run on es2009 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:27] RECOVERY - puppet last run on db1040 is OK Puppet is currently enabled, last run 57 seconds ago with 0 failures [11:06:27] RECOVERY - puppet last run on db1059 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:27] RECOVERY - puppet last run on mw1008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:36] RECOVERY - puppet last run on lvs2004 is OK Puppet is currently enabled, last run 30 seconds ago with 0 failures [11:06:36] RECOVERY - puppet last run on db1042 is OK Puppet is currently enabled, last run 28 seconds ago with 0 failures [11:06:36] RECOVERY - puppet last run on mw1118 is OK Puppet is currently enabled, last run 8 seconds ago with 0 failures [11:06:36] RECOVERY - puppet last run on mw1046 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:36] RECOVERY - puppet last run on db2040 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:06:36] RECOVERY - puppet last run on db2036 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:06:37] RECOVERY - puppet last run on mw2143 is OK Puppet is currently enabled, last run 0 seconds ago with 0 failures [11:06:37] RECOVERY - puppet last run on db2064 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:37] RECOVERY - puppet last run on mw2184 is OK Puppet is currently enabled, last run 35 seconds ago with 0 failures [11:06:38] RECOVERY - puppet last run on mw2163 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:38] RECOVERY - puppet last run on labvirt1003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:39] RECOVERY - puppet last run on mw1065 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [11:06:50] RECOVERY - puppet last run on mw2036 is OK Puppet is currently enabled, last run 49 seconds ago with 0 failures [11:06:50] RECOVERY - puppet last run on mw2016 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:06:51] RECOVERY - puppet last run on ms-fe1004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:51] RECOVERY - puppet last run on mw2045 is OK Puppet is currently enabled, last run 3 seconds ago with 0 failures [11:06:52] RECOVERY - puppet last run on db1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:57] RECOVERY - puppet last run on mw1170 is OK Puppet is currently enabled, last run 43 seconds ago with 0 failures [11:06:57] RECOVERY - puppet last run on mw2145 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:06:57] RECOVERY - puppet last run on mw2073 is OK Puppet is currently enabled, last run 24 
seconds ago with 0 failures [11:06:57] RECOVERY - puppet last run on mw1052 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:06:57] RECOVERY - puppet last run on db1028 is OK Puppet is currently enabled, last run 38 seconds ago with 0 failures [11:06:57] RECOVERY - puppet last run on mw1213 is OK Puppet is currently enabled, last run 21 seconds ago with 0 failures [11:07:06] RECOVERY - puppet last run on dbproxy1001 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:07:07] RECOVERY - puppet last run on virt1003 is OK Puppet is currently enabled, last run 13 seconds ago with 0 failures [11:07:07] RECOVERY - puppet last run on db1034 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:07] RECOVERY - puppet last run on mw1211 is OK Puppet is currently enabled, last run 1 second ago with 0 failures [11:07:07] RECOVERY - puppet last run on holmium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:07] RECOVERY - puppet last run on cp4014 is OK Puppet is currently enabled, last run 14 seconds ago with 0 failures [11:07:07] RECOVERY - puppet last run on db1067 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:16] RECOVERY - puppet last run on db1021 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [11:07:16] RECOVERY - puppet last run on mw1061 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:16] RECOVERY - puppet last run on cp4008 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:17] RECOVERY - puppet last run on mw1054 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:07:17] RECOVERY - puppet last run on mw1025 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:17] RECOVERY - puppet last run on mw1114 is OK Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:07:26] RECOVERY - puppet last run on mw1175 is OK Puppet is currently enabled, last run 36 seconds ago with 0 failures [11:07:26] RECOVERY - puppet last run on virt1001 is OK Puppet is currently enabled, last run 15 seconds ago with 0 failures [11:07:26] RECOVERY - puppet last run on mw1039 is OK Puppet is currently enabled, last run 48 seconds ago with 0 failures [11:07:26] RECOVERY - puppet last run on db1051 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:27] RECOVERY - puppet last run on tin is OK Puppet is currently enabled, last run 12 seconds ago with 0 failures [11:07:27] RECOVERY - puppet last run on mw1126 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:07:27] RECOVERY - puppet last run on mw1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:28] RECOVERY - puppet last run on db2038 is OK Puppet is currently enabled, last run 22 seconds ago with 0 failures [11:07:28] RECOVERY - puppet last run on wtp2012 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:28] RECOVERY - puppet last run on wtp2015 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:36] RECOVERY - puppet last run on mw2113 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:07:36] RECOVERY - puppet last run on gallium is OK Puppet is currently enabled, last run 31 seconds ago with 0 failures [11:07:36] RECOVERY - puppet last run on mw2126 is OK Puppet is currently enabled, last run 57 
seconds ago with 0 failures [11:07:36] RECOVERY - puppet last run on mw2059 is OK Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:07:36] RECOVERY - puppet last run on mw1042 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:36] RECOVERY - puppet last run on mw2022 is OK Puppet is currently enabled, last run 46 seconds ago with 0 failures [11:07:37] RECOVERY - puppet last run on mw1172 is OK Puppet is currently enabled, last run 17 seconds ago with 0 failures [11:07:37] RECOVERY - puppet last run on cp4004 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:37] RECOVERY - puppet last run on cp3008 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [11:07:38] RECOVERY - puppet last run on multatuli is OK Puppet is currently enabled, last run 32 seconds ago with 0 failures [11:07:38] RECOVERY - puppet last run on cp3014 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:39] RECOVERY - puppet last run on cp1071 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:07:56] RECOVERY - puppet last run on mw1129 is OK Puppet is currently enabled, last run 47 seconds ago with 0 failures [11:07:56] RECOVERY - puppet last run on ms-fe2003 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:57] RECOVERY - puppet last run on mw1249 is OK Puppet is currently enabled, last run 9 seconds ago with 0 failures [11:07:57] RECOVERY - puppet last run on mw2146 is OK Puppet is currently enabled, last run 58 seconds ago with 0 failures [11:07:57] RECOVERY - puppet last run on mw2092 is OK Puppet is currently enabled, last run 55 seconds ago with 0 failures [11:07:57] RECOVERY - puppet last run on mw2095 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:07:57] RECOVERY - puppet last run on mw2096 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:07] RECOVERY - puppet last run on labnet1001 is OK Puppet is currently enabled, last run 40 seconds ago with 0 failures [11:08:17] RECOVERY - puppet last run on polonium is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:18] RECOVERY - puppet last run on lvs2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:18] RECOVERY - puppet last run on db1052 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:18] RECOVERY - puppet last run on es1007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on labcontrol2001 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on mw1011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on mw2097 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on mw2136 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on mw2079 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:19] RECOVERY - puppet last run on mw2017 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:20] RECOVERY - puppet last run on db2007 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:26] RECOVERY - puppet last run on mw2134 is OK Puppet is currently enabled, last run 1 
minute ago with 0 failures [11:08:26] RECOVERY - puppet last run on mw2011 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:26] RECOVERY - puppet last run on mw2056 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:26] RECOVERY - puppet last run on db2029 is OK Puppet is currently enabled, last run 41 seconds ago with 0 failures [11:08:27] RECOVERY - puppet last run on nembus is OK Puppet is currently enabled, last run 33 seconds ago with 0 failures [11:08:36] RECOVERY - puppet last run on snapshot1001 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:08:37] RECOVERY - puppet last run on mw2190 is OK Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:08:37] RECOVERY - puppet last run on mw2090 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:37] RECOVERY - puppet last run on ms-be2011 is OK Puppet is currently enabled, last run 56 seconds ago with 0 failures [11:08:37] RECOVERY - puppet last run on mw2030 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:47] RECOVERY - puppet last run on db1016 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:47] RECOVERY - puppet last run on stat1003 is OK Puppet is currently enabled, last run 27 seconds ago with 0 failures [11:08:47] RECOVERY - puppet last run on pc1002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:56] RECOVERY - puppet last run on cp3006 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:56] RECOVERY - puppet last run on mw1055 is OK Puppet is currently enabled, last run 53 seconds ago with 0 failures [11:08:57] RECOVERY - puppet last run on wtp1005 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:06] RECOVERY - puppet last run on analytics1038 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:06] RECOVERY - puppet last run on mw1044 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:17] RECOVERY - puppet last run on wtp2019 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:09:17] RECOVERY - puppet last run on mw2062 is OK Puppet is currently enabled, last run 20 seconds ago with 0 failures [11:09:26] RECOVERY - puppet last run on mw1149 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [11:10:06] RECOVERY - puppet last run on mw2002 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [11:17:26] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 13.33% of data above the critical threshold [500.0] [11:20:48] (03CR) 10Glaisher: Modify AbuseFilter block configuration on eswikibooks (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206510 (https://phabricator.wikimedia.org/T96669) (owner: 10Glaisher) [11:21:36] (03PS2) 10Glaisher: Modify AbuseFilter block configuration on eswikibooks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206510 (https://phabricator.wikimedia.org/T96669) [11:30:36] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [11:48:20] (03PS1) 10TTO: Restrict changetags right to sysops and bots only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) [11:55:03] (03CR) 10Glaisher: Create Wikipedia Konkani (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206300 
(https://phabricator.wikimedia.org/T96468) (owner: 10Dzahn) [12:07:08] . [12:07:09] did usa intelligence supply isis with weapons like they did with al-qaeda to justify creating wars? [12:07:09] did usa excute the creative mess in the middle east like they said they will, does the creative mess include explosions with uncertain responsibles to create wars? [12:07:09] plz, send my qs to help limiting usa & israel aggression against others& may then lessen number of people killed in the middle east. [12:07:09] .did usa intelligence supply isis with weapons like they did with al-qaeda to justify creating wars? [12:12:43] (03CR) 10Glaisher: Create Wikipedia Konkani (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/206300 (https://phabricator.wikimedia.org/T96468) (owner: 10Dzahn) [12:14:57] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 7.14% of data above the critical threshold [500.0] [12:28:07] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0] [12:47:01] (03CR) 10Dereckson: [C: 04-1] "Some others users have expressed the idea gadgets, scripts through the APIs would benefit of this changetags capability." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [13:10:36] PROBLEM - MySQL InnoDB on db1040 is CRITICAL: CRIT longest blocking idle transaction sleeps for 622 seconds [13:12:47] PROBLEM - MySQL Idle Transactions on db1040 is CRITICAL: CRIT longest blocking idle transaction sleeps for 754 seconds [13:14:27] RECOVERY - MySQL Idle Transactions on db1040 is OK longest blocking idle transaction sleeps for 0 seconds [13:15:36] RECOVERY - MySQL InnoDB on db1040 is OK longest blocking idle transaction sleeps for 0 seconds [13:27:56] (03CR) 10Steinsplitter: [C: 031] Restrict changetags right to sysops and bots only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [13:28:48] (03CR) 10Steinsplitter: "There was never consensus to deploy this... Why we need consensus to switch it off? Simply a joke...?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [13:47:58] (03CR) 10Andrew Bogott: [C: 032] Disable instance pausing in Horizon. [puppet] - 10https://gerrit.wikimedia.org/r/208066 (owner: 10Andrew Bogott) [14:00:50] 6operations, 10Analytics-Cluster, 5Patch-For-Review: Verify kafkatee use for fundraising logs on erbium - https://phabricator.wikimedia.org/T97676#1251830 (10Ottomata) It should be near realtime, maybe slightly more (less than a second) latency than udp2log. Unless, there is kafka broker downtime (which can... [14:05:44] 6operations, 7Monitoring: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1251834 (10Ottomata) Do you need all timings, or do you want things like average time to first byte sent to client. I know that at least is available in varnishncsa, maybe the other ReqEnd timings are as well? In that case... [14:11:56] (03CR) 10Eevans: "I'm awake!" [puppet] - 10https://gerrit.wikimedia.org/r/207989 (owner: 10GWicke) [14:12:44] (03CR) 10Eevans: [C: 031] Increase new generation size to 1/4 heap [puppet] - 10https://gerrit.wikimedia.org/r/207989 (owner: 10GWicke) [14:17:15] akosiaris, ping [14:17:22] https://phabricator.wikimedia.org/T97606 [14:17:44] any help is welcome if you have a sec :) [14:18:18] (03CR) 10Sjoerddebruin: [C: 031] "Per my comments on the task." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [14:24:44] (03PS1) 10Ottomata: Add --line-buffered to 5xx grep command for ops kafkatee instance [puppet] - 10https://gerrit.wikimedia.org/r/208106 [14:27:07] 6operations, 10Analytics-Cluster: Package kafkacat - https://phabricator.wikimedia.org/T97771#1251864 (10Ottomata) 3NEW a:3Ottomata [14:28:26] (03CR) 10Ottomata: [C: 032] Add --line-buffered to 5xx grep command for ops kafkatee instance [puppet] - 10https://gerrit.wikimedia.org/r/208106 (owner: 10Ottomata) [14:28:30] (03CR) 10Steinsplitter: "switched off by nlwiki: https://nl.wikipedia.org/w/index.php?title=MediaWiki:Common.css&diff=44070070&oldid=43940069 other wikis will com" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [14:31:22] akosiaris: ok if labs instance ‘jessiepackager’ has a few minutes of downtime? [14:31:54] chasemp: same question regarding phab-02 [14:32:11] yes [14:33:04] chasemp: thanks! It’ll be back soon. [14:33:14] (03CR) 10Perhelion: [C: 031] Restrict changetags right to sysops and bots only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208088 (https://phabricator.wikimedia.org/T97013) (owner: 10TTO) [14:34:43] yurik: he may be off today [14:38:42] Coren: how about instance ‘relic’ in project ‘toolserver-legacy'? [14:39:25] That's the mail relay / web redirector. So long as it keeps its public IP you can migrate it. [14:39:51] ‘k thanks [14:40:31] gwicke urandom merging https://gerrit.wikimedia.org/r/#/c/207989/3 [14:40:37] RECOVERY - Host labstore1002 is UPING OK - Packet loss = 0%, RTA = 1.62 ms [14:40:42] godog: cool, thx! [14:40:49] (03PS4) 10Filippo Giunchedi: Increase new generation size to 1/4 heap [puppet] - 10https://gerrit.wikimedia.org/r/207989 (owner: 10GWicke) [14:40:55] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Increase new generation size to 1/4 heap [puppet] - 10https://gerrit.wikimedia.org/r/207989 (owner: 10GWicke) [14:40:56] Coren: labstore1002 lives! [14:41:04] 6operations, 10Wikidata, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 5Patch-For-Review: Create Wikipedia Konkani - https://phabricator.wikimedia.org/T96468#1251881 (10Krenair) So before we create this wiki, we need those translations to be approved and deployed, right? [14:42:50] chasemp, thanks! [14:44:42] andrewbogott: Indeed. A badly seated raid controller card, if you can beleive it. [14:45:12] * Coren briefly considers the switch. [14:45:13] um… that’s not reassuring [14:45:19] Not on a Friday. [14:45:46] andrewbogott: I was a little surprised that's even possible in modern enclosures but hey. [14:46:09] Coren: yeah [14:48:46] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 77.78% of data above the critical threshold [24.0] [14:49:34] Oh, what now. [14:50:21] !log slowly restarting restbase100*.eqiad to apply new gen size change [14:50:29] Logged the message, Master [14:51:19] (03PS1) 10Filippo Giunchedi: mwprof: deprecation [puppet] - 10https://gerrit.wikimedia.org/r/208108 (https://phabricator.wikimedia.org/T97509) [14:51:23] Hm, bit sluggish but not catastrophically so. [14:52:06] PROBLEM - Host labvirt1005 is DOWN: PING CRITICAL - Packet loss = 100% [14:53:43] Now we get to find what haunts virt1005's ram. :-) [14:53:50] "badly seated simm" :-) [14:54:42] shower-breakfast. 
bbiab [14:57:28] 6operations: Choose a consistent, distributed k/v storage for configuration management/discovery - https://phabricator.wikimedia.org/T95656#1251903 (10BBlack) >>! In T95656#1198202, @hashar wrote: > Out of curiosity, since analytics already uses zookeeper for hive/kafka, maybe it should be given a try first and... [14:57:50] haha, oh there is a ticket and an email thread about this?! [14:57:50] haha [15:00:22] 6operations, 10ops-codfw: labstore1002 does not pass POST - https://phabricator.wikimedia.org/T97688#1251932 (10Cmjohnson) 5Open>3Resolved I booted labstore1002 and post hung and attempting to initialize the h800 raid card. I opened the server up and re-seated the card and for good measure drained flea pow... [15:01:26] _joe_: qq, are the pybalish configs something that would be a candidate for use by discovery service? [15:01:52] rather than editing a file that is grabbed via http, pybal would ask disc. services? [15:02:02] just curious. [15:02:19] 6operations: Choose a consistent, distributed k/v storage for configuration management/discovery - https://phabricator.wikimedia.org/T95656#1251939 (10Ottomata) > I don't think analytics use of ZK is much of an argument here, either. +1. we have 3 ZK servers that are used by Kafka for leader election, and for o... [15:03:50] ottomata: yeah the first targets for this are pybal and varnish backend config/state/weight, etc [15:04:42] pybal we can code access to the kvstore directly with a python client lib. varnish we'll have to have some other tool/daemon watching relevant data changes and re-writing a templated VCL fragment + executing a reload-vcl command, basically. [15:06:33] aye [15:06:56] i get the pybal thing, but why is service discovery better than puppet for varnish? [15:07:32] 6operations, 10ops-eqiad, 6Labs, 10Labs-Infrastructure: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1251945 (10Cmjohnson) Post Error Inlet Ambient Temperature: 17C/62F 207-Memory initialization error on Processor 1 Socket 4. The operating system may not have access to all of the... [15:07:51] well in the varnish case, the critical problem here is the varnish<->varnish server lists for tiers/layers, where varnish knows about lists of other varnishes (it's not such an issue when we're talking about varnish->LVS service backend) [15:08:13] so that we can runtime-depool a varnish machine from both pybal->varnish and varnish<->varnish easily for reboots and such [15:08:38] hmm, ahhh, right, k [15:08:41] doing that through puppet is a nightmare when you have a phab task that says "hey we need to reboot all 100 caches, and not all at the same time" [15:08:53] currently the backends are all in that big puppet hash, right? [15:08:55] like this one: https://phabricator.wikimedia.org/T96854 [15:08:59] right [15:09:08] yeah, makes sense [15:10:40] currently, the most-correct and least-impactful way to do that is, "commit a change to puppet to comment out the varnish server, wait 30 minutes for puppet to run everywhere, edit pybal file to depool there as well, wait another couple of minutes, reboot box, repool in pybal, commit another change through puppet to repool for varnish, wait another 30 minutes" [15:10:46] times ~100 machines [15:11:03] but I tend to take some shortcuts in practice that work out ok, but still... 
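A minimal sketch of the varnish-side watcher described above: read the pooled/depooled state of the cache backends out of the key/value store, rewrite a templated VCL fragment, and run the reload-vcl command. Only the overall shape follows the discussion; the etcd endpoint, the /conf/cache/... key layout, the backend port and the output path are all invented for illustration, and etcd's v2 HTTP API is used so the only dependency is requests.

```python
#!/usr/bin/env python
# One pass of a hypothetical cache-backend watcher: fetch pooled hosts from
# etcd, rewrite a VCL fragment listing them as backends, then reload VCL.
import subprocess
import requests

ETCD = 'http://etcd.example.wmnet:2379'      # assumed etcd endpoint
KEY = '/v2/keys/conf/cache/eqiad'            # assumed key prefix, one key per host
VCL_FRAGMENT = '/etc/varnish/backends.inc.vcl'

def pooled_backends():
    """Return hostnames whose value in the key store is 'pooled'."""
    resp = requests.get(ETCD + KEY, params={'recursive': 'true'})
    resp.raise_for_status()
    nodes = resp.json().get('node', {}).get('nodes', [])
    return sorted(n['key'].rsplit('/', 1)[-1]
                  for n in nodes if n.get('value') == 'pooled')

def render_vcl(hosts):
    """Tiny stand-in for a real template: one backend definition per host."""
    lines = []
    for host in hosts:
        name = host.replace('.', '_').replace('-', '_')
        lines.append('backend be_%s { .host = "%s"; .port = "3128"; }' % (name, host))
    return '\n'.join(lines) + '\n'

def main():
    with open(VCL_FRAGMENT, 'w') as f:
        f.write(render_vcl(pooled_backends()))
    # 'reload-vcl' is the command named in the discussion; its exact
    # interface is an assumption here.
    subprocess.check_call(['reload-vcl'])

if __name__ == '__main__':
    main()
```

In practice this would run as a long-lived daemon (or hang on an etcd watch) rather than as a one-shot script, and the template would come from configuration management rather than being inlined.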
[15:12:08] aye [15:12:24] what we want is to leave all static config alone, and somewhere type "cache-depool cp4011.ulsfo.wmnet" and have it magically do all that through etcd, and then after it's back up do "cache-repool cp4011.ulsfo.wmnet" [15:12:55] bblack: what was the reason that the proxies are upgraded to linux 3.19? it's got a bug in AES so it looks like we'll need to upgrade to 4.0, but it would be even better if we could revert to jessie's 3.16. [15:12:57] and then maybe get smart and fancy and hook those commands right into the initscripts of the machine itself, to handle itself on reboot (or if a little timing lag is needed, make a reboot script that can run on the host itself and time out the commands + reboot) [15:13:43] jgage: I was under the impression the bug you were concerned about was only for 256-bit and we were going 128, thus non-issue? [15:14:22] there are vm subsystem issues in kernels earlier than 3.18 that make those kernels not acceptable for the upload caches [15:14:23] there's a new one with AES, possibly remotely exploitable. https://security-tracker.debian.org/tracker/CVE-2015-3331 [15:14:31] hm ok [15:14:52] see the ticket and the linked ticket, re: convo w/ moritz [15:15:16] "If an IPsec tunnel is configured to use this mode (also known as AES-GCM-ESP) this can lead to memory corruption and crashes (even without malicious traffic). This could potentially also result in remote code execution. [15:15:20] " [15:15:21] thanks ok [15:15:33] jgage: that's not one of the CVE's fixed in 3.19.3? [15:16:05] it came out after 3.19.3, but i'll check for an update [15:16:12] it was fixed upstream in 4.0-rc5 [15:16:21] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1251977 (10greg) Yo @chasemp, is this fallout from that change from yesterday? [15:16:44] right now sid is on 3.16 and experimental is on 4.0 so i don't think debian is maintaining a 3.19.x with updates [15:17:04] they're not maintaining 4.0 either [15:17:31] https://phabricator.wikimedia.org/T96854#1241487 [15:18:13] the point is to get on something that will eventually have a maintained package, which will end up being the 3.19-ckt series once it exists. For now moritzm is building us a non-trunk 3.19 build in https://phabricator.wikimedia.org/T97411 (expected early next week) [15:18:27] you might want to mention that CVE in that latter ticket. He could probably backport that. [15:18:33] cool ok [15:18:37] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [15:21:13] !log doing java security update on kafka brokers, doing rolling restarts [15:21:20] Logged the message, Master [15:21:58] 6operations: Purge > 90 days stat1002:/a/squid/archive/edits - https://phabricator.wikimedia.org/T92339#1251994 (10kevinator) per an email from @ezachte, it is ok to start & automate removal of files older than 90 days. - /a/squid/archive/edits **old destination of logs** - /a/log/webrequest/archive/edits/ **n... 
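The cache-depool / cache-repool commands wished for above then reduce to flipping a single key per host. A hypothetical companion sketch, using the same invented key layout and etcd's v2 HTTP API; nothing here describes a tool that actually existed at the time.

```python
#!/usr/bin/env python
"""Flip a cache host's pooled state in etcd (sketch).

Usage: cache-pool.py depool cp4011.ulsfo.wmnet
       cache-pool.py repool cp4011.ulsfo.wmnet
pybal and the VCL watcher would then pick the change up from the key store.
"""
import sys
import requests

ETCD = 'http://etcd.example.wmnet:2379'      # assumed etcd endpoint

def set_state(host, state):
    # The datacenter is taken from the hostname purely for illustration.
    dc = host.split('.')[1]
    url = '%s/v2/keys/conf/cache/%s/%s' % (ETCD, dc, host)
    resp = requests.put(url, data={'value': state})
    resp.raise_for_status()

if __name__ == '__main__':
    if len(sys.argv) != 3 or sys.argv[1] not in ('depool', 'repool'):
        sys.exit('usage: cache-pool.py depool|repool <host>')
    state = 'pooled' if sys.argv[1] == 'repool' else 'depooled'
    set_state(sys.argv[2], state)
    print('%s -> %s' % (sys.argv[2], state))
```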
[15:22:06] 6operations: Purge > 90 days stat1002:/a/squid/archive/edits - https://phabricator.wikimedia.org/T92339#1251995 (10kevinator) p:5Low>3High [15:26:38] bblack: cool, i have confirmed that the fix is in 3.19.3 [15:29:07] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1022 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 15.0 [15:29:28] shhhh [15:29:31] i know, i'm doin gthings [15:29:46] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1018 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 21.0 [15:29:49] !log finished restarting cassandra nodes on restbase100*.eqiad [15:29:56] Logged the message, Master [15:30:18] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [15:32:05] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1252019 (10chasemp) >>! In T97775#1251977, @greg wrote: > Yo @chasemp, is this fallout from that change from yesterday? This would be an issue with trebuchet umask I imagine so... [15:32:46] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1022 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0 [15:33:26] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1018 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0 [15:33:37] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [15:41:09] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1252055 (10mobrovac) >>! In T97775#1252019, @chasemp wrote: > If anything this should really be `sudo chown -R trebuchet:deployment` Current practice seems to disagree on that... [15:42:27] (03PS5) 10Paladox: Change setting name from $wmincClosedWikis to $wgwmincClosedWikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/207909 [15:45:26] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1252075 (10chasemp) sure, I mean all of those should be owned by trebuchet and deployment since deployment is the group for deployers. wikidev is the default group for all user... [15:49:17] PROBLEM - puppet last run on cp3037 is CRITICAL puppet fail [15:50:01] !log anomie Synchronized php-1.26wmf4/includes/: Deploy [[gerrit:208109]] to reduce the complaining about the new feature (duration: 00m 24s) [15:50:09] Logged the message, Master [15:51:09] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1252101 (10thcipriani) There are a couple of approaches out there to solve this problem. Check out: https://gerrit.wikimedia.org/r/#/c/201344/2 I have another patch that is a l... [15:54:55] 6operations, 10Architecture, 10MediaWiki-RfCs, 10RESTBase, and 4 others: RFC: Request timeouts and retries - https://phabricator.wikimedia.org/T97204#1252119 (10ssastry) Interesting discussion. Thank you everyone. So, I think there is a useful distinction that has emerged here. I think internal services li... 
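The 90-day cleanup approved in the stat1002 purge task above is small enough to sketch. The two target paths are the ones named in the task comment; the dry-run-by-default behaviour and the --really flag are assumptions, not a description of the job that was actually put in place.

```python
#!/usr/bin/env python
# Delete files older than 90 days under the old edits archive directories.
# Prints what it would remove unless invoked with --really.
import os
import sys
import time

MAX_AGE_DAYS = 90
TARGETS = [
    '/a/squid/archive/edits',
    '/a/log/webrequest/archive/edits/',
]

def purge(root, cutoff, dry_run=True):
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                print('removing %s' % path)
                if not dry_run:
                    os.remove(path)

if __name__ == '__main__':
    dry_run = '--really' not in sys.argv
    cutoff = time.time() - MAX_AGE_DAYS * 86400
    for target in TARGETS:
        if os.path.isdir(target):
            purge(target, cutoff, dry_run=dry_run)
```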
[15:58:13] !log anomie Synchronized php-1.26wmf3/includes/: Deploy [[gerrit:208109]] to reduce the complaining about the new feature (duration: 00m 28s) [15:58:18] Logged the message, Master [15:59:47] (03PS1) 10GWicke: Increase CMSInitiatingOccupancyFraction from 70 to 78% [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/208115 [16:00:27] 6operations, 10Architecture, 10MediaWiki-RfCs, 10RESTBase, and 4 others: RFC: Request timeouts and retries - https://phabricator.wikimedia.org/T97204#1252126 (10ssastry) There is one other thing that follows ... If you want to pick low concurrency in a client (Parsoid), you also need to pick a lower initia... [16:02:46] 6operations, 7Wikimedia-log-errors: internal_api_error_Exception: [22e05a83] Exception Caught: wfDiff(): popen() failed errors on English Wikipedia - https://phabricator.wikimedia.org/T97145#1252127 (10Multichill) +1 on closing. Haven't seen this for a while. [16:03:37] (03CR) 10Eevans: [C: 031] Increase CMSInitiatingOccupancyFraction from 70 to 78% [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/208115 (owner: 10GWicke) [16:04:29] (03PS2) 10GWicke: Increase CMSInitiatingOccupancyFraction from 70 to 78% [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/208115 [16:04:48] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1252130 (10fgiunchedi) ok as per related T97692 we're aiming at 54T per cluster, thus +36T in eqiad and +54T in codfw or 90T total, split among 30 new machines for option 1 or 45 for opti... [16:05:06] Krenair: do you get payed for your commoents on phabricator? [16:05:06] 7Puppet, 6operations, 10Beta-Cluster: Trebuchet on deployment-bastion: wrong group owner - https://phabricator.wikimedia.org/T97775#1252131 (10chasemp) >>! In T97775#1252101, @thcipriani wrote: > There are a couple of approaches out there to solve this problem. Check out: https://gerrit.wikimedia.org/r/#/c/2... [16:05:07] PROBLEM - Varnishkafka Delivery Errors per minute on cp4013 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:05:18] PROBLEM - Varnishkafka Delivery Errors per minute on cp4006 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:05:19] PROBLEM - Varnishkafka Delivery Errors per minute on cp4014 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:05:19] PROBLEM - Varnishkafka Delivery Errors per minute on cp4005 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:05:34] gwicke: https://gerrit.wikimedia.org/r/#/c/208115 good to merge? [16:05:36] (03CR) 10Eevans: [C: 031] Increase CMSInitiatingOccupancyFraction from 70 to 78% [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/208115 (owner: 10GWicke) [16:05:47] RECOVERY - puppet last run on cp3037 is OK Puppet is currently enabled, last run 4 seconds ago with 0 failures [16:05:55] Steinsplitter, I get paid per hour of work I do. So it depends whether I was recording the times when I posted it. Why? [16:06:19] Oh, you're mad about my comments on the deployment thing? [16:06:25] Yeah I do those as a volunteer. [16:06:27] PROBLEM - Varnishkafka Delivery Errors per minute on cp4007 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:06:39] ok [16:08:22] godog: yes, it's good to go [16:08:47] All of my deployment stuff is as a volunteer. [16:08:57] I was careful to be very explicit about that on my access request. 
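As a generic illustration of the client-side policy being weighed in the timeouts-and-retries RFC comments above (a per-request timeout plus a small, bounded retry budget, tuned lower for low-concurrency clients such as Parsoid), here is a sketch. The numbers are placeholders, not values proposed in the RFC.

```python
import time
import requests

def fetch_with_retries(url, timeout=5.0, attempts=3, backoff=0.5):
    """GET url with a per-attempt timeout and a bounded retry budget.

    Placeholder policy: 5s timeout, 3 attempts, exponential backoff
    starting at 0.5s. Only 5xx responses and transport errors are retried.
    """
    resp = None
    last_exc = None
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.status_code < 500:
                return resp          # success or a client error: do not retry
        except requests.RequestException as exc:
            last_exc = exc
        if attempt + 1 < attempts:
            time.sleep(backoff * (2 ** attempt))    # 0.5s, 1s, 2s, ...
    if resp is not None:
        return resp                  # give back whatever 5xx we last saw
    raise last_exc
```

Retry budgets at different layers multiply (a client retry on top of a proxy retry on top of a backend retry), which is one reason such limits have to be chosen per layer and kept small.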
[16:09:47] RECOVERY - Varnishkafka Delivery Errors per minute on cp4007 is OK Less than 1.00% above the threshold [0.0] [16:09:57] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] Increase CMSInitiatingOccupancyFraction from 70 to 78% [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/208115 (owner: 10GWicke) [16:10:06] RECOVERY - Varnishkafka Delivery Errors per minute on cp4013 is OK Less than 1.00% above the threshold [0.0] [16:10:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp4006 is OK Less than 1.00% above the threshold [0.0] [16:11:56] RECOVERY - Varnishkafka Delivery Errors per minute on cp4005 is OK Less than 1.00% above the threshold [0.0] [16:11:57] 6operations, 10ops-eqiad, 6Labs, 10Labs-Infrastructure: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1252148 (10Cmjohnson) Moved bad DIMM module to processor 2 socket 4 to see if the error will follow the DIMM. After rebooting the error returned to the same socket. Post message b... [16:12:46] PROBLEM - puppet last run on analytics1022 is CRITICAL Puppet has 1 failures [16:13:36] RECOVERY - Varnishkafka Delivery Errors per minute on cp4014 is OK Less than 1.00% above the threshold [0.0] [16:14:47] PROBLEM - Kafka Broker Under Replicated Partitions on analytics1021 is CRITICAL: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value CRITICAL: 28.0 [16:14:51] shhhh [16:14:55] i scheduled icinga downtime, sheesh. [16:16:31] (03PS1) 10Filippo Giunchedi: cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/208116 [16:16:43] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] cassandra: update submodule [puppet] - 10https://gerrit.wikimedia.org/r/208116 (owner: 10Filippo Giunchedi) [16:17:30] gwicke: merged [16:18:27] RECOVERY - Kafka Broker Under Replicated Partitions on analytics1021 is OK: kafka.server.ReplicaManager.UnderReplicatedPartitions.Value OKAY: 0.0 [16:20:36] PROBLEM - Varnishkafka Delivery Errors per minute on cp4015 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:21:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4014 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:21:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4005 is CRITICAL 11.11% of data above the critical threshold [20000.0] [16:23:20] 6operations, 10ops-eqiad: analytics1016 down - https://phabricator.wikimedia.org/T97349#1252193 (10Ottomata) 5Open>3Resolved [16:25:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp4014 is OK Less than 1.00% above the threshold [0.0] [16:25:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp4005 is OK Less than 1.00% above the threshold [0.0] [16:27:07] RECOVERY - Varnishkafka Delivery Errors per minute on cp4015 is OK Less than 1.00% above the threshold [0.0] [16:28:21] paravoid: can i share your 'the case for debian' email to the ops list publicly? Not like a blog or anything, just a couple people i know [16:28:53] its from back in november [16:30:56] RECOVERY - puppet last run on analytics1022 is OK Puppet is currently enabled, last run 1 minute ago with 0 failures [16:42:56] 6operations, 7Monitoring: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1252218 (10fgiunchedi) the more timings we have the better, but we can add that later and get something in place now to replace reqstats re: pipe let's see how it'd look like but yeah I see what you mean they'd be all indep... [16:52:57] godog: to send a metric via statsd in eqiad, what host:port should I use now? 
[16:53:57] ottomata: statsd.eqiad.wmnet:8125 will do! [16:54:22] ottomata: related, I think we're good to merge next week https://gerrit.wikimedia.org/r/#/c/207805/ [16:54:26] udp? [16:54:32] yep udp [16:55:23] 6operations, 10Deployment-Systems, 7Graphite, 5Patch-For-Review: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1252284 (10bd808) >>! In T64667#1251557, @fgiunchedi wrote: > > @bd808 is scap sending statsd counters? if so the names will change upon next... [17:06:19] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1252320 (10GWicke) > re: option 2 above how much hw would we need to realistically test that? @fgiunchedi, we are already getting close to that with the existing boxes. The largest one c... [17:09:32] 6operations, 10Deployment-Systems, 7Graphite, 5Patch-For-Review: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1252345 (10fgiunchedi) >>! In T64667#1252284, @bd808 wrote: >>>! In T64667#1251557, @fgiunchedi wrote: >> >> @bd808 is scap sending statsd co... [17:10:36] PROBLEM - Persistent high iowait on labstore1001 is CRITICAL 66.67% of data above the critical threshold [35.0] [17:11:11] 6operations, 6Security, 10Wikimedia-General-or-Unknown, 7Mail: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#1252359 (10TheDJ) @Technical13 No. As far as I can see I think all that needs to be done is to flip wgUserEmailUseReplyTo and emails w... [17:24:30] (03PS2) 10Ori.livneh: mwprof: deprecation [puppet] - 10https://gerrit.wikimedia.org/r/208108 (https://phabricator.wikimedia.org/T97509) (owner: 10Filippo Giunchedi) [17:24:50] (03CR) 10Ori.livneh: [C: 032 V: 032] "https://www.youtube.com/watch?v=TFLRHPUWBI8" [puppet] - 10https://gerrit.wikimedia.org/r/208108 (https://phabricator.wikimedia.org/T97509) (owner: 10Filippo Giunchedi) [17:26:00] (Cannot access the database: Unknown database 'dawiki' (10.68.17.94)) [17:26:03] beta cluster ^ [17:26:39] (03CR) 10Ori.livneh: [C: 04-1] "useless comment is useless" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/208083 (https://phabricator.wikimedia.org/T97754) (owner: 10Filippo Giunchedi) [17:27:16] Glaisher, it really does not exist [17:27:24] what are you doing that expects it? [17:27:31] http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:GlobalRenameRequest [17:27:32] accessing the database! [17:27:35] heh [17:27:58] Huh. [17:28:23] legoPanda: ^ [17:28:39] gah [17:28:48] YuviKTM: ^ [17:28:58] :D [17:29:26] :D [17:29:45] it is _really_ annoying btw [17:30:43] 6operations, 10ops-eqiad, 6Labs, 10Labs-Infrastructure: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1252476 (10Cmjohnson) The cpu changed did nothing 207-Memory initialization error on Processor 1 Socket 4. The operating system may not have access to all of the memory installed i... [17:32:00] twentyafterfour: is there no .dblist for group1 wikis? [17:32:54] (03PS2) 10Filippo Giunchedi: graphite: split alerts role [puppet] - 10https://gerrit.wikimedia.org/r/208083 (https://phabricator.wikimedia.org/T97754) [17:33:15] (03CR) 10Filippo Giunchedi: graphite: split alerts role (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/208083 (https://phabricator.wikimedia.org/T97754) (owner: 10Filippo Giunchedi) [17:33:57] ori, I think there's a commit to add that somewhere? 
[17:34:08] not merged [17:35:34] Krenair: how do you practically determine what they are? [17:35:56] in other words, in the absence of a dblist, how are group1 versions manipulated? [17:36:07] PROBLEM - puppet last run on analytics1001 is CRITICAL Puppet has 1 failures [17:36:08] ori: correct, you reset all, then subtract group 2 from group 0 to get group 1? [17:36:09] not test, zerowiki, mw.org, not wikipedia? [17:36:10] all - group0 - wikipedia? [17:36:28] er yeah [17:36:52] there is a script in multiwiki subdir that automates it [17:37:07] yeah, but it can't be used to apply a setting in initialisesettings.php since it's not a concrete dblist [17:37:07] oh well [17:37:44] ori: the script is multiwiki/updateGroup1 [17:37:50] oh [17:38:11] we could generate the list pretty easily right? [17:38:17] and store it as a real dblist [17:38:41] yes, but then there's the risk of it being out of sync [17:38:48] it's kind of like a virtual table or a view in a database [17:38:52] I thought someone already did.. [17:38:58] it's an expression [17:39:16] maybe a .dblist could actually include some arithmetic expression [17:39:29] like: group1.dblist: # all.dblist - group0.dblist - wikipedia.dblist [17:40:41] 10Ops-Access-Requests, 6operations, 6Release-Engineering, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1252553 (10Dzahn) p:5Triage>3Normal [17:41:06] 10Ops-Access-Requests, 6operations: Give Neil Quinn access to stats1003.eqiad.wmnet - https://phabricator.wikimedia.org/T97746#1252557 (10Dzahn) [17:41:40] 10Ops-Access-Requests, 6operations: Give Neil Quinn access to stats1003.eqiad.wmnet - https://phabricator.wikimedia.org/T97746#1251292 (10Dzahn) p:5Triage>3Normal [17:42:18] 6operations, 10Analytics-EventLogging, 5Patch-For-Review: Reclaim vanadium, move to spares - https://phabricator.wikimedia.org/T95566#1252576 (10Cmjohnson) Wiping [17:43:01] (03CR) 10Alex Monk: [C: 04-1] "That user doesn't actually appear to exist..." [puppet] - 10https://gerrit.wikimedia.org/r/207846 (https://phabricator.wikimedia.org/T97642) (owner: 10Dzahn) [17:49:09] 6operations, 10ops-eqiad, 5Patch-For-Review: reclaim tungsten as spare - https://phabricator.wikimedia.org/T97274#1252594 (10Cmjohnson) Wiping process has started [17:50:54] bblack, how do template params get applied in systemd.erb templates if they are not defined in scope of the file resource that renders the template? [17:51:00] e.g in varnish .systemd.erb? [17:52:27] RECOVERY - puppet last run on analytics1001 is OK Puppet is currently enabled, last run 24 seconds ago with 0 failures [17:53:06] RECOVERY - Persistent high iowait on labstore1001 is OK Less than 50.00% above the threshold [25.0] [17:54:25] Glaisher: can you file a bug for that? [17:54:34] that what? [17:54:37] dawiki thing? [17:55:51] Glaisher: yes [17:55:57] ok [17:57:13] legoktm: https://phabricator.wikimedia.org/T97813?workflow=create [17:57:38] 6operations, 10ops-eqiad: Failed disk db1004 - https://phabricator.wikimedia.org/T97814#1252622 (10Cmjohnson) 3NEW a:3Cmjohnson [17:58:11] 6operations, 10ops-eqiad: Failed disk db1003 - https://phabricator.wikimedia.org/T97815#1252632 (10Cmjohnson) 3NEW a:3Cmjohnson [17:58:36] 6operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#1252642 (10GWicke) For the overall costs it would be interesting to know an order-of-magnitude number for housing 1U for a couple of years. 
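The group1 arithmetic discussed above ("# all.dblist - group0.dblist - wikipedia.dblist") amounts to a set difference over dblist files. A minimal sketch, assuming one database name per line and '#' starting a comment; the file names come from the conversation, everything else is illustrative:

```python
def read_dblist(path):
    """Read a .dblist file: one wiki db name per line, '#' starts a comment."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.strip().startswith('#')]

all_dbs = read_dblist('all.dblist')
excluded = set(read_dblist('group0.dblist')) | set(read_dblist('wikipedia.dblist'))

# group1 is everything in all.dblist that is in neither group0 nor wikipedia,
# keeping the ordering of all.dblist.
group1 = [db for db in all_dbs if db not in excluded]

with open('group1.dblist', 'w') as f:
    f.write('\n'.join(group1) + '\n')
```

A materialized group1.dblist like this carries the out-of-sync risk mentioned above, which is presumably why the follow-up patches treat group1 as an expression evaluated at read time rather than a generated file.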
[18:01:14] 6operations, 10ops-eqiad: Failed disk db1004 - https://phabricator.wikimedia.org/T97814#1252657 (10Cmjohnson) nclosure Device ID: 32 Slot Number: 9 Drive's position: DiskGroup: 0, Span: 4, Arm: 1 Enclosure position: N/A Device Id: 9 WWN: 5000C50028E9F470 Sequence Number: 3 Media Error Count: 11742 Other Error... [18:02:04] 6operations, 10ops-eqiad: Failed disk db1003 - https://phabricator.wikimedia.org/T97815#1252658 (10Cmjohnson) nclosure Device ID: 32 Slot Number: 2 Drive's position: DiskGroup: 0, Span: 1, Arm: 0 Enclosure position: N/A Device Id: 2 WWN: 5000C50028E8E6E8 Sequence Number: 3 Media Error Count: 0 Other Error Coun... [18:02:55] (03PS1) 10Legoktm: "dawiki" does not exist on the beta cluster, remove from all-labs.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208154 (https://phabricator.wikimedia.org/T97813) [18:03:08] Glaisher: ^ [18:03:54] thnx :) [18:04:04] (03CR) 10Glaisher: [C: 031] "dawiki" does not exist on the beta cluster, remove from all-labs.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208154 (https://phabricator.wikimedia.org/T97813) (owner: 10Legoktm) [18:04:22] you planning to just do that legoktm? [18:04:27] (03CR) 10Legoktm: [C: 032] "dawiki" does not exist on the beta cluster, remove from all-labs.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208154 (https://phabricator.wikimedia.org/T97813) (owner: 10Legoktm) [18:04:30] Krenair: yes [18:04:33] ok [18:08:19] 6operations, 10ops-eqiad: db1060 raid degraded - https://phabricator.wikimedia.org/T96471#1252681 (10Cmjohnson) Return shipping information FEDEX Label 9611918 2393026 48268171 USPS 9202 3946 5301 2426 7770 65 [18:09:30] (03Merged) 10jenkins-bot: "dawiki" does not exist on the beta cluster, remove from all-labs.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208154 (https://phabricator.wikimedia.org/T97813) (owner: 10Legoktm) [18:11:25] !log legoktm Synchronized all-labs.dblist: https://gerrit.wikimedia.org/r/208154 - no-op (duration: 00m 19s) [18:11:31] Logged the message, Master [18:13:01] 6operations, 10ops-eqiad: Failed disk db1003 - https://phabricator.wikimedia.org/T97815#1252703 (10Cmjohnson) This server is out of warranty but I had spare disks on-site. The new disk is rebuilding Firmware state: Rebuild [18:13:41] 6operations, 10ops-eqiad: Failed disk db1004 - https://phabricator.wikimedia.org/T97814#1252708 (10Cmjohnson) This server is out of warranty but I had spare disks on-site. The new disk is rebuilding Firmware state: Rebuild [18:14:44] greg-g: hi! just to let you know, I'll be deploying the EducationProgram patch at 12:00 SF time [18:14:51] Glaisher: hahaha http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:GlobalRenameRequest now it says "idwiki" [18:15:00] wtf [18:16:07] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [18:16:49] it seems kind of the wrong way to just delete missing db's in labs from the dblist? [18:17:19] Hi opsen; I noticed a non-OS user performed a suppression action on Commons. Anyone care enough to investigate? [18:18:05] mutante: It never existed so why should it be wrong? [18:18:47] Glaisher: http://fpaste.org/217617/5043211/raw/ [18:19:01] because "missing db" should be "add the db" instead of "delete it from config"? [18:19:03] odder: as in they don't have OS permissions? [18:19:16] are you removing them? 
[18:19:48] mutante: someone must've added them mistakenly so instead of starting a new wiki, we should just remove them, no? [18:19:49] legoktm: Not at the moment, and never have as far as I see [18:20:10] Glaisher: https://github.com/wikimedia/operations-mediawiki-config/commit/a273539c4e7a53639344bb577bfff9fd99cad036 [18:20:12] kart_: hey [18:20:30] kart_: you added wikis to beta labs without creating them? [18:20:49] odder: pm [18:21:16] Glaisher: i don't know. but it seems odd that there is a file to list all the db's in labs, then we delete a random one to fix an error [18:22:34] also, fwiw, there are bugs to create these dbs in beta it looks like: https://phabricator.wikimedia.org/T90683 [18:23:17] RECOVERY - salt-minion processes on stat1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:23:17] RECOVERY - dhclient process on stat1002 is OK: PROCS OK: 0 processes with command name dhclient [18:24:37] RECOVERY - puppet last run on stat1002 is OK Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:25:41] 6operations, 10Wikimedia-Mailing-lists: Let public archives be indexed and archived - https://phabricator.wikimedia.org/T90407#1252758 (10Dzahn) To be fair, if he had _not_ mailed all list admins and just asked ops to change global settings, i would have definitely expected list admins to complain about not be... [18:28:04] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1252809 (10Qgil) @ksmith wrote at T97787: > As a ScrumMaster(ish), I would like the ability to quickly create Sprint projects for teams. I swe... [18:33:22] AndyRussG: cool [18:43:46] RECOVERY - RAID on db1004 is OK optimal, 1 logical, 2 physical [18:49:15] (03PS1) 10Legoktm: Revert "Beta: Add wikis for ContentTranslation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208170 (https://phabricator.wikimedia.org/T97813) [18:50:26] (03CR) 10Legoktm: [C: 032] Revert "Beta: Add wikis for ContentTranslation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208170 (https://phabricator.wikimedia.org/T97813) (owner: 10Legoktm) [18:50:33] (03Merged) 10jenkins-bot: Revert "Beta: Add wikis for ContentTranslation" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208170 (https://phabricator.wikimedia.org/T97813) (owner: 10Legoktm) [18:53:25] !log legoktm Synchronized all-labs.dblist: https://gerrit.wikimedia.org/r/#/c/208170/ no-op (duration: 00m 18s) [18:53:31] Logged the message, Master [18:54:07] !log legoktm Synchronized wikiversions-labs.json: https://gerrit.wikimedia.org/r/#/c/208170/ no-op (duration: 00m 25s) [18:54:12] Logged the message, Master [18:59:46] 6operations, 10Analytics-Cluster: Package kafkacat - https://phabricator.wikimedia.org/T97771#1253085 (10faidon) kafkacat is already packaged in Debian unstable. It will probably go in to jessie-backports soon. We'd just need to build it for the other distributions we support, if needed. [19:00:04] AndyRussG: Dear anthropoid, the time has come. Please deploy EducationProgram (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150501T1900). 
[19:01:26] (03PS1) 10Chad: Remove realm detection from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208174 [19:03:02] Glaisher: http://deployment.wikimedia.beta.wmflabs.org/wiki/Special:GlobalRenameRequest is fine now [19:04:05] 6operations, 10Analytics-Cluster: Package kafkacat - https://phabricator.wikimedia.org/T97771#1253116 (10Ottomata) Oo! Didn't realize. Awesome! [19:04:24] 6operations, 10Analytics-Cluster: Backport? and install kafkacat (on stat1002?) - https://phabricator.wikimedia.org/T97771#1253117 (10Ottomata) [19:04:52] anyone who knows mailman know of a way to block an email from from emailing the -owner address? [or if we can block an email completely at the server level?] have a list getting a bunch of what looks like dangerous (virus) spam from 1 email [19:14:46] RECOVERY - Disk space on stat1002 is OK: DISK OK [19:15:06] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [19:18:28] Hi greg-g... strangely I'm having trouble getting onto tin for the deploy. I used to be able to, I think it's been more than a month or so since I deployed something... [19:19:04] from bast1001 I'm getting "Permission denied (publickey)." for ssh -A tin [19:23:17] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [19:23:51] 6operations, 10Architecture, 10MediaWiki-RfCs, 10RESTBase, and 4 others: RFC: Request timeouts and retries - https://phabricator.wikimedia.org/T97204#1253154 (10GWicke) > this need for retries is limited to Parsoid From an overload prevention strategy perspective, retries further down the stack make sense... [19:25:44] hrrrmm can anyone help me with access to tin? I'm supposed to be deploying now.... [19:25:49] it used to work... [19:26:06] AndyRussG: did you update your cipher settings per ops-l? [19:26:14] AndyRussG: also, you shouldn't need to keyforward anymore [19:26:46] legoktm: hmm I'm actually not on that list! [19:27:00] maybe I should be if I have deploy rights... heh [19:27:36] AndyRussG: yeah........you definitely need to be on that list [19:27:42] AndyRussG, https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29 [19:27:49] 6operations, 10VisualEditor: [Regression?] Removing edited articles from watchlist at it.wp - https://phabricator.wikimedia.org/T97838#1253167 (10Elitre) Actually, adding @Ori. Maybe this has to do with JavaScript "experiments" at it.wiki? [19:28:01] you need the Host * bit at the end [19:28:30] it's true that you should update the client settings, but it would not cause this error [19:28:48] "Permission denied (publickey)" would not be caused by the cipher change [19:29:04] And yeah, don't keep forwarding your keys to tin. I removed that from the deployment instructions a month ago [19:29:15] that's more like an issue with the forwarding, yes [19:30:50] AndyRussG: https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29 like this, just that instead of labs instances it's tin and the prod. bastion [19:31:42] (03PS1) 10Yuvipanda: dynamicproxy: Block OruxMaps app [puppet] - 10https://gerrit.wikimedia.org/r/208179 (https://phabricator.wikimedia.org/T97841) [19:31:43] Coren: ^ [19:31:50] legoktm: Krenair mutante thanks! I added the Host * line, still no dice... [19:32:07] Coren: wanna +1? I’ll merge and take care of deploying [19:32:32] AndyRussG, can you ssh to bast1001? 
[19:32:58] Krenair: yes! that works [19:33:16] You should have something like this in your SSH config: [19:33:31] (03CR) 10coren: [C: 031] "wfm" [puppet] - 10https://gerrit.wikimedia.org/r/208179 (https://phabricator.wikimedia.org/T97841) (owner: 10Yuvipanda) [19:33:36] Host *.eqiad.wmnet silver.wikimedia.org [19:33:37] ProxyCommand ssh -W %h:%p krenair@bast1001.wikimedia.org [19:33:45] User krenair [19:34:03] (03PS2) 10Yuvipanda: dynamicproxy: Block OruxMaps app [puppet] - 10https://gerrit.wikimedia.org/r/208179 (https://phabricator.wikimedia.org/T97841) [19:34:07] IdentityFile ~/.ssh/path_to_your_wmf_prod_id [19:34:22] And then sshing to tin.eqiad.wmnet should just work [19:34:47] (03CR) 10Yuvipanda: [C: 032 V: 032] dynamicproxy: Block OruxMaps app [puppet] - 10https://gerrit.wikimedia.org/r/208179 (https://phabricator.wikimedia.org/T97841) (owner: 10Yuvipanda) [19:35:29] Krenair: unrelated, but to simplify you can just use Host *.eqiad.wmnet *.wikimedia.org !bast1001.wikimedia.org [19:35:40] Krenair: mine sez wmflabs rather than wmnet [19:35:52] yuvipanda, and also !gerrit.wikimedia.org I would assume? [19:35:54] I think it's OK to stick my config in a pastebin, no? [19:36:03] AndyRussG, that'll work for labs, but not production... [19:36:36] Krenair: ah, true. I have a separate one for gerrit.wikimedia.org (diff. key) so that works out ok [19:36:37] No comment re. config sensitivity [19:36:41] AndyRussG: yea [19:37:56] (03PS1) 10Yuvipanda: fix missing ) in Id490f5f4dcba91e75f8c57dbb33b7c028a602f0e [puppet] - 10https://gerrit.wikimedia.org/r/208191 [19:38:00] (03CR) 10jenkins-bot: [V: 04-1] fix missing ) in Id490f5f4dcba91e75f8c57dbb33b7c028a602f0e [puppet] - 10https://gerrit.wikimedia.org/r/208191 (owner: 10Yuvipanda) [19:38:05] (03PS2) 10Yuvipanda: fix missing ) in Id490f5f4dcba91e75f8c57dbb33b7c028a602f0e [puppet] - 10https://gerrit.wikimedia.org/r/208191 [19:38:21] (03CR) 10Yuvipanda: [C: 032 V: 032] fix missing ) in Id490f5f4dcba91e75f8c57dbb33b7c028a602f0e [puppet] - 10https://gerrit.wikimedia.org/r/208191 (owner: 10Yuvipanda) [19:39:23] AndyRussG: the simplest form would be like this: https://phabricator.wikimedia.org/P595 and then just type "ssh tin" [19:40:50] Krenair: mutante: legoktm: got it working! My config file is now a hodgepodge but I'll figure it out later, thanks so much! [19:40:59] (03PS1) 10Ottomata: [WIP] Proof of concept for reqstats overhaul using varnishncsa instances per metric [puppet] - 10https://gerrit.wikimedia.org/r/208192 (https://phabricator.wikimedia.org/T83580) [19:41:11] I think it was the lack of a User myusername line [19:41:22] (03PS1) 10GWicke: Add commons to restbase config [puppet] - 10https://gerrit.wikimedia.org/r/208193 (https://phabricator.wikimedia.org/T97840) [19:41:32] Yeah [19:41:36] AndyRussG: yea, likely that's it. been there before. [19:41:40] Hmm I think I'll extend the deploy slot I had just in case, since I'm still pretty new at this [19:41:44] That will stop it from working unless you happen to have the same username on your local machine [19:41:57] which I don't! But I wonder why it used to work... [19:42:09] I think before I was ssh -A'ing from bast1001 [19:42:17] Yes, that used to be the process. [19:42:27] mmm that must be it then... [19:42:39] thanks! :) [19:42:40] Please use https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Step_2:_get_the_code_on_tin rather than relying only on your memory [19:42:50] also hi yuvipanda! [19:43:04] Krenair: yes indeed! 
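Pulling the fragments above together, the recommended client setup is roughly the following stanza in ~/.ssh/config. The shell username and key path are placeholders, and the exact Host patterns (including the gerrit exclusion mentioned above, which typically uses a different key) depend on personal setup:

```
# Reach internal hosts such as tin.eqiad.wmnet via the production bastion.
Host *.eqiad.wmnet *.wikimedia.org !bast1001.wikimedia.org !gerrit.wikimedia.org
    User <shell-username>
    IdentityFile ~/.ssh/<production-key>
    ProxyCommand ssh -W %h:%p <shell-username>@bast1001.wikimedia.org
```

With something like that in place, "ssh tin.eqiad.wmnet" works directly from the local machine, and there is no need to forward the agent to the bastion.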
[19:43:07] hi AndyRussG :) [19:43:14] 6operations, 7Monitoring, 5Patch-For-Review: Overhaul reqstats - https://phabricator.wikimedia.org/T83580#1253239 (10Ottomata) WIP proof of concept at: https://gerrit.wikimedia.org/r/#/c/208192/1 I haven't tested this puppetization at all, but I have limitedly tested the statsd_sender.sh script along with t... [19:43:23] (03PS2) 10GWicke: Add commons to restbase config [puppet] - 10https://gerrit.wikimedia.org/r/208193 (https://phabricator.wikimedia.org/T97840) [19:44:55] (03PS2) 10Ottomata: [WIP] Proof of concept for reqstats overhaul using varnishncsa instances per metric [puppet] - 10https://gerrit.wikimedia.org/r/208192 (https://phabricator.wikimedia.org/T83580) [19:45:59] (03PS3) 10Ottomata: [WIP] Proof of concept for reqstats overhaul using varnishncsa instances per metric [puppet] - 10https://gerrit.wikimedia.org/r/208192 (https://phabricator.wikimedia.org/T83580) [19:46:43] (03CR) 10BryanDavis: [C: 031] "MWRealm.php is loaded in MWMultiVersion.php which is guaranteed to have been loaded by the conditional on line 37 of this file. I can't fi" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208174 (owner: 10Chad) [19:54:17] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 62.50% of data above the critical threshold [24.0] [19:54:28] Coren: ^ [19:54:29] hmm [19:54:41] (03PS1) 10Ottomata: Delete udp2log generated edit logs on on stat1002 that are older than 90 days [puppet] - 10https://gerrit.wikimedia.org/r/208196 (https://phabricator.wikimedia.org/T92339) [19:55:21] yuvipanda: The load /is/ a bit on the high side, but not enough that I'd worry. [19:55:28] (03CR) 10Ottomata: [C: 032] Delete udp2log generated edit logs on on stat1002 that are older than 90 days [puppet] - 10https://gerrit.wikimedia.org/r/208196 (https://phabricator.wikimedia.org/T92339) (owner: 10Ottomata) [19:55:33] is io looking ok? [19:55:56] High, but not bad. I'm looking at interactive performance now and it's reasonable. [19:56:09] hmm [19:56:18] hopefully it doesn’t spiral out of control and spoil everyone’s friday evening [19:56:32] Like I said earlier, I'm keeping a close eye on it. [19:56:48] yeah :D [19:57:55] quick sanity check 4 deploy to wmf3 for an extension update: does this sync-dir look correct? $ sync-dir php-1.26wmf3/extensions/EducationProgram/ 'Update EducationProgram' [19:58:20] <^d> yes [19:58:30] ^d: cool thx! [19:58:34] Hmm here goes! [19:59:53] <^d> bd808: Any objections to me sync'ing that crud out of CommonSettings? [20:00:18] Oooh what a nice pig! [20:00:28] !log andyrussg Synchronized php-1.26wmf3/extensions/EducationProgram/: Update EducatiDonProgram (duration: 00m 30s) [20:00:31] ^d: not from me. stay safe out there :) [20:00:32] AndyRussG: :) [20:00:35] Logged the message, Master [20:00:50] (03CR) 10Chad: [C: 032] Remove realm detection from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208174 (owner: 10Chad) [20:00:51] Scappy is a radiant pig [20:00:56] (03Merged) 10jenkins-bot: Remove realm detection from CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208174 (owner: 10Chad) [20:01:39] !log demon Synchronized wmf-config/CommonSettings.php: less realm stuff (duration: 00m 17s) [20:01:44] Logged the message, Master [20:02:07] 6operations, 5Patch-For-Review: Purge > 90 days stat1002:/a/squid/archive/edits - https://phabricator.wikimedia.org/T92339#1253329 (10Ottomata) 5Open>3Resolved Done. 
Logs will be pruned daily, and the oldest log file in /a/squid/archive/edits is now edits.tsv.log-20150131.gz. BTW, everything in /a/log/we... [20:03:16] woohoo update fixed the error! thanks again folks :) [20:04:17] (03PS1) 10Ottomata: Fixing @hdfs_site_impala_extra_properties rendering in hdfs-site.xml so that the file is the same every time puppet runs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/208227 [20:04:41] (03CR) 10Ottomata: [C: 032] Fixing @hdfs_site_impala_extra_properties rendering in hdfs-site.xml so that the file is the same every time puppet runs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/208227 (owner: 10Ottomata) [20:05:33] AndyRussG: :) [20:05:35] (03PS1) 10Ottomata: Update cdh module with fix to sort hash in config file properly [puppet] - 10https://gerrit.wikimedia.org/r/208235 [20:05:46] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [20:06:06] (03CR) 10Ottomata: [C: 032 V: 032] Update cdh module with fix to sort hash in config file properly [puppet] - 10https://gerrit.wikimedia.org/r/208235 (owner: 10Ottomata) [20:06:30] (03PS1) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:06:38] valhallasw`cloud: ^ want to look? :) [20:07:26] (03PS2) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:08:30] still have to do wmf4 [20:10:38] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 55.56% of data above the critical threshold [24.0] [20:16:16] (03PS1) 10Ori.livneh: Use MWWikiversions::readDbListFile to read dblist files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208262 [20:16:18] (03PS1) 10Ori.livneh: Allow computed dblist expressions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208263 [20:16:20] (03PS1) 10Ori.livneh: Add group1.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208264 [20:16:30] twentyafterfour: ^^^ all three for you [20:16:45] (03CR) 10jenkins-bot: [V: 04-1] Add group1.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208264 (owner: 10Ori.livneh) [20:18:08] 6operations, 10Analytics: analytics1013 crashed, investigate... - https://phabricator.wikimedia.org/T97380#1253354 (10Ottomata) 5Open>3Resolved [20:22:47] 6operations: Reinstall oxygen with Jessie - https://phabricator.wikimedia.org/T97331#1253362 (10Ottomata) 5Open>3Resolved [20:24:39] 10Ops-Access-Requests, 6operations, 10Wikimedia-Git-or-Gerrit: Stephen Niedzielski's Wikitech - https://phabricator.wikimedia.org/T97808#1253364 (10Dzahn) [20:24:40] 10Ops-Access-Requests, 6operations, 10Wikimedia-Git-or-Gerrit: Stephen Niedzielski's Wikitech - https://phabricator.wikimedia.org/T97808#1252514 (10Dzahn) [20:26:35] (03PS1) 10Yuvipanda: tools: Return proper 503 status code when there's a failure [puppet] - 10https://gerrit.wikimedia.org/r/208266 [20:26:37] yuvipanda: what is toolchecker [20:26:51] valhallasw`cloud: look at toolschecker.py [20:27:05] valhallasw`cloud: it basically is a way to get internal services checked / verified via catchpoint [20:27:14] (03PS2) 10Yuvipanda: tools: Return proper 503 status code when there's a failure [puppet] - 10https://gerrit.wikimedia.org/r/208266 [20:28:19] yuvipanda: errr. [20:28:21] okay [20:28:31] valhallasw`cloud: my first thouht was to build it all ourselves... [20:28:34] and have metrics. 
[20:28:43] but then decided there’s far better ways to improve tools [20:29:01] why does this not just run as a normal tool? [20:29:19] valhallasw`cloud: because then it’ll be dependent on the proxy working and on NFS working [20:29:23] and on gridengine working.. [20:29:24] because it's some sort of web frontend that gives status codes? [20:29:28] 6operations, 10Analytics, 5Patch-For-Review: Hadoop logs on logstash are being really spammy - https://phabricator.wikimedia.org/T87206#1253399 (10Ottomata) 5Open>3Resolved [20:29:30] 6operations: Audit log sources and see if we can make them less spammy - https://phabricator.wikimedia.org/T87205#1253400 (10Ottomata) [20:29:35] oh, and now it's in dist-packages [20:29:35] so it has to be as isolated as possible [20:29:36] I see [20:29:44] but how does it get accessed? [20:29:46] dist-packages might be the wrong place? site-packages? [20:29:56] valhallasw`cloud: http://tools-checker.wmflabs.org/redis [20:29:57] public IP [20:29:59] dist-packages is debians rename of site-packages [20:30:18] okay, and this is some seperate host [20:30:21] ok... [20:30:24] yeah [20:31:14] so... interwebz -> tools-checker.wmflabs.org --> nginx --> uwsgi --> toolschecker.py? [20:31:24] yeah. [20:31:34] if we decide we don’t need ssl (not sure yet) we can take out nginx [20:31:44] <_joe_> ottomata: of course, yes, they are the second candidate [20:31:53] <_joe_> the first being part of the varnish config [20:31:54] we could also firewall it to make it accessible only to catchpoint, not sure of that yet. [20:32:54] _joe_: pybal? [20:33:18] <_joe_> ottomata: yep [20:33:34] <_joe_> (today is bank holiday in Italy) [20:33:52] fwiw I just extended the deploy slot I had since I got interrupted by standup that was supposed to be short but wasnt... still on tin, almost done, Deployments page is now updated to reflect this [20:34:52] (I deployed before to wmf3 which was the most important, just finishing wmf4 now) [20:35:04] aye, no worries _joe_ [20:35:08] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0] [20:35:27] <_joe_> ottomata: no I was just explaining why it took me so long to read your question :) [20:36:47] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0] [20:36:50] valhallasw`cloud: could also end up using something not uwsgi (tornado?) but eh seems ok [20:36:55] valhallasw`cloud: OR NODEJS! [20:37:05] !log andyrussg Synchronized php-1.26wmf4/extensions/EducationProgram/: Update EducationProgram (duration: 00m 21s) [20:37:12] Logged the message, Master [20:39:22] woohoo that worked! 
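The toolschecker.py referenced above is not shown in this log, but the shape of it (a tiny WSGI app behind nginx and uwsgi that probes one internal service per endpoint and answers 200 or 503 so Catchpoint can poll it from outside) could look roughly like this sketch; Flask and the Redis hostname here are assumptions for illustration only:

```python
import flask
import redis

app = flask.Flask(__name__)

@app.route('/redis')
def check_redis():
    """Return 200 if the internal Redis answers a PING, 503 otherwise."""
    try:
        # Placeholder host; the real checker would point at the Tools Redis service.
        redis.StrictRedis(host='tools-redis.example', socket_timeout=2).ping()
        return 'OK', 200
    except Exception:
        return 'FAIL', 503

if __name__ == '__main__':
    # Handy for local testing; in the setup above, uwsgi loads the app instead.
    app.run(port=8000)
```

Running it on its own host with a public IP, rather than as a regular tool, keeps the checks from depending on the very services (proxy, NFS, grid engine) they are supposed to verify, which is the isolation argument made above.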
done now [20:41:50] 6operations: agomez gets bounce back from replying to fr-tech@ - https://phabricator.wikimedia.org/T95961#1253414 (10Dzahn) [20:41:50] 6operations: agomez gets bounce back from replying to fr-tech@ - https://phabricator.wikimedia.org/T95961#1253418 (10Dzahn) [20:41:50] 6operations: agomez gets bounce back from replying to fr-tech@ - https://phabricator.wikimedia.org/T95961#1253420 (10Krenair) [20:42:07] (03PS3) 10Yuvipanda: tools: Return proper 503 status code when there's a failure [puppet] - 10https://gerrit.wikimedia.org/r/208266 [20:42:09] (03PS3) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:42:13] (03CR) 10Merlijn van Deen: [C: 04-1] tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 (owner: 10Yuvipanda) [20:42:31] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Return proper 503 status code when there's a failure [puppet] - 10https://gerrit.wikimedia.org/r/208266 (owner: 10Yuvipanda) [20:45:18] (03CR) 10Merlijn van Deen: "THIS STUFF" (037 comments) [puppet] - 10https://gerrit.wikimedia.org/r/208243 (owner: 10Yuvipanda) [20:47:26] (03PS8) 10Andrew Bogott: puppetsigner: Clean up certs and salt keys for instances we can't find in ldap [puppet] - 10https://gerrit.wikimedia.org/r/205897 [20:47:35] yuvipanda: if I recall correctly, you offered to test my changes to ^ ? [20:47:48] andrewbogott: bah, yes. in about 15 minutes? [20:47:52] (03PS1) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [20:48:00] yuvipanda: anytime today would be great. [20:48:11] 6operations, 10VisualEditor: [Regression?] Removing edited articles from watchlist at it.wp - https://phabricator.wikimedia.org/T97838#1253430 (10Catrope) [20:48:14] I mostly finished the patch, so you can ignore the ‘todos’ in gerrit (other than ‘test’ of course) [20:48:26] 6operations, 10VisualEditor: [Regression?] Removing edited articles from watchlist at it.wp - https://phabricator.wikimedia.org/T97838#1253147 (10Catrope) >>! In T97838#1253167, @Elitre wrote: > Actually, adding @Ori. Maybe this has to do with JavaScript "experiments" at it.wiki? Well it's apparently also hap... [20:48:37] (03CR) 10jenkins-bot: [V: 04-1] Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 (owner: 10Rush) [20:49:14] awight: yt? [20:49:46] (03PS4) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:50:09] ottomata: hi! [20:50:25] Thanks again for the kafkatee work [20:50:46] (03PS5) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:50:57] ottomata: Here's what we're trying to do in the medium-big picture: https://phabricator.wikimedia.org/T90649 [20:51:24] Then, AndyRussG has another project which will supercede that system as well... you two should connect! [20:51:36] (03PS2) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [20:51:54] (03PS6) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:52:25] andrewbogott: so we test that by setting that on a project-wide puppetmaster, and then deleting hosts... [20:52:26] I guess [20:53:04] yuvipanda: yeah, and making sure it doesn’t delete valid keys as well... 
[20:53:17] Also you can ‘test’ by reading my patch and deciding if it looks crazy :) [20:53:41] (03PS7) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:54:27] (03PS8) 10Yuvipanda: tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 [20:54:35] (03PS3) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [20:56:30] (03PS4) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [20:56:37] (03CR) 10Dzahn: "you'll have to define the checkcommand to, should be modules/nagios_common/files/checkcommand.cfg" [puppet] - 10https://gerrit.wikimedia.org/r/208268 (owner: 10Rush) [20:57:02] awight: do you need this on unsampled data? [20:57:13] this looks like just hourly counts to that banner, right? [20:57:52] oh, and you want them more frequently, though, right? [20:58:01] like as often as possible, at least every 15 minutes? [20:59:13] awight: ottomata: ? [20:59:18] hiya [20:59:28] ottomata: yeah, they are counts with some other stuff involved--we have to parse the URL into columns in our internal db [20:59:36] hey [20:59:37] just curious about what realtimeish stuff you are tryign to do with fundraising webrequest banner logs [21:00:07] AndyRussG: background is that ottomata just switch our banner impression counting from udp2log to kafkatee... we were talking about the next steps [21:00:16] uhh, i didn't just swithc it :) [21:00:23] <_< ! [21:00:29] awight: ottomata: ah hmm ?! [21:00:32] OK [21:00:37] kafkatee is generating logs alongside of the udp2log ones [21:00:40] i want to switch it! [21:00:44] Ah got it :) [21:00:46] (03CR) 10Dzahn: [C: 031] "thoughts for later: maybe move the IP addresses into hiera" [puppet] - 10https://gerrit.wikimedia.org/r/208268 (owner: 10Rush) [21:00:46] cool! [21:00:47] it should be transparent to you all [21:01:06] 10Ops-Access-Requests, 6operations, 10Wikimedia-Git-or-Gerrit: Stephen Niedzielski's Wikitech - https://phabricator.wikimedia.org/T97808#1253463 (10JGulingan) Hi Stephen, Here is a response I got from ops. ---------- Forwarded message ---------- From: Dzahn Date: Fri,... [21:01:34] https://github.com/wikimedia/operations-puppet/blob/production/manifests/role/logging.pp#L468 [21:02:03] but, more generally, i feel like there are probably better ways to do what you are doing than rotating a log file every 15 minutes and then parsing it [21:02:14] so i'm curious what you are trying to do overall [21:02:19] in an ideal world, how would this work? [21:03:15] (03CR) 10Merlijn van Deen: [C: 031] tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 (owner: 10Yuvipanda) [21:03:25] (03PS5) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [21:03:29] ottomata: what I've been working on is the client and php-side changes to pull the data from the logs of a different server call [21:03:37] (03CR) 10Yuvipanda: [C: 032] tools: Make toolschecker run directly as an upstart job [puppet] - 10https://gerrit.wikimedia.org/r/208243 (owner: 10Yuvipanda) [21:03:48] 'pull the data from the logs of a different server call' what's that mean? 
[21:04:04] Well uh that's my layman's way of saying [21:04:26] that the data goes into the cluster via a HTTP call, currently to Special:RecordImpression [21:04:40] medium-term we're switching to Special:BannerLoader [21:04:44] the cluster...? [21:04:48] the analytics cluster? [21:04:51] ah no [21:04:55] The cluster cluster [21:04:57] uuuhh [21:04:59] youknow [21:05:02] where we keep those... [21:05:03] wikis? [21:05:08] hhah [21:05:10] (03PS6) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [21:05:13] it goes into mysql? [21:05:17] sorry for my lack of precise terminology [21:05:44] are these all client side calls? [21:05:55] uuuh [21:05:57] client side http requests? [21:06:05] K so there are these folks that visit our Web sites with a browser [21:06:20] their browser calls our servers [21:06:26] in the background [21:06:33] haha, yes, but for your data specifically [21:06:40] right [21:06:50] so their browser makes a bacckround http request that includes the data you are interested in [21:06:54] and that gets logged somewhere? [21:06:59] (03PS7) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [21:07:01] so the background calls have URLs with params, and the data we want is on the params [21:07:13] that is correct yeah [21:07:14] ah, and the URLs are those Special: pages [21:07:18] exactly [21:07:28] and those are special server side pages, that do what? [21:07:31] log to where? [21:07:32] So right now it's an extra roundtrip via Special:RecordImpression [21:07:35] or do they do nothing? [21:07:43] and you just collect the logs from the http request logs [21:07:49] (03PS2) 10Ori.livneh: Allow computed dblist expressions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208263 [21:07:54] (03CR) 10jenkins-bot: [V: 04-1] Allow computed dblist expressions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208263 (owner: 10Ori.livneh) [21:07:55] Special:RecordImpression does nothing other than log. Yeah we just collect the http request logs [21:08:05] nothing other than log... [21:08:07] you mean it logs itself? [21:08:09] !log Ran FlowUpdateWorkflowPageId.php for all production Flow wikis for https://phabricator.wikimedia.org/T96888 [21:08:14] Logged the message, Master [21:08:14] or you collect from the varnish http logs [21:08:21] (03PS1) 10Yuvipanda: tools: Appropriate .conf suffix for upstart file [puppet] - 10https://gerrit.wikimedia.org/r/208269 [21:08:26] e.g. hive webrequset table, or udp2log files? [21:08:33] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Appropriate .conf suffix for upstart file [puppet] - 10https://gerrit.wikimedia.org/r/208269 (owner: 10Yuvipanda) [21:08:36] e.g. /bannerImpressions-sampled100.tsv.log [21:08:37] ? [21:08:43] Currently it's udp2log for sampled data yes [21:08:48] and hive for unsampled [21:08:53] ok. [21:08:56] why not: [21:09:04] So the change is going to be to another special page that is already being called and does indeed do something [21:09:10] ok. [21:09:14] qs: [21:09:19] why use http logs for this? [21:09:22] that is a lot of crap to sort through [21:09:41] ottomata: hmm [21:09:47] why not just: client side call -> server side somethign -> log to service/file/kafka [21:09:47] ? [21:09:53] then your logs are separate from all the other http crap [21:10:02] hmm [21:10:02] this would be like eventlogging (but i wouldnt' use eventlogging right now) [21:10:05] (for htis) [21:10:09] (in the future yes.) 
[21:10:18] (03PS3) 10Dduvall: ci: Role for running Raita [puppet] - 10https://gerrit.wikimedia.org/r/208024 [21:10:28] (03PS2) 10Ori.livneh: Add group1.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208264 [21:10:32] (03CR) 10jenkins-bot: [V: 04-1] Add group1.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208264 (owner: 10Ori.livneh) [21:10:35] (03PS3) 10Ori.livneh: Allow computed dblist expressions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208263 [21:10:42] (03PS3) 10Ori.livneh: Add group1.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208264 [21:10:54] actually, AndyRussG, if all your calls are from client side .js or embedded gif [21:11:04] we could very easily put your logs into their own kafka topics from varnish [21:11:13] ottomata: hmmm. Well, in the future, it'll be logging calls on a page that's already being called [21:11:15] then you'd have a dedicated stream of logs you could parse and do whatever you wante dwith realtime [21:11:24] K [21:11:26] you wouldnl't even need a page in this case though [21:11:31] you'd do somethign like [21:11:34] (03PS8) 10Rush: Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 [21:11:49] One thing I should clarify, I don't deal with the data when it comes out of the system, so I'm not deeply familiar with how that part works or what is needed there [21:11:56] That's mainly Ellery's department [21:11:56] http://bits.wm.org/beacon/fundraiding.gif?key=val&key2=val... [21:12:11] and those query params could go straight into kafka [21:12:19] Lemme give you a bit more details 'cause it's a bit more complex than that [21:12:20] then on teh data that comes out of system [21:12:33] kafka consumer --topic fundraising-whatever | do_CooL-stuff.py [21:12:35] (03CR) 10Rush: [C: 032] Icinga: monitor failure rate to our anchors [puppet] - 10https://gerrit.wikimedia.org/r/208268 (owner: 10Rush) [21:12:54] k... [21:12:55] ottomata: https://www.mediawiki.org/wiki/Extension:CentralNotice/Notes/Campaign-associated_mixins_and_banner_history [21:13:24] Also: https://phabricator.wikimedia.org/T45250 [21:13:25] RL? [21:13:38] ResourceLoader [21:14:13] So there are really two kinds of logging that need to happen [21:14:35] 1) full-on unsampled logging of every banner that actually gets displayed to a user [21:14:50] 2) logging of a sample of users' history of banners they've seen [21:15:05] so, currently, the FR udp2log collected get rotated and parsed every 15 mins, and those are stored in a database somewhere, and that data informs future banner choices? [21:15:05] (03PS1) 10Yuvipanda: tools: Allow nginx (www-data) to write to toolschecker socket [puppet] - 10https://gerrit.wikimedia.org/r/208270 [21:15:17] (03PS2) 10Yuvipanda: tools: Allow nginx (www-data) to write to toolschecker socket [puppet] - 10https://gerrit.wikimedia.org/r/208270 [21:15:23] ottomata: correct [21:15:24] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Allow nginx (www-data) to write to toolschecker socket [puppet] - 10https://gerrit.wikimedia.org/r/208270 (owner: 10Yuvipanda) [21:15:35] ok, that is the part I a trying to help with [21:15:45] the log rotationg and parsing thing sounds really clunky to me [21:15:47] PROBLEM - puppet last run on neon is CRITICAL Puppet last ran 1 day ago [21:17:02] AndyRussG: are all of these logs generated by client side js then? [21:17:12] js http requests? 
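For the consuming side sketched above ("kafka consumer --topic fundraising-whatever | do_CooL-stuff.py"), a minimal example using the kafka-python client; the topic name and broker address are placeholders, since the real names would depend on how the dedicated varnishkafka instance and the Kafka cluster end up being configured:

```python
from kafka import KafkaConsumer  # assumes the kafka-python client library

# Placeholder topic and broker, purely for illustration.
consumer = KafkaConsumer('fundraising-banner-impressions',
                         bootstrap_servers='kafka1012.eqiad.wmnet:9092')

for message in consumer:
    # Each message value is one request log line produced by varnishkafka
    # (typically JSON); parse the banner query parameters and count here.
    line = message.value.decode('utf-8')
    print(line)
```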
[21:20:49] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 66.67% of data above the critical threshold [24.0] [21:22:42] ottomata: sorry was afk for a sec [21:22:43] (03PS1) 10Yuvipanda: tools: Make uwsgi behave properly on SIGTERM [puppet] - 10https://gerrit.wikimedia.org/r/208271 [21:22:48] (03CR) 10jenkins-bot: [V: 04-1] tools: Make uwsgi behave properly on SIGTERM [puppet] - 10https://gerrit.wikimedia.org/r/208271 (owner: 10Yuvipanda) [21:22:51] (03PS2) 10Yuvipanda: tools: Make uwsgi behave properly on SIGTERM [puppet] - 10https://gerrit.wikimedia.org/r/208271 [21:23:09] RECOVERY - puppet last run on neon is OK Puppet is currently enabled, last run 3 minutes ago with 0 failures [21:23:45] (03CR) 10Yuvipanda: [C: 032] tools: Make uwsgi behave properly on SIGTERM [puppet] - 10https://gerrit.wikimedia.org/r/208271 (owner: 10Yuvipanda) [21:23:47] K uuh so to continue...! Right so the (1) unsampled logs are not on the client, only generated by the http requests [21:23:58] and the (2) sampled banner history logs don't exist yet, but they will be generated on the client [21:24:46] So the issue with (1) is that it's a LOT of requests. that's why there's the Phabricator task asking us to burn Special:RecordImpression in a fire [21:25:02] ok, don't understand 1 [21:25:06] And that's why we're moving things to get the information from calls to the server that have to happen anyway [21:25:25] what is in (1) again? [21:25:30] query params, right? [21:25:31] So when there's a fundraising campaign or another kind of campaign targeting a lot of users, millions of people get banners [21:25:48] and FR and potentially others want to be able to count every single one of those banner displays [21:26:16] So currently there's this special logging request happening really a lot [21:26:23] hm, right, so. i'm suggesting not to use mediawiki at all for your log collection [21:27:10] if all of your info is client side (banner shown? other params...) [21:27:18] then [21:27:20] Sure... But it'll still be client and WMF bandwidth and cycles [21:27:27] yes [21:27:41] but if you log to your own kafka topic via http request to varnishkafka [21:27:56] then you get a stream that is 100% unsampled banner impressions, without any other http requests [21:28:19] ottomata: Mmm OK yes that's feasible [21:28:55] The upcoming proposed change is to use logs from Special:BannerLoader, which is a call that's already happening and has another purpose too: to fetch the banner to display from the server [21:29:19] valhallasw`cloud: wtf [21:29:28] ? [21:29:29] https://www.irccloud.com/pastebin/YBsESgAv [21:29:31] doesn't work [21:29:33] AndyRussG: are those calls to the server not cached by varnish then? [21:29:37] but [21:29:41] if I make it --chmod-socket=664 [21:29:42] it works [21:29:47] (03PS1) 10Rush: Icinga: add icinga::monitor::ripeatlas [puppet] - 10https://gerrit.wikimedia.org/r/208272 [21:29:55] if they are, then they aren't logging via server side [21:29:55] (03PS2) 10Rush: Icinga: add icinga::monitor::ripeatlas [puppet] - 10https://gerrit.wikimedia.org/r/208272 [21:30:02] yuvipanda: oh! nginx has to read and maybe write to it? [21:30:06] you just filter out the http request logs from varnish for hits to that uri [21:30:13] and nginx runs as user www-data (?)
so maybe we do need that group set to www-data [21:30:13] valhallasw`cloud: yeah, so I've it set to --chmod-socket 664 [21:30:17] valhallasw`cloud: but it needs the = [21:30:26] so --chmod-socket 664 doesn't work [21:30:26] ohhh [21:30:30] while --chmod-socket=664 works [21:30:43] sounds like rtfm? :P [21:30:43] without the = there's a fuckton of [21:30:44] ottomata: right well I think the idea was precisely for there to be some direct varnish-kafka interaction [21:30:45] > unable to load configuration from 664 [21:30:52] yes! :) [21:30:58] right :) [21:30:59] and that is basically available now [21:30:59] valhallasw`cloud: what? the other params all just work with -- [21:31:06] Right! [21:31:16] So that's what I think can happen for (1) the full unsampled logs [21:31:38] ya ok, so, 2 is the history stored on the client? in a cookie? [21:31:49] (03CR) 10Rush: [C: 032] Icinga: add icinga::monitor::ripeatlas [puppet] - 10https://gerrit.wikimedia.org/r/208272 (owner: 10Rush) [21:31:51] 2 will be data stored in localstorage [21:31:53] on the client [21:31:53] that is because we don't have unique ids, right? [21:32:10] and can't associate http request with individuals [21:32:11] ? [21:32:18] and a percentage will be sampled and sent back, I was expecting to use eventlogging [21:32:30] yuvipanda: hm, the fm doesn't say anything about it [21:32:45] aye, cool. AndyRussG, there is some analytics dev work to scale eventlogging more, and make it work with kafka more too. [21:32:47] so that is good [21:32:53] yuvipanda: https://uwsgi-docs.readthedocs.org/en/latest/Options.html?highlight=chmod#chmod-socket like wat [21:32:57] that system i think will eventually work in this same way [21:32:59] ottomata: cool!! [21:33:16] but, there are a lot of moving parts there, so that might take a while [21:33:28] i assume you guys want this working before next FR season? [21:33:29] yeah! sounds familiar :) [21:33:36] ottomata: yes very definitely [21:33:44] k, then, yeah, i think good plan is: [21:33:48] FR is actually going on almost all the time in many different parts of the world [21:33:54] But the big English campaigns are in December [21:33:59] - specialized FR varnishkafka for (1) [21:33:59] - sampled eventlogging for (2) [21:34:11] ottomata: yep! That'll be perfect [21:36:13] ottomata: I kinda have to run in a sec 8p [21:36:59] ok cool, i'm about to head out too. [21:37:23] AndyRussG: real quick [21:37:37] what is the peak volume of (1)? [21:37:46] % of all http requests is fine [21:37:53] bblack: so my suggestion is this: instead of the variety of static-$foo dirs that we have under the bits docroot, make a static/ dir and nest everything (including favicon) under that [21:37:57] (03PS1) 10Yuvipanda: tools: Use '=' for passing params / values to uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/208273 [21:38:01] (03CR) 10jenkins-bot: [V: 04-1] tools: Use '=' for passing params / values to uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/208273 (owner: 10Yuvipanda) [21:38:20] (03PS2) 10Yuvipanda: tools: Use '=' for passing params / values to uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/208273 [21:38:22] valhallasw`cloud: ^ [21:39:19] (03CR) 10Merlijn van Deen: [C: 031] tools: Use '=' for passing params / values to uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/208273 (owner: 10Yuvipanda) [21:39:42] ottomata: the peak volume?
it could even be 100% [21:39:52] well no, that's not true [21:39:58] (03CR) 10Yuvipanda: [C: 032] tools: Use '=' for passing params / values to uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/208273 (owner: 10Yuvipanda) [21:39:59] but 100% of pageviews, or close [21:40:17] Most campaigns target either logged-in users or not-logged-in users [21:41:11] But some could target both, or there could be simultaneous ones [21:41:25] interesting, AndyRussG, will this be new, or has this additinoal volume also happened in the past? [21:41:32] So for scalability it needs to be able to get up to 100% of pageviews [21:41:57] ottomata: no, it's not new, it's what currently can happen with Special:RecordImpression, which is why people want to kill it [21:42:30] ok cool. so, we could be sure to handle a +100% increase in kafka traffic, but I'd want to make for sure that we could [21:42:43] but, if this volume has already happened in the past, then clearly we could already :) [21:43:32] AndyRussG: one more thing. If I set up an http endpoint for you to hit, and then showed you how to consume from kafka, would that be helpful in the very near term? [21:44:42] awight: as for the kafkatee logs on erbium, those probably shoudl be still verified as soon as possible [21:44:48] we should still keep that system around for a while [21:44:59] i'm itching to turn off udp2log :) [21:45:59] AndyRussG: ah, wait, 100% of pageviews...that is fine. i was thikning of 100% of http requests, but of course not. pageviews are a smaller % of all kafkatraffic. so yeah, shoudl be no problem [21:46:44] ottomata: for an idea of how high it has gotten even at non-peak times, see the numbers posted by Faidon near the top of the S:RI phab task (again, https://phabricator.wikimedia.org/T45250) [21:46:56] ottomata: fantastic :) [21:47:20] AndyRussG: will you or awight be checking in on the kafkatee logs that are currently on erbium? [21:47:29] https://phabricator.wikimedia.org/T97676 [21:48:58] ottomata: this is the first I hear of them, but if u like I can try, I'll have to learn how first, tho [21:50:17] AndyRussG: they should look exactly like hte udp2logs also saved on erbium [21:50:23] they are being saved alongside of the udp2log ones now [21:50:42] i would like to save these at the exact same path the udp2log ones are, and then turn off udp2log [21:50:46] ottomata: ah OK! I've worked a little with the old python script that processes those [21:51:03] perfect! ja, i just want verification that they will work in place of udp2log ones [21:51:13] that sounds like a plan! It'll still be a temporary measure while we work towards the varnishkafka and other stuff [21:51:14] the whole reason we had kafkatee written was so that we could replace udp2log with kafka more easily [21:51:18] exactly. 
[21:51:28] ah K got it
[21:51:50] yeah the python script for udp2log just munges in the logs and fills up a database based on them
[21:51:58] AndyRussG: one thing to double check specifically,i think that kafkatee buffers a little differently, and there may be an incomplete line at the end of the file at any given time, not sure how severe that is or if it matters
[21:52:05] So if the new logs look just like the old logs, it should eat them fine
[21:52:13] right
[21:52:16] I mean OK
[21:52:27] Not sure either, but we can do a bit of testing ahead of time
[21:52:33] i think there is a special thing that rotates them in, check to see if any of the rotated files end in an impcomplete line
[21:52:35] aye
[21:52:44] If you send me a sample of the kafkatee logs I can test the script out on them locally
[21:53:05] (03PS1) 10Ori.livneh: Add EOL at EOF [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208277
[21:53:28] (03CR) 10Ori.livneh: [C: 032] Add EOL at EOF [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208277 (owner: 10Ori.livneh)
[21:53:33] (03Merged) 10jenkins-bot: Add EOL at EOF [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208277 (owner: 10Ori.livneh)
[21:54:21] oh, AndyRussG, do you have access to erbium?
[21:54:34] erbium.eqiad.wmnet
[21:54:39] 6operations, 10Wikimedia-DNS, 5Patch-For-Review: Set up new URL policy.wikimedia.org - https://phabricator.wikimedia.org/T97329#1253609 (10Yana) Excellent. Will let you know.
[21:55:03] ottomata: not sure, but I can check
[21:55:05] oh you don't
[21:55:07] filing ticket
[21:55:08] Ah K
[21:55:10] thx!
[21:55:18] (03PS1) 10Rush: Icinga: fix ripe atlas command definitions [puppet] - 10https://gerrit.wikimedia.org/r/208279
[21:55:23] (03CR) 10jenkins-bot: [V: 04-1] Icinga: fix ripe atlas command definitions [puppet] - 10https://gerrit.wikimedia.org/r/208279 (owner: 10Rush)
[21:55:32] (03PS2) 10Rush: Icinga: fix ripe atlas command definitions [puppet] - 10https://gerrit.wikimedia.org/r/208279
[21:55:37] ottomata: I really have to be afk again for a while... I think we basically have a plan now? Maybe we could do e-mail for any remaining details?
[21:56:06] (03CR) 10Dzahn: [C: 032] "yea, you beat me to it" [puppet] - 10https://gerrit.wikimedia.org/r/208279 (owner: 10Rush)
[21:56:24] AndyRussG: yes
[21:56:31] ottomata: cool thanks!
[21:56:33] AndyRussG: who's your manager/ Katie?
[21:56:39] ottomata: yep!
[21:56:47] k
[21:56:52] thanks!
[21:56:54] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Add andyrussg to udp2log-users group to allow him to verify kafkatee generated fundraising log files on erbium - https://phabricator.wikimedia.org/T97860#1253616 (10Ottomata) 3NEW
[21:57:14] K tty later :) thanks for ur work on this :)
[21:58:14] yup, thank you! laters!
[21:59:06] i'm out too
[21:59:08] latesr all!
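
On the concern above about kafkatee leaving an incomplete line at the end of a file: a quick way to test is to look at the last byte of each rotated file and flag anything that does not end in a newline. A minimal sketch follows, assuming plain-text (non-gzipped) rotated files; the glob pattern is an illustrative assumption, not the real log path on erbium.

    # Sketch: flag rotated log files whose final byte is not a newline,
    # i.e. files ending in an incomplete line. Glob pattern is hypothetical.
    import glob
    import os

    for path in sorted(glob.glob('/a/log/fundraising/*.log-*')):
        size = os.path.getsize(path)
        if size == 0:
            continue
        with open(path, 'rb') as f:
            f.seek(-1, os.SEEK_END)
            last_byte = f.read(1)
        if last_byte != b'\n':
            print('incomplete final line:', path)
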
[21:59:09] PROBLEM - High load for whatever reason on labstore1001 is CRITICAL 50.00% of data above the critical threshold [24.0]
[22:07:44] (03PS1) 10Ori.livneh: Replace static-$version dirs with static/$version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208283
[22:08:44] (03CR) 10Ori.livneh: [C: 032] Replace static-$version dirs with static/$version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208283 (owner: 10Ori.livneh)
[22:08:50] (03Merged) 10jenkins-bot: Replace static-$version dirs with static/$version [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208283 (owner: 10Ori.livneh)
[22:15:41] (03PS1) 10Ori.livneh: adapt static hash exemption for I1e900b00c [puppet] - 10https://gerrit.wikimedia.org/r/208287
[22:16:50] ^^ bblack, going to merge this so that thigns are compatible with https://gerrit.wikimedia.org/r/#/c/208283/ . i went with "^/w/static" rather than "^/w/static[-/]" to avoid the risk of a regex syntax accident
[22:17:16] (03CR) 10Ori.livneh: [C: 032] adapt static hash exemption for I1e900b00c [puppet] - 10https://gerrit.wikimedia.org/r/208287 (owner: 10Ori.livneh)
[22:17:27] bblack: oh hey, you're here
[22:17:36] could you take a quick look then? i haven't merged it in prod yet
[22:17:50] (03PS4) 10Dduvall: ci: Role for running Raita [puppet] - 10https://gerrit.wikimedia.org/r/208024
[22:18:47] i'm making everything go under static/X.XXwmfX instead of static-X.XXwmfX. then i'll move favicons so that they're nested under static/ as well.
[22:18:54] ori: so is the idea to put /favicons/ underneath the static/ stuff as well?
[22:19:00] ok
[22:19:26] i kept the existing dir structure in place too so that links don't break
[22:19:30] and then I guess post-transition, we can upgrade the vcl_hash regex to ^/static/ to be more explicit
[22:19:39] yep
[22:19:54] ok, works for me :)
[22:19:55] [-/] had me worried since '-' can be interpreted as a range marker inside []
[22:19:57] cool, thanks
[22:20:12] (03PS1) 10Rush: Icinga: fix overzealous replace for 'alert' [puppet] - 10https://gerrit.wikimedia.org/r/208290
[22:21:21] (03CR) 10Rush: [C: 032] Icinga: fix overzealous replace for 'alert' [puppet] - 10https://gerrit.wikimedia.org/r/208290 (owner: 10Rush)
[22:23:58] RECOVERY - High load for whatever reason on labstore1001 is OK Less than 50.00% above the threshold [16.0]
[22:30:23] (03PS1) 10Ori.livneh: move favicon and apple-touch to w/static; leave symlinks for back-compat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208291
[22:35:07] (03PS1) 10Ori.livneh: Update favico URL refs to point to /static/favicon/.. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208292
[22:36:56] (03CR) 10Ori.livneh: [C: 032] move favicon and apple-touch to w/static; leave symlinks for back-compat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208291 (owner: 10Ori.livneh)
[22:37:01] (03Merged) 10jenkins-bot: move favicon and apple-touch to w/static; leave symlinks for back-compat [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208291 (owner: 10Ori.livneh)
[22:40:42] (03PS1) 10Rush: Icinga: ripe atlas alerts more overzealous replace fixing [puppet] - 10https://gerrit.wikimedia.org/r/208293
[22:41:29] (03CR) 10Rush: [C: 032] Icinga: ripe atlas alerts more overzealous replace fixing [puppet] - 10https://gerrit.wikimedia.org/r/208293 (owner: 10Rush)
[22:46:03] (03CR) 10Ori.livneh: [C: 032] Update favico URL refs to point to /static/favicon/.. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208292 (owner: 10Ori.livneh)
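
A side note on the vcl_hash regex discussion above: in PCRE-style regexes a '-' placed first or last inside a character class is a literal rather than a range marker, so "[-/]" is technically safe, but the broader "^/w/static" prefix also covers both the old static-$version and new static/$version layouts, as ori says. A small sketch below uses Python's re as a stand-in for Varnish's PCRE; the example request paths are illustrative assumptions.

    # Sketch: compare the two prefix patterns against old- and new-style
    # static paths. Example paths are hypothetical.
    import re

    broad = re.compile(r'^/w/static')        # the merged, broader form
    explicit = re.compile(r'^/w/static[-/]') # the variant that caused worry

    paths = [
        '/w/static-1.26wmf3/skins/example.png',   # old layout
        '/w/static/1.26wmf4/skins/example.png',   # new layout
        '/w/staticsomethingelse',                 # only broad matches this
    ]
    for p in paths:
        print(p, bool(broad.match(p)), bool(explicit.match(p)))
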
[22:46:23] (03Merged) 10jenkins-bot: Update favico URL refs to point to /static/favicon/.. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208292 (owner: 10Ori.livneh)
[22:51:18] 10Ops-Access-Requests, 6operations, 10Wikimedia-Git-or-Gerrit: Stephen Niedzielski's Wikitech - https://phabricator.wikimedia.org/T97808#1253739 (10Dzahn) @JGulingan @niedzielski Done!:) You got the +2 now. Sorry, i was confused a bit, i expected it to be a Gerrit UI thing when actually it is all setup to b...
[22:51:44] 10Ops-Access-Requests, 6operations, 10Wikimedia-Git-or-Gerrit: Stephen Niedzielski's Wikitech - https://phabricator.wikimedia.org/T97808#1253741 (10Dzahn) 5Open>3Resolved a:3Dzahn
[22:53:09] 6operations, 10MediaWiki-DjVu, 10MediaWiki-General-or-Unknown, 6Multimedia, and 2 others: img_metadata queries for Djvu files regularly saturate s4 slaves - https://phabricator.wikimedia.org/T96360#1253752 (10faidon) This has been happening over and over again for all this time (happened just now again). E...
[22:58:55] (03PS9) 10Yuvipanda: puppetsigner: Clean up certs and salt keys for instances we can't find in ldap [puppet] - 10https://gerrit.wikimedia.org/r/205897 (owner: 10Andrew Bogott)
[23:00:57] (03PS1) 10Ori.livneh: create static/ symlinks in docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208297
[23:01:17] (03CR) 10Ori.livneh: [C: 032] create static/ symlinks in docroot [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208297 (owner: 10Ori.livneh)
[23:03:47] PROBLEM - Varnishkafka Delivery Errors per minute on cp4015 is CRITICAL 11.11% of data above the critical threshold [20000.0]
[23:06:04] (03CR) 10Aaron Schulz: [C: 031] gdash: adjust jobq dashboard [puppet] - 10https://gerrit.wikimedia.org/r/207786 (https://phabricator.wikimedia.org/T87594) (owner: 10Filippo Giunchedi)
[23:08:46] RECOVERY - Varnishkafka Delivery Errors per minute on cp4015 is OK Less than 1.00% above the threshold [0.0]
[23:08:57] (03PS10) 10Yuvipanda: puppetsigner: Clean up certs and salt keys for instances we can't find in ldap [puppet] - 10https://gerrit.wikimedia.org/r/205897 (owner: 10Andrew Bogott)
[23:08:58] andrewbogott: ^ I updated it
[23:09:15] andrewbogott: had a bunch of errors.
[23:09:37] see PS9 vs PS10
[23:09:39] (03CR) 10jenkins-bot: [V: 04-1] puppetsigner: Clean up certs and salt keys for instances we can't find in ldap [puppet] - 10https://gerrit.wikimedia.org/r/205897 (owner: 10Andrew Bogott)
[23:09:58] yuvipanda: yep, the changes make sense.
[23:10:07] It works now, despite Jenkins’ objections?
[23:10:56] yuvipanda: I’m cooking dinner now, but thanks for fixing!
[23:11:08] let me see if it’s the line length limitation
[23:11:31] s/attach vector/attack vector
[23:11:42] * yuvipanda andrewbogott: ah, no, jenkins is correct. I had used tabs (was hacking in the terminal)
[23:12:09] didnt dare to say the tabs thing
[23:12:51] heh
[23:12:51] (03PS11) 10Yuvipanda: puppetsigner: Clean up certs and salt keys for instances we can't find in ldap [puppet] - 10https://gerrit.wikimedia.org/r/205897 (owner: 10Andrew Bogott)
[23:12:55] better now.
[23:13:20] there's a thing with Elasticsearch
[23:13:32] well, on logstash hosts it says:
[23:13:45] ElasticSearch health check for shards are CRIT
[23:14:37] andrewbogott: am testing some more now
[23:15:00] (03PS1) 10Ori.livneh: mediawiki: remove ancient & unused static/$lang rewrite [puppet] - 10https://gerrit.wikimedia.org/r/208300
[23:15:49] (03CR) 10Ori.livneh: [C: 032] mediawiki: remove ancient & unused static/$lang rewrite [puppet] - 10https://gerrit.wikimedia.org/r/208300 (owner: 10Ori.livneh)
[23:16:12] andrewbogott: just realized I should have lunch :| I’m going to go have lunch, I’ll test some when I’m back
[23:16:29] ori: https://gerrit.wikimedia.org/r/#/c/207355/ more mw module change ?:)
[23:16:46] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL Anomaly detected: 10 data above and 0 below the confidence bounds
[23:17:17] CUSTOM - ElasticSearch health check for shards on logstash1004 is CRITICAL - elasticsearch http://10.64.0.162:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health
[23:17:17] CUSTOM - ElasticSearch health check for shards on logstash1005 is CRITICAL - elasticsearch http://10.64.16.185:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health
[23:17:17] CUSTOM - ElasticSearch health check for shards on logstash1006 is CRITICAL - elasticsearch http://10.64.48.109:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health
[23:18:34] mutante: I'll check the logstash boxes to see whats up
[23:19:06] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL 14.29% of data above the critical threshold [500.0]
[23:19:51] https://gdash.wikimedia.org/dashboards/reqerror/ <- wtf?
[23:19:56] bd808: thank you, maybe they have never been OK because they are new and part of the "automatically" set up monitoring ticket
[23:20:13] mutante: logstash100[1-3] look ok. I bet the alert is because logstash100[4-6] aren't fully up yet
[23:21:40] bblack: I don't see anything in fatalmonitor to explain that graph
[23:22:05] bd808: yes, confirmed. only 1004-1006 and they have probably been added about 2 days ago then
[23:22:40] mutante: *nod* they need more TLC. _j.oe_ was going to help me there
[23:22:50] I might poke at them over the weekend
[23:23:41] bd808: i'll just ACK them so they wont show up on IRC until the next status change but when OK they will be back without having to remember to re-enable it
[23:24:05] mutante: sounds like the right plan. thanks
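
On the alerting /_cluster/health checks above: the Icinga check queries the Elasticsearch cluster health API, which returns JSON with a 'status' field of green, yellow, or red; the "Max retries exceeded" errors simply mean the new logstash100[4-6] nodes were not answering on port 9200 yet. A minimal sketch of the same kind of probe, using the requests library; the host is a placeholder, not a claim about the real check plugin.

    # Sketch: Nagios-style Elasticsearch cluster health probe.
    # HOST is a placeholder; the real checks target logstash hosts on :9200.
    import sys
    import requests

    HOST = 'http://localhost:9200'

    try:
        resp = requests.get(HOST + '/_cluster/health', timeout=10)
        resp.raise_for_status()
    except requests.RequestException as exc:
        print('CRITICAL - error while fetching: %s' % exc)
        sys.exit(2)

    health = resp.json()
    status = health.get('status')
    if status == 'green':
        print('OK - cluster status green')
        sys.exit(0)
    if status == 'yellow':
        print('WARNING - cluster status yellow, %s unassigned shards'
              % health.get('unassigned_shards'))
        sys.exit(1)
    print('CRITICAL - cluster status %s' % status)
    sys.exit(2)
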
[23:24:15] sees a puppetmaster issue while doing that,,wtf
[23:25:01] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1004 is CRITICAL - elasticsearch http://10.64.0.162:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health daniel_zahn still being installed
[23:25:01] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1005 is CRITICAL - elasticsearch http://10.64.16.185:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health daniel_zahn still being installed
[23:25:01] ACKNOWLEDGEMENT - ElasticSearch health check for shards on logstash1006 is CRITICAL - elasticsearch http://10.64.48.109:9200/_cluster/health error while fetching: Max retries exceeded for url: /_cluster/health daniel_zahn still being installed
[23:25:36] bblack: There is a big bump of apache2 errors that seems to kind of line up with the 5xx bump -- https://logstash.wikimedia.org/#dashboard/temp/KuFISMaDRC6hy-lS_nVi_w
[23:25:37] PROBLEM - puppetmaster https on palladium is CRITICAL - Socket timeout after 10 seconds
[23:25:46] bd808: see elsewhere :)
[23:26:17] bblack: should go away now
[23:27:07] RECOVERY - puppetmaster https on palladium is OK: HTTP OK: Status line output matched 400 - 335 bytes in 1.507 second response time
[23:27:49] it was strontium this time
[23:33:47] 6operations, 10MediaWiki-DjVu, 10MediaWiki-General-or-Unknown, 6Multimedia, and 2 others: img_metadata queries for Djvu files regularly saturate s4 slaves - https://phabricator.wikimedia.org/T96360#1253818 (10aaron) It probably just hits < $this->pageCount( $image ) )>> and loads the metadata...
[23:37:21] (03PS1) 10Ori.livneh: Update apple-touch to use static/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208303
[23:39:20] (03CR) 10Ori.livneh: [C: 032] Update apple-touch to use static/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208303 (owner: 10Ori.livneh)
[23:39:27] (03Merged) 10jenkins-bot: Update apple-touch to use static/ [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208303 (owner: 10Ori.livneh)
[23:40:26] RECOVERY - HTTP 5xx req/min on graphite1001 is OK Less than 1.00% above the threshold [250.0]
[23:40:36] !log ori Synchronized wmf-config/InitialiseSettings.php: I02e28db61: Update apple-touch to use static (duration: 00m 23s)
[23:40:46] Logged the message, Master
[23:41:56] ori: https://gerrit.wikimedia.org/r/#/c/208304/
[23:42:13] windowcat: nice work
[23:42:42] windowcat: now i can go back to creating my 20,000 file galleries
[23:52:27] !log aaron Synchronized php-1.26wmf3/includes/media/DjVu.php: 6cdb23c5d662151a2b578c2acc8823bc975fc22a (duration: 00m 15s)
[23:52:34] Logged the message, Master
[23:53:33] !log aaron Synchronized php-1.26wmf4/includes/media/DjVu.php: caa2efc0e76c2ba849d465006600d131dc2f78b5 (duration: 00m 21s)
[23:53:38] Logged the message, Master
[23:55:16] PROBLEM - puppet last run on ms-fe3001 is CRITICAL puppet fail