[00:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151001T0000).
[00:04:53] (PS12) Alex Monk: Add all groups to general bastions, mostly empty bastiononly group [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161)
[00:05:28] (CR) jenkins-bot: [V: -1] Add all groups to general bastions, mostly empty bastiononly group [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161) (owner: Alex Monk)
[00:08:00] (PS13) Alex Monk: Add all groups to general bastions, mostly empty bastiononly group [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161)
[00:08:04] PROBLEM - puppet last run on mw2013 is CRITICAL: CRITICAL: puppet fail
[00:26:25] (PS2) Dzahn: mailman: convert language templates, ca,es,fi,fr [puppet] - https://gerrit.wikimedia.org/r/242749 (https://phabricator.wikimedia.org/T114332)
[00:32:13] (CR) Catrope: [C: 1] Freeze LQT on Swedish Wikimedia chapter wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/242640 (https://phabricator.wikimedia.org/T114277) (owner: Mattflaschen)
[00:32:51] (PS3) Dzahn: mailman: convert language templates, ca,es,fi,fr [puppet] - https://gerrit.wikimedia.org/r/242749 (https://phabricator.wikimedia.org/T114332)
[00:32:57] (CR) Dzahn: [C: 2] mailman: convert language templates, ca,es,fi,fr [puppet] - https://gerrit.wikimedia.org/r/242749 (https://phabricator.wikimedia.org/T114332) (owner: Dzahn)
[00:36:13] (PS4) Dzahn: mailman: convert language templates, ca,es,fi,fr [puppet] - https://gerrit.wikimedia.org/r/242749 (https://phabricator.wikimedia.org/T114332)
[00:37:01] (CR) Dzahn: [C: 2] mailman: convert language templates, ca,es,fi,fr [puppet] - https://gerrit.wikimedia.org/r/242749 (https://phabricator.wikimedia.org/T114332) (owner: Dzahn)
[00:37:34] RECOVERY - puppet last run on mw2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:42:11] Ops-Access-Requests, operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1691889 (RobH) @VBaranetsky, Access to stat1002 is a bit complicated, as there are a couple of different groups. Your request reads " Project: Trademark portfolio requires access t...
[00:42:14] PROBLEM - puppet last run on fermium is CRITICAL: CRITICAL: Puppet has 3 failures
[00:43:51] (PS14) Alex Monk: Add all groups to general bastions, mostly empty bastiononly group [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161)
[00:43:54] RECOVERY - puppet last run on fermium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[00:44:02] !log applying puppet fix on fermium (illegal byte sequence in utf-8) as in T114289#1691038 but for other languages
[00:45:16] (PS15) Alex Monk: Add all groups to general bastions, mostly empty bastiononly group [puppet] - https://gerrit.wikimedia.org/r/227327 (https://phabricator.wikimedia.org/T114161)
[00:45:58] operations, MediaWiki-extensions-TimedMediaHandler: Assign 3 more servers to video scaler duty - https://phabricator.wikimedia.org/T114337#1691895 (brion) NEW
[00:46:49] operations, MediaWiki-extensions-TimedMediaHandler: Assign 3 more servers to video scaler duty - https://phabricator.wikimedia.org/T114337#1691903 (brion)
[00:46:59] Ops-Access-Requests, operations: Requesting access to stat1002 for Dan Foy - https://phabricator.wikimedia.org/T113324#1691904 (RobH) @DFoy, I've gotten Sharee's approval via a back and forth in corporate email. (Non-ideal but it'll do in a pinch for this.) I'll fix up my patch and merge this tomorrow...
[00:49:06] Krenair: nice
[00:49:25] Krenair: that might work.
I'll take a more full look later on and maybe poke chasemp when he has a bit more time
[00:51:17] (PS1) Dzahn: mailman: convert language templates, pt.2 [puppet] - https://gerrit.wikimedia.org/r/242771 (https://phabricator.wikimedia.org/T114332)
[00:52:10] (CR) Dzahn: [C: 2] mailman: convert language templates, pt.2 [puppet] - https://gerrit.wikimedia.org/r/242771 (https://phabricator.wikimedia.org/T114332) (owner: Dzahn)
[00:59:48] (PS1) Faidon Liambotis: otrs: add support for Apache's mod_remoteip [puppet] - https://gerrit.wikimedia.org/r/242772
[00:59:57] ori, bblack: ^
[01:01:05] * ori looks
[01:05:50] paravoid: the only thing I'm unsure of is this bit, from the docs: "Unlike the RemoteIPInternalProxy directive, any intranet or private IP address reported by such proxies, including the 10/8, 172.16/12, 192.168/16, 169.254/16 and 127/8 blocks (or outside of the IPv6 public 2000::/3 block) are not trusted as the useragent IP, and are left in the RemoteIPHeader header's value."
[01:06:11] I think you made the right choice, just late in the day and I want to make sure that's true
[01:06:42] Yes, you did.
[01:10:07] (PS1) BryanDavis: Backport of D2486378: Implement compress.bzip2:// stream wrapper [debs/hhvm] - https://gerrit.wikimedia.org/r/242773 (https://phabricator.wikimedia.org/T113932)
[01:11:11] ori: yeah, it confused the shit out of me too
[01:11:25] paravoid: what about using address blocks rather than specific addresses? if the idea is to add this configuration snippet to all apaches which are behind a proxy, then any change to the cache::misc::nodes hiera var will result in a lot of apache service restarts
[01:13:32] well, we have no address block for misc cp or even all cp*
[01:13:47] would trusting all internal hosts be risky?
[01:13:50] we'd have to allow "all of eqiad" and this would allow a random compromised box to potentially bypass auth restrictions
[01:13:54] right
[01:13:56] not so much in this case
[01:14:02] (although otrs is kind of stupid)
[01:14:16] but think of IP-based auth in other Apaches
[01:14:51] in practice apache refreshes are not as scary as we are accustomed to thinking
[01:14:55] Can we make Grafana HTTPS-only?
[01:15:00] apache declines to reload if the configuration is invalid
[01:15:15] indeed
[01:15:34] (CR) Ori.livneh: [C: 1] "I think this is correct. Did not test." [puppet] - https://gerrit.wikimedia.org/r/242772 (owner: Faidon Liambotis)
[01:15:35] i'm not sure if notify => Service['apache2'] would reload, though...
[01:16:01] it does
[01:16:11] we specify
[01:16:13] hasrestart => true,
[01:16:13] restart => '/usr/sbin/service apache2 reload',
[01:16:16] ah, nice
[01:16:45] the other question, mostly to bblack
[01:16:58] is whether I should allow only $::site's varnishes or all of them
[01:17:22] I think the former, I don't think we'll ever do cross-DC varnish->app
[01:17:25] but worth thinking about
[01:17:40] (CR) Faidon Liambotis: [C: 2] otrs: add support for Apache's mod_remoteip [puppet] - https://gerrit.wikimedia.org/r/242772 (owner: Faidon Liambotis)
[01:17:56] (not worth holding this for that, there is no non-eqiad misc-lb cluster yet :)
[01:18:02] thanks for the review, ori
[01:18:03] nod
[01:18:05] np!
[01:18:16] you're welcome, mutante
[01:20:22] Platonides: :) all templates converted
[01:24:41] (PS1) Yuvipanda: admin: Provision all stat** users on bastion too [puppet] - https://gerrit.wikimedia.org/r/242779
[01:27:16] (CR) Alex Monk: "I would really prefer we just did I69fa7588 instead of making this existing problem worse by hiding it for more groups."
[puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[01:27:51] (CR) Alex Monk: [C: -1] "Also, you should probably be removing a bunch of people from bastiononly in this commit." [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[01:29:30] !log ori@tin Synchronized php-1.26wmf24/includes/resourceloader/ResourceLoader.php: 1cfe27030e: Change load.php to minify per-module instead of per-request (duration: 00m 17s)
[01:30:32] !log ori@tin Synchronized php-1.27.0-wmf.1/includes/resourceloader/ResourceLoader.php: 1cfe27030e: Change load.php to minify per-module instead of per-request (duration: 00m 17s)
[01:58:16] Ops-Access-Requests, operations, Patch-For-Review: Requesting access to stat1002 for JUnikowski_WMF - https://phabricator.wikimedia.org/T113298#1691977 (Krenair) Resolved>Open This is another case of bastion access being missed.
[01:59:37] (PS2) Yuvipanda: admin: Provision all stat** users on bastion too [puppet] - https://gerrit.wikimedia.org/r/242779
[02:00:45] Krenair: still need to cleanup bastiononly that just fixes the groups of users
[02:02:10] operations, Analytics-Backlog, Analytics-EventLogging, MediaWiki-extensions-CentralNotice, Traffic: Eventlogging should transparently split large event payloads - https://phabricator.wikimedia.org/T114078#1691980 (Tgr) >>! In T114078#1689560, @BBlack wrote: > Issues with unnecessary data within...
[02:03:32] (CR) Dzahn: "if there are stat users who have no bastion access that is most likely because they were added when stat servers still had public IPs and " [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[02:03:57] (CR) Dzahn: [C: -1] admin: Provision all stat** users on bastion too [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[02:04:09] yuvipanda: please don't.
let's check the groups first
[02:04:22] if there are users who have stat access but no bastion
[02:04:25] mutante: yeah me and Krenair are doing it
[02:04:25] then they should be removed
[02:04:26] in PM now
[02:04:33] will pastebin
[02:04:38] I have to go now
[02:04:41] and I'll be back later
[02:04:41] ok
[02:05:55] or maybe move the PM to phab? i have to go too.. same here
[02:09:24] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:09:58] Ops-Access-Requests, operations, Patch-For-Review: Requesting access to stat1002 for JUnikowski_WMF - https://phabricator.wikimedia.org/T113298#1691983 (JUnikowski_WMF) Thanks Kevin. Is it something on my side?
[02:10:23] mutante yeah
[02:10:24] Kevin?
[02:10:37] ha-ha krenair is Kevin now
[02:12:21] The nickname Kevin is taken
[02:14:35] he might be /home alone
[02:16:05] bbl
[02:37:23] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[02:40:19] !log l10nupdate@tin Synchronized php-1.26wmf24/cache/l10n: l10nupdate for 1.26wmf24 (duration: 10m 54s)
[02:47:31] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf24) at 2015-10-01 02:47:31+00:00
[02:53:38] (PS1) Faidon Liambotis: otrs: make Apache redirects XFP-relative [puppet] - https://gerrit.wikimedia.org/r/242787
[02:53:40] (PS1) Faidon Liambotis: varnish: redirect otrs-test to HTTPS [puppet] - https://gerrit.wikimedia.org/r/242788
[02:54:08] (CR) Faidon Liambotis: [C: 2 V: 2] otrs: make Apache redirects XFP-relative [puppet] - https://gerrit.wikimedia.org/r/242787 (owner: Faidon Liambotis)
[02:54:23] (CR) Faidon Liambotis: [C: 2 V: 2] varnish: redirect otrs-test to HTTPS [puppet] - https://gerrit.wikimedia.org/r/242788 (owner: Faidon Liambotis)
[03:04:24] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-others/snapshot is not accessible: Permission denied
[03:04:38]
operations, OTRS, Security: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1692037 (faidon) >>! In T87217#1036962, @pajz wrote: >>>! In T87217#1012875, @csteipp wrote: >> It looks like the session id is only added to the redirect on the initial login, and it doesn't l...
[03:09:44] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[03:11:24] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: puppet fail
[03:12:16] (PS1) Faidon Liambotis: otrs: disable SessionCheckRemoteIP [puppet] - https://gerrit.wikimedia.org/r/242789 (https://phabricator.wikimedia.org/T87217)
[03:12:36] (CR) Faidon Liambotis: [C: -1] "Do not merge until we move to OTRS 4." [puppet] - https://gerrit.wikimedia.org/r/242789 (https://phabricator.wikimedia.org/T87217) (owner: Faidon Liambotis)
[03:14:21] !log l10nupdate@tin Synchronized php-1.27.0-wmf.1/cache/l10n: l10nupdate for 1.27.0-wmf.1 (duration: 10m 38s)
[03:21:39] !log l10nupdate@tin LocalisationUpdate completed (1.27.0-wmf.1) at 2015-10-01 03:21:39+00:00
[03:27:14] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[03:37:34] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[04:04:54] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-maps/snapshot is not accessible: Permission denied
[04:08:15] PROBLEM - Incoming network saturation on labstore1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [100000000.0]
[04:31:14] RECOVERY - Incoming network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[05:05:14] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 2 mailman queue(s) above 100
[05:22:04] PROBLEM - Hadoop NodeManager on analytics1035 is CRITICAL: PROCS CRITICAL: 0
processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[05:29:44] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below 100
[05:44:53] RECOVERY - Hadoop NodeManager on analytics1035 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[05:48:54] PROBLEM - YARN NodeManager Node-State on analytics1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:49:13] PROBLEM - DPKG on analytics1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:50:20] (CR) Yuvipanda: [C: -2] "Needs to:" [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[05:50:24] RECOVERY - YARN NodeManager Node-State on analytics1035 is OK: OK: YARN NodeManager analytics1035.eqiad.wmnet:8041 Node-State: RUNNING
[05:50:35] RECOVERY - DPKG on analytics1035 is OK: All packages OK
[05:54:14] PROBLEM - Exim SMTP on mx2001 is CRITICAL: CRITICAL - Cannot make SSL connection
[05:59:14] RECOVERY - Exim SMTP on mx2001 is OK: OK - Certificate will expire on 09/22/2016 18:01.
[06:29:54] PROBLEM - puppet last run on db1045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:54] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:03] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:30:03] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mc2007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:04] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:13] PROBLEM - puppet last run on subra is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:15] PROBLEM - puppet last run on mw2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:24] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:34] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:35] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: puppet fail
[06:30:43] PROBLEM - puppet last run on db2044 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:43] PROBLEM - puppet last run on mw1086 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:16] PROBLEM - puppet last run on mw2043 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:44] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:54] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:04] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:14] RECOVERY - puppet last run on db1045 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:56:24] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[06:56:24] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run
1 minute ago with 0 failures
[06:56:34] RECOVERY - puppet last run on subra is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[06:56:35] RECOVERY - puppet last run on mw2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:45] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:51] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Oct 1 06:56:51 UTC 2015 (duration 56m 50s)
[06:57:03] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:57:04] RECOVERY - puppet last run on mw1086 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[06:57:04] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:57:35] RECOVERY - puppet last run on mw2043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:03] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:04] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:05] RECOVERY - puppet last run on mc2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:44] RECOVERY - puppet last run on db2044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:15] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[06:59:24] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:59:44] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[06:59:53] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is
currently enabled, last run 1 minute ago with 0 failures
[07:28:47] (PS1) Aaron Schulz: [WIP] Lowered "max lag" setting to 5 seconds [mediawiki-config] - https://gerrit.wikimedia.org/r/242814 (https://phabricator.wikimedia.org/T95501)
[07:28:53] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 18.18% of data above the critical threshold [500.0]
[07:37:03] PROBLEM - puppet last run on mw2213 is CRITICAL: CRITICAL: puppet fail
[07:42:04] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:42:52] (CR) Giuseppe Lavagetto: "We should write a consistency test for admin/data/data.yaml instead." [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[07:55:31] (PS1) ArielGlenn: link to sha1 sums list as well as md5 sums for dumps on index page [puppet] - https://gerrit.wikimedia.org/r/242816
[07:56:50] (CR) ArielGlenn: [C: 2] link to sha1 sums list as well as md5 sums for dumps on index page [puppet] - https://gerrit.wikimedia.org/r/242816 (owner: ArielGlenn)
[07:57:21] (CR) Alex Monk: "What, and leave some random groups implying access but not others for no good reason?" [puppet] - https://gerrit.wikimedia.org/r/242779 (owner: Yuvipanda)
[07:58:23] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[08:03:08] (CR) ArielGlenn: [C: 2 V: 2] dumps: provide sha1 checksums of all files along with the old md5s [dumps] (ariel) - https://gerrit.wikimedia.org/r/242626 (owner: ArielGlenn)
[08:04:54] RECOVERY - puppet last run on mw2213 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[08:05:23] PROBLEM - mailman_queue_size on fermium is CRITICAL: CRITICAL: 1 mailman queue(s) above 100
[08:06:04] (PS2) KartikMistry: Enable CX suggestions in ar, eo, hi, nl, vi and dawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/242528 (https://phabricator.wikimedia.org/T112848)
[08:21:54] RECOVERY - mailman_queue_size on fermium is OK: OK: mailman queues are below 100
[08:21:58] (PS2) ArielGlenn: Allow to backup globalimagelinks table, T87571 [dumps] (ariel) - https://gerrit.wikimedia.org/r/200313 (owner: Kelson)
[08:42:15] (PS1) ArielGlenn: add globalusage table config info for dumps, T87571 [puppet] - https://gerrit.wikimedia.org/r/242824
[08:43:55] (CR) ArielGlenn: [C: 2] add globalusage table config info for dumps, T87571 [puppet] - https://gerrit.wikimedia.org/r/242824 (owner: ArielGlenn)
[08:49:16] (PS3) Filippo Giunchedi: cassandra: ship systemd service file [puppet] - https://gerrit.wikimedia.org/r/242548 (https://phabricator.wikimedia.org/T108306)
[08:49:38] (PS1) ArielGlenn: dumps: normalize globalusage table config varname [puppet] - https://gerrit.wikimedia.org/r/242825
[08:50:39] (CR) ArielGlenn: [C: 2] dumps: normalize globalusage table config varname [puppet] - https://gerrit.wikimedia.org/r/242825 (owner: ArielGlenn)
[08:57:16] (CR) Alexandros Kosiaris: [C: 1] cassandra: ship systemd service file [puppet] - https://gerrit.wikimedia.org/r/242548 (https://phabricator.wikimedia.org/T108306) (owner: Filippo Giunchedi)
[08:59:05] !log disabling puppet on maps-test200* in expectance of https://gerrit.wikimedia.org/r/242548
[08:59:31] (CR) Mobrovac: [C: 1]
cassandra: ship systemd service file [puppet] - https://gerrit.wikimedia.org/r/242548 (https://phabricator.wikimedia.org/T108306) (owner: Filippo Giunchedi)
[09:08:50] (PS3) ArielGlenn: Allow to backup globalimagelinks table, T87571 [dumps] (ariel) - https://gerrit.wikimedia.org/r/200313 (owner: Kelson)
[09:10:17] (PS4) ArielGlenn: Allow to backup globalimagelinks table, T87571 [dumps] (ariel) - https://gerrit.wikimedia.org/r/200313 (owner: Kelson)
[09:12:11] (CR) ArielGlenn: [C: 2] Allow to backup globalimagelinks table, T87571 [dumps] (ariel) - https://gerrit.wikimedia.org/r/200313 (owner: Kelson)
[09:12:19] (CR) ArielGlenn: [V: 2] Allow to backup globalimagelinks table, T87571 [dumps] (ariel) - https://gerrit.wikimedia.org/r/200313 (owner: Kelson)
[09:13:25] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: Puppet has 3 failures
[09:16:31] (PS1) Giuseppe Lavagetto: kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831
[09:17:20] (CR) jenkins-bot: [V: -1] kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831 (owner: Giuseppe Lavagetto)
[09:18:25] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:21:16] (PS2) Giuseppe Lavagetto: kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831
[09:21:52] (CR) jenkins-bot: [V: -1] kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831 (owner: Giuseppe Lavagetto)
[09:24:17] <_joe_> grr puppet indentation
[09:26:24] (PS3) Giuseppe Lavagetto: kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831
[09:28:14] <_joe_> attaboy puppet-lint
[09:29:07] (CR) Giuseppe Lavagetto: [C: 2] kube2proxy: more fixes [puppet] - https://gerrit.wikimedia.org/r/242831 (owner: Giuseppe Lavagetto)
[09:34:19] operations, RESTBase, RESTBase-Cassandra: column family cassandra metrics size -
https://phabricator.wikimedia.org/T113733#1692463 (mark) >>! In T113733#1680817, @Eevans wrote: >> I think we can trim the list of derived metrics to the most relevant ones, e.g. 50/75/95/99 percentile, count, 1MinuteRate...
[09:35:14] operations, Traffic, Browser-Support-Internet-Explorer, HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#1692464 (Chmarkine) Did Microsoft fix this issue yet?
[09:40:05] operations, RESTBase, RESTBase-Cassandra: column family cassandra metrics size - https://phabricator.wikimedia.org/T113733#1692479 (mobrovac) >>! In T113733#1692463, @mark wrote: > I'm sorry, but I wouldn't call ~220 GB of storage for Cassandra metrics alone particularly "frugal". :) That's rather a lo...
[09:47:12] !log stop puppet on restbase* in preparation for https://gerrit.wikimedia.org/r/242548
[09:47:48] (PS4) Filippo Giunchedi: cassandra: ship systemd service file [puppet] - https://gerrit.wikimedia.org/r/242548 (https://phabricator.wikimedia.org/T108306)
[09:47:58] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: ship systemd service file [puppet] - https://gerrit.wikimedia.org/r/242548 (https://phabricator.wikimedia.org/T108306) (owner: Filippo Giunchedi)
[09:51:01] !log reboot praseodymium to test cassandra systemd unit
[09:56:53] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 225.89, 140.46, 67.90
[09:57:07] akosiaris: LGTM on praseodymium
[10:04:54] PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:15:04] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:20:32] !log uploaded php5_5.3.10-1ubuntu3.20+wmf1 on apt.wikimedia.org precise-wikimedia
[10:26:04] PROBLEM - puppet last run on mw1062 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:28:21] (CR) Giuseppe Lavagetto: "since that file is not owned by the web user, that would indeed break things as we
are now. I'll abandon this change for now." [puppet] - https://gerrit.wikimedia.org/r/239795 (owner: Giuseppe Lavagetto)
[10:28:30] (Abandoned) Giuseppe Lavagetto: hhvm: harden cache permissions [puppet] - https://gerrit.wikimedia.org/r/239795 (owner: Giuseppe Lavagetto)
[10:30:40] (PS1) MarcoAurelio: Enable Extension:ShortUrl on mr.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/242842 (https://phabricator.wikimedia.org/T103646)
[10:31:45] RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[10:34:18] (PS1) Filippo Giunchedi: cassandra: limit stdout logging to level >= WARN [puppet] - https://gerrit.wikimedia.org/r/242844
[10:35:39] !log cleared apt.wikimedia.org of lucid-wikimedia repo
[10:35:39] !log hoo@tin Synchronized php-1.27.0-wmf.1/extensions/: Use 1.27.0-wmf.1 for Wikidata again after fixing T114290 (duration: 01m 07s)
[10:36:04] PROBLEM - puppet last run on mw1067 is CRITICAL: CRITICAL: Puppet has 3 failures
[10:38:02] akosiaris: I think we're only missing https://gerrit.wikimedia.org/r/#/c/242844/1 to avoid double logging and we're set!
[10:39:11] (CR) Alexandros Kosiaris: [C: 1] cassandra: limit stdout logging to level >= WARN [puppet] - https://gerrit.wikimedia.org/r/242844 (owner: Filippo Giunchedi)
[10:39:18] godog: cool!
[10:39:30] I'll start enabling puppet on maps boxes then
[10:40:04] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 4 failures
[10:40:14] (PS2) Filippo Giunchedi: cassandra: limit stdout logging to level >= WARN [puppet] - https://gerrit.wikimedia.org/r/242844
[10:40:32] (CR) Filippo Giunchedi: [C: 2 V: 2] cassandra: limit stdout logging to level >= WARN [puppet] - https://gerrit.wikimedia.org/r/242844 (owner: Filippo Giunchedi)
[10:42:04] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:42:40] akosiaris: sweet, I'm doing the same, some things like journald won't fully work I think until a service restart with the new unit is done
[10:44:14] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[10:50:48] (PS2) Giuseppe Lavagetto: Fix regex in stats code [debs/hhvm] - https://gerrit.wikimedia.org/r/239816 (https://phabricator.wikimedia.org/T112922)
[10:51:24] RECOVERY - puppet last run on mw1062 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[10:53:00] godog: maps-test2001 seems to be fine
[10:53:06] moving on with the rest
[10:55:04] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 4 failures
[10:55:33] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:04:45] RECOVERY - puppet last run on mw1067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:06:06] operations, MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1692700 (jcrespo)
[11:06:14] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:07:04] RECOVERY - puppet last run on mw1100 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:09:35] operations, RESTBase, Services: Switch
RESTBase to use Node.js 4 - https://phabricator.wikimedia.org/T107762#1692713 (MoritzMuehlenhoff) I talked to the maintainer: The new packages won't be co-installable, there will be only one release in the archive. He also mentioned that 4.4.1 isn't the LTS rele...
[11:16:35] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:16:54] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: Puppet has 2 failures
[11:17:54] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 9.09% of data above the critical threshold [500.0]
[11:22:03] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures
[11:22:24] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[11:27:54] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[11:35:00] we have a few broken i18n messages on Wikidata (because we updated our branch)
[11:35:25] so think we need to run scap and wonder if that would interfere with what anyone is doing?
[11:35:40] (hopefully it is quick, since not that much is changed)
[11:37:10] operations, RESTBase, Services: Switch RESTBase to use Node.js 4 - https://phabricator.wikimedia.org/T107762#1692846 (akosiaris) >>! In T107762#1692713, @MoritzMuehlenhoff wrote: > I talked to the maintainer: The new packages won't be co-installable, there will be only one release in the archive. >...
[11:43:14] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:43:44] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[11:43:46] * Steinsplitter pokes legoktm
[11:44:26] operations, RESTBase, Services: Switch RESTBase to use Node.js 4 - https://phabricator.wikimedia.org/T107762#1692875 (mobrovac) >>!
In T107762#1692713, @MoritzMuehlenhoff wrote: > He also mentioned that 4.4.1 isn't the LTS release yet, the information on https://github.com/nodejs/LTS/ are examples/plan...
[11:46:45] ok, going to run scap now
[11:47:49] !log aude@tin Started scap: Put Wikidata extension back on wmf/1.27.0-wmf.1
[11:52:30] (PS5) Alexandros Kosiaris: Add the new OTRS scheduler watchdog cron entry [puppet] - https://gerrit.wikimedia.org/r/242184
[11:52:32] (PS1) Alexandros Kosiaris: otrs: Ship systemd unit file for OTRS scheduler [puppet] - https://gerrit.wikimedia.org/r/242860
[12:04:48] (PS6) Alexandros Kosiaris: Add the new OTRS scheduler watchdog cron entry [puppet] - https://gerrit.wikimedia.org/r/242184
[12:04:50] (PS2) Alexandros Kosiaris: otrs: Ship systemd unit file for OTRS scheduler [puppet] - https://gerrit.wikimedia.org/r/242860
[12:04:52] (PS1) Alexandros Kosiaris: otrs: use hiera to get configuration values [puppet] - https://gerrit.wikimedia.org/r/242863
[12:09:04] !log installed PHP security updates on all precise/trusty systems (the respective DSA for jessie is already deployed, it was released three weeks ago)
[12:17:39] (CR) Alexandros Kosiaris: [C: 2] otrs: use hiera to get configuration values [puppet] - https://gerrit.wikimedia.org/r/242863 (owner: Alexandros Kosiaris)
[12:18:36] !log aude@tin Finished scap: Put Wikidata extension back on wmf/1.27.0-wmf.1 (duration: 30m 46s)
[12:23:24] (CR) Alexandros Kosiaris: [C: -1] "That is also configurable via the web interface and resides in" [puppet] - https://gerrit.wikimedia.org/r/242789 (https://phabricator.wikimedia.org/T87217) (owner: Faidon Liambotis)
[12:23:31] (PS3) Muehlenhoff: Lower the conntrack tracking time for TIME_WAIT connections [puppet] - https://gerrit.wikimedia.org/r/240361 (https://phabricator.wikimedia.org/T105307)
[12:24:02] !log disable SessionRemoteIPcheck on mendelevium's OTRS installation for checking
[12:41:42] (CR) Muehlenhoff:
[C: 032 V: 032] Lower the conntrack tracking time for TIME_WAIT connections [puppet] - 10https://gerrit.wikimedia.org/r/240361 (https://phabricator.wikimedia.org/T105307) (owner: 10Muehlenhoff) [12:46:27] apergos: hello! I have a patch pending to pass pyflakes/pep8 on operations/dumps. I am wondering whether you rely on the pyflakes/pep8 jobs currently triggered [12:46:32] apergos: I would like to get rid of them [12:47:00] I do not use them, I am doing a pile of pylints on my branch as you will have seen [12:47:09] ohhh there is a branch [12:47:22] and eventually when all that code is linted and flaked to death then I will ask you to enable it all [12:47:41] but I don't have "do it all" on my list, only the xml dump related code for now [12:49:09] ok ok [12:49:13] I am getting rid of the old jobs [12:49:20] ok great [12:49:45] apergos: on the master branch I proposed a change that uses tox to set up a virtualenv and run flake8 (a wrapper around linters pep8 and pyflakes) https://gerrit.wikimedia.org/r/#/c/242494/ [12:49:52] might apply cleanly on your branch [12:50:04] from there we can add a Jenkins job that runs it for you [12:50:13] currently it is only done manually by commenting 'check experimental' in Gerrit [12:50:30] probably won't run cleanly [12:50:54] but in 2-3 days we'll be in much better shape [12:52:07] let's leave that patch where it is for master and I'll see what would be needed to get the xml related code into master cleanly [12:52:24] since once it passes all the checks it might as well get moved.
[12:55:04] 6operations, 10Continuous-Integration-Infrastructure, 10Dumps-Generation, 5Patch-For-Review, 7WorkType-Maintenance: operations/dumps repo should pass flake8 - https://phabricator.wikimedia.org/T114249#1693065 (10ArielGlenn) a:3ArielGlenn [12:55:24] 6operations, 10Continuous-Integration-Infrastructure, 10Dumps-Generation, 5Patch-For-Review, 7WorkType-Maintenance: operations/dumps repo should pass flake8 - https://phabricator.wikimedia.org/T114249#1689397 (10ArielGlenn) [13:03:12] !log repooling cp1046 (eqiad mobile) with caches wiped clean just before [13:05:15] PROBLEM - SSH on ms-be1010 is CRITICAL: Connection refused [13:05:23] PROBLEM - DPKG on ms-be1010 is CRITICAL: Connection refused by host [13:05:24] PROBLEM - dhclient process on ms-be1010 is CRITICAL: Connection refused by host [13:05:35] PROBLEM - swift-container-replicator on ms-be1010 is CRITICAL: Connection refused by host [13:05:35] PROBLEM - swift-object-server on ms-be1010 is CRITICAL: Connection refused by host [13:05:45] PROBLEM - swift-object-updater on ms-be1010 is CRITICAL: Connection refused by host [13:05:45] PROBLEM - puppet last run on ms-be1010 is CRITICAL: Connection refused by host [13:05:45] PROBLEM - swift-account-reaper on ms-be1010 is CRITICAL: Connection refused by host [13:05:53] PROBLEM - swift-account-replicator on ms-be1010 is CRITICAL: Connection refused by host [13:06:03] PROBLEM - swift-container-server on ms-be1010 is CRITICAL: Connection refused by host [13:06:03] PROBLEM - RAID on ms-be1010 is CRITICAL: Connection refused by host [13:06:03] PROBLEM - swift-account-server on ms-be1010 is CRITICAL: Connection refused by host [13:06:04] PROBLEM - salt-minion processes on ms-be1010 is CRITICAL: Connection refused by host [13:06:14] PROBLEM - Check size of conntrack table on ms-be1010 is CRITICAL: Connection refused by host [13:06:15] PROBLEM - swift-container-auditor on ms-be1010 is CRITICAL: Connection refused by host [13:06:24] PROBLEM - swift-object-auditor on 
ms-be1010 is CRITICAL: Connection refused by host [13:06:24] PROBLEM - swift-account-auditor on ms-be1010 is CRITICAL: Connection refused by host [13:06:25] PROBLEM - swift-container-updater on ms-be1010 is CRITICAL: Connection refused by host [13:06:34] PROBLEM - Disk space on ms-be1010 is CRITICAL: Connection refused by host [13:06:44] PROBLEM - swift-object-replicator on ms-be1010 is CRITICAL: Connection refused by host [13:06:44] PROBLEM - configured eth on ms-be1010 is CRITICAL: Connection refused by host [13:06:46] 6operations, 10ops-eqiad, 10Traffic: cp1046 is crashing and becoming unresponsive - https://phabricator.wikimedia.org/T113639#1693084 (10BBlack) I've left cp1046 depooled the past several days since @cmjohnson did the hardware work above. So far it's been stable under no load. Today I've wiped it's long-te... [13:08:26] 6operations, 10Continuous-Integration-Infrastructure, 10Dumps-Generation, 5Patch-For-Review, 7WorkType-Maintenance: operations/dumps repo should pass flake8 - https://phabricator.wikimedia.org/T114249#1693088 (10hashar) [13:16:07] ggrr ms-be1010 is me [13:19:45] RECOVERY - swift-object-auditor on ms-be1010 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [13:19:52] <_joe_> godog: I was about to ask [13:19:54] RECOVERY - swift-container-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [13:19:54] RECOVERY - swift-account-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [13:19:54] RECOVERY - Disk space on ms-be1010 is OK: DISK OK [13:20:04] RECOVERY - swift-object-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [13:20:05] RECOVERY - configured eth on ms-be1010 is OK: OK - interfaces up [13:20:16] RECOVERY - SSH on ms-be1010 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2wmfprecise2 (protocol 2.0) 
[13:20:24] RECOVERY - DPKG on ms-be1010 is OK: All packages OK [13:20:24] RECOVERY - dhclient process on ms-be1010 is OK: PROCS OK: 0 processes with command name dhclient [13:20:37] RECOVERY - swift-container-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [13:20:37] RECOVERY - swift-object-server on ms-be1010 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [13:20:43] (03CR) 10Faidon Liambotis: "How does this watchdog work? If it respawns the scheduler this isn't going to work as it won't run under the right cgroup that systemd has" [puppet] - 10https://gerrit.wikimedia.org/r/242184 (owner: 10Alexandros Kosiaris) [13:20:53] RECOVERY - swift-object-updater on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [13:20:53] RECOVERY - swift-account-reaper on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [13:20:54] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [13:20:54] RECOVERY - swift-account-replicator on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [13:21:03] RECOVERY - swift-container-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [13:21:03] RECOVERY - RAID on ms-be1010 is OK: OK: optimal, 14 logical, 14 physical [13:21:04] RECOVERY - swift-account-server on ms-be1010 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [13:21:04] RECOVERY - salt-minion processes on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:21:04] RECOVERY - very high load average likely xfs on ms-be1010 is OK: OK - load average: 17.96, 6.12, 2.18 [13:21:14] RECOVERY - Check size of conntrack 
table on ms-be1010 is OK: OK: nf_conntrack is 7 % full [13:21:14] RECOVERY - swift-container-auditor on ms-be1010 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [13:23:40] _joe_: heh, sometimes 'reboot' doesn't work and downtime expired [13:30:00] (03CR) 10Alex Monk: "According to my script in P2132, the following people should have access:" [puppet] - 10https://gerrit.wikimedia.org/r/242779 (owner: 10Yuvipanda) [13:30:14] (03CR) 10Alex Monk: "'bastiononly' access, that is" [puppet] - 10https://gerrit.wikimedia.org/r/242779 (owner: 10Yuvipanda) [13:32:13] <_joe_> !log uploaded hhvm_3.6.5+dfsg1-1+wm7 [13:32:25] <_joe_> uhm morebots still out? [13:32:28] <_joe_> damn [13:41:27] should I restart it? [13:41:42] !morebot [13:41:46] morebots: ping [13:41:47] I am a logbot running on tools-exec-1205. [13:41:47] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [13:41:47] To log a message, type !log <msg>. [13:41:55] !log Am I logging? [13:42:43] MediaWikiVersionError: Unknown MediaWiki 1.27.0-wmf.1 [13:42:45] yeah hmm [13:42:53] so that is a bug in admin bots [13:43:44] <_joe_> hashar: uhm ok [13:44:41] 6operations, 10Adminbot: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693206 (10hashar) 3NEW [13:44:45] filed as https://phabricator.wikimedia.org/T114365 [13:45:54] mwclient does not recognize it :D [13:48:51] 6operations, 10Adminbot: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693219 (10hashar) Seems the issue is in the python module mwclient: https://github.com/mwclient/mwclient.git I have filed https://github.com/mwclient/mwclient/issues/98 Once f... [13:49:43] 6operations, 10Adminbot: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693224 (10Aklapper) >>!
In T114365#1693219, @hashar wrote: > Seems the issue is in the python module mwclient Also see T114349 ? [13:51:36] 6operations, 10Adminbot: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693227 (10hashar) [13:52:25] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693206 (10hashar) Merged in since I filed the upstream issue. [13:52:34] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693235 (10hashar) [13:53:30] would need to use some Python module that parses semantic versioning [13:53:35] and adjust the mwclient code base [13:57:07] I do not know who is the person responsible for creating the server dashboard in Grafana, but whoever it is: thank you [13:57:14] (03PS1) 10BBlack: zero_update: randomize cron every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) [13:57:56] having IOPS there makes it 1000x more useful for me than Ganglia [14:00:00] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693271 (10PierreSelim) Just a question: the change was to go from version number such as 1.26wmf24 and now mwclient should probably support also number su... [14:00:25] 6operations, 7HHVM, 5Patch-For-Review, 7Wikimedia-log-errors: Unknown modifier '\': [([^\s,]+)\s*=\s*([^\s,]+)[\+\-]] - https://phabricator.wikimedia.org/T112922#1693272 (10Joe) I just deployed an updated package to mw1018 and it seems to have stopped the errors. Deploying now to all the canaries.
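The version-format change discussed above (from strings such as 1.26wmf24 to 1.27.0-wmf.1) can be sketched as follows. This is a minimal illustration only, assuming a regex-based approach; the helper name `parse_mw_version` and the returned tuple layout are hypothetical and are not mwclient's actual code or its eventual fix.

```python
import re

# Hypothetical helper for illustration; NOT mwclient's real implementation.
# Accepts both the legacy form ("1.26wmf24") and the semver-style form
# ("1.27.0-wmf.1"), as well as plain releases ("1.25.2").
_VERSION_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)"       # e.g. "1.27"
    r"(?:\.(?P<patch>\d+))?"                 # optional ".0"
    r"(?:-?(?P<tag>[A-Za-z]+)\.?(?P<tagnum>\d+)?)?$"  # optional "wmf24" / "-wmf.1"
)


def parse_mw_version(version):
    """Return (major, minor, patch, tag, tagnum) for a MediaWiki version string."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError("Unknown MediaWiki version: %r" % version)
    patch = int(match.group("patch")) if match.group("patch") else 0
    tagnum = int(match.group("tagnum")) if match.group("tagnum") else None
    return (
        int(match.group("major")),
        int(match.group("minor")),
        patch,
        match.group("tag"),
        tagnum,
    )


if __name__ == "__main__":
    for candidate in ("1.26wmf24", "1.27.0-wmf.1", "1.25.2"):
        print(candidate, parse_mw_version(candidate))
```

A dedicated semantic-versioning module, as suggested in the channel, would likely be the more robust choice; the regex here only shows why a parser written for the old format rejects the new one.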
[14:01:09] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693281 (10hashar) Yup, it needs to support Semantic versioning. There is a python module for that, and probably others: >>! In T67306#683821, @hashar wr... [14:06:42] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: morebots no more log because of MediaWiki semantic versionning - https://phabricator.wikimedia.org/T114365#1693303 (10zhuyifei1999) >>! In T114365#1693227, @hashar wrote: > Merged in since I filed the upstream issue. Latest mwclient doesn't crash. The tool lab... [14:07:22] <_joe_> !log installing the new hhvm package to all the canaries [14:08:05] 6operations, 10ContentTranslation-cxserver, 6Language-Engineering, 10MediaWiki-extensions-ContentTranslation, and 4 others: Apertium leaves a ton of stale processes, consumes all the available memory - https://phabricator.wikimedia.org/T107270#1693306 (10Arrbee) [14:10:00] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693309 (10zhuyifei1999) [14:17:15] morebots, there [14:17:15] I am a logbot running on tools-exec-1205. [14:17:15] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [14:17:15] To log a message, type !log <msg>. [14:17:21] !log testing the log [14:19:02] morebots, back? [14:19:02] I am a logbot running on tools-exec-1214. [14:19:02] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [14:19:02] To log a message, type !log <msg>.
[14:19:07] !log log test [14:23:39] (03Restored) 10Hashar: Jenkins #2 (DO NOT SUBMIT) [debs/pybal] - 10https://gerrit.wikimedia.org/r/90549 (owner: 10Hashar) [14:23:47] (03CR) 10Hashar: "recheck" [debs/pybal] - 10https://gerrit.wikimedia.org/r/90549 (owner: 10Hashar) [14:24:26] (03Abandoned) 10Hashar: Jenkins #2 (DO NOT SUBMIT) [debs/pybal] - 10https://gerrit.wikimedia.org/r/90549 (owner: 10Hashar) [14:24:55] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693349 (10Andrew) @zhuyifei1999 that's good news! Can you point me to an up-to-date .deb of that tool? (meanwhile, I will google :) ) [14:26:03] 6operations, 7HHVM, 5Patch-For-Review, 7Wikimedia-log-errors: Unknown modifier '\': [([^\s,]+)\s*=\s*([^\s,]+)[\+\-]] - https://phabricator.wikimedia.org/T112922#1693352 (10Joe) 5Open>3Resolved [14:26:25] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693354 (10hashar) Maybe it has been installed via git clone + python setup.py install ? [14:27:34] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693356 (10coren) No, it comes from our repo: ```Package python-mwclient: i 0.6.5-1... [14:31:55] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693360 (10Andrew) a:3Andrew [14:40:33] (03PS1) 10Filippo Giunchedi: cassandra: add multi-instance support, disabled [puppet] - 10https://gerrit.wikimedia.org/r/242896 (https://phabricator.wikimedia.org/T95253) [14:44:51] (03PS1) 10Andrew Bogott: Include python-mwclient (latest) on exec nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/242900 (https://phabricator.wikimedia.org/T114365) [14:45:46] (03CR) 10Andrew Bogott: [C: 032] Include python-mwclient (latest) on exec nodes. [puppet] - 10https://gerrit.wikimedia.org/r/242900 (https://phabricator.wikimedia.org/T114365) (owner: 10Andrew Bogott) [14:52:10] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693376 (10Andrew) I built python-mwclient_0.8.0~dev1-1_all.deb and provided it on carbon. I also made the inclusion of python-mwclient... [14:58:24] andrewbogott: well done! [14:58:40] it’s not yet fixed, there’s a dependency on ordereddict [14:58:47] which I see you are waiting on anyway [14:58:51] !log installed PHP security updates on all precise/trusty systems (the respective DSA for jessie is already deployed, it was released three weeks ago) [14:58:57] any idea if there’s a git repo for that? [14:59:35] ordereddict is built-in python2.7 iirc [14:59:51] https://pypi.python.org/pypi/ordereddict [14:59:53] <_joe_> it is [14:59:59] backported as a module for python 2.6 support [15:00:03] <_joe_> it's part of the collections module [15:00:05] anomie ostriches thcipriani marktraceur Krenair: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151001T1500). [15:00:05] matt_flaschen kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:15] I'll do the SWAT. 
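The ordereddict point made above (built into Python 2.7 via the collections module, backported as a separate PyPI module for 2.6) is commonly handled with a guarded import. The snippet below is a generic sketch of that pattern, not code taken from mwclient or adminbot; the dictionary contents are made up for illustration.

```python
# Prefer the standard-library class (Python 2.7+ and 3.x); fall back to the
# PyPI "ordereddict" backport only where collections lacks it (Python 2.6).
try:
    from collections import OrderedDict
except ImportError:
    from ordereddict import OrderedDict

# Illustrative data only: insertion order is preserved on iteration.
params = OrderedDict()
params["first"] = 1
params["second"] = 2
params["third"] = 3
keys = list(params.keys())

if __name__ == "__main__":
    print(keys)
```

With this pattern the extra package is only needed on interpreters older than 2.7.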
[15:00:22] I am off, gotta run [15:00:22] hashar: yes, but the download link at that page has a cert error [15:00:29] (03CR) 10Mormegil: [C: 031] [Security] Restrict course page editing for any wiki with EducationProgram Extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/240049 (https://phabricator.wikimedia.org/T112806) (owner: 10MarcoAurelio) [15:00:31] sorry :( [15:00:32] I don’t feel great about downloading a .tgz in that case [15:01:32] here [15:01:34] hm… I’m wrong I guess, the browser doesn’t complain [15:02:37] andrewbogott: which is why I'm not against PyPI requiring people to host on PyPI :) [15:03:04] (relying on every dev to have HTTPS set up correctly for their little one-off python module is.... not sane) [15:06:18] (03CR) 10Mattflaschen: [C: 032] Freeze LQT on Swedish Wikimedia chapter wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242640 (https://phabricator.wikimedia.org/T114277) (owner: 10Mattflaschen) [15:06:25] (03Merged) 10jenkins-bot: Freeze LQT on Swedish Wikimedia chapter wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242640 (https://phabricator.wikimedia.org/T114277) (owner: 10Mattflaschen) [15:08:05] !log mattflaschen@tin Synchronized wmf-config/InitialiseSettings.php: Freeze LQT on se.wikimedia (duration: 00m 18s) [15:09:04] !log Did final run of convertAllLqtPages.php on sewikimedia immediately before freezing LQT [15:09:53] matt_flaschen: I immediately thought of https://www.youtube.com/watch?v=Vlq20E3SOVQ [15:10:15] morebots: you still aren't working [15:10:15] I am a logbot running on tools-exec-1214. [15:10:16] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [15:10:16] To log a message, type !log <msg>. [15:10:33] morebots: but guess what, bd's SAL still is: https://tools.wmflabs.org/sal/production [15:10:33] I am a logbot running on tools-exec-1214. [15:10:33] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [15:10:33] To log a message, type !log <msg>.
[15:11:07] greg-g: you are seeing https://phabricator.wikimedia.org/T114365 [15:11:23] I think [15:11:33] PROBLEM - puppet last run on mw2022 is CRITICAL: CRITICAL: Puppet has 1 failures [15:12:09] andrewbogott: yeah, which begs the question (sic): should we instead look into making bd's SAL tool official given it has had better uptime over the past 2 months than morebots? [15:13:53] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693456 (10Andrew) Note that python-mwclient_0.8.0~dev1-1_all.deb was built directly out of the github source at https://github.com/mwcl... [15:14:21] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693458 (10greg) Just for the record, this SAL tool has been chugging along just fine: https://tools.wmflabs.org/sal/production [15:14:28] <_joe_> greg-g: +2 [15:14:36] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693462 (10Andrew) [15:14:47] greg-g: yeah, probably [15:14:49] also, useful search! links to specific logs (https://tools.wmflabs.org/sal/log/AVAj8xZi1oXzWjit5xMX )! [15:14:59] as long as I don’t have to maintain nit [15:15:02] *it [15:15:07] well, someone does :) [15:15:19] but it appears to be easier to maintain than morebots :) [15:15:36] it's a stack ops is familiar with (Logstash) [15:16:51] 6operations, 10MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1693476 (10greg) This was mentioned in yesterday's SoS: @jcrespo: I'd like to make sure you get the help you need, but I'm not sure who to task with that. Do you have anyone in mind? 
[15:20:05] (03CR) 10Alexandros Kosiaris: [C: 031] cassandra: add multi-instance support, disabled [puppet] - 10https://gerrit.wikimedia.org/r/242896 (https://phabricator.wikimedia.org/T95253) (owner: 10Filippo Giunchedi) [15:23:40] 6operations: Puppet not always doing "apt-get update" before installing new packages - https://phabricator.wikimedia.org/T114375#1693516 (10Andrew) 3NEW [15:23:43] matt_flaschen: you doing SWAT for rest of the patches, right? [15:23:49] kart_, yes. [15:23:53] cool. [15:27:20] Can you let me know when you're done please matt_flaschen? [15:27:25] Yep [15:27:27] thanks [15:27:41] 6operations, 10RESTBase, 6Services, 3Mobile-Content-Service, 7Varnish: Enable caching for the Mobile Content Service's RESTBase public endpoints - https://phabricator.wikimedia.org/T113591#1693537 (10GWicke) > Currently, the RB-related VCL implicitly assumes that RB responses are all no-cache, and thus d... [15:27:49] 6operations: Puppet not always doing "apt-get update" before installing new packages - https://phabricator.wikimedia.org/T114375#1693539 (10faidon) Puppet is being run from a shell script, `puppet-run`, which runs `apt-get update` first — and also has `set -e`, so puppet wouldn't run at all if `apt-get update` h... [15:30:41] (03CR) 10Alex Monk: [C: 031] [Security] Restrict course page editing for any wiki with EducationProgram Extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/240049 (https://phabricator.wikimedia.org/T112806) (owner: 10MarcoAurelio) [15:30:55] 6operations: Puppet not always doing "apt-get update" before installing new packages - https://phabricator.wikimedia.org/T114375#1693551 (10Andrew) production uses puppet-run which is now in charge of the apt-get update. Does labs use puppet-run? [15:32:58] 6operations: Puppet not always doing "apt-get update" before installing new packages - https://phabricator.wikimedia.org/T114375#1693555 (10Andrew) 5Open>3Invalid a:3Andrew ok -- I think this is human error. 
All the evidence above is based on my running 'puppet agent -tv' on the commandline, and that DEFI... [15:33:25] 6operations, 10MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1693559 (10jcrespo) @greg No, I do not know who should be the best fit. This is a "mediawiki-core" change. Who maintains the mysql ORM? If the answer to that is "nobody" then, someone with some med... [15:33:39] !log replacing failed disk ms-be1012 /dev/sdf slot 5 [15:34:10] 6operations: Encrypted password storage - https://phabricator.wikimedia.org/T96130#1693560 (10MoritzMuehlenhoff) 5Open>3Resolved The plain text password store has been migrated to pwstore. Docs available at https://office.wikimedia.org/wiki/Pwstore [15:35:08] kart_, the patch shouldn't be cherry-picked to 1.26wmf24? [15:35:33] ah [15:36:33] PROBLEM - YARN NodeManager Node-State on analytics1035 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:37:02] matt_flaschen: why I get, Could not create a merge commit during the cherry pick [15:37:32] you may have to cherry-pick locally and fix the conflicts [15:37:50] kart_, probably because the patch is not relevant or it's a merge conflict. it doesn't really need to be since 1.27.0-wmf.1 will rollout everywhere today anyway. [15:38:04] RECOVERY - YARN NodeManager Node-State on analytics1035 is OK: OK: YARN NodeManager analytics1035.eqiad.wmnet:8041 Node-State: RUNNING [15:38:22] matt_flaschen: ok. Go ahead with 1.27 [15:38:24] RECOVERY - puppet last run on mw2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:38:33] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#Updating_the_deployment_branch [15:38:41] Don't want to mess thing in production now. 
[15:38:50] 6operations, 10ops-eqiad: ms-be1012: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T113929#1693568 (10Cmjohnson) Replaced the disk Return Part Tracking Info USPS 9202 3946 5301 2428 9946 37 FEDEX 9611918 2393026 50486310 [15:38:52] Krenair: got it. Thanks! [15:39:16] matt_flaschen: can be tested on 'testwiki', so it is fine for 1.27 as of now. [15:42:29] 6operations, 10ops-eqiad: ms-be1012: slot=5 dev=sdf failed - https://phabricator.wikimedia.org/T113929#1693575 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson Added the disk back and VD5 has returned. Virtual Drive: 0 (Target Id: 0) Virtual Drive: 1 (Target Id: 1) Virtual Drive: 2 (Target Id: 2) Virtual Driv... [15:43:42] !log mattflaschen@tin Synchronized php-1.26wmf24/includes/objectcache/MemcachedBagOStuff.php: Memcached key decode fix (duration: 00m 18s) [15:44:19] !log mattflaschen@tin Synchronized php-1.26wmf24/tests/phpunit/includes/objectcache/BagOStuffTest.php: Memcached key decode fix (duration: 00m 17s) [15:45:56] !log mattflaschen@tin Synchronized php-1.27.0-wmf.1/includes/objectcache/MemcachedBagOStuff.php: Memcached key decode fix (duration: 00m 17s) [15:46:27] !log mattflaschen@tin Synchronized php-1.27.0-wmf.1/tests/phpunit/includes/objectcache/BagOStuffTest.php: Memcached key decode fix (duration: 00m 18s) [15:47:24] (03CR) 10Mattflaschen: [C: 032] Enable CX suggestions in ar, eo, hi, nl, vi and dawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242528 (https://phabricator.wikimedia.org/T112848) (owner: 10KartikMistry) [15:47:27] !log analytics1049 replacing failed disk /dev/sdi at slot 7 [15:47:48] (03Merged) 10jenkins-bot: Enable CX suggestions in ar, eo, hi, nl, vi and dawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242528 (https://phabricator.wikimedia.org/T112848) (owner: 10KartikMistry) [15:48:52] !log mattflaschen@tin Synchronized wmf-config/InitialiseSettings.php: Enable CX suggestions in ar, eo, hi, nl, vi and dawiki (duration: 00m 17s) [15:49:09] 
kart_, your config patch is done. Test now if feasible. [15:50:28] Sure [15:51:44] !log mattflaschen@tin Synchronized php-1.27.0-wmf.1/extensions/ContentTranslation/modules/dashboard/styles/ext.cx.dashboard.less: Fix: Clicking on down arrow in language selector should trigger ULS (duration: 00m 17s) [15:51:59] 6operations, 10ops-eqiad: analytics1049 /dev/sdi busted - https://phabricator.wikimedia.org/T114034#1693599 (10Cmjohnson) Disk has been swapped Return Part Tracking Information USPS 9202 3946 5301 2428 9913 77 FEDEX 9611918 2393026 50483050 [15:52:52] akosiaris: can this be merged? https://gerrit.wikimedia.org/r/#/c/231574/ [15:53:03] matt_flaschen: looks good. Thanks [15:53:13] I added the fixes to your last comments and changed the security group to the one Daniel made yesterday [15:53:27] kart_, and the ULS one is done too. Please test that. [15:53:34] Krenair, done. [15:53:40] thanks [15:53:46] SWAT complete [15:54:06] 6operations, 10ops-eqiad: analytics1049 /dev/sdi busted - https://phabricator.wikimedia.org/T114034#1693610 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson Disk has been added back sudo megacli -LDinfo -Lall -aALL |grep "Virtual Drive:" Virtual Drive: 0 (Target Id: 0) Virtual Drive: 1 (Target Id: 1) Virtual... [15:55:44] matt_flaschen: Works. Thanks! 
[15:55:53] (03CR) 10Alex Monk: [C: 032] [Security] Restrict course page editing for any wiki with EducationProgram Extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/240049 (https://phabricator.wikimedia.org/T112806) (owner: 10MarcoAurelio) [15:56:06] No problem [15:56:14] (03Merged) 10jenkins-bot: [Security] Restrict course page editing for any wiki with EducationProgram Extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/240049 (https://phabricator.wikimedia.org/T112806) (owner: 10MarcoAurelio) [15:56:37] !log db1051 replacing failed disk slot 6 [15:56:47] !log krenair@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/240049 (duration: 00m 17s) [15:58:13] andrewbogott: I'll build a virtualenv for the logbot [15:59:56] 6operations, 5Patch-For-Review: Blacklist kernel modules - https://phabricator.wikimedia.org/T102600#1693624 (10MoritzMuehlenhoff) 5Open>3Resolved /etc/modprobe.d/blacklist-wmf.conf is in place for a while now, closing the task. [16:00:05] godog jynus: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151001T1600). Please do the needful. 
[16:00:32] 6operations, 10ops-eqiad: db1051 degraded raid (disk) - https://phabricator.wikimedia.org/T113786#1693626 (10Cmjohnson) Disk has been swapped and is rebuilding Firmware state: Rebuild Return Part tracking Info USPS 9202 3946 5301 2428 9979 42 FEDEX 9611918 2393026 50489625 [16:05:26] 6operations: Assign salt grains to server groups for debdeploy - https://phabricator.wikimedia.org/T111006#1693653 (10MoritzMuehlenhoff) [16:06:02] godog, difficult swat today [16:06:24] (03PS2) 10Rush: phab: turn dump cron back on as it hits slave only [puppet] - 10https://gerrit.wikimedia.org/r/242663 [16:06:38] we should promote it more [16:06:58] 6operations, 10ops-eqiad: Replace failed PSU on wtp1021 - https://phabricator.wikimedia.org/T114151#1693667 (10Cmjohnson) 5Open>3Resolved Replaced the power supply Return Part Tracking Info USPS 9202 3946 5301 2428 9985 74 FEDEX 9611918 2393026 50490256 [16:07:40] (03CR) 10JanZerebecki: [C: 031] Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/242237 (owner: 10Daniel Kinzler) [16:08:48] (03CR) 10Rush: [C: 032] phab: turn dump cron back on as it hits slave only [puppet] - 10https://gerrit.wikimedia.org/r/242663 (owner: 10Rush) [16:09:43] 6operations, 6Phabricator, 7Database, 5Patch-For-Review, 7WorkType-Maintenance: Phabricator creates MySQL connection spikes: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections. 
- https://phabricator.wikimedia.org/T109279#1693678 (10chasemp) [16:09:46] 6operations, 10ops-eqiad: Replace failed PSU2 on mw1200 - https://phabricator.wikimedia.org/T114142#1693679 (10Cmjohnson) 5Open>3Resolved Replaced the failed PSU Return Part Tracking Info USPS 9202 3946 5301 2428 9994 41 FEDEX 9611918 2393026 50491123 [16:10:34] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [16:15:42] (03PS1) 10Muehlenhoff: Move ferm rules out of the module [puppet] - 10https://gerrit.wikimedia.org/r/242915 [16:18:01] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1693716 (10Krenair) https://wikitech.wikimedia.org/wiki/Dumps/History suggests that it was missing a third of all revisions. I took a... [16:19:37] 6operations, 10MediaWiki-General-or-Unknown, 10Traffic: Separate Cache-Control header for proxy and client - https://phabricator.wikimedia.org/T50835#1693722 (10faidon) [16:19:58] 6operations, 10RESTBase, 6Services, 3Mobile-Content-Service, 7Varnish: Enable caching for the Mobile Content Service's RESTBase public endpoints - https://phabricator.wikimedia.org/T113591#1693726 (10mobrovac) Related tickets to this discussion: - {T50835} - {T110717} [16:23:27] jynus: yeah I'm exhausted [16:24:51] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1693730 (10Krenair) Although actually, 2010-10 dumps would not be helpful considering this task was created in 2010-08. 
[16:26:04] !log testing, testing [16:26:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:26:45] !log installed PHP security updates on all precise/trusty systems (the respective DSA for jessie is already deployed, it was released three weeks ago) [16:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:28:14] 10Ops-Access-Requests, 6operations, 6Services, 7RESTBase-architecture: Access to aqs100x for gwicke, eevans and mobrovac - https://phabricator.wikimedia.org/T114383#1693751 (10mobrovac) 3NEW a:3RobH [16:30:07] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1693766 (10jcrespo) Can we just fill it in with a copy of the previous revision, or a text saying "revision lost", this is an old edi... [16:30:44] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693771 (10valhallasw) Morebots is now re-deployed using a virtualenv rather than using system packages. The logging seems to have becom... [16:31:04] RECOVERY - puppet last run on analytics1049 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:31:05] RECOVERY - Hadoop DataNode on analytics1049 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode [16:31:23] RECOVERY - Hadoop NodeManager on analytics1049 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [16:32:53] (03CR) 10Dr0ptp4kt: "@bblack," [puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) (owner: 10BBlack) [16:33:04] andrewbogott: *waves* [16:33:09] are you doing stuff with morebots? [16:33:23] valhallasw`cloud: yeah, just restarting them. 
[16:33:32] well, restarting them, now, after a bunch of package building and upgrades :) [16:33:35] dod you already restart them? [16:33:37] *did [16:34:01] (03PS12) 10Mforns: Consume EventLogging validation logs from Logstash [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) [16:34:06] yes, I built a virtualenv for them :-) [16:34:11] oh... [16:34:15] that wasn’t needed, it was already fixed [16:34:17] but, ok :) [16:34:33] !log test log [16:34:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:34:46] *shrug* [16:34:52] (03CR) 10Ottomata: [C: 031] Create ferm rules for Hadoop NameNode and ResourceManager for master and standby [puppet] - 10https://gerrit.wikimedia.org/r/237335 (owner: 10Muehlenhoff) [16:34:59] the advantage of the virtualenv is that it also works on trusty, but I'm also fine having them run on system packages for now [16:35:32] I don’t care, I’m just allergic to pip [16:35:37] (as is Ops, in general) [16:36:05] 6operations, 10MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1693814 (10Reedy) >>! In T106386#1693559, @jcrespo wrote: > @greg No, I do not know who should be the best fit. This is a "mediawiki-core" change. Who maintains the mysql ORM? > > If the answer to... [16:37:51] fyi, tools is moving away from home-packaged python packages. [16:37:58] 6operations, 10MediaWiki-Database: Compress data at external storage - https://phabricator.wikimedia.org/T106386#1693824 (10jcrespo) Europe is ok, but not a hard blocker. [16:38:02] so on the longer term morebots will have to switch to venv + pip in any case [16:40:24] !log brought analytics1049 back into hadoop after missing a disk since sept. 
25th [16:40:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:40:33] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1693834 (10Reedy) We could, probably, it's just nice to not have broken stuff... Or we just hide the revision for technical reasons.... [16:40:50] 6operations, 10ops-eqiad: analytics1049 /dev/sdi busted - https://phabricator.wikimedia.org/T114034#1693835 (10Ottomata) 5Resolved>3Open Thanks! This machine is now active again. [16:42:07] (03PS13) 10Mforns: Consume EventLogging validation logs from Logstash [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) [16:49:22] 6operations, 6Phabricator, 6Release-Engineering-Team: Enable mod_remoteip and ensure logs follow retention guidelines - https://phabricator.wikimedia.org/T114014#1693857 (10chasemp) upstream made this partially in regards to conversations about this https://secure.phabricator.com/T9494 [16:53:16] (03CR) 10Mforns: "@BrianDavis, Andrew suggested some more changes. Let me know what you think, please." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) (owner: 10Mforns) [16:55:10] 6operations, 10MediaWiki-extensions-ZeroPortal, 10Traffic, 6Zero, 5Patch-For-Review: zerofetcher in production is getting throttled for API logins - https://phabricator.wikimedia.org/T111045#1693874 (10BBlack) This issue is becoming a blocker for doing a better job at DDoS mitigation (so that we can port... [16:55:41] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1693875 (10yuvipanda) Yay. Should we kill the package finally now? 
[16:55:48] 6operations, 10RESTBase, 10RESTBase-Cassandra: column family cassandra metrics size - https://phabricator.wikimedia.org/T113733#1693876 (10fgiunchedi) >>! In T113733#1690824, @Eevans wrote: >>>! In T113733#1689500, @fgiunchedi wrote: >> good question! yeah I think a blacklist would be fine for now > > OK, t... [16:59:56] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for Dan Foy - https://phabricator.wikimedia.org/T113324#1693891 (10DFoy) Thanks for your help, Rob! [17:01:52] (03CR) 10BBlack: "Yes, part of this is splaying the times randomly, whereas before it was on fixed intervals in sync across all caches. The other part is e" [puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) (owner: 10BBlack) [17:07:39] 6operations, 10Wikimedia-Mailing-lists: move sodium backup to archive pool? - https://phabricator.wikimedia.org/T113828#1693928 (10Dzahn) It looks like we need to create a job, very similar to a backup job, just that it's called a "migration job", there is "copy" and "migrate". As to the differences between th... [17:15:17] 6operations, 10RESTBase, 10RESTBase-Cassandra: column family cassandra metrics size - https://phabricator.wikimedia.org/T113733#1693965 (10Eevans) >>! In T113733#1693876, @fgiunchedi wrote: >>>! In T113733#1690824, @Eevans wrote: >>>>! In T113733#1689500, @fgiunchedi wrote: >>> good question! yeah I think a... 
[17:17:22] 6operations, 6Phabricator, 6Release-Engineering-Team, 7audits-data-retention: Enable mod_remoteip and ensure logs follow retention guidelines - https://phabricator.wikimedia.org/T114014#1693978 (10ArielGlenn) [17:18:10] (03PS1) 10Reedy: Add wiki.voyage to DNS [dns] - 10https://gerrit.wikimedia.org/r/242924 (https://phabricator.wikimedia.org/T88851) [17:20:02] (03PS4) 10Merlijn van Deen: Add Fabric deploy helper [debs/adminbot] - 10https://gerrit.wikimedia.org/r/231151 [17:20:04] (03PS1) 10Merlijn van Deen: Move to virtualenv based system [debs/adminbot] - 10https://gerrit.wikimedia.org/r/242926 [17:22:22] Reedy: for real ?:) heh, that was long time ago but now the ticket got actionable?:) [17:27:35] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1694035 (10Halfak) +1 to just marking the text as deleted and putting a placeholder in the text store. This is more of a technical p... [17:27:41] 6operations, 10Wikimedia-General-or-Unknown, 7Database: Revision 186704908 on en.wikipedia.org, Fatal exception: unknown "cluster16" - https://phabricator.wikimedia.org/T26675#1694036 (10Halfak) a:5Halfak>3None [17:28:14] 6operations, 10RESTBase, 10RESTBase-Cassandra: column family cassandra metrics size - https://phabricator.wikimedia.org/T113733#1694041 (10fgiunchedi) >>! In T113733#1693965, @Eevans wrote: >> just tested this in labs and it works, LGTM. I'll send patches to deploy, we'll need to roll restart cassandra for i... 
[17:31:22] 6operations, 6Phabricator, 7audits-data-retention: Enable mod_remoteip and ensure logs follow retention guidelines - https://phabricator.wikimedia.org/T114014#1694061 (10hashar) [17:31:46] 6operations, 6Phabricator, 7audits-data-retention: Enable mod_remoteip and ensure logs follow retention guidelines - https://phabricator.wikimedia.org/T114014#1682374 (10hashar) Removed #releng since it is already in #Phabricator. [17:33:24] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694075 (10RobH) a:3VBaranetsky [17:36:46] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694082 (10JUnikowski_WMF) Hi @RobH, I am replying on behalf of @VBaranetsky. She needs to have access to Hive, hence to be added to the //analytics-privatedata-users// group. Thanks... [17:36:55] 6operations, 10Flow, 10MediaWiki-Redirects, 3Collaboration-Team-Current, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#1694083 (10Jdlrobson) [17:37:22] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694084 (10RobH) We also need @VBaranetsky's managers approval on this task for this access. [17:38:38] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694094 (10JUnikowski_WMF) Hi @RobH, Yes. We are following up on that and will get someone to vet for @VBaranetsky ASAP. Thanks a lot for your help. Jonathan [17:41:21] (03CR) 10Jhobs: [C: 031] "Not much of a puppet or cron expert, but LGTM. I doubt the extra 10 minutes would have too significant of an impact for our users." 
[puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) (owner: 10BBlack) [17:42:16] (03PS4) 10Dzahn: remove sodium.wm.o (leaving sodium.mgmt.eqiad.wmnet) [dns] - 10https://gerrit.wikimedia.org/r/239414 (https://phabricator.wikimedia.org/T110142) (owner: 10John F. Lewis) [17:45:26] RECOVERY - Disk space on labstore1002 is OK: DISK OK [17:49:40] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to stat1002 for JUnikowski_WMF - https://phabricator.wikimedia.org/T113298#1694133 (10JUnikowski_WMF) @KRenair Do I need to create a separate task to get added to the bastiononly group? Thanks! [17:51:49] 6operations, 7audits-data-retention: Gerrit seemingly violates data retention guidelines - https://phabricator.wikimedia.org/T114395#1694145 (10chasemp) 3NEW [17:53:13] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1694158 (10Andrew) Can y'all open a new task for whatever morebots refactoring you're doing, and then close this one? I think the immed... [17:54:03] (03Abandoned) 10RobH: setting up dan foy's shell access [puppet] - 10https://gerrit.wikimedia.org/r/242234 (owner: 10RobH) [17:57:45] (03PS1) 10RobH: setting up dan foy's shell access [puppet] - 10https://gerrit.wikimedia.org/r/242931 [17:58:49] (03CR) 10jenkins-bot: [V: 04-1] setting up dan foy's shell access [puppet] - 10https://gerrit.wikimedia.org/r/242931 (owner: 10RobH) [17:59:59] bah... damn mispaste [18:00:04] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151001T1800). 
[18:00:21] (03PS1) 1020after4: wikipedia wikis to 1.27.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242932 [18:00:25] (03PS2) 10RobH: setting up dan foy's shell access [puppet] - 10https://gerrit.wikimedia.org/r/242931 [18:01:16] (03CR) 10RobH: [C: 032] setting up dan foy's shell access [puppet] - 10https://gerrit.wikimedia.org/r/242931 (owner: 10RobH) [18:01:59] (03CR) 1020after4: [C: 032] wikipedia wikis to 1.27.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242932 (owner: 1020after4) [18:04:14] (03CR) 1020after4: [C: 031] Add config deployment [tools/scap] - 10https://gerrit.wikimedia.org/r/240292 (https://phabricator.wikimedia.org/T109512) (owner: 10Thcipriani) [18:04:18] (03Merged) 10jenkins-bot: wikipedia wikis to 1.27.0-wmf.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242932 (owner: 1020after4) [18:04:21] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to stat1002 for JUnikowski_WMF - https://phabricator.wikimedia.org/T113298#1694194 (10Krenair) No, it should have been done in this task. This is a recurring issue which I'm trying to get fixed, but for now you need bastiononly added. [18:04:48] !log twentyafterfour@tin rebuilt wikiversions.cdb and synchronized wikiversions files: wikipedia wikis to 1.27.0-wmf.1 [18:04:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:10:20] millimetric: yes it can [18:10:24] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for Dan Foy - https://phabricator.wikimedia.org/T113324#1694215 (10RobH) 5Open>3Resolved @DFoy, Your access to the bastion hosts and stat1002 are now live, resolving access request. 
[18:10:37] millimetric: but you won't have access to deploy until the meeting [18:12:11] 6operations, 7Database: Grant 'show view' permissions on s1-analytics-slave/jmorgan to user jmorgan - https://phabricator.wikimedia.org/T114396#1694223 (10Capt_Swing) 3NEW [18:13:21] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1694236 (10Danmichaelo) >>! In T114365#1693281, @hashar wrote: > Yup, it needs to support Semantic versioning. There is a python module... [18:16:38] 6operations, 10RESTBase, 6Services: Switch RESTBase to use Node.js 4 - https://phabricator.wikimedia.org/T107762#1694249 (10GWicke) In local testing I discovered that Debian's `npm` package is actually not updated yet, and `npm install` fails. So, we'll have to wait until Debian's `npm` package has caught up... [18:20:13] 6operations, 10Traffic, 6WMF-Legal: Policy decisions for new (and current) DNS domains registered to the WMF - https://phabricator.wikimedia.org/T101048#1694267 (10Slaporte) > Do we have / can we produce a list of all domains registered to us globally, with any registrar, and get them into our DNS servers so... [18:22:40] !log ori@tin Synchronized php-1.27.0-wmf.1/includes/resourceloader: Ic1d802ee2: ResourceLoader: cache minified user and site modules (duration: 00m 17s) [18:22:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:10] 6operations, 10RESTBase, 10RESTBase-Cassandra: column family cassandra metrics size - https://phabricator.wikimedia.org/T113733#1694292 (10Eevans) >>! In T113733#1694041, @fgiunchedi wrote: >>>! In T113733#1693965, @Eevans wrote: >>> just tested this in labs and it works, LGTM. I'll send patches to deploy, w... 
[18:31:15] (03PS7) 10Thcipriani: Add config deployment [tools/scap] - 10https://gerrit.wikimedia.org/r/240292 (https://phabricator.wikimedia.org/T109512) [18:47:44] 6operations, 10RESTBase, 6Services: Switch RESTBase to use Node.js 4 - https://phabricator.wikimedia.org/T107762#1694363 (10mobrovac) [18:58:14] (03PS1) 10Jdlrobson: Enable WikidataPageBanner on Catalan wiki and zh wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/242942 (https://phabricator.wikimedia.org/T114392) [19:00:20] (03CR) 10Ottomata: Consume EventLogging validation logs from Logstash (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) (owner: 10Mforns) [19:01:10] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1694412 (10hashar) 5Open>3Resolved Virtualenv fixed it. Well done folks! @danmichaelo thanks to have taken the time to comment her... [19:03:06] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694426 (10VBaranetsky) Cut and paste from email sent by Geoff Brigham (happy to forward you the email): approved -- Geoff Brigham General Counsel Wikimedia Foundation 149 New Montg... [19:03:39] (03CR) 10BryanDavis: [C: 04-1] "A couple small fixes needed for the renamed filter file." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) (owner: 10Mforns) [19:05:19] 6operations, 10Adminbot, 6Labs, 10Tool-Labs: upgrade mwclient (morebots no more log because of MediaWiki semantic versionning) - https://phabricator.wikimedia.org/T114365#1694442 (10valhallasw) Note that it's currently not running under a virtualenv (due to miscommunication, @andrew and I were working on i... 
[19:09:14] (03PS1) 10Hashar: nodepool: rename ssh keys files to default [puppet] - 10https://gerrit.wikimedia.org/r/242944 [19:10:38] (03CR) 10Hashar: "When puppet apply this patch, Nodepool will detect nodepool.yaml got changed and will magically reload the configuration." [puppet] - 10https://gerrit.wikimedia.org/r/242944 (owner: 10Hashar) [19:12:24] (03CR) 10Andrew Bogott: [C: 032] nodepool: rename ssh keys files to default [puppet] - 10https://gerrit.wikimedia.org/r/242944 (owner: 10Hashar) [19:12:37] andrewbogott: that should just work ™ :D [19:20:53] (03CR) 10Dzahn: [C: 032] remove sodium.wm.o (leaving sodium.mgmt.eqiad.wmnet) [dns] - 10https://gerrit.wikimedia.org/r/239414 (https://phabricator.wikimedia.org/T110142) (owner: 10John F. Lewis) [19:23:47] (03PS6) 10Andrew Bogott: interface: dequote booleans [puppet] - 10https://gerrit.wikimedia.org/r/241237 (https://phabricator.wikimedia.org/T113783) [19:26:10] (03CR) 10Andrew Bogott: [C: 032] interface: dequote booleans [puppet] - 10https://gerrit.wikimedia.org/r/241237 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [19:27:17] (03PS6) 10Andrew Bogott: Cassandra: dequote some booleans. [puppet] - 10https://gerrit.wikimedia.org/r/241238 (https://phabricator.wikimedia.org/T113783) [19:28:05] godog: can I get a +1 on https://gerrit.wikimedia.org/r/#/c/241238 ? [19:29:18] (03PS6) 10Andrew Bogott: Grafana: dequote booleans. [puppet] - 10https://gerrit.wikimedia.org/r/241239 (https://phabricator.wikimedia.org/T113783) [19:31:34] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694560 (10RobH) @VBaranetsky, Please have Geoff email me directly. Forwarding an email approval isn't any kind of secure authorization, since anyone can type anything they want. Th... 
[19:33:40] !log re-deployed robots.txt patch and restarted apache on iridium (to expand the phabricator robots.txt) [19:33:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:38:19] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694581 (10VBaranetsky) I completely understand the need for following protocol. Thanks for emailing him. Please let me know when he responds. Best, Vickie [19:43:57] (03PS7) 10Andrew Bogott: Grafana: dequote booleans. [puppet] - 10https://gerrit.wikimedia.org/r/241239 (https://phabricator.wikimedia.org/T113783) [19:44:34] (03CR) 10Andrew Bogott: [C: 032] Grafana: dequote booleans. [puppet] - 10https://gerrit.wikimedia.org/r/241239 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [19:44:52] I'm looking for someone to help with Fundraising cluster config, Jeff_Green is out today and we need some assistance with server config for an SSL client cert. cmjohnson1 perhaps? [19:47:54] Help has arrived, thank you! [19:48:45] (03PS6) 10Andrew Bogott: Gerrit role: dequote booleans [puppet] - 10https://gerrit.wikimedia.org/r/241240 (https://phabricator.wikimedia.org/T113783) [19:49:33] is there actually a person who knows the pre-streams IRC RC bot? 
[19:49:51] the bot running on irc.wm.org [19:50:09] and yes, streams has not replaced it yet [19:50:17] even though everybody execpted that [19:51:21] (03CR) 10Andrew Bogott: [C: 032] Gerrit role: dequote booleans [puppet] - 10https://gerrit.wikimedia.org/r/241240 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [19:51:27] (03PS1) 10Thcipriani: Make deployment rev represent config state [tools/scap] - 10https://gerrit.wikimedia.org/r/243009 [19:54:01] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694650 (10RobH) I've now communicated with Geoff out of band of this task, and he has approved the access of bastion group + analytics-privatedata-users. As previously referenced, thi... [19:54:06] mutante: you probably found it already, but there are some notes under https://wikitech.wikimedia.org/wiki/IRCD [19:55:08] (03PS6) 10Andrew Bogott: Mark salt grain bool values with # lint:ignore:quoted_booleans [puppet] - 10https://gerrit.wikimedia.org/r/241241 (https://phabricator.wikimedia.org/T113783) [19:55:28] mutante: looks like a fairly basic python-irclib <-> UDP thing: https://phabricator.wikimedia.org/rOPUP13fa5079ae7822c0aa38ec2b4d8fc1e0e88dc401#4c3ab499 [19:55:49] 10Ops-Access-Requests, 6operations: Requesting access to stat1002 for VBaranetsky - https://phabricator.wikimedia.org/T114308#1694655 (10RobH) a:5VBaranetsky>3RobH [19:56:46] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Requesting access to stat1002 for JUnikowski_WMF - https://phabricator.wikimedia.org/T113298#1694657 (10RobH) a:5coren>3RobH [19:57:48] greg-g: has the mediawiki train deploy for today finished? [19:58:01] maybe that's a question for Deskana|Away? [19:58:41] why dan? 
[19:59:07] cscott: done: https://tools.wmflabs.org/sal/log/AVAkk_le1oXzWjit5xan [19:59:19] cscott: the SAL tells all: https://tools.wmflabs.org/sal/production [19:59:34] (03CR) 10Andrew Bogott: [C: 032] Mark salt grain bool values with # lint:ignore:quoted_booleans [puppet] - 10https://gerrit.wikimedia.org/r/241241 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:00:14] (03PS6) 10Andrew Bogott: webserver::php5 unquote a boolean. [puppet] - 10https://gerrit.wikimedia.org/r/241242 (https://phabricator.wikimedia.org/T113783) [20:00:50] valhallasw`cloud: thank you. i was looking for the "start bot" part. if only it was really just /usr/local/bin/start-ircbot . i dont know but i remember there was an issue with it last time [20:01:37] (03PS1) 10RobH: junikowski's stat1002 access had bastions overlooked [puppet] - 10https://gerrit.wikimedia.org/r/243011 [20:02:20] mutante: https://github.com/wikimedia/operations-puppet/blob/a329b58c01345c5f75437f2b8acff32cbf919c1a/modules/mw-rc-irc/manifests/irc-echo.pp [20:02:25] (03PS2) 10RobH: junikowski's stat1002 access had bastions overlooked [puppet] - 10https://gerrit.wikimedia.org/r/243011 [20:02:27] so should be just an upstart service called ircecho [20:02:39] (03CR) 10RobH: [C: 032] junikowski's stat1002 access had bastions overlooked [puppet] - 10https://gerrit.wikimedia.org/r/243011 (owner: 10RobH) [20:02:48] (03CR) 10Andrew Bogott: [C: 032] webserver::php5 unquote a boolean. 
[puppet] - 10https://gerrit.wikimedia.org/r/241242 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:02:52] mutante: https://github.com/wikimedia/operations-puppet/blob/a329b58c01345c5f75437f2b8acff32cbf919c1a/modules/mw-rc-irc/files/upstart/ircecho.conf [20:03:32] valhallasw`cloud: the author name is interesting :) [20:03:43] this might be the newer system already [20:04:12] upstart conf sounds good and what was missing last time, yea [20:04:50] (03PS6) 10Andrew Bogott: Webserver ca: disable the quoted-bool lint check [puppet] - 10https://gerrit.wikimedia.org/r/241243 (https://phabricator.wikimedia.org/T113783) [20:05:59] 6operations, 7user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1694733 (10Dzahn) 13:05 < valhallasw`cloud> mutante: https://github.com/wikimedia/operations-puppet/blob/a329b58c01345c5f75437f2b8acff32cbf919c1a/modules/mw-rc-irc/files/upstart... [20:06:16] (03CR) 10Andrew Bogott: [C: 032] Webserver ca: disable the quoted-bool lint check [puppet] - 10https://gerrit.wikimedia.org/r/241243 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:06:36] valhallasw`cloud: who should i tell about a maitenance period/restart? [20:06:37] (03PS6) 10Andrew Bogott: Diamond: Turn off lint check for quoted bools. [puppet] - 10https://gerrit.wikimedia.org/r/241244 (https://phabricator.wikimedia.org/T113783) [20:06:50] mutante: the entire world, probably :/ [20:06:59] valhallasw`cloud: world@lists ? [20:07:25] I'd do tech news, wikitech-l, and maybe tech-ambassadors? there's quite some tools that depend on it [20:07:43] (03PS3) 10RobH: junikowski's stat1002 access had bastions overlooked [puppet] - 10https://gerrit.wikimedia.org/r/243011 [20:07:52] adds a phab tag and happens all that happens fully automatic :) [20:08:05] it kind of did with user-notice [20:08:06] didnt it [20:08:12] greg-g: ok, thanks. just double-checking before i start a parsoid deploy. 
[20:08:13] yeah [20:08:47] is it possible to deliver the UDP packets to two machines? [20:09:03] (I'm not sure what kinds of downtime we're talking about, actually, in general) [20:09:04] andrewbogott: im merging your stuff [20:09:06] ok? [20:09:11] (03PS7) 10Andrew Bogott: Diamond: Turn off lint check for quoted bools. [puppet] - 10https://gerrit.wikimedia.org/r/241244 (https://phabricator.wikimedia.org/T113783) [20:09:32] robh: um… yes, but so did I :) [20:09:32] oh wait, now its not htere... [20:09:39] you must have had yours alone and i had both [20:09:39] heh [20:09:45] good enough! [20:09:50] valhallasw`cloud: we are not talking about a long downtime at all, basically just rebooting the server [20:10:08] valhallasw`cloud: but i have been told that no matter how short it is.. some users need to restart their tools [20:10:18] well, the tools should be smart enough to reconnect [20:10:28] I'd be more worried about the impact when anything goes wrong [20:10:36] exactly [20:10:37] e.g. the host doesn't come up after reboot for some reason [20:10:40] (03CR) 10Andrew Bogott: [C: 032] Diamond: Turn off lint check for quoted bools. [puppet] - 10https://gerrit.wikimedia.org/r/241244 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:10:41] that's why it has not happened [20:10:57] if the bot doesnt come back for some reason .. [20:11:15] (03PS6) 10Andrew Bogott: Disable quoted_boolean lint check around is_virtual refs. [puppet] - 10https://gerrit.wikimedia.org/r/241245 (https://phabricator.wikimedia.org/T113783) [20:12:18] (03CR) 10Andrew Bogott: [C: 032] Disable quoted_boolean lint check around is_virtual refs. 
[puppet] - 10https://gerrit.wikimedia.org/r/241245 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:13:12] (03PS2) 10Andrew Bogott: Dequote one more nrpe critical setting [puppet] - 10https://gerrit.wikimedia.org/r/242170 (https://phabricator.wikimedia.org/T113783) [20:13:16] 6operations, 10Traffic, 7Browser-Support-Internet-Explorer, 7HTTPS: Xbox 360 Internet Explorer unable to view Wikipedia - https://phabricator.wikimedia.org/T105455#1694766 (10brion) Haven't heard anything from them, but when I get home this weekend I'll dust off my 360 and see if there's a relevant system... [20:14:05] (03CR) 10Andrew Bogott: [C: 032] Dequote one more nrpe critical setting [puppet] - 10https://gerrit.wikimedia.org/r/242170 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:21:15] 6operations, 7Database: Grant 'show view' permissions on s1-analytics-slave/jmorgan to user jmorgan - https://phabricator.wikimedia.org/T114396#1694802 (10jcrespo) p:5Triage>3Normal [20:21:24] 6operations, 7Database: Grant 'show view' permissions on s1-analytics-slave/jmorgan to user jmorgan - https://phabricator.wikimedia.org/T114396#1694804 (10jcrespo) a:3jcrespo [20:22:51] 6operations, 7Database: Grant 'show view' permissions on s1-analytics-slave/jmorgan to user jmorgan - https://phabricator.wikimedia.org/T114396#1694223 (10jcrespo) You shouldn't have `CREATE VIEW` in the first place on *.*. Only write and read permissions on your own tables (and views). Will send a patch tomo... [20:26:50] (03CR) 10Andrew Bogott: [C: 032] "Tested in the puppet compiler on ms1001.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/242171 (https://phabricator.wikimedia.org/T113783) (owner: 10Andrew Bogott) [20:26:50] 6operations, 7HHVM, 5Patch-For-Review, 7Wikimedia-log-errors: Unknown modifier '\': [([^\s,]+)\s*=\s*([^\s,]+)[\+\-]] - https://phabricator.wikimedia.org/T112922#1650167 (10mmodell) Thanks for getting this one fixed! 
[20:27:06] (03PS2) 10Andrew Bogott: dataset: Remove needless quotes around a 'true' [puppet] - 10https://gerrit.wikimedia.org/r/242171 (https://phabricator.wikimedia.org/T113783) [20:29:29] 6operations, 7Database: Grant 'show view' permissions on s1-analytics-slave/jmorgan to user jmorgan - https://phabricator.wikimedia.org/T114396#1694858 (10Capt_Swing) Fine with me as long as I get full rights to my own tables. Thanks! [20:30:34] (03PS2) 10Andrew Bogott: Change a few rsync params from true/false to yes/no [puppet] - 10https://gerrit.wikimedia.org/r/242172 [20:31:41] (03CR) 10Andrew Bogott: [C: 032] Change a few rsync params from true/false to yes/no [puppet] - 10https://gerrit.wikimedia.org/r/242172 (owner: 10Andrew Bogott) [20:33:07] !log deployed parsoid version 62971510b [20:33:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:44:08] (03PS4) 10Andrew Bogott: lint: fix 'variable not enclosed' warnings [puppet] - 10https://gerrit.wikimedia.org/r/242055 (owner: 10Dzahn) [20:45:10] (03CR) 10Andrew Bogott: [C: 032] lint: fix 'variable not enclosed' warnings [puppet] - 10https://gerrit.wikimedia.org/r/242055 (owner: 10Dzahn) [20:48:36] (03PS2) 10Andrew Bogott: nodepool: switch info logs from hourly to daily [puppet] - 10https://gerrit.wikimedia.org/r/240986 (owner: 10Hashar) [20:49:34] (03CR) 10Andrew Bogott: [C: 032] nodepool: switch info logs from hourly to daily [puppet] - 10https://gerrit.wikimedia.org/r/240986 (owner: 10Hashar) [20:53:03] 10Ops-Access-Requests, 6operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695001 (10Dzahn) @JKatz So we debugged this a bit more on the server itself. We confirmed with another user,papaul, that it works based on the Wikitech names (aka. "cn") but not wi... 
[21:00:46] mutante: if you want let me know when you are going to do it, I did soemthing or other there back when and I could maybe be helpful (irc bot that is) [21:00:59] (03CR) 1020after4: [C: 031] Make deployment rev represent config state [tools/scap] - 10https://gerrit.wikimedia.org/r/243009 (owner: 10Thcipriani) [21:01:48] mutante: upstart def seems not registering atm but it's been a long time since I poked at this at all [21:03:46] chasemp: :) cool! thanks for that offer, i'll take it [21:05:07] (03PS1) 10RobH: disable user handrade [puppet] - 10https://gerrit.wikimedia.org/r/243024 [21:06:32] 6operations, 7user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1695094 (10chasemp) Right should be able to: rush@argon:~# sudo service ircd status ircd start/running, process 15108 rush@argon:~# sudo service ircecho status ircecho start/running, process 8210 [21:06:32] (03CR) 10RobH: [C: 032] disable user handrade [puppet] - 10https://gerrit.wikimedia.org/r/243024 (owner: 10RobH) [21:07:42] 6operations, 7user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1695102 (10Quiddity) >>! In T105804#1689469, @Dzahn wrote: > The blocker is kind of the knowledge how to properly bring the IRC bot back up. Sounds simple, but last time i know it was an issue and i could... 
[21:09:34] 6operations, 10Analytics: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695116 (10RobH) 3NEW [21:10:37] PROBLEM - puppet last run on mw1017 is CRITICAL: CRITICAL: puppet fail [21:10:46] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail [21:10:56] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: puppet fail [21:11:06] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: puppet fail [21:11:16] PROBLEM - puppet last run on db2043 is CRITICAL: CRITICAL: puppet fail [21:11:16] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: puppet fail [21:11:16] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: puppet fail [21:11:25] PROBLEM - puppet last run on pc1002 is CRITICAL: CRITICAL: puppet fail [21:11:26] PROBLEM - puppet last run on analytics1046 is CRITICAL: CRITICAL: puppet fail [21:11:26] PROBLEM - puppet last run on db2042 is CRITICAL: CRITICAL: puppet fail [21:11:27] PROBLEM - puppet last run on cp2020 is CRITICAL: CRITICAL: puppet fail [21:11:44] * andrewbogott is looking at those [21:11:45] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: puppet fail [21:11:45] PROBLEM - puppet last run on lvs1006 is CRITICAL: CRITICAL: puppet fail [21:11:46] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: puppet fail [21:11:52] 6operations, 10Analytics: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695134 (10RobH) a:3Ottomata I'm going with the assumption that I should refer all this analytics to @ottomata for his review or recommendation. Andrew: Please review the above. I'm not sure if you guys... 
[21:11:55] PROBLEM - puppet last run on mw2025 is CRITICAL: CRITICAL: puppet fail [21:11:56] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: puppet fail [21:11:56] PROBLEM - puppet last run on cp1050 is CRITICAL: CRITICAL: puppet fail [21:11:56] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: puppet fail [21:11:56] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: puppet fail [21:12:06] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: puppet fail [21:12:06] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: puppet fail [21:12:07] PROBLEM - puppet last run on mw1159 is CRITICAL: CRITICAL: puppet fail [21:12:07] PROBLEM - puppet last run on mw1101 is CRITICAL: CRITICAL: puppet fail [21:12:07] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: puppet fail [21:12:15] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: puppet fail [21:12:15] PROBLEM - puppet last run on mw2165 is CRITICAL: CRITICAL: puppet fail [21:12:15] eh? 
[21:12:16] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: puppet fail [21:12:16] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: puppet fail [21:12:16] PROBLEM - puppet last run on mw2005 is CRITICAL: CRITICAL: puppet fail [21:12:16] PROBLEM - puppet last run on mw1127 is CRITICAL: CRITICAL: puppet fail [21:12:25] PROBLEM - puppet last run on gadolinium is CRITICAL: CRITICAL: puppet fail [21:12:26] PROBLEM - puppet last run on mw2147 is CRITICAL: CRITICAL: puppet fail [21:12:27] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: puppet fail [21:12:27] PROBLEM - puppet last run on cp3020 is CRITICAL: CRITICAL: puppet fail [21:12:27] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: puppet fail [21:12:27] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: puppet fail [21:12:33] found, fixing [21:12:35] PROBLEM - puppet last run on cp3043 is CRITICAL: CRITICAL: puppet fail [21:12:35] PROBLEM - puppet last run on ms-be2012 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on restbase1004 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on mw2149 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on mw2151 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on es1012 is CRITICAL: CRITICAL: puppet fail [21:12:36] PROBLEM - puppet last run on mw1146 is CRITICAL: CRITICAL: puppet fail [21:12:37] PROBLEM - puppet last run on ms-be2005 is CRITICAL: CRITICAL: puppet fail [21:12:45] PROBLEM - puppet last run on mw2042 is CRITICAL: CRITICAL: puppet fail [21:12:45] PROBLEM - puppet last run on db1036 is CRITICAL: CRITICAL: puppet fail [21:12:45] PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: puppet fail [21:12:45] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: puppet fail [21:12:46] PROBLEM - puppet last run on wtp2014 is CRITICAL: 
CRITICAL: puppet fail [21:12:47] PROBLEM - puppet last run on cp1074 is CRITICAL: CRITICAL: puppet fail [21:12:48] heh, i wanted to make sure andrewbogott was talking about puppet failures but didnt wanna distract him. [21:12:55] PROBLEM - puppet last run on es2004 is CRITICAL: CRITICAL: puppet fail [21:12:56] PROBLEM - puppet last run on db2048 is CRITICAL: CRITICAL: puppet fail [21:12:56] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: puppet fail [21:12:57] PROBLEM - puppet last run on mc1001 is CRITICAL: CRITICAL: puppet fail [21:12:57] PROBLEM - puppet last run on elastic1015 is CRITICAL: CRITICAL: puppet fail [21:12:57] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: puppet fail [21:13:05] PROBLEM - puppet last run on mw1111 is CRITICAL: CRITICAL: puppet fail [21:13:05] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: puppet fail [21:13:05] PROBLEM - puppet last run on mw1221 is CRITICAL: CRITICAL: puppet fail [21:13:05] PROBLEM - puppet last run on mw2189 is CRITICAL: CRITICAL: puppet fail [21:13:06] PROBLEM - puppet last run on mw2089 is CRITICAL: CRITICAL: puppet fail [21:13:06] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: puppet fail [21:13:16] PROBLEM - puppet last run on mw1214 is CRITICAL: CRITICAL: puppet fail [21:13:16] PROBLEM - puppet last run on ms-be2007 is CRITICAL: CRITICAL: puppet fail [21:13:16] PROBLEM - puppet last run on mw2099 is CRITICAL: CRITICAL: puppet fail [21:13:16] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: puppet fail [21:13:25] PROBLEM - puppet last run on cp1051 is CRITICAL: CRITICAL: puppet fail [21:13:26] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: puppet fail [21:13:26] PROBLEM - puppet last run on ms-be2001 is CRITICAL: CRITICAL: puppet fail [21:13:26] PROBLEM - puppet last run on elastic1026 is CRITICAL: CRITICAL: puppet fail [21:13:27] PROBLEM - puppet last run on wtp1018 is CRITICAL: CRITICAL: puppet fail [21:13:35] PROBLEM - puppet last 
run on cp1069 is CRITICAL: CRITICAL: puppet fail [21:13:35] PROBLEM - puppet last run on mw1232 is CRITICAL: CRITICAL: puppet fail [21:13:36] PROBLEM - puppet last run on wtp1024 is CRITICAL: CRITICAL: puppet fail [21:13:36] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: puppet fail [21:13:36] PROBLEM - puppet last run on mw1081 is CRITICAL: CRITICAL: puppet fail [21:13:36] PROBLEM - puppet last run on mw2065 is CRITICAL: CRITICAL: puppet fail [21:13:37] PROBLEM - puppet last run on mw2006 is CRITICAL: CRITICAL: puppet fail [21:13:45] PROBLEM - puppet last run on es1018 is CRITICAL: CRITICAL: puppet fail [21:13:46] PROBLEM - puppet last run on mw1124 is CRITICAL: CRITICAL: puppet fail [21:13:46] PROBLEM - puppet last run on mw2179 is CRITICAL: CRITICAL: puppet fail [21:13:46] PROBLEM - puppet last run on mw2041 is CRITICAL: CRITICAL: puppet fail [21:13:46] PROBLEM - puppet last run on mw1196 is CRITICAL: CRITICAL: puppet fail [21:13:46] PROBLEM - puppet last run on es1017 is CRITICAL: CRITICAL: puppet fail [21:13:47] PROBLEM - puppet last run on mw1057 is CRITICAL: CRITICAL: puppet fail [21:13:53] it’s just a typo, but will be noisy in the meantime [21:13:55] PROBLEM - puppet last run on restbase2005 is CRITICAL: CRITICAL: puppet fail [21:13:56] PROBLEM - puppet last run on wtp2007 is CRITICAL: CRITICAL: puppet fail [21:14:06] PROBLEM - puppet last run on mw2194 is CRITICAL: CRITICAL: puppet fail [21:14:06] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: puppet fail [21:14:06] PROBLEM - puppet last run on mw2197 is CRITICAL: CRITICAL: puppet fail [21:14:07] PROBLEM - puppet last run on cp3035 is CRITICAL: CRITICAL: puppet fail [21:14:15] PROBLEM - puppet last run on acamar is CRITICAL: CRITICAL: puppet fail [21:14:15] PROBLEM - puppet last run on mw1087 is CRITICAL: CRITICAL: puppet fail [21:14:16] PROBLEM - puppet last run on titanium is CRITICAL: CRITICAL: puppet fail [21:14:16] PROBLEM - puppet last run on cp1072 is CRITICAL: CRITICAL: 
puppet fail [21:14:17] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: puppet fail [21:14:25] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: puppet fail [21:14:25] PROBLEM - puppet last run on dbstore1001 is CRITICAL: CRITICAL: puppet fail [21:14:25] PROBLEM - puppet last run on mx2001 is CRITICAL: CRITICAL: puppet fail [21:14:26] PROBLEM - puppet last run on es1013 is CRITICAL: CRITICAL: puppet fail [21:14:26] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: puppet fail [21:14:26] PROBLEM - puppet last run on ms-be2002 is CRITICAL: CRITICAL: puppet fail [21:14:26] PROBLEM - puppet last run on mw1161 is CRITICAL: CRITICAL: puppet fail [21:14:35] PROBLEM - puppet last run on analytics1015 is CRITICAL: CRITICAL: puppet fail [21:14:36] PROBLEM - puppet last run on mw1130 is CRITICAL: CRITICAL: puppet fail [21:14:37] PROBLEM - puppet last run on mw1048 is CRITICAL: CRITICAL: puppet fail [21:14:46] PROBLEM - puppet last run on mw2201 is CRITICAL: CRITICAL: puppet fail [21:14:46] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: puppet fail [21:14:46] PROBLEM - puppet last run on mw2139 is CRITICAL: CRITICAL: puppet fail [21:14:46] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: puppet fail [21:14:47] PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: puppet fail [21:14:47] PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: puppet fail [21:14:47] PROBLEM - puppet last run on mc1007 is CRITICAL: CRITICAL: puppet fail [21:14:47] PROBLEM - puppet last run on berkelium is CRITICAL: CRITICAL: puppet fail [21:14:47] PROBLEM - puppet last run on ganeti2005 is CRITICAL: CRITICAL: puppet fail [21:14:55] PROBLEM - puppet last run on db2046 is CRITICAL: CRITICAL: puppet fail [21:14:55] PROBLEM - puppet last run on mw2112 is CRITICAL: CRITICAL: puppet fail [21:14:56] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: puppet fail [21:14:56] PROBLEM - puppet last run on elastic1017 is CRITICAL: CRITICAL: 
puppet fail [21:14:56] PROBLEM - puppet last run on rdb2002 is CRITICAL: CRITICAL: puppet fail [21:15:05] PROBLEM - puppet last run on mw1210 is CRITICAL: CRITICAL: puppet fail [21:15:05] PROBLEM - puppet last run on eventlog1001 is CRITICAL: CRITICAL: puppet fail [21:15:05] PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: puppet fail [21:15:06] PROBLEM - puppet last run on es2003 is CRITICAL: CRITICAL: puppet fail [21:15:06] PROBLEM - puppet last run on mc2009 is CRITICAL: CRITICAL: puppet fail [21:15:06] PROBLEM - puppet last run on mw2102 is CRITICAL: CRITICAL: puppet fail [21:15:06] PROBLEM - puppet last run on mc1009 is CRITICAL: CRITICAL: puppet fail [21:15:07] PROBLEM - puppet last run on mw1038 is CRITICAL: CRITICAL: puppet fail [21:15:15] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on mw1147 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on cp2018 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on db2069 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: puppet fail [21:15:16] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: puppet fail [21:15:17] PROBLEM - puppet last run on analytics1001 is CRITICAL: CRITICAL: puppet fail [21:15:17] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: puppet fail [21:15:25] PROBLEM - puppet last run on db1064 is CRITICAL: CRITICAL: puppet fail [21:15:25] PROBLEM - puppet last run on mw2140 is CRITICAL: CRITICAL: puppet fail [21:15:27] (03PS1) 10Andrew Bogott: Typo fix: s/abenst/absent [puppet] - 10https://gerrit.wikimedia.org/r/243025 [21:15:27] PROBLEM - puppet last run on mw1024 is CRITICAL: CRITICAL: puppet fail [21:15:27] PROBLEM - puppet last run on mw2160 is CRITICAL: CRITICAL: puppet fail [21:15:27] PROBLEM - puppet last run on mw1033 
is CRITICAL: CRITICAL: puppet fail [21:15:35] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: puppet fail [21:15:35] i want to mute it but i dont wanna miss a real alert so not going to do it. [21:15:36] PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: puppet fail [21:15:36] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: puppet fail [21:15:36] PROBLEM - puppet last run on mw2116 is CRITICAL: CRITICAL: puppet fail [21:15:36] PROBLEM - puppet last run on db1001 is CRITICAL: CRITICAL: puppet fail [21:15:36] PROBLEM - puppet last run on db1029 is CRITICAL: CRITICAL: puppet fail [21:15:37] PROBLEM - puppet last run on mw2031 is CRITICAL: CRITICAL: puppet fail [21:15:45] PROBLEM - puppet last run on mw1041 is CRITICAL: CRITICAL: puppet fail [21:15:45] PROBLEM - puppet last run on mw1007 is CRITICAL: CRITICAL: puppet fail [21:15:45] PROBLEM - puppet last run on mw1032 is CRITICAL: CRITICAL: puppet fail [21:15:45] PROBLEM - puppet last run on mw1089 is CRITICAL: CRITICAL: puppet fail [21:15:46] PROBLEM - puppet last run on pybal-test2001 is CRITICAL: CRITICAL: puppet fail [21:15:46] PROBLEM - puppet last run on cp2025 is CRITICAL: CRITICAL: puppet fail [21:15:46] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: puppet fail [21:15:47] PROBLEM - puppet last run on analytics1043 is CRITICAL: CRITICAL: puppet fail [21:15:47] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: puppet fail [21:15:56] PROBLEM - puppet last run on mw1140 is CRITICAL: CRITICAL: puppet fail [21:15:56] PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: puppet fail [21:15:56] PROBLEM - puppet last run on mw2100 is CRITICAL: CRITICAL: puppet fail [21:15:56] PROBLEM - puppet last run on analytics1033 is CRITICAL: CRITICAL: puppet fail [21:16:05] PROBLEM - puppet last run on mw1115 is CRITICAL: CRITICAL: puppet fail [21:16:05] 6operations, 10Analytics: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695162 
(10Ottomata) Uhhhh, I would say that I don't have much info on who accesses these systems. Many people ask for access, managers grant permission, and then opsen give access as part of triage duty.... [21:16:05] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: puppet fail [21:16:06] PROBLEM - puppet last run on mw2044 is CRITICAL: CRITICAL: puppet fail [21:16:06] PROBLEM - puppet last run on mc2012 is CRITICAL: CRITICAL: puppet fail [21:16:06] PROBLEM - puppet last run on mw1122 is CRITICAL: CRITICAL: puppet fail [21:16:06] PROBLEM - puppet last run on planet1001 is CRITICAL: CRITICAL: puppet fail [21:16:06] PROBLEM - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail [21:16:15] PROBLEM - puppet last run on cp1049 is CRITICAL: CRITICAL: puppet fail [21:16:16] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: puppet fail [21:16:17] PROBLEM - puppet last run on ganeti2001 is CRITICAL: CRITICAL: puppet fail [21:16:25] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on mw2009 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on mw2153 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on aqs1003 is CRITICAL: CRITICAL: puppet fail [21:16:26] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [21:16:27] PROBLEM - puppet last run on mc2002 is CRITICAL: CRITICAL: puppet fail [21:16:33] (03CR) 10Andrew Bogott: [C: 032] Typo fix: s/abenst/absent [puppet] - 10https://gerrit.wikimedia.org/r/243025 (owner: 10Andrew Bogott) [21:16:35] PROBLEM - puppet last run on dbproxy1002 is CRITICAL: CRITICAL: puppet fail [21:16:35] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: puppet fail [21:16:35] PROBLEM - puppet last run on mw1063 is CRITICAL: 
CRITICAL: puppet fail [21:16:35] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: puppet fail [21:16:36] PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: puppet fail [21:16:45] PROBLEM - puppet last run on ms-fe2004 is CRITICAL: CRITICAL: puppet fail [21:16:45] PROBLEM - puppet last run on mw2064 is CRITICAL: CRITICAL: puppet fail [21:16:45] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: puppet fail [21:16:46] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: puppet fail [21:16:46] PROBLEM - puppet last run on es2008 is CRITICAL: CRITICAL: puppet fail [21:16:55] PROBLEM - puppet last run on mw2015 is CRITICAL: CRITICAL: puppet fail [21:16:56] PROBLEM - puppet last run on mw1010 is CRITICAL: CRITICAL: puppet fail [21:16:56] PROBLEM - puppet last run on mw2161 is CRITICAL: CRITICAL: puppet fail [21:16:56] PROBLEM - puppet last run on ganeti2004 is CRITICAL: CRITICAL: puppet fail [21:16:56] PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: puppet fail [21:16:56] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: puppet fail [21:16:57] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: puppet fail [21:16:57] PROBLEM - puppet last run on mw2010 is CRITICAL: CRITICAL: puppet fail [21:16:57] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail [21:17:05] PROBLEM - puppet last run on db1066 is CRITICAL: CRITICAL: puppet fail [21:17:06] PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: puppet fail [21:17:06] PROBLEM - puppet last run on db1044 is CRITICAL: CRITICAL: puppet fail [21:17:06] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: puppet fail [21:17:06] PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: puppet fail [21:17:07] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: puppet fail [21:17:10] that might be a record for most alarms caused by a single typo [21:17:15] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: puppet 
fail [21:17:16] PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: puppet fail [21:17:16] PROBLEM - puppet last run on mw2082 is CRITICAL: CRITICAL: puppet fail [21:17:16] PROBLEM - puppet last run on cp2026 is CRITICAL: CRITICAL: puppet fail [21:17:16] PROBLEM - puppet last run on cp2023 is CRITICAL: CRITICAL: puppet fail [21:17:17] PROBLEM - puppet last run on analytics1004 is CRITICAL: CRITICAL: puppet fail [21:17:17] PROBLEM - puppet last run on lvs1002 is CRITICAL: CRITICAL: puppet fail [21:17:17] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: puppet fail [21:17:17] PROBLEM - puppet last run on mw2188 is CRITICAL: CRITICAL: puppet fail [21:17:25] PROBLEM - puppet last run on db1031 is CRITICAL: CRITICAL: puppet fail [21:17:25] PROBLEM - puppet last run on iodine is CRITICAL: CRITICAL: puppet fail [21:17:25] PROBLEM - puppet last run on mw1106 is CRITICAL: CRITICAL: puppet fail [21:17:26] PROBLEM - puppet last run on db1050 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on restbase1006 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on ms-fe1001 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: puppet fail [21:17:28] PROBLEM - puppet last run on elastic1004 is CRITICAL: CRITICAL: puppet fail [21:17:35] PROBLEM - puppet last run on mc2016 is CRITICAL: CRITICAL: puppet fail [21:17:35] PROBLEM - puppet last run on cp3021 is CRITICAL: CRITICAL: puppet fail [21:17:35] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [21:17:35] PROBLEM - puppet last run on lvs3003 is CRITICAL: CRITICAL: puppet fail [21:17:35] PROBLEM - puppet last run on cp3040 is CRITICAL: CRITICAL: puppet fail [21:17:36] PROBLEM - puppet last run on mw1143 is CRITICAL: CRITICAL: puppet 
fail [21:17:36] PROBLEM - puppet last run on mw1142 is CRITICAL: CRITICAL: puppet fail [21:17:36] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: puppet fail [21:17:37] PROBLEM - puppet last run on copper is CRITICAL: CRITICAL: puppet fail [21:17:37] PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: puppet fail [21:17:45] PROBLEM - puppet last run on mw1107 is CRITICAL: CRITICAL: puppet fail [21:17:46] PROBLEM - puppet last run on mw2083 is CRITICAL: CRITICAL: puppet fail [21:17:46] PROBLEM - puppet last run on db2045 is CRITICAL: CRITICAL: puppet fail [21:17:47] PROBLEM - puppet last run on db2059 is CRITICAL: CRITICAL: puppet fail [21:17:47] PROBLEM - puppet last run on ms-be2004 is CRITICAL: CRITICAL: puppet fail [21:17:55] PROBLEM - puppet last run on mw2075 is CRITICAL: CRITICAL: puppet fail [21:17:56] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: puppet fail [21:17:56] PROBLEM - puppet last run on mw1066 is CRITICAL: CRITICAL: puppet fail [21:17:56] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: puppet fail [21:17:56] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail [21:17:56] PROBLEM - puppet last run on mw1154 is CRITICAL: CRITICAL: puppet fail [21:18:05] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: puppet fail [21:18:06] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: puppet fail [21:18:06] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: puppet fail [21:18:06] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: puppet fail [21:18:06] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: puppet fail [21:18:06] PROBLEM - puppet last run on calcium is CRITICAL: CRITICAL: puppet fail [21:18:15] PROBLEM - puppet last run on es1014 is CRITICAL: CRITICAL: puppet fail [21:18:15] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: puppet fail [21:18:16] PROBLEM - puppet last run on mw2105 is CRITICAL: CRITICAL: puppet fail 
[21:18:16] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: puppet fail [21:18:16] PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: puppet fail [21:18:16] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: puppet fail [21:18:16] PROBLEM - puppet last run on labcontrol1001 is CRITICAL: CRITICAL: puppet fail [21:18:25] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: puppet fail [21:18:25] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: puppet fail [21:18:25] PROBLEM - puppet last run on cp2011 is CRITICAL: CRITICAL: puppet fail [21:18:26] 6operations: audit contractors sheet against cluster access - https://phabricator.wikimedia.org/T114430#1695180 (10RobH) 3NEW a:3RobH [21:18:26] PROBLEM - puppet last run on ganeti1002 is CRITICAL: CRITICAL: puppet fail [21:18:26] PROBLEM - puppet last run on cp4002 is CRITICAL: CRITICAL: puppet fail [21:18:26] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: puppet fail [21:18:26] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: puppet fail [21:18:27] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: puppet fail [21:18:27] PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: puppet fail [21:18:36] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: puppet fail [21:18:36] PROBLEM - puppet last run on mw1241 is CRITICAL: CRITICAL: puppet fail [21:18:36] PROBLEM - puppet last run on wtp2001 is CRITICAL: CRITICAL: puppet fail [21:18:36] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: puppet fail [21:18:37] PROBLEM - puppet last run on mw2070 is CRITICAL: CRITICAL: puppet fail [21:18:37] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: puppet fail [21:18:42] 6operations, 10Analytics: removed user handrade from access - https://phabricator.wikimedia.org/T114427#1695191 (10RobH) [21:18:44] 6operations: audit contractors sheet against cluster access - https://phabricator.wikimedia.org/T114430#1695190 (10RobH) [21:18:45] 
PROBLEM - puppet last run on mw2004 is CRITICAL: CRITICAL: puppet fail [21:18:45] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: puppet fail [21:18:45] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: puppet fail [21:18:45] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: puppet fail [21:18:46] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: puppet fail [21:18:46] PROBLEM - puppet last run on restbase2002 is CRITICAL: CRITICAL: puppet fail [21:18:46] PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: puppet fail [21:18:46] PROBLEM - puppet last run on mw2123 is CRITICAL: CRITICAL: puppet fail [21:18:47] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: puppet fail [21:18:47] PROBLEM - puppet last run on mw1253 is CRITICAL: CRITICAL: puppet fail [21:18:48] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: puppet fail [21:18:48] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail [21:18:56] PROBLEM - puppet last run on mw2212 is CRITICAL: CRITICAL: puppet fail [21:18:56] PROBLEM - puppet last run on mw2067 is CRITICAL: CRITICAL: puppet fail [21:18:56] PROBLEM - puppet last run on db1022 is CRITICAL: CRITICAL: puppet fail [21:18:57] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: puppet fail [21:18:57] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: puppet fail [21:18:57] PROBLEM - puppet last run on wtp1001 is CRITICAL: CRITICAL: puppet fail [21:19:05] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [21:19:05] RECOVERY - puppet last run on mw1017 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:19:05] PROBLEM - puppet last run on mw1027 is CRITICAL: CRITICAL: puppet fail [21:19:05] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: puppet fail [21:19:06] PROBLEM - puppet last run on mw1104 is CRITICAL: CRITICAL: puppet fail [21:19:06] PROBLEM - puppet last run on db2054 is 
CRITICAL: CRITICAL: puppet fail [21:19:06] RECOVERY - puppet last run on restbase1006 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:19:07] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: puppet fail [21:19:07] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: puppet fail [21:19:08] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: puppet fail [21:19:11] fixed [21:19:15] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [21:19:15] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: puppet fail [21:19:15] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: puppet fail [21:19:15] PROBLEM - puppet last run on ms-fe1002 is CRITICAL: CRITICAL: puppet fail [21:19:16] PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: puppet fail [21:19:16] PROBLEM - puppet last run on mw2056 is CRITICAL: CRITICAL: puppet fail [21:19:17] PROBLEM - puppet last run on mw2019 is CRITICAL: CRITICAL: puppet fail [21:19:17] PROBLEM - puppet last run on mw1047 is CRITICAL: CRITICAL: puppet fail [21:19:21] andrewbogott: kill the bot? 
[21:19:25] PROBLEM - puppet last run on es2010 is CRITICAL: CRITICAL: puppet fail [21:19:25] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: puppet fail [21:19:26] PROBLEM - puppet last run on mw1021 is CRITICAL: CRITICAL: puppet fail [21:19:26] PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: puppet fail [21:19:26] PROBLEM - puppet last run on mw1155 is CRITICAL: CRITICAL: puppet fail [21:19:27] PROBLEM - puppet last run on mw2096 is CRITICAL: CRITICAL: puppet fail [21:19:34] JohnFLewis: *shrug* it’ll recover in a moment [21:19:35] PROBLEM - puppet last run on db1034 is CRITICAL: CRITICAL: puppet fail [21:19:35] PROBLEM - puppet last run on mw2093 is CRITICAL: CRITICAL: puppet fail [21:19:35] PROBLEM - puppet last run on mw2079 is CRITICAL: CRITICAL: puppet fail [21:19:36] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: puppet fail [21:19:36] PROBLEM - puppet last run on analytics1031 is CRITICAL: CRITICAL: puppet fail [21:19:36] PROBLEM - puppet last run on db1027 is CRITICAL: CRITICAL: puppet fail [21:19:37] PROBLEM - puppet last run on mw2090 is CRITICAL: CRITICAL: puppet fail [21:19:45] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: puppet fail [21:19:45] PROBLEM - puppet last run on mw2084 is CRITICAL: CRITICAL: puppet fail [21:19:45] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: puppet fail [21:19:46] PROBLEM - puppet last run on mw2110 is CRITICAL: CRITICAL: puppet fail [21:19:46] PROBLEM - puppet last run on mw2092 is CRITICAL: CRITICAL: puppet fail [21:19:47] PROBLEM - puppet last run on mw1194 is CRITICAL: CRITICAL: puppet fail [21:19:47] PROBLEM - puppet last run on dbproxy1001 is CRITICAL: CRITICAL: puppet fail [21:19:55] PROBLEM - puppet last run on wtp2010 is CRITICAL: CRITICAL: puppet fail [21:19:55] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: puppet fail [21:19:55] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: puppet fail [21:19:56] PROBLEM - puppet last 
run on mw2003 is CRITICAL: CRITICAL: puppet fail [21:19:56] PROBLEM - puppet last run on mw2047 is CRITICAL: CRITICAL: puppet fail [21:19:56] PROBLEM - puppet last run on mw2142 is CRITICAL: CRITICAL: puppet fail [21:19:59] andrewbogott: yeah but failures are still coming in so there will be a *lot* more (at least double) [21:20:05] PROBLEM - puppet last run on etherpad1001 is CRITICAL: CRITICAL: puppet fail [21:20:06] PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: puppet fail [21:20:06] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: puppet fail [21:20:06] PROBLEM - puppet last run on db2065 is CRITICAL: CRITICAL: puppet fail [21:20:06] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: puppet fail [21:20:07] PROBLEM - puppet last run on mw2030 is CRITICAL: CRITICAL: puppet fail [21:20:15] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [21:20:16] PROBLEM - puppet last run on db1021 is CRITICAL: CRITICAL: puppet fail [21:20:16] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: puppet fail [21:20:16] PROBLEM - puppet last run on ms-be3003 is CRITICAL: CRITICAL: puppet fail [21:20:17] PROBLEM - puppet last run on cp1071 is CRITICAL: CRITICAL: puppet fail [21:20:17] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: puppet fail [21:20:17] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [21:20:17] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: puppet fail [21:20:26] PROBLEM - puppet last run on mw2049 is CRITICAL: CRITICAL: puppet fail [21:20:26] PROBLEM - puppet last run on mw2055 is CRITICAL: CRITICAL: puppet fail [21:20:26] PROBLEM - puppet last run on mw2131 is CRITICAL: CRITICAL: puppet fail [21:20:27] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: puppet fail [21:20:27] PROBLEM - puppet last run on cp2003 is CRITICAL: CRITICAL: puppet fail [21:20:27] the bot is throttled [21:20:29] JohnFLewis: 4:15 PM i want to mute it but i dont wanna 
miss a real alert so not going to do it. [21:20:31] so yea, i'll be a bit [21:20:35] PROBLEM - puppet last run on mw1230 is CRITICAL: CRITICAL: puppet fail [21:20:35] PROBLEM - puppet last run on db1068 is CRITICAL: CRITICAL: puppet fail [21:20:36] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: puppet fail [21:20:36] PROBLEM - puppet last run on ms-fe2003 is CRITICAL: CRITICAL: puppet fail [21:20:36] PROBLEM - puppet last run on mw2062 is CRITICAL: CRITICAL: puppet fail [21:20:37] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: puppet fail [21:20:37] PROBLEM - puppet last run on db2047 is CRITICAL: CRITICAL: puppet fail [21:20:38] it'll even [21:20:45] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: puppet fail [21:20:45] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: puppet fail [21:20:45] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: puppet fail [21:20:46] PROBLEM - puppet last run on wtp2016 is CRITICAL: CRITICAL: puppet fail [21:20:46] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [21:20:46] PROBLEM - puppet last run on mw1128 is CRITICAL: CRITICAL: puppet fail [21:20:47] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [21:20:48] nice to see the bot doesnt get kicked :) [21:20:55] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: puppet fail [21:20:55] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [21:21:01] if the bot dies it just stores its backlog right? 
[21:21:01] robh: the web interface exists for a reason ;) but mkay [21:21:05] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: puppet fail [21:21:06] PROBLEM - puppet last run on mw1075 is CRITICAL: CRITICAL: puppet fail [21:21:06] PROBLEM - puppet last run on mw2039 is CRITICAL: CRITICAL: puppet fail [21:21:11] cuz we could kill it now that the web interface is cleaned up [21:21:15] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: puppet fail [21:21:15] and lacking a billion puppet errors ;] [21:21:15] PROBLEM - puppet last run on mw2101 is CRITICAL: CRITICAL: puppet fail [21:21:15] PROBLEM - puppet last run on antimony is CRITICAL: CRITICAL: puppet fail [21:21:16] PROBLEM - puppet last run on labvirt1009 is CRITICAL: CRITICAL: puppet fail [21:21:16] PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: puppet fail [21:21:16] PROBLEM - puppet last run on db2038 is CRITICAL: CRITICAL: puppet fail [21:21:17] PROBLEM - puppet last run on db1016 is CRITICAL: CRITICAL: puppet fail [21:21:26] PROBLEM - puppet last run on mw2203 is CRITICAL: CRITICAL: puppet fail [21:21:27] it really just reads the logfile [21:21:32] you can kill it and it comes back [21:21:35] PROBLEM - puppet last run on mc1012 is CRITICAL: CRITICAL: puppet fail [21:21:36] PROBLEM - puppet last run on mw1078 is CRITICAL: CRITICAL: puppet fail [21:21:44] on next puppet run [21:21:45] PROBLEM - puppet last run on db2007 is CRITICAL: CRITICAL: puppet fail [21:21:55] mutante: but does it recall all the shit it didnt spool out ? [21:21:55] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: puppet fail [21:22:02] cuz im tired of it talking already! 
[21:22:20] either a) don't make it spam all these repeated errors or b) run less servers :P [21:22:20] robh: no, shouldnt [21:22:57] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:23:17] meh, they are still in error though so im not comfortable killing it anyhow i dont think... [21:23:27] if another opsen disagrees and does it, i wont argue though [21:27:35] RECOVERY - puppet last run on restbase1004 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:38:06] RECOVERY - puppet last run on pc1002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:38:16] RECOVERY - puppet last run on db2042 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:38:36] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:38:36] RECOVERY - puppet last run on cp1050 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:38:37] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:38:42] 6operations, 7user-notice: schedule maintenance for IRC server - https://phabricator.wikimedia.org/T105804#1695274 (10Krenair) You can probably test it in labs... 
[21:38:47] RECOVERY - puppet last run on es1017 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:38:55] RECOVERY - puppet last run on mw1101 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:38:56] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:38:56] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:38:57] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:39:05] RECOVERY - puppet last run on gadolinium is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:39:16] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:16] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:16] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:39:16] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:16] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:16] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [21:39:17] RECOVERY - puppet last run on cp3043 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:39:17] RECOVERY - puppet last run on es1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:17] RECOVERY - puppet last run on mw2149 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:39:25] RECOVERY - puppet last run on mw1237 is OK: OK: Puppet is 
currently enabled, last run 58 seconds ago with 0 failures [21:39:26] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:39:26] RECOVERY - puppet last run on db1036 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:39:27] RECOVERY - puppet last run on cp1074 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:39:35] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:35] RECOVERY - puppet last run on wtp2014 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:39:36] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:36] RECOVERY - puppet last run on db2043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:36] RECOVERY - puppet last run on es2004 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:39:36] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:36] RECOVERY - puppet last run on db2048 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [21:39:46] RECOVERY - puppet last run on analytics1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:47] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:56] RECOVERY - puppet last run on cp2020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:39:56] RECOVERY - puppet last run on mw1214 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [21:39:56] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:40:05] RECOVERY 
- puppet last run on cp1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:06] RECOVERY - puppet last run on lvs1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:06] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:06] RECOVERY - puppet last run on ms-be2001 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:40:07] RECOVERY - puppet last run on wtp1018 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:40:15] RECOVERY - puppet last run on cp1069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:16] RECOVERY - puppet last run on mw2025 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:40:16] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:40:16] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:25] RECOVERY - puppet last run on es1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:26] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:27] RECOVERY - puppet last run on mw1159 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:40:35] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:36] RECOVERY - puppet last run on mw2165 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:40:36] RECOVERY - puppet last run on restbase2005 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:40:36] RECOVERY - puppet last run on mw1127 is OK: OK: Puppet is currently enabled, last run 1 
minute ago with 0 failures [21:40:37] RECOVERY - puppet last run on wtp2007 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:40:46] RECOVERY - puppet last run on mw2147 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:40:55] RECOVERY - puppet last run on cp3035 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:40:56] RECOVERY - puppet last run on ms-be2012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:04] (03PS2) 10BBlack: zero_update: randomize cron every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) [21:41:05] RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:05] RECOVERY - puppet last run on ms-be2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:06] RECOVERY - puppet last run on es1013 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [21:41:06] RECOVERY - puppet last run on mw2042 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:41:06] RECOVERY - puppet last run on mx2001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:41:07] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:41:15] RECOVERY - puppet last run on analytics1015 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:41:16] RECOVERY - puppet last run on mw1227 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:41:16] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:17] RECOVERY - puppet last run on elastic1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[21:41:25] RECOVERY - puppet last run on mw1221 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:41:25] RECOVERY - puppet last run on mw1111 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:26] RECOVERY - puppet last run on mw1190 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:41:26] RECOVERY - puppet last run on wtp1004 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:41:26] RECOVERY - puppet last run on mw2089 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:27] RECOVERY - puppet last run on mc1007 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [21:41:35] RECOVERY - puppet last run on berkelium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:36] RECOVERY - puppet last run on ganeti2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:37] RECOVERY - puppet last run on ms-be2007 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:41:45] RECOVERY - puppet last run on mw1247 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:45] RECOVERY - puppet last run on rdb2002 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:41:46] RECOVERY - puppet last run on eventlog1001 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:41:46] RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:41:46] RECOVERY - puppet last run on elastic1026 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:41:46] RECOVERY - puppet last run on mc1009 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:41:55] RECOVERY - puppet last run on mw2102 is OK: OK: Puppet 
is currently enabled, last run 26 seconds ago with 0 failures [21:41:55] (03CR) 10BBlack: [C: 032] zero_update: randomize cron every 15 minutes [puppet] - 10https://gerrit.wikimedia.org/r/242888 (https://phabricator.wikimedia.org/T111045) (owner: 10BBlack) [21:41:56] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:41:56] RECOVERY - puppet last run on mw1232 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:56] RECOVERY - puppet last run on mw1147 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:41:56] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:41:56] RECOVERY - puppet last run on wtp1024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:41:57] RECOVERY - puppet last run on db2069 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:41:57] RECOVERY - puppet last run on cp2018 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:41:57] RECOVERY - puppet last run on mw1056 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:41:58] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:41:58] RECOVERY - puppet last run on wtp1015 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:41:59] RECOVERY - puppet last run on mw1081 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:41:59] RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:42:06] RECOVERY - puppet last run on mw2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:06] RECOVERY - puppet last run on mw1125 is OK: OK: Puppet is 
currently enabled, last run 1 minute ago with 0 failures [21:42:06] RECOVERY - puppet last run on mw1124 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:42:06] RECOVERY - puppet last run on mw1196 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:15] RECOVERY - puppet last run on mw2179 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:42:15] RECOVERY - puppet last run on mw2041 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:42:15] RECOVERY - puppet last run on mw1057 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:42:15] RECOVERY - puppet last run on rdb1002 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [21:42:16] RECOVERY - puppet last run on db1001 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [21:42:25] RECOVERY - puppet last run on mw2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:26] RECOVERY - puppet last run on pybal-test2001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:42:26] RECOVERY - puppet last run on mw2194 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:27] RECOVERY - puppet last run on cp2025 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:42:27] RECOVERY - puppet last run on cp2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:35] RECOVERY - puppet last run on mw2197 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:35] RECOVERY - puppet last run on mw1087 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:42:36] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:42:36] RECOVERY - 
puppet last run on acamar is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:36] RECOVERY - puppet last run on titanium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:36] RECOVERY - puppet last run on mw1146 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:42:36] RECOVERY - puppet last run on cp1072 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:37] RECOVERY - puppet last run on analytics1033 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:42:37] RECOVERY - puppet last run on mw1115 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [21:42:45] RECOVERY - puppet last run on mw2151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:45] RECOVERY - puppet last run on dbstore1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:45] RECOVERY - puppet last run on db2070 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:42:46] RECOVERY - puppet last run on mw1161 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:47] RECOVERY - puppet last run on mc2012 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:42:47] RECOVERY - puppet last run on cp1049 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:42:55] RECOVERY - puppet last run on ms-be2002 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:42:56] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:42:56] RECOVERY - puppet last run on mw1130 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:42:57] RECOVERY - puppet last run on mw1048 is OK: OK: Puppet is currently enabled, last run 12 
seconds ago with 0 failures [21:43:05] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:43:05] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:06] RECOVERY - puppet last run on ganeti2001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:43:06] RECOVERY - puppet last run on mw2189 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:06] RECOVERY - puppet last run on mw2201 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [21:43:07] RECOVERY - puppet last run on aqs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:07] RECOVERY - puppet last run on mw2139 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:43:07] RECOVERY - puppet last run on dbproxy1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:43:15] RECOVERY - puppet last run on mc2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:15] RECOVERY - puppet last run on mw1063 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:43:15] RECOVERY - puppet last run on mw2103 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:15] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:43:16] RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:43:16] RECOVERY - puppet last run on db2046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:16] RECOVERY - puppet last run on elastic1017 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:43:17] RECOVERY - puppet last run on 
labsdb1004 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:43:17] RECOVERY - puppet last run on mw2112 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:43:18] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:43:18] RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:43:19] RECOVERY - puppet last run on mw2099 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:25] RECOVERY - puppet last run on mw1210 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:25] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:43:26] RECOVERY - puppet last run on es2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:27] RECOVERY - puppet last run on mc2009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:27] RECOVERY - puppet last run on mw1038 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:35] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:43:37] RECOVERY - puppet last run on db1064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:45] RECOVERY - puppet last run on db1044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:45] RECOVERY - puppet last run on wtp2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:45] RECOVERY - puppet last run on mw2065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:45] RECOVERY - puppet last run on mw2140 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[21:43:46] RECOVERY - puppet last run on wtp2020 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:43:47] RECOVERY - puppet last run on mw1033 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [21:43:55] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:43:56] RECOVERY - puppet last run on mw2160 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:56] RECOVERY - puppet last run on wtp2005 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:43:56] RECOVERY - puppet last run on analytics1004 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:43:56] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:56] RECOVERY - puppet last run on db1029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:56] RECOVERY - puppet last run on mw2183 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:57] RECOVERY - puppet last run on db1022 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:43:57] RECOVERY - puppet last run on db1031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:57] RECOVERY - puppet last run on mw2116 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:43:58] RECOVERY - puppet last run on iodine is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [21:44:05] RECOVERY - puppet last run on mw1041 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:05] RECOVERY - puppet last run on mw1032 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:05] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently 
enabled, last run 57 seconds ago with 0 failures [21:44:05] RECOVERY - puppet last run on mw1089 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:05] RECOVERY - puppet last run on mw1106 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:44:06] RECOVERY - puppet last run on mw1225 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:44:06] RECOVERY - puppet last run on ms-be1006 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:44:06] RECOVERY - puppet last run on elastic1004 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:44:07] RECOVERY - puppet last run on analytics1043 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:16] RECOVERY - puppet last run on cp3021 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:44:16] RECOVERY - puppet last run on mw1140 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:44:16] RECOVERY - puppet last run on copper is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:44:17] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:17] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:17] RECOVERY - puppet last run on wtp1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:25] RECOVERY - puppet last run on mw2100 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:25] RECOVERY - puppet last run on restbase-test2002 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:44:26] RECOVERY - puppet last run on mw1122 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:26] 
RECOVERY - puppet last run on mw2044 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:26] RECOVERY - puppet last run on potassium is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:44:26] RECOVERY - puppet last run on planet1001 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:44:26] RECOVERY - puppet last run on db2045 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [21:44:36] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:44:37] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:44:46] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:46] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:44:46] RECOVERY - puppet last run on calcium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:46] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:44:46] RECOVERY - puppet last run on mw2199 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:46] RECOVERY - puppet last run on mw2153 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:44:47] RECOVERY - puppet last run on mw2009 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:44:47] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:44:56] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:44:57] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is 
currently enabled, last run 54 seconds ago with 0 failures [21:45:05] RECOVERY - puppet last run on ms-fe2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:06] RECOVERY - puppet last run on mw2064 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:45:06] RECOVERY - puppet last run on ganeti1002 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:45:06] RECOVERY - puppet last run on es2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:06] RECOVERY - puppet last run on cp4002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:45:07] RECOVERY - puppet last run on mw1010 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:45:07] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [21:45:13] (03CR) 10Ottomata: Consume EventLogging validation logs from Logstash (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/241984 (https://phabricator.wikimedia.org/T113627) (owner: 10Mforns) [21:45:15] RECOVERY - puppet last run on mw2015 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [21:45:15] RECOVERY - puppet last run on mw1241 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [21:45:15] RECOVERY - puppet last run on mw2161 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:16] RECOVERY - puppet last run on ganeti2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:16] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:16] RECOVERY - puppet last run on wtp2001 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:45:16] RECOVERY - puppet last run on achernar is OK: OK: 
Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:45:17] RECOVERY - puppet last run on mw2010 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [21:45:17] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:45:18] RECOVERY - puppet last run on db1066 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [21:45:18] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:45:19] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:25] RECOVERY - puppet last run on mw1069 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [21:45:25] RECOVERY - puppet last run on mw2004 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [21:45:25] RECOVERY - puppet last run on mw2080 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:45:26] RECOVERY - puppet last run on restbase2002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:45:26] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [21:45:26] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [21:45:27] RECOVERY - puppet last run on mw1024 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:45:35] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:45:36] RECOVERY - puppet last run on lvs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:37] RECOVERY - puppet last run on wtp1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 
0 failures [21:45:37] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:37] RECOVERY - puppet last run on mw2188 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:45] RECOVERY - puppet last run on mw1027 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:45:45] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:45] RECOVERY - puppet last run on ms-fe1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:45] RECOVERY - puppet last run on db1050 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:46] RECOVERY - puppet last run on mw2031 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:46] RECOVERY - puppet last run on mw2117 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:45:47] RECOVERY - puppet last run on mw2163 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:45:47] RECOVERY - puppet last run on ms-fe1002 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [21:45:49] (PS1) BBlack: fix minutes for zero cron [puppet] - https://gerrit.wikimedia.org/r/243031 [21:45:55] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:55] RECOVERY - puppet last run on mc2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:56] RECOVERY - puppet last run on lvs3003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:56] RECOVERY - puppet last run on mw1142 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:45:56] RECOVERY - puppet last run on mw1143 is OK: OK: Puppet is currently enabled, last 
run 3 seconds ago with 0 failures [21:45:57] RECOVERY - puppet last run on mw1107 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [21:46:06] RECOVERY - puppet last run on es2010 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [21:46:07] RECOVERY - puppet last run on mw2083 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [21:46:07] RECOVERY - puppet last run on db2059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:07] RECOVERY - puppet last run on ms-be2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:12] (CR) BBlack: [C: 2 V: 2] fix minutes for zero cron [puppet] - https://gerrit.wikimedia.org/r/243031 (owner: BBlack) [21:46:15] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:15] RECOVERY - puppet last run on mw2075 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:16] RECOVERY - puppet last run on mw1066 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:46:16] RECOVERY - puppet last run on analytics1031 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:46:25] RECOVERY - puppet last run on mw1153 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:46:26] RECOVERY - puppet last run on mw1068 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:26] RECOVERY - puppet last run on mw1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:27] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:37] RECOVERY - puppet last run on es1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:38] RECOVERY - puppet last run 
on labcontrol1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [21:46:38] RECOVERY - puppet last run on mw2105 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:38] RECOVERY - puppet last run on mw2003 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [21:46:39] RECOVERY - puppet last run on mw1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:45] RECOVERY - puppet last run on etherpad1001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:46:46] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:46] RECOVERY - puppet last run on mw1204 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:46] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:46] RECOVERY - puppet last run on db2065 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:46:55] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:55] RECOVERY - puppet last run on mw2087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:56] RECOVERY - puppet last run on db1021 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:46:56] RECOVERY - puppet last run on mw1150 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:56] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [21:46:57] RECOVERY - puppet last run on mw2070 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [21:46:57] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 
failures [21:46:57] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:46:57] RECOVERY - puppet last run on ms-be3003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [21:46:58] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [21:47:05] RECOVERY - puppet last run on ms-be1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:05] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:47:06] RECOVERY - puppet last run on mw1253 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:06] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:06] RECOVERY - puppet last run on mw2123 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [21:47:16] RECOVERY - puppet last run on mw2082 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:16] RECOVERY - puppet last run on mw2212 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:16] RECOVERY - puppet last run on ms-fe2003 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [21:47:16] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:17] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:25] RECOVERY - puppet last run on mw1054 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:47:25] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [21:47:25] RECOVERY - puppet last run on mw1104 is OK: OK: Puppet is 
currently enabled, last run 56 seconds ago with 0 failures [21:47:25] RECOVERY - puppet last run on mw1129 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [21:47:25] RECOVERY - puppet last run on db2047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:26] RECOVERY - puppet last run on wtp2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:26] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:47:26] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [21:47:27] RECOVERY - puppet last run on db2054 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:27] RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:47:35] RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [21:47:36] PROBLEM - puppet last run on cp3042 is CRITICAL: CRITICAL: puppet fail [21:47:36] PROBLEM - puppet last run on cp3030 is CRITICAL: CRITICAL: puppet fail [21:47:36] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:47:36] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:47:36] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: puppet fail [21:47:36] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: puppet fail [21:47:37] RECOVERY - puppet last run on cp3040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:47:37] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:38] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [21:47:38] RECOVERY - puppet last run on mw2019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:45] PROBLEM - puppet last run on cp1058 is CRITICAL: CRITICAL: puppet fail [21:47:46] RECOVERY - puppet last run on mw1021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:46] RECOVERY - puppet last run on mw1155 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:47] PROBLEM - puppet last run on cp2016 is CRITICAL: CRITICAL: puppet fail [21:47:47] RECOVERY - puppet last run on db1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:55] RECOVERY - puppet last run on antimony is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:47:56] RECOVERY - puppet last run on mw2096 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:56] RECOVERY - puppet last run on labvirt1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:56] RECOVERY - puppet last run on mw2093 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [21:47:56] RECOVERY - puppet last run on mw2079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:56] RECOVERY - puppet last run on mw1011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:56] RECOVERY - puppet last run on db1027 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:57] RECOVERY - puppet last run on mw1154 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:47:57] RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:47:58] RECOVERY - puppet last run on db1016 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [21:47:58] 
RECOVERY - puppet last run on db2038 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:47:59] RECOVERY - puppet last run on mw2090 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [21:48:05] RECOVERY - puppet last run on mw2084 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [21:48:06] RECOVERY - puppet last run on mw1235 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:06] RECOVERY - puppet last run on mw1194 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:48:06] RECOVERY - puppet last run on dbproxy1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:06] RECOVERY - puppet last run on mw2092 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:48:07] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:15] RECOVERY - puppet last run on wtp2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:15] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:16] RECOVERY - puppet last run on mw2134 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:16] RECOVERY - puppet last run on mw2047 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [21:48:16] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:25] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:26] RECOVERY - puppet last run on mw1206 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:48:26] PROBLEM - puppet last run on cp2014 is CRITICAL: CRITICAL: puppet fail [21:48:26] PROBLEM 
- puppet last run on cp2011 is CRITICAL: CRITICAL: puppet fail [21:48:26] RECOVERY - puppet last run on db2007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [21:48:26] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [21:48:27] PROBLEM - puppet last run on cp1066 is CRITICAL: CRITICAL: puppet fail [21:48:27] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [21:48:35] RECOVERY - puppet last run on mw2030 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [21:48:36] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:36] PROBLEM - puppet last run on cp1071 is CRITICAL: CRITICAL: puppet fail [21:48:45] RECOVERY - puppet last run on mw2049 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:48:45] PROBLEM - puppet last run on cp4015 is CRITICAL: CRITICAL: puppet fail [21:48:46] RECOVERY - puppet last run on mw2131 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:46] RECOVERY - puppet last run on mw2055 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:48:47] RECOVERY - puppet last run on db1068 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [21:48:55] RECOVERY - puppet last run on mw1230 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [21:48:55] PROBLEM - puppet last run on cp2003 is CRITICAL: CRITICAL: puppet fail [21:48:56] RECOVERY - puppet last run on cp2023 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [21:48:56] RECOVERY - puppet last run on cp2026 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures [21:48:56] PROBLEM - puppet last run on cp2010 is CRITICAL: CRITICAL: puppet 
fail [21:48:56] RECOVERY - puppet last run on mw2067 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:57] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:48:57] RECOVERY - puppet last run on mw2062 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [21:49:06] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:15] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: puppet fail [21:49:15] PROBLEM - puppet last run on cp4019 is CRITICAL: CRITICAL: puppet fail [21:49:16] RECOVERY - puppet last run on mw1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:16] RECOVERY - puppet last run on mw2056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:26] RECOVERY - puppet last run on mw1075 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:49:35] (PS1) BBlack: re-enable zerofetcher for cache_text T111045 [puppet] - https://gerrit.wikimedia.org/r/243033 [21:49:35] RECOVERY - puppet last run on mw2039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:36] RECOVERY - puppet last run on mw2101 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [21:49:36] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:37] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:46] RECOVERY - puppet last run on mw2110 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:46] RECOVERY - puppet last run on mw2203 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [21:49:47] RECOVERY - puppet last run on mc1012 is 
OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:55] RECOVERY - puppet last run on mw1078 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:49:56] RECOVERY - puppet last run on mw2142 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [21:50:18] (CR) BBlack: [C: 2 V: 2] re-enable zerofetcher for cache_text T111045 [puppet] - https://gerrit.wikimedia.org/r/243033 (owner: BBlack) [21:50:46] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [22:02:36] RECOVERY - puppet last run on cp3030 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [22:02:36] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [22:03:26] RECOVERY - puppet last run on cp1066 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:04:16] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:04:55] !log Running FlowFixLinks.php on all wikis [22:05:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. 
Obvious [22:05:26] RECOVERY - puppet last run on cp2003 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [22:05:35] RECOVERY - puppet last run on cp2010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:05:46] RECOVERY - puppet last run on cp4019 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:06:05] RECOVERY - puppet last run on cp2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:14:56] RECOVERY - puppet last run on cp2011 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [22:15:47] RECOVERY - puppet last run on cp3042 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [22:16:21] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695470 (Ottomata) NEW [22:16:36] RECOVERY - puppet last run on cp2014 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [22:17:38] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695483 (Ottomata) Does the description look ok? Feel free to edit and discuss here. [22:18:25] RECOVERY - puppet last run on cp1071 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:18:27] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695486 (Ottomata) Nuria pointed out to me that (as engineers love to do) we are focusing a lot here on technical architecture, but haven't thought a lot about what use case this MVP w... 
[22:18:36] RECOVERY - puppet last run on cp4015 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [22:19:06] RECOVERY - puppet last run on cp1058 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:22:04] (PS1) BBlack: move zerofetcher to r::c::base [puppet] - https://gerrit.wikimedia.org/r/243038 (https://phabricator.wikimedia.org/T89177) [22:22:51] (CR) BBlack: [C: 2 V: 2] move zerofetcher to r::c::base [puppet] - https://gerrit.wikimedia.org/r/243038 (https://phabricator.wikimedia.org/T89177) (owner: BBlack) [22:30:30] operations: move human users out of UID range for system accounts - https://phabricator.wikimedia.org/T114446#1695535 (Dzahn) NEW [22:41:16] (PS1) Yuvipanda: k8s: Try out the pure iptables proxy [puppet] - https://gerrit.wikimedia.org/r/243039 [22:43:20] (CR) Yuvipanda: [C: 2] k8s: Try out the pure iptables proxy [puppet] - https://gerrit.wikimedia.org/r/243039 (owner: Yuvipanda) [23:00:05] RoanKattouw ostriches rmoen Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151001T2300). Please do the needful. [23:07:25] (PS1) Yuvipanda: labs: Failover tools-webproxy to tools-proxy-02 [puppet] - https://gerrit.wikimedia.org/r/243047 [23:08:59] (CR) Yuvipanda: [C: 2] labs: Failover tools-webproxy to tools-proxy-02 [puppet] - https://gerrit.wikimedia.org/r/243047 (owner: Yuvipanda) [23:15:13] yuvipanda|maybe: what you wanted https://gerrit.wikimedia.org/r/#/c/242915/ ..ok. you are busy.. 
next time [23:15:34] linking that ticket you made [23:16:05] mutante: wrong account [23:16:09] I should kick the other guy out [23:16:50] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695706 (Ottomata) [23:16:55] (CR) Yuvipanda: [C: 1] "Assuming that every other usage of the redis module has appropriate ferm rules in the role classes..." [puppet] - https://gerrit.wikimedia.org/r/242915 (owner: Muehlenhoff) [23:16:59] mutante: commented [23:17:23] ah :) ok, cool [23:18:24] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695708 (GWicke) @ottomata, I would perhaps leave out the section "The MVP might also include". Much of it isn't so minimally viable, IMHO. Re use cases, the following two have been d... [23:18:55] andrewbogott: did we get a new nameserver in the last day or so? [23:20:05] Ops-Access-Requests, operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695712 (JKatzWMF) @Dzahn it just worked on the link you provided. Thanks! [23:20:58] Ops-Access-Requests, operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695715 (JKatzWMF) ^"Thanks" meaning thanks for your help in solving this, not thanks in that "this solved my problem" :) [23:21:04] Ops-Access-Requests, operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695716 (Krenair) But those exact same credentials don't work on grafana? [23:21:18] " every other usage of the redis module " [23:21:25] eh, yea.. but who can tell :) [23:24:37] there it is [23:24:38] Ops-Access-Requests, operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695723 (JKatzWMF) @Krenair @Dzahn holy shit...now they do. 
@ori can confirm that they did not yesterday. Thanks (as in "this solved my problem")! [23:28:00] robh: ^ mystery resolved :) [23:28:30] not "the mystery is resolved" but more "it's resolved but remains a mystery" [23:28:37] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695732 (GWicke) [23:28:40] ori too ^ :p [23:28:45] if it only happens once it didnt happen. [23:29:00] operations, Analytics, Discovery, EventBus, and 6 others: EventBus MVP - https://phabricator.wikimedia.org/T114443#1695470 (GWicke) I have now integrated some of those changes into the description. [23:29:04] the issue is statistically irrelevant (now that it is solved ;) [23:29:21] rephrase, if it only happened (past tense) once, it didnt happen [23:31:41] Ops-Access-Requests, operations: Jkatz can't sign into Grafana with LDAP password - https://phabricator.wikimedia.org/T114300#1695744 (Dzahn) Open>Resolved a:Dzahn @JKatzWMF glad it works :) just a bit mysterious. i can confirm the failed attempts were in log yesterday with both user names. [23:50:56] (CR) Ottomata: [C: 1] Add ferm rules for Spark [puppet] - https://gerrit.wikimedia.org/r/240341 (https://phabricator.wikimedia.org/T83597) (owner: Muehlenhoff)