[00:31:16] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0]
[00:33:16] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[01:07:56] PROBLEM - HHVM rendering on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:08:46] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:09:55] PROBLEM - DPKG on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:26] PROBLEM - SSH on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:10:26] PROBLEM - HHVM processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:37] PROBLEM - configured eth on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:46] PROBLEM - nutcracker process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:46] PROBLEM - puppet last run on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:56] PROBLEM - nutcracker port on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:05] PROBLEM - dhclient process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:15] PROBLEM - Disk space on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:28] PROBLEM - RAID on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:36] PROBLEM - salt-minion processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:35:55] RECOVERY - SSH on mw1065 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[01:35:56] RECOVERY - HHVM processes on mw1065 is OK: PROCS OK: 6 processes with command name hhvm
[01:36:07] RECOVERY - configured eth on mw1065 is OK: OK - interfaces up
[01:36:16] RECOVERY - nutcracker process on mw1065 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[01:36:16] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 56 minutes ago with 0 failures
[01:36:26] RECOVERY - nutcracker port on mw1065 is OK: TCP OK - 0.000 second response time on port 11212
[01:36:26] RECOVERY - dhclient process on mw1065 is OK: PROCS OK: 0 processes with command name dhclient
[01:36:46] RECOVERY - Disk space on mw1065 is OK: DISK OK
[01:36:56] RECOVERY - RAID on mw1065 is OK: OK: no RAID installed
[01:37:05] RECOVERY - salt-minion processes on mw1065 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:37:26] RECOVERY - DPKG on mw1065 is OK: All packages OK
[01:52:35] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[01:54:27] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10848 bytes in 0.114 second response time
[02:03:55] PROBLEM - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp3018_v6
[02:05:36] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-tools/snapshot is not accessible: Permission denied
[02:09:45] RECOVERY - IPsec on cp1046 is OK: Strongswan OK - 24 ESP OK
[02:14:25] PROBLEM - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 22 connecting: (unnamed) not-conn: cp2003_v6, cp3018_v6
[02:16:16] RECOVERY - IPsec on cp1047 is OK: Strongswan OK - 24 ESP OK
[02:20:09] !log l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 05m 36s)
[02:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:23:07] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-30 02:23:07+00:00
[02:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:43:06] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp3036_v6, cp3045_v6
[02:46:56] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 60 ESP OK
[03:23:05] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:24:55] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10848 bytes in 0.149 second response time
[03:35:26] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:48:27] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:50:17] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 505 bytes in 0.012 second response time
[03:50:17] getting tired of that stupid mobile ipv6 alert paging :P
[03:51:07] yeah, why does it keep being that one?
[03:51:11] does it work differently to the others?
[03:52:25] I don't think so. the first several times it happened, weeks ago, I dug around a bit, but everything seemed to point to it being a false alarm of some kind
[03:52:41] I'm up anyways, I may as well try to dig more
[03:53:57] it intermittently fails for other eqiad ipv6 too, e.g.
[03:53:57] [1440906495] SERVICE ALERT: upload-lb.eqiad.wikimedia.org_ipv6;LVS HTTP IPv6;CRITICAL;SOFT;1;Connection timed out
[03:54:01] [1440906605] SERVICE ALERT: upload-lb.eqiad.wikimedia.org_ipv6;LVS HTTP IPv6;OK;SOFT;2;HTTP OK: HTTP/1.1 301 TLS Redirect - 455 bytes in 0.001 second response time
[03:54:23] but mobile happens more frequently than the others, and thus more frequently makes it to 3/3 random fails in a row and hits irc / paging
[03:59:09] my leading random theories are:
[03:59:55] 1) Something's actually wrong with the ipv6 addresses on eth0 of the cache machines sporadically (but seems unlikely, or we'd see other complaints or a pattern of v6 external monitor failures?)
[04:00:08] 2) Something's wrong with ipv6 on neon (much more likely)
[04:00:23] 3) Something's wrong with ipv6 somewhere else in eqiad's network (I have no idea what)
[04:02:57] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[04:03:01] it's also really strange that we have these issues with the neon->cache monitoring of v6 within eqiad, yet we don't see these patterns for the same to the remote caches' v6 in ulsfo and esams
[04:04:09] you'd think most things that could go wrong for this inside eqiad would show up even worse with the added remote latency
[04:08:36] probably the most-obvious thing wrong with neon (icinga host) is that it does frequently spike to 100% cpu utilization for brief windows
[04:09:23] the graphs in ganglia show it steadily at 80% utilization pretty much, but e.g. vmstat on 1s intervals shows that in the small view, it's often 0% idle.
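(A minimal sketch of how the flapping LVS IPv6 check might be reproduced by hand from the monitoring host, assuming the stock monitoring-plugins check_http lives under /usr/lib/nagios/plugins; the host, timeout and interval here are illustrative only:

    # run the same probe Icinga runs, forced over IPv6, in a loop
    while true; do
        /usr/lib/nagios/plugins/check_http -6 -H mobile-lb.eqiad.wikimedia.org -t 10
        sleep 5
    done

    # in a second terminal, watch for the brief 0%-idle windows that
    # averaged ganglia graphs smooth away
    vmstat 1

If the manual probe only times out during the 0%-idle spikes, that would point at local CPU starvation on neon rather than at the caches' v6 addresses.)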
[04:11:24] check_sslxNN is of course the leading source of tying up CPU
[04:11:46] (a lot of work already went into making that check more efficient than it might otherwise have been, though)
[04:16:17] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Aug 30 04:16:17 UTC 2015 (duration 16m 16s)
[04:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:16:30] check_graphite and check_elasticsearch are pretty heavy too
[04:16:53] I tried commenting out most of the domains check_sslxNN checks, and that dropped it down the list a bunch, but the CPUs still peg fairly frequently even with that turned down
[04:17:47] the icinga process itself is single-threaded and also routinely locks up a full CPU core, so that can't be good either.
[04:18:41] all of this tends to make me think that (for whatever unfathomable but logical reason) the local v6 checks are just the most-sensitive victims of all that mess locally on neon
[04:19:45] PROBLEM - Disk space on cp3022 is CRITICAL: DISK CRITICAL - free space: / 349 MB (3% inode=89%)
[04:22:10] ^ fixed
[04:22:13] spam of Aug 30 04:22:01 cp3022 varnishkafka[8527]: %3|1440908521.492|GETADDR|varnishkafka#producer-0| analytics1022.eqiad.wmnet:9092/22: Failed to resolve 'analytics1022.eqiad.wmnet:9092': Name or service not known
[04:22:17] in syslog
[04:22:50] probably because those are former bits caches that no longer get any puppetized cache config, so are missing newer updates
[04:23:01] I should probably go shut off all the related daemons on them for now, until they get reused
[04:23:26] RECOVERY - Disk space on cp3022 is OK: DISK OK
[04:28:57] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[04:30:46] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 505 bytes in 0.007 second response time
[05:09:08] bblack: I'm trying to figure out how to test the InstantCommons patches, but I'm not familiar with SSL certs
[05:09:20] how do I produce a non-self-signed cert?
[05:16:15] by sending lots of money to the extortionists that sign certs :)
[05:16:51] can you just test against live production, like instantcommons users would be doing anyways?
[05:18:09] I want to test against an invalid cert which is not self-signed (as those seem to be special-cased in PHP)
[05:18:29] what makes that cert "invalid" in this test case?
[05:18:31] seb25 tested pretty rigorously that the patches work with Commons
[05:18:51] (hostname mismatch with SAN/Subject?)
[05:18:52] what's missing I think is to make sure they don't silently work with invalid certs
[05:19:08] at least not in any case where that did not happen before
[05:19:14] well, again, I don't know how to interpret that question without knowing in what sense "invalid" is meant
[05:19:25] signed by a non-trusted root CA
[05:19:32] ah!
[05:19:46] so you want it to be signed by *a* CA, just not one that's in the trusted CA store
[05:19:54] it would be nice to test the SAN case as well, but I would need a valid cert for that...
[05:19:57] yes
[05:21:46] basically it just requires a series of openssl invocations, but google results give the best answers:
[05:21:49] https://www.phildev.net/ssl/creating_ca.html
[05:22:07] or: http://pages.cs.wisc.edu/~zmiller/ca-howto/
[05:23:53] thanks!
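(A minimal sketch of the openssl sequence being discussed: create a throwaway CA, then sign a leaf certificate with it, so the result is CA-signed but still untrusted because the CA is not in the system store. File names and subjects are placeholders:

    # 1) throwaway CA key and self-signed CA certificate (never added to the trust store)
    openssl genrsa -out testca.key 2048
    openssl req -x509 -new -key testca.key -subj "/CN=Untrusted Test CA" -days 30 -out testca.crt

    # 2) leaf key and CSR; pick the CN to match, or deliberately mismatch, the host under test
    openssl genrsa -out leaf.key 2048
    openssl req -new -key leaf.key -subj "/CN=commons.example.test" -out leaf.csr

    # 3) sign the leaf with the throwaway CA
    openssl x509 -req -in leaf.csr -CA testca.crt -CAkey testca.key -CAcreateserial -days 30 -out leaf.crt

Serving leaf.crt/leaf.key from a local web server then produces a chain that fails validation for the intended reason, an unknown CA, rather than hitting the self-signed special case.)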
[05:24:13] I tried googling for it but only found instructions for self-signing
[05:24:24] probably did not know the right keyword
[05:25:42] operations: icinga (neon) is out of CPU headroom - https://phabricator.wikimedia.org/T110822#1587280 (BBlack) NEW
[06:25:46] RECOVERY - Disk space on cp3020 is OK: DISK OK
[06:31:46] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:25] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:26] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:45] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:55] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:56] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:15:16] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[07:26:47] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 569112 msg: ocg_render_job_queue 3228 msg (=3000 critical)
[07:26:56] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 569698 msg: ocg_render_job_queue 3650 msg (=3000 critical)
[07:27:07] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 569837 msg: ocg_render_job_queue 3725 msg (=3000 critical)
[09:37:07] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0]
[10:07:33] !log rebooted labstore1002
[10:07:34] PROBLEM - Host labstore1002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:07:34] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 10.34% of data above the critical threshold [100000000.0]
[10:07:34] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[10:07:34] RECOVERY - Host labstore1002 is UP: PING OK - Packet loss = 0%, RTA = 3.47 ms
[10:07:34] !log run start-nfs in labstore1002
[10:10:07] PROBLEM - RAID on labstore1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[10:11:25] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[10:13:46] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 18.52% of data above the critical threshold [100000000.0]
[10:29:15] RECOVERY - OCG health on ocg1002 is OK: OK: ocg_job_status 641129 msg: ocg_render_job_queue 41 msg
[10:29:25] RECOVERY - OCG health on ocg1001 is OK: OK: ocg_job_status 641152 msg: ocg_render_job_queue 0 msg
[10:29:26] RECOVERY - OCG health on ocg1003 is OK: OK: ocg_job_status 641154 msg: ocg_render_job_queue 0 msg
[10:34:10] !log disabled backups on labstore1002 to prevent overwriting of good backups on 2001
[10:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:47:15] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[12:53:54] !log shut down labstore1002, going to powercycle from mgmt
[12:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:53:54] PROBLEM - Last backup of the maps filesystem on labstore1002 is CRITICAL: Timeout while attempting connection
[12:53:54] PROBLEM - Host labstore1002 is DOWN: PING CRITICAL - Packet loss = 100%
[12:53:55] !log powered labstore1002 back up
[12:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:53:59] RECOVERY - Host labstore1002 is UP: PING WARNING - Packet loss = 73%, RTA = 1.58 ms
[12:53:59] RECOVERY - RAID on labstore1002 is OK: OK: optimal, 72 logical, 72 physical
[12:54:01] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit is inactive
[12:54:01] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit is inactive
[12:54:04] !log trying to manually assemble missing raid on labstore1002 with mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose
[12:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:06] !log also disabled puppet on labstore1002 while investigating
[12:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:06] !log lvchange -ay labstore/tools on labstore1002
[12:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:11] !log start-nfs on labstore1002
[12:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:11] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exports is active
[12:58:22] !log lvchange -ay labstore/others on labstore1002
[12:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:25:03] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587760 (fgiunchedi) assuming the missing drives are on purpose, activating the lv worked with `lvchange -ay labstore/others` and `lvchange -ay labstore/tools` (and nuking some 'others' snapshots so...
[13:29:35] (PS2) ArielGlenn: Add link to developer app guidelines from dumps pages footer [puppet] - https://gerrit.wikimedia.org/r/234685 (https://phabricator.wikimedia.org/T110742) (owner: Alex Monk)
[13:30:17] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587764 (Aklapper) p:Triage>Unbreak!
[13:30:35] (CR) ArielGlenn: [C: 2] Add link to developer app guidelines from dumps pages footer [puppet] - https://gerrit.wikimedia.org/r/234685 (https://phabricator.wikimedia.org/T110742) (owner: Alex Monk)
[15:30:45] PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1424 bytes in 0.145 second response time
[15:33:00] operations, Datasets-General-or-Unknown, Patch-For-Review: Add App Guidelines on Dumps Page - https://phabricator.wikimedia.org/T110742#1587830 (Krenair) >>! In T110742#1585863, @Krenair wrote: > perhaps something at https://dumps.wikimedia.org/legal.html too? Main suggestion is done, but this is not...
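(For readers following the labstore1002 thread, a condensed sketch of the post-reboot recovery sequence logged above; the array path, UUID and the start-nfs helper are taken from the !log entries themselves, while the surrounding inspection commands are generic additions:

    # see which md arrays actually came up after the reboot
    cat /proc/mdstat
    mdadm --detail --scan

    # keep puppet from fighting the manual work while investigating
    puppet agent --disable "labstore1002 raid investigation"

    # re-assemble the array that stayed down, as logged above
    mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose

    # activate the LVM volumes on top of it, then restart NFS service
    lvchange -ay labstore/tools
    lvchange -ay labstore/others
    start-nfs    # WMF-local wrapper referenced in the !log entries above
)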
[15:54:46] RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1430 bytes in 0.221 second response time
[16:19:13] operations, Services, Traffic: Define a standardized config mechanism for exposing services through varnish - https://phabricator.wikimedia.org/T110717#1587874 (mobrovac) /me likes this direction. From the looks of it, ideally we should be able to integrate something like that *at least* in the `servic...
[16:34:48] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587886 (jeremyb-phone)
[18:20:01] Hallo.
[18:20:16] Is 1.26wmf19 supposed to be still deployed anywhere?
[18:22:04] Hmm.
[18:22:09] I can't reproduce this any longer,
[18:22:26] but a few hours ago, I was testing something with ContentTranslation in production in the French Wikipedia,
[18:23:13] and in the Chrome JS debugger I noticed that some JS files are loaded from static/1.26wmf19
[18:23:28] now it's all static/1.26wmf20
[18:26:56] operations, Phabricator, Database: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections. - https://phabricator.wikimedia.org/T109964#1587940 (Aklapper) duplicate>Open
[18:27:28] (PS1) Ori.livneh: dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855
[18:27:47] (PS2) Ori.livneh: harden ssh-agent-proxy [puppet] - https://gerrit.wikimedia.org/r/234700
[18:27:56] (CR) Ori.livneh: [C: 2 V: 2] harden ssh-agent-proxy [puppet] - https://gerrit.wikimedia.org/r/234700 (owner: Ori.livneh)
[18:28:20] (PS2) Ori.livneh: dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855
[18:28:29] (CR) Ori.livneh: [C: 2 V: 2] dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855 (owner: Ori.livneh)
[18:32:09] operations, Phabricator, Database: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections. - https://phabricator.wikimedia.org/T109964#1587941 (JohnLewis)
[18:32:14] operations, Phabricator, Database, Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1587942 (JohnLewis)
[18:32:55] RECOVERY - Keyholder SSH agent on tin is OK: OK: Keyholder is armed with all configured keys.
[19:10:56] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
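(On the 1.26wmf19-vs-wmf20 question above, one quick way to confirm which MediaWiki branch a wiki is currently serving is the siteinfo API; a sketch, with jq used only for readability:

    # report the MediaWiki version fr.wikipedia.org is serving right now
    curl -s 'https://fr.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=json' \
      | jq -r '.query.general.generator'
    # prints something like: MediaWiki 1.26wmf20

Stale static/1.26wmfNN references seen in the browser may also simply come from cached page HTML rather than the wiki's live version.)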
[19:15:36] (PS1) Mjbmr: Add new user groups for azbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755)
[21:14:39] operations, Multimedia, Wikimedia-General-or-Unknown: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex) NEW
[21:16:48] operations, Multimedia, Wikimedia-General-or-Unknown: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex)
[21:22:29] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588051 (555)
[21:30:47] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588059 (ori) [[ https://blueprints.launchpad.net/~anton+/+archive/ubuntu/photo-video-apps/+sourcepub/4833728/+listing-archive-extra | Doable ]], but not...
[21:32:41] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588060 (matmarex) No idea, but it seems to affect specific files only (never seen it before this bug report). I submitted a patch to T110821 that will a...
[21:44:08] (PS1) Nemo bis: [Italian Planet] Update Wikimedia Italia feeds [puppet] - https://gerrit.wikimedia.org/r/234921
[21:49:18] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex)
[22:05:39] operations, MediaWiki-extensions-PdfHandler, Multimedia: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007#1588117 (matmarex) All of the linked files except for the last one thumbnail correctly for me. We're...
[22:05:45] operations, MediaWiki-extensions-PdfHandler, Multimedia: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007#1588120 (matmarex) Open>Resolved
[22:11:22] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588126 (matmarex)
[22:30:36] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[22:56:26] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[22:59:51] (PS2) Alex Monk: Fix wikitech beacon 204 [puppet] - https://gerrit.wikimedia.org/r/234703 (https://phabricator.wikimedia.org/T104359)
[23:31:26] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 59 connecting: (unnamed) not-conn: cp4007_v6
[23:33:26] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 60 ESP OK
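(Relating to the Ghostscript thumbnailing thread above, a local reproduction sketch for a problem PDF; the file name is a placeholder and the rasterization settings are only an approximation of what the thumbnailer would request, not the exact production command:

    # confirm which Ghostscript version is in play
    gs --version

    # render page 1 of the suspect PDF to a PNG, to see whether a given
    # Ghostscript version can decode it at all
    gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=png16m -r150 \
       -dFirstPage=1 -dLastPage=1 -sOutputFile=page1.png suspect.pdf

Running the same command under 9.07+ versus an older build is a quick way to check whether an upgrade would actually fix the "no decode delegate" class of failures for a specific file.)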