[00:31:16] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [24.0]
[00:33:16] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[01:07:56] PROBLEM - HHVM rendering on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:08:46] PROBLEM - Apache HTTP on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:09:55] PROBLEM - DPKG on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:26] PROBLEM - SSH on mw1065 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:10:26] PROBLEM - HHVM processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:37] PROBLEM - configured eth on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:46] PROBLEM - nutcracker process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:46] PROBLEM - puppet last run on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:10:56] PROBLEM - nutcracker port on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:05] PROBLEM - dhclient process on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:15] PROBLEM - Disk space on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:28] PROBLEM - RAID on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:11:36] PROBLEM - salt-minion processes on mw1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:35:55] RECOVERY - SSH on mw1065 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[01:35:56] RECOVERY - HHVM processes on mw1065 is OK: PROCS OK: 6 processes with command name hhvm
[01:36:07] RECOVERY - configured eth on mw1065 is OK: OK - interfaces up
[01:36:16] RECOVERY - nutcracker process on mw1065 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[01:36:16] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 56 minutes ago with 0 failures
[01:36:26] RECOVERY - nutcracker port on mw1065 is OK: TCP OK - 0.000 second response time on port 11212
[01:36:26] RECOVERY - dhclient process on mw1065 is OK: PROCS OK: 0 processes with command name dhclient
[01:36:46] RECOVERY - Disk space on mw1065 is OK: DISK OK
[01:36:56] RECOVERY - RAID on mw1065 is OK: OK: no RAID installed
[01:37:05] RECOVERY - salt-minion processes on mw1065 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[01:37:26] RECOVERY - DPKG on mw1065 is OK: All packages OK
[01:52:35] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[01:54:27] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10848 bytes in 0.114 second response time
[02:03:55] PROBLEM - IPsec on cp1046 is CRITICAL: Strongswan CRITICAL - ok: 23 not-conn: cp3018_v6
[02:05:36] PROBLEM - Disk space on labstore1002 is CRITICAL: DISK CRITICAL - /run/lock/storage-replicate-labstore-tools/snapshot is not accessible: Permission denied
[02:09:45] RECOVERY - IPsec on cp1046 is OK: Strongswan OK - 24 ESP OK
[02:14:25] PROBLEM - IPsec on cp1047 is CRITICAL: Strongswan CRITICAL - ok: 22 connecting: (unnamed) not-conn: cp2003_v6, cp3018_v6
[02:16:16] RECOVERY - IPsec on cp1047 is OK: Strongswan OK - 24 ESP OK
[02:20:09] !log l10nupdate@tin Synchronized php-1.26wmf20/cache/l10n: l10nupdate for 1.26wmf20 (duration: 05m 36s)
[02:20:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:23:07] !log l10nupdate@tin LocalisationUpdate completed (1.26wmf20) at 2015-08-30 02:23:07+00:00
[02:23:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:43:06] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 58 not-conn: cp3036_v6, cp3045_v6
[02:46:56] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 60 ESP OK
[03:23:05] PROBLEM - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:24:55] RECOVERY - LVS HTTPS IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 10848 bytes in 0.149 second response time
[03:35:26] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:48:27] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[03:50:17] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 505 bytes in 0.012 second response time
[03:50:17] getting tired of that stupid mobile ipv6 alert paging :P
[03:51:07] yeah, why does it keep being that one?
[03:51:11] does it work differently to the others?
[03:52:25] I don't think so. the first several times it happened, weeks ago, I dug around a bit, but everything seemed to point to it being a false alarm of some kind
[03:52:41] I'm up anyways, I may as well try to dig more
[03:53:57] it intermittently fails for other eqiad ipv6 too, e.g.
[03:53:57] [1440906495] SERVICE ALERT: upload-lb.eqiad.wikimedia.org_ipv6;LVS HTTP IPv6;CRITICAL;SOFT;1;Connection timed out
[03:54:01] [1440906605] SERVICE ALERT: upload-lb.eqiad.wikimedia.org_ipv6;LVS HTTP IPv6;OK;SOFT;2;HTTP OK: HTTP/1.1 301 TLS Redirect - 455 bytes in 0.001 second response time
[03:54:23] but mobile happens more frequently than the others, and thus more frequently makes it to 3/3 random fails in a row and hits irc / paging
[03:59:09] my leading random theories are:
[03:59:55] 1) Something's actually wrong with the ipv6 addresses on eth0 of the cache machines sporadically (but seems unlikely, or we'd see other complaints or a pattern of v6 external monitor failures?)
[04:00:08] 2) Something's wrong with ipv6 on neon (much more likely)
[04:00:23] 3) Something's wrong with ipv6 somewhere else in eqiad's network (I have no idea what)
[04:02:57] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[04:03:01] it's also really strange that we have these issues with the neon->cache monitoring of v6 within eqiad, yet we don't see these patterns for the same to the remote caches' v6 in ulsfo and esams
[04:04:09] you'd think most things that could go wrong for this inside eqiad would show up even worse with the added remote latency
[04:08:36] probably the most-obvious thing wrong with neon (icinga host) is that it does frequently spike to 100% cpu utilization for brief windows
[04:09:23] the graphs in ganglia show it steadily at 80% utilization pretty much, but e.g. vmstat on 1s intervals shows that in the small view, it's often 0% idle.
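(A minimal sketch of how the flapping LVS IPv6 check might be reproduced by hand from the monitoring host, assuming the stock monitoring-plugins check_http lives under /usr/lib/nagios/plugins; the host, timeout and interval here are illustrative only:

    # run the same probe Icinga runs, forced over IPv6, in a loop
    while true; do
        /usr/lib/nagios/plugins/check_http -6 -H mobile-lb.eqiad.wikimedia.org -t 10
        sleep 5
    done

    # in a second terminal, watch for the brief 0%-idle windows that
    # averaged ganglia graphs smooth away
    vmstat 1

If the manual probe only times out during the 0%-idle spikes, that would point at local CPU starvation on neon rather than at the caches' v6 addresses.)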
[04:11:24] check_sslxNN is of course the leading source of tying up CPU
[04:11:46] (a lot of work already went into making that check more efficient than it might otherwise have been, though)
[04:16:17] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Aug 30 04:16:17 UTC 2015 (duration 16m 16s)
[04:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:16:30] check_graphite and check_elasticsearch are pretty heavy too
[04:16:53] I tried commenting out most of the domains check_sslxNN checks, and that dropped it down the list a bunch, but the CPUs still peg fairly frequently even with that turned down
[04:17:47] the icinga process itself is single-threaded and also routinely locks up a full CPU core, so that can't be good either.
[04:18:41] all of this tends to make me think that (for whatever unfathomable but logical reason) the local v6 checks are just the most-sensitive victims of all that mess locally on neon
[04:19:45] PROBLEM - Disk space on cp3022 is CRITICAL: DISK CRITICAL - free space: / 349 MB (3% inode=89%)
[04:22:10] ^ fixed
[04:22:13] spam of Aug 30 04:22:01 cp3022 varnishkafka[8527]: %3|1440908521.492|GETADDR|varnishkafka#producer-0| analytics1022.eqiad.wmnet:9092/22: Failed to resolve 'analytics1022.eqiad.wmnet:9092': Name or service not known
[04:22:17] in syslog
[04:22:50] probably because those are former bits caches that no longer get any puppetized cache config, so are missing newer updates
[04:23:01] I should probably go shut off all the related daemons on them for now, until they get reused
[04:23:26] RECOVERY - Disk space on cp3022 is OK: DISK OK
[04:28:57] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[04:30:46] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 TLS Redirect - 505 bytes in 0.007 second response time
[05:09:08] bblack: I'm trying to figure out how to test the InstantCommons patches, but I'm not familiar with SSL certs
[05:09:20] how do I produce a non-self-signed cert?
[05:16:15] by sending lots of money to the extortionists that sign certs :)
[05:16:51] can you just test against live production, like instantcommons users would be doing anyways?
[05:18:09] I want to test against an invalid cert which is not self-signed (as those seem to be special-cased in PHP)
[05:18:29] what makes that cert "invalid" in this test case?
[05:18:31] seb25 tested pretty rigorously that the patches work with Commons
[05:18:51] (hostname mismatch with SAN/Subject?)
[05:18:52] what's missing I think is to make sure they don't silently work with invalid certs
[05:19:08] at least not in any case where that did not happen before
[05:19:14] well, again, I don't know how to interpret that question without knowing in what sense "invalid" is meant
[05:19:25] signed by a non-trusted root CA
[05:19:32] ah!
[05:19:46] so you want it to be signed by *a* CA, just not one that's in the trusted CA store
[05:19:54] it would be nice to test the SAN case as well, but I would need a valid cert for that...
[05:19:57] yes
[05:21:46] basically it just requires a series of openssl invocations, but google results give the best answers:
[05:21:49] https://www.phildev.net/ssl/creating_ca.html
[05:22:07] or: http://pages.cs.wisc.edu/~zmiller/ca-howto/
[05:23:53] thanks!
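(A minimal sketch of the openssl sequence being discussed: create a throwaway CA, then sign a leaf certificate with it, so the result is CA-signed but still untrusted because the CA is not in the system store. File names and subjects are placeholders:

    # 1) throwaway CA key and self-signed CA certificate (never added to the trust store)
    openssl genrsa -out testca.key 2048
    openssl req -x509 -new -key testca.key -subj "/CN=Untrusted Test CA" -days 30 -out testca.crt

    # 2) leaf key and CSR; pick the CN to match, or deliberately mismatch, the host under test
    openssl genrsa -out leaf.key 2048
    openssl req -new -key leaf.key -subj "/CN=commons.example.test" -out leaf.csr

    # 3) sign the leaf with the throwaway CA
    openssl x509 -req -in leaf.csr -CA testca.crt -CAkey testca.key -CAcreateserial -days 30 -out leaf.crt

Serving leaf.crt/leaf.key from a local web server then produces a chain that fails validation for the intended reason, an unknown CA, rather than hitting the self-signed special case.)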
[05:24:13] I tried googling for it but only found instructions for self-signing
[05:24:24] probably did not know the right keyword
[05:25:42] operations: icinga (neon) is out of CPU headroom - https://phabricator.wikimedia.org/T110822#1587280 (BBlack) NEW
[06:25:46] RECOVERY - Disk space on cp3020 is OK: DISK OK
[06:31:46] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:25] PROBLEM - puppet last run on db1015 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:26] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:45] PROBLEM - puppet last run on mw2023 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:55] RECOVERY - puppet last run on db1015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:56] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] RECOVERY - puppet last run on mw2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:16] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:15:16] RECOVERY - Disk space on labstore1002 is OK: DISK OK
[07:26:47] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 569112 msg: ocg_render_job_queue 3228 msg (=3000 critical)
[07:26:56] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 569698 msg: ocg_render_job_queue 3650 msg (=3000 critical)
[07:27:07] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 569837 msg: ocg_render_job_queue 3725 msg (=3000 critical)
[09:37:07] PROBLEM - High load average on labstore1002 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [24.0]
[10:07:33] !log rebooted labstore1002
[10:07:34] PROBLEM - Host labstore1002 is DOWN: PING CRITICAL - Packet loss = 100%
[10:07:34] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 10.34% of data above the critical threshold [100000000.0]
[10:07:34] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[10:07:34] RECOVERY - Host labstore1002 is UP: PING OK - Packet loss = 0%, RTA = 3.47 ms
[10:07:34] !log run start-nfs in labstore1002
[10:10:07] PROBLEM - RAID on labstore1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[10:11:25] RECOVERY - High load average on labstore1002 is OK: OK: Less than 50.00% above the threshold [16.0]
[10:13:46] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 18.52% of data above the critical threshold [100000000.0]
[10:29:15] RECOVERY - OCG health on ocg1002 is OK: OK: ocg_job_status 641129 msg: ocg_render_job_queue 41 msg
[10:29:25] RECOVERY - OCG health on ocg1001 is OK: OK: ocg_job_status 641152 msg: ocg_render_job_queue 0 msg
[10:29:26] RECOVERY - OCG health on ocg1003 is OK: OK: ocg_job_status 641154 msg: ocg_render_job_queue 0 msg
[10:34:10] !log disabled backups on labstore1002 to prevent overwriting of good backups on 2001
[10:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:47:15] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[12:53:54] !log shut down labstore1002, going to powercycle from mgmt
[12:53:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:53:54] PROBLEM - Last backup of the maps filesystem on labstore1002 is CRITICAL: Timeout while attempting connection
[12:53:54] PROBLEM - Host labstore1002 is DOWN: PING CRITICAL - Packet loss = 100%
[12:53:55] !log powered labstore1002 back up
[12:53:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:53:59] RECOVERY - Host labstore1002 is UP: PING WARNING - Packet loss = 73%, RTA = 1.58 ms
[12:53:59] RECOVERY - RAID on labstore1002 is OK: OK: optimal, 72 logical, 72 physical
[12:54:01] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit is inactive
[12:54:01] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1002 is CRITICAL: CRITICAL - Expecting active but unit is inactive
[12:54:04] !log trying to manually assemble missing raid on labstore1002 with mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose
[12:54:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:06] !log also disabled puppet on labstore1002 while investigating
[12:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:06] !log lvchange -ay labstore/tools on labstore1002
[12:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:11] !log start-nfs on labstore1002
[12:54:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:54:11] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1002 is OK: OK - nfs-exports is active
[12:58:22] !log lvchange -ay labstore/others on labstore1002
[12:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:25:03] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587760 (fgiunchedi) assuming the missing drives are on purpose, activating the lv worked with `lvchange -ay labstore/others` and `lvchange -ay labstore/tools` (and nuking some 'others' snapshots so...
[13:29:35] (PS2) ArielGlenn: Add link to developer app guidelines from dumps pages footer [puppet] - https://gerrit.wikimedia.org/r/234685 (https://phabricator.wikimedia.org/T110742) (owner: Alex Monk)
[13:30:17] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587764 (Aklapper) p:Triage>Unbreak!
[13:30:35] (CR) ArielGlenn: [C: 2] Add link to developer app guidelines from dumps pages footer [puppet] - https://gerrit.wikimedia.org/r/234685 (https://phabricator.wikimedia.org/T110742) (owner: Alex Monk)
[15:30:45] PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1424 bytes in 0.145 second response time
[15:33:00] operations, Datasets-General-or-Unknown, Patch-For-Review: Add App Guidelines on Dumps Page - https://phabricator.wikimedia.org/T110742#1587830 (Krenair) >>! In T110742#1585863, @Krenair wrote: > perhaps something at https://dumps.wikimedia.org/legal.html too? Main suggestion is done, but this is not...
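(For readers following the labstore1002 thread, a condensed sketch of the post-reboot recovery sequence logged above; the array path, UUID and the start-nfs helper are taken from the !log entries themselves, while the surrounding inspection commands are generic additions:

    # see which md arrays actually came up after the reboot
    cat /proc/mdstat
    mdadm --detail --scan

    # keep puppet from fighting the manual work while investigating
    puppet agent --disable "labstore1002 raid investigation"

    # re-assemble the array that stayed down, as logged above
    mdadm --assemble /dev/md/slice51 --uuid 0747643d:b89b36ff:57156095:c33694fc --verbose

    # activate the LVM volumes on top of it, then restart NFS service
    lvchange -ay labstore/tools
    lvchange -ay labstore/others
    start-nfs    # WMF-local wrapper referenced in the !log entries above
)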
[15:54:46] RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1430 bytes in 0.221 second response time
[16:19:13] operations, Services, Traffic: Define a standardized config mechanism for exposing services through varnish - https://phabricator.wikimedia.org/T110717#1587874 (mobrovac) /me likes this direction. From the looks of it, ideally we should be able to integrate something like that *at least* in the `servic...
[16:34:48] operations, Labs: labstore1002 not mounting all LVs after reboot - https://phabricator.wikimedia.org/T110832#1587886 (jeremyb-phone)
[18:20:01] Hallo.
[18:20:16] Is 1.26wmf19 supposed to be still deployed anywhere?
[18:22:04] Hmm.
[18:22:09] I can't reproduce this any longer,
[18:22:26] but a few hours ago, I was testing something with ContentTranslation in production in the French Wikipedia,
[18:23:13] and in the Chrome JS debugger I noticed that some JS files are loaded from static/1.26wmf19
[18:23:28] now it's all static/1.26wmf20
[18:26:56] operations, Phabricator, Database: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections. - https://phabricator.wikimedia.org/T109964#1587940 (Aklapper) duplicate>Open
[18:27:28] (PS1) Ori.livneh: dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855
[18:27:47] (PS2) Ori.livneh: harden ssh-agent-proxy [puppet] - https://gerrit.wikimedia.org/r/234700
[18:27:56] (CR) Ori.livneh: [C: 2 V: 2] harden ssh-agent-proxy [puppet] - https://gerrit.wikimedia.org/r/234700 (owner: Ori.livneh)
[18:28:20] (PS2) Ori.livneh: dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855
[18:28:29] (CR) Ori.livneh: [C: 2 V: 2] dotfiles: add script for guesstimating gzipped log file's line count [puppet] - https://gerrit.wikimedia.org/r/234855 (owner: Ori.livneh)
[18:32:09] operations, Phabricator, Database: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections. - https://phabricator.wikimedia.org/T109964#1587941 (JohnLewis)
[18:32:14] operations, Phabricator, Database, Patch-For-Review: Phabricator creates MySQL connection spikes - https://phabricator.wikimedia.org/T109279#1587942 (JohnLewis)
[18:32:55] RECOVERY - Keyholder SSH agent on tin is OK: OK: Keyholder is armed with all configured keys.
[19:10:56] PROBLEM - Keyholder SSH agent on tin is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
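(On the 1.26wmf19-vs-wmf20 question above, one quick way to confirm which MediaWiki branch a wiki is currently serving is the siteinfo API; a sketch, with jq used only for readability:

    # report the MediaWiki version fr.wikipedia.org is serving right now
    curl -s 'https://fr.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=json' \
      | jq -r '.query.general.generator'
    # prints something like: MediaWiki 1.26wmf20

Stale static/1.26wmfNN references seen in the browser may also simply come from cached page HTML rather than the wiki's live version.)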
[19:15:36] (PS1) Mjbmr: Add new user groups for azbwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/234910 (https://phabricator.wikimedia.org/T109755)
[21:14:39] operations, Multimedia, Wikimedia-General-or-Unknown: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex) NEW
[21:16:48] operations, Multimedia, Wikimedia-General-or-Unknown: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex)
[21:22:29] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588051 (555)
[21:30:47] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588059 (ori) [[ https://blueprints.launchpad.net/~anton+/+archive/ubuntu/photo-video-apps/+sourcepub/4833728/+listing-archive-extra | Doable ]], but not...
[21:32:41] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588060 (matmarex) No idea, but it seems to affect specific files only (never seen it before this bug report). I submitted a patch to T110821 that will a...
[21:44:08] (PS1) Nemo bis: [Italian Planet] Update Wikimedia Italia feeds [puppet] - https://gerrit.wikimedia.org/r/234921
[21:49:18] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588033 (matmarex)
[22:05:39] operations, MediaWiki-extensions-PdfHandler, Multimedia: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007#1588117 (matmarex) All of the linked files except for the last one thumbnail correctly for me. We're...
[22:05:45] operations, MediaWiki-extensions-PdfHandler, Multimedia: Error creating PDF on Commons: "convert: no decode delegate for this image format" (fixed in GS 9.07) - https://phabricator.wikimedia.org/T50007#1588120 (matmarex) Open>Resolved
[22:11:22] operations, Multimedia, Wikimedia-General-or-Unknown, Wikisource: Upgrade Ghostscript to 9.15 or later - https://phabricator.wikimedia.org/T110849#1588126 (matmarex)
[22:30:36] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[22:56:26] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[22:59:51] (PS2) Alex Monk: Fix wikitech beacon 204 [puppet] - https://gerrit.wikimedia.org/r/234703 (https://phabricator.wikimedia.org/T104359)
[23:31:26] PROBLEM - IPsec on cp1073 is CRITICAL: Strongswan CRITICAL - ok: 59 connecting: (unnamed) not-conn: cp4007_v6
[23:33:26] RECOVERY - IPsec on cp1073 is OK: Strongswan OK - 60 ESP OK
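(Relating to the Ghostscript thumbnailing thread above, a local reproduction sketch for a problem PDF; the file name is a placeholder and the rasterization settings are only an approximation of what the thumbnailer would request, not the exact production command:

    # confirm which Ghostscript version is in play
    gs --version

    # render page 1 of the suspect PDF to a PNG, to see whether a given
    # Ghostscript version can decode it at all
    gs -dNOPAUSE -dBATCH -dSAFER -sDEVICE=png16m -r150 \
       -dFirstPage=1 -dLastPage=1 -sOutputFile=page1.png suspect.pdf

Running the same command under 9.07+ versus an older build is a quick way to check whether an upgrade would actually fix the "no decode delegate" class of failures for a specific file.)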