[01:06:55] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 35 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[01:11:56] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[01:12:36] PROBLEM - HHVM rendering on mw2244 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:13:26] RECOVERY - HHVM rendering on mw2244 is OK: HTTP OK: HTTP/1.1 200 OK - 72939 bytes in 0.275 second response time
[03:26:26] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 725.31 seconds
[03:32:55] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIPCity.dat.gz]
[03:33:45] PROBLEM - puppet last run on mw2135 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[03:55:45] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 137.19 seconds
[04:00:26] RECOVERY - puppet last run on mw1208 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[04:01:16] RECOVERY - puppet last run on mw2135 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[05:19:25] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 21 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[05:29:26] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 9 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[05:36:35] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 50 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[06:01:35] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 4 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[06:58:36] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 56 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:08:36] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 3 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:14:16] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:21:05] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:44:25] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[07:55:21] (PS1) Odder: Add new logo for the Baskhir Wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372777 (https://phabricator.wikimedia.org/T173471)
[08:35:55] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 23 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:40:55] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 16 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[08:41:05] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK
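
The flapping "IPv4 ping to codfw" alerts above come from RIPE Atlas measurement 1791210: the check counts how many of the 278 assigned probes got no ping reply in the latest round and goes CRITICAL once more than 19 fail. A minimal sketch of that logic in Python, assuming the public RIPE Atlas latest-results API and its "rcvd" (replies received) field; this is an illustration, not the production check_ripe_atlas code:

    # Sketch: count failed probes in a RIPE Atlas ping measurement.
    # Endpoint and the "rcvd" field name are assumptions, not taken
    # from the actual Wikimedia check.
    import requests

    MEASUREMENT_ID = 1791210   # the "IPv4 ping to codfw" measurement above
    ALERT_THRESHOLD = 19       # matches "(alerts on 19)"

    def failed_probe_count(measurement_id):
        # Latest result per probe; a probe counts as failed when it
        # received zero ping replies.
        url = "https://atlas.ripe.net/api/v2/measurements/%d/latest/" % measurement_id
        results = requests.get(url, timeout=30).json()
        failed = sum(1 for r in results if r.get("rcvd", 0) == 0)
        return failed, len(results)

    failed, total = failed_probe_count(MEASUREMENT_ID)
    status = "CRITICAL" if failed > ALERT_THRESHOLD else "OK"
    print("%s - failed %d probes of %d (alerts on %d)" % (status, failed, total, ALERT_THRESHOLD))
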
[08:57:55] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 51 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[09:31:17] Operations, Wikimedia-Site-requests, Regression, User-Ladsgroup, and 2 others: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3535957 (MarcoAurelio) Open>Resolved a: Ladsgroup Closed as everything seems back to normal. Thank you for your help.
[10:28:06] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 14 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[10:35:15] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 33 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[10:50:15] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 6 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[11:12:15] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 45 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[11:37:15] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 3 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[11:56:40] (PS1) Gerrit Patch Uploader: Set X-Frame-Options: SAMEORIGIN if UploadWizard enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/372789 (https://phabricator.wikimedia.org/T173631)
[11:56:42] (CR) Gerrit Patch Uploader: "This commit was uploaded using the Gerrit Patch Uploader [1]." [mediawiki-config] - https://gerrit.wikimedia.org/r/372789 (https://phabricator.wikimedia.org/T173631) (owner: Gerrit Patch Uploader)
[12:14:25] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 40 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[12:31:46] PROBLEM - Check size of conntrack table on ms-fe1005 is CRITICAL: CRITICAL: nf_conntrack is 100 % full
[12:32:47] RECOVERY - Check size of conntrack table on ms-fe1005 is OK: OK: nf_conntrack is 75 % full
[12:34:25] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 14 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[13:00:14] (PS2) Urbanecm: Reopen bawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372212 (https://phabricator.wikimedia.org/T173471)
[13:01:52] (PS3) Urbanecm: Reopen bawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372212 (https://phabricator.wikimedia.org/T173471)
[13:02:38] (PS2) Urbanecm: Add new logo for the Baskhir Wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372777 (https://phabricator.wikimedia.org/T173471) (owner: Odder)
[13:03:32] (CR) Urbanecm: [C: 1] "Just removed the logo part from my patch (372212) and added correct line to this one. Note that they do not depend on each other as the wi" [mediawiki-config] - https://gerrit.wikimedia.org/r/372777 (https://phabricator.wikimedia.org/T173471) (owner: Odder)
[13:06:09] (CR) Urbanecm: [C: 1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/372789 (https://phabricator.wikimedia.org/T173631) (owner: Gerrit Patch Uploader)
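
The "Check size of conntrack table" alert on ms-fe1005 above is a fullness check on the kernel connection-tracking table: usage is current entries divided by the table maximum. A rough sketch of that calculation, reading the standard /proc counters; the warning and critical thresholds here are assumed for illustration, not the production values:

    # Sketch of the nf_conntrack fullness calculation behind the
    # ms-fe1005 alert. Thresholds are assumed, not the real check's.
    def conntrack_usage_percent():
        # Current entries vs. table size from the kernel counters.
        with open("/proc/sys/net/netfilter/nf_conntrack_count") as f:
            count = int(f.read())
        with open("/proc/sys/net/netfilter/nf_conntrack_max") as f:
            maximum = int(f.read())
        return 100.0 * count / maximum

    pct = conntrack_usage_percent()
    if pct >= 90:      # assumed critical threshold
        print("CRITICAL: nf_conntrack is %.0f %% full" % pct)
    elif pct >= 80:    # assumed warning threshold
        print("WARNING: nf_conntrack is %.0f %% full" % pct)
    else:
        print("OK: nf_conntrack is %.0f %% full" % pct)
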
[13:10:46] PROBLEM - puppet last run on labtestservices2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:01] (PS1) Urbanecm: Update logos for srwiktionary, add HD logos for srwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372795 (https://phabricator.wikimedia.org/T172245)
[13:39:15] RECOVERY - puppet last run on labtestservices2003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[13:44:11] (PS1) Urbanecm: Add HD logos for srwikisource, update them too [mediawiki-config] - https://gerrit.wikimedia.org/r/372796 (https://phabricator.wikimedia.org/T172268)
[13:48:28] (PS1) Urbanecm: Update logos for srwikinews, add HD version for them [mediawiki-config] - https://gerrit.wikimedia.org/r/372797 (https://phabricator.wikimedia.org/T172255)
[13:49:06] (PS4) Urbanecm: Reopen bawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372212 (https://phabricator.wikimedia.org/T173471)
[13:49:18] (PS2) Urbanecm: Enable SandboxLink on cywiki [mediawiki-config] - https://gerrit.wikimedia.org/r/372531 (https://phabricator.wikimedia.org/T173054)
[14:18:26] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 23 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[14:18:32] (PS1) Urbanecm: Initial configuration for hifwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372798 (https://phabricator.wikimedia.org/T173643)
[14:20:08] (CR) jerkins-bot: [V: -1] Initial configuration for hifwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372798 (https://phabricator.wikimedia.org/T173643) (owner: Urbanecm)
[14:38:19] (CR) Zoranzoki21: [C: 1] "Looks good to me, but someone else must approve." [mediawiki-config] - https://gerrit.wikimedia.org/r/372795 (https://phabricator.wikimedia.org/T172245) (owner: Urbanecm)
[14:41:24] (PS2) Urbanecm: Initial configuration for hifwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/372798 (https://phabricator.wikimedia.org/T173643)
[14:48:26] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[17:24:06] Operations, DC-Ops, Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3536348 (madhuvishy) @Papaul Aah, sorry, I had pinged you on the task and didn't know about adding to the ops-codfw board, I'll definit...
[17:24:18] Operations, ops-codfw, DC-Ops, Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3536349 (madhuvishy)
[17:25:55] PROBLEM - HHVM rendering on mw1287 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[17:26:55] RECOVERY - HHVM rendering on mw1287 is OK: HTTP OK: HTTP/1.1 200 OK - 73718 bytes in 0.387 second response time
[17:28:25] PROBLEM - Apache HTTP on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[17:28:36] PROBLEM - HHVM rendering on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[17:28:45] PROBLEM - Nginx local proxy to apache on mw1284 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.006 second response time
[17:29:25] RECOVERY - Apache HTTP on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.044 second response time
[17:29:45] RECOVERY - HHVM rendering on mw1284 is OK: HTTP OK: HTTP/1.1 200 OK - 73718 bytes in 0.642 second response time
[17:29:45] RECOVERY - Nginx local proxy to apache on mw1284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.265 second response time
[17:46:16] PROBLEM - Check Varnish expiry mailbox lag on cp1099 is CRITICAL: CRITICAL: expiry mailbox lag is 2005582
[19:18:11] (CR) Ladsgroup: "No, redis_host is the same but gets overwritten" [puppet] - https://gerrit.wikimedia.org/r/369915 (https://phabricator.wikimedia.org/T169246) (owner: Halfak)
[23:26:26] RECOVERY - Check Varnish expiry mailbox lag on cp1099 is OK: OK: expiry mailbox lag is 127898
[23:39:35] PROBLEM - Disk space on logstash1006 is CRITICAL: DISK CRITICAL - /var/lib/elasticsearch is not accessible: Input/output error
[23:39:55] PROBLEM - MD RAID on logstash1006 is CRITICAL: CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0
[23:39:56] ACKNOWLEDGEMENT - MD RAID on logstash1006 is CRITICAL: CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T173679
[23:39:59] Operations, ops-eqiad: Degraded RAID on logstash1006 - https://phabricator.wikimedia.org/T173679#3536569 (ops-monitoring-bot)
[23:57:45] PROBLEM - puppet last run on logstash1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/var/lib/elasticsearch]
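
The "MD RAID on logstash1006" alert above reports a software-RAID array with one failed member (State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0), which the RAID handler then auto-acknowledged and filed as T173679. A rough sketch of how such a summary can be produced by parsing mdadm --detail; the parsing and the device name are assumptions for illustration, not the actual Wikimedia RAID check or handler:

    # Sketch: summarise one md array's health from `mdadm --detail`.
    # Output-format parsing and /dev/md0 are assumptions.
    import re
    import subprocess

    def md_raid_status(device="/dev/md0"):
        out = subprocess.run(["mdadm", "--detail", device],
                             capture_output=True, text=True, check=True).stdout
        # Lines look like "   Active Devices : 5"; build a key/value map.
        fields = dict(re.findall(r"^\s*([A-Za-z ]+?) : (.+)$", out, re.MULTILINE))
        state = fields.get("State", "unknown").strip()
        counts = {k: int(fields.get("%s Devices" % k, 0))
                  for k in ("Active", "Working", "Failed", "Spare")}
        level = "CRITICAL" if "degraded" in state or counts["Failed"] else "OK"
        summary = ", ".join("%s: %d" % (k, v) for k, v in counts.items())
        return "%s: State: %s, %s" % (level, state, summary)

    print(md_raid_status())
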