[00:26:30] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Ayounsi Cas is aware - The acknowledgement expires at: 2019-07-23 00:26:06. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[00:26:48] <XioNoX>	 chaomodus: ^
[00:28:28] <chaomodus>	 roger roger
[00:28:35] <chaomodus>	 thanks!
[00:51:17] <icinga-wm>	 PROBLEM - rpki grafana alert on icinga1001 is CRITICAL: CRITICAL: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is alerting: rsync status alert. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts
[01:00:36] <XioNoX>	 at least the alert is valid
[01:01:59] <cdanis>	 Tks4Fish: that is going to be hard to find.  your best bet is probably 'git bisect' :\
[01:02:05] <XioNoX>	 seems like afrinic rsync server is not happy, but only from codfw
[01:02:48] <Tks4Fish>	 bah, I don't have shell access :/
[01:03:10] <Tks4Fish>	 I'm doing it by hand, going from commit to commit and trying to pinpoint it :/
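Note on the exchange above: without shell access, the commit-by-commit search can still follow the halving strategy that 'git bisect' automates. The sketch below is a minimal illustration, not the tool's implementation. The function name `first_bad_commit`, the commit labels, and the `is_bad` check are all hypothetical, and it assumes the history is ordered oldest to newest, the first commit is good, the last is bad, and every commit after the regression is also bad.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    that once a commit is bad every later commit is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical 16-commit history in which the regression landed in "c11".
history = [f"c{i}" for i in range(16)]
checked = []


def is_bad(commit: str) -> bool:
    # Stand-in for a manual check of one commit; records what was inspected.
    checked.append(commit)
    return int(commit[1:]) >= 11


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after inspecting", len(checked), "commits")
```

Done by hand, each step means inspecting only the midpoint commit of the remaining range, so the number of commits to check grows with the logarithm of the history length rather than linearly.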
[01:17:53] <icinga-wm>	 RECOVERY - rpki grafana alert on icinga1001 is OK: OK: RPKI ( https://grafana.wikimedia.org/d/UwUa77GZk/rpki ) is not alerting. https://wikitech.wikimedia.org/wiki/RPKI%23Grafana_alerts
[02:02:23] <wikibugs>	 (Abandoned) Ayounsi: pmacct: add tags to aggregated netflow based on the source device [puppet] - https://gerrit.wikimedia.org/r/410369 (owner: Ayounsi)
[03:02:16] <wikibugs>	 (PS1) Ayounsi: pmacct, send more netflow data to analytics [puppet] - https://gerrit.wikimedia.org/r/524628
[03:03:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] pmacct, send more netflow data to analytics [puppet] - https://gerrit.wikimedia.org/r/524628 (owner: Ayounsi)
[03:04:42] <wikibugs>	 (CR) Ayounsi: "> Patch Set 1: Verified-1" [puppet] - https://gerrit.wikimedia.org/r/524628 (owner: Ayounsi)
[03:05:56] <wikibugs>	 (CR) Ayounsi: "recheck" [puppet] - https://gerrit.wikimedia.org/r/524628 (owner: Ayounsi)
[03:06:44] <wikibugs>	 (CR) jerkins-bot: [V: -1] pmacct, send more netflow data to analytics [puppet] - https://gerrit.wikimedia.org/r/524628 (owner: Ayounsi)
[03:09:18] <wikibugs>	 (PS2) Ayounsi: pmacct, send more netflow data to analytics [puppet] - https://gerrit.wikimedia.org/r/524628
[03:34:19] <wikibugs>	 Operations, Analytics, Traffic: Fix geoip updaters for new MaxMind hashed keys by 2019-08-15 - https://phabricator.wikimedia.org/T228533 (faidon) Note that they do not say that we will stop getting updates but merely that we won't be able to benefit from this "security feature". It does sound scary o...
[03:42:45] <icinga-wm>	 PROBLEM - puppet last run on db1123 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:10:59] <icinga-wm>	 RECOVERY - puppet last run on db1123 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[04:28:21] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is CRITICAL: 39.12 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:29:05] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is CRITICAL: 31.74 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:29:39] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 36.81 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:30:17] <icinga-wm>	 PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 58.97 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:31:21] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 115.4 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:31:41] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqiad on icinga1001 is OK: (C)60 le (W)70 le 90.92 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:31:59] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: (C)60 le (W)70 le 102 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:32:25] <icinga-wm>	 RECOVERY - Varnish traffic drop between 30min ago and now at ulsfo on icinga1001 is OK: (C)60 le (W)70 le 101.9 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[04:36:43] <BrownUnicorn81>	 Hello, can I get my account back? https://en.wikipedia.org/wiki/User_talk:Benjaminzyg
[05:33:54] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, SRE-Access-Requests, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (Nuria)
[05:34:39] <wikibugs>	 Operations, Analytics, LDAP-Access-Requests, wikimediafoundation.org: Access to WikimediaFoundation.org analytics for Deb - https://phabricator.wikimedia.org/T227496 (Nuria)
[05:51:20] <wikibugs>	 (PS1) ArielGlenn: replace all hiera clls with lookup() for dumps generation manifests [puppet] - https://gerrit.wikimedia.org/r/524632 (https://phabricator.wikimedia.org/T227742)
[05:53:11] <icinga-wm>	 RECOVERY - Maps - OSM synchronization lag - eqiad on icinga1001 is OK: (C)2.592e+05 ge (W)1.764e+05 ge 2.119e+04 https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[06:19:29] <icinga-wm>	 PROBLEM - puppet last run on kraz is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:32:29] <icinga-wm>	 PROBLEM - puppet last run on elastic1047 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:47:45] <icinga-wm>	 RECOVERY - puppet last run on kraz is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[06:54:43] <icinga-wm>	 PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 30225536 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:58:37] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for Azwiki Admins - https://phabricator.wikimedia.org/T228560 (Mardetanha)
[07:00:43] <icinga-wm>	 RECOVERY - puppet last run on elastic1047 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[07:01:17] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for Azwiki Admins - https://phabricator.wikimedia.org/T228560 (Mardetanha) it is duplicate of this [[ https://phabricator.wikimedia.org/T228542 | task  ]]
[07:03:27] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Mailing list for Azwiki Admins - https://phabricator.wikimedia.org/T228560 (Peachey88)
[07:03:31] <wikibugs>	 Operations, Wikimedia-Mailing-lists: New Mailing lists for AzWiki sysops - https://phabricator.wikimedia.org/T228542 (Peachey88)
[07:11:19] <icinga-wm>	 RECOVERY - Postgres Replication Lag on maps1003 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 1047080 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[12:44:37] <icinga-wm>	 PROBLEM - Host cp5004 is DOWN: PING CRITICAL - Packet loss = 100%
[12:44:41] <icinga-wm>	 PROBLEM - Host cp5001 is DOWN: PING CRITICAL - Packet loss = 100%
[12:44:49] <icinga-wm>	 PROBLEM - Host cp5006 is DOWN: PING CRITICAL - Packet loss = 100%
[12:44:59] <icinga-wm>	 PROBLEM - Host cp5010 is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:07] <icinga-wm>	 PROBLEM - Host cp5004.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:19] <icinga-wm>	 PROBLEM - Host cp5003.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:19] <icinga-wm>	 PROBLEM - Host cp5006.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:33] <icinga-wm>	 PROBLEM - Host cp5007.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[12:47:51] <icinga-wm>	 PROBLEM - BFD status on cr1-codfw is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:48:19] <icinga-wm>	 RECOVERY - Host cp5001 is UP: PING WARNING - Packet loss = 93%, RTA = 235.90 ms
[12:48:21] <icinga-wm>	 PROBLEM - BFD status on cr1-eqsin is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:48:23] <icinga-wm>	 RECOVERY - Host cp5004 is UP: PING WARNING - Packet loss = 54%, RTA = 231.28 ms
[12:48:23] <icinga-wm>	 RECOVERY - Host cp5010 is UP: PING WARNING - Packet loss = 66%, RTA = 231.24 ms
[12:48:25] <icinga-wm>	 PROBLEM - OSPF status on cr1-eqsin is CRITICAL: OSPFv2: 3/3 UP : OSPFv3: 2/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:48:29] <icinga-wm>	 RECOVERY - Host cp5006 is UP: PING OK - Packet loss = 0%, RTA = 232.29 ms
[12:49:33] <icinga-wm>	 RECOVERY - BFD status on cr1-codfw is OK: OK: UP: 12 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:50:01] <icinga-wm>	 RECOVERY - BFD status on cr1-eqsin is OK: OK: UP: 4 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[12:50:05] <icinga-wm>	 RECOVERY - OSPF status on cr1-eqsin is OK: OSPFv2: 3/3 UP : OSPFv3: 3/3 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[12:51:15] <icinga-wm>	 RECOVERY - Host cp5004.mgmt is UP: PING OK - Packet loss = 0%, RTA = 232.04 ms
[12:53:03] <icinga-wm>	 RECOVERY - Host cp5003.mgmt is UP: PING OK - Packet loss = 0%, RTA = 231.94 ms
[12:53:03] <icinga-wm>	 RECOVERY - Host cp5006.mgmt is UP: PING OK - Packet loss = 0%, RTA = 231.89 ms
[12:53:17] <icinga-wm>	 RECOVERY - Host cp5007.mgmt is UP: PING OK - Packet loss = 0%, RTA = 231.84 ms
[14:00:23] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - codfw on icinga1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[14:00:45] <icinga-wm>	 PROBLEM - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is CRITICAL: CRITICAL: 20.00% of data under the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[14:04:13] <wikibugs>	 Operations, Wikimedia-Mailing-lists: New Mailing lists for AzWiki sysops - https://phabricator.wikimedia.org/T228542 (Eldarado) * requested name of the mailing list, ending in @lists.wikimedia.org. Wikimedia-AZ@lists.wikimedia.org  * reasoning/explanation of purpose (and link to community consensus, if a...
[14:10:27] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1004 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[14:18:43] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1004 is OK: OK: Less than 70.00% above the threshold [25.0] https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[14:25:17] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - codfw on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[14:27:19] <icinga-wm>	 RECOVERY - Mediawiki Cirrussearch update rate - eqiad on icinga1001 is OK: OK: Less than 1.00% under the threshold [80.0] https://wikitech.wikimedia.org/wiki/Search%23No_updates https://grafana.wikimedia.org/d/JLK3I_siz/elasticsearch-indexing?panelId=44&fullscreen&orgId=1
[16:09:05] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[16:14:03] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[16:26:23] <icinga-wm>	 PROBLEM - HHVM rendering on mw2202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[16:27:53] <icinga-wm>	 RECOVERY - HHVM rendering on mw2202 is OK: HTTP OK: HTTP/1.1 200 OK - 76180 bytes in 0.344 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[19:04:11] <wikibugs>	 (CR) Effie Mouzeli: "> is this the expected behaviour for random host mw1307?" [puppet] - https://gerrit.wikimedia.org/r/524336 (https://phabricator.wikimedia.org/T219148) (owner: Effie Mouzeli)
[19:27:14] <hauskatze>	 hi Niharika - you there?
[19:49:17] <icinga-wm>	 PROBLEM - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 177 bytes in 0.145 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[19:50:47] <arturo>	 :-/
[20:00:31] <icinga-wm>	 RECOVERY - toolschecker: All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 158 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Toolschecker
[20:19:35] <icinga-wm>	 PROBLEM - puppet last run on lvs1016 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[20:47:51] <icinga-wm>	 RECOVERY - puppet last run on lvs1016 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link
[22:56:32] <wikibugs>	 (PS1) QChris: Add .gitreview [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524661
[22:56:34] <wikibugs>	 (CR) QChris: [V: +2 C: +2] Add .gitreview [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524661 (owner: QChris)
[23:56:35] <icinga-wm>	 PROBLEM - HHVM rendering on mw1230 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[23:58:05] <icinga-wm>	 RECOVERY - HHVM rendering on mw1230 is OK: HTTP OK: HTTP/1.1 200 OK - 76200 bytes in 0.776 second response time https://wikitech.wikimedia.org/wiki/Application_servers