[00:47:45] <icinga-wm>	 PROBLEM - puppet last run on mw1268 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:15:45] <icinga-wm>	 RECOVERY - puppet last run on mw1268 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[01:36:21] <icinga-wm>	 PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[02:04:23] <icinga-wm>	 RECOVERY - puppet last run on lvs3004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[03:37:23] <icinga-wm>	 PROBLEM - puppet last run on mw1346 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz],File[/usr/share/GeoIP/GeoIP2-City.mmdb.test] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[04:05:25] <icinga-wm>	 RECOVERY - puppet last run on mw1346 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[04:37:55] <icinga-wm>	 PROBLEM - puppet last run on mc1021 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[05:01:38] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on thumbor1004 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad+prometheus/ops
[05:05:57] <icinga-wm>	 RECOVERY - puppet last run on mc1021 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[05:17:07] <icinga-wm>	 PROBLEM - puppet last run on elastic1023 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[05:45:11] <icinga-wm>	 RECOVERY - puppet last run on elastic1023 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:03:45] <wikibugs>	 Operations, Release-Engineering-Team-TODO, serviceops, Wikimedia-Incident: create a docker_registry_codfw swift container backup - https://phabricator.wikimedia.org/T229118 (greg)
[06:04:21] <wikibugs>	 Operations, Release-Engineering-Team-TODO, serviceops, Wikimedia-Incident: create swift container-to-container synchronization metrics - https://phabricator.wikimedia.org/T229117 (greg)
[06:29:15] <icinga-wm>	 PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/tmpreaper.conf] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:30:25] <icinga-wm>	 PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/etc/ferm/functions.conf],File[/usr/local/sbin/check-and-restart-php] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:30:29] <icinga-wm>	 PROBLEM - puppet last run on mw2285 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:31:11] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/apt2xml] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:37] <icinga-wm>	 PROBLEM - puppet last run on elastic2054 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:32:53] <icinga-wm>	 PROBLEM - puppet last run on mw1307 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:05] <icinga-wm>	 PROBLEM - puppet last run on dbproxy1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/phaste] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:33:21] <icinga-wm>	 PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ferm/ferm.conf] https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:48:21] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:48:29] <icinga-wm>	 PROBLEM - WDQS HTTP Port on wdqs1009 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 649 bytes in 0.002 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[06:50:07] <icinga-wm>	 ACKNOWLEDGEMENT - Check systemd state on wdqs1009 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. Stas Malychev 1009 still reindexing https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:50:07] <icinga-wm>	 ACKNOWLEDGEMENT - WDQS HTTP Port on wdqs1009 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 649 bytes in 0.002 second response time Stas Malychev 1009 still reindexing https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[06:55:27] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:55:43] <icinga-wm>	 RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:57:11] <icinga-wm>	 RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:58:21] <icinga-wm>	 RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:58:25] <icinga-wm>	 RECOVERY - puppet last run on mw2285 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:59:07] <icinga-wm>	 RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:00:35] <icinga-wm>	 RECOVERY - puppet last run on elastic2054 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:00:51] <icinga-wm>	 RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[10:03:07] <icinga-wm>	 PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/summary/{title} (Get summary for test page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:04:39] <icinga-wm>	 RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Mobileapps_%28service%29
[10:09:08] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[10:13:59] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[12:37:15] <icinga-wm>	 PROBLEM - puppet last run on labsdb1012 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[12:59:37] <icinga-wm>	 RECOVERY - puppet last run on labsdb1012 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[13:33:07] <wikibugs>	 Operations, Wikimedia-Mailing-lists: LGBT mailing list moderator password reset - https://phabricator.wikimedia.org/T225787 (Ladsgroup) Resolved→Open Hey, I'm admin there now but I don't have password. Can you reset it again for me?  (In general I can go on for hours on how horrible mailman 2 is...
[13:39:05] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[13:43:53] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[14:01:56] <wikibugs>	 (CR) Ladsgroup: "> Patch Set 5:" [mediawiki-config] - https://gerrit.wikimedia.org/r/430627 (https://phabricator.wikimedia.org/T176553) (owner: Huji)
[14:18:32] <wikibugs>	 Operations, Wikimedia-Mailing-lists: LGBT mailing list moderator password reset - https://phabricator.wikimedia.org/T225787 (Aklapper) >>! In T225787#5371396, @Ladsgroup wrote: > (In general I can go on for hours on how horrible mailman 2 is specially in security and password management, but let's do tha...
[14:41:26] <icinga-wm>	 PROBLEM - High 1m load average on labstore1007 is CRITICAL: cluster=wmcs instance=labstore1007:9100 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[14:42:12] * arturo paged
[14:49:25] <icinga-wm>	 PROBLEM - High 1m load average on labstore1007 is CRITICAL: cluster=wmcs instance=labstore1007:9100 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[14:50:37] <icinga-wm>	 ACKNOWLEDGEMENT - High 1m load average on labstore1007 is CRITICAL: cluster=wmcs instance=labstore1007:9100 job=node site=eqiad Arturo Borrero Gonzalez investigating https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[15:08:43] <icinga-wm>	 RECOVERY - High 1m load average on labstore1007 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Labstore https://grafana.wikimedia.org/dashboard/db/labs-monitoring
[15:13:36] <arturo>	 !log disable 1m load average check in icinga for labstore1007 for 24h
[15:13:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:39:03] <icinga-wm>	 RECOVERY - Check systemd state on netmon1002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:43:51] <icinga-wm>	 PROBLEM - Check systemd state on netmon1002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:51:27] <icinga-wm>	 PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[16:52:59] <icinga-wm>	 RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[17:30:23] <wikibugs>	 Operations, Wikimedia-Mailing-lists: New Mailing lists for AzWiki sysops - https://phabricator.wikimedia.org/T228542 (Eldarado) Is there any update about this task?
[18:41:27] <icinga-wm>	 PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Failed to apply catalog, zero resources tracked by Puppet. Could be an interrupted request or a dependency cycle. https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[18:42:26] <wikibugs>	 Operations, Wikimedia-Mailing-lists: Close the engineering mailing list - https://phabricator.wikimedia.org/T222308 (Aklapper) So... how to get attention again of SRE for this task? :)
[19:01:45] <icinga-wm>	 RECOVERY - WDQS HTTP Port on wdqs1009 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.031 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service
[19:03:27] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1009 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[19:09:29] <icinga-wm>	 RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[20:17:10] <wikibugs>	 Operations, Puppet, Continuous-Integration-Config: puppet.git rake fails with ruby 2.5 - https://phabricator.wikimedia.org/T208566 (hashar) Resolved→Open The Gemfile had Puppet 4.8.2 to match the version provided by Debian Jessie: ` lang=ruby,name=Gemfile gem 'puppet', ENV['PUPPET_GEM_VERSION...