[02:57:14] !log volker-e@deploy1001 Started deploy [design/style-guide@cebc152]: Deploy design/style-guide:
[02:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:57:21] !log volker-e@deploy1001 Finished deploy [design/style-guide@cebc152]: Deploy design/style-guide: (duration: 00m 07s)
[02:57:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:35:07] !log volker-e@deploy1001 Started deploy [design/style-guide@8bec25e]: Deploy design/style-guide:
[04:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:35:14] !log volker-e@deploy1001 Finished deploy [design/style-guide@8bec25e]: Deploy design/style-guide: (duration: 00m 07s)
[04:35:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:43:15] PROBLEM - IPv4 ping to eqsin on ripe-atlas-eqsin is CRITICAL: CRITICAL - failed 141 probes of 591 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[05:35:29] RECOVERY - IPv4 ping to eqsin on ripe-atlas-eqsin is OK: OK - failed 5 probes of 591 (alerts on 35) - https://atlas.ripe.net/measurements/11645085/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts
[06:19:11] PROBLEM - Postgres Replication Lag on puppetdb2002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB puppetdb (host:localhost) 51284592 and 2 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:20:59] RECOVERY - Postgres Replication Lag on puppetdb2002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB puppetdb (host:localhost) 26576 and 0 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[06:29:29] PROBLEM - Check systemd state on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:29:36] PROBLEM - DPKG on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:29:47] PROBLEM - Check whether ferm is active by checking the default input chain on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:29:51] PROBLEM - MD RAID on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:29:53] PROBLEM - Check size of conntrack table on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:29:55] PROBLEM - ores uWSGI web app on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:29:59] PROBLEM - dhclient process on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:09] PROBLEM - Check systemd state on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:30:13] PROBLEM - Check whether ferm is active by checking the default input chain on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:30:13] PROBLEM - DPKG on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:30:19] PROBLEM - configured eth on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:30:21] PROBLEM - Check size of conntrack table on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:30:21] PROBLEM - dhclient process on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:30:23] PROBLEM - Disk space on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1005&var-datasource=eqiad+prometheus/ops
[06:30:23] PROBLEM - ores uWSGI web app on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Services/ores
[06:30:23] PROBLEM - configured eth on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:30:25] PROBLEM - Disk space on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1001&var-datasource=eqiad+prometheus/ops
[06:30:53] PROBLEM - MD RAID on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:31:01] PROBLEM - puppet last run on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:31:41] PROBLEM - puppet last run on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:41:55] PROBLEM - ores_workers_running on ores1001 is CRITICAL: connect to address 10.64.0.51 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ORES
[06:41:55] PROBLEM - ores_workers_running on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/ORES
[06:41:59] PROBLEM - Check the NTP synchronisation status of timesyncd on ores1005 is CRITICAL: connect to address 10.64.32.14 port 5666: Connection refused https://wikitech.wikimedia.org/wiki/NTP
[06:42:49] RECOVERY - Check systemd state on ores1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:42:51] RECOVERY - Check whether ferm is active by checking the default input chain on ores1001 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:42:53] RECOVERY - DPKG on ores1001 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:42:59] RECOVERY - configured eth on ores1001 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:43:01] RECOVERY - dhclient process on ores1001 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:43:03] RECOVERY - Disk space on ores1001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1001&var-datasource=eqiad+prometheus/ops
[06:43:31] RECOVERY - MD RAID on ores1001 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:43:43] RECOVERY - ores_workers_running on ores1001 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:44:21] RECOVERY - Check size of conntrack table on ores1001 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:46:37] RECOVERY - Check size of conntrack table on ores1005 is OK: OK: nf_conntrack is 0 % full https://wikitech.wikimedia.org/wiki/Monitoring/check_conntrack
[06:46:37] RECOVERY - Disk space on ores1005 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=ores1005&var-datasource=eqiad+prometheus/ops
[06:46:39] RECOVERY - configured eth on ores1005 is OK: OK - interfaces up https://wikitech.wikimedia.org/wiki/Monitoring/check_eth
[06:47:21] RECOVERY - ores_workers_running on ores1005 is OK: PROCS OK: 91 processes with command name celery https://wikitech.wikimedia.org/wiki/ORES
[06:47:35] RECOVERY - Check systemd state on ores1005 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:47:41] RECOVERY - DPKG on ores1005 is OK: All packages OK https://wikitech.wikimedia.org/wiki/Monitoring/dpkg
[06:47:53] RECOVERY - Check whether ferm is active by checking the default input chain on ores1005 is OK: OK ferm input default policy is set https://wikitech.wikimedia.org/wiki/Monitoring/check_ferm
[06:47:57] RECOVERY - MD RAID on ores1005 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook%23Hardware_Raid_Information_Gathering
[06:48:03] RECOVERY - dhclient process on ores1005 is OK: PROCS OK: 0 processes with command name dhclient https://wikitech.wikimedia.org/wiki/Monitoring/check_dhclient
[06:48:25] RECOVERY - puppet last run on ores1001 is OK: OK: Puppet is currently enabled, last run 5 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[06:49:07] RECOVERY - puppet last run on ores1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures https://wikitech.wikimedia.org/wiki/Monitoring/puppet_checkpuppetrun
[07:12:47] RECOVERY - Check the NTP synchronisation status of timesyncd on ores1005 is OK: OK: synced at Sun 2020-01-12 07:12:46 UTC. https://wikitech.wikimedia.org/wiki/NTP
[09:24:06] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Scrapes sample page) timed out before a response was received https://wikitech.wikimedia.org/wiki/Citoid
[09:25:51] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Citoid
[14:45:17] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[14:46:39] !log restart php on mw1238
[14:46:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:48:31] !log restart php on mw1240
[14:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:47] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:10:01] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 113464672 and 15 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[16:11:51] RECOVERY - Postgres Replication Lag on maps1002 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 9552 and 101 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[17:55:18] 10Operations, 10Research, 10Traffic: Set up git-driven static microsite for wikiworkshop.org - https://phabricator.wikimedia.org/T242374 (10bmansurov) Status update: I've [[ https://www.mediawiki.org/w/index.php?title=Gerrit/New_repositories/Requests/Entries&diff=prev&oldid=3609873&diffmode=source | requeste...
[22:16:15] PROBLEM - OSPF status on cr2-eqdfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:18:03] RECOVERY - OSPF status on cr2-eqdfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[22:51:16] PROBLEM - Host cp3065 is DOWN: PING CRITICAL - Packet loss = 100%
[22:53:51] PROBLEM - Host cp3061 is DOWN: PING CRITICAL - Packet loss = 100%
[23:19:37] two in a row?
[23:47:49] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[23:53:15] 10Operations, 10Traffic: servers freeze across the caching cluster - https://phabricator.wikimedia.org/T238305 (10Krenair) `Jan 12 22:51:15 PROBLEM - Host cp3065 is DOWN: PING CRITICAL - Packet loss = 100% Jan 12 22:53:51 PROBLEM - Host cp3061 is DOWN: PING CRITICAL - Packet loss = 100%...