[00:01:43] <icinga-wm>	 PROBLEM - snapshot of s4 in codfw on db1115 is CRITICAL: snapshot for s4 at codfw taken more than 4 days ago: Most recent backup 2019-12-03 23:33:06 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[00:02:05] <icinga-wm>	 RECOVERY - Disk space on netflow2001 is OK: DISK OK https://wikitech.wikimedia.org/wiki/Monitoring/Disk_space https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=netflow2001&var-datasource=codfw+prometheus/ops
[01:09:46] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@841693b]: (no justification provided)
[01:09:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:34] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@841693b]: (no justification provided) (duration: 01m 48s)
[01:11:39] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:14:57] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@accbbd1]: (no justification provided)
[01:15:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:16:51] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 53s)
[01:16:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:25] <wikibugs>	 (PS1) Andrew Bogott: Horizon: update apache site to use Horizon's new wsgi.py [puppet] - https://gerrit.wikimedia.org/r/555704 (https://phabricator.wikimedia.org/T239974)
[01:36:56] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@accbbd1]: (no justification provided)
[01:37:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:37:44] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 00m 07s)
[01:37:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:38:00] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@accbbd1]: (no justification provided)
[01:38:04] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:40:36] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 49s)
[01:40:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:43:03] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@accbbd1]: (no justification provided)
[01:43:18] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:44:50] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 47s)
[01:44:55] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:47:34] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@accbbd1]: (no justification provided)
[01:47:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:49:28] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@accbbd1]: (no justification provided) (duration: 01m 55s)
[01:49:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:51:43] <wikibugs>	 (CR) Andrew Bogott: [C: +2] Horizon: update apache site to use Horizon's new wsgi.py [puppet] - https://gerrit.wikimedia.org/r/555704 (https://phabricator.wikimedia.org/T239974) (owner: Andrew Bogott)
[01:54:40] <wikibugs>	 (PS1) Andrew Bogott: horizon: replace apache site venv dir with an .erb lookup [puppet] - https://gerrit.wikimedia.org/r/555705 (https://phabricator.wikimedia.org/T239974)
[01:58:19] <wikibugs>	 (CR) Andrew Bogott: [C: +2] horizon: replace apache site venv dir with an .erb lookup [puppet] - https://gerrit.wikimedia.org/r/555705 (https://phabricator.wikimedia.org/T239974) (owner: Andrew Bogott)
[02:17:55] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@ed2243c]: (no justification provided)
[02:17:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:19:45] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@ed2243c]: (no justification provided) (duration: 01m 50s)
[02:19:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:56:40] <logmsgbot>	 !log andrew@deploy1001 Started deploy [horizon/deploy@ff0a0e7]: (no justification provided)
[02:56:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:58:33] <logmsgbot>	 !log andrew@deploy1001 Finished deploy [horizon/deploy@ff0a0e7]: (no justification provided) (duration: 01m 53s)
[02:58:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:01:54] <wikibugs>	 (PS1) CRusnov: netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183)
[04:02:32] <wikibugs>	 (CR) jerkins-bot: [V: -1] netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: CRusnov)
[04:48:00] <wikibugs>	 (PS2) CRusnov: netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183)
[04:48:36] <wikibugs>	 (CR) jerkins-bot: [V: -1] netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: CRusnov)
[04:54:15] <wikibugs>	 (PS3) CRusnov: netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183)
[04:54:42] <wikibugs>	 (CR) jerkins-bot: [V: -1] netbox: Add automation git machinery [puppet] - https://gerrit.wikimedia.org/r/555715 (https://phabricator.wikimedia.org/T233183) (owner: CRusnov)
[08:40:49] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[08:42:37] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[09:51:21] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for eqiad on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe1005:9112 job=statsd_exporter site=eqiad https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad
[09:52:13] <icinga-wm>	 PROBLEM - mediawiki originals uploads -hourly- for codfw on icinga1001 is CRITICAL: account=mw-media class=originals cluster=swift instance=ms-fe2005:9112 job=statsd_exporter site=codfw https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw
[10:41:31] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=eqiad
[10:42:23] <icinga-wm>	 RECOVERY - mediawiki originals uploads -hourly- for codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Swift/How_To https://grafana.wikimedia.org/dashboard/file/swift?panelId=9&fullscreen&orgId=1&var-DC=codfw
[13:11:53] <wikibugs>	 (PS1) TechneSiyam: Modified InitialiseSettings with 1.5x and 2x logos [mediawiki-config] - https://gerrit.wikimedia.org/r/555723
[13:11:55] <wikibugs>	 (PS1) TechneSiyam: Modified InitialiseSettings with 1.5x and 2x logos [mediawiki-config] - https://gerrit.wikimedia.org/r/555724
[13:41:45] <icinga-wm>	 PROBLEM - Host mw1280 is DOWN: PING CRITICAL - Packet loss = 100%
[14:14:23] <wikibugs>	 (CR) Masumrezarock100: [C: +1] "LGTM. Perhaps @Urbanecm can run a script to cleanup existing redirects like he did for Commons?" [mediawiki-config] - https://gerrit.wikimedia.org/r/555692 (https://phabricator.wikimedia.org/T240050) (owner: IAmNetx)
[14:15:10] <wikibugs>	 (CR) Urbanecm: [C: +1] "> Patch Set 1: Code-Review+1" [mediawiki-config] - https://gerrit.wikimedia.org/r/555692 (https://phabricator.wikimedia.org/T240050) (owner: IAmNetx)
[14:50:19] <wikibugs>	 (CR) Urbanecm: [C: -1] Modified InitialiseSettings with 1.5x and 2x logos (5 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/555723 (owner: TechneSiyam)
[14:50:27] <wikibugs>	 (Abandoned) Urbanecm: Modified InitialiseSettings with 1.5x and 2x logos [mediawiki-config] - https://gerrit.wikimedia.org/r/555724 (owner: TechneSiyam)
[15:54:59] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[15:56:47] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[16:41:34] <wikibugs>	 (PS1) TechneSiyam: Modified InitialiseSettings with 1.5x and 2x logos [mediawiki-config] - https://gerrit.wikimedia.org/r/555730
[17:15:44] <wikibugs>	 (PS1) ArielGlenn: Revert "configure adds-changes dumps to skip locking for now" [puppet] - https://gerrit.wikimedia.org/r/555732
[17:16:01] <wikibugs>	 (PS2) ArielGlenn: Revert "configure adds-changes dumps to skip locking for now" [puppet] - https://gerrit.wikimedia.org/r/555732
[17:17:07] <wikibugs>	 (CR) ArielGlenn: [C: +2] Revert "configure adds-changes dumps to skip locking for now" [puppet] - https://gerrit.wikimedia.org/r/555732 (owner: ArielGlenn)
[18:08:11] <icinga-wm>	 PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[18:08:15] <icinga-wm>	 PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) timed out before a response was received: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) timed out before a response was received https://wikitech.wikimedia.org/wiki/Services/Monitoring/pro
[18:08:23] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1014 is CRITICAL: PYBAL CRITICAL - CRITICAL - uploadlb6_443: Servers cp1088.eqiad.wmnet, cp1084.eqiad.wmnet, cp1080.eqiad.wmnet, cp1076.eqiad.wmnet are marked down but pooled: uploadlb_443: Servers cp1076.eqiad.wmnet, cp1084.eqiad.wmnet, cp1088.eqiad.wmnet, cp1078.eqiad.wmnet, cp1080.eqiad.wmnet, cp1082.eqiad.wmnet, cp1090.eqiad.wmnet are marked down but pooled https://wikitech.wikimedi
[18:08:23] <icinga-wm>	 PROBLEM - PyBal backends health check on lvs1016 is CRITICAL: PYBAL CRITICAL - CRITICAL - uploadlb6_443: Servers cp1076.eqiad.wmnet, cp1078.eqiad.wmnet, cp1080.eqiad.wmnet, cp1082.eqiad.wmnet, cp1090.eqiad.wmnet are marked down but pooled: uploadlb_443: Servers cp1088.eqiad.wmnet, cp1082.eqiad.wmnet, cp1084.eqiad.wmnet, cp1076.eqiad.wmnet, cp1080.eqiad.wmnet are marked down but pooled https://wikitech.wikimedia.org/wiki/PyBal
[18:08:27] <icinga-wm>	 PROBLEM - LVS HTTPS IPv4 #page on upload-lb.eqiad.wikimedia.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:09:18] <godog>	 checking
[18:10:23] <rlazarus>	 following along :)
[18:10:56] * shdubsh here
[18:11:05] <marostegui>	 what's up
[18:11:21] <apergos>	 hello
[18:11:27] <godog>	 following up in _security
[18:11:49] * volans|off here
[18:13:37] <icinga-wm>	 RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[18:13:45] <icinga-wm>	 RECOVERY - LVS HTTPS IPv4 #page on upload-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 859 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems
[18:15:25] <icinga-wm>	 RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[18:15:41] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1014 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[18:15:41] <icinga-wm>	 RECOVERY - PyBal backends health check on lvs1016 is OK: PYBAL OK - All pools are healthy https://wikitech.wikimedia.org/wiki/PyBal
[18:55:21] <icinga-wm>	 PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=atlas_exporter site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[18:57:05] <icinga-wm>	 RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[19:15:21] <icinga-wm>	 PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 0.55 ge 0.5 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[19:17:09] <icinga-wm>	 RECOVERY - Logstash Elasticsearch indexing errors on icinga1001 is OK: (C)0.5 ge (W)0.1 ge 0.07083 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash
[19:25:48] <wikibugs>	 (PS1) BryanDavis: cloud: update maintain-views to handle dblists with comments [puppet] - https://gerrit.wikimedia.org/r/555740 (https://phabricator.wikimedia.org/T239415)
[19:27:34] <wikibugs>	 (CR) jerkins-bot: [V: -1] cloud: update maintain-views to handle dblists with comments [puppet] - https://gerrit.wikimedia.org/r/555740 (https://phabricator.wikimedia.org/T239415) (owner: BryanDavis)
[19:30:27] <wikibugs>	 (PS2) BryanDavis: cloud: update maintain-views to handle dblists with comments [puppet] - https://gerrit.wikimedia.org/r/555740 (https://phabricator.wikimedia.org/T239415)
[19:44:19] <apergos>	 heh bd808 been there and did that same update on my stuff...
[19:55:12] <bd808>	 apergos: shhhhh... we aren't here. ;)
[19:56:07] <apergos>	 neither am I!  see ya  ;-)
[21:08:23] <icinga-wm>	 PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[22:04:58] <wikibugs>	 (PS1) Kosta Harlan: Document workaround for certificate issue on macOS [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/555751
[22:16:27] <icinga-wm>	 RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[23:58:57] <icinga-wm>	 PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100%