[00:01:01] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/mobileapps
[00:16:09] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is CRITICAL: cluster=cache_text site=ulsfo https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:16:17] PROBLEM - HTTP availability for Varnish at eqsin on icinga1001 is CRITICAL: job=varnish-text site=eqsin https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[00:16:37] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is CRITICAL: cluster=cache_text site=eqsin https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:17:07] PROBLEM - HTTP availability for Varnish at eqiad on icinga1001 is CRITICAL: job=varnish-text site=eqiad https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[00:17:09] PROBLEM - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is CRITICAL: cluster=cache_text site=codfw https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:17:13] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:17:45] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:17:53] RECOVERY - HTTP availability for Varnish at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[00:17:57] PROBLEM - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is CRITICAL: cluster=cache_text site=esams https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:18:15] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:18:45] RECOVERY - HTTP availability for Varnish at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[00:18:45] RECOVERY - HTTP availability for Nginx -SSL terminators- at codfw on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:18:49] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[00:19:33] RECOVERY - HTTP availability for Nginx -SSL terminators- at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Cache_TLS_termination https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[01:26:25] (PS1) DannyS712: Don't enable autoblock by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/534937 (https://phabricator.wikimedia.org/T231943)
[01:28:06] (PS2) DannyS712: Don't enable autoblock by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/534937 (https://phabricator.wikimedia.org/T231943)
[01:28:29] (PS3) DannyS712: Don't enable autoblock by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/534937 (https://phabricator.wikimedia.org/T231943)
[02:25:58] (PS2) Andrew Bogott: codfw1dev: move some designate/pdns things to 'codfw1dev.cloud' [puppet] - https://gerrit.wikimedia.org/r/534879
[02:28:36] (CR) Andrew Bogott: [C: +2] codfw1dev: move some designate/pdns things to 'codfw1dev.cloud' [puppet] - https://gerrit.wikimedia.org/r/534879 (owner: Andrew Bogott)
[02:30:21] PROBLEM - Postgres Replication Lag on maps1001 is CRITICAL: POSTGRES_HOT_STANDBY_DELAY CRITICAL: DB template1 (host:localhost) 247114936 and 15 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[02:33:29] RECOVERY - Postgres Replication Lag on maps1001 is OK: POSTGRES_HOT_STANDBY_DELAY OK: DB template1 (host:localhost) 688 and 6 seconds https://wikitech.wikimedia.org/wiki/Postgres%23Monitoring
[03:01:30] (PS1) Andrew Bogott: realm.pp: Update certname validation for cloud VMs [puppet] - https://gerrit.wikimedia.org/r/534938 (https://phabricator.wikimedia.org/T229441)
[03:04:04] (CR) Andrew Bogott: [C: +2] realm.pp: Update certname validation for cloud VMs [puppet] - https://gerrit.wikimedia.org/r/534938 (https://phabricator.wikimedia.org/T229441) (owner: Andrew Bogott)
[03:24:53] (PS4) Andrew Bogott: Openstack Neutron: added config files and templates for version Newton [puppet] - https://gerrit.wikimedia.org/r/533927 (https://phabricator.wikimedia.org/T212302)
[03:25:52] (CR) Andrew Bogott: [C: +2] Openstack Neutron: added config files and templates for version Newton [puppet] - https://gerrit.wikimedia.org/r/533927 (https://phabricator.wikimedia.org/T212302) (owner: Andrew Bogott)
[03:26:24] (PS4) Andrew Bogott: Designate: add Newton config files and resources [puppet] - https://gerrit.wikimedia.org/r/533926 (https://phabricator.wikimedia.org/T212302)
[03:27:22] (CR) Andrew Bogott: [C: +2] Designate: add Newton config files and resources [puppet] - https://gerrit.wikimedia.org/r/533926 (https://phabricator.wikimedia.org/T212302) (owner: Andrew Bogott)
[03:32:23] PROBLEM - traffic_server tls process restarted on cp5001 is CRITICAL: 2 ge 2 https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server https://grafana.wikimedia.org/d/6uhkG6OZk/ats-instance-drilldown?orgId=1&var-site=eqsin+prometheus/ops&var-instance=cp5001&var-layer=tls
[03:59:52] (Abandoned) DannyS712: Don't enable autoblock by default on officewiki [mediawiki-config] - https://gerrit.wikimedia.org/r/534937 (https://phabricator.wikimedia.org/T231943) (owner: DannyS712)
[06:13:12] (CR) Giuseppe Lavagetto: [C: +1] dbctl: use explicit keyword arguments for the callback [software/conftool] - https://gerrit.wikimedia.org/r/534818 (owner: CDanis)
[06:25:53] (CR) Giuseppe Lavagetto: [C: -1] "Overall LGTM, but see the comments - most are just questions but I'd like a different approach with the callbacks." (3 comments) [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[06:28:13] (CR) Giuseppe Lavagetto: [C: -1] "Same small doubt as with the preceding patch, else LGTM" (1 comment) [software/conftool] - https://gerrit.wikimedia.org/r/534899 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[07:21:29] PROBLEM - HHVM rendering on mw1229 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers
[07:22:57] RECOVERY - HHVM rendering on mw1229 is OK: HTTP OK: HTTP/1.1 200 OK - 78398 bytes in 0.691 second response time https://wikitech.wikimedia.org/wiki/Application_servers
[11:26:57] PROBLEM - Widespread puppet agent failures- no resources reported on icinga1001 is CRITICAL: site=eqsin https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet
[11:35:23] PROBLEM - HTTP availability for Varnish at esams on icinga1001 is CRITICAL: job=varnish-text site=esams https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[11:36:59] RECOVERY - HTTP availability for Varnish at esams on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1 https://logstash.wikimedia.org/goto/60aa05b6e1129b475fbf4e7be868c67d
[12:28:56] (CR) Volans: "clarifications inline" (2 comments) [software/conftool] - https://gerrit.wikimedia.org/r/534819 (https://phabricator.wikimedia.org/T229677) (owner: CDanis)
[12:32:01] RECOVERY - Widespread puppet agent failures- no resources reported on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet