[00:51:27] (PS1) Urbanecm: Allow ptwiki's bureaucrats to grant/revoke rollbacker user group [mediawiki-config] - https://gerrit.wikimedia.org/r/481662 (https://phabricator.wikimedia.org/T212735)
[01:01:09] (PS1) Urbanecm: Use localized wgMetaNamespace and wgMetaNamespaceTalk in satwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/481663 (https://phabricator.wikimedia.org/T211294)
[03:33:33] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 916.55 seconds
[04:21:05] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 213.38 seconds
[09:46:59] PROBLEM - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is CRITICAL: cluster=cache_text site=eqiad https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:48:45] PROBLEM - Text HTTP 5xx reqs/min on graphite1004 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[09:49:23] seems a single spike problem (recurrent issue) --^
[09:50:11] PROBLEM - Check systemd state on ms-be2034 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:50:37] RECOVERY - HTTP availability for Nginx -SSL terminators- at eqiad on icinga1001 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:54:47] RECOVERY - Text HTTP 5xx reqs/min on graphite1004 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[10:00:07] PROBLEM - Device not healthy -SMART- on db2047 is CRITICAL: cluster=mysql device=cciss,0 instance=db2047:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2047&var-datasource=codfw%2520prometheus%252Fops
[10:31:38] (PS3) Hashar: contint: remove unused classes [puppet] - https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361)
[10:33:09] (CR) Hashar: "Those classes are not the Jenkins slaves on labs, they are not used on contint1001 / contint2001 :-)" [puppet] - https://gerrit.wikimedia.org/r/481201 (https://phabricator.wikimedia.org/T209361) (owner: Hashar)
[10:44:59] RECOVERY - Check systemd state on ms-be2034 is OK: OK - running: The system is fully operational
[11:39:59] PROBLEM - Disk space on orespoolcounter2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.56: Connection reset by peer
[11:40:01] PROBLEM - Disk space on alcyone is CRITICAL: CHECK_NRPE: Error - Could not connect to 208.80.153.16: Connection reset by peer
[11:40:25] PROBLEM - configured eth on ping2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.22: Connection reset by peer
[11:40:39] PROBLEM - Check size of conntrack table on alcyone is CRITICAL: CHECK_NRPE: Error - Could not connect to 208.80.153.16: Connection reset by peer
[11:40:47] PROBLEM - dhclient process on ping2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.22: Connection reset by peer
[11:41:11] PROBLEM - Check systemd state on ping2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.22: Connection reset by peer
[11:41:33] PROBLEM - DPKG on orespoolcounter2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.56: Connection reset by peer
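For readers unfamiliar with what a change like the [00:51:27] ptwiki patch above usually amounts to: in stock MediaWiki, letting one group grant and revoke membership in another group is a small configuration tweak. The sketch below is illustrative only, written as plain LocalSettings.php rather than the per-wiki InitialiseSettings.php overrides that mediawiki-config actually uses, and it assumes the change touches only the bureaucrat/rollbacker pairing.

```php
<?php
// Illustrative sketch only -- not the contents of Gerrit change 481662.
// Wikimedia's mediawiki-config expresses this as per-wiki overrides in
// InitialiseSettings.php; plain LocalSettings.php form is shown for clarity.

// Let bureaucrats add users to the 'rollbacker' group ...
$wgAddGroups['bureaucrat'][] = 'rollbacker';

// ... and remove them from it again.
$wgRemoveGroups['bureaucrat'][] = 'rollbacker';
```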
[11:41:49] PROBLEM - Check whether ferm is active by checking the default input chain on alcyone is CRITICAL: CHECK_NRPE: Error - Could not connect to 208.80.153.16: Connection reset by peer
[11:42:27] PROBLEM - Check systemd state on orespoolcounter2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.56: Connection reset by peer
[11:42:53] PROBLEM - Check size of conntrack table on ping2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.22: Connection reset by peer
[11:43:53] PROBLEM - Check size of conntrack table on orespoolcounter2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.56: Connection reset by peer
[11:43:55] PROBLEM - ganeti-noded running on ganeti2006 is CRITICAL: PROCS CRITICAL: 3 processes with UID = 0 (root), command name ganeti-noded
[11:44:19] PROBLEM - Check size of conntrack table on alcyone is CRITICAL: CHECK_NRPE: Error - Could not connect to 208.80.153.16: Connection reset by peer
[11:44:55] PROBLEM - Check systemd state on orespoolcounter2001 is CRITICAL: CHECK_NRPE: Error - Could not connect to 10.192.0.56: Connection reset by peer
[11:44:57] PROBLEM - Disk space on alcyone is CRITICAL: CHECK_NRPE: Error - Could not connect to 208.80.153.16: Connection reset by peer
[11:45:17] RECOVERY - Check size of conntrack table on ping2001 is OK: OK: nf_conntrack is 0 % full
[11:45:17] RECOVERY - configured eth on ping2001 is OK: OK - interfaces up
[11:45:23] RECOVERY - Check whether ferm is active by checking the default input chain on alcyone is OK: OK ferm input default policy is set
[11:45:25] RECOVERY - Check size of conntrack table on alcyone is OK: OK: nf_conntrack is 0 % full
[11:45:35] RECOVERY - dhclient process on ping2001 is OK: PROCS OK: 0 processes with command name dhclient
[11:45:59] PROBLEM - Check systemd state on ping2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[11:46:01] RECOVERY - Disk space on orespoolcounter2001 is OK: DISK OK
[11:46:01] RECOVERY - Check systemd state on orespoolcounter2001 is OK: OK - running: The system is fully operational
[11:46:03] RECOVERY - Disk space on alcyone is OK: DISK OK
[11:46:13] RECOVERY - Check size of conntrack table on orespoolcounter2001 is OK: OK: nf_conntrack is 0 % full
[11:46:19] RECOVERY - DPKG on orespoolcounter2001 is OK: All packages OK
[11:46:21] RECOVERY - ganeti-noded running on ganeti2006 is OK: PROCS OK: 1 process with UID = 0 (root), command name ganeti-noded
[11:47:11] RECOVERY - Check systemd state on ping2001 is OK: OK - running: The system is fully operational
[12:49:53] Operations, DBA: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[12:50:20] ACKNOWLEDGEMENT - Device not healthy -SMART- on db2047 is CRITICAL: cluster=mysql device=cciss,0 instance=db2047:9100 job=node site=codfw Marostegui https://phabricator.wikimedia.org/T208323 https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2047&var-datasource=codfw%2520prometheus%252Fops
[14:21:38] (CR) Jforrester: Set wgNoticeProjects for wikimedia (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/471663 (https://phabricator.wikimedia.org/T208694) (owner: MacFan4000)
[15:45:27] Operations, Release Pipeline, Release-Engineering-Team, Core Platform Team Backlog (Watching / External), and 2 others: Migrate production services to kubernetes using the pipeline - https://phabricator.wikimedia.org/T198901 (hashar)
[16:25:31] (PS4) Framawiki: Publish throttle-analyze at noc [mediawiki-config] - https://gerrit.wikimedia.org/r/481267 (https://phabricator.wikimedia.org/T187894)
[17:29:41] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received
[17:33:17] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[17:37:05] PROBLEM - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is CRITICAL: /api (Ensure Zotero is working) timed out before a response was received: /api (Scrapes sample page) timed out before a response was received
[17:39:25] RECOVERY - Citoid LVS eqiad on citoid.svc.eqiad.wmnet is OK: All endpoints are healthy
[17:44:48] Puppet, Continuous-Integration-Infrastructure: Need a better way of testing puppet patches for contint/integration stuff - https://phabricator.wikimedia.org/T126370 (hashar) Open→Declined The jobs now run in Docker containers and the hosts have a very straightforward puppet manifest. Since puppet...
[19:56:16] (CR) Gergő Tisza: [C: -1] Require an 8-byte new password for all users (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/479571 (https://phabricator.wikimedia.org/T211622) (owner: Jforrester)
[20:01:56] (CR) Gergő Tisza: [C: -1] "Uh, can we not do this? MinimumPasswordLengthToLogin is an antifeature that should really not be used except after known compromises when " [mediawiki-config] - https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: Jforrester)
[20:05:28] (CR) Gergő Tisza: "This is already the default, from core. Icc70122fab1b5 cleans it up, along with a bunch of other things." [mediawiki-config] - https://gerrit.wikimedia.org/r/479572 (https://phabricator.wikimedia.org/T208441) (owner: Jforrester)
[20:09:15] (CR) Gergő Tisza: "No harm in this, but it's a no-op (the list only has 10K passwords, and it's unlikely that will ever change as we have switched to Bloom f" [mediawiki-config] - https://gerrit.wikimedia.org/r/479573 (owner: Jforrester)
[20:21:07] PROBLEM - MariaDB Slave Lag: s3 on db1095 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 774.73 seconds
[20:24:06] Operations, Wikimedia-Mailing-lists: Create mailing list for Wikimedia's Google Code-in mentors - https://phabricator.wikimedia.org/T212747 (Aklapper) p:Triage→Lowest
[21:28:32] (CR) Jforrester: [C: -2] "> Patch Set 1: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: Jforrester)
[21:35:03] (PS1) Reedy: Add testcommons.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/481795 (https://phabricator.wikimedia.org/T197616)
[21:38:53] (PS1) Reedy: Add testcommons.wikimedia.org to prod_sites.pp [puppet] - https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616)
[21:39:47] (PS2) Reedy: Add testcommons.wikimedia.org to prod_sites.pp [puppet] - https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616)
[21:39:49] (CR) Jforrester: "Ideally we'd like this done on 2019-01-02 so that we can get production fully tested with SDC items ahead of deployment to real Commons ne" [dns] - https://gerrit.wikimedia.org/r/481795 (https://phabricator.wikimedia.org/T197616) (owner: Reedy)
[21:51:12] (PS2) Reedy: Add test-commons.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/481795 (https://phabricator.wikimedia.org/T197616)
[21:52:29] (PS3) Reedy: Add test-commons.wikimedia.org to prod_sites.pp [puppet] - https://gerrit.wikimedia.org/r/481796 (https://phabricator.wikimedia.org/T197616)
[22:11:57] (CR) Gergő Tisza: [C: -1] "Enforcing it is fine (the easy way is to refuse finishing the login process unless the user changes their password; a more complex but nic" [mediawiki-config] - https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: Jforrester)
[22:16:50] (CR) Gergő Tisza: "Hm, I guess password reset via email should still work so it's not that bad. Still a crude approach, IMO." [mediawiki-config] - https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: Jforrester)
[22:47:35] (CR) Jforrester: [C: -2] "> Patch Set 1: -Code-Review" [mediawiki-config] - https://gerrit.wikimedia.org/r/479570 (https://phabricator.wikimedia.org/T208246) (owner: Jforrester)
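The password-policy back-and-forth above (the 19:56, 20:01, 20:09, 22:11 and 22:16 comments from Gergő Tisza) is easier to follow with the relevant MediaWiki knobs in front of you. The sketch below is a minimal illustration in LocalSettings.php form; the values are made up and the wiki scoping used in the actual Gerrit changes is not reproduced here.

```php
<?php
// Illustrative values only -- not the settings proposed in the Gerrit changes.

// Minimum length enforced when a password is *set or changed*
// (the "8-byte new password" change under review at 19:56).
$wgPasswordPolicy['policies']['default']['MinimalPasswordLength'] = 8;

// MinimumPasswordLengthToLogin is the stricter knob Gergő Tisza calls an
// antifeature at 20:01: accounts whose *existing* password is shorter than
// this cannot log in at all until the password is reset.
$wgPasswordPolicy['policies']['default']['MinimumPasswordLengthToLogin'] = 1;

// The "list only has 10K passwords" remark at 20:09 refers to the separate
// common-password check, a boolean policy (named PasswordNotInLargeBlacklist
// in MediaWiki of this era).
$wgPasswordPolicy['policies']['default']['PasswordNotInLargeBlacklist'] = true;
```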