[00:00:06] <icinga-wm>	 PROBLEM - Maps - OSM synchronization lag - codfw on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[00:00:07] <icinga-wm>	 PROBLEM - Maps - OSM synchronization lag - eqiad on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[00:02:33] <mutante>	 that looks like the same pattern as always in the graph, and it will recover in a second
[00:04:31] <mutante>	 always catches up at 2 days
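A quick check of the "2 days" remark against the alert text, as a minimal sketch. It assumes the lag values in the Icinga messages are in seconds; the log itself does not state the unit, and the helper name `lag_to_days` is purely illustrative.

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def lag_to_days(lag_seconds: float) -> float:
    """Convert a lag value (assumed to be in seconds) to days."""
    return lag_seconds / SECONDS_PER_DAY


# Thresholds as printed in the alert and recovery messages.
critical = float("1.728e+05")
warning = float("9e+04")

print(lag_to_days(critical))       # critical threshold, in days
print(warning / SECONDS_PER_HOUR)  # warning threshold, in hours
```

Under that assumption the critical threshold works out to exactly two days and the warning threshold to 25 hours, which matches the observation that the lag "catches up at 2 days".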
[00:11:07] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 66 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[00:26:16] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[00:30:46] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[01:04:32] <icinga-wm>	 PROBLEM - configured eth on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:06:22] <icinga-wm>	 PROBLEM - dhclient process on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:07:06] <papaul>	 new pc node, that is me doing a new install
[01:08:03] <icinga-wm>	 PROBLEM - puppet last run on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:08:03] <icinga-wm>	 PROBLEM - Check systemd state on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:09:53] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:09:54] <icinga-wm>	 PROBLEM - puppet last run on pc2009 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 8 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog-gnutls]
[01:11:43] <icinga-wm>	 PROBLEM - DPKG on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:15:32] <icinga-wm>	 PROBLEM - Disk space on pc2007 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[01:20:12] <icinga-wm>	 PROBLEM - Host pc2007 is DOWN: PING CRITICAL - Packet loss = 100%
[01:21:13] <icinga-wm>	 RECOVERY - configured eth on pc2007 is OK: OK - interfaces up
[01:21:22] <icinga-wm>	 RECOVERY - Host pc2007 is UP: PING OK - Packet loss = 0%, RTA = 36.08 ms
[01:21:23] <icinga-wm>	 RECOVERY - DPKG on pc2007 is OK: All packages OK
[01:21:52] <icinga-wm>	 RECOVERY - Check systemd state on pc2007 is OK: OK - running: The system is fully operational
[01:22:02] <icinga-wm>	 RECOVERY - Disk space on pc2007 is OK: DISK OK
[01:22:12] <icinga-wm>	 RECOVERY - dhclient process on pc2007 is OK: PROCS OK: 0 processes with command name dhclient
[01:23:22] <icinga-wm>	 PROBLEM - puppet last run on pc2007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 44 seconds ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog-gnutls]
[01:26:40] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review, 10User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (10Papaul) ```  pc2007  root@pc2007:~# fdisk -l Disk /dev/sda: 4.4 TiB, 4799217008640 bytes, 9373470720 sectors Units: sectors of 1 * 512 = 512 bytes Sector si...
[01:27:27] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review, 10User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (10Papaul)
[01:29:23] <wikibugs>	 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review, 10User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (10Papaul) a:05Papaul>03Banyek @Banyek all yours
[01:34:53] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.774 second response time
[01:35:53] <icinga-wm>	 PROBLEM - puppet last run on pc2008 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 23 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog-gnutls]
[01:38:02] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:39:53] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on pc2007 is OK: OK: synced at Wed 2018-10-31 01:39:51 UTC.
[01:46:53] <icinga-wm>	 PROBLEM - puppet last run on pc2010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 15 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog-gnutls]
[02:32:52] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s3 on db2036 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[02:48:21] <icinga-wm>	 RECOVERY - Maps - OSM synchronization lag - eqiad on einsteinium is OK: (C)1.728e+05 ge (W)9e+04 ge 1.01e+04 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1
[02:51:01] <icinga-wm>	 RECOVERY - Maps - OSM synchronization lag - codfw on einsteinium is OK: (C)1.728e+05 ge (W)9e+04 ge 1.025e+04 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1
[03:14:12] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 33 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[03:26:42] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 46 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[03:32:12] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 933.06 seconds
[04:08:02] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 135.07 seconds
[04:11:21] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 04-2] "Going to try to do this within the cloud instead" [puppet] - 10https://gerrit.wikimedia.org/r/470445 (https://phabricator.wikimedia.org/T208244) (owner: 10Andrew Bogott)
[04:20:17] <wikibugs>	 (03PS2) 10Andrew Bogott: ntp: use cloud-specific ntp servers for cloud VMS [puppet] - 10https://gerrit.wikimedia.org/r/470446 (https://phabricator.wikimedia.org/T208244)
[04:20:19] <wikibugs>	 (03PS1) 10Andrew Bogott: Add role/profile for a set of in-cloud ntp servers [puppet] - 10https://gerrit.wikimedia.org/r/470751 (https://phabricator.wikimedia.org/T208244)
[04:21:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Add role/profile for a set of in-cloud ntp servers [puppet] - 10https://gerrit.wikimedia.org/r/470751 (https://phabricator.wikimedia.org/T208244) (owner: 10Andrew Bogott)
[04:23:37] <wikibugs>	 (03PS2) 10Andrew Bogott: Add role/profile for a set of in-cloud ntp servers [puppet] - 10https://gerrit.wikimedia.org/r/470751 (https://phabricator.wikimedia.org/T208244)
[04:23:39] <wikibugs>	 (03PS3) 10Andrew Bogott: ntp: use cloud-specific ntp servers for cloud VMS [puppet] - 10https://gerrit.wikimedia.org/r/470446 (https://phabricator.wikimedia.org/T208244)
[04:27:42] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.956 second response time
[04:29:25] <wikibugs>	 10Operations, 10cloud-services-team: Sporadic puppet failures - https://phabricator.wikimedia.org/T201247 (10Andrew) @Volans, I don't have timestamps, but I do have this from our weekly meeting alert summary:   >  >     2018-10-27: labvirt1014 puppet transient page >     2018-10-28: labvirt1017 puppet transien...
[04:31:11] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:29:25] <wikibugs>	 (03PS1) 10Elukey: mcrouter: switch codfw proxy mw2214 with mw2163 [puppet] - 10https://gerrit.wikimedia.org/r/470752 (https://phabricator.wikimedia.org/T208272)
[06:33:59] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/13268/" [puppet] - 10https://gerrit.wikimedia.org/r/470752 (https://phabricator.wikimedia.org/T208272) (owner: 10Elukey)
[06:41:01] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 34 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:47:52] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labtestweb2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:48:12] <icinga-wm>	 PROBLEM - Wikitech-static main page has content on labweb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:48:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 68 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:48:52] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labtestweb2001 is OK: HTTP OK: HTTP/1.1 200 OK - 33890 bytes in 0.237 second response time
[06:49:12] <icinga-wm>	 RECOVERY - Wikitech-static main page has content on labweb1001 is OK: HTTP OK: HTTP/1.1 200 OK - 33890 bytes in 0.238 second response time
[06:50:56] <wikibugs>	 10Operations, 10User-Elukey: mcrouter prometheus exporter stops working when mcrouter restarts - https://phabricator.wikimedia.org/T208375 (10elukey) p:05Triage>03Normal
[06:53:32] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 32 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[06:54:08] <wikibugs>	 10Operations, 10User-Elukey: Upgrade memkeys to its latest upstream - https://phabricator.wikimedia.org/T208376 (10elukey) p:05Triage>03Normal
[06:58:41] <icinga-wm>	 PROBLEM - BGP status on cr1-codfw is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active, AS1299/IPv4: Connect https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:12:12] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 50 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[07:12:12] <icinga-wm>	 RECOVERY - BGP status on cr1-codfw is OK: BGP OK - up: 24, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[07:33:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] remove graphite and carbon-relay cnames [dns] - 10https://gerrit.wikimedia.org/r/470626 (owner: 10Cwhite)
[07:33:58] <wikibugs>	 (03PS3) 10Filippo Giunchedi: remove graphite and carbon-relay cnames [dns] - 10https://gerrit.wikimedia.org/r/470626 (owner: 10Cwhite)
[07:37:05] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM! Naming nit inline but other than that looks good to go" (031 comment) [debs/statsd-proxy] (wmf_v0.0.10) - 10https://gerrit.wikimedia.org/r/470512 (https://phabricator.wikimedia.org/T196484) (owner: 10Cwhite)
[07:41:01] <wikibugs>	 (03PS1) 10Filippo Giunchedi: wmnet: reintroduce graphite.eqiad.wmnet only [dns] - 10https://gerrit.wikimedia.org/r/470755
[07:41:17] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] wmnet: reintroduce graphite.eqiad.wmnet only [dns] - 10https://gerrit.wikimedia.org/r/470755 (owner: 10Filippo Giunchedi)
[07:41:41] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] wmnet: reintroduce graphite.eqiad.wmnet only [dns] - 10https://gerrit.wikimedia.org/r/470755 (owner: 10Filippo Giunchedi)
[07:44:48] <wikibugs>	 10Operations, 10Continuous-Integration-Config, 10Jenkins: Ensure jenkins on puppet.git checks for yaml syntax errors - https://phabricator.wikimedia.org/T208240 (10ema) p:05Triage>03Normal
[07:46:29] <wikibugs>	 10Operations, 10Patch-For-Review: stop using mod_php anywhere - https://phabricator.wikimedia.org/T208257 (10ema) p:05Triage>03Normal
[07:48:16] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests: Requesting access to netbox for bd808 - https://phabricator.wikimedia.org/T208267 (10ema) p:05Triage>03Normal
[07:49:24] <wikibugs>	 10Operations, 10Analytics, 10Analytics-EventLogging, 10Performance-Team, 10Traffic: Increase EventLogging limit from 2K to 5K - https://phabricator.wikimedia.org/T208282 (10ema) p:05Triage>03Normal
[07:50:11] <wikibugs>	 10Operations, 10Release-Engineering-Team, 10monitoring, 10Performance-Team (Radar), 10goodfirstbug: Increase "check_legal_html" coverage to group0 wikis - https://phabricator.wikimedia.org/T208284 (10ema) p:05Triage>03Normal
[07:50:55] <wikibugs>	 10Operations, 10Wikimedia-Site-requests, 10HHVM: Set hhvm.virtual_host[default][always_decode_post_data] = false - https://phabricator.wikimedia.org/T208191 (10ema) p:05Triage>03Normal
[08:05:39] <wikibugs>	 (03CR) 10Filippo Giunchedi: "LGTM modulo one value" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/470659 (https://phabricator.wikimedia.org/T196484) (owner: 10Cwhite)
[08:07:54] <wikibugs>	 (03PS1) 10Elukey: Release latest upstream [debs/memkeys] (debian) - 10https://gerrit.wikimedia.org/r/470773 (https://phabricator.wikimedia.org/T208376)
[08:08:32] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 32 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[08:08:52] <wikibugs>	 (03CR) 10Elukey: "Already built and tested in deployment-prep, looks good so far." [debs/memkeys] (debian) - 10https://gerrit.wikimedia.org/r/470773 (https://phabricator.wikimedia.org/T208376) (owner: 10Elukey)
[08:11:22] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron)
[08:13:39] <wikibugs>	 (03PS1) 10Elukey: geoip:archive.sh: avoid hardlinks [puppet] - 10https://gerrit.wikimedia.org/r/470778
[08:14:59] <wikibugs>	 (03CR) 10Elukey: "Andrew/Fran: I am probably missing something about the hardlinks, let me know if this change is not good.." [puppet] - 10https://gerrit.wikimedia.org/r/470778 (owner: 10Elukey)
[08:15:51] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 47 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[08:16:17] <wikibugs>	 (03CR) 10Filippo Giunchedi: logstash: add generic kafka input config (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/470454 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron)
[08:18:02] <icinga-wm>	 RECOVERY - Check systemd state on ms-be1034 is OK: OK - running: The system is fully operational
[08:22:27] <wikibugs>	 (03PS1) 10Ema: admin: move sbassett to users [puppet] - 10https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852)
[08:25:59] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10ema) Request approved during SRE meeting, 2018-10-29.
[08:26:44] <wikibugs>	 (03PS13) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[08:27:31] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[08:32:23] <wikibugs>	 (03PS14) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[08:33:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[08:35:45] <wikibugs>	 (03PS2) 10Ema: admin: move sbassett to users [puppet] - 10https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852)
[08:35:47] <wikibugs>	 (03PS1) 10Ema: admin: requested groups membership for sbassett [puppet] - 10https://gerrit.wikimedia.org/r/470783 (https://phabricator.wikimedia.org/T207852)
[08:36:21] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 35 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[08:39:46] <godog>	 !log start rolling out rsyslog 8.38 to stretch hosts
[08:39:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:16] <wikibugs>	 (03PS1) 10Ema: admin: add new user 'jdl' [puppet] - 10https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951)
[08:42:17] <wikibugs>	 (03PS1) 10Ema: admin: groups membership for jdl [puppet] - 10https://gerrit.wikimedia.org/r/470785 (https://phabricator.wikimedia.org/T207951)
[08:42:41] <wikibugs>	 (03PS15) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[08:42:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] admin: add new user 'jdl' [puppet] - 10https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951) (owner: 10Ema)
[08:43:01] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review, 10User-jijiki: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10ema) Request approved during SRE meeting, 2018-10-29.
[08:43:39] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[08:43:51] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 55 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[08:44:42] <icinga-wm>	 RECOVERY - puppet last run on pc2008 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:49:25] <wikibugs>	 (03PS16) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[08:49:52] <wikibugs>	 (03PS1) 10Gehel: maps: increase alerting threshold on OSM replication lag [puppet] - 10https://gerrit.wikimedia.org/r/470787
[08:50:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[08:52:51] <icinga-wm>	 RECOVERY - puppet last run on pc2007 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[08:56:35] <wikibugs>	 (03PS17) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:00:55] <wikibugs>	 (03CR) 10Ema: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951) (owner: 10Ema)
[09:04:21] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 30 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[09:04:41] <icinga-wm>	 RECOVERY - puppet last run on pc2009 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:06:11] <icinga-wm>	 RECOVERY - puppet last run on pc2010 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[09:06:44] <wikibugs>	 (03PS18) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:11:42] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 55 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[09:12:28] <wikibugs>	 (03PS19) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:12:55] <wikibugs>	 (03CR) 10Elukey: [C: 032] Release latest upstream [debs/memkeys] (debian) - 10https://gerrit.wikimedia.org/r/470773 (https://phabricator.wikimedia.org/T208376) (owner: 10Elukey)
[09:14:25] <joal>	 Thanks a lot hashar for the merge :)
[09:14:34] <hashar>	 joal: does it work now? :)
[09:14:45] <joal>	 hashar: testing now - seems ok - will confirm
[09:15:09] <joal>	 Ah - actually no :(
[09:15:11] <joal>	 hashar: 
[09:16:11] <elukey>	 !log upload memkeys 20181031-1 to jessie-wikimedia thirdparty
[09:16:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:17:47] <joal>	 hashar: From what I read in the job, the parameter I added to the xconfig is not used :(
[09:18:35] <wikibugs>	 (03PS20) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:23:32] <icinga-wm>	 PROBLEM - puppet last run on elastic2019 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[rsyslog],Package[rsyslog-gnutls]
[09:24:31] <elukey>	 !log upgraded memkeys to 20181031-1 on all the mc* - T208376
[09:24:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:34] <stashbot>	 T208376: Upgrade memkeys to its latest upstream - https://phabricator.wikimedia.org/T208376
[09:24:52] <wikibugs>	 10Operations, 10Patch-For-Review, 10User-Elukey: Upgrade memkeys to its latest upstream - https://phabricator.wikimedia.org/T208376 (10elukey) 05Open>03Resolved
[09:24:54] <wikibugs>	 10Operations, 10Gadgets, 10MediaWiki-Cache, 10MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (10elukey)
[09:25:06] <godog>	 that elastic2019 failure is me
[09:25:31] <icinga-wm>	 PROBLEM - puppet last run on mw2184 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog-gnutls]
[09:27:08] <wikibugs>	 (03PS19) 10Gehel: relforge: setup 2 instances to validate multi-instance configuration [puppet] - 10https://gerrit.wikimedia.org/r/466591 (https://phabricator.wikimedia.org/T198352)
[09:28:30] <gehel>	 godog: just curious, why is it failing only on elastic2019?
[09:29:32] <godog>	 gehel: by chance, not only on elastic2019, I forgot to add 'run-no-puppet' to my apt install invocation
[09:30:26] <gehel>	 godog: `run-no-puppet` what is it?
[09:30:31] * gehel is going to learn something today
[09:31:02] <icinga-wm>	 PROBLEM - DPKG on mwdebug2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:31:22] <wikibugs>	 (03CR) 10Gehel: "puppet compiler looks good: https://puppet-compiler.wmflabs.org/compiler1002/13274/" [puppet] - 10https://gerrit.wikimedia.org/r/466591 (https://phabricator.wikimedia.org/T198352) (owner: 10Gehel)
[09:31:56] <wikibugs>	 (03PS2) 10Tim Eulitz: Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638)
[09:33:05] <godog>	 gehel: basically to wait for puppet to finish if it is running before running a command
[09:34:20] <wikibugs>	 (03PS3) 10Tim Eulitz: Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638)
[09:34:24] <volans>	 gehel: a wrapper to disable puppet, do stuff, enable puppet ;)
[09:34:31] <gehel>	 Oh, so it is a wrapper script!
[09:34:35] <volans>	 the same we have in code in matt's review :D
[09:34:47] <gehel>	 Nice, I did not know that one, I do that manually each time I need it
[09:34:52] <volans>	 that's why I want to move that to the puppet module once we have one :D
[09:35:00] <gehel>	 makes sense
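The "disable puppet, do stuff, enable puppet" pattern described above could look roughly like the following. This is a hypothetical sketch, not the actual `run-no-puppet` script, whose contents are not shown in this log. The function names and the injectable `run` parameter are assumptions made for illustration, and the step godog mentions (waiting for an in-progress puppet run to finish) is deliberately left out.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def puppet_disabled(reason, run=subprocess.check_call):
    """Disable the puppet agent for the duration of the block.

    `run` is injectable so the sketch can be exercised without puppet
    installed; by default it shells out via subprocess.check_call.
    """
    run(["puppet", "agent", "--disable", reason])
    try:
        yield
    finally:
        # Re-enable even if the wrapped command fails.
        run(["puppet", "agent", "--enable"])


def run_without_puppet(command, reason="manual maintenance",
                       run=subprocess.check_call):
    """Run `command` (a list of arguments) while the agent is disabled."""
    with puppet_disabled(reason, run=run):
        return run(command)
```

For example, `run_without_puppet(["apt-get", "install", "rsyslog"])` would keep the agent from starting a run in the middle of the package install. The `try`/`finally` is the important design choice here: the agent is re-enabled even when the wrapped command exits with an error.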
[09:35:34] <joal>	 hashar: Second try :)
[09:36:41] <hashar>	 argh
[09:36:41] <icinga-wm>	 RECOVERY - DPKG on mwdebug2001 is OK: All packages OK
[09:38:29] <hashar>	 joal: I have refreshed the job and built it manually ( https://integration.wikimedia.org/ci/job/analytics-refinery-release/145/console )
[09:38:51] <icinga-wm>	 RECOVERY - puppet last run on elastic2019 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:39:25] <wikibugs>	 (03PS21) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:40:52] <hashar>	 joal: no luck. I am looking at the web interface configuration page
[09:42:01] <icinga-wm>	 PROBLEM - puppet last run on mwdebug2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[rsyslog]
[09:43:08] <tim_WMDE>	 Hey, I just added a request for a SWAT deployment for the mid-day SWAT in a bit. Since it's the first time I am doing a SWAT, could someone double-check if I missed anything / if there's a problem with it?
[09:43:14] <tim_WMDE>	 https://wikitech.wikimedia.org/wiki/Deployments#Wednesday,_October_31 Link for the lazy
[09:43:29] <hashar>	 joal: maybe the setting has to be set everywhere?
[09:43:36] <wikibugs>	 (03PS1) 10Vgutierrez: certcentral: Stop abusing SELF_SIGNED status to signal errors [software/certcentral] - 10https://gerrit.wikimedia.org/r/470790 (https://phabricator.wikimedia.org/T208378)
[09:45:56] <hashar>	 joal: at least the build does:  /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Djdk.net.URLClassPath.disableClassPathURLCheck=true 
[09:46:05] <hashar>	 but then it says: Executing Maven:  -B -f /srv/jenkins-workspace/workspace/analytics-refinery-release/pom.xml -s /tmp/settings7804593339835235621.xml clean package
[09:46:08] <hashar>	 which lacks the option
[09:46:16] <hashar>	 so maybe it has to be set everywhere
[09:46:37] <wikibugs>	 (03PS22) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:47:32] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 35 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[09:50:43] <joal>	 hashar: I experienced the same issue with the previous setting (without having the parameter set, so it seems better)
[09:51:06] <joal>	 hashar: meaning the Executing maven line didn't contain the parameter
[09:51:10] <wikibugs>	 (03PS23) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[09:51:13] <joal>	 :(
[09:51:56] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[09:52:12] <icinga-wm>	 RECOVERY - puppet last run on mwdebug2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:54:11] <wikibugs>	 (03CR) 10Gabriel Birke: [C: 031] Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638) (owner: 10Tim Eulitz)
[09:54:51] <hashar>	 joal: :(((
[09:54:52] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 46 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[09:56:11] <icinga-wm>	 RECOVERY - puppet last run on mw2184 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:56:36] <wikibugs>	 (03CR) 10Ladsgroup: [C: 032] Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638) (owner: 10Tim Eulitz)
[09:57:42] <addshore>	 go go gadget Amir1 
[09:57:46] <wikibugs>	 (03Merged) 10jenkins-bot: Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638) (owner: 10Tim Eulitz)
[10:00:24] <wikibugs>	 (03PS24) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[10:01:13] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] mariadb: table checker for monitoring data drift [puppet] - 10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: 10Banyek)
[10:02:46] <wikibugs>	 (03CR) 10jenkins-bot: Prepare AdvancedSearch go-live SWAT changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470642 (https://phabricator.wikimedia.org/T207638) (owner: 10Tim Eulitz)
[10:05:22] <Amir1>	 ^ rebased on deploy1001
[10:15:18] <joal>	 hashar: I have another solution to test that involves updating the jar - is there a way to easily setup the same java env that jenkins generates for me to test?
[10:18:18] <wikibugs>	 10Operations, 10DBA: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10jcrespo) p:05Triage>03High
[10:22:01] <icinga-wm>	 PROBLEM - DPKG on archiva1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[10:23:13] <volans>	 !log restarted pdfrender on scb1003
[10:23:15] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:24:21] <icinga-wm>	 RECOVERY - DPKG on archiva1001 is OK: All packages OK
[10:24:31] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.002 second response time
[10:24:47] <hashar>	 joal: I think the maven job is just a fancy way to run maven. I am not sure there is much magic
[10:25:03] <hashar>	 joal: assuming you get the patched java 8 version, you should be able to reproduce the issue locally ?
[10:25:26] <joal>	 hashar: I am on that track currently (java version)
[10:25:32] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 34 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:25:49] <joal>	 hashar: as for maven, I really don't know how to pass it the param :(
[10:26:00] <hashar>	 :\
[10:26:04] <hashar>	 trying a release at https://integration.wikimedia.org/ci/job/analytics-refinery-release/147/console
[10:26:58] <hashar>	 :(
[10:27:58] <gehel>	 hashar: you're working on making releases from Jenkins? Great!
[10:29:24] <wikibugs>	 10Operations, 10DBA: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (10Banyek) a:03Banyek
[10:29:30] <joal>	 hashar: gone for now, will work on that again this afternoon and keep you posted - Thanks for the help :)
[10:29:41] <hashar>	 gehel: na Madhumitha did it a while ago
[10:29:49] <hashar>	 gehel: and solely for analytics/refinery 
[10:30:11] <gehel>	 hashar: I should have a look and see if we can do the same for our projects
[10:30:30] * gehel would feel much safer if releases were made by a robot and not by a human
[10:30:35] <hashar>	 gehel: be bold! The logic for refinery is in jjb/analytics.yaml . Surely that could be generalized to all maven repos
[10:30:52] <gehel>	 I'll get to it eventually...
[10:31:16] <hashar>	 joal: or maybe that is due to the maven version
[10:32:50] <gehel>	 hashar, joal: always use mvnwrapper and fix the version!
[10:33:01] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 59 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:33:31] <gehel>	 hashar: btw, where is the Dockerfile for that java image that we use?
[10:34:23] <wikibugs>	 Operations, DBA, User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (Banyek)
[10:34:26] <wikibugs>	 Operations, DBA, User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (jcrespo) a:Banyek>None
[10:36:07] <wikibugs>	 Operations, DBA, User-Banyek: Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456] - https://phabricator.wikimedia.org/T208383 (jcrespo) a:Banyek
[10:36:39] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (elukey) There is a problem in the schedule I am afraid.. Nov 1st is holiday for most of the Europeans, plus I am a bit concerned about DBA presence since @Banyek and and M...
[10:37:41] <hashar>	 joal: AH so I have run the clean packages  with mvn -X
[10:38:02] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 33 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:38:11] <hashar>	 and eventually it says surefire:test fails, the command was: /bin/sh -c cd /srv/jenkins-workspace/workspace/analytics-refinery-release/refinery-core && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -jar /srv/jenkins-workspace/workspace/analytics-refinery-release/refinery-core/target/surefire/surefirebooter1119679690881654009.jar
[10:38:11] <hashar>	 /srv/jenkins-workspace/workspace/analytics-refinery-release/refinery-core/target/surefire/surefire5743705044851205592tmp /srv/jenkins-workspace/workspace/analytics-refinery-release/refinery-core/target/surefire/surefire_07400669994662315744tmp
[10:38:39] <hashar>	 sorry that is too long, the summary is the java command is not passed the magic setting
[10:39:51] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw1226 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:40:22] * hashar tries with maven 3.5 instead of 3.0
[10:40:42] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw1226 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.039 second response time
[10:43:25] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Joe) >>! In T208272#4706141, @ayounsi wrote: > Here is the full list of hosts in that row. No outages expected, but brief (5s) connectivity interruption for some racks is...
[10:45:07] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Joe) To be clear: I think we should do the maintenance **without depooling anything** and check what would happen when we lose a row, even if in an inactive datacenter. Bu...
[10:45:22] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 69 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[10:48:23] <hashar>	 I will restart the CI Jenkins in a few minutes
[10:51:49] <Amir1>	 godog: hey, tell me when you're around! thanks
[10:53:03] <godog>	 Amir1: hey, sure I'm here
[10:53:34] <Amir1>	 godog: so, logstash can't find ores logs or it's discarding them
[10:53:47] <Amir1>	 what can we do? How can we check?
[10:54:32] <Amir1>	 https://logstash.wikimedia.org/goto/617aa6c0cedc953b704a2ed722c7078e
[10:54:47] <Amir1>	 the INFO ones are coming from uwsgi and ores is not sending them
[10:55:10] <godog>	 Amir1: ack, is there a task ?
[10:55:17] <Amir1>	 there are five
[10:55:26] <godog>	 I'll check ores1001
[10:56:06] <hashar>	 !log restarting CI jenkins on contint1001
[10:56:08] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:57:15] <godog>	 Amir1: I looked now and it doesn't seem ores1001 is sending logstash.svc anything on port 12201
[10:57:16] <Amir1>	 godog: this would work for now: https://phabricator.wikimedia.org/T181630
[10:57:45] <Amir1>	 let me check the config
[10:57:46] <hashar>	 !log contint1001: upgraded java and restarted Jenkins
[10:57:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:18] <Amir1>	 The config says: logstash.svc.eqiad.wmnet:12201 (is this right?)
[10:59:21] <godog>	 depends what protocol/format you are using for sending, I'm assuming this is python-logstash ?
[10:59:26] <wikibugs>	 (PS2) Elukey: Add change_tag to list of tables to sqoop [puppet] - https://gerrit.wikimedia.org/r/470593 (https://phabricator.wikimedia.org/T205940) (owner: Fdans)
[10:59:35] <tim_WMDE>	 I am also around Amir1
[11:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181031T1100).
[11:00:04] <jouncebot>	 Dereckson, Amir1, and Tim_WMDE: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:09] <Amir1>	 godog: yup. It sends the JSON stuff
[11:00:29] <Amir1>	 tim_WMDE: cool. Your patch should be live on beta by now (hopefully) I can double check
[11:00:44] <zeljkof>	 o/
[11:01:11] <zeljkof>	 Dereckson, Amir1: you are deployers, right? go ahead and self-organize and deploy your patches :)
[11:01:24] <wikibugs>	 Operations, Continuous-Integration-Config: Ensure jenkins on puppet.git checks for yaml syntax errors - https://phabricator.wikimedia.org/T208240 (hashar)
[11:01:26] <zeljkof>	 tim_WMDE: are you a deployer? or do you need help deploying the patch?
[11:01:32] <tim_WMDE>	 Amir1 looks like it is live, thanks
[11:01:46] <tim_WMDE>	 I am just around because I submitted something for SWAT deployment
[11:02:04] <Amir1>	 zeljkof: I merged tim'
[11:02:11] <Amir1>	 *Tim's patch already (beta)
[11:02:18] <zeljkof>	 Amir1: ah, cool, ok, then go ahead, I'm around if you need me :)
[11:02:27] <wikibugs>	 (CR) Ladsgroup: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/470679 (https://phabricator.wikimedia.org/T205064) (owner: Ladsgroup)
[11:03:30] <Dereckson>	 Hello. Give me a greenlight and I can deploy it. It's a no op in prod.
[11:03:38] <godog>	 Amir1: ok if that's "one json per line" then the port is 11514
[11:03:42] <wikibugs>	 (Merged) jenkins-bot: Do not load WikibaseQuality [mediawiki-config] - https://gerrit.wikimedia.org/r/470679 (https://phabricator.wikimedia.org/T205064) (owner: Ladsgroup)
[11:03:45] <godog>	 12201 is udp/gelf
[11:04:14] <godog>	 also I'm assuming this is equally broken in beta
[11:04:23] <Amir1>	 I think we are sending it over udp but json per line (let me double check)
[11:05:27] <Amir1>	 testing the patch in mwdebug1002
[11:08:55] <logmsgbot>	 !log ladsgroup@deploy1001 Synchronized wmf-config/Wikibase.php: SWAT: [[gerrit:470679|Do not load WikibaseQuality (T205064)]] (duration: 01m 05s)
[11:08:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:59] <stashbot>	 T205064: Undeploy WikibaseQuality extension from the WMF - https://phabricator.wikimedia.org/T205064
[11:10:50] <wikibugs>	 (CR) jenkins-bot: Do not load WikibaseQuality [mediawiki-config] - https://gerrit.wikimedia.org/r/470679 (https://phabricator.wikimedia.org/T205064) (owner: Ladsgroup)
[11:11:56] <godog>	 Amir1: I'd recommend starting testing in beta and move to port 11514/udp, that should work and accept json over udp
[11:12:23] <godog>	 12201/udp is gelf which isn't implemented by logstash_handler.py afaics
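A minimal sketch of what godog describes above, sending one newline-terminated JSON object per UDP datagram. This is not the ORES `logstash_handler.py`; the function names `send_json_line` and `roundtrip_demo` are hypothetical, and the demo listens on a local loopback socket instead of a real logstash host so that it can run anywhere.

```python
import json
import socket


def send_json_line(record: dict, host: str, port: int) -> int:
    """Send one JSON object, newline-terminated, as a single UDP datagram."""
    payload = (json.dumps(record) + "\n").encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Returns the number of bytes handed to the kernel.
        return sock.sendto(payload, (host, port))


def roundtrip_demo() -> dict:
    """Send a record to a local stand-in listener and return what it received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as listener:
        # Port 0 lets the OS pick a free port; a real setup would use the
        # port of the JSON-lines input (11514/udp in the discussion above).
        listener.bind(("127.0.0.1", 0))
        listener.settimeout(2.0)
        host, port = listener.getsockname()
        send_json_line({"message": "hello", "level": "WARNING"}, host, port)
        data, _addr = listener.recvfrom(65535)
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    print(roundtrip_demo())
```

Because UDP is fire-and-forget, a sender pointed at the wrong port (for example the GELF input) gets no error back, which is why checking the wire with tcpdump is a sensible first step.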
[11:21:14] <Amir1>	 oh okay
[11:21:31] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (fgiunchedi) >>! In T208272#4708611, @Joe wrote:  > we will most likely need to run switftrepl after the outage to catch up on missing originals. Should we failover traffic...
[11:22:02] <Amir1>	 Dereckson: SWAT is yours
[11:26:30] <wikibugs>	 Operations, Continuous-Integration-Config: Ensure jenkins on puppet.git checks for yaml syntax errors - https://phabricator.wikimedia.org/T208240 (hashar) PuppetSyntax has support to lint hiera files and the task should be run when a hiera file is changed. With that change, the rake task is registered (`...
[11:28:29] <Dereckson>	 thanks
[11:30:19] <bblack>	 jouncebot: next
[11:30:20] <jouncebot>	 In 0 hour(s) and 29 minute(s): Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181031T1200)
[11:30:24] <bblack>	 heh
[11:30:38] <Dereckson>	 bblack: we're currently in SWAT, do you need to add a change?
[11:31:07] <bblack>	 no, just keeping myself aware of concurrent things! :)
[11:32:18] <wikibugs>	 (PS2) Dereckson: Find bash in environment [mediawiki-config] - https://gerrit.wikimedia.org/r/470585
[11:33:33] <wikibugs>	 (CR) Dereckson: [C: 2] Find bash in environment [mediawiki-config] - https://gerrit.wikimedia.org/r/470585 (owner: Dereckson)
[11:34:48] <wikibugs>	 (Merged) jenkins-bot: Find bash in environment [mediawiki-config] - https://gerrit.wikimedia.org/r/470585 (owner: Dereckson)
[11:36:28] <logmsgbot>	 !log dereckson@deploy1001 Synchronized docroot/noc/createTxtFileSymlinks.sh: UNIX-agnostic shebang for createTxtFileSymlinks ([[Gerrit:470585]], no-op in prod) (duration: 00m 54s)
[11:36:30] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:44] <wikibugs>	 (PS9) Giuseppe Lavagetto: mediawiki: add httpd class, alternative to mediawiki::web [puppet] - https://gerrit.wikimedia.org/r/467643
[11:37:46] <wikibugs>	 (PS16) Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - https://gerrit.wikimedia.org/r/467644
[11:38:45] <wikibugs>	 (CR) jenkins-bot: Find bash in environment [mediawiki-config] - https://gerrit.wikimedia.org/r/470585 (owner: Dereckson)
[11:40:36] <wikibugs>	 (CR) Filippo Giunchedi: create rsyslog::ship_logfile - simplified logstash shipper via kafka (1 comment) [puppet] - https://gerrit.wikimedia.org/r/469945 (https://phabricator.wikimedia.org/T206454) (owner: Herron)
[11:41:06] <Amir1>	 godog: hmm, with 11514 in beta still can't see anything :/
[11:41:53] <godog>	 Amir1: what's the beta host you're trying on?
[11:42:20] <Amir1>	 deployment-ores01 sending to deployment-logstash2.deployment-prep.eqiad.wmflabs:11514
[11:43:52] <icinga-wm>	 PROBLEM - High lag on wdqs1004 is CRITICAL: 3633 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:45:20] <godog>	 Amir1: are you root on that host? I'm checking if anything is sent with tcpdump -i any 'port 11514'
[11:45:24] <godog>	 and doesn't look like it
[11:46:09] <Amir1>	 let me double check
[11:46:12] <icinga-wm>	 PROBLEM - High lag on wdqs1004 is CRITICAL: 3662 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:46:25] <Amir1>	 we need to restart the service to make it emit some logs
[11:47:39] <Amir1>	 godog: it sends them when you restart the service: 11:47:10.950660 IP deployment-ores01.deployment-prep.eqiad.wmflabs.52533 > deployment-logstash2.deployment-prep.eqiad.wmflabs.11514: UDP, length 1063
[11:47:46] <godog>	 indeed now it is sending
[11:49:53] <Amir1>	 godog: haha, they now show up in logstash but everything is INFO :/
[11:50:11] <Amir1>	 godog: Is this wrong? https://github.com/wikimedia/ores/blob/master/ores/logging/logstash_fomatter.py#L33
[11:50:39] <Amir1>	 maybe we send WARNING and it needs to get a number or something similar
[11:50:42] <icinga-wm>	 PROBLEM - High lag on wdqs1005 is CRITICAL: 3662 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:50:58] <godog>	 Amir1: I don't know, why not use python-logstash though?
[11:51:28] <Amir1>	 godog: it doesn't support python3 properly and also it's unmaintained (the whole thing is a basic copy paste though)
[11:53:09] <godog>	 I see
[11:53:53] <godog>	 something did get sent as warning though, e.g.
[11:53:55] <godog>	 {"@version": "1", "host": "deployment-ores01", "message": "celery@deployment-ores01 ready.", "@timestamp": "2018-10-31T11:47:53.928143+00:00", "tags": [], "level": "WARNING", "path":
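A hedged sketch of how a formatter can carry the Python log level through to the `"level"` field seen in the record above. This is not the ORES formatter; the class name `JsonLineFormatter` and its `host` parameter are hypothetical, and only the field names visible in the pasted record are used.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render a LogRecord as one JSON object with an explicit level field."""

    def __init__(self, host: str):
        super().__init__()
        self.host = host

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "@version": "1",
            "host": self.host,
            "message": record.getMessage(),
            "@timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "tags": [],
            # Take the level from the record itself (e.g. "WARNING")
            # rather than hard-coding a single value for every message.
            "level": record.levelname,
        }
        return json.dumps(doc)


if __name__ == "__main__":
    formatter = JsonLineFormatter(host="example-host")
    record = logging.makeLogRecord(
        {"msg": "worker ready.", "levelname": "WARNING", "levelno": logging.WARNING}
    )
    print(formatter.format(record))
```

If every event shows up with the same level downstream, comparing the emitted JSON against `record.levelname` in this way is one quick check on whether the sender or the receiving pipeline is flattening it.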
[11:55:55] <Amir1>	 hmm, let me try something
[11:56:51] <wikibugs>	 (PS5) BBlack: interface::rps: strict single CPU core per queue [puppet] - https://gerrit.wikimedia.org/r/468313
[11:56:53] <wikibugs>	 (PS7) BBlack: interface::rps: always be NUMA aware [puppet] - https://gerrit.wikimedia.org/r/467469
[11:56:55] <wikibugs>	 (PS7) BBlack: graphite: add interface::rps settings to graphite hosts [puppet] - https://gerrit.wikimedia.org/r/468388 (https://phabricator.wikimedia.org/T196484) (owner: Cwhite)
[11:56:57] <wikibugs>	 (PS1) BBlack: remove numa device_to_cpumask_invert fact [puppet] - https://gerrit.wikimedia.org/r/470812
[11:56:59] <wikibugs>	 (PS1) BBlack: remove wdqs numa_networking hieradata [puppet] - https://gerrit.wikimedia.org/r/470813
[11:57:01] <wikibugs>	 (PS1) BBlack: tlsproxy: always NUMA, and looser CPU binding [puppet] - https://gerrit.wikimedia.org/r/470814
[11:57:03] <wikibugs>	 (PS1) BBlack: remove global numa_networking [puppet] - https://gerrit.wikimedia.org/r/470815
[11:57:09] <godog>	 ok, I have to go Amir1, ttyl
[11:57:58] <Amir1>	 have fun!
[11:58:29] <wikibugs>	 (PS17) Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - https://gerrit.wikimedia.org/r/467644
[12:00:05] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181031T1200)
[12:08:55] <wikibugs>	 Operations, Certcentral, DNS, Traffic: Allow Let's Encrypt issue wildcard certificates - https://phabricator.wikimedia.org/T208390 (Vgutierrez)
[12:09:40] <wikibugs>	 Operations, Certcentral, DNS, Traffic: Allow Let's Encrypt issue wildcard certificates - https://phabricator.wikimedia.org/T208390 (Vgutierrez) p:Triage>Normal
[12:17:42] <icinga-wm>	 RECOVERY - High lag on wdqs1005 is OK: (C)3600 ge (W)1200 ge 273 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[12:21:39] <wikibugs>	 (PS1) BBlack: wikimedia.org CAA: allow wildcards for LE [dns] - https://gerrit.wikimedia.org/r/470816 (https://phabricator.wikimedia.org/T208390)
[12:26:51] <icinga-wm>	 PROBLEM - High lag on wdqs1005 is CRITICAL: 4413 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[12:27:01] <wikibugs>	 Operations, Wikimedia-Logstash, monitoring, Patch-For-Review, and 3 others: Send celery and wsgi service logs to logstash - https://phabricator.wikimedia.org/T181630 (Ladsgroup) It needs some changes. I will make them and when you do them it just works: {F26999501} cc @fgiunchedi
[12:29:46] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: -1] "What if we use the ensure_service() function to avoid the big if {} block?" [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591) (owner: GTirloni)
[12:30:38] <wikibugs>	 (PS18) Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - https://gerrit.wikimedia.org/r/467644
[12:31:26] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Refactor puppet WDQS module - https://phabricator.wikimedia.org/T208201 (Mathew.onipe)
[12:32:20] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): refactor wdqs::updater to use scap::targets for sudo rules - https://phabricator.wikimedia.org/T208392 (Mathew.onipe) p:Triage>Normal
[12:33:13] <wikibugs>	 (PS2) Arturo Borrero Gonzalez: deployment-prep hieradata: Fix comment about which host this IP is [puppet] - https://gerrit.wikimedia.org/r/470095 (owner: Alex Monk)
[12:34:15] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: 2] deployment-prep hieradata: Fix comment about which host this IP is [puppet] - https://gerrit.wikimedia.org/r/470095 (owner: Alex Monk)
[12:35:29] <onimisionipe>	 !log depooling wdqs1005 to catch up with others
[12:35:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:38:02] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Fix Type constraints in wdqs (init.pp) - https://phabricator.wikimedia.org/T208393 (Mathew.onipe) p:Triage>Normal
[12:40:19] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] "Cherry-picked on beta, it works as expected. No significant changes happen. Apache gets restarted since we change the mode of the worker.l" [puppet] - https://gerrit.wikimedia.org/r/467644 (owner: Giuseppe Lavagetto)
[12:40:39] <wikibugs>	 (CR) Giuseppe Lavagetto: [C: 2] "works correctly in beta." [puppet] - https://gerrit.wikimedia.org/r/467643 (owner: Giuseppe Lavagetto)
[12:40:51] <wikibugs>	 (PS10) Giuseppe Lavagetto: mediawiki: add httpd class, alternative to mediawiki::web [puppet] - https://gerrit.wikimedia.org/r/467643
[12:42:13] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Isolate wdqs service (blazegraph) as a submodule under the wdqs module - https://phabricator.wikimedia.org/T208394 (Mathew.onipe) p:Triage>Normal
[12:42:20] <wikibugs>	 (PS19) Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - https://gerrit.wikimedia.org/r/467644
[12:44:10] <wikibugs>	 Operations, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work): Cleanup wdqs puppet profile to include the new changes based on refactoring - https://phabricator.wikimedia.org/T208395 (Mathew.onipe) p:Triage>Normal
[12:44:53] <wikibugs>	 (PS1) Gehel: wdqs: raise alerting threshold on updater lag for public cluster [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228)
[12:50:40] <wikibugs>	 (PS3) GTirloni: tools-services: Add updatetools_enabled key [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591)
[12:51:35] <wikibugs>	 (CR) jerkins-bot: [V: -1] tools-services: Add updatetools_enabled key [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591) (owner: GTirloni)
[12:57:00] <wikibugs>	 (PS4) GTirloni: tools-services: Add updatetools_enabled key [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591)
[12:57:56] <wikibugs>	 (CR) DCausse: [C: 1] relforge: setup 2 instances to validate multi-instance configuration [puppet] - https://gerrit.wikimedia.org/r/466591 (https://phabricator.wikimedia.org/T198352) (owner: Gehel)
[12:58:00] <wikibugs>	 (CR) Ottomata: [C: 1] "IIRC, the hardlinks were just to avoid duplicates and save a bit of space, this should be fine." [puppet] - https://gerrit.wikimedia.org/r/470778 (owner: Elukey)
[13:00:04] <jouncebot>	 Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181031T1300)
[13:02:53] <thedj>	 hmm. this rename seems to have lost its original: https://commons.wikimedia.org/wiki/File:Lekeitioko_Mertxe_Pagoaga.webm
[13:05:35] <wikibugs>	 (PS2) Elukey: geoip:archive.sh: avoid hardlinks [puppet] - https://gerrit.wikimedia.org/r/470778
[13:06:24] <wikibugs>	 (CR) Elukey: [C: 2] geoip:archive.sh: avoid hardlinks [puppet] - https://gerrit.wikimedia.org/r/470778 (owner: Elukey)
[13:06:46] <wikibugs>	 (CR) Fdans: [C: 1] geoip:archive.sh: avoid hardlinks [puppet] - https://gerrit.wikimedia.org/r/470778 (owner: Elukey)
[13:09:32] <wikibugs>	 (PS1) Ladsgroup: ores: Change logstash port from GELF to json lines [puppet] - https://gerrit.wikimedia.org/r/470827 (https://phabricator.wikimedia.org/T181546)
[13:10:58] <wikibugs>	 Operations, ops-codfw, netops, Patch-For-Review: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Eevans) >>! In T208272#4708611, @Joe wrote: >>>! In T208272#4706141, @ayounsi wrote: >> >> [ ... ] >> >> restbase2003 >> restbase2004 >> restbase2008 >> restbase2011 >  >...
[13:13:40] <wikibugs>	 (CR) Arturo Borrero Gonzalez: [C: -1] "I think we need the logic for the ensure inside service_params." [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591) (owner: GTirloni)
[13:17:04] <wikibugs>	 Operations, Gadgets, MediaWiki-Cache, MW-1.33-notes (1.33.0-wmf.3; 2018-11-06), and 4 others: Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions - https://phabricator.wikimedia.org/T203786 (elukey) Finally after a lot of digging I added a meaningful graph to...
[13:18:55] <wikibugs>	 (PS5) GTirloni: tools-services: Add updatetools_enabled key [puppet] - https://gerrit.wikimedia.org/r/470683 (https://phabricator.wikimedia.org/T207591)
[13:19:50] <wikibugs>	 (CR) Hashar: [C: 1] "Verified. Should be good to go." [puppet] - https://gerrit.wikimedia.org/r/376739 (https://phabricator.wikimedia.org/T93414) (owner: Hashar)
[13:22:20] <wikibugs>	 (PS2) Hashar: TXT entries for Github domain verification [dns] - https://gerrit.wikimedia.org/r/468279 (https://phabricator.wikimedia.org/T207364)
[13:23:20] <wikibugs>	 (CR) Hashar: "They will then be verified on https://github.com/organizations/wikimedia/settings/domains" [dns] - https://gerrit.wikimedia.org/r/468279 (https://phabricator.wikimedia.org/T207364) (owner: Hashar)
[13:25:43] <wikibugs>	 (PS1) Filippo Giunchedi: swift: enable statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470830 (https://phabricator.wikimedia.org/T205870)
[13:26:26] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] swift: enable statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470830 (https://phabricator.wikimedia.org/T205870) (owner: Filippo Giunchedi)
[13:26:36] <wikibugs>	 (PS2) Filippo Giunchedi: swift: enable statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470830 (https://phabricator.wikimedia.org/T205870)
[13:31:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:31:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:32:17] <godog>	 sigh that's me
[13:32:32] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:33:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be1043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:33:21] <icinga-wm>	 PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:02] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:12] <icinga-wm>	 PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:12] <icinga-wm>	 PROBLEM - puppet last run on ms-be1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:12] <icinga-wm>	 PROBLEM - puppet last run on ms-be2017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:22] <icinga-wm>	 PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:39] <wikibugs>	 (CR) Effie Mouzeli: [C: 1] admin: move sbassett to users [puppet] - https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852) (owner: Ema)
[13:34:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:34:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:02] <icinga-wm>	 PROBLEM - puppet last run on ms-be1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:02] <icinga-wm>	 PROBLEM - puppet last run on ms-be1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be2014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:22] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:22] <icinga-wm>	 PROBLEM - puppet last run on ms-be1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:22] <icinga-wm>	 PROBLEM - puppet last run on ms-be2033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:23] <Platonides>	 godog: maybe quiet icinga-wm ?
[13:35:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be2038 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be2037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:32] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:38] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: add mappings for swift/statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470832
[13:35:42] <icinga-wm>	 PROBLEM - puppet last run on ms-be2042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:50] <godog>	 Platonides: fix incoming
[13:35:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:35:52] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:00] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] hieradata: add mappings for swift/statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470832 (owner: Filippo Giunchedi)
[13:36:12] <icinga-wm>	 PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:22] <icinga-wm>	 PROBLEM - puppet last run on ms-be2041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be2027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be1036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:35] <Platonides>	 good
[13:36:42] <icinga-wm>	 PROBLEM - puppet last run on ms-be2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be2020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be2028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:36:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be2039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:11] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be2016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:33] <icinga-wm>	 PROBLEM - puppet last run on ms-be1041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:41] <icinga-wm>	 PROBLEM - puppet last run on ms-be1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:41] <icinga-wm>	 PROBLEM - puppet last run on ms-be2031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:42] <icinga-wm>	 PROBLEM - puppet last run on ms-be1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be1015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:51] <icinga-wm>	 PROBLEM - puppet last run on ms-fe2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2015 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:37:52] <icinga-wm>	 PROBLEM - puppet last run on ms-be2036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:02] <icinga-wm>	 PROBLEM - puppet last run on ms-be2024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be2035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:22] <icinga-wm>	 PROBLEM - puppet last run on ms-be1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:31] <icinga-wm>	 PROBLEM - puppet last run on ms-be1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:32] <icinga-wm>	 PROBLEM - puppet last run on ms-be1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:40] <herron>	 oof
[13:38:41] <icinga-wm>	 PROBLEM - puppet last run on ms-fe1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:42] <icinga-wm>	 PROBLEM - puppet last run on ms-be2043 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:43] <wikibugs>	 (CR) Effie Mouzeli: [C: 1] admin: add new user 'jdl' [puppet] - https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951) (owner: Ema)
[13:38:51] <icinga-wm>	 PROBLEM - puppet last run on ms-be2030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:38:59] <herron>	 I’ll turn down ircecho for the time being
[13:39:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:39:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:39:12] <icinga-wm>	 PROBLEM - puppet last run on ms-be2040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:39:25] <Platonides>	 ok
[13:39:41] <wikibugs>	 (PS2) Filippo Giunchedi: hieradata: add mappings for swift/statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470832
[13:39:45] <herron>	 !log temporarily stopping ircecho on einsteinium
[13:39:47] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:41:05] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] hieradata: add mappings for swift/statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470832 (owner: Filippo Giunchedi)
[13:41:36] <wikibugs>	 (PS3) Filippo Giunchedi: hieradata: add mappings for swift/statsd_exporter [puppet] - https://gerrit.wikimedia.org/r/470832
[13:42:55] <godog>	 herron: thanks
[13:43:39] <volans>	 godog: can I be of any help?
[13:44:44] <godog>	 volans: it should be recovering now but thanks anyways
[13:44:59] <volans>	 sorry, saw it just now
[13:46:00] <godog>	 nah my bad for not running the compiler, too eager
[13:51:05] <wikibugs>	 (PS25) Banyek: mariadb: table checker for monitoring data drift [puppet] - https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253)
[13:52:05] <wikibugs>	 (CR) jerkins-bot: [V: -1] mariadb: table checker for monitoring data drift [puppet] - https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) (owner: Banyek)
[13:53:01] <wikibugs>	 (PS1) Elukey: Move statistics::discovery to stat1007 [puppet] - https://gerrit.wikimedia.org/r/470837 (https://phabricator.wikimedia.org/T205846)
[13:54:01] <wikibugs>	 (CR) Elukey: [C: 2] Move statistics::discovery to stat1007 [puppet] - https://gerrit.wikimedia.org/r/470837 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[13:57:42] <wikibugs>	 (PS1) Filippo Giunchedi: hieradata: fix statsd_exporter mappings for swift [puppet] - https://gerrit.wikimedia.org/r/470838
[13:58:16] <wikibugs>	 (CR) Filippo Giunchedi: [C: 2] hieradata: fix statsd_exporter mappings for swift [puppet] - https://gerrit.wikimedia.org/r/470838 (owner: Filippo Giunchedi)
[13:58:29] <wikibugs>	 (PS2) Filippo Giunchedi: hieradata: fix statsd_exporter mappings for swift [puppet] - https://gerrit.wikimedia.org/r/470838
[14:08:46] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] admin: requested groups membership for sbassett [puppet] - https://gerrit.wikimedia.org/r/470783 (https://phabricator.wikimedia.org/T207852) (owner: Ema)
[14:08:54] <wikibugs>	 Operations, Release-Engineering-Team (Kanban): Migrate operations/puppet CI job from Jessie to Stretch - https://phabricator.wikimedia.org/T208422 (hashar)
[14:10:15] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] admin: move sbassett to users [puppet] - https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852) (owner: Ema)
[14:11:07] <wikibugs>	 Operations, Patch-For-Review, Release-Engineering-Team (Kanban): Migrate operations/puppet CI job from Jessie to Stretch - https://phabricator.wikimedia.org/T208422 (hashar) a:hashar
[14:11:21] <herron>	 !log re-enabling ircecho on einsteinium
[14:11:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:00] <wikibugs>	 (PS1) Elukey: Move ::statistics::wmde to stat1007 [puppet] - https://gerrit.wikimedia.org/r/470840 (https://phabricator.wikimedia.org/T205846)
[14:13:13] <wikibugs>	 (CR) Filippo Giunchedi: [C: 1] admin: add new user 'jdl' [puppet] - https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951) (owner: Ema)
[14:14:09] <wikibugs>	 (CR) Elukey: [C: 2] Move ::statistics::wmde to stat1007 [puppet] - https://gerrit.wikimedia.org/r/470840 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[14:17:31] <hasharLunch>	 !log Adding a Stretch based CI job for operations/puppet (non voting job for now) | T208422
[14:17:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:35] <stashbot>	 T208422: Migrate operations/puppet CI job from Jessie to Stretch - https://phabricator.wikimedia.org/T208422
[14:17:37] <wikibugs>	 Operations, Patch-For-Review, Release-Engineering-Team (Kanban): Migrate operations/puppet CI job from Jessie to Stretch - https://phabricator.wikimedia.org/T208422 (hashar) Deployed, the new job uses Stretch and is non voting until it is proven to be working properly :]
[14:17:44] <logmsgbot>	 Testing dologmsg on mwmaint1002
[14:18:11] <logmsgbot>	 !log anomie Testing dologmsg on mwmaint1002
[14:18:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:27] <wikibugs>	 (PS3) Ema: admin: move sbassett to users [puppet] - https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852)
[14:18:36] <wikibugs>	 (Abandoned) Hashar: git buildpackage configuration [debs/wmf-pt-kill] - https://gerrit.wikimedia.org/r/461941 (owner: Hashar)
[14:19:22] <wikibugs>	 (CR) Ema: [C: 2] admin: move sbassett to users [puppet] - https://gerrit.wikimedia.org/r/470779 (https://phabricator.wikimedia.org/T207852) (owner: Ema)
[14:20:06] <wikibugs>	 (Abandoned) Hashar: Bump Jinja2 to 2.10+ [docker-images/docker-pkg] - https://gerrit.wikimedia.org/r/399155 (owner: Hashar)
[14:21:59] <wikibugs>	 (CR) Mathew.onipe: "Just one comment. But all looks good!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228) (owner: Gehel)
[14:22:02] <icinga-wm>	 PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 4 minutes ago with 2 failures. Failed resources (up to 3 shown): File[/srv/analytics-wmde/graphite],File[/srv/analytics-wmde/wdcm]
[14:22:41] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateImageCommentTemp.php on group0 for T188132
[14:22:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:45] <stashbot>	 T188132: Merge image_comment_temp table into the image table - https://phabricator.wikimedia.org/T188132
[14:24:06] <logmsgbot>	 !log anomie@mwmaint1002 Running migrateComments.php on group0 for T166733
[14:24:09] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:10] <stashbot>	 T166733: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733
[14:25:02] <wikibugs>	 (Abandoned) Hashar: prometheus: make ferm DNS record type configurable [puppet] - https://gerrit.wikimedia.org/r/381073 (https://phabricator.wikimedia.org/T153468) (owner: Hashar)
[14:25:20] <wikibugs>	 (CR) Mathew.onipe: [C: 1] maps: increase alerting threshold on OSM replication lag [puppet] - https://gerrit.wikimedia.org/r/470787 (owner: Gehel)
[14:29:22] <wikibugs>	 (CR) Faidon Liambotis: "This is an improvement, so not voting it down, but it also seems like a good candidate for moving this to Hiera." [puppet] - https://gerrit.wikimedia.org/r/470446 (https://phabricator.wikimedia.org/T208244) (owner: Andrew Bogott)
[14:31:26] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Banyek) a:Banyek>Papaul @Papaul as I checked the storage on the hosts it's set up for with stripe size of 512Kb instead of 256K (https://wikitech.wi...
[14:31:41] <wikibugs>	 (PS3) Herron: role::logstash::collector: migrate to profile::logstash::collector [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454)
[14:32:43] <wikibugs>	 (CR) Herron: [C: 2] role::logstash::collector: migrate to profile::logstash::collector [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) (owner: Herron)
[14:34:12] <anomie>	 jynus or marostegui: Please let me know if the maintenance run I just started seems to cause any problems on s3. If all goes well, I'll be doing similar runs for the rest of the wikis (in parallel by section) soon-ish. Maybe tomorrow or Monday depending on how fast the current run completes.
[14:34:35] <wikibugs>	 Operations, Certcentral, DNS, Traffic, Patch-For-Review: Allow Let's Encrypt issue wildcard certificates - https://phabricator.wikimedia.org/T208390 (Vgutierrez)
[14:35:21] <jynus>	 anomie: one sec
[14:37:51] <wikibugs>	 (CR) Alex Monk: "I think that's complicated by the fact that standard::ntp gets included in the "standard" class itself, which is included from a *lot* of " [puppet] - https://gerrit.wikimedia.org/r/470446 (https://phabricator.wikimedia.org/T208244) (owner: Andrew Bogott)
[14:37:55] <wikibugs>	 (PS2) Ema: admin: requested groups membership for sbassett [puppet] - https://gerrit.wikimedia.org/r/470783 (https://phabricator.wikimedia.org/T207852)
[14:38:48] <wikibugs>	 (PS1) Vgutierrez: certcentral: Add pinkunicorn-wildcard certificate configuration [puppet] - https://gerrit.wikimedia.org/r/470846 (https://phabricator.wikimedia.org/T208424)
[14:39:09] <wikibugs>	 (CR) Ema: [C: 2] admin: requested groups membership for sbassett [puppet] - https://gerrit.wikimedia.org/r/470783 (https://phabricator.wikimedia.org/T207852) (owner: Ema)
[14:41:44] <wikibugs>	 (CR) Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/13280/" [puppet] - https://gerrit.wikimedia.org/r/470846 (https://phabricator.wikimedia.org/T208424) (owner: Vgutierrez)
[14:42:34] <icinga-wm>	 RECOVERY - High lag on wdqs1005 is OK: (C)3600 ge (W)1200 ge 170 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[14:43:10] <cmjohnson1>	 !log moved wtp1034 eth0 to new switch...it was left over
[14:43:12] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:45] <icinga-wm>	 RECOVERY - High lag on wdqs1004 is OK: (C)3600 ge (W)1200 ge 306 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[14:45:06] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (jcrespo) A larger stripe size should not be a huge issue (unlike a smaller one, which affected performance significantly and we didn't like it). We were thi...
[14:47:15] <icinga-wm>	 RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[14:47:21] <wikibugs>	 Operations, ops-codfw, DBA, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Banyek) @jcrespo actually i can change the stripe size on one of the hosts, and do some comparison, what do you think about this?
[14:54:12] <wikibugs>	 (CR) Effie Mouzeli: [C: 1] admin: groups membership for jdl [puppet] - https://gerrit.wikimedia.org/r/470785 (https://phabricator.wikimedia.org/T207951) (owner: Ema)
[14:58:17] <wikibugs>	 (PS2) Ema: admin: add new user 'jdl' [puppet] - https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951)
[14:59:14] <wikibugs>	 (CR) Ema: [C: 2] admin: add new user 'jdl' [puppet] - https://gerrit.wikimedia.org/r/470784 (https://phabricator.wikimedia.org/T207951) (owner: Ema)
[15:02:38] <wikibugs>	 (PS2) Ema: admin: groups membership for jdl [puppet] - https://gerrit.wikimedia.org/r/470785 (https://phabricator.wikimedia.org/T207951)
[15:03:22] <wikibugs>	 (CR) Ema: [C: 2] admin: groups membership for jdl [puppet] - https://gerrit.wikimedia.org/r/470785 (https://phabricator.wikimedia.org/T207951) (owner: Ema)
[15:07:15] <wikibugs>	 (PS1) Cmjohnson: Adding mgmt dns for new servers p1007-10 [dns] - https://gerrit.wikimedia.org/r/470849 (https://phabricator.wikimedia.org/T207258)
[15:12:03] <wikibugs>	 Operations, ops-eqiad, DBA, Patch-For-Review: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Cmjohnson)
[15:14:51] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T196507 (Cmjohnson) HP wanted me to reseat the sata cables which I did, and now all 10 disks are showing again but we're back to the original issue of the raid battery not fully charging.    The amount of time and e...
[15:16:08] <wikibugs>	 (PS1) Elukey: role::statistics::private: add deprecation motd to stat1005 [puppet] - https://gerrit.wikimedia.org/r/470850 (https://phabricator.wikimedia.org/T205846)
[15:16:36] <wikibugs>	 (PS2) Elukey: role::statistics::private: add deprecation motd to stat1005 [puppet] - https://gerrit.wikimedia.org/r/470850 (https://phabricator.wikimedia.org/T205846)
[15:17:11] <wikibugs>	 (CR) jerkins-bot: [V: -1] role::statistics::private: add deprecation motd to stat1005 [puppet] - https://gerrit.wikimedia.org/r/470850 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[15:18:22] <wikibugs>	 Operations, ops-eqiad, Analytics: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (Cmjohnson) @elukey the new disk arrived, I am happy to swap it whenever you're ready.   it's the first disk on the server and you will need manually replace it in raid since it's SW raid.    ping w...
[15:18:34] <wikibugs>	 (CR) Elukey: [V: 2 C: 2] "https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/470850/" [puppet] - https://gerrit.wikimedia.org/r/470850 (https://phabricator.wikimedia.org/T205846) (owner: Elukey)
[15:19:17] <wikibugs>	 (PS3) Cwhite: graphite: add queue_depth and batch_size options to carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/470659 (https://phabricator.wikimedia.org/T196484)
[15:19:18] <anomie>	 My run of migrateComments.php on group0 finished now.
[15:19:28] <wikibugs>	 (PS1) Banyek: mariadb: added pc2007 as parsercache host for shard pc1 [puppet] - https://gerrit.wikimedia.org/r/470851 (https://phabricator.wikimedia.org/T208383)
[15:19:49] <jynus>	 anomie: as usual, no concerns except the lagging it may create
[15:20:03] <anomie>	 jynus: Thanks for checking.
[15:20:06] <wikibugs>	 (CR) jerkins-bot: [V: -1] mariadb: added pc2007 as parsercache host for shard pc1 [puppet] - https://gerrit.wikimedia.org/r/470851 (https://phabricator.wikimedia.org/T208383) (owner: Banyek)
[15:20:22] <jynus>	 anomie: also you have a tendency to run those just before friday/major holidays :-D
[15:20:48] <anomie>	 jynus: Is there a holiday coming up?
[15:21:29] <jynus>	 no issue, just a funny coincidence
[15:22:17] <jynus>	 just be around to calm down people if lag starts happening on codfw/dbstores/labsdbs
[15:22:28] <jynus>	 ;-)
[15:23:13] <wikibugs>	 (PS1) Dmaza: Enable Partial Blocks on testwiki and testiwkidata [mediawiki-config] - https://gerrit.wikimedia.org/r/470852 (https://phabricator.wikimedia.org/T203821)
[15:23:14] <jynus>	 all the work you are doing with DMLs is work you are saving that I will not have to do with DDLs
[15:23:18] <wikibugs>	 (PS4) Cwhite: add socket_bufsize option to make SO_RCVBUF tunable [debs/statsd-proxy] (wmf_v0.0.10) - https://gerrit.wikimedia.org/r/470512 (https://phabricator.wikimedia.org/T196484)
[15:23:46] <jynus>	 anomie: one last thing, only partially related
[15:24:24] <jynus>	 we are going to start implementing db consistency checks/alerts, I may need your input in the future for some ideas
[15:24:42] <anomie>	 Ok, feel free to CC me.
[15:25:52] <elukey>	 cmjohnson1: I can try to shutdown aqs1006 in 5/10 mins
[15:25:56] <elukey>	 are you free?
[15:26:01] <cmjohnson1>	 Yes
[15:26:08] <cmjohnson1>	 Free enough ;-)
[15:26:14] <jynus>	 anomie: thank you
[15:26:46] <cmjohnson1>	 Elukey. Disk is hot swap. Shutdown is not necessary
[15:27:41] <elukey>	 cmjohnson1: ah ok, I only need to fail the disk via mdadm
[15:27:46] <cmjohnson1>	 Plus it’s the first disk. Most likely has grub installed on it. If I pull it out you may not get the OS to come back on reboot.
[15:29:06] <wikibugs>	 (PS4) Cwhite: graphite: add queue_depth and batch_size options to carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/470659 (https://phabricator.wikimedia.org/T196484)
[15:29:59] <wikibugs>	 (CR) Cwhite: [C: 2] graphite: add queue_depth and batch_size options to carbon-c-relay [puppet] - https://gerrit.wikimedia.org/r/470659 (https://phabricator.wikimedia.org/T196484) (owner: Cwhite)
[15:31:48] <wikibugs>	 (PS2) Gehel: wdqs: raise alerting threshold on updater lag for public cluster [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228)
[15:31:53] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, User-jijiki: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (ema) @jlinehan please try to SSH as jdl to one of the systems you should now have access to, for examp...
[15:32:23] <wikibugs>	 Operations, SRE-Access-Requests: Requesting access to Jupyter notebook / analytics-privatedata-users for jgleeson - https://phabricator.wikimedia.org/T208432 (jgleeson)
[15:32:45] <wikibugs>	 (CR) jerkins-bot: [V: -1] wdqs: raise alerting threshold on updater lag for public cluster [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228) (owner: Gehel)
[15:32:56] <wikibugs>	 (PS3) Gehel: wdqs: raise alerting threshold on updater lag for public cluster [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228)
[15:32:57] <wikibugs>	 Operations: Package and install php 7.2 in place of php 7.0 - https://phabricator.wikimedia.org/T208433 (Joe) p:Triage>High
[15:33:00] <elukey>	 cmjohnson1: need to figure out which partition to fail, sorry; can we do it in, say, 2h? (meetings now)
[15:33:01] <wikibugs>	 (CR) Gehel: wdqs: raise alerting threshold on updater lag for public cluster (1 comment) [puppet] - https://gerrit.wikimedia.org/r/470819 (https://phabricator.wikimedia.org/T199228) (owner: Gehel)
[15:33:03] <elukey>	 otherwise will do it now
[15:33:15] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (ema) @sbassett please try to SSH as sbassett to one of the systems you should now have access to, for example s...
[15:33:21] <wikibugs>	 Operations, User-Joe: Package and install php 7.2 in place of php 7.0 - https://phabricator.wikimedia.org/T208433 (Joe) a:Joe
[15:33:27] <cmjohnson1>	 let's plan for tomorrow morning (elukey)
[15:33:37] <elukey>	 ack
[15:33:51] <wikibugs>	 (CR) Vgutierrez: [C: 2] wikimedia.org CAA: allow wildcards for LE [dns] - https://gerrit.wikimedia.org/r/470816 (https://phabricator.wikimedia.org/T208390) (owner: BBlack)
[15:39:06] <icinga-wm>	 RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 19 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[15:39:50] <wikibugs>	 (PS2) Cwhite: phabricator: remove custom diamond::collector [puppet] - https://gerrit.wikimedia.org/r/466988 (https://phabricator.wikimedia.org/T183454)
[15:40:01] <elukey>	 cmjohnson1: sorry for the extra ping - just checked and it seems that mdadm already expelled the disk (/dev/sde) so I think that you are ready to swap if you have time now or later
[15:40:15] <elukey>	 the host is already depooled
[15:40:31] <elukey>	 I was confused by the output of /prod/mdstat
[15:40:36] <elukey>	 *proc
[15:40:55] <cmjohnson1>	 okay..yeah I can do it now
[15:40:58] <cmjohnson1>	 elukey ^
[15:41:02] <elukey>	 super
[15:41:48] <cmjohnson1>	 done
[15:41:52] <wikibugs>	 (CR) Cwhite: [C: 2] phabricator: remove custom diamond::collector [puppet] - https://gerrit.wikimedia.org/r/466988 (https://phabricator.wikimedia.org/T183454) (owner: Cwhite)
[15:41:54] <elukey>	 thanks!
[15:42:23] <wikibugs>	 Operations, ops-eqiad, Analytics, Patch-For-Review, User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (Cmjohnson)
[15:42:32] <wikibugs>	 Operations, ops-eqiad: Degraded RAID on sodium - https://phabricator.wikimedia.org/T202705 (Cmjohnson) The disk has been swapped
[15:45:56] <wikibugs>	 (PS2) Vgutierrez: certcentral: Add pinkunicorn-wildcard certificate configuration [puppet] - https://gerrit.wikimedia.org/r/470846 (https://phabricator.wikimedia.org/T208424)
[15:46:04] <wikibugs>	 (CR) Vgutierrez: [C: 2] certcentral: Add pinkunicorn-wildcard certificate configuration [puppet] - https://gerrit.wikimedia.org/r/470846 (https://phabricator.wikimedia.org/T208424) (owner: Vgutierrez)
[15:46:26] <icinga-wm>	 PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 47 probes of 325 (alerts on 35) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23RIPE_alerts
[15:46:30] <wikibugs>	 (PS1) Andrew Bogott: Horizon: move projects to eqiad1: huggle, mwstake, logging, mobile [puppet] - https://gerrit.wikimedia.org/r/470855 (https://phabricator.wikimedia.org/T204745)
[15:47:30] <wikibugs>	 (PS2) Andrew Bogott: Horizon: move projects to eqiad1: huggle, mwstake, logging, mobile [puppet] - https://gerrit.wikimedia.org/r/470855 (https://phabricator.wikimedia.org/T204745)
[15:48:53] <wikibugs>	 (CR) Andrew Bogott: [C: 2] Horizon: move projects to eqiad1: huggle, mwstake, logging, mobile [puppet] - https://gerrit.wikimedia.org/r/470855 (https://phabricator.wikimedia.org/T204745) (owner: Andrew Bogott)
[15:50:12] <wikibugs>	 Operations, Analytics, Analytics-Kanban, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (Ottomata) Just had a great meeting with @chasemp, @faidon, @JAllemandou  and @nuria.  The main action item (after Nuria h...
[15:53:27] <wikibugs>	 Operations, Analytics, Analytics-Kanban, netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (chasemp) My notes from the 2018-10-31 meeting:   ```https://phabricator.wikimedia.org/T207321#4691776  * hosts that push...
[15:57:34] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (sbassett) @ema Looks like I'm in (`sbassett@stat1007:~$`)  Thanks.
[16:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: That opportune time is upon us again. Time for a Morning SWAT (Max 6 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181031T1600).
[16:00:04] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:04:03] <wikibugs>	 Operations, SRE-Access-Requests, Patch-For-Review, User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (ema) Open>Resolved a:ema Very well!