[00:00:04] RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160106T0000). Please do the needful.
[00:10:16] (PS3) Dzahn: icinga/ores: put homemade plugins into /usr/local/ [puppet] - https://gerrit.wikimedia.org/r/262677
[00:11:21] (CR) jenkins-bot: [V: -1] icinga/ores: put homemade plugins into /usr/local/ [puppet] - https://gerrit.wikimedia.org/r/262677 (owner: Dzahn)
[00:21:32] Blocked-on-Operations, operations, RESTBase, Services: Switch RESTBase to use Node.js 4.2 - https://phabricator.wikimedia.org/T107762#1917202 (MoritzMuehlenhoff) I've made a backport of nodejs 4.2.4 for jessie (not copied to carbon yet, need some testing feedback from node-based apps before that can...
[00:30:20] (PS2) Yuvipanda: network: move esams/ulsfo subnets below codfw [puppet] - https://gerrit.wikimedia.org/r/260922 (owner: Faidon Liambotis)
[00:30:36] (CR) Yuvipanda: [C: 2 V: 2] network: move esams/ulsfo subnets below codfw [puppet] - https://gerrit.wikimedia.org/r/260922 (owner: Faidon Liambotis)
[00:32:59] (CR) Yuvipanda: network: move frack networks into a separate realm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/260923 (owner: Faidon Liambotis)
[00:34:39] (CR) Faidon Liambotis: network: move frack networks into a separate realm (1 comment) [puppet] - https://gerrit.wikimedia.org/r/260923 (owner: Faidon Liambotis)
[00:45:07] (PS4) Dzahn: icinga/ores: put homemade plugins into /usr/local/ [puppet] - https://gerrit.wikimedia.org/r/262677
[00:45:33] (PS5) Dzahn: icinga/ores: put homemade plugins into /usr/local/ [puppet] - https://gerrit.wikimedia.org/r/262677
[00:47:24] (PS5) Madhuvishy: [WIP] wikimetrics: Puppet module for wikimetrics [puppet] - https://gerrit.wikimedia.org/r/260687
[00:48:22] (CR) Dzahn: [C: 2] icinga/ores: put homemade plugins into /usr/local/ [puppet] - https://gerrit.wikimedia.org/r/262677 (owner: Dzahn)
[01:09:04] operations, ores, Icinga, Patch-For-Review: change ores monitoring to avoid icinga reload on puppet runs - https://phabricator.wikimedia.org/T122830#1917264 (Dzahn) Open>Resolved https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=ores.wmflabs.org&service=ORES+worker
[01:15:28] operations, ores, Icinga: change ores monitoring to avoid icinga reload on puppet runs - https://phabricator.wikimedia.org/T122830#1917266 (Dzahn)
[01:35:29] PROBLEM - MariaDB Slave Lag: m3 on db1048 is CRITICAL: CRITICAL slave_sql_lag Seconds_Behind_Master: 1121
[01:37:39] RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Seconds_Behind_Master: 0
[02:27:45] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 30s)
[02:27:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:34:38] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Jan 6 02:34:38 UTC 2016 (duration 6m 53s)
[02:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:01:55] Hmmm.
[03:28:01] (PS4) Alex Monk: Make MediaWiki treat $lang of be_x_oldwiki as be-tarask, just don't change the real DB name [mediawiki-config] - https://gerrit.wikimedia.org/r/236966 (https://phabricator.wikimedia.org/T111853)
[04:02:33] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: puppet fail
[04:27:14] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[04:30:24] PROBLEM - RAID on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:36:13] PROBLEM - puppet last run on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:40:33] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:41:04] PROBLEM - SSH on mw1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:43:44] PROBLEM - configured eth on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:45:04] PROBLEM - nutcracker port on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:46:05] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:46:05] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:52:23] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[04:52:34] PROBLEM - nutcracker process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:52:38] (CR) John Vandenberg: [C: 1] base: fix missing whitespaces in check_conntrack.py [puppet] - https://gerrit.wikimedia.org/r/262593 (owner: Hashar)
[04:53:17] Krenair: "I can use a mouse" made me lol
[04:53:32] (CR) John Vandenberg: base: fix missing whitespaces in check_conntrack.py (1 comment) [puppet] - https://gerrit.wikimedia.org/r/262593 (owner: Hashar)
[04:53:50] (CR) John Vandenberg: [C: 1] elasticsearch: lint check_elasticsearch.py [puppet] - https://gerrit.wikimedia.org/r/262594 (owner: Hashar)
[04:54:10] (CR) John Vandenberg: [C: 1] interface: lint interface-rps.py [puppet] - https://gerrit.wikimedia.org/r/262595 (owner: Hashar)
[04:54:16] (PS3) Chad: Use %{TIME_YEAR} instead of updating Wikimania redirects every year [puppet] - https://gerrit.wikimedia.org/r/262670
[04:54:26] (CR) John Vandenberg: [C: 1] toollabs: lint genpp.py [puppet] - https://gerrit.wikimedia.org/r/262596 (owner: Hashar)
[04:55:13] (CR) John Vandenberg: [C: 1] varnish: lint varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/262597 (owner: Hashar)
[04:56:44] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:57:39] (CR) John Vandenberg: tox entry point to run pep8==1.4.6 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/244148 (https://phabricator.wikimedia.org/T114887) (owner: Hashar)
[04:58:01] (CR) John Vandenberg: [C: 1] "\o/" [puppet] - https://gerrit.wikimedia.org/r/262598 (https://phabricator.wikimedia.org/T114887) (owner: Hashar)
[04:58:11] Hmm, mw1007 (^^) is responding to ping but not ssh.
[04:58:25] And out of space per icinga
[04:58:35] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:58:59] Yeah mw1007 is unhappy
[05:03:48] Downside of everyone being in SF: nobody's working at 9pm PST :p
[05:16:34] RECOVERY - SSH on mw1007 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[05:18:53] PROBLEM - puppet last run on mw1004 is CRITICAL: CRITICAL: Puppet has 55 failures
[05:22:44] PROBLEM - SSH on mw1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:50:34] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: puppet fail
[05:54:24] ostriches: just airtask someone to go wake them up with a airhorn?
[06:12:54] RECOVERY - puppet last run on mw1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:17:34] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:24:33] PROBLEM - NTP on mw1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[06:32:05] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:14] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:43] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:44] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:44] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:23] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:25] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:43] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:54] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:56:14] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures
[06:56:23] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:56:54] RECOVERY - puppet last run on mw1060 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:54] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:56:54] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:57:34] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[06:58:24] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:14] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:12:24] RECOVERY - NTP on mw1007 is OK: NTP OK: Offset 0.002798438072 secs
[08:06:53] PROBLEM - NTP on mw1007 is CRITICAL: NTP CRITICAL: No response from NTP server
[08:13:35] PROBLEM - puppet last run on mw1016 is CRITICAL: CRITICAL: Puppet has 2 failures
[08:13:41] (PS1) Ottomata: Reinstall spare analytics1017 as Jessie for mw dev summit Jupyter hacking [puppet] - https://gerrit.wikimedia.org/r/262698
[08:14:22] (CR) Ottomata: [C: 2 V: 2] Reinstall spare analytics1017 as Jessie for mw dev summit Jupyter hacking [puppet] - https://gerrit.wikimedia.org/r/262698 (owner: Ottomata)
[08:15:14] operations, Labs, wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#1917543 (demon) If somebody can give me the rename user rights on wikitech I can do this. As I've said before, that's the piece I lack.
[08:30:24] RECOVERY - nutcracker port on mw1007 is OK: TCP OK - 0.000 second response time on port 11212
[08:36:53] PROBLEM - nutcracker port on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:37:25] RECOVERY - configured eth on mw1007 is OK: OK - interfaces up
[08:37:53] RECOVERY - nutcracker process on mw1007 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:37:53] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient
[08:37:53] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:38:05] RECOVERY - Disk space on mw1007 is OK: DISK OK
[08:38:13] RECOVERY - NTP on mw1007 is OK: NTP OK: Offset 0.003093600273 secs
[08:38:14] RECOVERY - DPKG on mw1007 is OK: All packages OK
[08:38:44] RECOVERY - RAID on mw1007 is OK: OK: no RAID installed
[08:38:44] RECOVERY - nutcracker port on mw1007 is OK: TCP OK - 0.000 second response time on port 11212
[08:39:13] RECOVERY - SSH on mw1007 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[08:42:23] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:42:35] PROBLEM - salt-minion processes on tin is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:54:23] PROBLEM - salt-minion processes on mira is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:08:51] RECOVERY - salt-minion processes on tin is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:17:47] (CR) Hashar: base: fix missing whitespaces in check_conntrack.py (1 comment) [puppet] - https://gerrit.wikimedia.org/r/262593 (owner: Hashar)
[09:19:42] RECOVERY - salt-minion processes on mira is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:24:51] operations, Beta-Cluster-Infrastructure, Labs: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1917550 (hashar) NEW
[09:25:30] operations, Beta-Cluster-Infrastructure, Labs: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1917557 (hashar)
[09:28:43] operations, Beta-Cluster-Infrastructure, Labs: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1917559 (hashar) Command to mass check: `root@deployment-salt:~ # salt -v '*' file.find /etc/apt grep=webproxy`
[09:35:39] operations, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Labs: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1917560 (hashar) The`apt` puppet class has a `$use_proxy` parameter that would create the fi...
[09:35:51] RECOVERY - puppet last run on mw1016 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[09:35:59] operations, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Labs: deployment-mediawiki03 : apt broken trying to reach out webproxy.eqiad.wmnet - https://phabricator.wikimedia.org/T122953#1917562 (hashar) Open>Resolved a:hashar
[09:54:52] PROBLEM - salt-minion processes on bohrium is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:56:12] PROBLEM - salt-minion processes on cygnus is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:06:42] RECOVERY - salt-minion processes on cygnus is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:15:41] PROBLEM - RAID on analytics1017 is CRITICAL: Timeout while attempting connection
[10:15:51] PROBLEM - Disk space on analytics1017 is CRITICAL: Timeout while attempting connection
[10:16:02] PROBLEM - NTP on analytics1017 is CRITICAL: NTP CRITICAL: No response from NTP server
[10:16:22] PROBLEM - configured eth on analytics1017 is CRITICAL: Timeout while attempting connection
[10:16:41] PROBLEM - dhclient process on analytics1017 is CRITICAL: Timeout while attempting connection
[10:17:02] PROBLEM - salt-minion processes on analytics1017 is CRITICAL: Timeout while attempting connection
[10:17:02] PROBLEM - puppet last run on analytics1017 is CRITICAL: Timeout while attempting connection
[10:17:41] PROBLEM - DPKG on analytics1017 is CRITICAL: Timeout while attempting connection
[10:17:41] PROBLEM - Check size of conntrack table on analytics1017 is CRITICAL: Timeout while attempting connection
[10:21:51] RECOVERY - salt-minion processes on bohrium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:23:43] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[10:26:22] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [1000.0]
[10:32:22] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0; xe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]
[10:39:11] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 205, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]
[10:40:41] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0
[10:41:11] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 207, down: 0, dormant: 0, excluded: 0, unused: 0
[10:46:41] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:47:12] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:47:21] PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: puppet fail
[11:14:21] RECOVERY - puppet last run on wtp2020 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[13:05:19] operations, MediaWiki-Maintenance-scripts: WikiExporter does not respect groupLoadsByDB[$wiki]['dump'] - https://phabricator.wikimedia.org/T43668#1917699 (TTO) This is an old task. Is it still relevant in the current context?
[14:51:03] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0]
[15:03:42] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[15:18:30] (CR) BBlack: [C: 1] interface: lint interface-rps.py [puppet] - https://gerrit.wikimedia.org/r/262595 (owner: Hashar)
[15:19:02] (CR) BBlack: [C: 1] varnish: lint varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/262597 (owner: Hashar)
[15:34:30] bblack: I should have made them standalone changes :/
[15:35:37] sometimes I wish gerrit had a button for that :)
[15:37:10] usually my quick hack if I made 3-4 independent changes in series on my production checkout, is just to do a git pull -r, then iterate through "rebase -i; move change N to the first position; git push origin N:refs/for/production;", then they all are independent.
[16:11:54] (Abandoned) Thcipriani: Ensure apt update before sql libraries install [puppet/mariadb] - https://gerrit.wikimedia.org/r/195779 (https://phabricator.wikimedia.org/T91545) (owner: Thcipriani)
[16:21:20] operations, Labs, wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#1918059 (bd808) >>! In T85913#1917543, @demon wrote: > If somebody can give me the rename user rights on wikitech I can do this. As I've said before, th...
[16:34:29] bblack: Gerrit has a cherry-pick button, let you take a patch on tip of production branch (and thus break the dependency chain)
[16:34:55] ah I guess that's the button then :)
[16:35:36] I should do that for all those patches
[16:35:48] I have sent them as a chain because the last one enable pep8 on the whole repo
[16:36:12] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [5000000.0]
[16:42:31] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[16:47:12] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:48:31] (PS2) Hashar: varnish: lint varnishlog.py [puppet] - https://gerrit.wikimedia.org/r/262597
[16:48:41] (PS2) Hashar: interface: lint interface-rps.py [puppet] - https://gerrit.wikimedia.org/r/262595
[16:48:48] (PS2) Hashar: base: fix missing whitespaces in check_conntrack.py [puppet] - https://gerrit.wikimedia.org/r/262593
[16:48:53] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[16:48:55] (PS2) Hashar: toollabs: lint genpp.py [puppet] - https://gerrit.wikimedia.org/r/262596
[16:49:04] (PS2) Hashar: elasticsearch: lint check_elasticsearch.py [puppet] - https://gerrit.wikimedia.org/r/262594
[16:53:02] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:53:22] PROBLEM - puppet last run on mw2053 is CRITICAL: CRITICAL: puppet fail
[16:53:22] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:00:04] Deploy window Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160106T1700)
[17:09:32] (CR) Lokal Profil: "For l10n:" [dumps/dcat] - https://gerrit.wikimedia.org/r/262422 (https://phabricator.wikimedia.org/T118397) (owner: Lokal Profil)
[17:15:14] operations, Dumps-Generation, hardware-requests: determine hardware needs for dumps in eqiad (boxes out of warranty, capacity planning) - https://phabricator.wikimedia.org/T118154#1918118 (RobH) Please note the quote for 3 new boxes is pending @mark's approval on task T120126
[17:17:07] operations, netops: Orange S.A. searches a contact at WMF for tests - https://phabricator.wikimedia.org/T122293#1918120 (Sylvain_WMFr) stalled>Resolved a:Sylvain_WMFr Thanks! I sent both email addresses to Orange.
[17:22:32] RECOVERY - puppet last run on mw2053 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:27:21] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[17:28:52] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[17:31:08] operations, Analytics-Backlog, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1918133 (Ottomata) Yeah! @nuria, how do we figure out if we should fix this, or drop IPs? It seems some want it dropped, but research needs them. Should we just fix, or is t...
[17:41:41] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:42:11] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[17:45:01] (PS1) RobH: adding robh yubikey [puppet] - https://gerrit.wikimedia.org/r/262720
[17:46:34] (CR) RobH: [V: 2] adding robh yubikey [puppet] - https://gerrit.wikimedia.org/r/262720 (owner: RobH)
[17:46:45] (CR) RobH: [C: 2] adding robh yubikey [puppet] - https://gerrit.wikimedia.org/r/262720 (owner: RobH)
[17:50:08] (CR) Giuseppe Lavagetto: [C: 2] Checkout and then rebase instead of cherry-pick [software/puppet-compiler] - https://gerrit.wikimedia.org/r/248634 (owner: Alex Monk)
[17:50:35] (PS4) Giuseppe Lavagetto: Checkout and then rebase instead of cherry-pick [software/puppet-compiler] - https://gerrit.wikimedia.org/r/248634 (owner: Alex Monk)
[17:51:37] (CR) Giuseppe Lavagetto: [V: 2] Checkout and then rebase instead of cherry-pick [software/puppet-compiler] - https://gerrit.wikimedia.org/r/248634 (owner: Alex Monk)
[17:52:39] operations, Analytics-Backlog, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1918172 (Nuria) @ottomata: I think it will be unfair to remove the IP if research is relying on it for near term projects, so let's fix it. salted IPs should be getting purged...
[17:52:51] operations, Analytics-Kanban, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1918173 (Nuria)
[17:56:40] (CR) GWicke: "> (1) Unless the rest API URLs *never* provide content that varies per user, this would be incorrect." [puppet] - https://gerrit.wikimedia.org/r/261662 (https://phabricator.wikimedia.org/T122673) (owner: GWicke)
[18:00:30] (CR) GWicke: "> (2) There's a more-general solution pending at: https://gerrit.wikimedia.org/r/#/c/259882" [puppet] - https://gerrit.wikimedia.org/r/261662 (https://phabricator.wikimedia.org/T122673) (owner: GWicke)
[18:01:35] operations, Traffic, Patch-For-Review, Performance: Varnish apparently unconditionally varies on session cookies - https://phabricator.wikimedia.org/T122673#1918190 (GWicke) More general patch at https://gerrit.wikimedia.org/r/#/c/259882. @faidon @bblack, could you review / deploy that soon?
[18:04:54] operations, Traffic, Patch-For-Review, Performance: Varnish apparently unconditionally varies on session cookies - https://phabricator.wikimedia.org/T122673#1918193 (Physikerwelt) Is this related to https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Mathematics#Math_rendering_suddenly_very_slow
[18:05:09] operations, hardware-requests: reclaim rubidium to spares - https://phabricator.wikimedia.org/T118213#1918195 (RobH)
[18:06:03] operations, Analytics-Kanban, HTTPS: EventLogging sees too few distinct client IPs - https://phabricator.wikimedia.org/T119144#1918199 (Ottomata) Ok, I think to fix this, we just need to change the varnishkafka format string to send the X-Client-IP header in place of the %h host IP header.
[18:07:04] operations: add to alias - https://phabricator.wikimedia.org/T122927#1918200 (eliza)
[18:07:22] operations: add to alias - https://phabricator.wikimedia.org/T122927#1918202 (Krenair) You can click 'Edit task' near the top right hand corner, and it will give you a page like the task creation form. Click in the projects input, begin to type 'operations' and the operations suggestion option will appear. Se...
[18:07:23] (PS6) Madhuvishy: [WIP] wikimetrics: Puppet module for wikimetrics [puppet] - https://gerrit.wikimedia.org/r/260687
[18:07:25] operations: add to alias - https://phabricator.wikimedia.org/T122927#1918203 (eliza) Krenair - just associated this to Ops
[18:10:21] operations, hardware-requests: reclaim rubidium to spares - https://phabricator.wikimedia.org/T118213#1918215 (RobH)
[18:10:43] (CR) Merlijn van Deen: [C: 1] toollabs: lint genpp.py [puppet] - https://gerrit.wikimedia.org/r/262596 (owner: Hashar)
[18:13:01] (PS1) RobH: reclaim rubidium to spares [puppet] - https://gerrit.wikimedia.org/r/262721
[18:14:28] (PS1) RobH: reclaim rubidium to spares [dns] - https://gerrit.wikimedia.org/r/262722
[18:14:54] (CR) RobH: [C: 2] reclaim rubidium to spares [puppet] - https://gerrit.wikimedia.org/r/262721 (owner: RobH)
[18:15:13] (CR) RobH: [C: 2] reclaim rubidium to spares [dns] - https://gerrit.wikimedia.org/r/262722 (owner: RobH)
[18:20:07] operations, hardware-requests: reclaim rubidium to spares - https://phabricator.wikimedia.org/T118213#1918219 (RobH)
[18:20:40] operations, hardware-requests: reclaim rubidium to spares - https://phabricator.wikimedia.org/T118213#1918223 (RobH) a:RobH>Cmjohnson all but the disk wipe and adding back to spares sheet have been done. I've added in ops-eqiad, and assigned this to @cmjohnson for completion.
[18:20:54] operations, ops-eqiad, hardware-requests: reclaim rubidium to spares - https://phabricator.wikimedia.org/T118213#1918226 (RobH)
[18:20:59] (CR) EBernhardson: [C: 1] elasticsearch: lint check_elasticsearch.py [puppet] - https://gerrit.wikimedia.org/r/262594 (owner: Hashar)
[18:22:48] operations, hardware-requests: eqiad: (2) servers request for ORES - https://phabricator.wikimedia.org/T119598#1918231 (RobH) a:akosiaris>mark Just to summarize, this task is now assigned to @mark and awaits his approval for allocation of: WMF4577 & WMF4578: Identical systems, Dell PowerEdge R420,...
[18:23:29] operations: Add Marc Brent to fr-all - https://phabricator.wikimedia.org/T122972#1918241 (JGulingan) NEW
[18:24:37] (CR) Hoo man: "Any chance we could change this from using intuition to use the l10n bot directly (Like for MW extensions)?" [dumps/dcat] - https://gerrit.wikimedia.org/r/262422 (https://phabricator.wikimedia.org/T118397) (owner: Lokal Profil)
[18:28:53] operations, Parsing-Team, hardware-requests: Dedicated server for running Parsoid's roundtrip tests to get reliable parse latencies and use as perf. benchmarking tests - https://phabricator.wikimedia.org/T116090#1918281 (RobH) Open>declined a:RobH So this request has been pending more informat...
[18:31:49] operations, Sentry, hardware-requests: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138#1918304 (RobH) This seems like it would indeed be better served by a Ganeti VM. I'll be removing #hardware-requests and adding #vm-requests. Please note that since this is still pending pup...
[18:32:10] 6operations, 10Sentry, 10vm-requests: Procure hardware for Sentry - https://phabricator.wikimedia.org/T93138#1918305 (10RobH) [18:37:18] 6operations, 10Dumps-Generation, 10hardware-requests: determine hardware needs for dumps in eqiad (boxes out of warranty, capacity planning) - https://phabricator.wikimedia.org/T118154#1918337 (10RobH) p:5High>3Normal [18:46:31] (03CR) 10Nikerabbit: "Do you mean https://github.com/Krinkle/intuition/issues/45 ?" [dumps/dcat] - 10https://gerrit.wikimedia.org/r/262422 (https://phabricator.wikimedia.org/T118397) (owner: 10Lokal Profil) [18:47:49] (03PS1) 10Andrew Bogott: Run puppet twice on new instance creation, with apt-get in between. [puppet] - 10https://gerrit.wikimedia.org/r/262726 [18:49:53] (03CR) 10Andrew Bogott: [C: 032] Run puppet twice on new instance creation, with apt-get in between. [puppet] - 10https://gerrit.wikimedia.org/r/262726 (owner: 10Andrew Bogott) [18:51:22] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: puppet fail [18:54:41] (03PS1) 10Andrew Bogott: Labs Jessie: Run puppet twice on new instance creation, with apt-get in between. [puppet] - 10https://gerrit.wikimedia.org/r/262731 [18:54:44] (03Abandoned) 10EBernhardson: Turn on the rest of the top 10 wikis (by size) for ES labs replica [mediawiki-config] - 10https://gerrit.wikimedia.org/r/255397 (owner: 10EBernhardson) [18:56:02] (03CR) 10Hoo man: "I created https://phabricator.wikimedia.org/T122975 for tracking the progress on having automated localization updates." [dumps/dcat] - 10https://gerrit.wikimedia.org/r/262422 (https://phabricator.wikimedia.org/T118397) (owner: 10Lokal Profil) [18:56:28] (03PS2) 10EBernhardson: [elastic] Fix bad method call in diamond collection [puppet] - 10https://gerrit.wikimedia.org/r/259778 [18:59:38] it turns out we have a bad method call in the diamond collector for elasticsearch, anyone willing to review and deploy a simple puppet patch for it? 
the result of the bug is that the master node isn't collecting all the stats it should
[19:03:24] ottomata: YuviPanda: hey! are you at the office? Where will you be working on the jupyter stuff? :)
[19:04:27] operations: add to alias - https://phabricator.wikimedia.org/T122927#1918389 (eliza) p:Normal>High
[19:07:16] operations: add to alias - https://phabricator.wikimedia.org/T122927#1918400 (eliza) One other request from user is that she would like this to be an urgent request. Updated the priority on this. Thank you,
[19:07:55] AndyRussG: yes! but probably will be working on it this afternoon
[19:08:07] ottomata: ah K cool
[19:08:53] ottomata: me and maybe others from Fr-tech might be working on some CentralNotice metrics but I thought we could at least set up in IRL proximity so we could interrupt and bother you more easily!
[19:12:29] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1918419 (EBernhardson) NEW
[19:14:25] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1918447 (Krenair) Katie Filbert or Katie Horn or...?
[19:18:31] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[19:20:12] operations, Wikimedia-DNS: Set up compat redirect stats.wikipedia.org -> stats.wikimedia.org - https://phabricator.wikimedia.org/T21353#1918493 (RobH) a:RobH>None this is over a year old and is somehow assigned to me (likely from the rt import?) Since its a case of old links now being invalid for w...
[19:20:31] operations, Wikimedia-DNS: Set up compat redirect stats.wikipedia.org -> stats.wikimedia.org - https://phabricator.wikimedia.org/T21353#1918498 (RobH) Open>declined a:RobH As such, I am declining this.
[19:21:11] (CR) Krinkle: "How is that related to anything?" [mediawiki-config] - https://gerrit.wikimedia.org/r/189148 (https://phabricator.wikimedia.org/T85947) (owner: Legoktm)
[19:22:03] operations, ops-eqiad, Patch-For-Review: rack/setup pc1004-1006 - https://phabricator.wikimedia.org/T121888#1918514 (Cmjohnson)
[19:22:58] (CR) Krinkle: [C: -1] "The job needs to expand submodules. event-schemas is already specified in gitmodules. The job is failing because there is a PHPUnit test l" [mediawiki-config] - https://gerrit.wikimedia.org/r/189148 (https://phabricator.wikimedia.org/T85947) (owner: Legoktm)
[19:27:17] operations, ops-eqiad: Wipe and Remove from rack cp1037-1040 - https://phabricator.wikimedia.org/T83553#1918535 (Cmjohnson)
[19:34:26] moritzm: https://gerrit.wikimedia.org/r/#/c/262739/
[19:49:38] (PS1) Alexandros Kosiaris: WIP: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742
[19:52:28] (CR) jenkins-bot: [V: -1] WIP: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[19:57:22] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: puppet fail
[19:58:06] paravoid: Maybe you can list some of your concerns regarding repo-auth mode at https://phabricator.wikimedia.org/T121913
[20:00:45] (PS1) Muehlenhoff: Use paged searches in ldaplist [puppet] - https://gerrit.wikimedia.org/r/262745
[20:01:20] (PS1) Gergő Tisza: Add output plugin for Sentry [software/logstash/plugins] - https://gerrit.wikimedia.org/r/262747 (https://phabricator.wikimedia.org/T85239)
[20:01:30] #wmhack for upcoming sessions
[20:05:51] (PS1) Muehlenhoff: Extend kernel module blacklist [puppet] - https://gerrit.wikimedia.org/r/262748
[20:16:07] (PS2) Ori.livneh: WIP: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[20:17:20] (CR) jenkins-bot: [V: -1] WIP: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[20:17:46] (PS3) Ori.livneh: WIP: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[20:18:15] (CR) Ori.livneh: "PS2 / PS3: fixes for lint issues flagged by rubocop" [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[20:21:26] (CR) Andrew Bogott: "Just to confirm -- the paged search still gets us all the entries, right? Not just the first page of them?" [puppet] - https://gerrit.wikimedia.org/r/262745 (owner: Muehlenhoff)
[20:22:32] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[20:36:29] (PS2) Muehlenhoff: Extend kernel module blacklist [puppet] - https://gerrit.wikimedia.org/r/262748
[20:39:19] (CR) Muehlenhoff: [C: 2 V: 2] Extend kernel module blacklist [puppet] - https://gerrit.wikimedia.org/r/262748 (owner: Muehlenhoff)
[20:45:59] operations: add user to strategicpartnerships@ - https://phabricator.wikimedia.org/T122989#1918729 (eliza) NEW
[20:50:52] (PS1) Muehlenhoff: Update to 3.19.8-ckt11 [debs/linux] - https://gerrit.wikimedia.org/r/262770
[20:51:33] operations, Phabricator, Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1918744 (RobH)
[20:52:56] (CR) Muehlenhoff: [C: 2 V: 2] Update to 3.19.8-ckt11 [debs/linux] - https://gerrit.wikimedia.org/r/262770 (owner: Muehlenhoff)
[20:57:14] (PS1) Muehlenhoff: Also drop KVM patches from series file [debs/linux] - https://gerrit.wikimedia.org/r/262814
[20:57:34] (CR) Muehlenhoff: [C: 2 V: 2] Also drop KVM patches from series file [debs/linux] - https://gerrit.wikimedia.org/r/262814 (owner: Muehlenhoff)
[21:03:19] operations: track down and power off spare systems hitting dhcp - https://phabricator.wikimedia.org/T122990#1918770 (RobH) NEW a:RobH
[21:16:16] (PS1) RobH: db2033 had wrong mac in dhcp lease file [puppet] - https://gerrit.wikimedia.org/r/262816
[21:17:55] (PS2) RobH: db2033 had wrong mac in dhcp lease file [puppet] - https://gerrit.wikimedia.org/r/262816
[21:19:34] (CR) RobH: [C: 2] db2033 had wrong mac in dhcp lease file [puppet] - https://gerrit.wikimedia.org/r/262816 (owner: RobH)
[21:23:49] operations, Traffic, Pybal: Pybal 1.12 has issues with executing ipvsadm commands - https://phabricator.wikimedia.org/T118948#1918835 (Joe) Open>Resolved a:Joe
[21:32:46] (PS3) Filippo Giunchedi: elasticsearch: lint check_elasticsearch.py [puppet] - https://gerrit.wikimedia.org/r/262594 (owner: Hashar)
[21:32:56] (CR) Filippo Giunchedi: [C: 2 V: 2] elasticsearch: lint check_elasticsearch.py [puppet] - https://gerrit.wikimedia.org/r/262594 (owner: Hashar)
[21:35:38] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1918904 (EBernhardson) Katie Filbert, aka aude
[21:35:57] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1918912 (EBernhardson)
[21:42:07] operations, ops-codfw: setup/install/deploy db2033 - https://phabricator.wikimedia.org/T122998#1918926 (RobH) NEW a:Papaul
[22:04:16] (PS3) Muehlenhoff: Remove now obsolete OpenDJ server module and related templates/files [puppet] - https://gerrit.wikimedia.org/r/260542
[22:04:50] (CR) Muehlenhoff: [C: 2 V: 2] Remove now obsolete OpenDJ server module and related templates/files [puppet] - https://gerrit.wikimedia.org/r/260542 (owner: Muehlenhoff)
[22:06:21] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1918970 (aude)
[22:09:50] Could you someone bounce deployment-mediawiki03? It's not responding, and it's keeping security scans from completing.
[22:11:10] dapatrick, that's in labs...
[22:11:11] but sure
[22:11:22] dapatrick, I can ssh in
[22:11:28] Thanks!
[22:11:42] it's not responding to http requests?
[22:12:10] Nope. HTTP connections to port 80 are timing out.
[22:13:49] restarted apache, no luck
[22:14:15] operations, Deployment-Systems, Traffic: Varnish cache busting desired for /static/$VERSION/ resources which change within the lifetime of a branch - https://phabricator.wikimedia.org/T99096#1918985 (mmodell)
[22:14:17] operations, Performance-Team, Release-Engineering-Team, Traffic, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1918983 (mmodell) Open>Resolved a:mmodell
[22:14:19] Hmph. Odd.
[22:14:51] dapatrick, Krenair: -03 has been kinda flappy today. I know thcipriani was half looking at it earlier
[22:16:01] Okay, thanks.
[22:16:01] deployment-mediawiki03 responds if I connect to it from tools-login
[22:16:03] hhvm restart that seemed to work earlier, FWIW.
[22:16:24] Ah, right. Application, not web server.
[22:26:45] thcipriani: If that hhvm restart brought it back earlier, it's down again now.
[22:27:50] (PS4) Alexandros Kosiaris: Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742
[22:28:45] (CR) jenkins-bot: [V: -1] Puppet provider for scap3 [puppet] - https://gerrit.wikimedia.org/r/262742 (owner: Alexandros Kosiaris)
[22:30:04] dapatrick: looking now.
[22:30:23] Puppet, MediaWiki-Vagrant, Easy: MediaWiki-Vagrant guest OS clock gets out of sync - https://phabricator.wikimedia.org/T116507#1919024 (Mattflaschen)
[22:31:22] thcipriani: Thanks!
[22:34:56] (PS1) EBernhardson: [elasticsearch] Record per-node fetch timing [puppet] - https://gerrit.wikimedia.org/r/262828
[22:43:09] operations: onboarding Filippo: Icinga & paging - https://phabricator.wikimedia.org/T83743#1919070 (fgiunchedi)
[23:03:03] !log switched restbase1009 to node 4.2 for testing, and restarted restbase; see https://phabricator.wikimedia.org/T107762
[23:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:04:54] Blocked-on-Operations, operations, RESTBase, Services: Switch RESTBase to use Node.js 4.2 - https://phabricator.wikimedia.org/T107762#1919104 (GWicke) @moritzmuehlenhoff: Thank you for the packages! I installed those on restbase1009 and restarted restbase. I'll keep an eye on it.
[23:10:01] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919116 (RobH) @aude: All access changes require that we check that the person accessing the cluster has signed L3. It seems you have not signed the new phabricator copy, please re...
[23:10:33] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919117 (RobH) a:Deskana So there is a bit of confusion on who approves WMDE access requests. The most relevant recent example I can find is https://phabricator.wikimedia.org/T1...
[23:14:12] Ops-Access-Requests, operations: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919124 (Deskana) a:Deskana>None @aude is the point engineer working on search for Wikidata, and analysis of existing search logs is a pretty important part of that. So, appro...
[23:15:16] (PS1) RobH: grant aude access to analytics-privatedata-users [puppet] - https://gerrit.wikimedia.org/r/262836 (https://phabricator.wikimedia.org/T122977)
[23:18:02] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: puppet fail
[23:18:05] Ops-Access-Requests, operations, Patch-For-Review: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919131 (RobH) a:RobH Just chatted with @aude. This is all ready go to and merge now on Monday (3 day wait). I'll keep it assigned to myself and process it...
[23:18:11] Ops-Access-Requests, operations, Patch-For-Review: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919133 (RobH) Open>stalled
[23:18:48] Ops-Access-Requests, operations, Patch-For-Review: Grant katie access to hive tables from stat1002 - https://phabricator.wikimedia.org/T122977#1919136 (Deskana) Thank you, @RobH!
[23:22:16] (PS3) Rush: [elastic] Fix bad method call in diamond collection [puppet] - https://gerrit.wikimedia.org/r/259778 (owner: EBernhardson)
[23:23:31] (CR) Rush: [C: 2] [elastic] Fix bad method call in diamond collection [puppet] - https://gerrit.wikimedia.org/r/259778 (owner: EBernhardson)
[23:26:42] (PS1) Andrew Bogott: Enable memory cgroups for labs debian instances [puppet] - https://gerrit.wikimedia.org/r/262838 (https://phabricator.wikimedia.org/T122734)
[23:26:44] (PS1) Andrew Bogott: Labs jessie image: override the debian grub with a copy of our current puppetized grub defaults. [puppet] - https://gerrit.wikimedia.org/r/262839 (https://phabricator.wikimedia.org/T122734)
[23:27:38] (PS2) Andrew Bogott: Enable memory cgroups for labs debian instances [puppet] - https://gerrit.wikimedia.org/r/262838 (https://phabricator.wikimedia.org/T122734)
[23:27:40] (PS2) Andrew Bogott: Labs jessie image: override the debian grub with a copy of our current puppetized grub defaults. [puppet] - https://gerrit.wikimedia.org/r/262839 (https://phabricator.wikimedia.org/T122734)
[23:27:42] (PS2) Andrew Bogott: Labs Jessie: Run puppet twice on new instance creation, with apt-get in between. [puppet] - https://gerrit.wikimedia.org/r/262731
[23:30:06] Who knows about augeas and /etc/default? I could use a reviewer.
[23:30:21] PROBLEM - Check size of conntrack table on analytics1021 is CRITICAL: Connection refused by host
[23:30:32] PROBLEM - dhclient process on analytics1021 is CRITICAL: Connection refused by host
[23:30:50] Hercules? (:
[23:31:07] (CR) Andrew Bogott: "This works, but it's my first ever augeas change and I don't entirely understand what I'm doing. The specifics are cribbed from modules/b" [puppet] - https://gerrit.wikimedia.org/r/262838 (https://phabricator.wikimedia.org/T122734) (owner: Andrew Bogott)
[23:31:24] PROBLEM - DPKG on analytics1021 is CRITICAL: Timeout while attempting connection
[23:31:24] PROBLEM - puppet last run on analytics1021 is CRITICAL: Timeout while attempting connection
[23:33:16] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100%
[23:45:35] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[23:49:35] (CR) Andrew Bogott: [C: 2] Labs Jessie: Run puppet twice on new instance creation, with apt-get in between. [puppet] - https://gerrit.wikimedia.org/r/262731 (owner: Andrew Bogott)