[00:00:17] (CR) Dzahn: "but if you wanted to get the notifications.. currently it's: members smalyshev,irc-wikidata,jzerebecki,hoo" [puppet] - https://gerrit.wikimedia.org/r/288706 (owner: JanZerebecki)
[00:01:32] (CR) Smalyshev: "yeah so maybe it should be that admins group?" [puppet] - https://gerrit.wikimedia.org/r/288706 (owner: JanZerebecki)
[00:06:14] PROBLEM - HTTPS on ununpentium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: SSL connect attempt failed with unknown error error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
[00:06:48] eh, ok, yea, i know why. it's not used yet
[00:07:54] !log catrope@tin Synchronized php-1.28.0-wmf.1/includes/api/ApiMain.php: Fix missing parameter for OutputPageCheckLastModified hook (duration: 00m 32s)
[00:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:11:01] (PS1) Dzahn: add gehel to wdqs icinga contact group [puppet] - https://gerrit.wikimedia.org/r/288735
[00:11:05] ACKNOWLEDGEMENT - HTTPS on ununpentium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: SSL connect attempt failed with unknown error error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol daniel_zahn migrating to behind misc-web, to it wont speak https anymore
[00:11:55] (CR) Dzahn: "https://gerrit.wikimedia.org/r/#/c/288735/" [puppet] - https://gerrit.wikimedia.org/r/288706 (owner: JanZerebecki)
[00:13:08] (PS2) Dzahn: add gehel to wdqs icinga contact group [puppet] - https://gerrit.wikimedia.org/r/288735
[00:14:51] Operations, Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2294442 (Dzahn) a:Dzahn
[00:19:13] Operations, Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2294444 (Dzahn) I have a 4.2.8 RT running on jessie and behind misc-web. I could login and see existing tickets.
There is only a minimal schema change that i have not applied yet. I made a backup of the dat...
[00:22:24] RECOVERY - HTTPS on ununpentium is OK: SSL OK - Certificate rt.wikimedia.org valid until 2016-07-27 02:03:35 +0000 (expires in 74 days)
[00:26:20] (PS1) Alex Monk: Add my (Alex M's) key to root if realm is labtest [labs/private] - https://gerrit.wikimedia.org/r/288736
[00:28:39] (PS1) Dzahn: RT: install different Apache config on new server [puppet] - https://gerrit.wikimedia.org/r/288737 (https://phabricator.wikimedia.org/T119112)
[00:30:35] (PS2) Dzahn: RT: install different Apache config on new server [puppet] - https://gerrit.wikimedia.org/r/288737 (https://phabricator.wikimedia.org/T119112)
[00:32:50] (CR) Smalyshev: [C: 1] add gehel to wdqs icinga contact group [puppet] - https://gerrit.wikimedia.org/r/288735 (owner: Dzahn)
[00:33:28] (CR) Smalyshev: "May be worth waiting for Gehel to confirm he wants to be notified :)" [puppet] - https://gerrit.wikimedia.org/r/288735 (owner: Dzahn)
[00:34:42] (CR) Dzahn: "yes, absolutely, it was just the easiest reminder to upload it" [puppet] - https://gerrit.wikimedia.org/r/288735 (owner: Dzahn)
[00:37:41] (PS3) Dzahn: RT: install different Apache config on new server [puppet] - https://gerrit.wikimedia.org/r/288737 (https://phabricator.wikimedia.org/T119112)
[00:39:38] (CR) Dzahn: [C: 2] "no-op on current server http://puppet-compiler.wmflabs.org/2797/" [puppet] - https://gerrit.wikimedia.org/r/288737 (https://phabricator.wikimedia.org/T119112) (owner: Dzahn)
[00:46:34] PROBLEM - HTTPS on ununpentium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: SSL connect attempt failed with unknown error error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
[00:47:24] i'm fixing that
[00:47:33] (PS1) Dzahn: RT: SSL setup only on precise [puppet] - https://gerrit.wikimedia.org/r/288739
[00:53:52] (PS2) Dzahn: RT:
SSL setup only on precise [puppet] - https://gerrit.wikimedia.org/r/288739 (https://phabricator.wikimedia.org/T119112)
[00:57:42] (PS3) Dzahn: RT: SSL setup only on precise [puppet] - https://gerrit.wikimedia.org/r/288739
[01:00:02] (CR) Dzahn: [C: 2] "no-op on existing server http://puppet-compiler.wmflabs.org/2799/" [puppet] - https://gerrit.wikimedia.org/r/288739 (owner: Dzahn)
[01:01:45] ACKNOWLEDGEMENT - HTTPS on ununpentium is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: SSL connect attempt failed with unknown error error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol daniel_zahn removed by https://gerrit.wikimedia.org/r/#/c/288739/
[01:04:12] (PS1) Dzahn: gitblit: remove exec permission from unit file [puppet] - https://gerrit.wikimedia.org/r/288740
[01:07:33] (CR) Dzahn: [C: 2] gitblit: remove exec permission from unit file [puppet] - https://gerrit.wikimedia.org/r/288740 (owner: Dzahn)
[01:24:44] !log aaron@tin Synchronized php-1.28.0-wmf.1/includes/api/ApiStashEdit.php: f33ed7ae239 (duration: 00m 36s)
[01:25:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:28:04] wait we're at ununpentium now?!
damn
[02:19:25] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.1) (duration: 08m 06s)
[02:19:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:08] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat May 14 02:28:07 UTC 2016 (duration 8m 42s)
[02:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:32:55] (PS3) Andrew Bogott: Glance: Fix the glance image backup cron [puppet] - https://gerrit.wikimedia.org/r/288621
[02:35:22] (PS1) Andrew Bogott: Detect labsproject for labtest instances [puppet] - https://gerrit.wikimedia.org/r/288743
[02:36:22] (CR) jenkins-bot: [V: -1] Detect labsproject for labtest instances [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[02:41:13] (CR) Alex Monk: [C: -1] "'||', not 'or'" [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[03:13:53] Operations, Labs, Tool-Labs, Patch-For-Review: toolserver.org certificate to expire 2016-06-30 - https://phabricator.wikimedia.org/T134798#2294608 (yuvipanda) It's in relic.toolserver-legacy.eqiad.wmflabs
[03:23:10] (PS1) Alex Monk: (Attempt to) puppetise labtest pdns config difference for TLD and private IP range [puppet] - https://gerrit.wikimedia.org/r/288744
[03:28:44] (PS2) Alex Monk: Fix labtest pdns config for different TLD and private IP range [puppet] - https://gerrit.wikimedia.org/r/288744
[06:31:14] PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:24] PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:44] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: puppet fail
[06:31:53] PROBLEM - puppet last run on eventlog2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:34] PROBLEM - puppet last run on cp2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:03] PROBLEM - puppet last run on mw1135 is
CRITICAL: CRITICAL: Puppet has 1 failures
[06:50:25] RECOVERY - MariaDB Slave Lag: s2 on dbstore2002 is OK: OK slave_sql_lag Replication lag: 0.24 seconds
[06:56:43] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:57:03] RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:57:14] RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:34] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:35] RECOVERY - puppet last run on eventlog2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:58:24] RECOVERY - puppet last run on cp2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:13:24] Operations: eqiad: 1 hardware access request for labs on real hardware (mwoffliner) - https://phabricator.wikimedia.org/T117095#2294665 (Kelson) @yuvipanda How does the implementation of this task looks like? This need/request is always really valid.
[07:20:25] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: puppet fail
[07:48:34] RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:58:14] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[09:22:35] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[12:02:53] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: puppet fail
[12:28:03] (CR) Andrew Bogott: [C: 1] "This looks fine to me but I'd like a second opinion about if there's a different/better way to accomplish this" [labs/private] - https://gerrit.wikimedia.org/r/288736 (owner: Alex Monk)
[12:28:35] (CR) Andrew Bogott: "|| doesn't work and 'or' does." [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[12:28:43] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[12:44:59] (CR) Andrew Bogott: [C: 2] Fix labtest pdns config for different TLD and private IP range [puppet] - https://gerrit.wikimedia.org/r/288744 (owner: Alex Monk)
[12:57:34] (CR) Andrew Bogott: "Well, I've read the docs and I'm more confused now.
Anyway I can get the behavior I want with parens" [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[12:57:58] Operations, Performance-Team, Services, Availability: Consider cassandra for session storage (with SSL) - https://phabricator.wikimedia.org/T134811#2294830 (aaron)
[12:58:50] (PS2) Andrew Bogott: Detect labsproject for labtest instances [puppet] - https://gerrit.wikimedia.org/r/288743
[13:02:58] (CR) Yuvipanda: [C: 2 V: 2] "Let's do post-merge review on this one :D" [software/kubernetes] - https://gerrit.wikimedia.org/r/288600 (owner: Yuvipanda)
[13:03:13] (PS2) Yuvipanda: tools: Bump k8s version [puppet] - https://gerrit.wikimedia.org/r/288719
[13:03:21] (CR) Yuvipanda: [C: 2 V: 2] tools: Bump k8s version [puppet] - https://gerrit.wikimedia.org/r/288719 (owner: Yuvipanda)
[13:05:34] (CR) Merlijn van Deen: "An alternative could be via hiera, using "passwords::root::extra_keys" (see https://wikitech.wikimedia.org/wiki/Hiera:Tools). I think that" [labs/private] - https://gerrit.wikimedia.org/r/288736 (owner: Alex Monk)
[13:44:38] (CR) Alex Monk: "That's certainly a different way, but is it better?"
[labs/private] - https://gerrit.wikimedia.org/r/288736 (owner: Alex Monk)
[13:48:14] (CR) Alex Monk: [C: 1] "https://phabricator.wikimedia.org/P3100 - looks right" [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[13:49:16] (PS1) Yuvipanda: tools: Enable host automounts [puppet] - https://gerrit.wikimedia.org/r/288761 (https://phabricator.wikimedia.org/T134748)
[13:50:21] (PS2) Yuvipanda: tools: Enable host automounts [puppet] - https://gerrit.wikimedia.org/r/288761 (https://phabricator.wikimedia.org/T134748)
[13:57:50] (PS1) Yuvipanda: k8s: Actually enable host automounter [puppet] - https://gerrit.wikimedia.org/r/288762 (https://phabricator.wikimedia.org/T134748)
[13:58:16] (PS3) Yuvipanda: tools: Enable host automounts [puppet] - https://gerrit.wikimedia.org/r/288761 (https://phabricator.wikimedia.org/T134748)
[13:58:24] PROBLEM - puppet last run on labstore2004 is CRITICAL: CRITICAL: puppet fail
[13:58:24] (Abandoned) Yuvipanda: k8s: Actually enable host automounter [puppet] - https://gerrit.wikimedia.org/r/288762 (https://phabricator.wikimedia.org/T134748) (owner: Yuvipanda)
[14:21:31] (PS3) Andrew Bogott: Detect labsproject for labtest instances [puppet] - https://gerrit.wikimedia.org/r/288743
[14:23:22] (CR) Andrew Bogott: [C: 2] Detect labsproject for labtest instances [puppet] - https://gerrit.wikimedia.org/r/288743 (owner: Andrew Bogott)
[14:24:44] RECOVERY - puppet last run on labstore2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:31:18] (PS1) Yuvipanda: tools: Bump k8s version again [puppet] - https://gerrit.wikimedia.org/r/288765
[14:37:15] (PS2) Yuvipanda: tools: Bump k8s version again [puppet] - https://gerrit.wikimedia.org/r/288765
[14:38:27] (CR) Yuvipanda: [C: 2 V: 2] tools: Bump k8s version again [puppet] - https://gerrit.wikimedia.org/r/288765 (owner: Yuvipanda)
[14:47:55] (PS1) Andrew Bogott:
Include 10.196.* in site 'codfw' [puppet] - https://gerrit.wikimedia.org/r/288767
[14:54:33] (PS2) Andrew Bogott: Include 10.196.* in site 'codfw' [puppet] - https://gerrit.wikimedia.org/r/288767
[15:10:20] (PS1) Alex Monk: varnish text frontend: Fix zerowiki check for use in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/288768 (https://phabricator.wikimedia.org/T135082)
[15:18:56] (CR) Yurik: [C: 1] varnish text frontend: Fix zerowiki check for use in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/288768 (https://phabricator.wikimedia.org/T135082) (owner: Alex Monk)
[15:23:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[15:23:53] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[15:26:55] ^ looks like it was a very minor blip on our eqiad<->esams link, not really very critical so far
[15:27:53] (but there was also a notable PURGE spike at all sites. It could be the link itself was fine and the multicast purge spike was so overwhelming that it affected other traffic on the link...)
[15:29:24] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:29:54] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:33:20] (PS1) Yuvipanda: Add 'hostautomounter' admission controller [software/kubernetes] - https://gerrit.wikimedia.org/r/288770
[16:31:18] (PS1) Yuvipanda: uidenforcer: Set GID as well as UID [software/kubernetes] - https://gerrit.wikimedia.org/r/288771
[17:00:34] PROBLEM - MariaDB Slave Lag: s4 on db1056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 317.86 seconds
[17:04:43] RECOVERY - MariaDB Slave Lag: s4 on db1056 is OK: OK slave_sql_lag Replication lag: 0.39 seconds
[17:08:33] (CR) BBlack: [C: 2] varnish text frontend: Fix zerowiki check for use in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/288768 (https://phabricator.wikimedia.org/T135082) (owner: Alex Monk)
[17:08:54] apparently grrrit hates your patch :)
[17:09:21] heh
[18:09:34] PROBLEM - aqs endpoints health on aqs1002 is CRITICAL: /pageviews/top/{project}/{access}/{year}/{month}/{day} (Get top page views) is CRITICAL: Test Get top page views returned the unexpected status 500 (expecting: 200)
[18:10:05] PROBLEM - aqs endpoints health on aqs1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:54] RECOVERY - aqs endpoints health on aqs1001 is OK: All endpoints are healthy
[18:13:23] RECOVERY - aqs endpoints health on aqs1002 is OK: All endpoints are healthy
[18:43:47] (PS1) Yuvipanda: Add hostpathenforcer admission controller [software/kubernetes] - https://gerrit.wikimedia.org/r/288775
[18:58:28] (PS2) JanZerebecki: Update Wikidata property blacklist [mediawiki-config] - https://gerrit.wikimedia.org/r/288404 (owner: Matěj Suchánek)
[18:59:02] (CR) JanZerebecki: [C: 1] "Can be deployed at any time."
[mediawiki-config] - https://gerrit.wikimedia.org/r/288404 (owner: Matěj Suchánek)
[19:30:24] PROBLEM - puppet last run on es2013 is CRITICAL: CRITICAL: puppet fail
[19:56:54] RECOVERY - puppet last run on es2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[20:53:54] Operations, media-storage: Document how to handle 'inconsistent state within the internal storage backends' issues - https://phabricator.wikimedia.org/T135318#2295167 (Dereckson)
[21:09:47] I hope this is the right channel, if not i beg your pardon. I need login to osmit-tre.equiad.wmflabs user sbiribizio added me to the osmit project but he said I should ask for a bastion role
[21:18:05] alezenait: you should ask on #wikimedia-labs channel
[21:22:01] thanks
[23:14:42] Operations, Discovery, Traffic, Wikidata, Wikidata-Query-Service: WDQS empty response - transfer clsoed with 15042 bytes remaining to read - https://phabricator.wikimedia.org/T134989#2295239 (ema) The issue can be reproduced running two varnishd on the same host, without any VCL. A web server...
[23:46:39] (PS4) Yuvipanda: tools: Enable host automounts [puppet] - https://gerrit.wikimedia.org/r/288761 (https://phabricator.wikimedia.org/T134748)
[23:46:41] (PS1) Yuvipanda: tools: Enable HostPathEnforcer admission controller [puppet] - https://gerrit.wikimedia.org/r/288808 (https://phabricator.wikimedia.org/T112718)
[23:48:21] (CR) jenkins-bot: [V: -1] tools: Enable HostPathEnforcer admission controller [puppet] - https://gerrit.wikimedia.org/r/288808 (https://phabricator.wikimedia.org/T112718) (owner: Yuvipanda)
[23:50:34] PROBLEM - puppet last run on mw1204 is CRITICAL: CRITICAL: Puppet has 1 failures