[00:01:35] (PS1) Dzahn: phabricator: add vcs::listen_addresses for codfw [puppet] - https://gerrit.wikimedia.org/r/317295 (https://phabricator.wikimedia.org/T143363)
[00:02:09] (CR) Dzahn: "after adding this, follow-up here: https://gerrit.wikimedia.org/r/#/c/317295/1/hieradata/role/codfw/phabricator/main.yaml" [dns] - https://gerrit.wikimedia.org/r/317291 (https://phabricator.wikimedia.org/T143363) (owner: Dzahn)
[00:19:58] (PS1) Dzahn: add git-ssh.codfw.wikimedia.org service IP [dns] - https://gerrit.wikimedia.org/r/317296 (https://phabricator.wikimedia.org/T143363)
[01:55:32] PROBLEM - puppet last run on cp3046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:23:46] RECOVERY - puppet last run on cp3046 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[03:26:56] Reedy are you here?
[03:37:46] Zppix: he is in BST, so try a bit later
[03:38:34] i was just curious if it was okay to use tracking for a bug/todo list for my bot (instead of having project created)
[04:14:11] (CR) Bartosz Dziewoński: [C: 1] "As written, this change is going to affect Commons, Romanian Wikipedia, and a couple wikis where I'm not sure no one cares about this (lik" [mediawiki-config] - https://gerrit.wikimedia.org/r/315121 (https://phabricator.wikimedia.org/T147799) (owner: MarcoAurelio)
[04:36:09] PROBLEM - Disk space on cp4006 is CRITICAL: DISK CRITICAL - free space: / 344 MB (3% inode=86%)
[04:59:10] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:25:01] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:26:30] RECOVERY - Disk space on cp4006 is OK: DISK OK
[08:39:22] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:07:41] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:45:52] (CR) Addshore: Enable simple-json-datasource on prod Grafana (1 comment) [puppet] - https://gerrit.wikimedia.org/r/314029 (https://phabricator.wikimedia.org/T147329) (owner: Addshore)
[10:35:13] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5000 (threshold c = 5000)
[10:40:13] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5000 (threshold c = 5000)
[10:45:13] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5004 (threshold c = 5000)
[10:50:13] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5007 (threshold c = 5000)
[10:55:13] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5010 (threshold c = 5000)
[11:00:14] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5008 (threshold c = 5000)
[11:05:15] PROBLEM - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5014 (threshold c = 5000)
[11:05:32] ACKNOWLEDGEMENT - check_mailq on barium is CRITICAL: CRITICAL: mailq is 5014 (threshold c = 5000) Jeff_Green mx throttling on the receiving end
[12:34:22] !log Stopping replication in db2055 to use it to clone another host - T146261
[12:34:25] T146261: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261
[12:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:34:30] (PS1) Aklapper: Exclude entries with oldValue=null in "Project changes" results [puppet] - https://gerrit.wikimedia.org/r/317316
[13:45:31] (PS1) Aklapper: Also list name of acting user for project creations and name changes [puppet] - https://gerrit.wikimedia.org/r/317317
[13:52:09] (PS1) Aklapper: Drop "Phabricator workboards with single column only" query [puppet] - https://gerrit.wikimedia.org/r/317318
[14:11:37] PROBLEM - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100%
[14:25:17] PROBLEM - IPsec on cp4009 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:25:17] PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:25:57] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:09] PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:17] PROBLEM - IPsec on cp4016 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:26:18] PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:26:28] PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:26:28] PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:26:32] PROBLEM - IPsec on cp4010 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:26:37] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:37] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:47] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:48] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:48] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:48] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[14:26:58] !log depooled cp1052 (cache_text@eqiad, ethernet linkdown for unknown reasons)
[14:27:01] PROBLEM - IPsec on cp4017 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:27:21] PROBLEM - IPsec on cp4008 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:26] PROBLEM - IPsec on cp4018 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:27] PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:27] PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:27] PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:27:38] there's going to be a ton of that ipsec spam, sorry, it's just how the monitoring on that stuff works :/
[14:27:48] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[14:28:59] bblack: needs help?
[14:30:20] we're ok, but thanks :)
[14:30:36] just saw it now :)
[14:33:42] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2735948 (BBlack)
[14:34:40] ACKNOWLEDGEMENT - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black https://phabricator.wikimedia.org/T148891
[14:34:44] bblack: it looks like the bnx2x interface is gone
[14:34:51] from your task
[14:35:27] anything regarding the fan by any chance?
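As the 14:27:38 message says, the IPsec burst is a side effect of how the check works: each cache host reports on its own tunnels, so a single dead peer (cp1052, apparently with one `_v4` and one `_v6` tunnel) turns every host that peers with it CRITICAL. The minimal sketch below rebuilds the two message formats seen in this log from an already-parsed map of tunnel name to state. The `summarize_tunnels` helper, the state strings and the `peerNN` names are hypothetical; this is not the real Strongswan plugin and it does not parse actual `ipsec` output.

```python
from typing import Dict


def summarize_tunnels(states: Dict[str, str]) -> str:
    """Build a Strongswan-style status line from a tunnel name -> state map.

    A tunnel counts as ok only if its state is "established"; any other
    state (here: "connecting") is listed by name and makes the line CRITICAL.
    """
    pending = sorted(name for name, state in states.items() if state != "established")
    ok = len(states) - len(pending)
    if pending:
        return f"Strongswan CRITICAL - ok: {ok} connecting: {', '.join(pending)}"
    return f"Strongswan OK - {ok} ESP OK"


if __name__ == "__main__":
    # Hypothetical host with 22 peers, each with a _v4 and a _v6 tunnel (44 total).
    peers = [f"peer{i:02d}" for i in range(21)] + ["cp1052"]
    states = {f"{p}_v{v}": "established" for p in peers for v in (4, 6)}
    print(summarize_tunnels(states))  # Strongswan OK - 44 ESP OK
    # cp1052's link goes down: both of its tunnels stay in "connecting".
    states["cp1052_v4"] = states["cp1052_v6"] = "connecting"
    print(summarize_tunnels(states))  # Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
```

Because every peer runs the check independently, one unreachable endpoint yields one alert per peer, which matches the bursts at 14:25-14:27 and again at 15:46-15:48. The totals (44 vs. 56) differ per host in the log, so the 22-peer count here is only illustrative.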
[14:35:28] well the link is down, the interface is still there
[14:35:49] it's the same symptom we'd see if someone unplugged the cable
[14:36:08] nothing about fans I don't think
[14:36:26] ok, in the past I got some bad experience from fan on 10G network card failing ;)
[14:36:39] ah
[14:36:56] anyways, I'm gonna check the switch side too
[14:38:13] ok
[14:40:07] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2735964 (BBlack) Interface on asw-c-eqiad says down as well: ``` bblack@asw-c-eqiad> show interfaces xe-8/0/7 Physical interface: xe-8/0/7, Enabled, Physical link is Down Interfac...
[14:44:55] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2735972 (BBlack) Tried resetting the interface on the switch side with `test interface xe-8/0/7 restart-auto-negotiation` as well as commits of disable then re-enable of the interfac...
[14:47:40] (CR) MarcoAurelio: "> As written, this change is going to affect Commons, Romanian" [mediawiki-config] - https://gerrit.wikimedia.org/r/315121 (https://phabricator.wikimedia.org/T147799) (owner: MarcoAurelio)
[14:47:47] (PS5) MarcoAurelio: Stop adding "Category:Uploaded with UploadWizard" [mediawiki-config] - https://gerrit.wikimedia.org/r/315121 (https://phabricator.wikimedia.org/T147799)
[14:52:13] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2735980 (BBlack) Tried re-setting from host-side software, and seems to have worked! In response to `ifconfig eth0 down`, dmesg had new output: ``` [Sat Oct 22 14:46:17 2016] failed...
[14:52:25] !log rebooted cp1052 - T148891
[14:52:27] T148891: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891
[14:52:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:52:43] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[14:53:56] the 503 spike seems artificial. the host was finally able to forward 503 events that happened when its link first went down
[14:54:22] (or something similar)
[14:55:41] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [1000.0]
[14:55:58] RECOVERY - IPsec on cp4008 is OK: Strongswan OK - 44 ESP OK
[14:55:58] RECOVERY - IPsec on cp4018 is OK: Strongswan OK - 44 ESP OK
[14:56:07] RECOVERY - IPsec on cp3031 is OK: Strongswan OK - 44 ESP OK
[14:56:07] RECOVERY - IPsec on cp3033 is OK: Strongswan OK - 44 ESP OK
[14:56:07] RECOVERY - Host cp1052 is UP: PING OK - Packet loss = 0%, RTA = 1.51 ms
[14:56:07] RECOVERY - IPsec on cp3042 is OK: Strongswan OK - 44 ESP OK
[14:56:17] RECOVERY - IPsec on cp4009 is OK: Strongswan OK - 44 ESP OK
[14:56:17] RECOVERY - IPsec on cp3041 is OK: Strongswan OK - 44 ESP OK
[14:56:18] RECOVERY - IPsec on cp3043 is OK: Strongswan OK - 44 ESP OK
[14:56:56] RECOVERY - IPsec on cp2013 is OK: Strongswan OK - 56 ESP OK
[14:57:10] RECOVERY - IPsec on cp2019 is OK: Strongswan OK - 56 ESP OK
[14:57:30] RECOVERY - IPsec on cp4016 is OK: Strongswan OK - 44 ESP OK
[14:57:34] RECOVERY - IPsec on cp3032 is OK: Strongswan OK - 44 ESP OK
[14:57:34] RECOVERY - IPsec on cp3040 is OK: Strongswan OK - 44 ESP OK
[14:57:34] RECOVERY - IPsec on cp3030 is OK: Strongswan OK - 44 ESP OK
[14:57:35] RECOVERY - IPsec on cp4010 is OK: Strongswan OK - 44 ESP OK
[14:57:46] RECOVERY - IPsec on cp2004 is OK: Strongswan OK - 56 ESP OK
[14:57:46] RECOVERY - IPsec on cp2010 is OK: Strongswan OK - 56 ESP OK
[14:57:48] RECOVERY - IPsec on cp2001 is OK: Strongswan OK - 56 ESP OK
[14:57:56] RECOVERY - IPsec on cp2007 is OK: Strongswan OK - 56 ESP OK
[14:57:56] RECOVERY - IPsec on cp2023 is OK: Strongswan OK - 56 ESP OK
[14:57:57] RECOVERY - IPsec on cp2016 is OK: Strongswan OK - 56 ESP OK
[14:58:09] RECOVERY - IPsec on cp4017 is OK: Strongswan OK - 44 ESP OK
[15:02:01] !log repool cp1052 - T148891
[15:02:02] T148891: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891
[15:02:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:02:08] !log bblack@puppetmaster1001 conftool action : set/pooled=yes; selector: name=cp1052.eqiad.wmnet
[15:02:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:05:06] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:05:17] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2735991 (BBlack) Open>Resolved a:BBlack Seems ok post-reboot, repooled.
[15:05:38] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:31:03] (PS1) Aklapper: Also list parent project for (sub)project creations and name changes [puppet] - https://gerrit.wikimedia.org/r/317321
[15:32:39] PROBLEM - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100%
[15:35:03] <_joe_> bblack: it's down again apparently
[15:35:30] <_joe_> I'll depool it
[15:37:03] !log oblivian@puppetmaster1001 conftool action : set/pooled=no; selector: name=cp1052.eqiad.wmnet
[15:37:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:42:55] PROBLEM - puppet last run on cp3005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:46:11] PROBLEM - IPsec on cp3033 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:46:11] PROBLEM - IPsec on cp3042 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:46:11] PROBLEM - IPsec on cp4009 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:46:21] (CR) 20after4: Gerrit: Enable concurrent collector (2 comments) [puppet] - https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[15:46:23] PROBLEM - IPsec on cp3041 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:46:23] PROBLEM - IPsec on cp3043 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:46:53] PROBLEM - IPsec on cp2013 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:01] PROBLEM - IPsec on cp2019 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:29] (CR) Paladox: Gerrit: Enable concurrent collector (1 comment) [puppet] - https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[15:47:30] PROBLEM - IPsec on cp4016 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:47:31] PROBLEM - IPsec on cp3032 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:47:31] PROBLEM - IPsec on cp3030 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:47:31] PROBLEM - IPsec on cp3040 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:47:31] PROBLEM - IPsec on cp4010 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:47:32] PROBLEM - IPsec on cp2010 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:32] PROBLEM - IPsec on cp2004 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:50] PROBLEM - IPsec on cp2001 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:51] PROBLEM - IPsec on cp2023 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:51] PROBLEM - IPsec on cp2007 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:47:51] PROBLEM - IPsec on cp2016 is CRITICAL: Strongswan CRITICAL - ok: 54 connecting: cp1052_v4, cp1052_v6
[15:48:07] PROBLEM - IPsec on cp4017 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:48:26] PROBLEM - IPsec on cp4008 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:48:26] PROBLEM - IPsec on cp4018 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:48:31] PROBLEM - IPsec on cp3031 is CRITICAL: Strongswan CRITICAL - ok: 42 connecting: cp1052_v4, cp1052_v6
[15:48:45] bblack: ^
[15:54:11] (PS1) Paladox: Gerrit: Up the size for packedGitLimit to 5GB [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478)
[15:54:20] twentyafterfour ^^
[15:54:39] (PS2) Paladox: Gerrit: Up the size for packedGitLimit to 5GB [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478)
[15:56:24] (CR) 20after4: [C: 1] "I'd like to get Chad's input on this one but I think it should definitely be larger than 500mb" [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[15:57:45] (PS3) Paladox: Gerrit: Up the size for packedGitLimit to 5GB [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478)
[15:58:03] (CR) Paladox: "Just changed it from GB to gb, just in case It will fail in gerrit." [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[15:59:00] (CR) Paladox: [C: 1] Gerrit: Up the size for packedGitLimit to 5GB [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[15:59:13] (CR) 20after4: Gerrit: Up the size for packedGitLimit to 5GB (1 comment) [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[16:01:08] (PS4) Paladox: Gerrit: Up the size for packedGitLimit to 5GB [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478)
[16:01:21] (PS5) Paladox: Gerrit: Up the size for packedGitLimit to 2gb [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478)
[16:01:24] (CR) Paladox: [C: 1] Gerrit: Up the size for packedGitLimit to 2gb [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[16:02:19] (PS1) Aklapper: Also display column name when hiding/showing workboard columns [puppet] - https://gerrit.wikimedia.org/r/317323
[16:03:30] bleh
[16:03:32] stupid host
[16:04:12] (CR) Paladox: Gerrit: Up the size for packedGitLimit to 2gb (1 comment) [puppet] - https://gerrit.wikimedia.org/r/317322 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[16:06:39] Operations, ops-eqiad, Traffic: cp1052 ethernet link down 2016-10-22 14:11 - https://phabricator.wikimedia.org/T148891#2736011 (BBlack) Resolved>Open It failed again: ``` 15:32 < icinga-wm> PROBLEM - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100% ``` @Joe depooled again ~15:37. Lea...
[16:06:53] (CR) Hashar: [C: -1] "The T148478 is/was most surely a hardware fault." [puppet] - https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[16:07:28] ACKNOWLEDGEMENT - Host cp1052 is DOWN: PING CRITICAL - Packet loss = 100% Brandon Black https://phabricator.wikimedia.org/T148891
[16:08:14] back in a bit, I need to bounce my bouncer
[16:08:59] (CR) Paladox: "@Hashar it is unlikely to be another hardwhare fault if so how could this be resolved?" [puppet] - https://gerrit.wikimedia.org/r/316983 (https://phabricator.wikimedia.org/T148478) (owner: Paladox)
[16:09:02] RECOVERY - puppet last run on cp3005 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[16:17:25] (CR) Hashar: [C: 1] WIP: Adding Tyler to releasers-mediawiki group [puppet] - https://gerrit.wikimedia.org/r/316843 (https://phabricator.wikimedia.org/T148681) (owner: Chad)
[16:55:01] PROBLEM - Disk space on cp4006 is CRITICAL: DISK CRITICAL - free space: / 346 MB (3% inode=86%)
[17:04:10] ^looking at this (I think some others were corrected before, so looking at all)
[17:07:04] RECOVERY - Disk space on cp4006 is OK: DISK OK
[17:07:43] nginx unified error log, from the new spammy SSLv3 failure stuff in them
[17:12:21] Operations, Traffic: nginx SSL_do_handshake spam filling disks - https://phabricator.wikimedia.org/T148893#2736029 (BBlack)
[17:34:49] (CR) Alex Monk: "I think we should limit this so it only hides (from this particular list) creations of publicly-visible + logged-in-user-editable projects" [puppet] - https://gerrit.wikimedia.org/r/317316 (owner: Aklapper)
[17:39:43] (CR) Alex Monk: "Haven't reviewed the code but +1 to the idea" [puppet] - https://gerrit.wikimedia.org/r/317317 (owner: Aklapper)
[17:40:13] RECOVERY - check_mailq on barium is OK: OK: mailq (982) is below threshold (1000/5000)
[17:40:29] (CR) Alex Monk: "Very helpful in theory (+1), haven't reviewed the code" [puppet] - https://gerrit.wikimedia.org/r/317321 (owner: Aklapper)
[17:43:49] (CR) Alex Monk: "(project_transaction.oldValue != "null" OR project_transaction.newValue NOT IN ("public", "users"))" [puppet] - https://gerrit.wikimedia.org/r/317316 (owner: Aklapper)
[17:51:02] help... my exploer.exe crashed on win 7 pro and i cannot open task manger or anything using shortcut keys or nothing
[17:52:00] Zppix: restart the system?
[17:52:13] anyother solutions?
[17:53:35] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[17:58:18] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[18:00:53] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[18:02:07] Zppix your in the wrong place to disccuss windows, try #microsoft
[18:02:30] Zppix update to windows 10 (off topic)
[18:02:48] i fixed it
[18:02:56] test
[18:03:02] Oh, still update to windows 10 :)
[18:03:13] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[18:03:56] paladox i dont really like win 10 not only that but my graphics card wont support it 100% yet :/ waiting for the driver update to come out
[18:04:29] Oh, Zppix if your waiting this long, it will be unlikly too, but intel 2nd gen doint support it either, but it works
[18:07:00] paladox i dont plan on getting win 10 until they make it so everything (or atleast the important stuff) supports it anyway we're driffing to far off topic
[18:07:49] Oh ok
[18:22:52] (PS4) Zppix: Adds translations to the user's lang in the links within the readme in the ROOT dir. [puppet] - https://gerrit.wikimedia.org/r/315728
[18:22:58] (PS4) Zppix: Added a new commonly typed typo [mediawiki-config] - https://gerrit.wikimedia.org/r/315743
[18:35:23] (CR) Reedy: [C: -1] "Why? Translate isn't even enabled on wikitech" [puppet] - https://gerrit.wikimedia.org/r/315728 (owner: Zppix)
[19:19:55] PROBLEM - parsoid on wtp1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:26:53] RECOVERY - parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 8.353 second response time
[20:14:55] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:37:23] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 664 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 2993814 keys - replication_delay is 664
[21:55:55] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 2981653 keys - replication_delay is 0
[23:38:55] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:41:47] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
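Alex Monk's 17:43:49 comment on https://gerrit.wikimedia.org/r/317316 proposes keeping a row unless it is a creation (oldValue "null") whose newValue is "public" or "users". The sketch below runs that condition against a toy in-memory SQLite table to show which rows it would hide. The table contents, the "restricted" value and the helper names are hypothetical. It assumes, as the quoted condition suggests, that a null old value is stored as the literal string "null" rather than as SQL NULL. The string literals are switched to single quotes because SQLite treats double quotes as identifiers, whereas MySQL accepts the original double-quoted form.

```python
import sqlite3
from typing import List

# Toy stand-in for Phabricator's project_transaction table: only the two
# columns named in the review comment, plus an id. All values are hypothetical.
ROWS = [
    (1, "null", "public"),      # creation of a publicly visible project
    (2, "null", "users"),       # creation of a logged-in-users project
    (3, "null", "restricted"),  # creation with some other (hypothetical) policy
    (4, "public", "users"),     # a later change, not a creation
]

# The condition from the review comment, with single-quoted string literals.
KEEP_CONDITION = (
    "(project_transaction.oldValue != 'null' "
    "OR project_transaction.newValue NOT IN ('public', 'users'))"
)


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory table filled with the hypothetical ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE project_transaction (id INTEGER, oldValue TEXT, newValue TEXT)"
    )
    conn.executemany("INSERT INTO project_transaction VALUES (?, ?, ?)", ROWS)
    return conn


def visible_ids(conn: sqlite3.Connection) -> List[int]:
    """Return the ids of rows the proposed filter would still list."""
    cur = conn.execute(
        f"SELECT id FROM project_transaction WHERE {KEEP_CONDITION} ORDER BY id"
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    print(visible_ids(make_demo_db()))  # [3, 4]: rows 1 and 2 are hidden
```

Only creations with a "public" or "users" value are dropped. Restricted creations (row 3) and ordinary changes (row 4) stay in the list, which is the narrowing the 17:34:49 comment asks for.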