[00:00:04] RoanKattouw ostriches Krenair: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20151231T0000). Please do the needful.
[00:03:05] Last swat before the new year and nothing to swat!
[00:03:12] * ostriches takes swat, declares it complete
[00:03:20] Go me
[00:04:46] What a complicated SWAT :D
[00:05:01] SWAT simply isn't what it used to be. Sigh.
[00:05:32] * ostriches rereminisces
[00:47:57] ostriches: I looked at the description at rSVN, should we deactivate it, because there will be no commits in the future?
[00:48:28] Deactivate? No.
[00:48:32] ok
[00:48:45] It's r/o
[00:48:50] r/o?
[00:48:54] readonly
[00:49:19] yes, but there will be no commits in the future, so I think we don't need to import it four times a day?
[00:49:28] It doesn't import ever?
[00:49:33] It's locally hosted on Phab
[00:49:41] wait a moment
[00:50:10] https://phabricator.wikimedia.org/diffusion/SVN/browse/GOODBYE and the last commit was in 2013
[00:50:26] https://phabricator.wikimedia.org/diffusion/SVN/history/
[00:50:58] with the current setting phabricator looks for updates every 6 hours. I know that's not often, but if we deactivate it, the daemons will not look for it
[00:51:23] Can we tweak the update freq?
[00:51:31] 6h is max
[00:51:50] see: https://secure.phabricator.com/book/phabricator/article/diffusion_updates/
[00:52:30] Deactivating it makes it kinda hidden though. And I dunno what that'll do to rSVN123 links in a post too?
[00:53:43] hm, ok then we can leave it on
[00:53:59] Can always try and find out :)
[00:54:03] but we can think about whether we want to deactivate archived extensions in the future
[00:54:18] but currently "lowest" I guess ;)
[00:55:41] Links still totally work.
[00:55:50] * ostriches deactivates all 3 svn repos so they don't poll anymore
[00:56:29] btw: concerning: T120915 Only about 700 extensions left to tag with projects... yay :D
[00:56:50] https://phabricator.wikimedia.org/diffusion/query/hr7W2k6q5Uf8/
[00:56:54] but these are only the repos which contain the word "extension"
[00:57:08] ok, thanks
[01:10:25] PROBLEM - puppet last run on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:14:16] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[01:46:36] PROBLEM - Hadoop NodeManager on analytics1054 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[02:02:04] RECOVERY - Hadoop NodeManager on analytics1054 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[02:24:51] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 25s)
[02:24:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:43] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Dec 31 02:31:43 UTC 2015 (duration 6m 52s)
[02:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:42:35] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail
[02:44:15] PROBLEM - puppet last run on wtp2007 is CRITICAL: CRITICAL: puppet fail
[03:09:11] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[03:12:01] RECOVERY - puppet last run on wtp2007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:31:07] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:01] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:03] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:12] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:34:02] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:32] PROBLEM - puppet last run on mw1112 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:41] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:56:53] RECOVERY - puppet last run on mw1112 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:11] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[06:57:12] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[06:57:32] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:57:51] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:31] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:59:12] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:30:21] PROBLEM - configured eth on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:30:31] PROBLEM - RAID on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:30:51] PROBLEM - puppet last run on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:31:00] PROBLEM - nutcracker port on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:31:21] PROBLEM - SSH on mw1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:31:21] PROBLEM - nutcracker process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:32:41] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:33:11] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:33:12] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:37:12] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:40:21] PROBLEM - puppet last run on mw1077 is CRITICAL: CRITICAL: Puppet has 7 failures
[07:41:11] RECOVERY - Disk space on mw1007 is OK: DISK OK
[07:41:12] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[07:41:13] RECOVERY - DPKG on mw1007 is OK: All packages OK
[07:47:32] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:47:32] PROBLEM - DPKG on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:51:32] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:54:52] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient
[08:01:12] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:10:49] PROBLEM - HHVM rendering on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:11:48] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:12:29] PROBLEM - configured eth on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:12:38] PROBLEM - nutcracker process on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:12:39] PROBLEM - HHVM processes on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:13:19] PROBLEM - salt-minion processes on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:13:29] PROBLEM - Check size of conntrack table on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:13:39] PROBLEM - RAID on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:13:48] PROBLEM - SSH on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:13:59] PROBLEM - dhclient process on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:14:00] PROBLEM - DPKG on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:14:19] PROBLEM - nutcracker port on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:14:28] PROBLEM - Disk space on mw1077 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:20:09] RECOVERY - dhclient process on mw1077 is OK: PROCS OK: 0 processes with command name dhclient
[08:20:12] RECOVERY - DPKG on mw1077 is OK: All packages OK
[08:20:19] RECOVERY - nutcracker port on mw1077 is OK: TCP OK - 0.000 second response time on port 11212
[08:20:29] RECOVERY - Disk space on mw1077 is OK: DISK OK
[08:20:29] RECOVERY - configured eth on mw1077 is OK: OK - interfaces up
[08:20:39] RECOVERY - nutcracker process on mw1077 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[08:20:48] RECOVERY - HHVM processes on mw1077 is OK: PROCS OK: 6 processes with command name hhvm
[08:21:20] RECOVERY - salt-minion processes on mw1077 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[08:21:38] RECOVERY - Check size of conntrack table on mw1077 is OK: OK: nf_conntrack is 0 % full
[08:21:48] RECOVERY - RAID on mw1077 is OK: OK: no RAID installed
[08:21:49] RECOVERY - SSH on mw1077 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[08:21:58] RECOVERY - Apache HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.138 second response time
[08:22:59] RECOVERY - HHVM rendering on mw1077 is OK: HTTP OK: HTTP/1.1 200 OK - 65726 bytes in 0.135 second response time
[08:23:09] RECOVERY - puppet last run on mw1077 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[08:43:02] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient
[08:49:12] PROBLEM - dhclient process on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:05:10] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 639
[09:15:10] RECOVERY - check_mysql on db1008 is OK: Uptime: 837501 Threads: 142 Questions: 34372081 Slow queries: 10223 Opens: 58755 Flush tables: 2 Open tables: 417 Queries per second avg: 41.041 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:16:01] PROBLEM - Hadoop NodeManager on analytics1051 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:20:10] RECOVERY - Hadoop NodeManager on analytics1051 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[09:51:37] PROBLEM - puppet last run on mw1131 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:00:16] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 643
[10:00:25] Ops-Access-Requests, operations, Analytics, ContentTranslation-Analytics, and 2 others: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#1911454 (Arrbee) Hello, the request is approved from my side. Thanks.
[10:10:13] RECOVERY - check_mysql on db1008 is OK: Uptime: 840802 Threads: 150 Questions: 34435193 Slow queries: 10382 Opens: 58756 Flush tables: 2 Open tables: 418 Queries per second avg: 40.955 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:11:24] RECOVERY - Disk space on mw1007 is OK: DISK OK
[10:12:54] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:17:04] RECOVERY - puppet last run on mw1131 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[10:17:43] PROBLEM - Disk space on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:19:04] PROBLEM - salt-minion processes on mw1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[10:35:12] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 742
[10:37:22] RECOVERY - SSH on mw1007 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 (protocol 2.0)
[10:37:33] RECOVERY - Disk space on mw1007 is OK: DISK OK
[10:37:42] RECOVERY - RAID on mw1007 is OK: OK: no RAID installed
[10:37:43] RECOVERY - DPKG on mw1007 is OK: All packages OK
[10:37:53] RECOVERY - salt-minion processes on mw1007 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:38:12] RECOVERY - configured eth on mw1007 is OK: OK - interfaces up
[10:38:13] RECOVERY - dhclient process on mw1007 is OK: PROCS OK: 0 processes with command name dhclient
[10:38:33] RECOVERY - nutcracker port on mw1007 is OK: TCP OK - 0.000 second response time on port 11212
[10:38:43] RECOVERY - nutcracker process on mw1007 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker
[10:40:12] RECOVERY - check_mysql on db1008 is OK: Uptime: 842602 Threads: 155 Questions: 34467921 Slow queries: 10417 Opens: 58758 Flush tables: 2 Open tables: 418 Queries per second avg: 40.906 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:41:13] RECOVERY - puppet last run on mw1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:45:24] PROBLEM - Hadoop NodeManager on analytics1049 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:03:55] Ops-Access-Requests, operations, Analytics, ContentTranslation-Analytics, and 2 others: access for amire80 to stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T122524#1911500 (jcrespo) The access is now in review, a minimum of 3 days is required for security review. That would usually mean...
[12:10:18] RECOVERY - Hadoop NodeManager on analytics1049 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:44:08] PROBLEM - Hadoop NodeManager on analytics1050 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[13:00:50] RECOVERY - Hadoop NodeManager on analytics1050 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[16:07:36] PROBLEM - puppet last run on restbase-test2001 is CRITICAL: CRITICAL: puppet fail
[16:33:53] RECOVERY - puppet last run on restbase-test2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:42:21] PROBLEM - RAID on db1052 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[16:56:05] operations, ops-eqiad: db1052 degraded RAID - https://phabricator.wikimedia.org/T122703#1911685 (jcrespo) NEW
[17:23:59] operations, Parsoid, Wikimedia-Site-Requests: please deploy fetch-sitematrix update - https://phabricator.wikimedia.org/T122548#1911705 (Krenair) a:Krenair>None
[17:24:20] operations, Parsoid, Wikimedia-Site-Requests: please deploy parsoid sitematrix update - https://phabricator.wikimedia.org/T122548#1907298 (Krenair)
[20:45:40] PROBLEM - salt-minion processes on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:45:49] PROBLEM - Disk space on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:46:00] PROBLEM - HTTP-peopleweb on rutherfordium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:46:09] PROBLEM - Check size of conntrack table on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:46:39] PROBLEM - RAID on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:46:40] PROBLEM - DPKG on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:46:50] PROBLEM - puppet last run on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:47:10] PROBLEM - dhclient process on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[20:47:11] PROBLEM - configured eth on rutherfordium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[21:46:45] PROBLEM - Hadoop NodeManager on analytics1045 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[21:58:16] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: puppet fail
[21:59:05] RECOVERY - Hadoop NodeManager on analytics1045 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[22:06:44] PROBLEM - NTP on rutherfordium is CRITICAL: NTP CRITICAL: No response from NTP server
[22:26:55] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:38:22] PROBLEM - SSH on rutherfordium is CRITICAL: Server answer
[22:40:32] RECOVERY - SSH on rutherfordium is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[22:49:52] PROBLEM - puppet last run on mw2079 is CRITICAL: CRITICAL: puppet fail
[23:14:08] PROBLEM - SSH on rutherfordium is CRITICAL: Server answer
[23:18:18] RECOVERY - SSH on rutherfordium is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[23:18:48] RECOVERY - puppet last run on mw2079 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:30:39] PROBLEM - SSH on rutherfordium is CRITICAL: Server answer
[23:31:48] RECOVERY - SSH on rutherfordium is OK: SSH OK - OpenSSH_6.7p1 Debian-5 (protocol 2.0)
[23:55:16] PROBLEM - SSH on rutherfordium is CRITICAL: Server answer