[00:01:43] <ori>	 bd808: the nodes are showing now, and jobs are processing, i think
[00:01:51] <ori>	 but jenkins reports they have 0 bytes of available swap space
[00:02:52] <bd808>	 w00t. yeah I see jobs running
[00:03:12] <bd808>	 I don't know if the swap space thing is important or not
[00:03:16] <bd808>	 I would hope the jobs don't push things into swap normally
[00:04:48] <ori>	 !log (at 23:46 UTC) restarted nova-compute on labvirt1002
[00:04:53] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:05:35] <legoktm>	 so it's probably the nova-compute restart that is fixing it?
[00:05:50] <bd808>	 Seems likely.
[00:06:42] <bd808>	 That also kind of makes sense. They would disappear from the Jenkins UI entirely due to network reachability issues
[00:07:07] <YuviPanda>	 hello
[00:07:11] <ori>	 nodepoold on labnodepool1001 was hung, did not respond to SIGKILL
[00:07:28] <YuviPanda>	 I see others have already fixed stuff :)
[00:08:06] <YuviPanda>	 bd808: it's also safe to restart nova-compute, it being down only prevents new instance scheduling / restarts / logging(?), so provided it comes back up it's ok
[00:08:31] <ori>	 strace showed it (nodepoold) was waiting on a lock ; could not get a useful stack trace from gdb
[00:09:50] <ori>	 i think it was waiting for a reply and was stuck in a blocking read
[00:10:11] <ori>	 anyways
[00:10:13] <ori>	 \o
[00:10:17] <bd808>	 thanks ori. Should we add some notes to that phab task? Twice in 72 hours seems like there is something systemic that needs to be looked into
[00:10:17] <ori>	 bye
[00:11:01] <ori>	 bd808: i doubt i discovered anything andrewbogott didn't already know
[00:11:24] <ori>	 but yeah, re: something systemic that needs to be looked into
[00:11:40] <YuviPanda>	 labs has systemic issues? noway!
[00:11:47] <bd808>	 I'll add some notes. thanks again for using your superpowers
[00:20:20] <wikibugs>	 operations, Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, and 3 others: rake-jessie jobs stuck due to no ci-jessie-wikimedia slaves being attached to Jenkins - https://phabricator.wikimedia.org/T122731#1912365 (bd808) Restored ~2016-01-01T23:58 @ori and @legoktm both attempted...
[00:53:03] <grrrit-wm>	 (CR) BryanDavis: "Filed T122734 for memory cgroups" [puppet] - https://gerrit.wikimedia.org/r/245920 (owner: BryanDavis)
[00:57:12] <YuviPanda>	 bd808: do you want me to import the jessie deb now?
[00:57:35] <bd808>	 sure!
[00:57:54] <YuviPanda>	 kkk
[00:58:44] <bd808>	 I need to test the latest upstream version in labs sometime soon too
[00:59:00] * bd808 makes a task to remind himself to do that
[01:00:38] <YuviPanda>	 bd808: do you want the exact version we have for trusty?
[01:00:41] <YuviPanda>	 for jessie?
[01:00:45] <YuviPanda>	 or shall I just import the latest
[01:01:11] <bd808>	 Getting the latest would be fine
[01:01:42] <bd808>	 I need to test that 1.8.1 doesn't break something on trusty before we switch that
[01:01:53] <bd808>	 but nothing is using it for jessie yet
[01:01:59] <YuviPanda>	 ok
[01:04:13] <YuviPanda>	 !log imported vagrant 1.8.1 for jessie per bd808
[01:04:15] <YuviPanda>	 bd808: done
[01:04:18] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:04:29] <bd808>	 thanks YuviPanda 
[01:08:38] <grrrit-wm>	 (CR) BryanDavis: "Yuvi imported Vagrant 1.8.1 into the jessie apt repo" [puppet] - https://gerrit.wikimedia.org/r/245920 (owner: BryanDavis)
[01:54:22] <icinga-wm>	 PROBLEM - Disk space on elastic1006 is CRITICAL: DISK CRITICAL - free space: / 1061 MB (3% inode=95%)
[02:24:30] <logmsgbot>	 !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.9) (duration: 10m 09s)
[02:24:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:28] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan  2 02:31:28 UTC 2016 (duration 6m 58s)
[02:31:36] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:00:33] <grrrit-wm>	 (PS2) Luke081515: Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710)
[03:01:08] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710) (owner: Luke081515)
[03:02:10] <grrrit-wm>	 (PS3) Luke081515: Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710)
[03:02:36] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710) (owner: Luke081515)
[03:03:38] <grrrit-wm>	 (PS4) Luke081515: Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710)
[03:04:00] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710) (owner: Luke081515)
[03:05:10] <grrrit-wm>	 (PS5) Luke081515: Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710)
[03:05:30] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710) (owner: Luke081515)
[03:09:28] <grrrit-wm>	 (PS6) Luke081515: Changed user group rights at trwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/261869 (https://phabricator.wikimedia.org/T122710)
[03:15:16] <grrrit-wm>	 (PS1) Base: Added noindex rule for uawikimedia's ns2 Bug: T122732 [mediawiki-config] - https://gerrit.wikimedia.org/r/261902 (https://phabricator.wikimedia.org/T122732)
[03:34:46] <twentyafterfour>	 !log deploying https://gerrit.wikimedia.org/r/261725, restarted apache2 on iridium
[03:34:51] <morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:23:32] <icinga-wm>	 PROBLEM - Disk space on elastic1004 is CRITICAL: DISK CRITICAL - free space: / 1062 MB (3% inode=95%)
[04:48:43] <icinga-wm>	 PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[05:10:26] <icinga-wm>	 RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 27079 bytes in 0.092 second response time
[06:28:13] <icinga-wm>	 RECOVERY - Disk space on elastic1004 is OK: DISK OK
[06:31:13] <icinga-wm>	 PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:22] <icinga-wm>	 PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:42] <icinga-wm>	 PROBLEM - puppet last run on ms-be1010 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:43] <icinga-wm>	 PROBLEM - puppet last run on mw2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:11] <icinga-wm>	 PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:52] <icinga-wm>	 PROBLEM - puppet last run on mw2016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:12] <icinga-wm>	 PROBLEM - puppet last run on holmium is CRITICAL: CRITICAL: Puppet has 2 failures
[06:33:33] <icinga-wm>	 PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:21] <icinga-wm>	 PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:51] <icinga-wm>	 PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:34:52] <icinga-wm>	 PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:36:22] <icinga-wm>	 PROBLEM - puppet last run on mw2088 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:12] <icinga-wm>	 PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: puppet fail
[06:55:22] <icinga-wm>	 RECOVERY - puppet last run on holmium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[06:56:23] <icinga-wm>	 RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:57:02] <icinga-wm>	 RECOVERY - puppet last run on mw2016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:02] <icinga-wm>	 RECOVERY - puppet last run on ms-be1010 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:57:03] <icinga-wm>	 RECOVERY - puppet last run on mw2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:22] <icinga-wm>	 RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:51] <icinga-wm>	 RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:11] <icinga-wm>	 RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:21] <icinga-wm>	 RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:02] <icinga-wm>	 RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:41] <icinga-wm>	 RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:01] <icinga-wm>	 RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:01:34] <icinga-wm>	 RECOVERY - puppet last run on mw2088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:35:10] <icinga-wm>	 PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 650
[10:40:20] <icinga-wm>	 PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 951
[10:45:10] <icinga-wm>	 RECOVERY - check_mysql on db1008 is OK: Uptime: 1015702 Threads: 143 Questions: 39014068 Slow queries: 12603 Opens: 58844 Flush tables: 2 Open tables: 416 Queries per second avg: 38.410 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[12:00:39] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [24.0]
[12:03:10] <icinga-wm>	 PROBLEM - puppet last run on auth2001 is CRITICAL: CRITICAL: puppet fail
[12:04:09] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[12:12:09] <icinga-wm>	 PROBLEM - Hadoop NodeManager on analytics1036 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:20:18] <icinga-wm>	 PROBLEM - puppet last run on elastic1006 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago
[12:22:29] <icinga-wm>	 RECOVERY - Hadoop NodeManager on analytics1036 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager
[12:32:01] <icinga-wm>	 PROBLEM - High load average on labstore1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [24.0]
[12:33:40] <icinga-wm>	 RECOVERY - puppet last run on auth2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[14:10:44] <icinga-wm>	 PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[14:11:09] <valhallasw`cloud>	 ^ yes, seems positively dead
[14:11:59] <valhallasw`cloud>	 or... very very slow, at least
[14:12:34] <icinga-wm>	 RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 973513 bytes in 5.487 second response time
[14:25:15] <icinga-wm>	 PROBLEM - tools-home on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[14:27:14] <icinga-wm>	 RECOVERY - tools-home on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 973454 bytes in 12.739 second response time
[14:28:54] <icinga-wm>	 PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: puppet fail
[14:39:23] <icinga-wm>	 RECOVERY - High load average on labstore1001 is OK: OK: Less than 50.00% above the threshold [16.0]
[14:56:42] <icinga-wm>	 RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:21:37] <icinga-wm>	 PROBLEM - puppet last run on mw2090 is CRITICAL: CRITICAL: puppet fail
[15:29:38] <grrrit-wm>	 (CR) Alex Monk: [C: 1] Added noindex rule for uawikimedia's ns2 Bug: T122732 [mediawiki-config] - https://gerrit.wikimedia.org/r/261902 (https://phabricator.wikimedia.org/T122732) (owner: Base)
[15:29:55] <grrrit-wm>	 (PS2) Alex Monk: Added noindex rule for uawikimedia's user namespace [mediawiki-config] - https://gerrit.wikimedia.org/r/261902 (https://phabricator.wikimedia.org/T122732) (owner: Base)
[15:49:27] <icinga-wm>	 RECOVERY - puppet last run on mw2090 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:31:43] <grrrit-wm>	 (PS1) Hoo man: Provide a latest link for the Wikidata JSON dumps [puppet] - https://gerrit.wikimedia.org/r/261949 (https://phabricator.wikimedia.org/T72247)
[18:50:11] <icinga-wm>	 PROBLEM - puppet last run on ms-be3003 is CRITICAL: CRITICAL: puppet fail
[19:04:48] <grrrit-wm>	 (PS5) Tim Landscheidt: Avoid breaking full phabricator URLs [puppet] - https://gerrit.wikimedia.org/r/256663 (https://phabricator.wikimedia.org/T75997) (owner: Thiemo Mättig (WMDE))
[19:12:10] <grrrit-wm>	 (PS6) Nemo bis: Avoid breaking full phabricator URLs [puppet] - https://gerrit.wikimedia.org/r/256663 (https://phabricator.wikimedia.org/T75997) (owner: Thiemo Mättig (WMDE))
[19:17:34] <icinga-wm>	 RECOVERY - puppet last run on ms-be3003 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[21:24:04] <grrrit-wm>	 (PS1) RLuts: Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999)
[21:28:20] <grrrit-wm>	 (CR) Base: [C: 1] Enable WikidataPageBanner extension on Ukrainian Wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/261994 (https://phabricator.wikimedia.org/T121999) (owner: RLuts)
[21:38:20] <grrrit-wm>	 (PS1) Florianschmidtwelzow: Remove $wgCopyrightIcons [mediawiki-config] - https://gerrit.wikimedia.org/r/261999
[21:39:30] <grrrit-wm>	 (CR) Reedy: [C: 1] Remove $wgCopyrightIcons [mediawiki-config] - https://gerrit.wikimedia.org/r/261999 (owner: Florianschmidtwelzow)
[21:49:53] <grrrit-wm>	 (PS2) Florianschmidtwelzow: Remove $wgCopyrightIcon [mediawiki-config] - https://gerrit.wikimedia.org/r/261999
[22:27:03] <icinga-wm>	 PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: puppet fail
[22:53:03] <icinga-wm>	 RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures