[07:18:14] PROBLEM - RAID on mc15 is CRITICAL: Timeout while attempting connection
[07:18:24] PROBLEM - Disk space on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:18:35] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[07:18:40] RECOVERY - Disk space on mc15 is OK: DISK OK
[07:18:43] !log allowed labs access to the new labsdb hosts
[07:19:00] Logged the message, Mistress of the network gear.
[07:19:30] gooood morning
[07:32:00] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.005519151688 secs
[07:32:20] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.007647514343 secs
[08:01:50] !log Running git gc on Zuul git repositories (gallium: /var/lib/zuul/git/operations/puppet )
[08:01:59] Logged the message, Master
[08:08:01] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 08:07:53 UTC 2013
[08:08:11] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[08:08:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:09:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 08:09:33 UTC 2013
[08:09:52] !log zuul: running git gc on all Zuul git repositories: find /var/lib/zuul/git -maxdepth 4 -type d -name .git -print -exec git --git-dir={} gc \;
[08:10:01] Logged the message, Master
[08:10:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:10:41] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 08:10:38 UTC 2013
[08:11:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:15:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 08:14:54 UTC 2013
[08:15:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[08:18:10] New review: ArielGlenn; "Thanks a lot for the patchset and the cleanup." [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64343
[08:20:56] Change restored: Hashar; "(no reason)" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
[08:20:59] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT).." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
[08:25:26] Change abandoned: Hashar; "(no reason)" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
[08:26:26] Change restored: Hashar; "(no reason)" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
[08:26:33] New patchset: Hashar; "Jenkins job validation (DO NOT SUBMIT)..." [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
[08:26:48] Change abandoned: Hashar; "(no reason)" [operations/puppet/cdh4] (master) - https://gerrit.wikimedia.org/r/64436
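The Zuul repository maintenance logged at 08:09:52 above is a single find invocation. A minimal annotated sketch of that command (paths exactly as in the log; run as the user that owns the clones):

```bash
#!/bin/bash
# Repack every Zuul-managed clone under /var/lib/zuul/git.
# -maxdepth 4 : the .git directories sit a few levels down (e.g. operations/puppet/.git)
# -print      : echo each repository path before collecting it
# --git-dir={}: point git at the .git directory that find located
find /var/lib/zuul/git -maxdepth 4 -type d -name .git \
    -print -exec git --git-dir={} gc \;
```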
[08:58:31] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours
[08:59:31] New patchset: ArielGlenn; "bugfix: convert revision timestamp from datestring format to db format" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64787
[09:01:25] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64787
[09:11:36] New review: Hashar; "Fixed https://bugzilla.wikimedia.org/show_bug.cgi?id=47850" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63391
[09:41:00] !log made Jeroen De Dauw a maintainer of https://packagist.org/packages/mediawiki/data-values
[09:41:07] Logged the message, Master
[09:48:49] New patchset: Matthias Mullie; "Disable AFTv5 talk page links" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64789
[09:54:10] apergos: Are any of the dump machines still running < 12.04?
[09:54:30] the two dataset hosts, and they are on my list. I must schedule them
[09:54:42] aha
[09:54:45] mm the snapshot hosts in tampa, but they'll get done as things are moved off
[09:54:52] the ones in eqiad are updated
[09:54:57] i was just wondering if snapshots::packages had the redundant else block
[09:55:11] Cause it was queried about our own "extension" usage in https://gerrit.wikimedia.org/r/64776
[09:55:35] not redundant yet
[09:55:43] That's fine then
[09:55:57] Though, probably means the normal README needs updating further
[09:56:06] ie new ubuntu install packaged
[09:56:10] uh
[09:56:19] the package I use is still from our copy
[09:56:30] really?
[09:56:41] yes
[09:56:42] Precise:
[09:56:42] package { [ 'subversion', 'php5', 'php5-cli', 'php5-mysql', 'mysql-client-5.5', 'p7zip-full', 'libicu42', 'utfnormal' ]:
[09:56:46] uh huh
[09:56:49] else
[09:56:49] package { [ 'subversion', 'php5', 'php5-cli', 'php5-mysql', 'mysql-client-5.1', 'p7zip-full', 'libicu42', 'wikimedia-php5-utfnormal' ]:
[09:57:16] I oughta know, I built the precise packages we are using
[09:57:32] heh
[09:57:36] How come it's renamed?
[09:57:51] cause I was stupid about the package name when I first put it together
[09:58:00] when I did it again on precise I tried to make it suck less
[09:58:01] Aha
[09:58:14] So it's not actually debian/ubuntu packaged upstream
[09:58:17] it still might suck. but I guarantee that it sucks less.
[09:58:18] no.
[09:58:26] fair enough
[09:58:41] no upstream. I can't think who would want it there.
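For reference, the two package lists quoted in the exchange above differ only in the MySQL client version and the utfnormal package name. A rough shell rendering of that split follows; the real selection is done by a Puppet conditional in snapshots::packages, so the lsb_release check and apt-get calls here are purely illustrative:

```bash
#!/bin/bash
# Illustrative only: production uses a Puppet conditional in snapshots::packages,
# not a shell script. Package names are taken from the lists quoted in the log.
release=$(lsb_release -rs)

if [ "$release" = "12.04" ]; then
    # Precise hosts: newer MySQL client and the renamed utfnormal package
    apt-get install -y subversion php5 php5-cli php5-mysql \
        mysql-client-5.5 p7zip-full libicu42 utfnormal
else
    # Pre-Precise hosts: older client and the original wikimedia-php5-utfnormal build
    apt-get install -y subversion php5 php5-cli php5-mysql \
        mysql-client-5.1 p7zip-full libicu42 wikimedia-php5-utfnormal
fi
```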
[10:00:51] sorry, I probably should have put that in the changelog 'rename package and make it all suck less' but meh
[10:01:12] heh
[10:01:17] I didn't look at the changelog
[10:11:13] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[10:12:13] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[10:32:13] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours
[11:05:12] New patchset: Odder; "(bug 48075) Localise $wgSiteName, $wgMetaNamespace for dvwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64798
[11:08:53] New patchset: Odder; "(bug 48075) Localise $wgSiteName, $wgMetaNamespace for dvwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64798
[11:27:52] PROBLEM - DPKG on mc15 is CRITICAL: Timeout while attempting connection
[11:28:44] RECOVERY - DPKG on mc15 is OK: All packages OK
[12:00:45] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:02:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[12:06:05] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[12:06:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[12:06:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:21] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:08:12 UTC 2013
[12:08:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:10:21] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:10:13 UTC 2013
[12:10:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:11:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:11:41 UTC 2013
[12:12:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:12:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:12:45 UTC 2013
[12:13:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:13:51] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:13:41 UTC 2013
[12:14:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:14:32] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:14:30 UTC 2013
[12:15:32] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:16:02] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 12:15:56 UTC 2013
[12:16:31] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[12:21:01] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours
[13:13:03] !log anomie synchronized php-1.22wmf3/extensions/CentralAuth 'Update CentralAuth to master, for SUL2 testing on testwiki and test2wiki'
[13:13:11] Logged the message, Master
[13:13:35] !log anomie synchronized php-1.22wmf4/extensions/CentralAuth 'Update CentralAuth to master, for SUL2 testing on testwiki and test2wiki'
[13:13:43] Logged the message, Master
[13:14:52] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 13:14:51 UTC 2013
[13:15:15] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[13:19:32] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[13:20:22] RECOVERY - DPKG on mc15 is OK: All packages OK
[13:22:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:23:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time
[13:37:19] PROBLEM - Host db1025 is DOWN: PING CRITICAL - Packet loss = 100%
[13:40:19] RECOVERY - Host db1025 is UP: PING OK - Packet loss = 0%, RTA = 3.79 ms
[13:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[13:56:41] New patchset: Anomie; "Set $wgCentralAuthLoginWiki for test & test2" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64813
[13:57:03] Change merged: Anomie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64813
[13:58:41] !log anomie synchronized wmf-config/InitialiseSettings.php 'Set $wgCentralAuthLoginWiki for test & test2 (SUL2 testing)'
[13:58:49] Logged the message, Master
[13:59:01] !log anomie synchronized wmf-config/CommonSettings.php 'Set $wgCentralAuthLoginWiki for test & test2 (SUL2 testing)'
[13:59:09] Logged the message, Master
[14:01:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:02:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[14:04:15] Hmm. Why is login.wikimedia.org not seeing the memcached key that was just set from test.wikipedia.org?
[14:23:16] New patchset: coren; "Labsdb: Make labsdb100? mysqld not read_only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64814
[14:23:39] Can someone urgently review this please? ^^
[14:23:58] Once I fix the typo
[14:24:09] :-)
[14:24:48] New patchset: coren; "Labsdb: Make labsdb100? mysqld not read_only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64814
[14:26:41] New review: coren; "Seems okay." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/64814
[14:34:52] Oh, that explains it. testwiki is served from pmtpa, so it's not using the same memcached servers as loginwiki. Ugh.
[14:35:54] New patchset: Anomie; "Disable $wgCentralAuthLoginWiki for testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64815
[14:36:31] Change merged: Anomie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64815
[14:37:30] !log anomie synchronized wmf-config/InitialiseSettings.php 'Disable $wgCentralAuthLoginWiki for testwiki'
[14:37:38] Logged the message, Master
[14:38:20] New patchset: coren; "Labsdb: Make labsdb100? mysqld not read_only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64814
[14:43:02] New patchset: coren; "Labsdb: Make labsdb100? mysqld not read_only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64814
[14:45:01] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64814
[14:49:20] New patchset: coren; "Fixy silly typo 'readonly' -> 'read_only'" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64816
[14:50:02] New review: coren; "Trivial typo fix. Self-merging." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/64816
[14:50:03] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64816
[14:50:16] PROBLEM - SSH on lvs6 is CRITICAL: Server answer:
[14:51:16] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[15:49:09] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64267
[15:53:43] hmmm, weird, anybody seen this before?
[15:53:52] this host has stats in ganglia:
[15:53:53] http://ganglia.wikimedia.org/latest/?c=Analytics%20cluster%20eqiad&h=analytics1021.eqiad.wmnet&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2
[15:53:54] but
[15:54:14] I can't find metrics for anything but analytics1001-analytics1010 via search or aggregate graphs
[16:07:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 16:07:47 UTC 2013
[16:08:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:08:58] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 16:08:53 UTC 2013
[16:09:56] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:09:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 16:09:52 UTC 2013
[16:10:56] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:11:26] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 16:11:24 UTC 2013
[16:11:56] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:16] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:16] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:16] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 16:14:47 UTC 2013
[16:14:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:17:16] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours
[16:17:16] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours
[16:42:59] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours
[16:45:21] PROBLEM - Host analytics1009 is DOWN: PING CRITICAL - Packet loss = 100%
[16:48:59] RECOVERY - Host analytics1009 is UP: PING OK - Packet loss = 0%, RTA = 0.47 ms
[17:05:15] PROBLEM - DPKG on labsdb1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[17:08:16] RECOVERY - DPKG on labsdb1001 is OK: All packages OK
[17:10:37] PROBLEM - RAID on labsdb1001 is CRITICAL: Timeout while attempting connection
[17:12:15] PROBLEM - Host labsdb1001 is DOWN: PING CRITICAL - Packet loss = 100%
[17:12:25] New patchset: Ottomata; "Adding monitoring alerts for Kafka brokers on analytics1021 and analytics1022" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64822
[17:13:06] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64822
[17:13:30] !log replacing disk10 (L11) on ms-be1009
[17:13:39] Logged the message, Master
[17:14:35] PROBLEM - Host labsdb1003 is DOWN: PING CRITICAL - Packet loss = 100%
[17:14:43] New patchset: Physikerwelt; "Intial version of puppet script for LaTeXML" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61767
[17:15:25] RECOVERY - Host labsdb1001 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms
[17:16:25] PROBLEM - mysqld processes on labsdb1002 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[17:18:35] PROBLEM - Host labsdb1002 is DOWN: PING CRITICAL - Packet loss = 100%
[17:19:26] RECOVERY - Host labsdb1003 is UP: PING OK - Packet loss = 0%, RTA = 0.35 ms
[17:19:37] New patchset: Ottomata; "Fixing kafka::server class include" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64825
[17:19:52] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64825
[17:22:25] RECOVERY - Host labsdb1002 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[17:25:25] RECOVERY - mysqld processes on labsdb1002 is OK: PROCS OK: 3 processes with command name mysqld
[17:29:35] PROBLEM - NTP on labsdb1001 is CRITICAL: NTP CRITICAL: Offset unknown
[17:34:18] PROBLEM - NTP on labsdb1003 is CRITICAL: NTP CRITICAL: Offset unknown
[17:34:58] New patchset: Odder; "(bug 46864) Disable 'toc-floated' user option on hevoy" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61058
[17:35:38] RECOVERY - NTP on labsdb1001 is OK: NTP OK: Offset 0.003293395042 secs
[17:39:19] RECOVERY - NTP on labsdb1003 is OK: NTP OK: Offset 0.003236651421 secs
[17:44:20] !log rebooting several fundraising hosts for kernel upgrades
[17:44:28] Logged the message, Master
[17:48:38] PROBLEM - Host db1008 is DOWN: PING CRITICAL - Packet loss = 100%
[17:48:53] !log powering down db26 for last time
[17:49:01] Logged the message, Master
[17:50:08] PROBLEM - check_mysql on db1025 is CRITICAL: Slave IO: Connecting Slave SQL: Yes Seconds Behind Master: (null)
[17:51:18] RECOVERY - Host db1008 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms
[17:55:08] RECOVERY - check_mysql on db1025 is OK: Uptime: 14966 Threads: 1 Questions: 6162 Slow queries: 3 Opens: 108 Flush tables: 2 Open tables: 63 Queries per second avg: 0.411 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[18:02:38] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64789
[18:03:52] New patchset: Jgreen; "attempt to make puppet conf db1008 as mysql master once again" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64831
[18:04:21] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64831
[18:08:17] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours
[18:15:57] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Disable AFTv5 talk page links'
[18:16:05] Logged the message, Master
[18:16:43] !log bsitu synchronized wmf-config/CommonSettings.php 'Disable AFTv5 talk page links'
[18:16:52] Logged the message, Master
[18:18:52] New patchset: Asher; "auto start mysql-multi instances" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64834
[18:58:30] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours
[18:59:00] PROBLEM - Host analytics1003 is DOWN: PING CRITICAL - Packet loss = 100%
[19:03:21] RECOVERY - Host analytics1003 is UP: PING OK - Packet loss = 0%, RTA = 3.16 ms
[19:07:37] PROBLEM - DPKG on analytics1005 is CRITICAL: Timeout while attempting connection
[19:09:07] PROBLEM - Host analytics1006 is DOWN: PING CRITICAL - Packet loss = 100%
[19:09:07] PROBLEM - Host analytics1004 is DOWN: PING CRITICAL - Packet loss = 100%
[19:09:17] PROBLEM - Host analytics1005 is DOWN: PING CRITICAL - Packet loss = 100%
[19:13:28] RECOVERY - DPKG on analytics1005 is OK: All packages OK
[19:13:37] RECOVERY - Host analytics1004 is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms
[19:13:37] RECOVERY - Host analytics1006 is UP: PING OK - Packet loss = 0%, RTA = 0.74 ms
[19:13:37] RECOVERY - Host analytics1005 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms
[19:17:45] starting scap
[19:21:56] !log kaldari Started syncing Wikimedia installation... :
[19:22:05] Logged the message, Master
[19:22:50] New review: MZMcBride; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64635
[19:23:21] ahahahaha
[19:24:14] New review: MZMcBride; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64229
[19:27:11] !log kaldari Finished syncing Wikimedia installation... :
[19:27:19] Logged the message, Master
[19:48:42] Labs outgoing IPs. Just 208.80.153.224?
[19:49:19] New patchset: Catrope; "[WIP DO NOT MERGE] New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890
[19:49:36] New patchset: Catrope; "Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008
[19:58:06] New patchset: Catrope; "New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890
[19:59:22] New patchset: Catrope; "Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008
[20:08:05] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:08:02 UTC 2013
[20:09:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:09:35] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:09:24 UTC 2013
[20:10:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:10:29] New patchset: Tim Landscheidt; "Add script sql to Tools." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64847
[20:10:35] RECOVERY - Puppet freshness on mc15 is OK: puppet ran at Tue May 21 20:10:22 UTC 2013
[20:10:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:10:45 UTC 2013
[20:10:57] im getting a permissions error running git fetch on fenari
[20:11:00] er
[20:11:02] tin i mean
[20:11:04] awjrichards@tin:/a/common/php-1.22wmf4$ git fetch
[20:11:04] remote: Counting objects: 14420, done
[20:11:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:11:05] remote: Finding sources: 100% (23/23)
[20:11:05] remote: Getting sizes: 100% (9/9)
[20:11:05] remote: Compressing objects: 100% (9/9)
[20:11:06] error: insufficient permission for adding an object to repository database .git/objects
[20:11:08] fatal: failed to write object
[20:11:48] awjr: Probably someone with a bad umask screwed up the perms
[20:11:51] Looking
[20:12:04] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:11:54 UTC 2013
[20:12:05] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[20:12:05] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:12:11] thanks RoanKattouw, yeah, doing git fetch /a/common/php-1.22wmf3 is fine
[20:12:34] awjr: This the security release?
[20:12:45] James_F: no; regular mobile deployment
[20:12:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:12:52 UTC 2013
[20:13:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:13:13] awjr: I blame kaldari
[20:13:22] usually a good bet, RoanKattouw
[20:13:25] drwxr-xr-x 2 kaldari wikidev 4096 May 21 18:33 dd
[20:13:34] oops
[20:13:35] kaldari: Please set your umask on tin to 0002
[20:13:41] I thought I had my mask fixed
[20:13:46] lemme check...
[20:13:54] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:13:44 UTC 2013
[20:14:10] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:14:18] awjr: OK try again now?
[20:14:35] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:14:25 UTC 2013
[20:14:55] awjr, RoanKattouw: sorry about that. Mask is fixed now
[20:15:03] RoanKattouw: yup, works
[20:15:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:15:06] do I need to fix any files?
[20:15:09] thanks RoanKattouw and kaldari
[20:15:15] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:15:09 UTC 2013
[20:15:39] kaldari: I fixed them
[20:16:04] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[20:16:10] RoanKattouw: is 0002 a better mask than 022?
[20:16:18] yes
[20:16:26] New patchset: Catrope; "New Parsoid Varnish puppetization" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/63890
[20:16:31] 22 prohibits write to your group
[20:16:37] ah
[20:17:05] New patchset: Catrope; "Parsoid VCL refinements" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64008
[20:17:16] https://en.wikipedia.org/wiki/Umask#Setting_the_mask_using_octal_notation
[20:23:15] PROBLEM - Host db1013 is DOWN: PING CRITICAL - Packet loss = 100%
[20:24:15] RECOVERY - Host db1013 is UP: PING OK - Packet loss = 0%, RTA = 0.29 ms
[20:26:34] PROBLEM - mysqld processes on db1013 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld
[20:31:34] RECOVERY - mysqld processes on db1013 is OK: PROCS OK: 1 process with command name mysqld
[20:33:04] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours
[20:38:32] PROBLEM - NTP on db1013 is CRITICAL: NTP CRITICAL: Offset unknown
[20:43:32] RECOVERY - NTP on db1013 is OK: NTP OK: Offset 0.007891893387 secs
[20:44:53] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Tue May 21 20:44:47 UTC 2013
[20:45:52] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours
[21:03:56] !log awjrichards Started syncing Wikimedia installation... : Regular weekly MobileFrontend deployment
[21:04:04] Logged the message, Master
[21:10:27] !log awjrichards Finished syncing Wikimedia installation... : Regular weekly MobileFrontend deployment
[21:10:35] Logged the message, Master
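To make the umask exchange above (20:13–20:17) concrete: a umask of 022 strips group write from everything a deploy user creates, which is what broke git fetch for the next person on the shared /a/common checkout, while 0002 keeps new files and directories group-writable. A quick demonstration in a scratch directory:

```bash
#!/bin/bash
# Show the difference between umask 022 and umask 0002 on newly created files.
cd "$(mktemp -d)"

umask 022
mkdir with-022 && touch with-022/file
ls -ld with-022 with-022/file      # drwxr-xr-x and -rw-r--r-- : group cannot write

umask 0002
mkdir with-0002 && touch with-0002/file
ls -ld with-0002 with-0002/file    # drwxrwxr-x and -rw-rw-r-- : group can write
```

The log does not show how the already-broken files were repaired; restoring group write on the affected paths (for example with chmod -R g+w) is one plausible approach, not necessarily what was done here.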
[21:13:50] New patchset: Physikerwelt; "Intial version of puppet script for LaTeXML" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/61767
[21:25:19] New patchset: Pyoungmeister; "adding a labsdb management tool and asociated class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64862
[21:49:00] New patchset: Pyoungmeister; "adding a labsdb management tool and asociated class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64862
[22:00:11] PROBLEM - Host cp1035 is DOWN: PING CRITICAL - Packet loss = 100%
[22:01:31] RECOVERY - Varnish HTTP upload-backend on cp1035 is OK: HTTP OK: HTTP/1.1 200 OK - 632 bytes in 0.001 second response time
[22:01:41] RECOVERY - Host cp1035 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms
[22:06:18] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[22:06:18] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[22:06:18] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[22:20:08] PROBLEM - Host cp1036 is DOWN: PING CRITICAL - Packet loss = 100%
[22:21:18] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours
[22:21:28] RECOVERY - Host cp1036 is UP: PING OK - Packet loss = 0%, RTA = 0.19 ms
[23:06:23] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server
[23:08:53] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[23:50:11] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:51:01] RECOVERY - DPKG on mc15 is OK: All packages OK