[00:03:31] ooo
[00:03:43] Happy 4th of July (UTC) everyone
[00:06:10] What's special about today then?
[00:06:31] Ohhh, US Independence Day.
[00:07:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:07:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:08:48] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:08:44 UTC 2013
[00:09:06] Yeah, that small, irrelevant holiday.
[00:09:18] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[00:09:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:09:48] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:09:39 UTC 2013
[00:09:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:10:23] odder: Obviously it's not a holiday in the UK...
[00:10:48] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:10:40 UTC 2013
[00:11:28] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:11:25 UTC 2013
[00:11:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:11:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:12:15] Krenair: nor is it here, doesn't stop me wishing a happy holiday (UTC) to USians who are here
[00:12:38] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:12:28 UTC 2013
[00:12:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:13:18] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:13:12 UTC 2013
[00:13:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:14:18] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:14:15 UTC 2013
[00:14:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:15:08] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:15:01 UTC 2013
[00:15:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:15:58] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:15:55 UTC 2013
[00:16:38] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:16:36 UTC 2013
[00:16:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:16:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:17:38] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:17:30 UTC 2013
[00:17:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:18:18] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:18:08 UTC 2013
[00:18:58] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:18:48 UTC 2013
[00:18:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:19:28] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:19:20 UTC 2013
[00:19:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:19:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:19:58] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:19:57 UTC 2013
[00:20:28] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:20:25 UTC 2013
[00:20:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:20:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:21:08] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:20:57 UTC 2013
[00:21:28] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:21:22 UTC 2013
[00:21:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:21:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:21:58] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:21:51 UTC 2013
[00:22:18] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:22:13 UTC 2013
[00:22:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:22:48] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:22:40 UTC 2013
[00:22:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:22:58] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:22:57 UTC 2013
[00:23:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:23:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:23:58] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 00:23:57 UTC 2013
[00:24:03] I thought someone said puppet freshness was disabled?
[00:24:48] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[00:29:48] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:29:45 UTC 2013
[00:29:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:29:58] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 00:29:50 UTC 2013
[00:30:58] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[00:45:14] !log Add cp1056, cp1057, cp1069 and cp1070 to /etc/dsh/group/bits , they were missing and so the purge-varnish script didn't actually work
[00:45:24] Logged the message, Mr. Obvious
[00:50:25] PROBLEM - Disk space on analytics1006 is CRITICAL: DISK CRITICAL - free space: / 704 MB (3% inode=84%):
[00:50:28] !log catrope synchronized php-1.22wmf8/extensions/VisualEditor 'Latest fixes'
[00:50:37] Logged the message, Master
[00:50:52] !log catrope synchronized php-1.22wmf9/extensions/VisualEditor 'Latest fixes'
[00:51:02] Logged the message, Master
[00:51:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:52:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time
[00:59:17] New patchset: Manybubbles; "Setup gitreview" [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71949
[01:03:05] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.003312706947 secs
[01:03:15] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.001386761665 secs
[01:03:15] PROBLEM - RAID on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:25] PROBLEM - swift-account-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - Disk space on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - swift-account-reaper on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - swift-account-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - swift-container-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - swift-container-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:35] PROBLEM - swift-object-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:36] PROBLEM - swift-account-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:45] PROBLEM - swift-container-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:45] PROBLEM - swift-object-replicator on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:55] PROBLEM - SSH on ms-be5 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:04:55] PROBLEM - swift-container-auditor on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:04:55] PROBLEM - swift-object-server on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:05:15] PROBLEM - swift-object-updater on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:05:15] PROBLEM - DPKG on ms-be5 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:07:32] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[01:08:02] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[01:08:42] RECOVERY - swift-object-server on ms-be5 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[01:08:52] RECOVERY - swift-object-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[01:09:02] RECOVERY - DPKG on ms-be5 is OK: All packages OK
[01:09:12] RECOVERY - RAID on ms-be5 is OK: OK: State is Optimal, checked 1 logical device(s)
[01:09:12] RECOVERY - swift-account-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[01:09:25] RECOVERY - SSH on ms-be5 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[01:09:25] RECOVERY - swift-account-reaper on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[01:09:25] RECOVERY - swift-account-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[01:09:32] RECOVERY - swift-object-auditor on ms-be5 is OK: PROCS OK: 2 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[01:09:32] RECOVERY - swift-container-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[01:09:32] RECOVERY - swift-account-server on ms-be5 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[01:09:32] RECOVERY - swift-container-auditor on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[01:09:32] RECOVERY - swift-container-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[01:09:32] RECOVERY - swift-container-updater on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[01:09:33] RECOVERY - swift-object-replicator on ms-be5 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[01:12:07] !log catrope synchronized php-1.22wmf8/resources/Resources.php 'collapsibleTabs fix'
[01:12:18] Logged the message, Master
[01:12:30] !log catrope synchronized php-1.22wmf8/resources/startup.js 'touch'
[01:12:39] Logged the message, Master
[01:12:53] !log catrope synchronized php-1.22wmf8/skins/vector/vector.js 'touch'
[01:13:02] Logged the message, Master
[01:13:15] !log catrope synchronized php-1.22wmf8/skins/vector/collapsibleTabs.js 'touch'
[01:13:24] Logged the message, Master
[01:13:38] !log catrope synchronized php-1.22wmf9/resources/Resources.php 'collapsibleTabs fix'
[01:13:47] Logged the message, Master
[01:14:00] !log catrope synchronized php-1.22wmf9/resources/startup.js 'touch'
[01:14:09] Logged the message, Master
[01:14:22] !log catrope synchronized php-1.22wmf9/skins/vector/vector.js 'touch'
[01:14:31] Logged the message, Master
[01:14:44] !log catrope synchronized php-1.22wmf9/skins/vector/collapsibleTabs.js 'touch'
[01:14:54] Logged the message, Master
[01:15:07] New patchset: Manybubbles; "Slick support for ganglia slope and units." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954
[01:16:41] New review: Manybubbles; "This is what I was thinking we should do to support ganglia's slope and units attributes." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954
[01:22:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:23:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time
[01:26:02] PROBLEM - Packetloss_Average on analytics1006 is CRITICAL: CRITICAL: packet_loss_average is 23.1485764444 (gt 8.0)
[01:42:45] New review: Krinkle; "(1 comment)" [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954
[01:45:58] !log catrope synchronized php-1.22wmf8/extensions/VisualEditor 'Fix Logged the message, Master
[01:46:23] !log catrope synchronized php-1.22wmf9/extensions/VisualEditor 'Fix Logged the message, Master
[01:52:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:53:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time
[02:01:21] RECOVERY - Disk space on ms-be5 is OK: DISK OK
[02:07:35] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[02:07:45] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[02:14:33] !log LocalisationUpdate completed (1.22wmf8) at Thu Jul 4 02:14:32 UTC 2013
[02:14:44] Logged the message, Master
[02:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[02:27:40] !log LocalisationUpdate completed (1.22wmf9) at Thu Jul 4 02:27:40 UTC 2013
[02:27:50] Logged the message, Master
[02:34:13] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jul 4 02:34:13 UTC 2013
[02:34:22] Logged the message, Master
[02:52:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:53:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[03:03:53] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours
[03:07:04] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[03:07:14] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[03:11:14] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours
[03:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time
[03:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[03:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[04:08:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:08:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:08:33] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:08:28 UTC 2013
[04:09:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:09:13] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:09:08 UTC 2013
[04:09:43] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:09:38 UTC 2013
[04:10:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:10:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:10:23] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:10:13 UTC 2013
[04:10:43] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:10:38 UTC 2013
[04:11:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:11:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:11:13] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:11:11 UTC 2013
[04:11:43] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:11:34 UTC 2013
[04:12:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:12:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:12:23] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:12:22 UTC 2013
[04:12:53] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:12:47 UTC 2013
[04:13:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:13:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:13:33] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:13:25 UTC 2013
[04:13:43] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:13:38 UTC 2013
[04:14:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:14:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:17:13] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:17:10 UTC 2013
[04:18:13] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:29:53] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:29:49 UTC 2013
[04:30:13] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[04:43:36] PROBLEM - Solr on solr1 is CRITICAL: Average request time is 1739.0 (gt 400)
[04:46:35] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 04:46:31 UTC 2013
[04:47:24] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[04:59:44] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 04:59:42 UTC 2013
[05:00:24] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[05:13:38] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:17:58] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[05:51:44] Change merged: Tim Starling; [operations/debs/wikimedia-task-appserver] (master) - https://gerrit.wikimedia.org/r/64023
[06:04:02] * Nemo_bis hugs Reedy :D thanks
[06:09:34] PROBLEM - SSH on lvs5 is CRITICAL: Server answer:
[06:10:34] RECOVERY - SSH on lvs5 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[06:11:42] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[06:11:42] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[06:13:44] TimStarling: what do you want to do w/ https://gerrit.wikimedia.org/r/#/c/57890/ ? you +1'd it with some suggestions in the past, and i've since taken your suggestions
[06:14:32] PROBLEM - SSH on lvs6 is CRITICAL: Server answer:
[06:14:34] i don't mind waiting on it if it's simply reluctance to monkey with the deployment scripts, just wanted to make sure it didn't fall off
[06:14:50] you noticed I was in a merging mood today?
[06:15:22] mayyyyybe. who's asking?
[06:16:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:16:32] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0)
[06:17:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[06:20:09] New patchset: Tim Starling; "Set common rsync and dsh parameters in mw-deployment-vars" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890
[06:20:19] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57890
[06:22:09] neat, thanks. should we test it?
[06:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:23:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.295 second response time
[06:25:32] I ran puppetd -tv on a random apache, it succeeded
[06:26:02] I'll run it on tin now, then I guess we can wait half an hour, then do a test scap
[06:26:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:27:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.146 second response time
[06:27:39] sounds good
[06:30:42] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 06:30:37 UTC 2013
[06:31:02] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 06:30:55 UTC 2013
[06:31:42] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[06:31:42] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[06:31:52] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 06:31:48 UTC 2013
[06:32:02] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 06:32:00 UTC 2013
[06:32:42] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[06:32:42] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[06:40:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:42:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[06:52:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:53:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[06:55:53] New patchset: ArielGlenn; "add script for rsyncs of public data between dataset hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71964
[07:06:37] !log tstarling synchronized README
[07:06:48] Logged the message, Master
[07:08:23] sweet
[07:08:44] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[07:09:04] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[07:09:39] not sure if that was a proper test
[07:09:49] bbl
[07:18:14] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:14] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:14] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:14] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:14] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:14] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:15] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[07:18:15] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours
[07:22:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:23:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time
[07:31:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:32:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time
[07:48:16] hello
[07:53:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:54:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time
[07:55:50] PROBLEM - Disk space on wtp1010 is CRITICAL: DISK CRITICAL - free space: / 317 MB (3% inode=78%):
[08:00:40] PROBLEM - Parsoid on wtp1010 is CRITICAL: Connection refused
[08:06:30] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[08:06:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:07:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[08:07:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[08:07:50] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 08:07:41 UTC 2013
[08:08:10] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 08:08:00 UTC 2013
[08:08:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[08:08:30] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[08:08:30] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 08:08:24 UTC 2013
[08:08:50] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 08:08:40 UTC 2013
[08:09:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[08:09:30] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[08:16:30] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 08:16:22 UTC 2013
[08:17:10] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[08:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:22:40] RECOVERY - Parsoid on wtp1010 is OK: HTTP OK: HTTP/1.1 200 OK - 1373 bytes in 0.005 second response time
[08:22:50] RECOVERY - Disk space on wtp1010 is OK: DISK OK
[08:23:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[08:30:40] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 08:30:29 UTC 2013
[08:31:30] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[08:32:30] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 0.003408432007 secs
[08:32:30] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.001331090927 secs
[08:40:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:41:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[08:49:07] New patchset: Tim Starling; "Revert "Set common rsync and dsh parameters in mw-deployment-vars"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71967
[08:49:20] Change merged: Tim Starling; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71967
[08:53:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:54:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.154 second response time
[08:54:30] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[09:02:30] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.004135847092 secs
[09:05:14] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[09:08:27] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[09:08:27] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[09:10:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:11:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[09:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[09:32:54] !log tstarling Started syncing Wikimedia installation... :
[09:33:04] Logged the message, Master
[09:36:14] !log tstarling Finished syncing Wikimedia installation... :
[09:36:23] Logged the message, Master
[09:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time
[10:02:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:03:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[10:09:22] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[10:10:02] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[10:10:02] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[10:10:30] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[10:16:13] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[10:18:36] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[10:21:12] New patchset: Hashar; "contint: publish Zuul git repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[10:23:04] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71968
[10:40:08] New patchset: Nikerabbit; "ULS deployment phase 5" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71971
[10:53:55] New review: Siebrand; "Scheduled for deployment in the LangEng deployment window 2013-07-09 08:00 UTC." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/71971
[11:07:56] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[11:08:26] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[11:10:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:11:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.794 second response time
[11:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.148 second response time
[11:59:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:00:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[12:00:33] !log uploaded buck version 0+git20130612-0+wmf1 at apt.wikimedia.org
[12:00:43] Logged the message, Master
[12:07:36] quiet today
[12:07:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:00] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:07:53 UTC 2013
[12:08:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:30] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:08:22 UTC 2013
[12:08:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:09:00] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:08:58 UTC 2013
[12:09:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:09:30] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:09:25 UTC 2013
[12:09:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:10:00] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:09:57 UTC 2013
[12:10:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:10:30] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:10:22 UTC 2013
[12:10:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:10:42] New patchset: ArielGlenn; "add script for rsyncs of public data between dataset hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71964
[12:10:50] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:10:49 UTC 2013
[12:11:20] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:11:11 UTC 2013
[12:11:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:11:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:11:40] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:11:37 UTC 2013
[12:12:00] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:11:52 UTC 2013
[12:12:02] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71964
[12:12:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:12:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours
[12:17:00] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 12:16:59 UTC 2013
[12:17:20] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours
[12:23:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:24:22]
RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [12:27:03] hashar: loha [12:30:00] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 12:29:59 UTC 2013 [12:30:40] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [12:32:42] AzaToth: hi :-) writing doc this afternoon sorry. [12:34:32] good excuse :-P [12:35:54] hashar: only wondered if you've thought about pushing made debs to a external repo or not [12:36:20] or at least have a functionl reprepro repo for them available [12:36:34] lets have the jenkins jobs to build first :D [12:37:02] hashar: still waiting for jenkins to actually do anything :-P [12:37:11] * AzaToth hides under a pile of docs [12:37:32] AzaToth: the slaves do not have access to the Zuul git repository [12:37:38] I need some more engineering there :-] [12:37:44] ok [12:37:59] have no knownledge how zuul operates [12:39:26] New patchset: ArielGlenn; "enable rsync dumps cronjob for primary dataset host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71977 [12:43:16] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71977 [12:50:18] hashar: why whould the slaves need to have access to the git repo anyway? 
[13:02:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:04:50] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [13:05:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [13:05:35] New patchset: Mark Bergsma; "Add esams mobile LVS service IPs for HTTPS" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71979 [13:06:42] New patchset: ArielGlenn; "hosts that mount /data will do so from dataset host in same datacenter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71980 [13:07:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71979 [13:11:27] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [13:11:37] New patchset: ArielGlenn; "hosts that mount /data will do so from dataset host in same datacenter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71980 [13:12:17] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [13:12:17] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [13:17:53] New patchset: Mark Bergsma; "Add esams mobile HTTPS service IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71981 [13:18:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71981 [13:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [13:30:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:32:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: 
HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [13:37:52] RECOVERY - Disk space on analytics1006 is OK: DISK OK [13:40:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:41:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [13:52:32] New patchset: ArielGlenn; "hosts that mount /data will do so from dataset host in same datacenter" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71980 [13:56:40] !log - update mwlib to 0.15.10 [13:56:49] Logged the message, Master [13:57:13] !log restarted all services [13:57:25] Logged the message, Master [14:00:01] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71980 [14:02:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:03:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [14:05:44] New patchset: Mark Bergsma; "Add wikidata/wikivoyage LVS IPs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71982 [14:05:45] New patchset: Mark Bergsma; "Enable IPv6 for amslvs1 and amslvs2" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71983 [14:06:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71982 [14:08:43] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [14:08:53] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [14:10:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:10:50] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71983 [14:11:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: 
HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [14:13:23] RECOVERY - Packetloss_Average on analytics1006 is OK: OK: packet_loss_average is 0.41879042471 [14:25:06] New patchset: Mark Bergsma; "Remove obsolete $ipv6_hosts construct" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71985 [14:26:13] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71985 [14:30:01] New patchset: Mark Bergsma; "Fix mobile IPv6 service IP name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71986 [14:30:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71986 [14:34:42] New patchset: ArielGlenn; "add writeuptopageid to mwbzutils" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/71988 [14:46:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:47:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [14:57:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [15:04:06] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/71988 [15:04:22] New patchset: Mark Bergsma; "Create simple PyBal module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71992 [15:06:02] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [15:06:12] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [15:07:51] New patchset: Mark Bergsma; "Create simple PyBal module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71992 [15:09:22] New patchset: Mark Bergsma; "Create 
simple PyBal module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71992 [15:10:02] oh! [15:10:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71992 [15:11:35] and then people are bitching noone's converting stuff into modules [15:11:40] :) [15:11:50] even I did like 3 or 4 already [15:13:28] yurik, Dr0ptp4kt: around? [15:13:39] we got a question about a Zero deployment yesterday [15:14:43] New patchset: Mark Bergsma; "Migrate class lvs::balancer to the new PyBal module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71993 [15:16:12] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71993 [15:18:01] New patchset: Mark Bergsma; "Pass variable $site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71994 [15:18:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71994 [15:22:18] mark, faidon: do you know what the consequences of this Zero patch set https://gerrit.wikimedia.org/r/#/c/67504/ are on traffic that will be send to udp2log? (it was re-enabled yesterday in https://gerrit.wikimedia.org/r/#/c/71753/) [15:22:54] i meant paravoid, whoops [15:23:55] i haven't got a clue [15:24:05] why would it do anything for udp2log? 
[15:25:22] we are seeing a surge of incoming traffic and we suspect that it might be caused by that changeset [15:25:31] ottomata knows the details :) [15:26:10] this is the 2nd time that after deploying that patch we are having issues with dropped messages [15:26:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:26:43] first time can be a coincidence, two times indicates that it's related [15:27:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:33:42] perhaps it's causing some sort of redirect loop [15:34:05] yeah mark, i'm trying to see if there is possibly actually more traffic on the mobile varnishes [15:34:18] i'm looking at this rigiht now: [15:34:29] that should be really easy to see in the graphs if it's resulting in significantly more udp2log traffic [15:34:30] frontend.client_conn and pkts_in [15:34:49] client_req also [15:34:52] ok [15:34:53] frontend.client_req that is [15:35:08] http://ganglia.wikimedia.org/latest/graph_all_periods.php?title=&vl=&x=&n=&hreg%5B%5D=cp(1046%7C1047%7C1059%7C1060)&mreg%5B%5D=frontend.client_req&gtype=stack&glegend=show&aggregate=1 [15:35:43] kind of hard to tell….looking at the last week level it does look like there has been more than historically over the last few hours [15:35:58] it looks a bit elevated yes [15:40:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:41:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [15:53:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:54:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [16:01:12] New patchset: Petr Onderka; "starting with dump format: file header" 
[operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/71995 [16:01:47] Change merged: Petr Onderka; [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/71995 [16:11:29] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:29] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [16:17:21] Hm. So. I want to write a script that writes into the labs LDAP. Where is the right place to run that script from, canonically? [16:27:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:28:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [16:29:33] New patchset: Reedy; "(bug 50425) Add the flood flag on bs.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71243 [16:29:58] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71243 [16:30:29] New patchset: Reedy; "(bug 50287) Restrict local uploads on Meta-Wiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71252 [16:30:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71252 [16:31:06] New patchset: Reedy; "(bug 50377) Enable 'autopatrolled' group on hewikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71362 [16:31:36] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71362 [16:31:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:32:17] New patchset: Reedy; "(bug 50007) Add 'Translation' namespace to English Wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71783 [16:32:35] Change merged: jenkins-bot; [operations/mediawiki-config] (master) 
- https://gerrit.wikimedia.org/r/71783 [16:33:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [16:35:46] New patchset: Helder.wiki; "Install ArticleFeedbackv5 on pt.wikibooks.org" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71524 [16:39:51] New patchset: Reedy; "Update protection configs for core change I6bf650a3" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71538 [16:43:37] !log reedy synchronized wmf-config/InitialiseSettings.php [16:43:47] Logged the message, Master [16:44:14] New review: Reedy; "It could be useful, but does need rebasing" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/65644 [17:09:24] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [17:09:54] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:14] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:15] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [17:19:15] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [17:25:24] drdee, pong [17:32:29] New patchset: Vogone; "Fixed import source for testwikidatawiki" [operations/mediawiki-config] (master) - 
https://gerrit.wikimedia.org/r/71998 [17:37:21] New review: Anomie; "Since I6bf650a3 has been merged in time for 1.22wmf10, this needs to be merged and deployed before 1..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71538 [17:42:38] ottomata, around? [17:48:05] yup hey, just responded to your email [17:48:10] yurik^ [17:58:05] ottomata, thx :) [18:04:04] New patchset: Matthias Mullie; "Add cron to generate CSVs for ee-dashboard" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/71282 [18:08:08] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [18:08:55] Bah, the muricans are lazing about and doing barbecues and stuff. [18:08:58] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [18:09:28] Change merged: Ottomata; [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71949 [18:11:37] New patchset: Vogone; "Fixed import source for testwikidatawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/71998 [18:22:10] New review: Ottomata; "Phew, ok. That took me a while to parse what was going on there, but I get it." [operations/puppet/jmxtrans] (master) - https://gerrit.wikimedia.org/r/71954 [19:07:07] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [19:07:17] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [19:11:51] New patchset: Yuvipanda; "Organized the packages for exec_environ" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72001 [19:13:51] New patchset: Yuvipanda; "Add php5-xsl (XSLT processor) on request from Tpt" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72002 [19:15:55] New review: coren; "Trivial package addition." 
[operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72002 [19:16:48] New review: coren; "That does look much better" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72001 [19:16:49] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72001 [19:17:18] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72002 [19:24:23] New patchset: Ottomata; "Adding icinga check to make sure kafka brokers aren't getting too many produce requests" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72004 [19:24:39] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72004 [19:48:07] New patchset: ArielGlenn; "mwbzutils: clean up makefile and source in prep for debian packaging" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/72005 [20:33:24] New review: AzaToth; "I know it's difficult to produce a perfect handcrafted makefile." [operations/dumps] (ariel) C: -1; - https://gerrit.wikimedia.org/r/72005 [20:39:21] New review: QChris; "@Akosiaris" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/70673 [20:46:14] qchris: my focus was only to have gerrit to actually build; it was way to complicated to figure out by my self how to do all other stuff, which I told them all (^daemon & C:o) [20:46:22] I don't know it tests works or not [20:46:29] never tested [20:48:00] qchris: and yea, I saw your comment back then, but I only saw the line "Upstreams buck start script sucks. Your variant is waaaay simpler. :-D", I never noticed there was any more comment there ヾ [20:48:14] sorry for that [20:48:20] AzaToth: Hi :-) Buck seems to build. And with the modifications I suggested, gerrit built. [20:48:33] AzaToth: No problem. That's why I commented again. [20:48:36] heh [20:48:44] can you whip up a patchie? [20:49:12] :-) I'll be busy with other stuff for some time. [20:49:26] Then I could try ... 
But as I said, your buid does not run on my machines :-/ [20:49:42] ah [20:49:55] I'll put it on my todo and see what I could do. [20:50:15] I put you in as reviewer, then you can test whether or not it works for you. [20:51:51] qchris: "sadly" BUCK_DIRECTORY doesn't exists ツ [20:52:16] qchris: the original buck system assumed there was a git directory where buck was run from [20:52:17] :-) You did rip out setting that env variable as well? [20:52:32] Yes. Buck is a pain. [20:52:43] It also wants to auto-update and recompile itself. [20:53:43] heh [20:53:55] and I haven't figure out a way to make offline build [20:54:13] it's all automagical [20:54:20] Yes :-( [21:00:01] qchris: BUILD FAILED: No directory all when resolving target //all:all in context FULLY_QUALIFIED [21:13:31] New patchset: Yuvipanda; "Arrange tool lab packages list in alphabetical order" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72046 [21:20:16] New patchset: Yuvipanda; "Add perl modules installed via user request (from SAL)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72047 [21:22:48] New review: coren; "Meh, I'm no fan of arbitrary orders, but if it helps others..." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/72046 [21:22:49] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72046 [21:23:54] New patchset: Yuvipanda; "Add python modules installed via user request (from SAL)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72048 [21:24:00] AzaToth: Is that an output from building gerrit? 
[21:24:35] AzaToth: To run the unit tests, use 'buck test --all -e slow' [21:25:15] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72047 [21:25:35] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72048 [21:26:50] New patchset: Yuvipanda; "Add php modules installed via user request (from SAL)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72049 [21:30:10] qchris: ah --all [21:30:38] qchris: I just copied what you had entered in the comment on gerrit ツ [21:31:27] AzaToth: Whoops. Sorry :-) I got it wrong there. [21:32:19] heh [21:33:25] New patchset: AzaToth; "Add testrunner classes" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/72050 [21:33:32] qchris: ↑ [21:33:53] qchris: http://paste.debian.net/14449/ [21:35:07] AzaToth: Looks good. [21:35:17] AzaToth: Gonna try that on my machine as well :) [21:35:45] a bit ugly, but I didn't want to update buck.jar to include the testrunner files [21:49:54] New patchset: Lcarr; "Revert "Adding icinga check to make sure kafka brokers aren't getting too many produce requests"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72052 [21:50:06] fyi, reverting this analytics related check [21:51:47] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/72052 [22:06:45] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [22:09:20] !log icinga restarted, was down due to bad check [22:09:29] Logged the message, Mistress of the network gear. [22:12:13] LeslieCarr: what was bad about it? [22:12:45] I ask because when I saw the commit fly past on IRC, I made a mental note to study it and see how I could do the same for the services for which I need to set up monitoring. 
[22:13:56] alternately go back to enjoying July 4th and don't worry about my inane questions :) [22:16:55] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 22:16:53 UTC 2013 [22:17:05] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [22:17:05] RECOVERY - Puppet freshness on cp3001 is OK: puppet ran at Thu Jul 4 22:16:58 UTC 2013 [22:18:05] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [22:30:05] RECOVERY - Puppet freshness on dysprosium is OK: puppet ran at Thu Jul 4 22:29:56 UTC 2013 [22:30:35] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [23:05:01] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: No successful Puppet run in the last 10 hours [23:05:50] New patchset: DixonD; "A partial fix for https://bugzilla.wikimedia.org/show_bug.cgi?id=50561" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/72054 [23:06:09] PROBLEM - Puppet freshness on dysprosium is CRITICAL: No successful Puppet run in the last 10 hours [23:06:19] PROBLEM - Puppet freshness on cp3001 is CRITICAL: No successful Puppet run in the last 10 hours [23:12:59] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: No successful Puppet run in the last 10 hours [23:15:53] New patchset: QChris; "Add buck$py.class to include-binaries" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/72055 [23:16:55] New patchset: QChris; "Add buck$py.class to include-binaries" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/72055 [23:20:17] New review: QChris; "When also applying Iee6f6559e3cf7ab21a0b1e12a2cfca61fd4ae27f," [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/72050 [23:21:48] New review: QChris; "The inline comments have been addressed in separate changes:" [operations/debs/buck] (master) - https://gerrit.wikimedia.org/r/70673