[00:08:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:07:55 UTC 2013 [00:08:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:08:59 UTC 2013 [00:09:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:56] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:10:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:09:58 UTC 2013 [00:10:46] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [00:10:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:10:47 UTC 2013 [00:11:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:16] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:12:08 UTC 2013 [00:12:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:16:36] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:16:34 UTC 2013 [00:16:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:21:06] !log deployed change 64875 to virt0 [00:21:15] Logged the message, Master [00:58:54] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [00:59:24] PROBLEM - Swift HTTP on ms-fe4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:44] RECOVERY - Disk space on mc15 is OK: DISK OK [01:01:44] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 5.149841309e-05 secs [01:02:24] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.002368807793 secs [01:11:12] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:11:42] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:12:32] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:02] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.058 second response time [01:13:12] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [01:13:32] PROBLEM - Apache HTTP on mw1159 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:32] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:32] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.058 second response time [01:14:14] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out [01:14:33] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out [01:15:04] PROBLEM - DPKG on ms-fe3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:15:04] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out [01:15:14] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [01:15:25] PROBLEM - RAID on ms-fe3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:08:27] !log LocalisationUpdate completed (1.22wmf4) at Wed May 22 02:08:27 UTC 2013 [02:08:47] Logged the message, Master [02:14:48] !log LocalisationUpdate completed (1.22wmf3) at Wed May 22 02:14:48 UTC 2013 [02:14:56] Logged the message, Master [02:35:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 22 02:35:44 UTC 2013 [02:35:53] Logged the message, Master [06:27:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:27:51 UTC 2013 [06:28:39] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:28:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:28:49 UTC 2013 [06:29:40] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:29:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:29:35 UTC 2013 [06:30:39] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:33:09] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [06:41:45] New patchset: Nikerabbit; "Disable Narayam on commons now that they have ULS" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64884 [06:53:57] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [06:53:59] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [06:54:29] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.21 ms [06:54:31] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.14 ms [06:59:17] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:59:38] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:00:18] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:00:29] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.059 second response time [07:03:17] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:03:37] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:04:07] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:04:38] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 8.629 second response time [07:04:57] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [07:05:57] RECOVERY - Disk space on mc15 is OK: DISK OK [07:45:18] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:46:38] RECOVERY - Swift HTTP on ms-fe4 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.057 second response time [07:49:08] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:50:07] !log restarted swift proxy on ms-fe1 [07:50:18] Logged the message, Master [07:59:43] hey mark, Magnus (Snaps) and I are trying to find a date/time to demo the progress on varnishkafka; what would be a convenient date/time for you? 
[08:02:08] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.00244987011 secs [08:02:38] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.006417512894 secs [08:06:20] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:20] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:20] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:07:52 UTC 2013 [08:08:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:08:44 UTC 2013 [08:09:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:09:30 UTC 2013 [08:10:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:32] New patchset: Hashar; "beta: configuration for Wikidata" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:14:08] New review: Hashar; "Rebased to use wmf-config/wgConfVHosts-labs.php Not sure why we do not have a www.wikidata.org ent..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:15:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:14:52 UTC 2013 [08:15:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:20] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [08:35:01] New review: Hashar; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:47:48] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [08:47:49] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [08:48:29] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.08 ms [08:48:30] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.65 ms [08:48:48] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:49:40] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [09:16:02] hi can someone run extensions/TimedMediaHandler/maintenance/resetTranscodes.php on commons. there are some quite old jobs that got never run, want to reinsert them [09:58:48] probably stupid question: www.wikidata.org/w/extensions/Wikibase/docs/summaries.txt is accessible through the browser, but www.wikidata.org/w/extensions/Wikibase/docs/ontology.owl is not - would anyone know why? [10:08:53] LeslieCarr ping [10:09:17] LeslieCarr: what is IP range for labs? I mean the IP addresses that foreign servers see when people from labs instances connect to them [10:09:29] paravoid ^ [10:11:25] or anyone else who might know that... [10:12:08] isn't it 10.42.0.0/something petan ? 
[10:12:21] matanya the public IP address range I mean [10:12:23] 208.80.153.128/25 [10:12:26] like 208.80.153.163 is one of them [10:12:27] oh [10:12:29] mark: ty [10:12:37] what mark said :) [11:16:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:17:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [11:19:15] PROBLEM - SSH on mc15 is CRITICAL: Connection timed out [11:20:05] RECOVERY - SSH on mc15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [11:20:25] PROBLEM - RAID on mc15 is CRITICAL: Timeout while attempting connection [11:21:16] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [11:31:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:33:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:38:55] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:40:43] RECOVERY - DPKG on mc15 is OK: All packages OK [11:48:07] New patchset: Nemo bis; "(bug 40341) Enable translation import on wikis with Translate extension" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64919 [11:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:52:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [12:07:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:07:56 UTC 2013 [12:09:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:30] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:09:24 UTC 2013 [12:10:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:39] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:10:38 UTC 2013 [12:11:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:11:49 UTC 2013 [12:12:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:12:50 UTC 2013 [12:13:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:49] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:13:40 UTC 2013 [12:14:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:29] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:14:24 UTC 2013 [12:14:59] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:09] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:15:02 UTC 2013 [12:15:39] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:39] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:39] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:40] PROBLEM - 
Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:40] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [12:21:31] New patchset: ArielGlenn; "More documentation of bz2 multistream and index files" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64921 [12:23:13] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64921 [12:44:13] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [12:51:03] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:51:53] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [13:08:20] PROBLEM - Disk space on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:11:10] RECOVERY - Disk space on mc15 is OK: DISK OK [13:22:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [13:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:09:40] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [14:13:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [14:14:30] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:15:21] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:20:43] andrewbogott_afk: Silke_WMDE or anyone who knows about puppet, i am getting an error running puppet for the wikidata test system [14:20:46] http://dpaste.com/1195230/ [14:21:05] how is it trying to run both mediawiki single node and the wikidata one? [14:22:38] it's new, for one thing in mediawiki_singlenode or atleast newishly updated [14:23:57] * aude may have fixed it [14:29:31] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:30:20] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:32:35] aude: fixed? [14:32:42] no [14:32:45] coming [14:32:52] thanks [14:34:28] Silke_WMDE: Fun Fact: Replication is teh workingz. :-) [14:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:53:51] Coren: ohhh.... replicating what where exactly? [14:55:18] Coren: Yay! [14:55:47] I should update the roadmap one last time before leaving [14:55:49] :) [14:56:31] DanielK_WMDE: Replicating 4/7 clusters to labs [14:57:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:21] Silke_WMDE: does that include cross-server replication of commons (and perhaps wikidata)? [14:57:35] is there a page describing what is filtered, and how? 
[14:58:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.158 second response time [14:58:55] DanielK_WMDE: The WMF approach is: federated tables instead of cross db joins [14:59:11] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:24] DanielK_WMDE: list of cluster working: https://wikitech.wikimedia.org/wiki/ToolLabsDatabasePlan [14:59:27] federated tables have TERRIBLE performance on big joins [14:59:40] do i have to be in the tools project to use? [14:59:42] and joning the image table against commons is just that [14:59:46] and it'S the prime use case [15:00:10] DanielK_WMDE: Convince them! ;) [15:00:36] why are we talking in this channel btw? [15:00:45] Silke_WMDE, Coren: you can try with federated tables, but be prepared to change that. i'm pretty sure it's not going to work for the "join local imagelinks table against commons image table" use case [15:00:58] coren is here..... [15:01:03] ...which is the prime use case for replicating commons to all servers [15:01:45] is there a way i can kill "notice: Run of Puppet configuration client already in progress; skipping" [15:01:57] it froze while logged in [15:02:15] aude yes [15:02:19] how? [15:02:27] find the process with ps aux | grep puppetd [15:02:30] kill -9 it [15:02:32] ok [15:02:36] restart puppet [15:02:46] brb [15:03:14] works [15:03:33] but not wikidata singlenode [15:09:56] DanielK_WMDE: Actually, that's a case I expect should work very well since the where clause is on an indexed column [15:10:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:32] New patchset: Ottomata; "Changing metrics.wikimedia.org htpasswd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64942 [15:11:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [15:11:36] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64942 [15:11:42] Coren: but as far as i know, the index data is not replicated to the federated table. so a join means comparing two indexes chunk by chunk over the network. [15:11:44] * aude found the problem with wikidata singlenode :) [15:11:58] Coren: that's not as bad as a table scan over the network, but still [15:12:06] will take a few minutes to fix stuff [15:16:23] DanielK_WMDE: That's basically it. I checked, and performance is quite adequate for that scheme. [15:16:49] DanielK_WMDE: It's apparently fairly smart about not transferring the whole index [15:17:29] PROBLEM - SSH on mc15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:18:39] Daniel plus, baseline performance is high enough that might even still tip the balance. [15:18:49] Coren, petan, I just wanted to let you know that I ported a script that uses the replicated databases from toolserver to Labs today; not only did it work fine, but a script that typically took 3-4 hours to complete on TS was finished in 3-4 minutes here! good job!! [15:18:52] :-) [15:18:54] Coren: do i have to be in tools project to access the db? [15:19:16] aude: Yes, it's the only project it's accessible from atm. Post-amsterdam, we'll generalize. 
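The stuck-run cleanup petan walks aude through above (15:01-15:02) comes down to a few shell commands. A minimal sketch, assuming the 2.x-era standalone puppetd agent in use at the time and the usual default lock-file path; on newer installs the agent is invoked as "puppet agent --test" instead:

    # find the agent run that is stuck
    ps aux | grep puppetd
    # kill it, using the PID from the previous command (12345 is a placeholder)
    sudo kill -9 12345
    # clear a stale lock file if one was left behind (common default path)
    sudo rm -f /var/lib/puppet/state/puppetdlock
    # start a fresh run and watch it
    sudo puppetd --test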
[15:19:19] RECOVERY - SSH on mc15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:19:22] hmmmm, ok [15:19:27] * aude will be joining tools :) [15:19:30] :-) [15:20:00] i already am in bots [15:20:14] Coren how the accounts for db are being created? [15:20:15] What's your wikitech username? I'll add you now [15:20:23] Coren: same as irc [15:20:30] Coren I noticed that toolwatcher is now getting login directly from the replica file instead of generating random pw [15:20:41] Coren but how the replica file is created? o.O [15:21:00] petan: From the NFS server, which doubles as the DB manager (since it has access to all the filesystems for the creds) [15:21:22] ok, so how the nfs server knows it should generate credentials? [15:21:26] aude: {{done}} [15:21:37] yay! [15:21:39] I mean which process is creating these accounts? and where it lives? [15:21:44] aude: Useful pointers: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access [15:21:55] ok [15:21:57] petan: It's a watcher that lives on the NFS server. [15:22:07] !replicateddb is https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access [15:22:07] Key was added [15:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:31] does it create account for every user, or only for every user who is in tools project? [15:22:38] Right now, we have s1, s2, s4, s5. s3 s6 should arrive later today. [15:22:50] petan: It creates account for every user that is on a project that uses NFS. [15:22:55] aha [15:23:08] so technically projects outside tools should be able to login to db right now? if they use nfs? [15:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:23:54] petan: AFAIK, the only one is deployment-prep, but they don't have the network magic to get to the DBs just yet. Like I said, post-Amsterdam. Perhaps we'll even have time to do that there, depending on how busy Ryan and I get. [15:24:09] mhm [15:24:39] petan: If you're thinking bots, look at the /etc/hosts and /etc/iptables.conf hack on tools. But beware: that's a hack that'll change shortly! [15:24:49] ok [15:35:41] http://www.wikidata.org/w/extensions/Wikibase/docs/summaries.txt is accessible through the browser, but http://www.wikidata.org/w/extensions/Wikibase/docs/ontology.owl is not - would anyone know why? 
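For anyone following the Database_access pointer Coren gives aude above (15:21), the credentials generated by the watcher end up in a per-user replica file in the home directory. A sketch of a typical session, assuming the replica.my.cnf filename and the enwiki.labsdb / enwiki_p naming described on the linked Help page (check that page for the current host and database names):

    # interactive connection to the replicated enwiki database
    mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p

    # or a one-off query, non-interactively
    mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p \
        -e "SELECT COUNT(*) FROM recentchanges;"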
(both files are in the extension) [16:08:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:07:58 UTC 2013 [16:08:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:26] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:09:25 UTC 2013 [16:10:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:34] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:10:41 UTC 2013 [16:11:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:11:52 UTC 2013 [16:12:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:56] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [16:12:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:12:48 UTC 2013 [16:13:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:13:46] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:13:38 UTC 2013 [16:14:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:26] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:14:22 UTC 2013 [16:15:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:15:04 UTC 2013 [16:15:56] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [16:16:01] New patchset: coren; "Tool Labs: install 'dc' package (user request)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64950 [16:16:07] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:17:08] New review: coren; "Is baby patch. Won't harm flies." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/64950 [16:17:09] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64950 [16:33:56] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [16:40:08] PROBLEM - Host colby is DOWN: PING CRITICAL - Packet loss = 100% [17:21:16] What would result in puppet ensure=>present installing a package which apt can immediately upgrade? Doesn't ensure=>present install the newest version available? [17:25:22] getting ready for dirsync zero extension to wmf4 [17:28:32] New patchset: ArielGlenn; "bugfixes: handle deleted text; workaround dupl text ids" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64955 [17:33:18] Anyone around who can help me with server 'singer' ? [17:33:33] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64955 [17:33:51] what's up with it? [17:34:00] sbernardin: [17:34:13] Hey [17:34:30] It's not booting into the OS [17:35:04] oh? what's the last message you see? [17:35:20] mutante had me reboot it [17:35:28] But still nothing [17:35:45] It posts but shows no boot device [17:36:44] dyrsyncing [17:36:48] zero ext [17:36:56] When that didn't work...I was told to try and put singers drives into 'colby' [17:37:05] Still the same thing [17:37:18] on colby you mean? 
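On the ensure => present question asked above (17:21): present only requires that some version of the package be installed, so a box that already has an older version satisfies it while apt can still offer an upgrade; ensure => latest is what tracks the newest apt candidate on every run. A quick way to see the difference from a shell (the package name is only an example):

    # 'present' (alias 'installed') is a no-op if any version is already on the box
    sudo puppet apply -e 'package { "vim": ensure => present }'
    # 'latest' re-checks the apt candidate every run and upgrades when it is newer
    sudo puppet apply -e 'package { "vim": ensure => latest }'
    # compare the installed version with the candidate apt would install
    apt-cache policy vim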
[17:37:20] awesome [17:38:07] ah after a dist-upgrade [17:38:10] that's a dra [17:38:12] g [17:39:20] only thing in puppet is misc::secure and I thought we were living without that, maybe not yet [17:40:24] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [17:40:28] what is the disk setup there anyways? what did it have? [17:40:32] Logged the message, Master [17:40:49] 4 mw and 1 srv servers failed ssh connection [17:42:34] sbernardin: ? [17:42:47] apergos: 2 160gb drives [17:43:10] Don't know how they were setup [17:44:45] https://bugzilla.wikimedia.org/show_bug.cgi?id=48693 [17:44:49] Anyone seen that? [17:45:17] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [17:45:26] Logged the message, Master [17:46:08] so we don't know if they were raided up or anything [17:46:11] wunnerful [17:46:26] apergos: singer goes back some time and it was misc box...probably software raid1 [17:46:53] Theo10011: yes, there was a little discussion about it, the people doing deployment know [17:47:27] * apergos guesses it's a grub issue, but why that would be, no idea [17:47:37] are the disks back in singer now? [17:47:52] and hopefully *in the same slots* ?? [17:48:05] Thanks apergos. Someone said the patch is waiting to be merged/deployed, a few hours ago. Any idea how long that can take? [17:48:16] https://gerrit.wikimedia.org/r/#/c/64946/ [17:48:18] ^ that one [17:48:35] don't know, too much travelling happening right now [17:48:49] :| [17:48:50] k [17:48:53] sorry... [17:49:19] it's alright, I'll find something else to do. [17:49:21] apergos: should I put the disks back in singer? [17:49:22] Thanks. [17:49:35] sbernardin: if you know which one went in which slot, please do [17:50:05] apergos: yes I do....ill put them back now [17:50:08] ok [17:50:11] thanks [17:59:56] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [18:00:05] Logged the message, Master [18:00:28] 2 more minutes, finishing sync of zero [18:02:04] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [18:02:11] Logged the message, Master [18:08:19] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:08:19] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:08:19] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:12:23] ori-l: mediawiki-vagrant is pretty awesome [18:12:39] I'm about to get on a flight and this'll let me do some mediawiki dev :) [18:17:53] Ryan_Lane: Lufthansa? [18:17:59] odder: delta [18:18:09] Do they have Internet on board? [18:18:14] nope [18:18:18] Loosers. [18:18:21] otherwise I'd probably use labs :) [18:18:25] Choose Lufthansa next time! [18:18:39] hah. as if I get a choice ;) [18:18:48] we go with the cheapest thing available, for the most part [18:18:51] :-) [18:19:08] Well, I'm not saying Lufthansa is the most expensive one! [18:19:16] (In case any Germans are here.) [18:20:18] the american airlines are usually the cheapest to take leaving from the US [18:20:24] they also suck :D [18:20:54] Ryan_Lane: Have you ever flown with Ryanair? [18:21:11] nope. I hear that's quite a crappy experience, though [18:21:26] If not, then really, those airlines /do not/ suck :) [18:21:43] man. so many freaking dependencies I need to work on my extension [18:22:07] sbernardin: is singer ready for me to try bringing up and watch the console? 
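The mediawiki-vagrant setup Ryan_Lane praises above (18:12) gives a self-contained MediaWiki development VM. Roughly, assuming the Gerrit project path of the time and a local VirtualBox plus Vagrant install; see the repository's README for the exact local URL and port the wiki is served on:

    # clone the vagrant configuration and boot the VM; provisioning runs Puppet inside it
    git clone https://gerrit.wikimedia.org/r/p/mediawiki/vagrant.git mediawiki-vagrant
    cd mediawiki-vagrant
    vagrant up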
[18:22:19] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [18:25:19] !log kaldari synchronized php-1.22wmf4/includes/Preferences.php 'syncing preferences.php for bug 48693' [18:25:28] Logged the message, Master [18:32:27] kaldari, how long is your deployment? [18:32:39] need to rollback ours, discovered some bugs :( [18:32:47] I'm all done [18:33:03] kaldari, your window is until noon? [18:33:48] I believe platform has the window until 1, I was using their window [18:33:56] odder: hm. my first flight likely does have wifi [18:34:03] since they weren't using it yet [18:34:04] my flight to AMS definitely does not :) [18:34:33] yurik: Roan is going to be doing the wmf4 deployment in a little bit, so you should coordinate with him [18:35:57] juts great, gerrit doesn't load :( [18:37:27] yeah, it seems down [18:38:25] apergos: singer is all ready for you... [18:38:28] very very slow [18:38:35] ok thanks [18:41:41] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Swithced remaining wikis to 1.22wmf4 [18:41:50] Logged the message, Master [18:43:10] apergos: can you access mangenese? [18:43:48] e.g. are you not on a plane or something? ;) [18:44:08] I am not on a plane [18:44:12] I can look at it in a minute [18:44:29] I'm waiting for the singer bootup fail message [18:44:30] BTW, when I did the sync for wmf4, I got the following errors: [18:44:31] mw57: ssh: connect to host mw57 port 22: Connection timed out [18:44:32] mw80: ssh: connect to host mw80 port 22: Connection timed out [18:44:32] mw98: ssh: connect to host mw98 port 22: Connection timed out [18:44:32] srv284: ssh: connect to host srv284 port 22: Connection timed out [18:44:32] mw1173: ssh: connect to host mw1173 port 22: Connection timed out [18:45:54] New patchset: Aaron Schulz; "Switched remaining wikis to 1.22wmf4" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64968 [18:47:30] I can try restarting gerrit, not sure what else would be useful Aaron [18:47:32] er [18:47:35] AaronSchulz: [18:48:01] it usually gets pegged at 100% cpu every few days and someone kicks [18:48:17] that person may be you the next few weeks [18:48:25] "A claim key should have a single $ in it" [18:48:36] aude: even more frequent now, yay :) [18:49:19] AaronSchulz: well we have a fix tht will be deployed next week [18:49:21] kaldari: checking the event logs for those [18:49:32] Gerrit down? 
[18:49:41] could have dimm errors [18:49:48] getting all kinds of different errors: http://cl.ly/image/0t1F1T220I1D [18:50:28] !log restarted gerrit [18:50:36] Logged the message, Master [18:50:55] try it now [18:51:07] working now [18:51:17] AaronSchulz: let us know if that continues (it's a bot gone wild) or not [18:51:42] RECOVERY - Host mw1173 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [18:52:02] not so sure about backporting the fix, though as it touches lots of parts of our api [18:53:15] https://bugzilla.wikimedia.org/show_bug.cgi?id=48061 [18:54:31] PROBLEM - Apache HTTP on mw1173 is CRITICAL: Connection refused [18:54:32] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64968 [18:55:32] RECOVERY - Apache HTTP on mw1173 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [18:56:19] sbernardin: there's no grub message, indeed the last thing it does is present messages about the broadcom chipset and the ip and mac addresses [18:57:33] you could see if you can set in the bios to boot from th secondary drive (and then we hope it has grub on it and is recognized) [18:58:37] failing that the next step woul dbe to put the first drive in some other box as a spare disk and see if it could be mounted in order to get the data off it, or at least see if we can see the partition table on it [18:58:57] Krinkle, doing revert now [18:58:57] that will be someone else's task if it needs to happen tonight (10 pm here) [19:00:57] yurik: By all means, I rephrased to make it easier to understand. The message wasn't very clear. Now it can at least be seen to what date we're reverting back to. Ideally it would also say way or what commit in the extension submodule it actually reverts. [19:00:58] apergos: OK...will try to have it boot off of the second drive [19:01:00] AaronSchulz: I used to kick gerrit regularly but I thought we had resolved that problem [19:01:07] guess I was dreaming [19:01:11] Anyway, thats not for me but for your own clarity. I just observed it. [19:01:22] Krinkle, sorry, yes, i realized after reading yours [19:01:29] yurik: Go ahead :) [19:01:37] aaa, why is git pull not working in wmf4 again :((( [19:02:06] git pull && git submodule update --init? [19:02:10] What is it saying [19:03:28] Krinkle, error: insufficient permission for adding an object to repository database .git/ objects [19:04:32] yurik: Looks like another deployer messed up. Can you figure out who it is? Look at the ownership of the file in question [19:04:34] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [19:04:43] Logged the message, Master [19:05:10] yurik: is this on fenari or tin? [19:05:14] tin [19:07:04] yurik: Looks like kaldari performed several git actions on tin with non-standard permissions [19:07:20] They should be "drwxrwxr-x 2 wikidev" [19:07:45] Recent ones by kaldari are "drwxr-xr-x 2 kaldari wikidev" [19:07:48] no group writability [19:08:01] We need kaldari or a root to fix it [19:08:13] I'll fix it [19:08:29] RoanKattouw, thx [19:08:54] Done [19:09:50] RoanKattouw, thx! [19:09:56] syncing dir ex [19:10:02] ext/zero [19:10:46] RoanKattouw: Didn't we add a global /etc/profile so that all users have umask 0002? 
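apergos's fallback plan for singer above (18:58) -- moving the first drive into another box to see whether it can be read at all -- amounts to checking the partition table and trying a read-only mount. A sketch, assuming the transplanted disk shows up as /dev/sdb and that the box used software RAID1 as guessed earlier; device and partition names depend on what the first command shows:

    # does the disk still have a readable partition table?
    sudo fdisk -l /dev/sdb
    # if the partitions are Linux software-RAID members, check their metadata
    sudo mdadm --examine /dev/sdb1
    # try a read-only mount to pull data off
    sudo mount -o ro /dev/sdb1 /mnt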
[19:10:55] RECOVERY - Host mw80 is UP: PING OK - Packet loss = 0%, RTA = 27.11 ms [19:11:18] Krinkle: on fenari [19:11:24] Aha [19:11:30] I think mutante-away tried to fix that situation on tin [19:11:35] But old umasks die hard [19:11:37] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [19:11:40] Especially if kaldari has a screen or something [19:11:45] Logged the message, Master [19:12:22] RoanKattouw: indeed, I see my umask is 0002 on tin just like on fenari. No custom .bashrc [19:13:35] PROBLEM - Apache HTTP on mw80 is CRITICAL: Connection refused [19:14:35] RECOVERY - Apache HTTP on mw80 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.502 second response time [19:14:41] Probably because you logged out and back in [19:14:59] Yeah, I usually don't stay logged in. [19:15:25] RECOVERY - Host srv284 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [19:16:07] RoanKattouw: btw, are outgoing http connections blocked from tin differently than on fenari? I can't seem to be able to fetch my dotfiles to /tmp [19:17:36] kaldari: I got all but mw57 and mw98 back up...can't access mgmt interface either...will need sbernardin to check them [19:17:55] PROBLEM - Apache HTTP on srv284 is CRITICAL: Connection refused [19:18:43] Krinkle: tin doesn't have a public IP and there's no NAT [19:18:48] So it cannot contact the outside world [19:18:59] OK. I'll sync from fenari then. [19:19:11] I have them there [19:19:55] RECOVERY - Apache HTTP on srv284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.194 second response time [19:41:40] cmjohnson1: should I reboot them? [19:42:02] sbernardin: are they off? what is their status [19:44:13] cmjohnson1: they're currently powered on [19:44:58] can you plug the monitor in and see if there is an error [19:45:17] No display [19:45:42] cmjohnson1: nothing coming up on the screen [19:45:59] okay..which one are you on? [19:46:06] cmjohnson1: probably need to reboot [19:46:20] which on are you on? 
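The "insufficient permission for adding an object" failure and its fix, discussed above (19:03-19:12), is a matter of group-writability on the shared .git object store plus the umask future logins use. A rough sketch of the diagnosis and repair; the checkout path is illustrative, the exact path on tin may differ:

    # spot objects created without group write (the "drwxr-xr-x ... kaldari" case)
    find php-1.22wmf4/.git/objects ! -perm -g+w -ls
    # restore the expected group ownership and group-writability (needs root or the file owner)
    sudo chgrp -R wikidev php-1.22wmf4/.git/objects
    sudo chmod -R g+w php-1.22wmf4/.git/objects
    # keep new files group-writable for the rest of the session
    umask 0002
    # then the normal update sequence Krinkle quotes should work again
    git pull && git submodule update --init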
[19:46:20] They both have power and are on [19:46:31] I'm on mw98 now [19:46:59] okay power off via button...unplug for a few minutes and then power on [19:50:37] cmjohnson1: mw98 coming back up now [19:51:21] okay..do the same for mw57 [19:51:45] RECOVERY - Host mw98 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [19:58:45] RECOVERY - Host mw57 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [19:59:45] cmjohnson1: mw57 back up now as well [20:00:00] cool..thx...please resolve that ticket [20:08:08] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:08:01 UTC 2013 [20:08:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:37] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:09:27 UTC 2013 [20:09:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:47] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:10:43 UTC 2013 [20:10:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:47] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:11:46 UTC 2013 [20:12:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:12:52 UTC 2013 [20:13:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:37] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:14:28 UTC 2013 [20:14:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:15:17] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:15:08 UTC 2013 [20:15:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [21:22:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [21:51:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [22:16:06] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:06] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:06] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:06] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:06] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:28:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.149 second response time [22:32:44] !log removing users sara & mwang from ldap/ops [22:32:52] Logged the message, Master [22:38:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:39:15] New patchset: Akosiaris; "Add cloudera in reprepro updates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [22:40:03] hmmm. 
[22:40:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [22:40:17] http://wordpress.org/news/2013/04/wordpress-3-6-beta-2/ [22:40:44] "This is software still in development and we really don’t recommend that you run it on a production site (...)." [22:40:53] And yet, blog.wikimedia.org is using this very beta version :) [22:43:50] !log krinkle synchronized php-1.22wmf4/extensions/VisualEditor/modules/ve/ve.EventEmitter.js 'touch, attempt to fix VisualEditor cache snafu on dewiki' [22:43:59] Logged the message, Master [22:44:46] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [22:46:29] New patchset: Akosiaris; "Add cloudera in reprepro updates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [22:57:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:57:56] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [23:08:19] Change merged: Akosiaris; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [23:30:20] PROBLEM - Backend Squid HTTP on sq60 is CRITICAL: Connection refused [23:30:29] PROBLEM - Backend Squid HTTP on sq56 is CRITICAL: Connection refused [23:30:39] PROBLEM - Backend Squid HTTP on sq51 is CRITICAL: Connection refused [23:30:39] PROBLEM - Backend Squid HTTP on sq58 is CRITICAL: Connection refused [23:30:49] PROBLEM - Backend Squid HTTP on sq53 is CRITICAL: Connection refused [23:31:19] PROBLEM - [23:31:35] PROBLEM - ? [23:31:36] all of the above icinga alerts would be me [23:32:11] * Damianz turns akosiaris off [23:32:11] the half spelled out alert though.... don't know what it is [23:32:13] problem solved [23:32:19] :-) [23:39:40] !log aaron synchronized php-1.22wmf4/maintenance/runJobs.php '2188c14239b74d202b2109933612259193f6fa41' [23:39:49] Logged the message, Master [23:40:36] the half spelled out alert is neon having its disk full [23:40:39] again [23:40:40] sigh [23:41:48] RECOVERY - Backend Squid HTTP on sq58 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.061 second response time [23:41:52] there we go [23:42:17] RECOVERY - Backend Squid HTTP on sq54 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.060 second response time [23:42:21] what did you delete ? [23:42:30] puppet.log [23:42:38] which isn't logrotated [23:42:38] :-( [23:42:43] and had all the diffs from naggen [23:42:56] which are due to different ordering [23:44:53] hmmmm. will I get this local vm fully finished before my flight….. [23:45:17] RECOVERY - Backend Squid HTTP on sq52 is OK: HTTP OK: HTTP/1.0 200 OK - 494 bytes in 0.054 second response time [23:46:04] New patchset: Andrew Bogott; "Call apt-get update after adding a new apt repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64995 [23:46:34] \o/ [23:46:37] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:46:43] I have a local version of openstackmanager and all dependencies [23:46:51] and an instance is building in the vm :) [23:47:03] now I can work on the single javascript change I wanted to do on the plane [23:47:20] One day you'll be able to deploy labs in vbox from puppet [23:47:23] * Damianz dreams [23:47:27] :D [23:47:34] it's all mostly puppetixed [23:47:42] but it's not modules [23:48:40] ok. 
boarding time [23:51:17] RECOVERY - Backend Squid HTTP on sq55 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.063 second response time [23:54:29] PROBLEM - Host sq59 is DOWN: PING CRITICAL - Packet loss = 100% [23:54:57] RECOVERY - Host sq59 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms [23:55:17] RECOVERY - Backend Squid HTTP on sq59 is OK: HTTP OK: HTTP/1.0 200 OK - 1250 bytes in 0.107 second response time [23:56:27] RECOVERY - Backend Squid HTTP on sq56 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.062 second response time [23:57:17] RECOVERY - Backend Squid HTTP on sq57 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.063 second response time [23:57:47] RECOVERY - Backend Squid HTTP on sq53 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.059 second response time
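Andrew Bogott's patch noted above (Gerrit change 64995, "Call apt-get update after adding a new apt repo", 23:46) addresses an ordering gotcha that also bites when working by hand: right after a repository is added, apt's package index does not know about it yet, so installs from it fail until the index is refreshed. The manual equivalent is roughly the following; the repository line and package name are only illustrative:

    # add a repository definition
    echo 'deb http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib' \
        | sudo tee /etc/apt/sources.list.d/cloudera.list
    # refresh the package index so apt (and Puppet's package provider) can see the new repo
    sudo apt-get update
    # only now will packages from that repository resolve
    sudo apt-get install -y some-package-from-the-new-repo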