[00:08:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:07:55 UTC 2013 [00:08:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:08:59 UTC 2013 [00:09:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:09:56] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:10:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:09:58 UTC 2013 [00:10:46] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [00:10:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:10:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:10:47 UTC 2013 [00:11:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:12:16] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:12:08 UTC 2013 [00:12:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:16:36] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 00:16:34 UTC 2013 [00:16:46] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [00:21:06] !log deployed change 64875 to virt0 [00:21:15] Logged the message, Master [00:58:54] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [00:59:24] PROBLEM - Swift HTTP on ms-fe4 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:59:44] RECOVERY - Disk space on mc15 is OK: DISK OK [01:01:44] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset 5.149841309e-05 secs [01:02:24] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset 0.002368807793 secs [01:11:12] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:11:42] PROBLEM - Swift HTTP on ms-fe3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:12:32] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:02] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.058 second response time [01:13:12] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [01:13:32] PROBLEM - Apache HTTP on mw1159 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:32] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:13:32] RECOVERY - Swift HTTP on ms-fe3 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.058 second response time [01:14:14] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out [01:14:33] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out [01:15:04] PROBLEM - DPKG on ms-fe3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:15:04] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out [01:15:14] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [01:15:25] PROBLEM - RAID on ms-fe3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:08:27] !log LocalisationUpdate completed (1.22wmf4) at Wed May 22 02:08:27 UTC 2013 [02:08:47] Logged the message, Master [02:14:48] !log LocalisationUpdate completed (1.22wmf3) at Wed May 22 02:14:48 UTC 2013 [02:14:56] Logged the message, Master [02:35:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed May 22 02:35:44 UTC 2013 [02:35:53] Logged the message, Master [06:27:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:27:51 UTC 2013 [06:28:39] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:28:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:28:49 UTC 2013 [06:29:40] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:29:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 06:29:35 UTC 2013 [06:30:39] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [06:33:09] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [06:41:45] New patchset: Nikerabbit; "Disable Narayam on commons now that they have ULS" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64884 [06:53:57] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [06:53:59] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [06:54:29] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.21 ms [06:54:31] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.14 ms [06:59:17] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:59:38] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:00:18] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:00:29] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.059 second response time [07:03:17] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:03:37] PROBLEM - Swift HTTP on ms-fe1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:04:07] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:04:38] RECOVERY - Swift HTTP on ms-fe1 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 8.629 second response time [07:04:57] PROBLEM - Disk space on mc15 is CRITICAL: Timeout while attempting connection [07:05:57] RECOVERY - Disk space on mc15 is OK: DISK OK [07:45:18] PROBLEM - RAID on ms-fe1 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:46:38] RECOVERY - Swift HTTP on ms-fe4 is OK: HTTP OK: HTTP/1.1 200 OK - 2503 bytes in 0.057 second response time [07:49:08] RECOVERY - RAID on ms-fe1 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [07:50:07] !log restarted swift proxy on ms-fe1 [07:50:18] Logged the message, Master [07:59:43] hey mark, Magnus (Snaps) and I are trying to find a date/time to demo the progress on varnishkafka; what would be a convenient date/time for you? 
[08:02:08] RECOVERY - NTP on ssl3003 is OK: NTP OK: Offset -0.00244987011 secs [08:02:38] RECOVERY - NTP on ssl3002 is OK: NTP OK: Offset -0.006417512894 secs [08:06:20] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:20] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:06:20] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:07:52 UTC 2013 [08:08:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:08:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:08:44 UTC 2013 [08:09:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:09:40] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:09:30 UTC 2013 [08:10:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:13:32] New patchset: Hashar; "beta: configuration for Wikidata" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:14:08] New review: Hashar; "Rebased to use wmf-config/wgConfVHosts-labs.php Not sure why we do not have a www.wikidata.org ent..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:15:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 08:14:52 UTC 2013 [08:15:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [08:21:20] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [08:35:01] New review: Hashar; "(1 comment)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/61428 [08:47:48] PROBLEM - Host bits-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [08:47:49] PROBLEM - Host wikipedia-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [08:48:29] RECOVERY - Host wikipedia-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.08 ms [08:48:30] RECOVERY - Host bits-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 86.65 ms [08:48:48] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:49:40] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [09:16:02] hi can someone run extensions/TimedMediaHandler/maintenance/resetTranscodes.php on commons. there are some quite old jobs that got never run, want to reinsert them [09:58:48] probably stupid question: www.wikidata.org/w/extensions/Wikibase/docs/summaries.txt is accessible through the browser, but www.wikidata.org/w/extensions/Wikibase/docs/ontology.owl is not - would anyone know why? [10:08:53] LeslieCarr ping [10:09:17] LeslieCarr: what is IP range for labs? I mean the IP addresses that foreign servers see when people from labs instances connect to them [10:09:29] paravoid ^ [10:11:25] or anyone else who might know that... [10:12:08] isn't it 10.42.0.0/something petan ? 
[10:12:21] matanya the public IP address range I mean [10:12:23] 208.80.153.128/25 [10:12:26] like 208.80.153.163 is one of them [10:12:27] oh [10:12:29] mark: ty [10:12:37] what mark said :) [11:16:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:17:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [11:19:15] PROBLEM - SSH on mc15 is CRITICAL: Connection timed out [11:20:05] RECOVERY - SSH on mc15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [11:20:25] PROBLEM - RAID on mc15 is CRITICAL: Timeout while attempting connection [11:21:16] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [11:31:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:33:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:38:55] PROBLEM - DPKG on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:40:43] RECOVERY - DPKG on mc15 is OK: All packages OK [11:48:07] New patchset: Nemo bis; "(bug 40341) Enable translation import on wikis with Translate extension" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64919 [11:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:52:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.150 second response time [12:07:59] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:07:56 UTC 2013 [12:09:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:09:30] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:09:24 UTC 2013 [12:10:03] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:10:39] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:10:38 UTC 2013 [12:11:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:11:50] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:11:49 UTC 2013 [12:12:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:00] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:12:50 UTC 2013 [12:13:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:13:49] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:13:40 UTC 2013 [12:14:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:14:29] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:14:24 UTC 2013 [12:14:59] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:09] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 12:15:02 UTC 2013 [12:15:39] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:39] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [12:15:39] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [12:16:00] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:40] PROBLEM - 
Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [12:18:40] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [12:21:31] New patchset: ArielGlenn; "More documentation of bz2 multistream and index files" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64921 [12:23:13] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64921 [12:44:13] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [12:51:03] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [12:51:53] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [13:08:20] PROBLEM - Disk space on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:11:10] RECOVERY - Disk space on mc15 is OK: DISK OK [13:22:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:23:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [13:52:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:53:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:09:40] PROBLEM - Puppet freshness on ms-fe3001 is CRITICAL: No successful Puppet run in the last 10 hours [14:13:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:14:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [14:14:30] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:15:21] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:20:43] andrewbogott_afk: Silke_WMDE or anyone who knows about puppet, i am getting an error running puppet for the wikidata test system [14:20:46] http://dpaste.com/1195230/ [14:21:05] how is it trying to run both mediawiki single node and the wikidata one? [14:22:38] it's new, for one thing in mediawiki_singlenode or atleast newishly updated [14:23:57] * aude may have fixed it [14:29:31] PROBLEM - RAID on mc15 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [14:30:20] RECOVERY - RAID on mc15 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [14:32:35] aude: fixed? [14:32:42] no [14:32:45] coming [14:32:52] thanks [14:34:28] Silke_WMDE: Fun Fact: Replication is teh workingz. :-) [14:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:53:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [14:53:51] Coren: ohhh.... replicating what where exactly? [14:55:18] Coren: Yay! [14:55:47] I should update the roadmap one last time before leaving [14:55:49] :) [14:56:31] DanielK_WMDE: Replicating 4/7 clusters to labs [14:57:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:57:21] Silke_WMDE: does that include cross-server replication of commons (and perhaps wikidata)? [14:57:35] is there a page describing what is filtered, and how? 
[14:58:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.158 second response time [14:58:55] DanielK_WMDE: The WMF approach is: federated tables instead of cross db joins [14:59:11] PROBLEM - Puppet freshness on db44 is CRITICAL: No successful Puppet run in the last 10 hours [14:59:24] DanielK_WMDE: list of cluster working: https://wikitech.wikimedia.org/wiki/ToolLabsDatabasePlan [14:59:27] federated tables have TERRIBLE performance on big joins [14:59:40] do i have to be in the tools project to use? [14:59:42] and joning the image table against commons is just that [14:59:46] and it'S the prime use case [15:00:10] DanielK_WMDE: Convince them! ;) [15:00:36] why are we talking in this channel btw? [15:00:45] Silke_WMDE, Coren: you can try with federated tables, but be prepared to change that. i'm pretty sure it's not going to work for the "join local imagelinks table against commons image table" use case [15:00:58] coren is here..... [15:01:03] ...which is the prime use case for replicating commons to all servers [15:01:45] is there a way i can kill "notice: Run of Puppet configuration client already in progress; skipping" [15:01:57] it froze while logged in [15:02:15] aude yes [15:02:19] how? [15:02:27] find the process with ps aux | grep puppetd [15:02:30] kill -9 it [15:02:32] ok [15:02:36] restart puppet [15:02:46] brb [15:03:14] works [15:03:33] but not wikidata singlenode [15:09:56] DanielK_WMDE: Actually, that's a case I expect should work very well since the where clause is on an indexed column [15:10:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:10:32] New patchset: Ottomata; "Changing metrics.wikimedia.org htpasswd" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64942 [15:11:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [15:11:36] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64942 [15:11:42] Coren: but as far as i know, the index data is not replicated to the federated table. so a join means comparing two indexes chunk by chunk over the network. [15:11:44] * aude found the problem with wikidata singlenode :) [15:11:58] Coren: that's not as bad as a table scan over the network, but still [15:12:06] will take a few minutes to fix stuff [15:16:23] DanielK_WMDE: That's basically it. I checked, and performance is quite adequate for that scheme. [15:16:49] DanielK_WMDE: It's apparently fairly smart about not transferring the whole index [15:17:29] PROBLEM - SSH on mc15 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:18:39] Daniel plus, baseline performance is high enough that might even still tip the balance. [15:18:49] Coren, petan, I just wanted to let you know that I ported a script that uses the replicated databases from toolserver to Labs today; not only did it work fine, but a script that typically took 3-4 hours to complete on TS was finished in 3-4 minutes here! good job!! [15:18:52] :-) [15:18:54] Coren: do i have to be in tools project to access the db? [15:19:16] aude: Yes, it's the only project it's accessible from atm. Post-amsterdam, we'll generalize. 
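The stuck-run cleanup petan walks aude through above (15:01-15:02) comes down to a few shell commands. A minimal sketch, assuming the 2.x-era standalone puppetd agent in use at the time and the usual default lock-file path; on newer installs the agent is invoked as "puppet agent --test" instead:

    # find the agent run that is stuck
    ps aux | grep puppetd
    # kill it, using the PID from the previous command (12345 is a placeholder)
    sudo kill -9 12345
    # clear a stale lock file if one was left behind (common default path)
    sudo rm -f /var/lib/puppet/state/puppetdlock
    # start a fresh run and watch it
    sudo puppetd --test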
[15:19:19] RECOVERY - SSH on mc15 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.1 (protocol 2.0) [15:19:22] hmmmm, ok [15:19:27] * aude will be joining tools :) [15:19:30] :-) [15:20:00] i already am in bots [15:20:14] Coren how the accounts for db are being created? [15:20:15] What's your wikitech username? I'll add you now [15:20:23] Coren: same as irc [15:20:30] Coren I noticed that toolwatcher is now getting login directly from the replica file instead of generating random pw [15:20:41] Coren but how the replica file is created? o.O [15:21:00] petan: From the NFS server, which doubles as the DB manager (since it has access to all the filesystems for the creds) [15:21:22] ok, so how the nfs server knows it should generate credentials? [15:21:26] aude: {{done}} [15:21:37] yay! [15:21:39] I mean which process is creating these accounts? and where it lives? [15:21:44] aude: Useful pointers: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access [15:21:55] ok [15:21:57] petan: It's a watcher that lives on the NFS server. [15:22:07] !replicateddb is https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Database_access [15:22:07] Key was added [15:22:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:31] does it create account for every user, or only for every user who is in tools project? [15:22:38] Right now, we have s1, s2, s4, s5. s3 s6 should arrive later today. [15:22:50] petan: It creates account for every user that is on a project that uses NFS. [15:22:55] aha [15:23:08] so technically projects outside tools should be able to login to db right now? if they use nfs? [15:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [15:23:54] petan: AFAIK, the only one is deployment-prep, but they don't have the network magic to get to the DBs just yet. Like I said, post-Amsterdam. Perhaps we'll even have time to do that there, depending on how busy Ryan and I get. [15:24:09] mhm [15:24:39] petan: If you're thinking bots, look at the /etc/hosts and /etc/iptables.conf hack on tools. But beware: that's a hack that'll change shortly! [15:24:49] ok [15:35:41] http://www.wikidata.org/w/extensions/Wikibase/docs/summaries.txt is accessible through the browser, but http://www.wikidata.org/w/extensions/Wikibase/docs/ontology.owl is not - would anyone know why? 
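For anyone following the Database_access pointer Coren gives aude above (15:21), the credentials generated by the watcher end up in a per-user replica file in the home directory. A sketch of a typical session, assuming the replica.my.cnf filename and the enwiki.labsdb / enwiki_p naming described on the linked Help page (check that page for the current host and database names):

    # interactive connection to the replicated enwiki database
    mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p

    # or a one-off query, non-interactively
    mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb enwiki_p \
        -e "SELECT COUNT(*) FROM recentchanges;"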
(both files are in the extension) [16:08:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:07:58 UTC 2013 [16:08:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:09:26] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:09:25 UTC 2013 [16:10:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:34] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:10:41 UTC 2013 [16:11:34] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:11:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:11:52 UTC 2013 [16:12:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:12:56] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [16:12:56] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:12:48 UTC 2013 [16:13:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:13:46] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:13:38 UTC 2013 [16:14:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:14:26] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:14:22 UTC 2013 [16:15:06] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:15:06] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 16:15:04 UTC 2013 [16:15:56] PROBLEM - Puppet freshness on mc15 is CRITICAL: No successful Puppet run in the last 10 hours [16:16:01] New patchset: coren; "Tool Labs: install 'dc' package (user request)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64950 [16:16:07] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [16:17:08] New review: coren; "Is baby patch. Won't harm flies." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/64950 [16:17:09] Change merged: coren; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64950 [16:33:56] PROBLEM - Puppet freshness on colby is CRITICAL: No successful Puppet run in the last 10 hours [16:40:08] PROBLEM - Host colby is DOWN: PING CRITICAL - Packet loss = 100% [17:21:16] What would result in puppet ensure=>present installing a package which apt can immediately upgrade? Doesn't ensure=>present install the newest version available? [17:25:22] getting ready for dirsync zero extension to wmf4 [17:28:32] New patchset: ArielGlenn; "bugfixes: handle deleted text; workaround dupl text ids" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64955 [17:33:18] Anyone around who can help me with server 'singer' ? [17:33:33] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/64955 [17:33:51] what's up with it? [17:34:00] sbernardin: [17:34:13] Hey [17:34:30] It's not booting into the OS [17:35:04] oh? what's the last message you see? [17:35:20] mutante had me reboot it [17:35:28] But still nothing [17:35:45] It posts but shows no boot device [17:36:44] dyrsyncing [17:36:48] zero ext [17:36:56] When that didn't work...I was told to try and put singers drives into 'colby' [17:37:05] Still the same thing [17:37:18] on colby you mean? 
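On the ensure => present question asked above (17:21): present only requires that some version of the package be installed, so a box that already has an older version satisfies it while apt can still offer an upgrade; ensure => latest is what tracks the newest apt candidate on every run. A quick way to see the difference from a shell (the package name is only an example):

    # 'present' (alias 'installed') is a no-op if any version is already on the box
    sudo puppet apply -e 'package { "vim": ensure => present }'
    # 'latest' re-checks the apt candidate every run and upgrades when it is newer
    sudo puppet apply -e 'package { "vim": ensure => latest }'
    # compare the installed version with the candidate apt would install
    apt-cache policy vim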
[17:37:20] awesome [17:38:07] ah after a dist-upgrade [17:38:10] that's a dra [17:38:12] g [17:39:20] only thing in puppet is misc::secure and I thought we were living without that, maybe not yet [17:40:24] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [17:40:28] what is the disk setup there anyways? what did it have? [17:40:32] Logged the message, Master [17:40:49] 4 mw and 1 srv servers failed ssh connection [17:42:34] sbernardin: ? [17:42:47] apergos: 2 160gb drives [17:43:10] Don't know how they were setup [17:44:45] https://bugzilla.wikimedia.org/show_bug.cgi?id=48693 [17:44:49] Anyone seen that? [17:45:17] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [17:45:26] Logged the message, Master [17:46:08] so we don't know if they were raided up or anything [17:46:11] wunnerful [17:46:26] apergos: singer goes back some time and it was misc box...probably software raid1 [17:46:53] Theo10011: yes, there was a little discussion about it, the people doing deployment know [17:47:27] * apergos guesses it's a grub issue, but why that would be, no idea [17:47:37] are the disks back in singer now? [17:47:52] and hopefully *in the same slots* ?? [17:48:05] Thanks apergos. Someone said the patch is waiting to be merged/deployed, a few hours ago. Any idea how long that can take? [17:48:16] https://gerrit.wikimedia.org/r/#/c/64946/ [17:48:18] ^ that one [17:48:35] don't know, too much travelling happening right now [17:48:49] :| [17:48:50] k [17:48:53] sorry... [17:49:19] it's alright, I'll find something else to do. [17:49:21] apergos: should I put the disks back in singer? [17:49:22] Thanks. [17:49:35] sbernardin: if you know which one went in which slot, please do [17:50:05] apergos: yes I do....ill put them back now [17:50:08] ok [17:50:11] thanks [17:59:56] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [18:00:05] Logged the message, Master [18:00:28] 2 more minutes, finishing sync of zero [18:02:04] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [18:02:11] Logged the message, Master [18:08:19] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [18:08:19] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:08:19] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [18:12:23] ori-l: mediawiki-vagrant is pretty awesome [18:12:39] I'm about to get on a flight and this'll let me do some mediawiki dev :) [18:17:53] Ryan_Lane: Lufthansa? [18:17:59] odder: delta [18:18:09] Do they have Internet on board? [18:18:14] nope [18:18:18] Loosers. [18:18:21] otherwise I'd probably use labs :) [18:18:25] Choose Lufthansa next time! [18:18:39] hah. as if I get a choice ;) [18:18:48] we go with the cheapest thing available, for the most part [18:18:51] :-) [18:19:08] Well, I'm not saying Lufthansa is the most expensive one! [18:19:16] (In case any Germans are here.) [18:20:18] the american airlines are usually the cheapest to take leaving from the US [18:20:24] they also suck :D [18:20:54] Ryan_Lane: Have you ever flown with Ryanair? [18:21:11] nope. I hear that's quite a crappy experience, though [18:21:26] If not, then really, those airlines /do not/ suck :) [18:21:43] man. so many freaking dependencies I need to work on my extension [18:22:07] sbernardin: is singer ready for me to try bringing up and watch the console? 
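The mediawiki-vagrant setup Ryan_Lane praises above (18:12) gives a self-contained MediaWiki development VM. Roughly, assuming the Gerrit project path of the time and a local VirtualBox plus Vagrant install; see the repository's README for the exact local URL and port the wiki is served on:

    # clone the vagrant configuration and boot the VM; provisioning runs Puppet inside it
    git clone https://gerrit.wikimedia.org/r/p/mediawiki/vagrant.git mediawiki-vagrant
    cd mediawiki-vagrant
    vagrant up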
[18:22:19] PROBLEM - Puppet freshness on db1017 is CRITICAL: No successful Puppet run in the last 10 hours [18:25:19] !log kaldari synchronized php-1.22wmf4/includes/Preferences.php 'syncing preferences.php for bug 48693' [18:25:28] Logged the message, Master [18:32:27] kaldari, how long is your deployment? [18:32:39] need to rollback ours, discovered some bugs :( [18:32:47] I'm all done [18:33:03] kaldari, your window is until noon? [18:33:48] I believe platform has the window until 1, I was using their window [18:33:56] odder: hm. my first flight likely does have wifi [18:34:03] since they weren't using it yet [18:34:04] my flight to AMS definitely does not :) [18:34:33] yurik: Roan is going to be doing the wmf4 deployment in a little bit, so you should coordinate with him [18:35:57] juts great, gerrit doesn't load :( [18:37:27] yeah, it seems down [18:38:25] apergos: singer is all ready for you... [18:38:28] very very slow [18:38:35] ok thanks [18:41:41] !log aaron rebuilt wikiversions.cdb and synchronized wikiversions files: Swithced remaining wikis to 1.22wmf4 [18:41:50] Logged the message, Master [18:43:10] apergos: can you access mangenese? [18:43:48] e.g. are you not on a plane or something? ;) [18:44:08] I am not on a plane [18:44:12] I can look at it in a minute [18:44:29] I'm waiting for the singer bootup fail message [18:44:30] BTW, when I did the sync for wmf4, I got the following errors: [18:44:31] mw57: ssh: connect to host mw57 port 22: Connection timed out [18:44:32] mw80: ssh: connect to host mw80 port 22: Connection timed out [18:44:32] mw98: ssh: connect to host mw98 port 22: Connection timed out [18:44:32] srv284: ssh: connect to host srv284 port 22: Connection timed out [18:44:32] mw1173: ssh: connect to host mw1173 port 22: Connection timed out [18:45:54] New patchset: Aaron Schulz; "Switched remaining wikis to 1.22wmf4" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64968 [18:47:30] I can try restarting gerrit, not sure what else would be useful Aaron [18:47:32] er [18:47:35] AaronSchulz: [18:48:01] it usually gets pegged at 100% cpu every few days and someone kicks [18:48:17] that person may be you the next few weeks [18:48:25] "A claim key should have a single $ in it" [18:48:36] aude: even more frequent now, yay :) [18:49:19] AaronSchulz: well we have a fix tht will be deployed next week [18:49:21] kaldari: checking the event logs for those [18:49:32] Gerrit down? 
[18:49:41] could have dimm errors [18:49:48] getting all kinds of different errors: http://cl.ly/image/0t1F1T220I1D [18:50:28] !log restarted gerrit [18:50:36] Logged the message, Master [18:50:55] try it now [18:51:07] working now [18:51:17] AaronSchulz: let us know if that continues (it's a bot gone wild) or not [18:51:42] RECOVERY - Host mw1173 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [18:52:02] not so sure about backporting the fix, though as it touches lots of parts of our api [18:53:15] https://bugzilla.wikimedia.org/show_bug.cgi?id=48061 [18:54:31] PROBLEM - Apache HTTP on mw1173 is CRITICAL: Connection refused [18:54:32] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/64968 [18:55:32] RECOVERY - Apache HTTP on mw1173 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [18:56:19] sbernardin: there's no grub message, indeed the last thing it does is present messages about the broadcom chipset and the ip and mac addresses [18:57:33] you could see if you can set in the bios to boot from th secondary drive (and then we hope it has grub on it and is recognized) [18:58:37] failing that the next step woul dbe to put the first drive in some other box as a spare disk and see if it could be mounted in order to get the data off it, or at least see if we can see the partition table on it [18:58:57] Krinkle, doing revert now [18:58:57] that will be someone else's task if it needs to happen tonight (10 pm here) [19:00:57] yurik: By all means, I rephrased to make it easier to understand. The message wasn't very clear. Now it can at least be seen to what date we're reverting back to. Ideally it would also say way or what commit in the extension submodule it actually reverts. [19:00:58] apergos: OK...will try to have it boot off of the second drive [19:01:00] AaronSchulz: I used to kick gerrit regularly but I thought we had resolved that problem [19:01:07] guess I was dreaming [19:01:11] Anyway, thats not for me but for your own clarity. I just observed it. [19:01:22] Krinkle, sorry, yes, i realized after reading yours [19:01:29] yurik: Go ahead :) [19:01:37] aaa, why is git pull not working in wmf4 again :((( [19:02:06] git pull && git submodule update --init? [19:02:10] What is it saying [19:03:28] Krinkle, error: insufficient permission for adding an object to repository database .git/ objects [19:04:32] yurik: Looks like another deployer messed up. Can you figure out who it is? Look at the ownership of the file in question [19:04:34] !log yurik synchronized php-1.22wmf3/extensions/ZeroRatedMobileAccess/ [19:04:43] Logged the message, Master [19:05:10] yurik: is this on fenari or tin? [19:05:14] tin [19:07:04] yurik: Looks like kaldari performed several git actions on tin with non-standard permissions [19:07:20] They should be "drwxrwxr-x 2 wikidev" [19:07:45] Recent ones by kaldari are "drwxr-xr-x 2 kaldari wikidev" [19:07:48] no group writability [19:08:01] We need kaldari or a root to fix it [19:08:13] I'll fix it [19:08:29] RoanKattouw, thx [19:08:54] Done [19:09:50] RoanKattouw, thx! [19:09:56] syncing dir ex [19:10:02] ext/zero [19:10:46] RoanKattouw: Didn't we add a global /etc/profile so that all users have umask 0002? 
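apergos's fallback plan for singer above (18:58) -- moving the first drive into another box to see whether it can be read at all -- amounts to checking the partition table and trying a read-only mount. A sketch, assuming the transplanted disk shows up as /dev/sdb and that the box used software RAID1 as guessed earlier; device and partition names depend on what the first command shows:

    # does the disk still have a readable partition table?
    sudo fdisk -l /dev/sdb
    # if the partitions are Linux software-RAID members, check their metadata
    sudo mdadm --examine /dev/sdb1
    # try a read-only mount to pull data off
    sudo mount -o ro /dev/sdb1 /mnt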
[19:10:55] RECOVERY - Host mw80 is UP: PING OK - Packet loss = 0%, RTA = 27.11 ms [19:11:18] Krinkle: on fenari [19:11:24] Aha [19:11:30] I think mutante-away tried to fix that situation on tin [19:11:35] But old umasks die hard [19:11:37] !log yurik synchronized php-1.22wmf4/extensions/ZeroRatedMobileAccess/ [19:11:40] Especially if kaldari has a screen or something [19:11:45] Logged the message, Master [19:12:22] RoanKattouw: indeed, I see my umask is 0002 on tin just like on fenari. No custom .bashrc [19:13:35] PROBLEM - Apache HTTP on mw80 is CRITICAL: Connection refused [19:14:35] RECOVERY - Apache HTTP on mw80 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.502 second response time [19:14:41] Probably because you logged out and back in [19:14:59] Yeah, I usually don't stay logged in. [19:15:25] RECOVERY - Host srv284 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [19:16:07] RoanKattouw: btw, are outgoing http connections blocked from tin differently than on fenari? I can't seem to be able to fetch my dotfiles to /tmp [19:17:36] kaldari: I got all but mw57 and mw98 back up...can't access mgmt interface either...will need sbernardin to check them [19:17:55] PROBLEM - Apache HTTP on srv284 is CRITICAL: Connection refused [19:18:43] Krinkle: tin doesn't have a public IP and there's no NAT [19:18:48] So it cannot contact the outside world [19:18:59] OK. I'll sync from fenari then. [19:19:11] I have them there [19:19:55] RECOVERY - Apache HTTP on srv284 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.194 second response time [19:41:40] cmjohnson1: should I reboot them? [19:42:02] sbernardin: are they off? what is their status [19:44:13] cmjohnson1: they're currently powered on [19:44:58] can you plug the monitor in and see if there is an error [19:45:17] No display [19:45:42] cmjohnson1: nothing coming up on the screen [19:45:59] okay..which one are you on? [19:46:06] cmjohnson1: probably need to reboot [19:46:20] which on are you on? 
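The "insufficient permission for adding an object" failure and its fix, discussed above (19:03-19:12), is a matter of group-writability on the shared .git object store plus the umask future logins use. A rough sketch of the diagnosis and repair; the checkout path is illustrative, the exact path on tin may differ:

    # spot objects created without group write (the "drwxr-xr-x ... kaldari" case)
    find php-1.22wmf4/.git/objects ! -perm -g+w -ls
    # restore the expected group ownership and group-writability (needs root or the file owner)
    sudo chgrp -R wikidev php-1.22wmf4/.git/objects
    sudo chmod -R g+w php-1.22wmf4/.git/objects
    # keep new files group-writable for the rest of the session
    umask 0002
    # then the normal update sequence Krinkle quotes should work again
    git pull && git submodule update --init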
[19:46:20] They both have power and are on [19:46:31] I'm on mw98 now [19:46:59] okay power off via button...unplug for a few minutes and then power on [19:50:37] cmjohnson1: mw98 coming back up now [19:51:21] okay..do the same for mw57 [19:51:45] RECOVERY - Host mw98 is UP: PING OK - Packet loss = 0%, RTA = 26.58 ms [19:58:45] RECOVERY - Host mw57 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [19:59:45] cmjohnson1: mw57 back up now as well [20:00:00] cool..thx...please resolve that ticket [20:08:08] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:08:01 UTC 2013 [20:08:58] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:09:37] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:09:27 UTC 2013 [20:09:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:47] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:10:43 UTC 2013 [20:10:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:11:47] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:11:46 UTC 2013 [20:12:01] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:57] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:12:52 UTC 2013 [20:13:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:14:37] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:14:28 UTC 2013 [20:14:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [20:15:17] RECOVERY - Puppet freshness on ms2 is OK: puppet ran at Wed May 22 20:15:08 UTC 2013 [20:15:57] PROBLEM - Puppet freshness on ms2 is CRITICAL: No successful Puppet run in the last 10 hours [21:22:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [21:51:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [22:16:06] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:06] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [22:16:06] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:06] PROBLEM - Puppet freshness on pdf1 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:06] PROBLEM - Puppet freshness on pdf2 is CRITICAL: No successful Puppet run in the last 10 hours [22:27:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:28:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.149 second response time [22:32:44] !log removing users sara & mwang from ldap/ops [22:32:52] Logged the message, Master [22:38:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:39:15] New patchset: Akosiaris; "Add cloudera in reprepro updates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [22:40:03] hmmm. 
[22:40:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [22:40:17] http://wordpress.org/news/2013/04/wordpress-3-6-beta-2/ [22:40:44] "This is software still in development and we really don’t recommend that you run it on a production site (...)." [22:40:53] And yet, blog.wikimedia.org is using this very beta version :) [22:43:50] !log krinkle synchronized php-1.22wmf4/extensions/VisualEditor/modules/ve/ve.EventEmitter.js 'touch, attempt to fix VisualEditor cache snafu on dewiki' [22:43:59] Logged the message, Master [22:44:46] PROBLEM - Puppet freshness on db45 is CRITICAL: No successful Puppet run in the last 10 hours [22:46:29] New patchset: Akosiaris; "Add cloudera in reprepro updates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [22:57:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:57:56] PROBLEM - NTP on ssl3002 is CRITICAL: NTP CRITICAL: No response from NTP server [22:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [23:08:19] Change merged: Akosiaris; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64988 [23:30:20] PROBLEM - Backend Squid HTTP on sq60 is CRITICAL: Connection refused [23:30:29] PROBLEM - Backend Squid HTTP on sq56 is CRITICAL: Connection refused [23:30:39] PROBLEM - Backend Squid HTTP on sq51 is CRITICAL: Connection refused [23:30:39] PROBLEM - Backend Squid HTTP on sq58 is CRITICAL: Connection refused [23:30:49] PROBLEM - Backend Squid HTTP on sq53 is CRITICAL: Connection refused [23:31:19] PROBLEM - [23:31:35] PROBLEM - ? [23:31:36] all of the above icinga alerts would be me [23:32:11] * Damianz turns akosiaris off [23:32:11] the half spelled out alert though.... don't know what it is [23:32:13] problem solved [23:32:19] :-) [23:39:40] !log aaron synchronized php-1.22wmf4/maintenance/runJobs.php '2188c14239b74d202b2109933612259193f6fa41' [23:39:49] Logged the message, Master [23:40:36] the half spelled out alert is neon having its disk full [23:40:39] again [23:40:40] sigh [23:41:48] RECOVERY - Backend Squid HTTP on sq58 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.061 second response time [23:41:52] there we go [23:42:17] RECOVERY - Backend Squid HTTP on sq54 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.060 second response time [23:42:21] what did you delete ? [23:42:30] puppet.log [23:42:38] which isn't logrotated [23:42:38] :-( [23:42:43] and had all the diffs from naggen [23:42:56] which are due to different ordering [23:44:53] hmmmm. will I get this local vm fully finished before my flight….. [23:45:17] RECOVERY - Backend Squid HTTP on sq52 is OK: HTTP OK: HTTP/1.0 200 OK - 494 bytes in 0.054 second response time [23:46:04] New patchset: Andrew Bogott; "Call apt-get update after adding a new apt repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/64995 [23:46:34] \o/ [23:46:37] PROBLEM - NTP on ssl3003 is CRITICAL: NTP CRITICAL: No response from NTP server [23:46:43] I have a local version of openstackmanager and all dependencies [23:46:51] and an instance is building in the vm :) [23:47:03] now I can work on the single javascript change I wanted to do on the plane [23:47:20] One day you'll be able to deploy labs in vbox from puppet [23:47:23] * Damianz dreams [23:47:27] :D [23:47:34] it's all mostly puppetixed [23:47:42] but it's not modules [23:48:40] ok. 
boarding time [23:51:17] RECOVERY - Backend Squid HTTP on sq55 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.063 second response time [23:54:29] PROBLEM - Host sq59 is DOWN: PING CRITICAL - Packet loss = 100% [23:54:57] RECOVERY - Host sq59 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms [23:55:17] RECOVERY - Backend Squid HTTP on sq59 is OK: HTTP OK: HTTP/1.0 200 OK - 1250 bytes in 0.107 second response time [23:56:27] RECOVERY - Backend Squid HTTP on sq56 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.062 second response time [23:57:17] RECOVERY - Backend Squid HTTP on sq57 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.063 second response time [23:57:47] RECOVERY - Backend Squid HTTP on sq53 is OK: HTTP OK: HTTP/1.0 200 OK - 487 bytes in 0.059 second response time
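Andrew Bogott's patch noted above (Gerrit change 64995, "Call apt-get update after adding a new apt repo", 23:46) addresses an ordering gotcha that also bites when working by hand: right after a repository is added, apt's package index does not know about it yet, so installs from it fail until the index is refreshed. The manual equivalent is roughly the following; the repository line and package name are only illustrative:

    # add a repository definition
    echo 'deb http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh precise-cdh4 contrib' \
        | sudo tee /etc/apt/sources.list.d/cloudera.list
    # refresh the package index so apt (and Puppet's package provider) can see the new repo
    sudo apt-get update
    # only now will packages from that repository resolve
    sudo apt-get install -y some-package-from-the-new-repo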