[00:01:21] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[00:08:31] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:13:31] PROBLEM - DPKG on helium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[00:14:32] New patchset: Andrew Bogott; "Use instance-proxy for the default hostname." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57443
[00:17:31] RECOVERY - DPKG on helium is OK: All packages OK
[00:18:00] lesliecarr: rt4870 is for rdb1/2
[00:22:01] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100%
[00:23:41] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms
[00:26:21] PROBLEM - Apache HTTP on mw27 is CRITICAL: Connection refused
[00:29:58] New patchset: coren; "Add ssh_restrict_network variable to SSH" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[00:33:18] LeslieCarr: you got your dns issue fixed i hope?
[00:33:31] if not you can invalidate cached entries with rec_control wipe-cache fqdn
[00:35:21] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.141 second response time
[00:35:48] robh: last check leslie was in the middle of install and all was good
[00:47:55] cool
[00:52:03] New patchset: coren; "Add ssh_restrict_network variable to SSH" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[01:05:00] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[01:08:43] New review: coren; "Tested and works as intended." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/57447
[01:56:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:57:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time
[02:04:30] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[02:13:31] !log LocalisationUpdate completed (1.22wmf1) at Thu Apr 4 02:13:31 UTC 2013
[02:13:40] Logged the message, Master
[02:21:01] !log LocalisationUpdate completed (1.21wmf12) at Thu Apr 4 02:21:01 UTC 2013
[02:21:09] New review: Lcarr; "This will put this on every machine - I don't think that's a good idea" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[02:21:09] Logged the message, Master
[02:45:45] New review: coren; "It actually is a no-op unless you set the appropriate puppet variable since it's guarded in a condit..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[03:08:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[03:12:42] New review: Ryan Lane; "Please check with puppetmaster::self to ensure this won't break every system." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[03:26:09] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[03:56:05] !log upgrading opendj on virt1000
[03:56:18] Logged the message, Master
[03:59:06] PROBLEM - Host mw1085 is DOWN: PING CRITICAL - Packet loss = 100%
[04:01:06] RECOVERY - Host mw1085 is UP: PING OK - Packet loss = 0%, RTA = 1.35 ms
[04:05:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:08:03] PROBLEM - LDAP on virt1000 is CRITICAL: Connection refused
[04:08:13] PROBLEM - LDAPS on virt1000 is CRITICAL: Connection refused
[04:09:03] RECOVERY - LDAP on virt1000 is OK: TCP OK - 0.000 second response time on port 389
[04:09:13] RECOVERY - LDAPS on virt1000 is OK: TCP OK - 0.000 second response time on port 636
[04:09:23] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:09:14 UTC 2013
[04:09:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:11:13] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:11:03 UTC 2013
[04:11:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:12:53] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:12:48 UTC 2013
[04:13:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:14:33] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:14:26 UTC 2013
[04:14:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:16:03] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[04:16:03] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:15:59 UTC 2013
[04:16:34] !log opendj upgrade on virt1000 complete
[04:16:42] Logged the message, Master
[04:16:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:17:33] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:17:25 UTC 2013
[04:17:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:18:53] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:18:46 UTC 2013
[04:18:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:20:03] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:20:01 UTC 2013
[04:20:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:21:13] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:21:10 UTC 2013
[04:21:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:22:23] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:22:12 UTC 2013
[04:22:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:23:13] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:23:07 UTC 2013
[04:23:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:24:03] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:23:54 UTC 2013
[04:24:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:25:13] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:25:04 UTC 2013
[04:25:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:30:03] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[04:32:53] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 04:32:46 UTC 2013
[04:32:53] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[04:43:57] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[04:59:57] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[05:04:25] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[05:38:10] PROBLEM - Squid on brewster is CRITICAL: Connection refused
[05:42:51] !log clearing squid logs and restarting squid on brewster
[05:42:58] Logged the message, Master
[05:43:36] RECOVERY - Squid on brewster is OK: TCP OK - 0.027 second response time on port 8080
[05:57:42] New patchset: Jeremyb; "fix comment on udp2log sample rate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57471
[05:57:54] Ryan_Lane: filled up again?
[05:57:58] :(
[06:00:19] yes
[06:03:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[06:04:46] PROBLEM - Varnish HTCP daemon on cp1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:04:46] PROBLEM - Varnish HTTP mobile-backend on cp1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:04:46] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
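The recurring "Varnish traffic logger on cp1041" alerts switch between PROCS OK at 3 varnishncsa processes and PROCS CRITICAL at 2, which suggests the check expects exactly three. The sketch below imitates that logic in shell under that assumption. The function names `procs_status` and `check_named_procs` are hypothetical. The production check is presumably a standard process-count plugin run over NRPE (the `CHECK_NRPE` timeouts hint at that), not this code. The sketch follows the usual Nagios plugin convention of exit code 0 for OK and 2 for CRITICAL.

```shell
#!/bin/sh
# Hypothetical re-creation of a "PROCS" style check; not the production plugin.
# Nagios plugin convention: exit 0 = OK, 2 = CRITICAL.

# procs_status NAME EXPECTED COUNT
# Prints a status line shaped like the alerts above. Returns 0 when COUNT
# equals EXPECTED and 2 otherwise (an assumed threshold rule).
procs_status() {
    name=$1
    expected=$2
    count=$3
    if [ "$count" -eq "$expected" ]; then
        echo "PROCS OK: $count processes with command name $name"
        return 0
    fi
    echo "PROCS CRITICAL: $count processes with command name $name"
    return 2
}

# check_named_procs NAME EXPECTED
# Counts running processes whose name is exactly NAME (procps pgrep, -x for
# exact match, -c to print the count), then hands the verdict to procs_status.
check_named_procs() {
    count=$(pgrep -c -x "$1")
    procs_status "$1" "$2" "${count:-0}"
}
```

On a host, `check_named_procs varnishncsa 3; echo "exit $?"` would print the status line followed by the exit code that the monitoring system maps to OK or CRITICAL.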
[06:16:11] New review: Stefan.petrea; "Looks good" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/57471
[06:22:36] RECOVERY - Varnish HTCP daemon on cp1041 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[06:22:36] RECOVERY - Varnish HTTP mobile-backend on cp1041 is OK: HTTP OK: HTTP/1.1 200 OK - 634 bytes in 6.417 second response time
[06:22:46] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[06:30:26] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 06:30:21 UTC 2013
[06:31:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[06:32:56] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 06:32:49 UTC 2013
[06:33:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[07:06:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[07:44:58] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[07:44:58] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[07:44:58] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[07:54:28] New patchset: Rfaulk; "mod. use get_project_host_map method to generate map for project to host key." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56576
[08:04:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[08:08:19] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 08:08:10 UTC 2013
[08:08:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[08:09:19] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 08:09:16 UTC 2013
[08:09:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[08:20:32] https://doc.wikimedia.org/mediawiki-core/master/php/html/ is gone
[08:30:09] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[08:32:59] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 08:32:57 UTC 2013
[08:33:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[09:06:39] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[09:32:48] Krinkle: 11:20 < ori-l> https://doc.wikimedia.org/mediawiki-core/master/php/html/ is gone
[09:33:09] paravoid: I already know, he told me in #wikimedia-tech
[09:33:26] the job has been disabled for a while, so it wasn't being updated.
[09:33:34] I'm trying to get it working again
[09:33:46] In doing so I cleared the existing dir
[09:34:05] (rsync cleared it when syncing)
[09:34:15] I don't care that much, I just thought you should know :)
[09:34:25] I know :)
[10:05:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[10:10:29] New patchset: Dereckson; "(bug 43359) Enable WebFonts on Javanese projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578
[10:10:39] New review: Dereckson; "PS3: rebased" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578
[10:19:46] New patchset: Mark Bergsma; "Temporarily remove cp3003 from the cache pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57490
[10:20:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57490
[10:29:51] was mailman changed recently?
[10:31:22] following the steps to subscribe to certain private ml
[10:31:33] - The moderator doesn't get an email
[10:31:53] - The request doesn't appear in the list of pending requests
[10:32:07] (and the user isn't subscribed, which is correct)
[10:33:33] more likely that someone changed privacy options for that list and all requests are rejected automatically
[10:36:17] PROBLEM - Varnish HTTP upload-backend on cp3003 is CRITICAL: Connection refused
[10:36:22] then that "someone" did so in global mailman preferences
[10:36:38] in that case I would expect an email saying that it was automatically rejected
[10:36:47] PROBLEM - Varnish HTTP upload-frontend on cp3003 is CRITICAL: Connection refused
[10:37:18] RECOVERY - Varnish HTTP upload-backend on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 634 bytes in 0.177 second response time
[10:37:47] RECOVERY - Varnish HTTP upload-frontend on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 675 bytes in 0.176 second response time
[10:37:49] the only similar config option I see is ban_list)
[10:37:49] chapin.thomasovnta@batterydream.com
[10:38:02] that's the only banned email
[10:40:27] trying to follow https://wikitech.wikimedia.org/wiki/Git-buildpackage to build a deb for libvpx-1.1.0+patch i have the repo ready but can not push to gerrit since create-project requires the right permissions
[10:40:57] who can run ssh USER@gerrit.wikimedia.org -p 29418 'gerrit create-project' -d "Package libvpx" -n operations/debs/libvpx -o ops -p operations/debs
[10:41:01] and allow me to push changes
[10:43:47] j^, poke ^demon
[10:44:23] New patchset: MaxSem; "Check mobile site's HTTP status" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57419
[10:44:37] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[10:52:27] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100%
[11:02:37] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[11:04:25] Any ideas why gerrit might complain about lack of Change-ID when it clearly /is/ included in a commit message?
[11:04:32] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[11:04:50] odder, that's the commit message?
[11:04:54] *what's the commit message?
[11:06:33] odder, is the commit id in the middle of the message?
[11:06:36] git log -1 shows Change-Id: I92f841efc6eb10c3c46990a27b5e599d1da8f411 in the last line
[11:08:27] New review: Faidon; "Class name "builder" is far too generic. Most Debian tools call everything pbuilder even if it's usi..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56382
[11:09:13] Platonides: http://tools.wikimedia.pl/~odder/showHEAD is what I get when I do git show HEAD
[11:13:20] odder, looks good to me
[11:16:24] It looks good to me too, but still – doesn't work :)
[11:18:37] New patchset: Platonides; "(bug 46828) Add patroller and autopatrolled groups on viwiki Added patroller and autopatrolled user groups on viwiki, and modified wgAddGroups, wgRemoveGroups according to the bug." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57497
[11:19:01] that's your patchset, odder
[11:19:05] New patchset: Nikerabbit; "(bug 46840) Update ttmservr solr schema" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57498
[11:19:07] :)
[11:20:22] New review: Nikerabbit; "Needed for Special:SearchTranslations." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57498
[11:20:39] New patchset: Nikerabbit; "(bug 46840) Update TTMServer Solr schema" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57498
[11:28:32] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[11:32:32] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[11:37:31] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[11:40:22] New review: MaxSem; "It's a sync with the master schema at mediawiki/extensions/Translate which was reviewed by me and Or..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/57498
[12:02:31] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[12:02:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:03:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.153 second response time
[12:07:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:08:38 UTC 2013
[12:09:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:09:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:09:41 UTC 2013
[12:10:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:10:38] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:10:35 UTC 2013
[12:11:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:11:38] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:11:29 UTC 2013
[12:12:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:12:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:12:09 UTC 2013
[12:13:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:13:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:13:15 UTC 2013
[12:14:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[12:15:28] RECOVERY - search indices - check lucene status page on search1017 is OK: HTTP OK: HTTP/1.1 200 OK - 62829 bytes in 0.010 second response time
[12:17:38] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[12:33:28] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 12:33:25 UTC 2013
[12:33:38] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[12:45:35] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[13:01:28] New patchset: Mark Bergsma; "Temporarily depool cp1035 for persistent debugging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57502
[13:02:05] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57502
[13:06:32] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[13:26:20] <^demon> !log running jgit gc on all repos
[13:26:27] Logged the message, Master
[13:26:32] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[13:26:42] thanks for that demon ;)
[13:26:51] <^demon> You're welcome.
[13:26:51] the puppet repo was getting unbearably slow
[13:27:18] i believe there are some large binaries in the history, perhaps at some point we should just remove that, even if it breaks people's clones
[13:27:42] then again, that's painful too
[13:27:43] maybe not
[13:28:09] speaking of that
[13:28:17] we talked a bit about making puppet a fast-forward only repo
[13:28:27] then totally forgot about it
[13:28:31] maybe we should do it now?
[13:29:07] <^demon> Considering it'll automatically rebase for you before merging, the workflow ends up being pretty much identical.
[13:29:16] <^demon> Either way, you have to resolve conflicts if they exist.
[13:31:13] yeah that would be good
[13:31:48] <^demon> https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet - easily changed, just swap the merge type.
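This relates to odder's Change-Id question. Gerrit recognises the `Change-Id:` line only when it is in the last paragraph (the footer) of the commit message, which is why Platonides asked whether the ID was in the middle of the message. The sketch below is a rough local check that relies on that rule and on the usual format: `I` followed by 40 hex digits. The helper names are hypothetical, and this is not Gerrit's own validation code, so passing it does not guarantee the push will be accepted.

```shell
#!/bin/sh
# Hypothetical helpers; Gerrit performs its own validation server-side.

# last_paragraph: print the final blank-line-separated block read from stdin.
# awk's paragraph mode (RS = "") treats runs of empty lines as record separators.
last_paragraph() {
    awk 'BEGIN { RS = "" } { para = $0 } END { print para }'
}

# has_change_id_footer: succeed if the footer paragraph contains a line of the
# form "Change-Id: I" followed by exactly 40 lowercase hex digits.
has_change_id_footer() {
    last_paragraph | grep -Eq '^Change-Id: I[0-9a-f]{40}$'
}
```

Run against the latest commit, `git log -1 --format=%B | has_change_id_footer && echo "footer OK"` prints nothing if a trailing paragraph follows the Change-Id line.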
[13:32:17] done
[13:32:32] i've done that for some of the operations/debs repos previously as well
[13:32:42] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[13:35:53] <^demon> There's actually a new experimental merge type as well, just not exposed in the UI. Jgit now supports recursive merges :)
[13:36:36] I think fast-forward is better
[13:37:12] <^demon> Yeah, I prefer fast-forward too.
[13:37:24] <^demon> But for something like core, where there's lots of potential points for conflict, it may be useful
[13:38:14] and maybe at some point it might make sense to rewrite history to fix authors, make it more sequential, remove binary files etc
[13:39:00] <^demon> I want to do that with mediawiki core, but people seemed to shy away from the idea when I brought it up.
[13:39:08] <^demon> "Everybody has to re-clone? The horror!"
[13:42:33] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[13:43:33] PROBLEM - Varnish HTTP upload-frontend on cp1035 is CRITICAL: Connection refused
[13:43:43] PROBLEM - Varnish HTCP daemon on cp1035 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[13:44:03] PROBLEM - Varnish HTTP upload-backend on cp1035 is CRITICAL: Connection refused
[13:45:00] that's me
[13:45:16] New patchset: Ottomata; "Now syncing sampled-1000 logs from gadolinium instead of emery." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57504
[13:45:39] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57504
[13:49:17] New patchset: Jeremyb; "fix comment on udp2log sample rate" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57471
[13:49:44] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57471
[13:53:33] RECOVERY - Varnish HTTP upload-frontend on cp1035 is OK: HTTP OK: HTTP/1.1 200 OK - 675 bytes in 0.001 second response time
[13:53:43] RECOVERY - Varnish HTCP daemon on cp1035 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[13:54:06] <^demon> Heh, you can see the small spike from repacking: http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20eqiad&h=manganese.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2
[13:57:15] New patchset: Demon; "Set up weekly jgit gc operations for all repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57327
[14:00:07] New review: Demon; "PS2 tosses the (very noisy) output to /dev/null. This will run every Saturday at 2am. Should take ~5..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/57327
[14:00:39] New patchset: Mark Bergsma; "Revert "Temporarily depool cp1035 for persistent debugging"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57505
[14:01:04] New patchset: Odder; "(bug 46882) Namespace 100 to be searched by default on sewikimedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57506
[14:01:10] New patchset: Mark Bergsma; "Revert "Temporarily depool cp1035 for persistent debugging"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57505
[14:01:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57505
[14:03:33] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[14:05:05] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[14:11:44] New patchset: Ottomata; "Installing webstatscollector gzip cron on gadolinium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57507
[14:12:19] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57507
[14:19:51] New patchset: Demon; "Updating gerrit to 2.6-rc0-144-gb1dadd2" [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/57508
[14:20:18] New review: Demon; "war is available at: https://integration.wikimedia.org/nightly/gerrit/wmf/gerrit-2.6-rc0-144-gb1dadd..." [operations/debs/gerrit] (master) C: 1; - https://gerrit.wikimedia.org/r/57508
[14:22:06] and where is available peace?
[14:22:30] heyaaa apergos, you there?
[14:25:07] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours
[14:26:46] <^demon> MaxSem: I didn't create any peace.
[14:27:08] how typical!
[14:28:20] Change abandoned: Demon; "Not going to do this as a giant change. We can do it as needed." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/52892
[14:32:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:33:19] New patchset: Andrew Bogott; "Do a full MW clone instead of a shallow one." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57325
[14:33:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.198 second response time
[14:33:34] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57325
[14:35:49] New patchset: Demon; "Updating gerrit to 2.6-rc0-144-gb1dadd2" [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/57508
[14:43:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:43:40] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[14:44:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time
[14:46:24] ottomata:
[14:46:29] what's up?
[14:50:05] heya
[14:50:13] do you know how the webstats dumps files are removed from locke?
[14:50:27] i know your script on snapshot1 copies them to dumps.wm.o
[14:50:37] but the logs on locke are about only 11 or 12 days old
[14:50:54] i was about to add a find -mtime +12 -delete cron on gadolinium
[14:50:59] unless you know a better way / how locke does it now
[14:51:20] apergos^
[14:51:35] hmm
[14:51:51] there used to be a job over there that cleaned em up I thought
[14:52:10] OH
[14:52:12] you know there is
[14:52:14] sorry, i missed that
[14:52:25] /a/webstats/scripts/purge
[14:52:32] yeah
[14:52:42] for file in $(find ./ -maxdepth 1 -type f -mtime +10)
[14:52:42] do
[14:52:42] # date=$(echo $file | cut -d- -f2)
[14:52:42] # newdir=./archive/${date:0:4}/${date:4:2}/
[14:52:42] # mkdir -p $newdir
[14:52:43] # mv $file $newdir/
[14:52:43] rm $file
[14:52:44] done
[14:52:45] basically find --delete
[14:52:53] i guess it used to do some fancy rotating
[14:53:01] k, i'll stick with my delete cron then
[14:53:06] all righty
[14:53:09] danke!
[14:55:07] sure
[14:55:49] i'm soon going to edit your cron on snapshot1 to copy webstats dumps from gadolinium
[14:55:50] s'ok?
[14:55:53] is there anything I should know?
[14:58:10] New patchset: Ottomata; "Setting up cron to delete old webstats dumps on gadolinium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57511
[15:00:49] heya chad, mark, now that action = fast forward only
[15:01:07] what should we do differently when committing changes?
[15:01:17] ^demon
[15:01:52] New patchset: Matthias Mullie; "Update frwiki config" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/55946
[15:02:12] <^demon> Everything has to be rebased on top of HEAD before submitting. Gerrit's supposed to do this implicitly, but if it doesn't there's always the rebase button.
[15:02:21] <^demon> If there's conflicts, you'll still have to resolve, so no change there.
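Once the commented-out archiving lines are dropped, the locke purge script quoted above reduces to a single `find` call, which matches the delete cron ottomata decides to keep for gadolinium. Below is a minimal sketch that uses the 12-day retention mentioned in the discussion. The function name and the directory in the crontab comment are hypothetical and are not taken from the actual puppet change. `-mtime +12` matches files whose age, counted in whole days, is greater than 12. `-delete` is a GNU/BSD `find` extension, not POSIX.

```shell
#!/bin/sh
# purge_old_dumps DIR DAYS
# Remove regular files directly inside DIR (no recursion) whose modification
# time is more than DAYS whole days ago. This matches the for/rm loop in the
# purge script, without the word-splitting problems of iterating over $(find ...).
purge_old_dumps() {
    dir=$1
    days=$2
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -delete
}

# Equivalent crontab entry (hypothetical directory, daily at 03:00):
# 0 3 * * * find /srv/webstats/dumps -maxdepth 1 -type f -mtime +12 -delete
```

Calling `find` directly instead of looping over its output also avoids starting one `rm` per file.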
[15:02:40] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[15:04:10] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[15:04:17] yeah, hm, what about patchsets though
[15:04:23] amended ones
[15:04:26] every time rebase?
[15:05:42] New patchset: Ottomata; "Setting up cron to delete old webstats dumps on gadolinium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57511
[15:05:52] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57511
[15:06:33] New review: coren; "I have, there is no added configuration to sshd_config when the ssh_restrict_network has no value, w..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[15:09:17] New patchset: Ottomata; "Missing a /" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57512
[15:09:26] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57512
[15:10:10] <^demon> You don't have to rebase until right before submission.
[15:10:28] <^demon> Individual patches can be based on whatever parent makes sense.
[15:10:39] until merge?
[15:10:40] hmm, i see
[15:10:45] well, no longer merge
[15:10:46] i get it
[15:10:46] k
[15:10:51] <^demon> :)
[15:10:53] same diff anyway, right?
[15:10:57] <^demon> Basically, yeah
[15:11:00] if there was a conflict it wouldn't merge anyway
[15:11:10] usually rebase will just work?
[15:11:17] <^demon> Yeah, if it's a clean rebase.
[15:11:53] <^demon> Ideally I'd like to see it use the recursive merge for rebasing, but meh, still experimental.
[15:12:03] <^demon> So a rebase could at least solve some conflicts.
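Under a fast-forward-only policy, a branch can only move to a commit that descends from its current tip. A change written against an older tip therefore has to be rebased first, using the Rebase button ^demon mentions or a local rebase. The sketch below shows this rule with plain git in a throwaway repository. `git merge --ff-only` stands in for the submit step, and the branch and file names are made up for the demo. It assumes git is installed and prints the number of commits on `production` after the rebased change lands.

```shell
#!/bin/sh
# ff_only_demo: show why a fast-forward-only branch needs a rebase first.
# Runs in a temporary repository (removed on exit) and prints the final
# commit count on the "production" branch.
ff_only_demo() (
    set -e
    repo=$(mktemp -d)
    trap 'rm -rf "$repo"' EXIT
    cd "$repo"
    git init -q
    git symbolic-ref HEAD refs/heads/production
    git config user.name "Demo User"
    git config user.email "demo@example.org"
    git config commit.gpgsign false

    echo base > base.txt
    git add base.txt
    git commit -qm "base"

    # A change is written against the current tip...
    git checkout -q -b topic
    echo change > change.txt
    git add change.txt
    git commit -qm "my change"

    # ...but another change lands on production first.
    git checkout -q production
    echo other > other.txt
    git add other.txt
    git commit -qm "someone else's change"

    # topic no longer descends from the tip, so a fast-forward is refused.
    if git merge -q --ff-only topic 2>/dev/null; then
        echo "unexpected fast-forward" >&2
        exit 1
    fi

    # After rebasing onto the new tip, the fast-forward succeeds.
    git checkout -q topic
    git rebase -q production
    git checkout -q production
    git merge -q --ff-only topic
    git rev-list --count HEAD
)
```

A clean rebase leaves the change's content unchanged and only gives it a new parent. This is why ^demon says the workflow ends up nearly identical to merging.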
[15:27:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:28:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time
[15:32:07] New patchset: Odder; "(bug 46154) Add forgotten config for thwiki Add abusefilter-log-detail for autoconfirmed on thwiki per bug." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57515
[15:33:38] New review: Andrew Bogott; "This should probably wait for https://gerrit.wikimedia.org/r/#/c/43886/ since that renames the proxy." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57443
[15:35:57] ^demon: if you can't approve this small patch, what would be best way to get it in quickly? https://gerrit.wikimedia.org/r/#/c/56879/
[15:36:24] (that's to halve the wikibugs spam)
[15:37:04] <^demon> I can't.
[15:37:09] <^demon> It's not puppetized.
[15:37:24] <^demon> https://gerrit.wikimedia.org/r/#/c/53973/ needs to be merged.
[15:39:45] PROBLEM - DPKG on cp1041 is CRITICAL: Timeout while attempting connection
[15:40:35] RECOVERY - DPKG on cp1041 is OK: All packages OK
[15:44:17] ^demon: but that's only the bugzilla part of the fix. Or "It's not puppetized" referred to bugzilla?
[15:45:09] <^demon> Oh, I misread the repo name.
[15:45:19] <^demon> Nothing relating to bugzilla or wikibugs is puppetized :(
[15:45:58] Right, I suspected that
[15:46:32] still, CR would help (and the commit message quotes you :p)
[15:47:10] PROBLEM - Varnish HTCP daemon on cp1041 is CRITICAL: Timeout while attempting connection
[15:48:05] RECOVERY - Varnish HTCP daemon on cp1041 is OK: PROCS OK: 1 process with UID = 997 (varnishhtcpd), args varnishhtcpd worker
[15:48:06] <^demon> Isn't REL_CC what it used to be?
[15:50:35] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[15:54:38] !log now generating webstatscollector dumps on gadolinium. snapshot1 syncs dumps from there over to dataset2 for dumps.wikimedia.org
[15:54:45] Logged the message, Master
[15:54:58] New patchset: Odder; "(bug 45638) Add patrol right to autopatrolled on itwikivoyage This change removes patrol right added to autoconfirmed group in I198651d7, and moves it to the autopatrolled group instead, per comment 9 in bug 45638." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57519
[16:00:35] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[16:05:38] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 87.83 ms
[16:05:38] PROBLEM - Varnish HTTP upload-frontend on cp3003 is CRITICAL: Connection refused
[16:05:39] PROBLEM - Varnish HTCP daemon on cp3003 is CRITICAL: Connection refused by host
[16:05:48] PROBLEM - NTP on cp3003 is CRITICAL: NTP CRITICAL: No response from NTP server
[16:05:48] PROBLEM - SSH on cp3003 is CRITICAL: Connection refused
[16:05:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:08:18] PROBLEM - Varnish traffic logger on cp3003 is CRITICAL: Connection refused by host
[16:08:19] apergos: Hey. Is that slave not so lagged now? :D
[16:08:28] PROBLEM - Varnish HTTP upload-backend on cp3003 is CRITICAL: Connection refused
[16:08:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:08:43 UTC 2013
[16:08:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:09:18] PROBLEM - DPKG on tmh2 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:09:28] PROBLEM - DPKG on tmh1 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:09:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:09:46 UTC 2013
[16:09:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:10:18] RECOVERY - DPKG on tmh2 is OK: All packages OK
[16:10:28] RECOVERY - DPKG on tmh1 is OK: All packages OK
[16:10:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:10:43 UTC 2013
[16:10:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:11:38] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:11:33 UTC 2013
[16:11:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:12:18] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:12:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:12:16 UTC 2013
[16:12:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:12:58] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:12:55 UTC 2013
[16:13:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:14:06] Looks to be..
[16:15:41] !log upgrading image & video scalers for libav
[16:15:49] Logged the message, Master
[16:17:19] PROBLEM - DPKG on mw78 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:17:28] PROBLEM - DPKG on mw80 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[16:18:18] PROBLEM - Apache HTTP on mw75 is CRITICAL: Connection refused
[16:18:18] RECOVERY - DPKG on mw78 is OK: All packages OK
[16:18:28] PROBLEM - Apache HTTP on mw78 is CRITICAL: Connection refused
[16:18:28] PROBLEM - Apache HTTP on mw79 is CRITICAL: Connection refused
[16:18:28] RECOVERY - DPKG on mw80 is OK: All packages OK
[16:18:39] PROBLEM - Apache HTTP on mw80 is CRITICAL: Connection refused
[16:18:39] PROBLEM - Apache HTTP on mw76 is CRITICAL: Connection refused
[16:18:39] PROBLEM - Apache HTTP on mw77 is CRITICAL: Connection refused
[16:18:48] PROBLEM - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is CRITICAL: Connection refused
[16:20:04] let's shutdown tampa ;)
[16:20:34] But then we wouldn't have slaves to run silly queries on!
[16:20:45] <^demon> Or fenari!
[16:21:42] or srv193/testwiki!
[16:22:36] paravoid: hello :)
[16:22:43] hi
[16:22:48] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 88.13 ms
[16:25:52] Reedy: we still don't have such queries :(
[16:26:08] I've used a few
[16:26:33] selfish
[16:26:35] oh crap
[16:26:40] I'm for the query equality https://gerrit.wikimedia.org/r/#/c/33713/
[16:26:59] just saw the lvs, fixing
[16:27:09] you again!?!
[16:27:14] yes :)
[16:28:18] RECOVERY - Apache HTTP on mw75 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.129 second response time
[16:28:28] RECOVERY - Apache HTTP on mw78 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.146 second response time
[16:28:38] RECOVERY - Apache HTTP on mw80 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.234 second response time
[16:28:38] RECOVERY - Apache HTTP on mw77 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.231 second response time
[16:28:44] why the f** did apache stop
[16:28:50] oh well, it was pmtpa so no harm
[16:29:28] RECOVERY - Apache HTTP on mw79 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.133 second response time
[16:29:28] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100%
[16:29:48] RECOVERY - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 62001 bytes in 0.812 second response time
[16:30:10] New review: Nemo bis; "Ping! Tim, your last comment in short is that this may not load Tampa servers *enough* (now that the..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/33713
[16:31:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:32:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time
[16:33:28] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 16:33:17 UTC 2013
[16:33:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[16:35:25] New review: Matthias Mullie; "Approved by Luke!" [operations/mediawiki-config] (master) C: 2; - https://gerrit.wikimedia.org/r/55946
[16:37:19] RECOVERY - SSH on cp3003 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[16:37:29] RECOVERY - Host cp3003 is UP: PING OK - Packet loss = 0%, RTA = 88.12 ms
[16:37:41] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/55946
[16:38:53] New patchset: Se4598; "fix AFTv5 oversight e-mail for frwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57522
[16:40:27] Change abandoned: Se4598; "fixed by https://gerrit.wikimedia.org/r/55946" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57522
[16:45:11] mlitn: heya, just checking if you were doing the AFTv5 deploy yet?
[16:45:48] greg-g: I am :)
[16:46:39] mlitn: cool, just didn't see any server-side activity in the log :)
[16:46:53] so Reedy unfortunately my job is still running :-(
[16:47:10] I noticed the 84306 lag :(
[16:47:45] and the log file is a lot larger, it grew a giant amount in only well less than two weeks
[16:47:49] this is not good
[16:48:40] RECOVERY - Apache HTTP on mw76 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.133 second response time
[16:48:43] ugh all the autocreated users are in there of course
[16:48:43] man, everyone wants db55 today
[16:49:13] I don't, I want my job to be done
[16:54:41] greg-g: no activity yet because jenkins doesn't seem to be willing to merge my changes (https://gerrit.wikimedia.org/r/#/c/57521/ & https://gerrit.wikimedia.org/r/#/c/57523/) over errors that at first sight seem unrelated to my code
[16:54:54] any idea who I should turn to? :)
[16:55:00] ^demon: ^^
[16:55:38] * greg-g likes telling ^demon to look at something said above, it's ^^^
[16:57:04] <^demon> You've got test failures.
[16:57:19] <^demon> Jenkins lists the tests it ran in its comment, click on those links to read the results.
[16:57:39] heh
[16:57:46] <^demon> Well, https://gerrit.wikimedia.org/r/#/c/57523/ is a test failure.
[16:57:59] <^demon> https://gerrit.wikimedia.org/r/#/c/57521/ look like jenkins or zuul went wonky.
[16:58:37] I was getting those this week and just removing the -2 for wmf branch merges
[16:58:55] <^demon> Override jenkins on https://gerrit.wikimedia.org/r/#/c/57521/.
[16:59:13] alright will do
[16:59:32] ^demon: funny thing is I have no idea about those failing unit tests; they're unrelated to what I'm pushing
[16:59:41] <^demon> Yeah, but I'm curious why they're failing.
[16:59:46] <^demon> Did someone merge something bad?
[16:59:50] <^demon> Or is jenkins confused?
[17:03:07] LeslieCarr: I need changes is filtering between labs and eqiad, RT ticket?
[17:03:14] in*
[17:04:48] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[17:05:58] ^demon: looks like those failing tests are part of wmf/1.21wmf12, but not in wmf/1.22wmf1
[17:06:22] <^demon> Well that's good.
[17:06:26] (as in: those tests, that file, do not exist there)
[17:12:33] weird though (re tests)
[17:15:44] if I run those tests on my local machine, in that branch, fresh checkout, they fail too
[17:16:00] so it's not jenkins going crazy
[17:19:49] New patchset: awjrichards; "Add mobile photo uploads to backlog for images needing categorization" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57528
[17:20:58] ^demon: looks like that failure has been there for awhile
[17:21:22] as in: multiple people have had it occur already, and seem to have undone jenkins' -1 and merged it in anyway: https://gerrit.wikimedia.org/r/#/q/status:merged+project:mediawiki/core+branch:wmf/1.21wmf12,n,z
[17:21:37] mind if I do the same?
[17:22:19] <^demon> Go for it.
[17:22:33] ok, thanks
[17:22:47] mlitn: yeah, do it, you only have 38 more minutes! ;)
[17:22:54] ^^
[17:22:58] yeah
[17:26:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:27:14] !log mlitn synchronized wmf-config/InitialiseSettings.php 'Updating ArticleFeedbackv5 frwiki config'
[17:27:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.145 second response time
[17:27:21] Logged the message, Master
[17:30:46] !log mlitn synchronized php-1.21wmf12/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master'
[17:30:53] Logged the message, Master
[17:31:12] !log mlitn synchronized php-1.22wmf1/extensions/ArticleFeedbackv5 'Update ArticleFeedbackv5 to master'
[17:31:15] Logged the message, Master
[17:32:56] paravoid: hey, about https://gerrit.wikimedia.org/r/#/c/56692/ : it's not urgent, but i don't want it to fall off the radar completely. would it help to file an RT ticket for it?
[17:34:21] Coren: rt
[17:34:39] I need some filtering adjustment between the labs in pmtpa and a server in eqiad. Do I file an RT in pmtpa, eqiad, or somewhere else?
[17:34:58] sounds like "networking"
[17:35:07] New patchset: Lcarr; "Add ssh_restrict_network variable to SSH" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[17:35:22] mutante: So it does. Hadn't noticed the "network" queue. :-)
[17:35:27] rt in network :)
[17:35:27] hehe
[17:35:34] pmtpa and eqiad are for onsite work
[17:35:41] like "please take a blowtorch to spence"
[17:36:15] * Coren prefers ax work, himself.
[17:36:50] thermite
[17:39:37] New review: Yuvipanda; "lgtm. The lack of newline between the templates does not seem to cause issues with the layout." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/57528
[17:42:41] mlitn: after the update if aftv5 I see now for a new message. I assume that the l10n cache has to be rebuild.
[17:43:06] Raymond_: it should be rebuilding as we speak :)
[17:43:21] mlitn: perfect :)
[17:45:53] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[17:45:53] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[17:45:53] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[17:46:23] mutante: Thermite is effective, but nowhere as cathartic as swinging a large, heavy edged object and hearing components split. :-)
[17:47:37] Coren: haha, true:)
[18:04:57] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[18:19:24] New patchset: Ottomata; "Moving inclusion of accounts::datasets to webstatscollector class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57534
[18:19:38] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57534
[18:21:17] if you are remote and want to follow the metrics meeting: http://www.youtube.com/watch?v=72UupRKxtEw
[18:22:36] mutante: thx for the link
[18:23:46] New patchset: Ottomata; "Renaming udp_stats in ganglia." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57535
[18:26:02] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57535
[18:30:17] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours
[18:32:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:33:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.143 second response time
[18:42:36] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[18:55:20] New patchset: Ottomata; "Fixing udp_stats.py code to work with prefixed ganglia names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57541
[18:56:19] New patchset: Ottomata; "Fixing udp_stats.py code to work with prefixed ganglia names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57541
[18:57:25] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57541
[18:58:46] New patchset: MaxSem; "Add mobile photo uploads to backlog for images needing categorization" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57528
[19:00:40] !log resuming iwlinks migrations on s3
[19:00:47] Logged the message, Master
[19:05:16] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57528
[19:05:50] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[19:07:44] MaxSem: :)
[19:08:04] what?:)
[19:08:14] 57528 ^^
[19:08:36] the props should go not to me
[19:09:07] !log maxsem synchronized wmf-config/InitialiseSettings.php 'https://gerrit.wikimedia.org/r/57528'
[19:09:14] Logged the message, Master
[19:13:39] ottomata: have a look at planet.debian.org
[19:13:48] ottomata: a lot of people discussing their packaging workflows with git
[19:14:11] (and whoever else is interested in Debian packaging using git)
[19:15:30] that particular post?
[19:15:32] or more than that?
[19:15:54] I count five
[19:15:58] http://danielpocock.com/autotools-project-distribution-and-packaging-on-debian
[19:16:02] http://joeyh.name/blog/entry/upstream_git_repositories/
[19:16:06] http://www.eyrie.org/~eagle/journal/2013-04/001.html
[19:16:09] http://thomas.goirand.fr/blog/?p=94
[19:16:12] http://blog.brlink.eu/index.html#i62
[19:17:35] paravoid: is there any strong reason to use pbuilder as root rather than the non-root version?
[19:17:43] ayye cool, will read these
[19:17:44] no
[19:17:52] you should't build packages as root
[19:18:06] pbuilder requires it unless you use the non-root version
[19:18:13] our current directions require root
[19:19:20] about to try Evernote or something.. hmm maybe, "NixNote" to paste these
[19:19:26] so, should we switch the instructions to use pbuilder-uml?
[19:19:42] oh you mean the uml stuff?
[19:19:45] I've never used those
[19:19:54] !log enabled 1:100 full query logging on db1042/1043, disabled puppet [planning to revert in 4-8 hours]
[19:20:00] Logged the message, Master
[19:20:02] hm, maybe tools just do sudo pbuilder
[19:20:16] although I'm sure they use fakeroot for some parts
[19:21:15] so, no
[19:21:22] pbuilder uses BUILDUSERID internally
[19:21:50] you invoke it as root, but drops to that when it builds
[19:25:46] New patchset: Ottomata; "Adding generic UDP stats to udp2log ganglia view" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57546
[19:26:55] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57546
[19:28:56] paravoid: but it requires that you run it as root
[19:29:04] !log maxsem synchronized php-1.22wmf1/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/57525/'
[19:29:10] Logged the message, Master
[19:29:44] yes
[19:33:40] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[19:33:43] !log maxsem synchronized php-1.21wmf12/extensions/MobileFrontend 'https://gerrit.wikimedia.org/r/#/c/57525/'
[19:33:50] Logged the message, Master
[19:36:59] PROBLEM - SSH on cp1043 is CRITICAL: Server answer:
[19:37:34] RECOVERY - SSH on cp1043 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[19:38:00] New patchset: Lcarr; "Add ssh_restrict_network variable to SSH" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[19:39:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57447
[19:41:04] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: Timeout while attempting connection
[19:42:04] PROBLEM - RAID on analytics1012 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:43:45] New patchset: Lcarr; "Add ganglia graph for global jobqueue length" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441
[19:43:49] New review: Mark Bergsma; "IIRC, this breaks bonding on Lucid. We've been experimenting with this a while ago, and couldn't fin..." [operations/puppet] (production) C: -2; - https://gerrit.wikimedia.org/r/57310
[19:48:37] New review: Lcarr; "My ruby knowledge is not good enough to give a serious review. Considering your careful record, if ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55304
[19:48:37] New patchset: Ottomata; "Stacking some more values in udp2log ganglia view" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57555
[19:49:56] New patchset: Lcarr; "Temporarily install apachebench on wtp1004" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54110
[19:50:12] New review: Ottomata; "This is great! Will be very useful for developing puppet templates before deploying them, especiall..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55304
[19:50:21] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441
[19:50:35] New patchset: Ottomata; "Stacking some more values in udp2log ganglia view" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57555
[19:50:41] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57555
[19:51:15] New patchset: Lcarr; "Temporarily install apachebench on wtp1004" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54110
[19:51:36] got a change on sockpuppet for Coren and uhh, Fredrecio?
[19:51:38] ok to merge?
[19:52:32] LeslieCarr:
[19:52:32] Add ssh_restrict_network variable to SSH
[19:54:01] and
[19:54:01] Add ganglia graph for global jobqueue length
[19:54:45] ottomata: Should be essentially a noop for sockpuppet; it adds a section if you have that variable set
[19:55:01] k danke
[19:55:47] sorry ottomata , yes :)
[19:55:55] k done, danke
[19:56:00] damn other channels being concentration taking
[19:56:10] Thanks ottomata
[19:57:02] LeslieCarr: esams file squid purging is broken again
[19:57:14] Nemo_bis: why'd you break it ?
[19:57:27] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54110
[19:57:30] it breaks constantly on its own :)
[19:57:33] e.g. https://upload.wikimedia.org/wikipedia/commons/f/fe/Wikipedia-logo-v2-km.png
[19:58:11] hmm or was the broken one served by varnish :/
[19:58:48] i'll check out the purging script … as soon as someone can tlel me how i make a wiki page redirect to another page
[19:59:08] LeslieCarr: insert #REDIRECT [[other page]]
[19:59:11] #REDIRECT[[PAGE]] ?
[19:59:13] http://p.defau.lt/?nVbMBJZgil4azOmRoGOHXA
[19:59:38] correct version is 18 kb
[20:00:08] thanks
[20:01:42] looking now :)
[20:02:45] New patchset: Asher; "module sets cpu frequency governor via cpufrequtils" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57589
[20:03:16] binasher: :)
[20:03:38] paravoid: i feel a little silly making that a module
[20:04:14] ooh thanks for the ganglia graph
[20:04:35] yeah but I can't think of anything better than that
[20:04:39] binasher: \n on content =>
[20:05:34] and you're using single/double quotes inconsistently :)
[20:05:37] New patchset: Ottomata; "Adding send stats to udp2log view as well." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57602
[20:05:51] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57602
[20:06:09] paravoid: why the \n? it isn't needed, do you prefer for style?
[20:06:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:06:25] it isn't, it helps when you e.g. cat /etc/default/cpufrequtils
[20:06:58] otherwise you'll see GOVERNOR=performancebinasher@hostname:~$
[20:07:05] sounds good to me
[20:07:28] I'm famous for nitpicking
[20:07:47] ask e.g. ottomata
[20:07:51] heh :)
[20:07:58] would you live with 2-space soft tabs vs. real tabs?
[20:08:14] no.
[20:08:19] Nemo_bis: so on the good side the multicast relay is receiving like 5 million purge requests
[20:08:19] 4-space soft tab, ok!
[20:08:23] can we remove puppet-lint then
[20:08:47] binasher: but it's the ruby coding style!
[20:08:47] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 20:08:38 UTC 2013
[20:08:57] :-)
[20:09:04] LeslieCarr: probably all people clicking "purge" a hundred times per reupload :P
[20:09:16] ahha however it's not being rebroadcast inside of esams!
[20:09:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:09:23] different multicast problem - very sneaky
[20:09:26] icinga-wm: decom'ed
[20:09:37] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 20:09:29 UTC 2013
[20:10:09] paravoid: re: quotes.. is File["/etc/default/cpufrequtils"] the only actual inconsistency, or will puppet expand ${variables} in strings reqardless of quote type?
[20:10:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:10:27] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 20:10:18 UTC 2013
[20:10:28] you need " if there are variables in it
[20:10:36] yeah, that's what i thought
[20:10:44] but style wants you to always use ' if there are no vars
[20:10:45] binasher: neither :)
[20:10:57] binasher: puppet does variable expansion on double quotes
[20:11:01] but it's not the only inconsistency
[20:11:04] db1055 will not pxe boot...it goes through the whole dhcp process and assigns the correct ip but than i get a blinking cursor. I don't believe it is the nic card because it reaches the dhcp server. I verified dhcpd file an 0 errors, and dns looks good. any ideas
[20:11:10] binasher: $governor = "performance"
[20:11:14] paravoid: yeah
[20:11:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:11:44] New patchset: Asher; "module sets cpu frequency governor via cpufrequtils" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57589
[20:11:47] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 20:11:39 UTC 2013
[20:11:51] robh: maybe you know something that I am missing see ^
[20:12:00] oh, damn it
[20:12:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:13:16] !log puppetstoredconfigclean.rb xenon.eqiad.wmnet - Killing xenon ...
[20:13:23] Logged the message, Master
[20:14:38] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57589
[20:14:45] !log restarted udpmcast on hooft
[20:14:52] Logged the message, Mistress of the network gear.
[20:16:08] New patchset: Mattflaschen; "Bump Echo EventLogging schema for new notification types." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57645
[20:17:53] !log restarting varnish on cp3001
[20:17:59] Logged the message, Mistress of the network gear.
[20:19:54] cmjohnson1: if you get the pxe boot
[20:20:02] but it doesnt proceed, sometimes in reverse dns entries issue
[20:20:31] but thats odd for a system in a range like that
[20:20:40] lemme take a glance at settings
[20:20:50] !log stopped cp3001 varnish
[20:20:55] Logged the message, Mistress of the network gear.
[20:21:19] robh: thx
[20:21:53] New patchset: MaxSem; "Send Zero notifications to #wikimedia-mobile" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57647
[20:22:05] Nemo_bis: getting closer to the reason - lots of packets in the Recv-Q
[20:22:29] who needs watching thrillers during evening
[20:22:34] ahha
[20:22:34] varnishhtcpd worker - waiting for net
[20:22:41] also being run by "997"
[20:22:51] will restart those processes
[20:23:18] wow nope
[20:23:21] still "waiting for net"
[20:23:28] lesse their network utilization
[20:23:30] New patchset: Dzahn; "add a script and cron to mail out bugzilla audit log and move bugzilla scripts to files/bugzilla instead of misc" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[20:24:23] cmjohnson1: Ok, so I confirmed the netboot.cfg and dhcp lease files are correct, and the dns entries all appear correct
[20:24:30] i'm taking over console to check there
[20:24:47] ok
[20:27:12] cmjohnson1: So are you on it still?
[20:27:18] cuz console com2 does nothing for me.
[20:27:23] no error, no nothing
[20:27:28] did the mainboard get swapped or sometnhing?
[20:27:39] (cuz the bios redirection default is wrong addresses for com1/2
[20:27:40] )
[20:27:42] i thought i was off...sorry..all yours
[20:27:50] ahh, there we go
[20:28:15] cmjohnson1: Ok, this one is wrong too
[20:28:19] which will break installer
[20:28:22] now, the default is
[20:28:34] Serial Port Address Device1=COM2,Serial
[20:28:34] Device2=COM1>
[20:28:44] thats : Serial Device1=COM2,Serial Device2=COM1
[20:28:48] ahha!
[20:28:52] thats fubar, as it used to be the other way around
[20:28:57] but now delld efaults to that
[20:29:01] and it MUST be changed on every system
[20:29:06] or the installer will not output correctly
[20:29:14] yep..thanks for being a second set of eyes...i looked right over that
[20:29:15] Change merged: Ryan Lane; [operations/debs/gerrit] (master) - https://gerrit.wikimedia.org/r/57508
[20:29:23] cmjohnson1: ok, im off, i did not fix
[20:29:26] i left for you
[20:29:27] sad part is I looked right at it
[20:29:47] heh, i just look at the page and think 'huh, pattern of layout is off'
[20:29:51] then notice the text is wrong
[20:29:52] paravoid: if you're around, i could use some of your knowledge
[20:29:58] im just so used to staring at these screens ;_;.
[20:30:16] you have a bigger screen than I do...13" here
[20:30:31] so cp3004 (for example) has varnishhtcpd "waiting on network" however the network port is a 10G and only 1g is utilized
[20:30:51] however lshw -class network shows that for some reason it thinks that the capacity of the card is 1g
[20:31:14] maybe that is a known bug in the report, maybe that's why the machine is confused ?
[20:32:47] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Thu Apr 4 20:32:45 UTC 2013
[20:33:01] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57645
[20:33:17] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[20:33:29] New review: Dzahn; "recheck" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56562
[20:36:10] New patchset: Ryan Lane; "Reference used images by absolute path in gerrit's css" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56705
[20:36:57] New review: Dzahn; "explanation for NE flag.. quoting:" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/49069
[20:37:25] New patchset: Ryan Lane; "Set up weekly jgit gc operations for all repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57327
[20:37:29] New patchset: Ryan Lane; "Style gerrit's LDAP login page according to other gerrit pages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56706
[20:38:35] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56705
[20:39:47] New patchset: Ryan Lane; "Style gerrit's LDAP login page according to other gerrit pages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56706
[20:40:49] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56706
[20:41:04] New patchset: Ryan Lane; "Set up weekly jgit gc operations for all repositories" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57327
[20:41:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57327
[20:43:57] New patchset: Kaldari; "bug 46392, add 'Contact Wikipedia' footer link on enwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57649
[20:44:32] hrm, okay that's not it, iperf is able to go over
[20:47:26] Nemo_bis: purge still failing ?
[20:51:03] LeslieCarr: let me check more recent reuploads
[20:51:11] thanks
[20:52:05] i guess i can also just take a picture and then upload it, then obviously modify it and see
[20:52:24] LeslieCarr: I think it may still be failing
[20:52:29] using the brand new commons app?:)
[20:52:40] I re purged the same file and I'm still getting the old one sometimes
[20:52:51] gah
[20:52:57] hehe
[20:53:16] X-Cache: cp1026 hit (62), cp3005 hit (8), cp3004 frontend miss (0)
[20:53:20] http://p.defau.lt/?XXhzdaTnZSV_o_K3ddOTdQ
[21:02:04] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[21:02:24] Nemo_bis: i have to hit up a phone screen - mutante / notpeter are either of you available to check this out ? varnishes in esams not purging some images (see Nemo_bis posts) and the machines are receiving the purge requests
[21:06:54] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[21:11:48] "Original upload log" and "File history" seem to differ
[21:12:17] oh. .svg vs. .png , nevermind
[21:16:04] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[21:20:35] New patchset: Dzahn; "redirect wikinews.com to .org - RT-4804" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/57651
[21:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[21:33:04] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[21:43:52] New patchset: Asher; "setting cpu frequency governor to performance on all coredbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57660
[21:44:12] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57660
[21:50:14] New patchset: Asher; "lack of status from init.d/cpufrequitls breaks file change -> service refresh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57661
[21:50:30] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57661
[21:52:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:53:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time
[21:58:06] Nemo_bis: i believe it is fixed (by using ?action=purge)
[21:58:14] http://commons.wikimedia.org/wiki/File:Wikipedia-logo-v2-km.png
[22:00:12] New patchset: Pyoungmeister; "adding per-node and per-instance monitoring for SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57665
[22:00:25] !log mflaschen synchronized php-1.21wmf12/extensions/GettingStarted/ 'GettingStarted deployment for E3'
[22:00:32] Logged the message, Master
[22:01:04] !log mflaschen synchronized php-1.21wmf12/extensions/GuidedTour/ 'GuidedTour 1.21wmf12 deployment for E3'
[22:01:10] Logged the message, Master
[22:01:51] !log mflaschen synchronized php-1.22wmf1/extensions/GettingStarted/ 'GettingStarted 1.22wmf1 deployment for E3'
[22:01:58] Logged the message, Master
[22:02:30] !log mflaschen synchronized php-1.22wmf1/extensions/GuidedTour/ 'GuidedTour 1.22wmf1 deployment for E3'
[22:02:32] ^demon: got a gerrit issue. when clicking a "Side-by-Side" diff i get Download of https://gerrit.wikimedia.org/r/gerrit_ui/deferredjs/9CDCCBC9EFD12CB75D742AA7009560ED/4.cache.js?manualRetry=3&autoRetry=3 failed with status 404(Not Found)
[22:02:37] Logged the message, Master
[22:02:59] <^demon> What change was this on?
[22:03:01] https://gerrit.wikimedia.org/r/#/c/57651/
[22:03:34] <^demon> Wfm. Hard refresh your browser?
[22:03:42] <^demon> We upgraded gerrit about an hour ago, might have some stale JS?
[22:03:55] that fixed it indeed. thank you. i just didn't see this one before
[22:04:02] that makes sense, yep
[22:04:20] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[22:05:53] New review: Dzahn; "dzahn@fenari:~$ apache-fast-test wikinews.url mw1044" [operations/apache-config] (master) C: 2; - https://gerrit.wikimedia.org/r/57651
[22:05:53] Change merged: Dzahn; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/57651
[22:06:50] New review: Reedy; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441
[22:07:26] New patchset: Bsitu; "Disable Echo on testwiki temporarily" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57666
[22:08:27] New review: Reedy; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441
[22:09:12] New patchset: Asher; "cpufrequtils shouldn't be treated as a service, except to start at boot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57667
[22:09:58] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57667
[22:10:12] !log olivneh synchronized wmf-config/CommonSettings.php 'Bump Echo EventLogging schema for new notification types.'
[22:10:23] Logged the message, Master
[22:10:23] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57666
[22:11:56] New patchset: Reedy; "Fix paths to getJobQueueLengths.php from g37441" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57668
[22:12:22] New review: Reedy; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37441
[22:12:31] mutante: sometimes action=purge works, but the problem is that it mostly fails
[22:12:40] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[22:12:47] you'd need to check new reuploads to see if it works as it should
[22:13:18] and no it didn't work for that file http://p.defau.lt/?4ufZnplc298judbksc9PiQ
[22:14:34] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Disable Echo on testwiki'
[22:14:41] Logged the message, Master
[22:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:23:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time
[22:37:25] PROBLEM - RAID on db44 is CRITICAL: CRITICAL: Degraded
[22:37:30] New patchset: Pyoungmeister; "adding per-node and per-instance monitoring for SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57665
[22:43:15] Reedy: thansk for the followup, I saw the empty graphs and hoped it was some puppet delay :/
[22:43:21] did I read the mwscript docs incorrectly or what
[22:43:29] Yes and no
[22:43:41] As it stands, it only looks in maintenance
[22:43:56] if it's not in there, and the root + file exists, it'll run that
[22:44:13] why is the en.wiki one empty too? https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=&m=enwiki_JobQueue_length&r=hour&z=default&jr=&js=&st=1365115414&z=large
[22:44:39] Part of the reason I opened the bug to look in extensions/WikimediaMaintence next
[22:44:44] ok
[22:44:45] New patchset: Pyoungmeister; "adding per-node and per-instance monitoring for SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57665
[22:45:03] I wonder if the passwords for MySQL are called correctly
[22:45:41] This I mean: include passwords::nagios::mysql
[22:45:50] You'd have to ask ops
[22:45:53] I've no idea
[22:46:00] In the whole repo I didn't find an example :/
[22:46:08] (I = grep)
[22:46:39] /bin/sh: 1: mwscript: not found
[22:46:41] that is the error
[22:46:45] it's being called correctly
[22:46:46] in most cases they're used in templates which get replaced before it came to the first part
[22:46:47] Wheeeee
[22:47:00] where it = the password
[22:47:14] sudo -u apache php /usr/local/apache/common/mutliversion/MWScript.php
[22:47:15] New patchset: Bsitu; "Re-enable Echo on testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57673
[22:47:29] thanks notpeter
[22:47:50] yep
[22:48:44] !log running updateSpecialPages on cswikinews per request
[22:48:46] Logged the message, Master
[22:48:46] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57673
[22:48:49] Reedy: where is that mutli ?
[22:48:57] Danny_B: ^
[22:49:03] Danny_B: reload the page
[22:49:10] Nemo_bis: What?
[22:49:26] Reedy: there's a typo in your line above
[22:49:33] lol
[22:49:35] was it just an IRC typo or what
[22:49:41] sudo -u apache php /usr/local/apache/common/multiversion/MWScript.php
[22:50:08] Chad said to use mwdeploy though?
[22:50:31] Most scripts have to be run as apache now [22:50:40] So that's out of date info [22:50:42] New patchset: Pyoungmeister; "adding per-node and per-instance monitoring for SANITARIUM" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57665 [22:50:50] :( [22:51:16] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57665 [22:51:33] mutante: awesome, thank you! [22:51:43] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Re-enable Echo on testwiki' [22:51:50] Logged the message, Master [23:00:10] New patchset: Pyoungmeister; "different resource require different resource names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57676 [23:01:37] New patchset: Pyoungmeister; "different resource require different resource names" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57676 [23:02:53] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57676 [23:04:52] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [23:06:02] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [23:07:18] dzahn is doing a graceful restart of all apaches [23:08:01] !log dzahn gracefulled all apaches [23:08:07] Logged the message, Master [23:11:02] RECOVERY - DPKG on db1057 is OK: All packages OK [23:11:07] !log DNS update - adding wikinews.com [23:11:13] Logged the message, Master [23:11:32] RECOVERY - RAID on db1057 is OK: OK: State is Optimal, checked 2 logical device(s) [23:11:32] RECOVERY - Disk space on db1057 is OK: DISK OK [23:17:32] !log wikinews.com now redirects to wikinews.org [23:17:39] Logged the message, Master [23:20:38] I will run scap soon [23:24:01] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1043' [23:24:08] Logged the message, Master [23:24:59] !log shutting down mysql on db1043, preparing to reboot 
[23:25:06] Logged the message, Master [23:27:13] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [23:29:02] PROBLEM - Host db1043 is DOWN: PING CRITICAL - Packet loss = 100% [23:29:17] hey, in our standard wiki apache config, we have 'RewriteRule ^/$ /w/index.php'. why/how does that also work even when there is no trailing slash? [23:29:29] New patchset: Asher; "raising default max_connections (was previously lowered from 5000)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57679 [23:30:13] RECOVERY - Host db1043 is UP: PING OK - Packet loss = 0%, RTA = 0.37 ms [23:32:00] New patchset: Asher; "raising default max_connections (was previously lowered from 5000)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57679 [23:32:02] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [23:32:20] PROBLEM - mysqld processes on db1043 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:32:26] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57679 [23:33:03] db1043 ^^ = me [23:33:20] RECOVERY - mysqld processes on db1043 is OK: PROCS OK: 1 process with command name mysqld [23:34:30] PROBLEM - MySQL Idle Transactions Port 3308 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:30] PROBLEM - MySQL Slave Running Port 3308 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:34:30] PROBLEM - MySQL Recent Restart Port 3308 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:34:30] PROBLEM - MySQL Slave Delay Port 3308 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:40] PROBLEM - MySQL Idle Transactions on db1053 is CRITICAL: NRPE: Command check_mysql not defined [23:34:40] PROBLEM - MySQL Idle Transactions Port 3306 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:34:40] PROBLEM - MySQL Recent Restart Port 3306 on 
db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:40] PROBLEM - MySQL Slave Running Port 3306 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:40] PROBLEM - MySQL Slave Delay Port 3306 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:34:41] PROBLEM - mysqld processes on db1054 is CRITICAL: PROCS CRITICAL: 2 processes with command name mysqld [23:34:50] PROBLEM - MySQL Recent Restart on db1053 is CRITICAL: NRPE: Command check_mysql not defined [23:34:50] PROBLEM - MySQL Idle Transactions Port 3307 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:34:50] PROBLEM - MySQL Recent Restart Port 3307 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:50] PROBLEM - MySQL Slave Running Port 3307 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:34:50] PROBLEM - MySQL Slave Delay Port 3307 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:00] PROBLEM - MySQL Recent Restart Port 3308 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:00] PROBLEM - MySQL Slave Delay on db1053 is CRITICAL: NRPE: Command check_mysql not defined [23:35:00] PROBLEM - MySQL Slave Running Port 3308 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:00] PROBLEM - MySQL Slave Delay Port 3308 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:00] PROBLEM - MySQL Idle Transactions Port 3308 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:10] PROBLEM - MySQL Slave Running Port 3306 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:10] PROBLEM - MySQL Slave Running on db1053 is CRITICAL: NRPE: Command check_mysql not defined [23:35:10] PROBLEM - MySQL Recent Restart Port 3306 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:10] PROBLEM - MySQL Slave Delay Port 3306 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:10] PROBLEM - MySQL Idle Transactions Port 3306 on 
db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:11] PROBLEM - mysqld processes on db1057 is CRITICAL: PROCS CRITICAL: 3 processes with command name mysqld [23:35:20] PROBLEM - MySQL Recent Restart Port 3307 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:35:20] PROBLEM - MySQL Slave Delay Port 3307 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:20] PROBLEM - MySQL Idle Transactions Port 3307 on db1057 is CRITICAL: NRPE: Command check_mysql not defined [23:35:20] PROBLEM - MySQL Slave Running Port 3307 on db1054 is CRITICAL: NRPE: Command check_mysql not defined [23:36:00] fuck up, space [23:36:44] New patchset: Pyoungmeister; "those spaces must be _ for any of this to work..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57680 [23:37:36] uhhhh [23:37:49] oh phew [23:38:33] New patchset: Pyoungmeister; "space -> _ in resource name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57681 [23:39:24] Change abandoned: Pyoungmeister; "(no reason)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57680 [23:39:33] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57681 [23:40:11] binasher: still pre-prod! :)_ [23:40:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:41:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.142 second response time [23:41:24] I'm going to do an out of cycle sync of CommonSettings to try to fix https://test.wikipedia.org/ [23:41:27] notpeter: it's the sanitarium.. it's supposed to occasionally throw "check? what check?!" [23:41:57] it must first be crazy to become sane. [23:42:02] csteipp, let me know when you're done your deployment. [23:42:12] superm401: I'm not [23:42:18] Oh, are you guys done? [23:42:38] csteipp, we're supposed to be, but https://test.wikipedia.org/ is down. 
[23:42:45] E2 is done [23:42:53] So I'd like to sync one file if possible to try to fix it. [23:43:06] superm401: Yeah, I haven't started anything. I was waiting for you guys to fix the site [23:43:11] Failed opening '/home/wikipedia/common/php-1.22wmf1/extensions/E3Experiments/E3Experiments.php' for inclusion [23:43:25] Okay, then I'll go ahead. [23:43:52] !log asher synchronized wmf-config/db-eqiad.php 'returning db1043' [23:43:59] Logged the message, Master [23:44:23] superm401: how long will that take (have you started?)? [23:44:30] PROBLEM - NTP on db1043 is CRITICAL: NTP CRITICAL: Offset unknown [23:44:37] greg-g, yes, I've started, only a minute or so. [23:44:40] ok [23:44:43] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1043' [23:44:50] Logged the message, Master [23:46:08] !log bsitu Started syncing Wikimedia installation... : Update Echo to master [23:46:15] Logged the message, Master [23:46:25] !log asher synchronized wmf-config/db-eqiad.php 'returning db1043' [23:46:32] Logged the message, Master [23:46:58] New patchset: Odder; "(bug 46712) Set a different favicon for iswiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [23:48:56] <^demon> Is someone sync'ing out extension-list to fix the fatal on testwiki? [23:48:57] Oi! Mid-air collision detected! [23:49:07] ^demon: yep [23:49:07] !log mflaschen synchronized wmf-config/CommonSettings.php [23:49:14] ^demon: there ^ [23:49:15] Logged the message, Master [23:49:16] ;) [23:49:19] <^demon> That's not extension-list [23:49:21] Gerrit Notification Bot <3<3<3 [23:49:30] RECOVERY - NTP on db1043 is OK: NTP OK: Offset 0.003011226654 secs [23:49:33] hrm [23:50:18] <^demon> I see nothing in the git log for it, nor do I see anything non-committed. [23:50:29] ^demon, the extension-list actually includes it? [23:50:34] I thought only CommonSettings should. 
[23:50:54] <^demon> include_once(/home/wikipedia/common/php-1.22wmf1/extensions/E3Experiments/E3Experiments.php): failed to open stream: No such file or directory in /home/wikipedia/common/php-1.22wmf1/maintenance/mergeMessageFileList.php [23:51:02] <^demon> There's the clue, mergeMessageFileList.php [23:51:20] !log demon synchronized wmf-config/extension-list 'Remove e3 since it was deleted' [23:51:27] Logged the message, Master [23:52:21] ^demon, sorry, I should have read the error more closely. [23:52:31] BTW: Wouldn't it be useful for the Gerrit Notification Bot to add the 'patch-in-gerrit' keyword when adding Gerrit link to the bug? I think andre__ actively uses that keyword. [23:52:44] odder: yes. [23:52:45] <^demon> odder: Not possible yet. [23:52:48] <^demon> But yes, would be nice. [23:52:55] <^demon> I'm waiting on a new release of j2bugzilla. [23:53:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [23:53:46] Great ^demon, thanks :) I'm sure Andre will be happy! 
[23:54:16] RECOVERY - MySQL Slave Running Port 3307 on db1054 is OK: OK replication [23:54:16] RECOVERY - MySQL Recent Restart Port 3307 on db1054 is OK: OK 111310 seconds since restart [23:54:26] RECOVERY - MySQL Slave Running Port 3308 on db1057 is OK: OK replication [23:54:26] RECOVERY - MySQL Recent Restart Port 3308 on db1054 is OK: OK seconds since restart [23:54:26] RECOVERY - MySQL Slave Delay Port 3308 on db1057 is OK: OK replication delay seconds [23:54:26] RECOVERY - MySQL Slave Running Port 3308 on db1054 is OK: OK replication [23:54:26] RECOVERY - MySQL Idle Transactions Port 3308 on db1057 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:54:36] RECOVERY - MySQL Idle Transactions Port 3308 on db1054 is OK: OK longest blocking idle transaction sleeps for seconds [23:54:36] RECOVERY - MySQL Slave Delay Port 3306 on db1054 is OK: OK replication delay seconds [23:54:36] RECOVERY - MySQL Slave Running Port 3306 on db1057 is OK: OK replication [23:54:36] RECOVERY - MySQL Recent Restart Port 3306 on db1057 is OK: OK 101816 seconds since restart [23:54:36] RECOVERY - MySQL Idle Transactions Port 3306 on db1054 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:54:44] <^demon> Heh, l10nupdate is much faster with repacked git repos. 
[23:54:46] RECOVERY - MySQL Slave Delay Port 3308 on db1054 is OK: OK replication delay seconds [23:54:46] RECOVERY - MySQL Slave Delay Port 3307 on db1054 is OK: OK replication delay seconds [23:54:46] RECOVERY - MySQL Recent Restart Port 3307 on db1057 is OK: OK 111349 seconds since restart [23:54:46] RECOVERY - MySQL Idle Transactions Port 3307 on db1054 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:54:46] RECOVERY - MySQL Slave Running Port 3307 on db1057 is OK: OK replication [23:55:06] RECOVERY - MySQL Slave Running Port 3306 on db1054 is OK: OK replication [23:55:06] RECOVERY - MySQL Recent Restart Port 3306 on db1054 is OK: OK 111374 seconds since restart [23:55:06] RECOVERY - MySQL Slave Delay Port 3306 on db1057 is OK: OK replication delay seconds [23:55:06] RECOVERY - MySQL Idle Transactions Port 3306 on db1057 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:55:06] RECOVERY - mysqld processes on db1057 is OK: PROCS OK: 3 processes with command name mysqld [23:55:16] RECOVERY - MySQL Recent Restart Port 3308 on db1057 is OK: OK 94122 seconds since restart [23:55:16] RECOVERY - MySQL Slave Delay Port 3307 on db1057 is OK: OK replication delay seconds [23:55:16] RECOVERY - MySQL Idle Transactions Port 3307 on db1057 is OK: OK longest blocking idle transaction sleeps for 0 seconds [23:56:53] New patchset: Mattflaschen; "Remove E3Experiments and LastModified." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57687 [23:57:39] Change merged: Mattflaschen; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57687