[00:19:14] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [00:22:24] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:51:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:52:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 2.658 second response time [01:21:10] (PS1) Dereckson: Add filemover group on it.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83386 [01:39:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:41:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [01:49:35] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [01:52:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:45] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [01:53:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.138 second response time [01:57:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:00:59] !log LocalisationUpdate failed: git pull of extensions failed [02:01:03] Logged the message, Master [02:01:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [02:29:39] RECOVERY - check_job_queue on fenari is OK: JOBQUEUE OK - all job queues below 10,000 [02:32:49] PROBLEM - check_job_queue on fenari is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:43:51] Is there a bug about LocalisationUpdate failing?
[03:02:43] https://bugzilla.wikimedia.org/show_bug.cgi?id=53890 [03:14:26] PROBLEM - Puppet freshness on sq42 is CRITICAL: No successful Puppet run in the last 10 hours [03:14:44] (PS1) Ori.livneh: Log more Navigation Timing events on mobile [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83388 [03:22:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [04:16:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:17:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [04:30:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:31:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 1.326 second response time [04:34:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:36:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [04:51:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:52:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.323 second response time [05:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [05:52:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: 
Status line output matched 400 - 336 bytes in 0.148 second response time [05:57:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:58:16] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.192 second response time [06:14:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:15:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.242 second response time [06:36:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:41:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.947 second response time [07:06:31] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: No successful Puppet run in the last 10 hours [07:40:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:41:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [08:48:21] PROBLEM - Disk space on analytics1011 is CRITICAL: DISK CRITICAL - free space: /var/lib/hadoop/data/g 74662 MB (3% inode=99%): [09:28:34] (CR) Ori.livneh: [C: -1] "Not sure yet if that's what I want to do." 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83388 (owner: Ori.livneh) [09:31:39] (PS1) Ori.livneh: Ganglia backend for StatsD: Emit metadata every send_metadata_interval [operations/puppet] - https://gerrit.wikimedia.org/r/83396 [09:33:34] (PS2) Ori.livneh: NavigationTiming StatsD instance: flush every 5 mins [operations/puppet] - https://gerrit.wikimedia.org/r/83230 [09:50:40] PROBLEM - MySQL Processlist on db1021 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 42 copy to table, 0 statistics [09:53:40] RECOVERY - MySQL Processlist on db1021 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 1 statistics [10:55:32] (CR) Jalexander: [C: 1] "looks good to me, adding a couple who may be able to push." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83218 (owner: TTO) [12:14:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:15:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.938 second response time [13:15:11] PROBLEM - Puppet freshness on sq42 is CRITICAL: No successful Puppet run in the last 10 hours [13:40:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:41:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [14:01:24] (CR) Ottomata: "(1 comment)" [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83191 (owner: Ottomata) [14:04:01] (PS3) Ottomata: Updating kafka script and init scripts with recent changes in Kafka bin/*.sh scripts from 0.8 branch. [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83137 [14:04:16] (PS4) Ottomata: Updating kafka scripts with recent changes from 0.8 branch. 
[operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83137 [14:05:32] (CR) Ottomata: [C: 2 V: 2] Updating kafka scripts with recent changes from 0.8 branch. [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83137 (owner: Ottomata) [14:06:08] hiaaaooo [14:06:19] can any shell wizards out there find a better way to do this? [14:06:21] https://gerrit.wikimedia.org/r/#/c/83191/1/debian/bin/kafka [14:15:54] (PS3) Ottomata: Adding environment var ZOOKEEPER_URL. [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83191 [14:16:00] (CR) Ottomata: [C: 2 V: 2] Adding environment var ZOOKEEPER_URL. [operations/debs/kafka] (debian) - https://gerrit.wikimedia.org/r/83191 (owner: Ottomata) [14:24:35] (CR) Ottomata: [C: 2 V: 2] Added README.md markdown file [operations/software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/83347 (owner: Edenhill) [14:44:50] (PS8) Ottomata: (WIP) Initial Debian version [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/78782 (owner: Faidon Liambotis) [14:45:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:47:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 9.031 second response time [14:47:28] (PS9) Ottomata: (WIP) Initial Debian version [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/78782 (owner: Faidon Liambotis) [15:00:02] (CR) Edenhill: [C: 1] "(1 comment)" [operations/software/varnish/varnishkafka] (debian) - https://gerrit.wikimedia.org/r/78782 (owner: Faidon Liambotis) [15:19:30] hi, is someone around who can put debian packages to apt.wikimedia.org ? 
[15:59:34] ottomata: [15:59:39] args="a b c --zookeeper d e f" [15:59:45] TMP=${args/--zookeeper} [15:59:51] if [ "${#args}" -ne "${#TMP}" ]; then echo "has zookeeper arg"; fi [16:01:00] it's the standard hack: the '/' strips '--zookeeper' if it finds it, and then you compare variable lengths to see if they're the same (meaning no --zookeeper was matched and stripped) or unequal (meaning args had '--zookeeper') [16:01:36] but you'd have to switch the shebang to #!/bin/bash; i'm not sure dash (which ubuntu uses as /bin/sh) supports that kind of parameter expansion [16:04:04] i'd keep the grep probably but express it as: grep -- '--zookeeper' <<<"$*" [16:11:32] yeah, ori-l, i could do this in bash [16:11:40] [[ ]] has wildcard support [16:11:41] can do [16:11:56] [[ $@ != *--zookeeper* ]] [16:11:57] or something [16:12:04] but since this is going to be part of a shipped .deb [16:12:08] faidon wants me to use sh [16:12:38] well, use grep then [16:12:40] https://wiki.ubuntu.com/DashAsBinSh#A.24.7Bparm.2BAC8.3F.2BAC8-pat.5B.2BAC8-str.5D.7D [16:12:43] hehe :p [16:12:52] and merge a couple of patches for me! [16:12:57] it will help you in unspecified ways [16:13:04] https://gerrit.wikimedia.org/r/#/c/83230/ [16:13:10] haha, sure! (i'm in a meeting right now though) [16:13:21] it will help the meeting! [16:13:42] just kidding. don't worry about it if you're in a meeting. [16:14:34] i got it open in a tab, will check it soon [16:15:09] k. 
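Editor's note on the --zookeeper check discussed above: a minimal sketch of one way to do it in plain /bin/sh, assuming the goal is only to detect whether the flag was passed. It uses a `case` pattern match, so it needs neither the bash-only `${var/pattern}` expansion nor the `<<<` here-string, which dash also does not provide. The function name `has_zookeeper_arg` is hypothetical and is not taken from the kafka wrapper script under review.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch, not from the kafka script).
# Returns 0 if any argument is exactly --zookeeper or has the form --zookeeper=VALUE.
has_zookeeper_arg() {
    for arg in "$@"; do
        case "$arg" in
            --zookeeper|--zookeeper=*) return 0 ;;
        esac
    done
    return 1
}

# Example call with the argument list used in the chat.
if has_zookeeper_arg a b c --zookeeper d e f; then
    echo "has zookeeper arg"
fi
```

Matching each argument separately is stricter than the substring tests suggested in the channel: an option such as --zookeeper-path would not count as a match here. Whether that stricter behaviour is what the wrapper needs is an assumption.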
the js changes are just ported from analytics/statsd-ganglia, don't need to be reviewed [16:48:20] (PS1) Nikerabbit: ULS event logging for wiktionaries [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/83418 [17:00:06] (Abandoned) Matthias Mullie: Add cron to generate CSVs for ee-dashboard [operations/puppet] - https://gerrit.wikimedia.org/r/71282 (owner: Matthias Mullie) [17:06:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: No successful Puppet run in the last 10 hours [17:09:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:10:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.026 second response time [17:13:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:14:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.399 second response time [17:16:08] hi, is someone around who can put debian packages to apt.wikimedia.org ? [17:16:29] hey physikerwelt :-] [17:16:35] hi [17:16:40] ops could, they are together talking to each other face to face [17:16:51] so most probably not reading this channel right now :D [17:17:17] was it for latexml ? [17:17:27] ok I was wondering since I have put this question approximately 10 times to the channel [17:17:28] ja [17:17:30] ye [17:17:31] s [17:18:34] should I file a bugreport at bugzilla? [17:24:11] physikerwelt: Has anyone reviewed them? [17:24:57] hmz, saves on mediawiki.org are slow / sometimes timing out for me [17:25:20] Big/complex pages? [17:25:26] The whole LaTeXML code? No, not someone at wikimedia just the debian installation script [17:25:26] nope [17:25:53] I think there is a internal code review at NIST but I'm not sure [17:26:39] does that mean the NSA reviewed it? 
:-P [17:26:52] physikerwelt: depends on the group that produced the code; but it is strongly encouraged [17:27:19] I'd say that probably about 40% of the code is at least reviewed by team members :p [17:27:20] small, simple pages. not seeing the issue on en.wp. getting actual timeouts / wikimedia errors on mediawiki.org though [17:27:41] saves end up going through anyway, but error reported to user [17:29:04] The idea with the package in the repository can be changed quickly if there are errors in the code [17:29:29] I think it would be hard to find someone to review the whole latexml code [17:29:32] Reedy, try editing e.g. https://www.mediawiki.org/wiki/Project:Sandbox [17:31:15] does timeout for me apparently weird [17:34:05] Reedy: I am wondering if that is varnish related [17:34:33] err squid [17:34:49] from 208.80.154.136 via cp1005.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.131 (10.64.0.131) Error: ERR_READ_TIMEOUT, errno [No Error] [17:36:07] 402 entries from mediawikiwiki in slow-parse.log, 338 of them from VisualEditor/TranslationCentral [17:36:13] not sure if related [17:37:19] concerning https://git.wikimedia.org/summary/operations%2Fdebs%2Flatexml can I do anything to get in the apt repository? [17:37:36] physikerwelt: e-mail ops-requests@rt.wikimedia.org [17:38:03] ori-l: thanks [17:43:18] (CR) Ryan Lane: [C: 2] Add GCM cipher and remove DES [operations/puppet] - https://gerrit.wikimedia.org/r/83043 (owner: Ryan Lane) [17:43:46] !log adding GCM cipher and removing DES from SSL config [17:43:50] Logged the message, Master [17:44:12] what's up all these? 
"logmsgbot: LocalisationUpdate failed: git pull of extensions failed" https://wikitech.wikimedia.org/wiki/Server_admin_log [17:44:21] robla: Git repo problems after the migration [17:44:31] Missing stuff [17:44:45] !log depooling ssl1001 [17:44:48] Logged the message, Master [17:45:06] robla: https://bugzilla.wikimedia.org/show_bug.cgi?id=53890 & https://bugzilla.wikimedia.org/show_bug.cgi?id=53841 [17:49:04] (PS1) Springle: add springle to icinga sms [operations/puppet] - https://gerrit.wikimedia.org/r/83421 [17:50:26] (PS2) Hashar: Update some rss/atom feeds in the planet wikimedia config [operations/puppet] - https://gerrit.wikimedia.org/r/83348 (owner: Jeroen De Dauw) [17:51:27] (PS2) Springle: add springle to icinga sms [operations/puppet] - https://gerrit.wikimedia.org/r/83421 [17:51:40] (CR) Springle: [C: 2 V: 2] add springle to icinga sms [operations/puppet] - https://gerrit.wikimedia.org/r/83421 (owner: Springle) [17:52:19] ori-l: I'm even too stupid to send an email... I got the reply User 'schubotz@tu-berlin.de' could not be loaded in the mail gateway do I have to use a special address? [17:53:29] * bawolff attempts to be naggy about RT 5735 (vhtcpd not working on cp1061) [17:54:09] bawolff: half hearted attempt, there [17:54:31] Well you know, don't want to be too annoying :P [17:54:38] !log pooling ssl1001 and depooling ssl1002-5 [17:54:41] Logged the message, Master [17:55:06] I was naggy on sunday, but that kind of fell on deaf ears due to it being sunday [17:55:25] renaggify [17:56:03] i removed {{Please leave this line alone and write below (this is the coloured heading)}} , page saves again [17:56:31] snappy, even. 
[17:57:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:57:26] bawolff: you should be able to see https://rt.wikimedia.org/Ticket/Display.html?id=5614 now :) [17:57:59] (PS1) RobH: fixing tab spacing [operations/puppet] - https://gerrit.wikimedia.org/r/83423 [17:58:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [17:58:33] physikerwelt: e-mail the request to me and i'll forward it and cc you [17:58:36] mutante_: If upgrading tzdata breaks other stuff we've got big problems ;) [17:59:51] jeremyb: I can indeed. I've now moved on to different and other varnish servers not working ;) [18:00:16] reintroduce template, timeout back [18:00:16] (CR) RobH: [C: 2] "spaces > tabs, plus removed the now defunct HKT paging hours" [operations/puppet] - https://gerrit.wikimedia.org/r/83423 (owner: RobH) [18:04:35] !log repooling ssl1002-5 and depooling ssl1006-9 [18:04:38] Logged the message, Master [18:06:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:07:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.246 second response time [18:11:07] !log repooling ssl1006-9 [18:11:10] Logged the message, Master [18:14:06] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: various misc wikis to 1.22wmf16 [18:14:08] Logged the message, Master [18:15:12] (PS1) RobH: cmjohnson new sshkey [operations/puppet] - https://gerrit.wikimedia.org/r/83425 [18:15:22] cmjohnson1: Can you login to gerrit and +1 that change for me? [18:15:33] just seems extra secrute if the person asking for the change approves it =] [18:15:36] secure [18:16:13] can i get a merge on https://gerrit.wikimedia.org/r/#/c/83230/ and the patch it depends on? 
[18:16:43] (CR) Cmjohnson: [C: 1 V: 1] cmjohnson new sshkey [operations/puppet] - https://gerrit.wikimedia.org/r/83425 (owner: RobH) [18:16:51] also, who changed the jquery version for ganglia-web? [18:16:58] (CR) Cmjohnson: cmjohnson new sshkey [operations/puppet] - https://gerrit.wikimedia.org/r/83425 (owner: RobH) [18:17:40] ori-l, are you still poking at the mediawiki.org breakage? [18:18:12] Eloquence: i am, but i'm a bit stuck. i've narrowed it down to Template:Please_leave_this_line_alone_and_write_below_(this_is_the_coloured_heading) [18:18:28] ori-l, it seems to occur on other pages as well though, so perhaps a general template issue of some kind [18:18:42] (CR) RobH: [C: 2] cmjohnson new sshkey [operations/puppet] - https://gerrit.wikimedia.org/r/83425 (owner: RobH) [18:19:01] !log depooling ssl3001-2 [18:19:04] Logged the message, Master [18:19:15] I'm not seeing any suspicious template-related changes in wmf16 - https://www.mediawiki.org/wiki/MediaWiki_1.22/wmf16 - except for a modification of templatedata: https://gerrit.wikimedia.org/r/#/q/d66b4c60,n,z [18:20:01] * ori-l looks [18:20:44] (PS1) Lcarr: adding in some mmgt addresses in ulsfo [operations/dns] - https://gerrit.wikimedia.org/r/83426 [18:20:58] anomie, saves on mediawiki.org are broken, ori-l is poking at it right now [18:21:25] broken as in "server reports time out but save completes" [18:21:47] some context in channel log [18:21:50] Eloquence: I'm not likely to be able to help, my next flight boards in about 10 minutes. But I'll take a quick look. [18:21:53] kk [18:22:09] <^d> Eloquence: I'm able to make edits to mw.org ok. 
[18:22:21] <^d> Just did https://www.mediawiki.org/w/index.php?title=User%3A%5Edemon%2Ftest&diff=779481&oldid=612875 [18:22:23] ^d: it seems to depend on the presence of templates, lemme see if I can still repro [18:22:47] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: various misc wikis to 1.22wmf16 [18:23:01] it looks very mw-related but I'm around if you folks need anything from the ops side [18:23:04] yep, try editing https://www.mediawiki.org/wiki/Project:Sandbox which just has a simple template at the top [18:25:14] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: wikiversity, wikibooks and wikisource to 1.22wmf16 [18:25:17] Logged the message, Master [18:26:07] !log upgrading mw1000-mw1009 [18:26:10] Logged the message, Mistress of the network gear. [18:26:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:26:38] I see messages in exception.log on fluorine, looks like "Lock wait timeout" in Database.php coming from Translate's use of the MessageGroupStats hook. But that could be unrelated, I suppose. [18:27:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 3.984 second response time [18:27:27] Reedy, we are updating other wikis while the save issue is still unresolved? 
[18:27:42] [09-Sep-2013 18:22:32] Catchable fatal error: Argument 2 passed to Wikibase\Api\ItemByTitleHelper::getEntityIds() must be an array, null given, called in /usr/local/apache/common-local/php-1.22wmf16/extensions/Wikibase/repo/includes/api/GetEntities.php on line 72 and defined at /usr/local/apache/common-local/php-1.22wmf16/extensions/Wikibase/repo/includes/api/ItemByTitleHelper.php on line 64 [18:28:05] You can ignore those [18:28:16] They should filter out from the recent list [18:28:35] wikibase is not on mediawiki.org [18:28:45] that error is on test wikidata [18:29:08] Reedy: are the timeouts only happening on mediawiki.org, or is it more widespread? [18:29:21] Reedy: you'll have newer tzdata once puppet ran on $all [18:29:28] I was just going to try editing wikiversity and see [18:29:38] mutante_: great [18:29:41] Thanks [18:29:44] Reedy: and no need to import anything.. it was actually an issue with the sync from Ubuntu mirror to our mirror [18:29:51] * anomie heads to the gate, good luck all [18:29:55] Even better! [18:30:16] !log repooling ssl3001/3002 and depooling ssl3003 [18:30:18] Logged the message, Master [18:30:23] PROBLEM - DPKG on mw108 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:33] PROBLEM - DPKG on mw10 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:33] PROBLEM - DPKG on mw103 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:43] PROBLEM - DPKG on mw105 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:43] PROBLEM - DPKG on mw102 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:43] PROBLEM - DPKG on mw101 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:30:47] that's me [18:30:48] ... [18:30:49] sorry [18:30:52] <- yea, so .. [18:30:54] Yay, tampa [18:30:59] since we looked at the tzdata thing.. 
[18:31:03] PROBLEM - DPKG on mw107 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:03] PROBLEM - DPKG on mw106 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:03] PROBLEM - DPKG on mw104 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:12] and it was really weird, there was the newer package but NOT the newer Packages.bz2 list [18:31:13] PROBLEM - DPKG on mw1007 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:13] PROBLEM - DPKG on mw1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:13] PROBLEM - DPKG on mw100 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:14] PROBLEM - DPKG on mw1009 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:14] PROBLEM - DPKG on mw1057 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:24] no problem editing https://en.wikiversity.org/wiki/Wikiversity:Sandbox [18:31:32] and then Alex ran the sync script manually and it fixed it, now all Apaches have upgrades to do [18:31:32] is https://ishmael.wikimedia.org/ defunct? 
[18:31:33] PROBLEM - DPKG on mw1071 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1023 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1037 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:33] PROBLEM - DPKG on mw1004 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:34] PROBLEM - DPKG on mw1042 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:34] PROBLEM - DPKG on mw1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:35] PROBLEM - DPKG on mw1086 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:35] PROBLEM - DPKG on mw1028 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:36] PROBLEM - DPKG on mw1069 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:36] PROBLEM - DPKG on mw1019 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:37] PROBLEM - DPKG on mw1020 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:37] PROBLEM - DPKG on mw1059 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:38] PROBLEM - DPKG on mw1060 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:38] PROBLEM - DPKG on mw1052 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:39] PROBLEM - DPKG on mw1063 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:42] that's all me [18:31:43] PROBLEM - DPKG on mw1038 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw109 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw1024 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw1083 is 
CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw1026 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw1093 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:43] PROBLEM - DPKG on mw1014 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:44] PROBLEM - DPKG on mw1089 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:44] PROBLEM - DPKG on mw1036 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:45] PROBLEM - DPKG on mw1062 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:45] PROBLEM - DPKG on mw1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:46] PROBLEM - DPKG on mw1054 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:46] PROBLEM - DPKG on mw1095 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:47] and Leslie is running them . so Apache itself will also be upgraded [18:31:47] PROBLEM - DPKG on mw1088 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:47] PROBLEM - DPKG on mw1015 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:53] PROBLEM - DPKG on mw1010 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:53] PROBLEM - DPKG on mw1078 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:53] PROBLEM - DPKG on mw1033 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:31:59] i love breaking everything! 
[18:32:03] PROBLEM - DPKG on mw1047 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1040 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1022 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1017 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1048 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1066 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:03] PROBLEM - DPKG on mw1099 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:04] PROBLEM - DPKG on mw1031 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:04] PROBLEM - DPKG on mw1029 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:05] PROBLEM - DPKG on mw1076 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:05] PROBLEM - DPKG on mw1079 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1013 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1008 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1097 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1055 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1011 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1084 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:13] PROBLEM - DPKG on mw1025 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:14] PROBLEM - DPKG on mw1034 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:14] PROBLEM - DPKG on mw1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:15] PROBLEM - DPKG on mw1090 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:15] PROBLEM - DPKG on mw1070 is CRITICAL: DPKG CRITICAL dpkg 
reports broken packages [18:32:16] oh noessssss,the intertubes are broke! [18:32:16] PROBLEM - DPKG on mw1043 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:16] PROBLEM - DPKG on mw1072 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:17] PROBLEM - DPKG on mw1082 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:17] PROBLEM - DPKG on mw1068 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:18] PROBLEM - DPKG on mw1045 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:18] PROBLEM - DPKG on mw1067 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:19] PROBLEM - DPKG on mw1096 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:19] PROBLEM - DPKG on mw1049 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:20] PROBLEM - DPKG on mw1080 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:20] PROBLEM - DPKG on mw1039 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:21] PROBLEM - DPKG on mw1051 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:21] PROBLEM - DPKG on mw1021 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:22] PROBLEM - DPKG on mw1027 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:23] PROBLEM - DPKG on mw1035 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:23] PROBLEM - DPKG on mw1091 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:23] PROBLEM - DPKG on mw1032 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:24] PROBLEM - DPKG on mw1041 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:24] PROBLEM - DPKG on mw1074 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:33] PROBLEM - DPKG on mw1085 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:34] PROBLEM - DPKG on mw1092 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:34] PROBLEM - DPKG on mw1056 is CRITICAL: DPKG CRITICAL dpkg reports broken 
packages [18:32:34] PROBLEM - DPKG on mw1044 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:34] PROBLEM - DPKG on mw1098 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:34] PROBLEM - DPKG on mw1058 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:34] PROBLEM - DPKG on mw1065 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:35] PROBLEM - DPKG on mw1030 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:35] PROBLEM - DPKG on mw1064 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:36] PROBLEM - DPKG on mw1094 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:36] PROBLEM - DPKG on mw1087 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:37] PROBLEM - DPKG on mw1073 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:37] PROBLEM - DPKG on mw1053 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:38] PROBLEM - DPKG on mw1018 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:38] RECOVERY - DPKG on mw10 is OK: All packages OK [18:32:39] RECOVERY - DPKG on mw103 is OK: All packages OK [18:32:41] it's fun to break them [18:32:44] RECOVERY - DPKG on mw109 is OK: All packages OK [18:32:44] PROBLEM - DPKG on mw1077 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:45] RECOVERY - DPKG on mw105 is OK: All packages OK [18:32:45] RECOVERY - DPKG on mw102 is OK: All packages OK [18:32:45] RECOVERY - DPKG on mw1015 is OK: All packages OK [18:32:45] PROBLEM - DPKG on mw1075 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:32:45] RECOVERY - DPKG on mw101 is OK: All packages OK [18:33:03] RECOVERY - DPKG on mw107 is OK: All packages OK [18:33:03] RECOVERY - DPKG on mw106 is OK: All packages OK [18:33:03] RECOVERY - DPKG on mw104 is OK: All packages OK [18:33:03] PROBLEM - DPKG on mw1081 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:33:11] MessageGroupStats is a good suspect, i don't have db access and 
ishmael looks defunct [18:33:13] RECOVERY - DPKG on mw1013 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw1008 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw1011 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw1007 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw1001 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw1002 is OK: All packages OK [18:33:13] RECOVERY - DPKG on mw100 is OK: All packages OK [18:33:14] RECOVERY - DPKG on mw1009 is OK: All packages OK [18:33:15] I can't decide if this is more annoying than the puppet out of date problems [18:33:23] RECOVERY - DPKG on mw108 is OK: All packages OK [18:33:24] ori-l: dbaccess? sql mediawikiwiki [18:33:33] RECOVERY - DPKG on mw1085 is OK: All packages OK [18:33:33] RECOVERY - DPKG on mw1012 is OK: All packages OK [18:33:33] RECOVERY - DPKG on mw1005 is OK: All packages OK [18:33:33] RECOVERY - DPKG on mw1016 is OK: All packages OK [18:33:33] RECOVERY - DPKG on mw1004 is OK: All packages OK [18:33:33] RECOVERY - DPKG on mw1003 is OK: All packages OK [18:33:43] RECOVERY - DPKG on mw1014 is OK: All packages OK [18:33:43] RECOVERY - DPKG on mw1006 is OK: All packages OK [18:33:53] RECOVERY - DPKG on mw1010 is OK: All packages OK [18:33:53] RECOVERY - DPKG on mw1078 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1047 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1022 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1040 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1017 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1048 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1066 is OK: All packages OK [18:34:03] RECOVERY - DPKG on mw1031 is OK: All packages OK [18:34:04] RECOVERY - DPKG on mw1079 is OK: All packages OK [18:34:04] RECOVERY - DPKG on mw1081 is OK: All packages OK [18:34:05] RECOVERY - DPKG on mw1076 is OK: All packages OK [18:34:05] RECOVERY - DPKG on mw1029 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1097 is OK: 
All packages OK [18:34:13] RECOVERY - DPKG on mw1055 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1025 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1070 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1084 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1034 is OK: All packages OK [18:34:13] RECOVERY - DPKG on mw1072 is OK: All packages OK [18:34:14] RECOVERY - DPKG on mw1043 is OK: All packages OK [18:34:14] RECOVERY - DPKG on mw1090 is OK: All packages OK [18:34:15] RECOVERY - DPKG on mw1068 is OK: All packages OK [18:34:15] RECOVERY - DPKG on mw1045 is OK: All packages OK [18:34:16] RECOVERY - DPKG on mw1067 is OK: All packages OK [18:34:16] RECOVERY - DPKG on mw1082 is OK: All packages OK [18:34:17] RECOVERY - DPKG on mw1049 is OK: All packages OK [18:34:17] RECOVERY - DPKG on mw1080 is OK: All packages OK [18:34:18] RECOVERY - DPKG on mw1039 is OK: All packages OK [18:34:18] RECOVERY - DPKG on mw1051 is OK: All packages OK [18:34:19] RECOVERY - DPKG on mw1057 is OK: All packages OK [18:34:19] RECOVERY - DPKG on mw1027 is OK: All packages OK [18:34:20] RECOVERY - DPKG on mw1021 is OK: All packages OK [18:34:23] RECOVERY - DPKG on mw1091 is OK: All packages OK [18:34:23] RECOVERY - DPKG on mw1035 is OK: All packages OK [18:34:23] RECOVERY - DPKG on mw1032 is OK: All packages OK [18:34:23] RECOVERY - DPKG on mw1041 is OK: All packages OK [18:34:23] RECOVERY - DPKG on mw1074 is OK: All packages OK [18:34:31] ii tzdata 2013d-0ubuntu0.12.04 [18:34:33] RECOVERY - DPKG on mw1056 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1023 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1092 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1071 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1044 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1058 is OK: All packages OK [18:34:33] RECOVERY - DPKG on mw1098 is OK: All packages OK [18:34:34] RECOVERY - DPKG on mw1037 is OK: All packages OK [18:34:34] RECOVERY - 
DPKG on mw1065 is OK: All packages OK [18:34:35] RECOVERY - DPKG on mw1030 is OK: All packages OK [18:34:35] RECOVERY - DPKG on mw1042 is OK: All packages OK [18:34:36] RECOVERY - DPKG on mw1053 is OK: All packages OK [18:34:36] RECOVERY - DPKG on mw1073 is OK: All packages OK [18:34:37] RECOVERY - DPKG on mw1018 is OK: All packages OK [18:34:37] RECOVERY - DPKG on mw1094 is OK: All packages OK [18:34:38] RECOVERY - DPKG on mw1087 is OK: All packages OK [18:34:38] RECOVERY - DPKG on mw1064 is OK: All packages OK [18:34:39] RECOVERY - DPKG on mw1086 is OK: All packages OK [18:34:39] RECOVERY - DPKG on mw1028 is OK: All packages OK [18:34:40] RECOVERY - DPKG on mw1069 is OK: All packages OK [18:34:40] RECOVERY - DPKG on mw1020 is OK: All packages OK [18:34:41] RECOVERY - DPKG on mw1059 is OK: All packages OK [18:34:41] RECOVERY - DPKG on mw1060 is OK: All packages OK [18:34:42] RECOVERY - DPKG on mw1019 is OK: All packages OK [18:34:42] RECOVERY - DPKG on mw1052 is OK: All packages OK [18:34:43] RECOVERY - DPKG on mw1063 is OK: All packages OK [18:34:43] RECOVERY - DPKG on mw1038 is OK: All packages OK [18:34:44] RECOVERY - DPKG on mw1024 is OK: All packages OK [18:34:44] RECOVERY - DPKG on mw1026 is OK: All packages OK [18:34:45] RECOVERY - DPKG on mw1036 is OK: All packages OK [18:34:45] RECOVERY - DPKG on mw1077 is OK: All packages OK [18:34:46] RECOVERY - DPKG on mw1083 is OK: All packages OK [18:34:46] RECOVERY - DPKG on mw1062 is OK: All packages OK [18:34:47] RECOVERY - DPKG on mw1089 is OK: All packages OK [18:34:47] RECOVERY - DPKG on mw1093 is OK: All packages OK [18:34:48] RECOVERY - DPKG on mw1054 is OK: All packages OK [18:34:48] RECOVERY - DPKG on mw1095 is OK: All packages OK [18:34:49] RECOVERY - DPKG on mw1075 is OK: All packages OK [18:34:49] RECOVERY - DPKG on mw1088 is OK: All packages OK [18:34:53] RECOVERY - DPKG on mw1033 is OK: All packages OK [18:35:03] RECOVERY - DPKG on mw1099 is OK: All packages OK [18:35:13] RECOVERY - DPKG on mw1096 is 
OK: All packages OK [18:39:30] !log repooling ssl3003 [18:39:33] Logged the message, Master [18:40:36] hm ori-l [18:40:42] it says [18:40:45] Can Merge [18:40:46] No [18:40:51] ohh dep [18:40:52] sorry [18:40:52] see it [18:41:05] (03PS2) 10Ottomata: Ganglia backend for StatsD: Emit metadata every send_metadata_interval [operations/puppet] - 10https://gerrit.wikimedia.org/r/83396 (owner: 10Ori.livneh) [18:41:13] PROBLEM - DPKG on mw1104 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:14] PROBLEM - DPKG on mw1105 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:28] (03CR) 10Ottomata: [C: 032 V: 032] Ganglia backend for StatsD: Emit metadata every send_metadata_interval [operations/puppet] - 10https://gerrit.wikimedia.org/r/83396 (owner: 10Ori.livneh) [18:41:33] PROBLEM - DPKG on mw1107 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:33] PROBLEM - DPKG on mw1102 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:33] PROBLEM - DPKG on mw1106 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:34] PROBLEM - DPKG on mw1100 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:38] (03PS3) 10Ottomata: NavigationTiming StatsD instance: flush every 5 mins [operations/puppet] - 10https://gerrit.wikimedia.org/r/83230 (owner: 10Ori.livneh) [18:41:43] PROBLEM - DPKG on mw1101 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:43] PROBLEM - DPKG on mw1103 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:43] PROBLEM - DPKG on mw1108 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:42:03] PROBLEM - DPKG on mw1109 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:42:09] (03CR) 10Ottomata: [C: 032 V: 032] NavigationTiming StatsD instance: flush every 5 mins [operations/puppet] - 10https://gerrit.wikimedia.org/r/83230 (owner: 10Ori.livneh) [18:43:23] PROBLEM - Apache HTTP on mw1107 is CRITICAL: Connection refused [18:43:33] PROBLEM - Apache HTTP on mw1103 is 
CRITICAL: Connection refused [18:43:33] RECOVERY - DPKG on mw1107 is OK: All packages OK [18:43:33] PROBLEM - Apache HTTP on mw1108 is CRITICAL: Connection refused [18:43:43] RECOVERY - DPKG on mw1101 is OK: All packages OK [18:43:43] RECOVERY - DPKG on mw1103 is OK: All packages OK [18:43:43] PROBLEM - Apache HTTP on mw1105 is CRITICAL: Connection refused [18:43:43] PROBLEM - Apache HTTP on mw1100 is CRITICAL: Connection refused [18:43:44] RECOVERY - DPKG on mw1108 is OK: All packages OK [18:44:03] RECOVERY - DPKG on mw1109 is OK: All packages OK [18:44:03] PROBLEM - Apache HTTP on mw1104 is CRITICAL: Connection refused [18:44:13] just kill that bot, christ [18:44:13] RECOVERY - DPKG on mw1104 is OK: All packages OK [18:44:13] PROBLEM - Apache HTTP on mw1101 is CRITICAL: Connection refused [18:44:14] PROBLEM - Apache HTTP on mw1109 is CRITICAL: Connection refused [18:44:14] RECOVERY - DPKG on mw1105 is OK: All packages OK [18:44:21] ori-l: icinga? [18:44:23] PROBLEM - Apache HTTP on mw1102 is CRITICAL: Connection refused [18:44:23] ori-l: ignore it [18:44:35] RECOVERY - DPKG on mw1102 is OK: All packages OK [18:44:35] RECOVERY - DPKG on mw1100 is OK: All packages OK [18:44:35] RECOVERY - DPKG on mw1106 is OK: All packages OK [18:45:31] Reedy, ori-l - did we track down the issue? 
can't repro it on mediawiki.org now [18:45:48] ori was suggesting translate related db issues [18:46:03] RECOVERY - DPKG on cp1061 is OK: All packages OK [18:46:23] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.315 second response time [18:46:23] RECOVERY - Apache HTTP on mw1107 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.067 second response time [18:46:24] that was me from not forcing the restart [18:46:33] RECOVERY - Apache HTTP on mw1103 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.088 second response time [18:46:33] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.041 second response time [18:46:34] yay salt for making it easy to fix (and break) [18:46:43] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [18:46:43] RECOVERY - Apache HTTP on mw1100 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.093 second response time [18:46:56] i don't see any more db lock errors on fluorine [18:47:03] RECOVERY - Apache HTTP on mw1104 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.074 second response time [18:47:13] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [18:47:13] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.097 second response time [18:47:23] it may have resolved itself, but it's a little unsatisfying to leave it at that [18:49:48] MongoDB [18:51:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Everything else that isn't wikipedia, wikivoyage or wikidatawiki to 1.22wmf16 [18:51:35] Logged the message, Master [18:52:50] (03PS1) 10Reedy: Everything that isn't wikipedia, wikivoyage or wikidatawiki to 1.22wmf16 [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/83434 [18:53:05] ori-l, is it worth opening a bug about the MessageGroupStats related lock timeouts or is it too sporadic at this point? [18:53:17] (03CR) 10Reedy: [C: 032] Everything that isn't wikipedia, wikivoyage or wikidatawiki to 1.22wmf16 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83434 (owner: 10Reedy) [18:53:26] There might already be one [18:53:31] There's a few various ones like that [18:54:03] there is https://bugzilla.wikimedia.org/show_bug.cgi?id=51410 [18:54:29] (03Merged) 10jenkins-bot: Everything that isn't wikipedia, wikivoyage or wikidatawiki to 1.22wmf16 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83434 (owner: 10Reedy) [18:55:10] Mon Sep 9 18:24:16 UTC 2013 mw1194 mediawikiwiki MessageGroupStats::clear 10.64.16.8 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.8) DELETE FROM `translate_groupstats` WHERE tgs_group = 'page-Communication' AND tgs_lang = 'hu' [18:55:10] Mon Sep 9 18:24:56 UTC 2013 mw1196 mediawikiwiki MessageGroupStats::clear 10.64.16.8 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.8) DELETE FROM `translate_groupstats` WHERE tgs_group = 'page-Communication' AND tgs_lang = 'hu' [18:55:10] Mon Sep 9 18:25:11 UTC 2013 mw1189 mediawikiwiki MessageGroupStats::clear 10.64.16.8 1205 Lock wait timeout exceeded; try restarting transaction (10.64.16.8) DELETE FROM `translate_groupstats` WHERE tgs_group = 'page-Communication' AND tgs_lang = 'hu' [18:55:20] There's also: [18:55:20] Mon Sep 9 18:51:24 UTC 2013 snapshot1004 enwikiquote Error connecting to 10.64.16.40: Access denied for user 'wikiadmin'@'10.64.16.142' (using password: YES) [18:56:12] !log reedy synchronized php-1.22wmf15 [18:56:15] Logged the message, Master [18:57:33] !log reedy synchronized php-1.22wmf16 [18:57:36] Logged the message, Master [18:58:00] csteipp: I'm done now... 
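The dberror lines pasted at 18:55 fall into two visible shapes: `1205 Lock wait timeout exceeded` from MessageGroupStats::clear, and `Error connecting to <host>: Access denied`. A rough way to see how much of the log each shape accounts for, and which DB hosts refuse connections, is a small tally script. This is only a sketch: the two patterns are inferred from the pasted lines alone, the function names are made up for illustration, and the real log may well contain other line formats (they are counted as "other").

```python
import re
from collections import Counter

# Patterns inferred only from the lines pasted in the channel; this is an
# assumption about the log format, not a description of the real logger.
PATTERNS = [
    ("lock_wait_timeout", re.compile(r"\b1205 Lock wait timeout exceeded")),
    ("access_denied", re.compile(r"Error connecting to (?P<host>\S+): Access denied")),
]


def classify(line):
    """Return (category, db_host_or_None) for one dberror log line."""
    for name, pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            # Only the access-denied pattern defines a "host" group.
            return name, match.groupdict().get("host")
    return "other", None


def tally(lines):
    """Count lines per category and access-denied errors per target host."""
    categories = Counter()
    denied_hosts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        category, host = classify(line)
        categories[category] += 1
        if host is not None:
            denied_hosts[host] += 1
    return categories, denied_hosts
```

Feeding it an open log file (for example `tally(open(path))`, where `path` is wherever the dberror log lives on the log host) returns two counters; dividing `categories["access_denied"]` by the total gives the share of access-denied lines, and `denied_hosts` shows whether the failures cluster on a few hosts.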
[18:58:01] the access denieds are rampant in the archives, don't think it's related [18:58:20] Reedy: Thanks! I'll push in a few minutes [18:58:38] ori-l: Sounds like something relatively simple to fix though... [18:59:41] there are also a couple of LinksUpdate::incrTableUpdate locks from mediawikiwiki in the log that precede the MessageGroupStats stuff by about 10 mins [19:00:51] Eloquence: i don't think i have a good enough grip on what happened to write a useful bug report [19:01:12] your update to 51410 looks good. [19:02:53] paravoid: got a second to help me debug something? [19:05:26] ori-l, kk, added a small note to the existing report about deadlock issues [19:10:14] Mon Sep 9 18:53:48 UTC 2013 snapshot1004 elwiki Error connecting to 10.64.16.40: Access denied for user 'wikiadmin'@'10.64.16.142' (using password: YES) [19:10:22] springle: ^ Can we fix these up? [19:10:50] Reedy: looking [19:12:13] Looks to be only 10.64.16.40-42 atm [19:12:57] Ouch [19:13:12] 1106/1189 lines of the dberror log are access denied [19:14:27] Seems to be only those 3 hosts [19:14:44] Oh [19:14:48] bbiab [19:14:50] It's externalstorage... [19:15:30] I suspect it should probably just be connecting as wikiuser [19:17:13] csteipp: PHP Fatal error: Call to a member function getAuthToken() on a non-object in /usr/local/apache/common-local/php-1.22wmf16/extensions/CentralAuth/CentralAuthHooks.php on line 653 [19:27:25] wikiadmin grants are different between es hosts [19:50:15] (03PS1) 10BBlack: needs to detect close rucing RECVWAIT state as well... [operations/software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/83542 [19:50:38] ori-l: what's up? [19:52:45] (03PS1) 10BBlack: 0.0.10 release stuff [operations/software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/83543 [19:56:56] (03CR) 10BBlack: [C: 032 V: 032] needs to detect close rucing RECVWAIT state as well... 
[operations/software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/83542 (owner: 10BBlack) [19:57:14] (03CR) 10BBlack: [C: 032 V: 032] 0.0.10 release stuff [operations/software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/83543 (owner: 10BBlack) [20:01:29] !log upgrading mw111* [20:01:33] Logged the message, Mistress of the network gear. [20:02:56] (03PS1) 10BBlack: Merge branch 'master' into debian [operations/software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/83545 [20:02:57] (03PS1) 10BBlack: bump pkg version [operations/software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/83546 [20:03:12] (03CR) 10BBlack: [C: 032 V: 032] Merge branch 'master' into debian [operations/software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/83545 (owner: 10BBlack) [20:03:32] hexmode: i suppose you are the one with the Galaga highscore ?:) [20:04:05] paravoid: if i set statsd to flush to ganglia every minute, everything is fine; if i set it to flush every five minutes, the host disappears from ganglia [20:04:19] as is typical of ganglia, there are about a dozen configuration values that are ostensibly related to data expiry [20:04:43] PROBLEM - DPKG on mw111 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:04:49] !log fixed broken wikiadmin grants on es100[2-4] [20:04:52] Logged the message, Master [20:04:58] otto merged my change earlier to have metric metadata reported every 60 seconds, but that didn't seem to fix it [20:04:59] Yay [20:05:03] PROBLEM - DPKG on mw1110 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:13] PROBLEM - DPKG on mw1118 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:33] PROBLEM - DPKG on mw1119 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:33] PROBLEM - DPKG on mw1113 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:33] PROBLEM - DPKG on mw1112 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:33] PROBLEM - DPKG on 
mw1116 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:39] (03CR) 10BBlack: [C: 032 V: 032] bump pkg version [operations/software/varnish/vhtcpd] (debian) - 10https://gerrit.wikimedia.org/r/83546 (owner: 10BBlack) [20:05:43] PROBLEM - DPKG on mw1115 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:43] PROBLEM - DPKG on mw1111 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:43] PROBLEM - DPKG on mw1117 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:05:48] dmax is 0 (=never expire); tmax is flushInterval (=5 minutes, but tmax is ornamental anyhow) [20:05:53] PROBLEM - DPKG on mw1114 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:06:32] https://gerrit.wikimedia.org/r/#/c/83201/ would help [20:06:34] PROBLEM - Apache HTTP on mw111 is CRITICAL: Connection refused [20:06:43] RECOVERY - DPKG on mw111 is OK: All packages OK [20:06:53] PROBLEM - Apache HTTP on mw1110 is CRITICAL: Connection refused [20:06:53] PROBLEM - Apache HTTP on mw1115 is CRITICAL: Connection refused [20:07:13] RECOVERY - DPKG on mw1118 is OK: All packages OK [20:07:13] PROBLEM - Apache HTTP on mw1113 is CRITICAL: Connection refused [20:07:23] PROBLEM - Apache HTTP on mw1118 is CRITICAL: Connection refused [20:07:32] LeslieCarr: are you doing upgrades again? 
[20:07:33] RECOVERY - DPKG on mw1119 is OK: All packages OK [20:07:33] RECOVERY - DPKG on mw1113 is OK: All packages OK [20:07:33] PROBLEM - Apache HTTP on mw1111 is CRITICAL: Connection refused [20:07:33] RECOVERY - DPKG on mw1112 is OK: All packages OK [20:07:34] RECOVERY - DPKG on mw1116 is OK: All packages OK [20:07:45] yep [20:07:52] ok [20:07:54] RECOVERY - DPKG on mw1114 is OK: All packages OK [20:08:04] RECOVERY - DPKG on mw1110 is OK: All packages OK [20:08:14] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.088 second response time [20:08:16] !log upgrading mw112* [20:08:19] Logged the message, Mistress of the network gear. [20:08:24] RECOVERY - DPKG on mw1117 is OK: All packages OK [20:08:24] RECOVERY - Apache HTTP on mw1118 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.067 second response time [20:08:34] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.085 second response time [20:08:34] RECOVERY - Apache HTTP on mw111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.245 second response time [20:08:54] RECOVERY - DPKG on mw1115 is OK: All packages OK [20:08:54] RECOVERY - DPKG on mw1111 is OK: All packages OK [20:08:54] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.100 second response time [20:08:54] RECOVERY - Apache HTTP on mw1115 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.095 second response time [20:09:57] whaaaaa. it showed up. [20:11:14] PROBLEM - DPKG on mw112 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:11:25] paravoid: never mind. appears to work now. 
[20:11:35] PROBLEM - DPKG on mw1122 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:11:35] PROBLEM - DPKG on mw1125 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:11:35] PROBLEM - DPKG on mw1128 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:11:54] PROBLEM - DPKG on mw1129 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:05] PROBLEM - DPKG on mw1123 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:05] PROBLEM - DPKG on mw1127 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:06] PROBLEM - DPKG on mw1126 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:14] PROBLEM - DPKG on mw1120 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:15] PROBLEM - DPKG on mw1121 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:12:15] PROBLEM - DPKG on mw1124 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:13:14] RECOVERY - DPKG on mw112 is OK: All packages OK [20:14:04] RECOVERY - DPKG on mw1123 is OK: All packages OK [20:14:04] RECOVERY - DPKG on mw1126 is OK: All packages OK [20:14:04] RECOVERY - DPKG on mw1127 is OK: All packages OK [20:14:14] RECOVERY - DPKG on mw1120 is OK: All packages OK [20:14:15] RECOVERY - DPKG on mw1121 is OK: All packages OK [20:14:15] RECOVERY - DPKG on mw1124 is OK: All packages OK [20:14:35] RECOVERY - DPKG on mw1128 is OK: All packages OK [20:14:35] RECOVERY - DPKG on mw1125 is OK: All packages OK [20:14:35] RECOVERY - DPKG on mw1122 is OK: All packages OK [20:14:54] RECOVERY - DPKG on mw1129 is OK: All packages OK [20:16:43] (03Abandoned) 10Ori.livneh: Log more Navigation Timing events on mobile [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83388 (owner: 10Ori.livneh) [20:18:58] !log bumped vhtcpd to 0.0.10 on brewster (conn close bugfix) [20:19:01] Logged the message, Master [20:19:03] !log upgrading mw113* servers [20:19:06] Logged the message, Mistress of the network gear. 
[20:21:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:21:34] PROBLEM - DPKG on mw113 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:21:43] !log upgrading gdnsd to 1.10 [20:21:47] Logged the message, Master [20:22:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.665 second response time [20:22:34] PROBLEM - DPKG on mw1136 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:22:41] hey akosiaris, you around? [20:22:43] or paravoid? [20:22:52] I am, but in the middle of something [20:22:52] ottomata: yes [20:22:54] PROBLEM - DPKG on mw1137 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:22:59] qchris and I are having a really weird ubuntu/debian .so issue with the libsnappy-java package [20:23:04] PROBLEM - DPKG on mw1131 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:04] PROBLEM - DPKG on mw1138 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:04] PROBLEM - DPKG on mw1132 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:05] it seems like it is missing a .so [20:23:07] but we aren't sure [20:23:14] PROBLEM - DPKG on mw1133 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:14] PROBLEM - DPKG on mw1139 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:18] akosiaris: got a sec to join our hangout and discuss with us? [20:23:24] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [20:23:34] PROBLEM - DPKG on mw1135 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:34] PROBLEM - DPKG on mw1134 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:23:34] RECOVERY - DPKG on mw113 is OK: All packages OK [20:24:16] I am at the office, but I doubt I can find an empty meet room. 
Lemme check a sec [20:25:04] RECOVERY - DPKG on mw1131 is OK: All packages OK [20:25:04] RECOVERY - DPKG on mw1138 is OK: All packages OK [20:25:04] RECOVERY - DPKG on mw1132 is OK: All packages OK [20:25:14] RECOVERY - DPKG on mw1139 is OK: All packages OK [20:25:14] RECOVERY - DPKG on mw1133 is OK: All packages OK [20:25:24] akosiaris: Stationary cupboard [20:25:34] RECOVERY - DPKG on mw1134 is OK: All packages OK [20:25:34] RECOVERY - DPKG on mw1135 is OK: All packages OK [20:25:34] RECOVERY - DPKG on mw1136 is OK: All packages OK [20:25:49] !log upgrading mw114* [20:25:52] Logged the message, Mistress of the network gear. [20:25:54] RECOVERY - DPKG on mw1137 is OK: All packages OK [20:26:02] ahh, ha, go sit on the hammock! nobody minds :p [20:26:14] i'm on the hammock! [20:26:19] trying to steal my hammock spot ? [20:26:23] ottomata: where are you actually ? [20:27:32] gotta .. beat .. the .. Galaga ..hiscore [20:27:47] ottomata: checked. all taken. Wanna fallback to IRC ? [20:28:04] or i can join and not speak :P [20:28:29] !log mwalker synchronized php-1.22wmf15/extensions/CentralNotice 'Updating CentralNotice for custom banner classes' [20:28:32] Logged the message, Master [20:28:38] LeslieCarr: still in brooklyn, I arrive tomorrow [20:28:44] PROBLEM - DPKG on mw114 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:28:48] haha, ok uyeah join and not speak i guess [20:28:51] will be easier to describe [20:28:58] https://plus.google.com/hangouts/_/2da993a9acec7936399e9d78d13bf7ec0c0afdbc [20:29:03] !log mwalker synchronized php-1.22wmf16/extensions/CentralNotice 'Updating CentralNotice for custom banner classes' [20:29:06] Logged the message, Master [20:29:14] PROBLEM - DPKG on mw1148 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:29:34] PROBLEM - DPKG on mw1147 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:29:49] woot, gonna make the channel spammy again [20:29:55] PROBLEM - DPKG on mw1140 is CRITICAL: DPKG CRITICAL dpkg 
reports broken packages [20:29:56] PROBLEM - DPKG on mw1142 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:04] PROBLEM - DPKG on mw1144 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:14] PROBLEM - DPKG on mw1143 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:14] PROBLEM - DPKG on mw1141 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:14] PROBLEM - DPKG on mw1146 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:35] PROBLEM - DPKG on mw1145 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:30:44] RECOVERY - DPKG on mw114 is OK: All packages OK [20:30:45] !icinga magic --fix [20:30:49] LeslieCarr: It shows everyone you're working hard! [20:31:14] RECOVERY - DPKG on mw1143 is OK: All packages OK [20:31:35] RECOVERY - DPKG on mw1147 is OK: All packages OK [20:31:35] RECOVERY - DPKG on mw1145 is OK: All packages OK [20:31:54] RECOVERY - DPKG on mw1140 is OK: All packages OK [20:31:54] RECOVERY - DPKG on mw1142 is OK: All packages OK [20:32:04] RECOVERY - DPKG on mw1144 is OK: All packages OK [20:32:14] RECOVERY - DPKG on mw1148 is OK: All packages OK [20:32:14] RECOVERY - DPKG on mw1141 is OK: All packages OK [20:32:14] RECOVERY - DPKG on mw1146 is OK: All packages OK [20:33:50] !log upgrading mw115* [20:33:52] Logged the message, Mistress of the network gear. 
[20:36:01] (03PS1) 10CSteipp: Don't use Old AutoLogin when Silent is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83555 [20:36:18] (03PS1) 10Andrew Bogott: Explicitly handle projectgid for production [operations/puppet] - 10https://gerrit.wikimedia.org/r/83556 [20:36:24] PROBLEM - DPKG on mw115 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:36:27] (03PS9) 10Hashar: contint: publish Zuul git over git protocol [operations/puppet] - 10https://gerrit.wikimedia.org/r/82625 [20:37:34] PROBLEM - DPKG on mw1150 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:37:54] PROBLEM - DPKG on mw1152 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:38:24] (03PS2) 10Hashar: Enable Flow on labs [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82766 (owner: 10Matthias Mullie) [20:38:24] PROBLEM - DPKG on mw1151 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:38:24] RECOVERY - DPKG on mw115 is OK: All packages OK [20:39:23] (03CR) 10Hashar: "That is for Bug 53061 - "support Flow on beta cluster", I have updated the commit message to reference the bug report." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82766 (owner: 10Matthias Mullie) [20:39:54] RECOVERY - DPKG on mw1152 is OK: All packages OK [20:40:26] RECOVERY - DPKG on mw1151 is OK: All packages OK [20:40:29] !log upgrading mw116* [20:40:32] Logged the message, Mistress of the network gear. [20:40:36] RECOVERY - DPKG on mw1150 is OK: All packages OK [20:41:30] paravoid: wanna review yet another contint change ? 
https://gerrit.wikimedia.org/r/82625 :-D [20:41:48] (03CR) 10Andrew Bogott: [C: 032] Explicitly handle projectgid for production [operations/puppet] - 10https://gerrit.wikimedia.org/r/83556 (owner: 10Andrew Bogott) [20:42:26] PROBLEM - DPKG on mw116 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:43:26] PROBLEM - DPKG on mw1164 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:43:26] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:40] PROBLEM - DPKG on mw1162 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:43:40] PROBLEM - DPKG on mw1167 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:43:41] !log upgrading asw-ulsfo [20:43:44] Logged the message, Mistress of the network gear. [20:43:46] upgrading everything! [20:43:46] PROBLEM - DPKG on mw1161 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:44:18] PROBLEM - DPKG on mw1168 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:44:18] PROBLEM - DPKG on mw1163 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:44:18] PROBLEM - DPKG on mw1166 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:44:18] PROBLEM - DPKG on mw1165 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:44:26] PROBLEM - DPKG on mw1169 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:45:19] (03Abandoned) 10Cmjohnson: Adding Siebrand to admins and site for stat1 access RT5726 [operations/puppet] - 10https://gerrit.wikimedia.org/r/83110 (owner: 10Cmjohnson) [20:45:26] RECOVERY - DPKG on mw116 is OK: All packages OK [20:45:36] RECOVERY - DPKG on mw1162 is OK: All packages OK [20:45:36] RECOVERY - DPKG on mw1167 is OK: All packages OK [20:45:43] !log upgrading mw117* [20:45:46] Logged the message, Mistress of the network gear. 
[20:45:47] RECOVERY - DPKG on mw1161 is OK: All packages OK [20:46:16] RECOVERY - DPKG on mw1168 is OK: All packages OK [20:46:16] RECOVERY - DPKG on mw1163 is OK: All packages OK [20:46:17] RECOVERY - DPKG on mw1166 is OK: All packages OK [20:46:17] RECOVERY - DPKG on mw1165 is OK: All packages OK [20:46:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [20:46:26] RECOVERY - DPKG on mw1169 is OK: All packages OK [20:46:26] RECOVERY - DPKG on mw1164 is OK: All packages OK [20:49:06] PROBLEM - DPKG on mw1174 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:16] PROBLEM - DPKG on mw1176 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:16] PROBLEM - DPKG on mw1170 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:16] PROBLEM - DPKG on mw1177 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:26] PROBLEM - DPKG on mw1175 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:26] PROBLEM - DPKG on mw1171 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:26] PROBLEM - DPKG on mw117 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:46] PROBLEM - DPKG on mw1172 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:58] PROBLEM - DPKG on mw1178 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:49:58] PROBLEM - DPKG on mw1179 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:50:20] ottomata: how can i enable snappy compression in kafka ? [20:50:28] RECOVERY - DPKG on mw1171 is OK: All packages OK [20:50:28] RECOVERY - DPKG on mw117 is OK: All packages OK [20:50:39] I am trying to reproduce the issue. 
[20:51:08] RECOVERY - DPKG on mw1174 is OK: All packages OK [20:51:18] RECOVERY - DPKG on mw1176 is OK: All packages OK [20:51:18] RECOVERY - DPKG on mw1170 is OK: All packages OK [20:51:18] PROBLEM - Apache HTTP on mw1179 is CRITICAL: Connection refused [20:51:18] PROBLEM - Apache HTTP on mw1174 is CRITICAL: Connection refused [20:51:18] PROBLEM - Apache HTTP on mw1171 is CRITICAL: Connection refused [20:51:19] RECOVERY - DPKG on mw1177 is OK: All packages OK [20:51:28] PROBLEM - Apache HTTP on mw1176 is CRITICAL: Connection refused [20:51:29] RECOVERY - DPKG on mw1175 is OK: All packages OK [20:51:29] PROBLEM - Apache HTTP on mw1177 is CRITICAL: Connection refused [20:51:30] PROBLEM - Apache HTTP on mw1170 is CRITICAL: Connection refused [20:51:30] PROBLEM - Apache HTTP on mw1175 is CRITICAL: Connection refused [20:51:38] PROBLEM - Apache HTTP on mw1172 is CRITICAL: Connection refused [20:51:38] PROBLEM - Apache HTTP on mw1178 is CRITICAL: Connection refused [20:51:38] RECOVERY - DPKG on mw1172 is OK: All packages OK [20:51:48] that was me, all fixed [20:51:58] RECOVERY - DPKG on mw1178 is OK: All packages OK [20:51:58] RECOVERY - DPKG on mw1179 is OK: All packages OK [20:52:20] RECOVERY - Apache HTTP on mw1179 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.064 second response time [20:52:21] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [20:52:21] RECOVERY - Apache HTTP on mw1171 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.065 second response time [20:52:23] !log upgrading mw118* [20:52:26] Logged the message, Mistress of the network gear. 
[20:52:28] RECOVERY - Apache HTTP on mw1176 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [20:52:28] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.054 second response time [20:52:29] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [20:52:29] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [20:52:29] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [20:52:29] RECOVERY - Apache HTTP on mw1178 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [20:53:04] akosiaris: log into kraken-kafka.pmtpa.wmflabs [20:53:09] oo, you might have to be on the analytics labs project if you aren't alreayd [20:53:21] checking [20:53:22] i think i am [20:53:33] oh, you are [20:53:33] good [20:53:34] yeah [20:53:52] so i've got varnishkafka running there producing with snappy [20:54:38] PROBLEM - DPKG on mw118 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:54:49] onesec [20:55:18] PROBLEM - DPKG on mw1186 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:18] PROBLEM - DPKG on mw1187 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:18] PROBLEM - DPKG on mw1184 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:18] PROBLEM - DPKG on mw1183 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:28] PROBLEM - DPKG on mw1185 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:28] PROBLEM - DPKG on mw1181 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:58] PROBLEM - DPKG on mw1182 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:55:58] PROBLEM - DPKG on mw1189 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:56:18] PROBLEM - DPKG on mw1188 is 
CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:56:19] PROBLEM - DPKG on mw1180 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [20:56:38] RECOVERY - DPKG on mw118 is OK: All packages OK [20:57:18] RECOVERY - DPKG on mw1186 is OK: All packages OK [20:57:18] RECOVERY - DPKG on mw1188 is OK: All packages OK [20:57:18] RECOVERY - DPKG on mw1187 is OK: All packages OK [20:57:18] RECOVERY - DPKG on mw1184 is OK: All packages OK [20:57:18] RECOVERY - DPKG on mw1183 is OK: All packages OK [20:57:29] RECOVERY - DPKG on mw1185 is OK: All packages OK [20:57:29] RECOVERY - DPKG on mw1181 is OK: All packages OK [20:57:58] RECOVERY - DPKG on mw1182 is OK: All packages OK [20:58:20] RECOVERY - DPKG on mw1180 is OK: All packages OK [20:58:27] sorry, akosiaris, phone [20:58:28] so yeah [20:58:30] ummmm [20:58:41] its kinda annoying to do because there are two brokers and new leaders get elected every time kafka is restarted [20:58:42] but [20:58:48] akosiaris: any idea where our PHP debian package sources are hosted at? I can't find anything in Gerrit, so I guess it is still in SVN or in some local repos on some random server. Any clue? [20:58:50] you can see the exception by tailing the kafka log [20:58:56] /var/log/kafka/kafka.log [20:58:58] RECOVERY - DPKG on mw1189 is OK: All packages OK [20:59:08] hm, i'll gist the stepts to restart and produce snappy errors [20:59:24] hashar: apt.wikimedia.org. Have an deb-src line in sources and do apt-get source php5 [20:59:41] we don't have code of our own so we don't have a repo [20:59:53] ah that is why [20:59:59] (03PS1) 10Cmjohnson: adding siebrand to stat1 rt5726 [operations/puppet] - 10https://gerrit.wikimedia.org/r/83560 [21:00:17] i was thinking about the package you gave me for some segfault on gallium and was wondering how you did change the source. 
[21:00:42] apt-get source, fixes to debian/patches and then buildpackage [21:01:00] but i have not uploaded those in apt.wikimedia.org [21:01:11] akosiaris: https://gist.github.com/ottomata/6501464 [21:01:20] akosiaris: ok :) [21:01:27] we encountered another error from what i remember so I postponed it [21:01:54] anytime you restart kafka server you'll have to do the preferred replica election thing [21:02:01] just because the leader will failover to the other broker [21:02:05] ottomata: thanx. Trying to reproduce now [21:02:08] k [21:02:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:03:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.131 second response time [21:03:34] (03PS2) 10Cmjohnson: adding siebrand to stat1 rt5726 [operations/puppet] - 10https://gerrit.wikimedia.org/r/83560 [21:05:40] (03CR) 10Cmjohnson: [C: 032] adding siebrand to stat1 rt5726 [operations/puppet] - 10https://gerrit.wikimedia.org/r/83560 (owner: 10Cmjohnson) [21:08:02] (03PS1) 10BBlack: fix shell-style comment on analytics1007 [operations/dns] - 10https://gerrit.wikimedia.org/r/83563 [21:08:03] (03CR) 10RobH: [C: 032] "this couldn't possibly go wrong......" [operations/puppet] - 10https://gerrit.wikimedia.org/r/82879 (owner: 10RobH) [21:08:11] (03PS8) 10RobH: RT#5011 bugzilla to use own certificate [operations/puppet] - 10https://gerrit.wikimedia.org/r/82879 [21:11:24] (03CR) 10Nikerabbit: [C: 032] ULS event logging for wiktionaries [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83418 (owner: 10Nikerabbit) [21:11:36] (03Merged) 10jenkins-bot: ULS event logging for wiktionaries [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83418 (owner: 10Nikerabbit) [21:11:50] mwalker: you're done right? 
[21:12:03] Nikerabbit: yep yep -- all yours [21:14:49] (03PS1) 10BBlack: uncomment eqiad osm IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/83564 [21:15:31] ottomata: failing to reproduce... [21:15:45] (03CR) 10Faidon Liambotis: [C: 032] fix shell-style comment on analytics1007 [operations/dns] - 10https://gerrit.wikimedia.org/r/83563 (owner: 10BBlack) [21:16:14] (03CR) 10BBlack: [C: 032] fix shell-style comment on analytics1007 [operations/dns] - 10https://gerrit.wikimedia.org/r/83563 (owner: 10BBlack) [21:17:22] akosiaris: yup I am hit with another PHP backtrace. I got to git bisect it :( [21:17:31] (03CR) 10Faidon Liambotis: [C: 032] uncomment eqiad osm IPs [operations/dns] - 10https://gerrit.wikimedia.org/r/83564 (owner: 10BBlack) [21:17:52] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'ULS eventlogging' [21:17:56] Logged the message, Master [21:18:24] !log nikerabbit synchronized wmf-config/CommonSettings.php 'ULS eventlogging' [21:18:27] Logged the message, Master [21:19:01] hm [21:19:02] !log gallium upgrading PHP packages. [21:19:03] akosiaris: [21:19:05] Logged the message, Master [21:20:06] bwerrrrrrrr [21:20:16] dunno, let's restart the kafka server again [21:21:35] can I kill yours and try real quick? [21:21:36] akosiaris: ? [21:21:38] ottomata: I have a while loop doing the curl you pointed every 1 sec from a VM in labs [21:21:42] cool that's fine [21:21:44] someone been making exim changes ? [21:21:44] ottomata: sure [21:21:46] (that's what I do too) [21:21:52] !log lanthanum.eqiad.wmnet upgrading packages [21:21:55] Logged the message, Master [21:22:31] hmm i got it [21:22:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:22:32] see it? [21:23:30] ^d: any ETA on missing commits? [21:23:41] ottomata: yes. How did you restart kafka ? [21:23:56] kafka server-start [21:24:03] that is a slightly different error message though.... 
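(Editor's note: the once-per-second curl loop mentioned above can be sketched in Python roughly as follows. This is only an illustration. The real URL lives in the gist and is not reproduced in this log, so the `url` argument is a placeholder, and the `poll` helper and its parameters are hypothetical names, not anything that was actually run on the labs VM.)

```python
import time
import urllib.request


def poll(url, count, interval=1.0, fetch=None, sleep=time.sleep):
    """Request `url` `count` times, pausing `interval` seconds between tries.

    Returns a list holding the HTTP status of each successful request, or
    the exception raised by a failed one. `fetch` and `sleep` can be swapped
    out so the loop can be exercised without a network (an assumption made
    for testability, not part of the original shell loop).
    """
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status
    results = []
    for attempt in range(count):
        try:
            results.append(fetch(url))
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            results.append(exc)
        if attempt + 1 < count:
            sleep(interval)
    return results
```

(Keeping the failures in the returned list, instead of stopping at the first one, makes it easier to line up failed requests with the timestamps in /var/log/kafka/kafka.log.)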
[21:24:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [21:24:29] i just killed my kafka server [21:24:30] you try again? [21:24:50] yes [21:25:59] weird? [21:26:01] So... i killed it [21:26:08] and started it from the init script [21:26:19] ok i think that should be fine [21:26:39] if we don't reproduce now, I am gonna start chasing down the kafka server-start path... [21:26:55] yeah but I can't consume either though [21:26:57] and if I look at JMX [21:27:03] i can def see failed produce requests piling up [21:27:14] just don't know why we don't see the error [21:27:24] maybe init.d logs differently? [21:27:29] shouldn't [21:27:39] doubtful... [21:28:03] also, i'm pretty sure I first saw this error when I was trying this while kafka was running under the init script [21:28:11] i only started doing it with server-start since troubleshooting [21:28:27] going to try in a different topic with varnishkafka [21:28:32] just to keep things clean [21:30:35] same deal, hmmmmmMMMmm [21:30:40] going to kill yours and try something [21:31:32] hmmm, akosiaris, we will have an easier time debugging this if we move to a different instance [21:31:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:31:41] we should use kraken-kafka-external.pmtpa.wmflabs [21:31:45] it is a single-broker cluster [21:31:49] so we don't have to do this replica election thing [21:31:55] ottomata: sure [21:32:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [21:37:19] ok [21:37:20] akosiaris: [21:37:28] you can use init script on kraken-kafka-external [21:37:39] and curl on kraken-kafka [21:37:49] varnishkafka will produce to kraken-kafka-external broker now [21:38:03] ok cool [21:38:19] !log upgrading mw119* [21:38:22] Logged the message, Mistress of the network gear.
[21:38:30] akosiaris: I have to run really soon [21:39:35] ottomata: ok... [21:40:05] you just to make sure. Just restart kafka using init script on external and curl on non external and i should be able to reproduce, right ? [21:41:27] PROBLEM - DPKG on mw119 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:41:27] PROBLEM - DPKG on mw1196 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:41:37] PROBLEM - DPKG on mw1195 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:41:57] PROBLEM - DPKG on mw1199 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:41:57] PROBLEM - DPKG on mw1194 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:41:58] PROBLEM - DPKG on mw1190 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:42:07] PROBLEM - DPKG on mw1193 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:42:17] PROBLEM - DPKG on mw1192 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:42:27] PROBLEM - DPKG on mw1198 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:42:27] PROBLEM - DPKG on mw1191 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:42:39] (03CR) 10RobH: [C: 032] "this cannot possibly go wrong...." [operations/puppet] - 10https://gerrit.wikimedia.org/r/82879 (owner: 10RobH) [21:42:56] Is there some issues to deploy Echo on small wikis or we're still in a test only for a small set? (ckb. 
is requesting it) [21:42:57] PROBLEM - DPKG on mw1197 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:43:27] RECOVERY - DPKG on mw119 is OK: All packages OK [21:43:28] RECOVERY - DPKG on mw1198 is OK: All packages OK [21:43:37] RECOVERY - DPKG on mw1195 is OK: All packages OK [21:43:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:43:42] Dereckson: you should ask on the roadmap talkpage on me.o [21:43:57] PROBLEM - Apache HTTP on mw1191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:43:57] RECOVERY - DPKG on mw1197 is OK: All packages OK [21:43:57] RECOVERY - DPKG on mw1194 is OK: All packages OK [21:43:57] RECOVERY - DPKG on mw1190 is OK: All packages OK [21:43:59] damn it [21:44:08] someone has a lot of outstanding commits on sockpuppet im having to merge with mine [21:44:09] RECOVERY - DPKG on mw1193 is OK: All packages OK [21:44:10] Dereckson: https://www.mediawiki.org/wiki/Echo/Release_Plan_2013 [21:44:16] so hope they are ok [21:44:19] RECOVERY - DPKG on mw1192 is OK: All packages OK [21:44:25] (03PS3) 10Jforrester: Remove VisualEditor's dependency on EventLogging [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79290 [21:44:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [21:44:28] RECOVERY - DPKG on mw1191 is OK: All packages OK [21:44:29] !log reedy synchronized php-1.22wmf16/extensions/EducationProgram/ [21:44:32] Logged the message, Master [21:44:37] (03CR) 10Jforrester: [C: 031] "Good to go whenever." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79290 (owner: 10Jforrester) [21:44:47] (03PS4) 10Jforrester: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79291 [21:44:54] (03CR) 10Jforrester: [C: 031] "Good to go whenever." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79291 (owner: 10Jforrester) [21:44:55] robh: one of those was mine....couldn't get in to merge it so thanks [21:44:57] RECOVERY - DPKG on mw1199 is OK: All packages OK [21:45:08] cool [21:45:22] Ok, place your bets, gonna make bugzilla work with own certs [21:45:26] or break it horribly [21:45:33] andre__: it's being applied now ;] [21:46:04] akosiaris: i believe so, yes [21:46:04] aight! [21:46:19] Thank you Nemo_bis. [21:46:29] !log upgrading mw120* [21:46:32] Logged the message, Mistress of the network gear. [21:46:47] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.053 second response time [21:47:20] (03PS1) 10Reedy: Remove redirect from http to https from votewiki [operations/apache-config] - 10https://gerrit.wikimedia.org/r/83565 [21:47:40] akosiaris: can you not reproduce? [21:47:47] oh varnishkafka wasn't running [21:47:53] so, you will also [21:48:02] have to make sure varnishkafka is running on kraken-kafka.pmtpa [21:48:02] ottomata: just did [21:48:24] great [21:48:28] RECOVERY - DPKG on mw1196 is OK: All packages OK [21:48:33] yeah, varnishkafka has an init script on kraken-kafka.p [21:48:37] you should be able to restart that if you have to [21:48:42] but that shouldn't be your problem [21:48:44] if so though [21:48:52] you can edit /etc/varnishkafka.conf on kraken-kafka.p [21:48:53] <^d> ori-l: I thought I fixed most of them friday :\ [21:48:55] if you need to change anything [21:48:57] ok i gotta run [21:49:08] thanks akosiaris, lemme know what you find (email?) [21:49:09] laters!
[21:49:27] PROBLEM - DPKG on mw120 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:49:38] PROBLEM - DPKG on mw1200 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:49:47] PROBLEM - DPKG on mw1202 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:49:55] (03Abandoned) 10Jforrester: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79291 (owner: 10Jforrester) [21:49:57] PROBLEM - DPKG on mw1207 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:09] PROBLEM - DPKG on mw1201 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:17] PROBLEM - DPKG on mw1209 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:17] PROBLEM - DPKG on mw1208 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:27] PROBLEM - DPKG on mw1205 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:27] PROBLEM - DPKG on mw1206 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:28] (03PS3) 10Jforrester: Enable VisualEditor beta welcome notice for all wikis [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [21:50:37] PROBLEM - DPKG on mw1204 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:50:37] PROBLEM - DPKG on mw1203 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:51:03] (03CR) 10Jforrester: [C: 031] "Now good to go." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/77269 (owner: 10Catrope) [21:51:27] PROBLEM - Apache HTTP on mw120 is CRITICAL: Connection refused [21:51:27] RECOVERY - DPKG on mw120 is OK: All packages OK [21:51:37] RECOVERY - DPKG on mw1200 is OK: All packages OK [21:51:53] (03PS1) 10Reedy: Make votewiki use http:// in wgCanonicalServer [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83566 [21:51:57] PROBLEM - Apache HTTP on mw1203 is CRITICAL: Connection refused [21:51:57] RECOVERY - DPKG on mw1202 is OK: All packages OK [21:51:57] PROBLEM - Apache HTTP on mw1200 is CRITICAL: Connection refused [21:51:57] PROBLEM - Apache HTTP on mw1202 is CRITICAL: Connection refused [21:51:57] PROBLEM - Apache HTTP on mw1207 is CRITICAL: Connection refused [21:51:58] RECOVERY - DPKG on mw1207 is OK: All packages OK [21:52:07] PROBLEM - Apache HTTP on mw1208 is CRITICAL: Connection refused [21:52:07] RECOVERY - DPKG on mw1201 is OK: All packages OK [21:52:18] PROBLEM - Apache HTTP on mw1201 is CRITICAL: Connection refused [21:52:18] PROBLEM - Apache HTTP on mw1206 is CRITICAL: Connection refused [21:52:19] PROBLEM - Apache HTTP on mw1205 is CRITICAL: Connection refused [21:52:19] PROBLEM - Apache HTTP on mw1204 is CRITICAL: Connection refused [21:52:19] RECOVERY - DPKG on mw1208 is OK: All packages OK [21:52:19] RECOVERY - DPKG on mw1209 is OK: All packages OK [21:52:28] PROBLEM - Apache HTTP on mw1209 is CRITICAL: Connection refused [21:52:31] RECOVERY - DPKG on mw1206 is OK: All packages OK [21:52:31] RECOVERY - DPKG on mw1205 is OK: All packages OK [21:52:37] RECOVERY - DPKG on mw1204 is OK: All packages OK [21:52:37] RECOVERY - DPKG on mw1203 is OK: All packages OK [21:52:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [21:53:47] RECOVERY - Apache 
HTTP on mw1203 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [21:54:27] RECOVERY - Apache HTTP on mw1209 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.282 second response time [21:54:27] RECOVERY - Apache HTTP on mw120 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.371 second response time [21:54:41] !log upgrading mw121* [21:54:44] Logged the message, Mistress of the network gear. [21:54:49] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [21:54:50] RECOVERY - Apache HTTP on mw1202 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.054 second response time [21:54:57] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [21:55:07] RECOVERY - Apache HTTP on mw1208 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.054 second response time [21:55:18] RECOVERY - Apache HTTP on mw1206 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [21:55:19] RECOVERY - Apache HTTP on mw1201 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [21:55:19] RECOVERY - Apache HTTP on mw1205 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.059 second response time [21:55:19] RECOVERY - Apache HTTP on mw1204 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [21:57:50] PROBLEM - DPKG on mw121 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:00] PROBLEM - DPKG on mw1215 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:00] PROBLEM - DPKG on mw1214 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:00] PROBLEM - DPKG on mw1212 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:00] PROBLEM - DPKG on mw1219 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:17] 
PROBLEM - DPKG on mw1216 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:29] PROBLEM - DPKG on mw1217 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:29] PROBLEM - DPKG on mw1213 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:29] PROBLEM - DPKG on mw1211 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:37] PROBLEM - DPKG on mw1210 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:58:50] PROBLEM - DPKG on mw1218 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [21:59:27] PROBLEM - Apache HTTP on mw1215 is CRITICAL: Connection refused [21:59:37] PROBLEM - Apache HTTP on mw1213 is CRITICAL: Connection refused [21:59:47] RECOVERY - DPKG on mw1218 is OK: All packages OK [21:59:47] RECOVERY - DPKG on mw121 is OK: All packages OK [21:59:59] RECOVERY - DPKG on mw1215 is OK: All packages OK [21:59:59] RECOVERY - DPKG on mw1214 is OK: All packages OK [21:59:59] RECOVERY - DPKG on mw1212 is OK: All packages OK [21:59:59] RECOVERY - DPKG on mw1219 is OK: All packages OK [22:00:19] RECOVERY - DPKG on mw1216 is OK: All packages OK [22:00:27] RECOVERY - DPKG on mw1217 is OK: All packages OK [22:00:27] RECOVERY - DPKG on mw1211 is OK: All packages OK [22:00:27] RECOVERY - DPKG on mw1213 is OK: All packages OK [22:00:27] RECOVERY - Apache HTTP on mw1215 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.048 second response time [22:00:39] RECOVERY - DPKG on mw1210 is OK: All packages OK [22:00:39] RECOVERY - Apache HTTP on mw1213 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [22:01:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.123 second response time [22:04:32] (03PS1) 10Cmjohnson: decom sq36/removing sq31-36 from site.pp/cache.pp/dhcpd [operations/puppet] - 
10https://gerrit.wikimedia.org/r/83569 [22:05:55] (03PS1) 10Reedy: Tidy up maintenance jobs on terbium/hume [operations/puppet] - 10https://gerrit.wikimedia.org/r/83570 [22:06:47] ^ No functional change there [22:08:21] (03Abandoned) 10Reedy: Move non geodata cron jobs to terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/62526 (owner: 10Reedy) [22:08:30] (03CR) 10Cmjohnson: [C: 032] decom sq36/removing sq31-36 from site.pp/cache.pp/dhcpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/83569 (owner: 10Cmjohnson) [22:10:31] !log blog updated to wp 3.6 and no one noticed, thats good upgradin folks [22:10:34] Logged the message, RobH [22:12:08] mutante: Mind merging 83570? No functional changes, just tidying up no longer needed comments etc [22:12:35] anwiki [22:12:36] Statistics completed in 0.46s [22:13:13] https://noc.wikimedia.org/~reedy/updateSpecialPages.log [22:13:16] ^ symlinked to there.. LD [22:13:18] *:D [22:15:27] (03PS2) 10Dzahn: Tidy up maintenance jobs on terbium/hume [operations/puppet] - 10https://gerrit.wikimedia.org/r/83570 (owner: 10Reedy) [22:15:49] Reedy: heh, noc homedirs, yea:) [22:15:54] (03PS1) 10MaxSem: Remove a bunch of obsolete rules [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83572 [22:16:22] I'll move that cronjob to terbium.. Not sure what we should do about logging to somewhere (nowhere?) [22:18:46] > /home/wikipedia/logs/norotate/updateSpecialPages.log [22:19:17] yea .. where to ..nod [22:19:39] (03CR) 10Dzahn: [C: 032] "lgtm, no functional change" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83570 (owner: 10Reedy) [22:20:36] Hmm [22:20:39] /tmp [22:20:42] /home/mwdeploy/ [22:20:49] /var/log/translation [22:21:10] * Reedy goes with /home/mwdeploy [22:21:13] /var/log/mediawiki/maintenance/updateSpecialPages.log [22:21:23] hrmm.. 
/var/log nicer than home :) [22:22:15] heh [22:22:22] I was thinking it's probably worth making all those consistent [22:23:06] consistency is good [22:23:37] arr, now i didn't start that command in a screen [22:23:38] (03PS1) 10Reedy: Move update_special_pages to terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/83573 [22:24:46] notices we also have update_special_pages_small [22:25:44] nevermind, _had_ [22:27:11] (03CR) 10Matthias Mullie: "Ignore my previous @todo comment; I think we can ignore Parsoid config for now (on labs)." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82766 (owner: 10Matthias Mullie) [22:27:23] (03CR) 10Dzahn: [C: 032] Move update_special_pages to terbium [operations/puppet] - 10https://gerrit.wikimedia.org/r/83573 (owner: 10Reedy) [22:28:00] (03PS1) 10Reedy: Move logs to /var/log [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 [22:28:06] (03CR) 10jenkins-bot: [V: 04-1] Move logs to /var/log [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [22:28:18] (03PS2) 10Reedy: Move logs to /var/log [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 [22:29:37] (03CR) 10Dzahn: [C: 04-1] "when i try that long Google feed link in my browser at office i get " iGoogle has not been enabled by the administrator of the domain @wik" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83348 (owner: 10Jeroen De Dauw) [22:30:39] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:31:33] (03CR) 10Mattflaschen: ""Someone of the ops team must put it manually to apt.wikimedia.org so that this script can be developed further."" [operations/puppet] - 10https://gerrit.wikimedia.org/r/61767 (owner: 10Physikerwelt) [22:32:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.691 second response time [22:36:49] !log LocalisationUpdate failed: git pull of extensions failed [22:36:53] 
Logged the message, Master [22:36:58] Submodule 'wikihiero' () registered for path 'wikihiero' [22:36:58] .gitignore: needs merge [22:36:58] .gitreview: needs merge [22:36:58] error: you need to resolve your current index first [22:37:00] .gitignore: needs merge [22:37:02] .gitreview: needs merge [22:37:04] error: you need to resolve your current index first [22:37:07] .gitignore: needs merge [22:37:09] error: you need to resolve your current index first [22:37:11] Unable to checkout '2b43badc9607f8775f08efee820f7c5a15665f32' in submodule path 'DataTypes' [22:37:13] Unable to checkout '2acfd9d6a2fef31881e36c7adeba12c0d75c69d2' in submodule path 'WikibaseDatabase' [22:37:15] Unable to checkout '741079183e328588e8f899a4432e5ee362dfc395' in submodule path 'WikibaseQuery' [22:37:18] Updating extensions FAILED [22:37:31] !log LocalisationUpdate failed: git pull of extensions failed [22:37:58] (03CR) 10Hashar: [C: 031] "That should enable Flow on beta. The hourly Jenkins job 'beta-update-databases' should take care of updating the databases if the extensi" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/82766 (owner: 10Matthias Mullie) [22:38:05] * ori-l peers at morebots [22:38:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:40:58] (03PS1) 10BBlack: fix osm eqiad IPs (wrong subnet) [operations/dns] - 10https://gerrit.wikimedia.org/r/83577 [22:41:43] (03Abandoned) 10BBlack: fix osm eqiad IPs (wrong subnet) [operations/dns] - 10https://gerrit.wikimedia.org/r/83577 (owner: 10BBlack) [22:42:34] (03PS1) 10BBlack: fix osm eqiad IPs (wrong subnet) [operations/dns] - 10https://gerrit.wikimedia.org/r/83578 [22:43:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 7.861 second response time [22:44:10] Running updates for 1.22wmf16 (on aawikibooks) [22:44:10] 529740 MediaWiki messages are updated [22:45:02] im in ur wikimedia updating your 
localisations [22:45:17] Reedy: that is a lot of them [22:45:26] are you sure that is a legit change ? :D [22:45:41] I'm not sure how it counts [22:45:55] if per language etc [22:46:13] microsoft style: sum(X, Y) { return random(); } [22:47:07] (03CR) 10BBlack: [C: 032] fix osm eqiad IPs (wrong subnet) [operations/dns] - 10https://gerrit.wikimedia.org/r/83578 (owner: 10BBlack) [22:48:30] mutante: what was your operations/debs/ repo for which you needed a phplint jenkins job ? [22:49:16] Anyone want to merge https://gerrit.wikimedia.org/r/#/c/79231/ ? [22:50:20] (03CR) 10Cmjohnson: [C: 032] Rebuild localisation cache in several threads [operations/puppet] - 10https://gerrit.wikimedia.org/r/79231 (owner: 10MaxSem) [22:50:29] reedy ^ [22:50:29] ori-l: do you know whether we already have a statsd instance plugged with our graphite install ? [22:50:36] hashar: you already did that:) [22:50:45] hashar: wikistats [22:50:54] ori-l: Zuul can be made to send some metrics to some statsd instances but I haven't found any available. [22:51:17] mutante: so that is working already? :-D [22:51:24] Updated 1165926 messages in total [22:51:24] Done [22:51:24] All done in 377.40279197693 seconds [22:51:24] Rebuilding localization cache [22:52:00] hashar: yes [22:52:03] \O/ [22:54:57] (03CR) 10Dzahn: [C: 031] "definitely better all in /var/log/ than mixed in /tmp (eeeh) and /home/mwdeploy, even nicer would be /var/log/mediawiki/" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [22:55:51] could any brave ops land a patch for contint please , might need a few minutes to talk about it though : https://gerrit.wikimedia.org/r/#/c/82625/ tested in labs previously. [22:56:06] (03CR) 10Reedy: "Seems sensible to do that to me." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [22:56:06] don't need to talk about it, need BEER [22:56:38] (03CR) 10Dzahn: [C: 032] Localized logo for hewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83218 (owner: 10TTO) [22:56:45] I could package { "beer": ensure => present } [22:56:53] (03Merged) 10jenkins-bot: Localized logo for hewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83218 (owner: 10TTO) [22:56:55] ensure => cold [22:57:00] trying to make puppet pay, eh [22:58:20] PROBLEM - MySQL Processlist on db1051 is CRITICAL: CRIT 0 unauthenticated, 0 locked, 0 copy to table, 42 statistics [23:00:12] RECOVERY - MySQL Processlist on db1051 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 2 statistics [23:02:23] hashar: yeah, there's a statsd instance on hafnium [23:02:28] logs to graphite and ganglia [23:02:43] poke me if you want me to help you set something up [23:02:53] !log dzahn synchronized ./wmf-config/InitialiseSettings.php [23:02:56] Logged the message, Master [23:03:54] !log deployed localized logo for he.wikivoyage [23:03:57] Logged the message, Master [23:05:52] ori-l: I guess I need the IP and port :-D [23:07:16] Hiya. So now I have shell access. Wonderful. [23:07:29] siebrand: welcome to (s)hell [23:07:35] I'm looking for documentation on how to connect into the Wikimedia network. Is there documentation for that somewhere? [23:07:49] In what sense? [23:07:57] SSH in... [23:07:58] ssh wikimedia.org [23:07:59] siebrand: https://wikitech.wikimedia.org/wiki/Server_access_responsibilities [23:08:14] ProxyCommand you should use! 
[23:08:18] bast1001.wikimedia.org [23:08:43] there's a config example for that, yea [23:09:04] hashar: 127.0.0.1:8126 [23:09:47] Host *.pmtpa.wmnet [23:09:48] ProxyCommand ssh -a -W %h:%p fenari.wikimedia.org [23:09:49] Host *.eqiad.wmnet [23:09:50] ProxyCommand ssh -a -W %h:%p bast1001.wikimedia.org [23:09:59] (missing indentation before the PRoxyCommand lines) [23:10:00] !log LocalisationUpdate completed (1.22wmf16) at Mon Sep 9 23:09:05 UTC 2013 [23:10:02] Logged the message, Master [23:10:15] with that setup, I can "ssh lanthanum.eqiad.wmnet" [23:10:29] my ssh client would connect to bast1001 and use it as a proxy to reach the host [23:10:39] (03PS3) 10Reedy: Move logs to /var/log/mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 [23:10:50] hashar: try telnetting to 8126 [23:11:38] Escape character is '^]'. [23:11:38] help [23:11:38] Commands: stats, counters, timers, gauges, delcounters, deltimers, delgauges, health, quit [23:12:35] (this was in no way related to the ssh proxy stuff, it was about statsd) [23:12:53] So I need to ssh to stat1. Is that "stat1.wikimedia.org" or does it have a different name> [23:12:54] ? [23:13:16] * siebrand greets rmoen  [23:13:36] I'm getting "Server refused our key" so far :) [23:13:44] siebrand: via bastion [23:13:51] ssh to bastion1001 [23:13:53] ssh stat1 [23:13:57] siebrand: it's the right name, but once you setup the ProxyCommand all you need is "ssh stat1" [23:14:10] Does Putty know ProxyCommand? 
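For reference, the ProxyCommand stanzas pasted above lost their indentation in the channel. Below is a minimal Python sketch that renders them with the indentation restored. The `BASTIONS` mapping and the `render_ssh_config` helper are hypothetical names used only for illustration; the host patterns, the bastion hostnames and the `ssh -a -W %h:%p` command are taken from the log.

```python
# Hypothetical helper: renders ssh_config stanzas like the ones pasted in the log.
# Host patterns and bastion names come from the channel; everything else is illustrative.
BASTIONS = {
    "*.pmtpa.wmnet": "fenari.wikimedia.org",
    "*.eqiad.wmnet": "bast1001.wikimedia.org",
}


def render_ssh_config(bastions):
    """Return ssh_config text with one Host stanza per pattern."""
    lines = []
    for pattern, bastion in bastions.items():
        lines.append(f"Host {pattern}")
        # The indentation is cosmetic for ssh, but it is what the paste was missing.
        lines.append(f"    ProxyCommand ssh -a -W %h:%p {bastion}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ssh_config(BASTIONS), end="")
```

The output could be appended to `~/.ssh/config`; after that, `ssh lanthanum.eqiad.wmnet` would be tunnelled through bast1001 as described in the channel.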
[23:14:13] Some (not many) hosts are directly accessible (ie have public ip addresses) [23:14:52] siebrand: yes https://wikitech.wikimedia.org/wiki/User:Wikinaut/Help:Access_to_instances_with_PuTTY_and_WinSCP [23:14:58] that's the guide for labs [23:15:06] but you can follow it here as well, just different bastion host [23:15:12] (and different key) [23:15:17] PROBLEM - Puppet freshness on sq42 is CRITICAL: No successful Puppet run in the last 10 hours [23:15:30] see screenshots [23:15:36] mutante: is stat1 in labs? [23:15:45] siebrand: no, tampa [23:15:46] no [23:15:54] Then what's my instance? [23:16:08] siebrand: you need "plink.exe" with putty, but you have it if you installed it incl. the putty tools [23:16:24] hashar: uh, yes, but, i forgot -- because i only have access to a couple of machines, i put it on hafnium, but because hafnium has a public interface, i configured statsd to only listen to metrics on the loopback interface [23:16:30] so we should provision another statsd instance somewhere [23:16:34] siebrand: you can just ssh into stat1.wikimedia.org directly [23:16:34] siebrand: replace instance with "stat1" [23:16:45] Uhh [23:16:51] stat1.wikimedia.org is publically accessible [23:17:21] on port 22, it's refusing my key. The merge was done <1 hour ago. How long does it take for me to be known there? [23:17:31] The last Puppet run was at Mon Sep 9 23:10:46 UTC 2013 (5 minutes ago). [23:17:55] there's no authorized_keys file there yet [23:18:19] Isn't that what puppet should set up for me? [23:18:25] notice: /Stage[main]/Accounts::Siebrand/Unixaccount[Siebrand Mazeland]/User[siebrand]/ensure: created [23:18:40] looks for the change [23:19:06] mutante: https://gerrit.wikimedia.org/r/#/c/83560/ [23:19:11] Accounts::Siebrand/Ssh_authorized_key[smazeland@wikimedia.org]: Could not evaluate: Puppet::Util::FileType::FileTypeFlat could not write /home/olivneh/.ssh/authorized_keys: Permission denied - /home/olivneh/.ssh/authorized_keys [23:19:24] wait... 
smazeland in /home/olivneh? [23:19:25] thank god :P [23:19:32] (03PS1) 10Akosiaris: update-ubuntu-mirror fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/83582 [23:19:32] what the? [23:19:57] * siebrand raises an eyebrow. [23:21:20] it's not the only one [23:21:25] same for tnegrin [23:22:00] clearly everyone in the foundation are just ori-l's puppets... [23:22:38] (03PS1) 10Hashar: contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 [23:22:53] hashar: \o/ [23:23:10] ori-l: do you have the statsd host in some puppet variable ? [23:23:21] (03CR) 10jenkins-bot: [V: 04-1] contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 (owner: 10Hashar) [23:23:25] grrrrr [23:23:39] no, because there isn't a generic statsd host yet; the one of hafnium is not reachable by other nodes [23:23:43] but we *should* have one [23:24:32] I'll submit an RT ticket, because it appears that something didn't work. [23:24:36] we could run it on an existing node; statsd isn't very expensive to run. it only keeps data until the next flush interval which is typically a minute [23:24:44] siebrand: i think mutante is looking into it [23:24:53] (03PS2) 10Hashar: contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 [23:25:07] ori-l: That may be possible. Just to be sure it doesn't get stuck because there's no RT, I'm filing onw. [23:25:08] one. [23:25:32] (03CR) 10jenkins-bot: [V: 04-1] contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 (owner: 10Hashar) [23:25:34] siebrand: yes.. and yes:) [23:25:35] ori-l: how does statsd on hafnium is not reachable by other nodes ? 
[23:25:43] ori-l: seems it is listening on 0.0.0.0 [23:27:22] (03PS3) 10Hashar: contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 [23:28:00] PHP Fatal error: Call to a member function getAuthToken() on a non-object in /usr/local/apache/common-local/php-1.22wmf15/extensions/CentralAuth/CentralAuthHooks.php on line 653 [23:28:06] csteipp: ^ Wasn't that reverted? [23:28:18] !log LocalisationUpdate completed (1.22wmf15) at Mon Sep 9 23:28:18 UTC 2013 [23:28:21] Logged the message, Master [23:28:43] (03CR) 10Akosiaris: [C: 032] update-ubuntu-mirror fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/83582 (owner: 10Akosiaris) [23:28:47] (03CR) 10jenkins-bot: [V: 04-1] contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 (owner: 10Hashar) [23:29:19] hashar: hrm, right. it's just the tcp management interface. [23:31:50] 8125/tcp, 8126/udp 40988/udp [23:32:19] Reedy: Yeah, where are you seeing that? 
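As an aside on what "sending metrics to statsd" involves: statsd accepts plain-text datagrams of the form `bucket:value|type`, normally over UDP. Below is a minimal Python sketch of a counter increment. The helper names and the example bucket are hypothetical; the default of UDP port 8125 is an assumption based on the netstat output quoted later in the log, and this is not necessarily how Zuul itself emits metrics.

```python
import socket


def format_counter(bucket, value=1):
    """Build a statsd counter datagram: <bucket>:<value>|c"""
    return f"{bucket}:{value}|c".encode("ascii")


def send_counter(bucket, value=1, host="127.0.0.1", port=8125):
    """Fire-and-forget a counter increment over UDP and return the payload sent."""
    payload = format_counter(bucket, value)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    # "zuul.example" is a made-up bucket name for illustration only.
    print(send_counter("zuul.example"))
```

Because UDP is connectionless, the sender gets no error when nothing is listening, so reachability problems like the one discussed in the channel tend to fail silently.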
[23:32:47] Hmm [23:32:54] Last one was 12 minutes ago [23:33:06] Sep 9 23:19:57 10.64.32.67 apache2[16228]: PHP Fatal error: Call to a member function getAuthToken() on a non-object in /usr/local/apache/common-local/php-1.22wmf15/extensions/CentralAuth/CentralAuthHooks.php on line 653 [23:33:07] Sep 9 23:20:06 10.64.16.171 apache2[19998]: PHP Fatal error: Call to a member function getAuthToken() on a non-object in /usr/local/apache/common-local/php-1.22wmf15/extensions/CentralAuth/CentralAuthHooks.php on line 653 [23:33:53] (03PS4) 10Hashar: contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 [23:34:16] Reedy: https://gerrit.wikimedia.org/r/#/c/83442/ [23:35:10] ori-l: notice: /Stage[main]/Accounts::Olivneh/Ssh_authorized_key[ori@wmf.prod]/ensure: created [23:35:38] ori-l: notice: /Stage[main]/Accounts::Siebrand/Ssh_authorized_key[smazeland@wikimedia.org]/ensure: created [23:35:42] wow, this is crazy [23:36:13] ori-l: fix was: chmod 600 your authorized_keys [23:36:28] it's crazy that it stops others [23:36:45] siebrand: try again ? [23:37:04] ori-l: hafnium is listening: udp 0 0 0.0.0.0:8125 [23:37:21] Look at that… siebrand@stat1:~$ [23:37:55] congratulations! [23:37:56] Next challenge is finding out how to get into MySQL. [23:38:27] Ah, that's the research account, I guess. [23:38:38] ori-l: How maintains that account? [23:38:43] hashar: that's awful; i need to fix that asap. [23:38:43] s/How/Who [23:38:46] !log mwalker Started syncing Wikimedia installation... : Scapping to update CentralNotice i18n strings [23:38:49] Logged the message, Master [23:38:56] siebrand: Nikerabbit has the credentials IIRC [23:39:06] ori-l: Okay, will ask. [23:40:13] Reedy: https://gerrit.wikimedia.org/r/#/c/83555/ [23:41:15] siebrand: :) i think Diederik has it [23:41:20] mwalker, any weird output during localisation rebuild? it's now multithreaded [23:41:22] drdee: ? 
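The `chmod 600` fix mentioned above can be sketched in Python as follows. The log does not say what mode the file had beforehand, so this hypothetical `ensure_private` helper simply normalises whatever it finds to 0600 and reports the previous mode; it is an illustration, not what was actually run on stat1.

```python
import os
import stat


def ensure_private(path):
    """Set mode 0600 on path if it differs; return the previous permission bits."""
    old_mode = stat.S_IMODE(os.stat(path).st_mode)
    if old_mode != 0o600:
        os.chmod(path, 0o600)
    return old_mode


if __name__ == "__main__":
    # The path is an example; adjust to the account being repaired.
    target = os.path.expanduser("~/.ssh/authorized_keys")
    if os.path.exists(target):
        print(oct(ensure_private(target)))
```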
[23:41:47] MaxSem: doesn't seem like it [23:41:56] it's going a lot faster than usual for sure [23:42:05] mutante: I gots it. [23:42:09] thanks for the quick help. [23:42:16] yay! [23:42:19] cool, yw. that was/is a weird error [23:42:34] now tnegrin can also login, but don't know IRC nick [23:43:04] (03PS2) 10Reedy: Don't use Old AutoLogin when Silent is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83555 (owner: 10CSteipp) [23:43:07] Anyone deploying at the moment? [23:43:08] (03CR) 10Reedy: [C: 032] Don't use Old AutoLogin when Silent is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83555 (owner: 10CSteipp) [23:43:15] MaxSem: You added "--threads=#"? [23:43:20] (03Merged) 10jenkins-bot: Don't use Old AutoLogin when Silent is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/83555 (owner: 10CSteipp) [23:43:21] yup [23:43:28] (03PS1) 10Ori.livneh: StatsD on Hafnium: listen only on loopback interface [operations/puppet] - 10https://gerrit.wikimedia.org/r/83586 [23:43:32] MaxSem: srsly. Wikimedia wasn't using that? [23:43:50] mutante: could you merge that? (https://gerrit.wikimedia.org/r/83586) [23:43:55] MaxSem: on twn, we use --threads=12 [23:43:59] hashar: ^^ [23:44:09] :-))) [23:44:45] mwalker: Are you deploying? [23:44:50] hashar: but that leaves the question open of where to run an instance [23:45:09] it could go on professor (the graphite host) but it's in tampa and hence presumably slotted for decommissioning [23:45:13] csteipp: yesish -- I am scapping i18n changes out [23:45:15] where you mid something? [23:45:30] *were != where [23:45:42] mwalker: Not mid, just a patch got pushed out and I need to do a config change soon [23:45:52] ok; I'll let you know when this finishes [23:45:54] should be soon [23:45:58] I'm at mw1093 [23:46:00] Cool. Thanks [23:46:13] ganglia down? 
[23:46:18] (03CR) 10Dzahn: [C: 032] "this is a good thing" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83586 (owner: 10Ori.livneh) [23:47:09] mutante: thanks. [23:47:11] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Sep 9 23:47:10 UTC 2013 [23:47:14] Logged the message, Master [23:47:17] (03PS1) 10BryanDavis: Set proper RRD storage types for vhtcpd ganglia. [operations/puppet] - 10https://gerrit.wikimedia.org/r/83587 [23:47:26] bblack: ^ [23:47:54] bblack: Ori helped me understand the ganglia/RRD interaction better. [23:49:09] ganglia is down, fwiw [23:49:31] (03CR) 10Dzahn: "tcp 0 0 127.0.0.1:8126 0.0.0.0:* LISTEN 109 3871986 11921/statsd" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83586 (owner: 10Ori.livneh) [23:49:41] ori-l: done [23:50:11] hashar: 127.0.0.1:8125 [23:50:29] it won't help hashar, since zuul is running on a separate host [23:50:35] !log restarted statsd on hafnium [23:50:38] Logged the message, Master [23:50:44] so I can send statsd to it ? :D [23:50:47] we should provision a statsd instance on a machine with a private interface [23:50:55] (03PS2) 10BryanDavis: Set proper RRD storage types for vhtcpd ganglia. [operations/puppet] - 10https://gerrit.wikimedia.org/r/83587 [23:51:15] MaxSem: I dont recall scap halting at a server before -- it's finished copying to mw1093; but it's been sitting here for several minutes [23:51:16] (03PS3) 10BryanDavis: Set proper RRD storage types for vhtcpd ganglia. [operations/puppet] - 10https://gerrit.wikimedia.org/r/83587 [23:51:21] what is installed on hafnium anyway? Or to say it otherwise: why is it public ? [23:51:40] mwalker, happens sometimes [23:52:08] * mwalker ponders the mysteries of scap [23:52:12] * mwalker is horrified [23:52:17] !log mwalker Finished syncing Wikimedia installation... : Scapping to update CentralNotice i18n strings [23:52:20] Logged the message, Master [23:52:21] hashar: same reason as stat1? 
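The before/after netstat lines quoted in the channel (`0.0.0.0:8125` versus `127.0.0.1:8126`) show the difference between listening on every interface and listening on loopback only. Below is a small Python sketch for classifying such a local-address field. The `is_loopback_only` helper is hypothetical and only parses the `ip:port` form shown in the log.

```python
import ipaddress


def is_loopback_only(local_address):
    """True if a netstat-style 'ip:port' local address is bound to a loopback IP."""
    host, _, _port = local_address.rpartition(":")
    return ipaddress.ip_address(host).is_loopback


if __name__ == "__main__":
    for addr in ("0.0.0.0:8125", "127.0.0.1:8126"):
        print(addr, is_loopback_only(addr))
```

A wildcard bind such as `0.0.0.0` accepts traffic on all interfaces, which is why it was a concern on a host with a public address.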
[23:52:41] csteipp: you should be good to go now [23:52:46] Thanks [23:53:04] mutante: anyway you could get https://gerrit.wikimedia.org/r/#/c/83583/ in [23:53:09] hashar: i requested a node for running some web-accessible visualization tools on top of graphite data but i got distracted by setting up statsd instead [23:53:17] mutante: it is not actually configuring Zuul to use statsd bu that would let us enable it later on :) [23:53:30] ori-l: you are a good citizen :-] [23:54:47] !log csteipp synchronized wmf-config/CommonSettings.php [23:54:50] Logged the message, Master [23:54:52] let's just run it on professor for now [23:54:56] i'll submit a patch [23:55:09] (03PS2) 10Ryan Lane: WORK IN PROGRESS: Simplify git-deploy configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/83046 [23:55:28] (03CR) 10BBlack: [C: 032] "Let's see what happens" [operations/puppet] - 10https://gerrit.wikimedia.org/r/83587 (owner: 10BryanDavis) [23:55:31] hashar: alright, simple enough:) [23:55:37] (03CR) 10Dzahn: [C: 032] contint: let us configure statsd for Zuul [operations/puppet] - 10https://gerrit.wikimedia.org/r/83583 (owner: 10Hashar) [23:55:50] (03CR) 10jenkins-bot: [V: 04-1] WORK IN PROGRESS: Simplify git-deploy configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/83046 (owner: 10Ryan Lane) [23:57:00] (03PS3) 10Ryan Lane: WORK IN PROGRESS: Simplify git-deploy configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/83046