[00:43:32] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [00:47:32] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun 01 Jun 2014 21:47:06 UTC [00:49:22] PROBLEM - Disk space on vanadium is CRITICAL: DISK CRITICAL - free space: / 4271 MB (3% inode=95%): [01:17:42] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Mon Jun 2 01:17:34 UTC 2014 [02:02:14] !log tstarling Synchronized php-1.24wmf7: (no message) (duration: 01m 31s) [02:02:27] Logged the message, Master [02:06:11] !log tstarling Synchronized php-1.24wmf6: Revert "Use square bounding boxes for default-sized thumbnails" (duration: 01m 18s) [02:06:16] Logged the message, Master [02:08:32] PROBLEM - Disk space on searchidx1001 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=46%): [02:13:56] !log LocalisationUpdate completed (1.24wmf6) at 2014-06-02 02:12:53+00:00 [02:14:01] Logged the message, Master [02:25:00] !log LocalisationUpdate completed (1.24wmf7) at 2014-06-02 02:23:57+00:00 [02:25:06] Logged the message, Master [03:03:54] !log moving /var/log/eventlogging/archive/* to /srv/eventlogging-logs to free up space on the root partition. unpuppetized for now, sadly. [03:03:58] Logged the message, Master [03:04:05] !log ..on vanadium. 
[03:04:10] Logged the message, Master [03:04:22] RECOVERY - Disk space on vanadium is OK: DISK OK [03:12:11] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Jun 2 03:11:05 UTC 2014 (duration 11m 4s) [03:12:14] Logged the message, Master [03:44:32] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [04:33:32] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 01:33:19 UTC [06:02:42] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Mon Jun 2 06:02:41 UTC 2014 [06:45:32] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [06:54:53] (03PS1) 10Ori.livneh: Clean up system::role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 [06:56:59] ori: morning [06:57:52] ori: in https://gerrit.wikimedia.org/r/#/c/83574/ i tried to add include ::mediawiki::mwlogdir to class maintenance which you deleted [06:58:10] what is the correct place for this include in the new structure ? 
[07:19:20] (03PS3) 10Gilles: Launch Media Viewer for all users on German wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134811 [07:19:35] (03PS3) 10Gilles: Launch Media Viewer for all users on English wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134812 [07:21:02] !log restarted Zuul unintentionally [07:21:07] Logged the message, Master [07:23:23] (03PS1) 10Gilles: Lower sampling for enwiki and dewiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136717 [07:39:05] hello [08:30:32] PROBLEM - Puppet freshness on db1007 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 05:29:51 UTC [08:30:42] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Mon Jun 2 08:30:33 UTC 2014 [08:36:03] (03CR) 10Hashar: "recheck" [operations/apache-config] - 10https://gerrit.wikimedia.org/r/133981 (https://bugzilla.wikimedia.org/64557) (owner: 10John F. Lewis) [08:36:14] It seems the raw pagecount files no longer get properly synced to dataset1001, and I [08:36:18] am having a hard time understanding why the cron job in modules/dataset/manifests/cron/pagecountsraw.pp [08:36:22] stopped working. Could someone having access to dataset1001 help me debug? [08:36:52] The files are still getting produced on gadolinium, but are not visible on dumps.wikimedia.org. [08:37:05] _joe_: godog akosiaris ^^^^ :D [08:37:15] & apergos [08:38:00] There are a few related changes around permissions/users, but without access to the machines, it's hard to isolate the issue. [08:38:46] potentially the cron would spam the catchall email address and might offer clues [08:39:50] The cron has MAILTO set, but I do not get emails from ops-dumps@... 
[08:40:06] But I guess someone in here does :-) [08:40:57] hashar: looking forward to those pinglimiter gdashes :) in the meanwhile, an easier one in need of +1 is https://gerrit.wikimedia.org/r/#/c/136631/ [08:41:19] Nemo_bis: I have no clue how gdash works sorry -:( [08:41:33] haha, ok :) [08:43:03] Also ... the files on gadolinium belong to nobody/nogroup (but are world readable). [08:47:10] sec [08:47:22] I've changed nothing over there recently [08:47:34] which means in the last 2 weeks [08:47:44] let's see [08:48:59] Commits like c84707b90bc1b85297f4675eaa34a3d7e384546a look related. But it's hard to tell from my end. [08:50:41] Things started to break over the weekend. E.g.: [08:50:46] http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-05/ [08:51:03] has pagecounts-20140530-150000.gz as last file [08:56:45] cron output isn't very useful [09:06:35] so I see indeed that the dir on gadolinium from which we would copy, has the files, has the user that copies, and that the incoming dir on dataset1001 does not have the files. I have an rsync reset message in the cron output: [09:06:48] rsync: connection unexpectedly closed (0 bytes received so far) [Receiver] [09:06:48] rsync error: error in rsync protocol data stream (code 12) at io.c(605) [Receiver=3.0.9] [09:06:48] Rsync from remote host gadolinium.wikimedia.org/a/webstats/dumps/ to local host directory /data/pagecounts/incoming/ failed! [09:06:58] which is one of the most generic error messages ever [09:07:03] :-( [09:07:06] yeah [09:08:17] The rsync command has the remote shell set to some ssh command. Does that ssh command work for you? [09:08:54] let's see about that, I was going to have a look at the auth log too [09:13:31] I can ssh over (key is accepted) but I'm logged out instantly [09:13:41] as the datasets user that is [09:14:15] Mhmmm. 
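(Editorial note, not part of the channel log.) An ssh key that is accepted followed by an instant logout, together with rsync's "connection unexpectedly closed (0 bytes received so far)", is one symptom of an account whose login shell cannot run commands, since rsync over ssh starts its remote side through that shell. The following is a minimal sketch of how one might check this from a passwd entry. It assumes the standard seven-field passwd(5) format; the helper names, the set of non-login shells, and the sample lines are hypothetical and not taken from the hosts discussed here.

```python
# Hypothetical helpers for illustration; not code used by anyone in the log.

# Shells that refuse to run anything (an assumed, non-exhaustive list).
NON_LOGIN_SHELLS = {"/bin/false", "/usr/sbin/nologin", "/sbin/nologin"}


def login_shell(passwd_line: str) -> str:
    """Return the shell field (7th field) of a passwd(5)-style line."""
    fields = passwd_line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields, got %d" % len(fields))
    return fields[6]


def allows_remote_commands(passwd_line: str) -> bool:
    """True if the account's shell could run a remote command such as rsync.

    An empty shell field is treated as allowed, because passwd(5) says an
    empty field defaults to /bin/sh.
    """
    return login_shell(passwd_line) not in NON_LOGIN_SHELLS


# Made-up sample entries (uid, gid and home directory are invented).
broken = "datasets:x:1001:1001::/home/datasets:/bin/false"
working = "datasets:x:1001:1001::/home/datasets:/bin/bash"

print(allows_remote_commands(broken))
print(allows_remote_commands(working))
```

On a live host the line could come from `getent passwd <user>` instead of a literal string; the check itself is the same.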
[09:16:50] so /etc/passwd changed on gadolinium may 30 16:15 and across the cluster around that time, guessing this is going to be the changeset [09:17:28] Looks good. [09:19:20] May 30 16:14:51 gadolinium puppet-agent[32741]: (/Stage[main]/Role::Dataset::Systemusers/Generic::Systemuser[datasets]/User[datasets]/shell) shell changed '/bin/b [09:19:20] ash' to '/bin/false' [09:19:25] that would do it. [09:19:32] now let me see how I can fix that back [09:19:39] That matches the above c84707b90bc1b85297f4675eaa34a3d7e384546a. [09:20:30] which switched from 'include "accounts::$user"' to "require => Generic::Systemuser['datasets']," [09:20:38] uh huh [09:28:08] (03PS1) 10ArielGlenn: set datasets user shell back to /bin/bash [operations/puppet] - 10https://gerrit.wikimedia.org/r/136734 [09:28:20] (03PS1) 10QChris: Provide default shell to datasets user [operations/puppet] - 10https://gerrit.wikimedia.org/r/136735 (https://bugzilla.wikimedia.org/65978) [09:28:28] You beat me :-) [09:28:31] I did [09:29:06] come on gerrit [09:29:11] er jenkins [09:30:08] (03CR) 10ArielGlenn: [C: 032] set datasets user shell back to /bin/bash [operations/puppet] - 10https://gerrit.wikimedia.org/r/136734 (owner: 10ArielGlenn) [09:30:11] (03Abandoned) 10QChris: Provide default shell to datasets user [operations/puppet] - 10https://gerrit.wikimedia.org/r/136735 (https://bugzilla.wikimedia.org/65978) (owner: 10QChris) [09:32:08] the next run we should be fine, thanks for noticing this [09:32:32] Thanks for the fix. [09:32:45] I'll check in an hour. [09:42:12] (03PS2) 10Whym: FeaturedFeeds for Wiktionary [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136316 (https://bugzilla.wikimedia.org/66015) [09:44:38] (03CR) 10Whym: "MaxSem: thanks for your help! Per bugzilla discussion, "featuredtexts" seems like a better choice over "wotd", so I updated this patch." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136316 (https://bugzilla.wikimedia.org/66015) (owner: 10Whym) [09:45:28] (03CR) 10Whym: "Um, I meant "featuredwords"." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136316 (https://bugzilla.wikimedia.org/66015) (owner: 10Whym) [09:46:32] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [10:28:22] apergos: rsync-ing the datasets works again. Thanks for the quick fix. [10:28:46] thank you for reporting and following up [11:41:34] (03PS1) 10ArielGlenn: log auditing tool plus wrapper script to audit each cluster [operations/software] - 10https://gerrit.wikimedia.org/r/136741 [11:41:37] (03CR) 10jenkins-bot: [V: 04-1] log auditing tool plus wrapper script to audit each cluster [operations/software] - 10https://gerrit.wikimedia.org/r/136741 (owner: 10ArielGlenn) [11:57:21] hi manybubbles [11:57:29] hi! [11:57:42] some complains on search issues for the last 48h. is this known ? [11:58:43] matanya: not to me - are they in irc form? [11:58:47] which wikis? [11:59:21] manybubbles: on he.wiki [11:59:51] matanya: ah, well, in four hours I have a window to switch it from the hebrew analyzer to pure unicode normalization [12:00:13] I'm pretty sure search issues would be caused by the flakiness I observed in the hebrew analyzer [12:00:15] i got those too several times: We could not complete your search due to a temporary problem. Please try again later. [12:00:57] matanya: almost certainly the flakiness - I saw it right before I went to a conference and didn't have the bandwidth to fix it there. 
[12:01:08] well, that isn't entirely true, I didn't have the bandwidth to monitor it after it was fixed [12:01:13] to make sure it didn't get worse [12:01:46] but in either case, in three hour (had my times wrong) I'll push out a path which I'll use to run a script which should fix it [12:02:44] sure, thanks .I guessed there is no point to report a bug for it, since i don't know how to reproduce :) [12:06:22] matanya: I have no problem with mystery bugs - especially if you put timestamps in them [12:06:36] if it happens it happens [12:06:36] ok, so next time :) [12:06:39] yeah! [12:23:22] (03CR) 10Christopher Johnson (WMDE): "Please check new patch 4 for changes in response to review." (0311 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [12:31:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:33:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:34:09] PROBLEM - graphite.wikimedia.org on tungsten is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:35:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:37:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:39:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:41:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:43:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:28:29 UTC [12:43:59] RECOVERY - graphite.wikimedia.org on tungsten is OK: HTTP OK: HTTP/1.1 200 OK - 1607 bytes in 0.002 second response time [12:44:09] RECOVERY - Puppet freshness on mw1191 is OK: puppet ran at Mon Jun 2 12:44:08 UTC 2014 
[12:44:29] !log manually ran puppet on mw11991 [12:44:34] Logged the message, Master [12:46:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:44:08 UTC [12:47:29] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [12:48:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:44:08 UTC [12:48:59] \O/ [12:49:39] RECOVERY - Puppet freshness on mw1191 is OK: OK [12:51:30] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:49:29 UTC [12:52:59] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:49:29 UTC [12:53:29] RECOVERY - Puppet freshness on mw1191 is OK: ok [12:56:29] PROBLEM - Puppet freshness on mw1191 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 12:54:40 UTC [13:03:32] (03PS1) 10Alexandros Kosiaris: Add bgwriter ganglia stats to postgresql [operations/puppet] - 10https://gerrit.wikimedia.org/r/136743 [13:13:00] (03PS1) 10Cmjohnson: granting brion and dr0ptp4kt stat1003 access RT768 and RT7569 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136744 [13:25:54] (03CR) 10Cmjohnson: [C: 032] granting brion and dr0ptp4kt stat1003 access RT768 and RT7569 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136744 (owner: 10Cmjohnson) [13:31:12] (03PS2) 10Alexandros Kosiaris: Add bgwriter ganglia stats to postgresql [operations/puppet] - 10https://gerrit.wikimedia.org/r/136743 [13:33:30] (03CR) 10Alexandros Kosiaris: [C: 032] Add bgwriter ganglia stats to postgresql [operations/puppet] - 10https://gerrit.wikimedia.org/r/136743 (owner: 10Alexandros Kosiaris) [13:34:48] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds [13:38:08] (03CR) 10Christopher Johnson (WMDE): "check_dispatch currently in 
testing at" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136095 (owner: 10Christopher Johnson (WMDE)) [13:55:19] (03CR) 10coren: [C: 032] "Package GET!" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135505 (https://bugzilla.wikimedia.org/61445) (owner: 10Tim Landscheidt) [14:00:48] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected [14:05:14] <_joe_> mmmh let me see why we had this alarm [14:05:41] <_joe_> Coren: I'm out today, but this could be the week for the labs upgrade of the puppetmaster [14:05:53] <_joe_> would you have time for this? [14:06:14] _joe_: Sounds good to me, albeit not today either. Mid-week? [14:06:34] <_joe_> yes! [14:06:40] <_joe_> oh wait, I forget the TZ difference [14:06:50] <_joe_> we can set for thursday [14:07:00] <_joe_> wed I'll have to leave early [14:07:19] Works for me too. [14:12:48] (03PS1) 10Matanya: (bug 66033) Throttle for Library of Israel editathon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136750 [14:16:09] (03PS1) 10coren: Labs: add user_password_expires to replication [operations/software] - 10https://gerrit.wikimedia.org/r/136751 [14:16:37] (03PS2) 10coren: Labs: add user_password_expires to replication [operations/software] - 10https://gerrit.wikimedia.org/r/136751 (https://bugzilla.wikimedia.org/64369) [14:17:40] (03CR) 10coren: [C: 032] "Null column addition; track prod schema" [operations/software] - 10https://gerrit.wikimedia.org/r/136751 (https://bugzilla.wikimedia.org/64369) (owner: 10coren) [14:18:11] (03PS1) 10Rush: irc.wikimedia.org: ekrem => argon [operations/dns] - 10https://gerrit.wikimedia.org/r/136752 [14:19:47] (03PS2) 10Rush: irc.wikimedia.org: ekrem => argon [operations/dns] - 10https://gerrit.wikimedia.org/r/136752 [14:21:54] (03PS1) 10Cmjohnson: disabling/removing users kwang and csalvia from admins, site.pp, data.yaml, icinga contact groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136753 [14:23:47] (03PS5) 
10Rush: send user and channel count to statsd for ircd [operations/puppet] - 10https://gerrit.wikimedia.org/r/135074 [14:23:49] (03PS1) 10Rush: keep rc-pmtpa name for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/136755 [14:23:52] lol [14:24:08] (03CR) 10Rush: [C: 032 V: 032] keep rc-pmtpa name for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/136755 (owner: 10Rush) [14:27:54] (03PS2) 10Rush: keep rc-pmtpa name for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/136755 [14:28:41] (03CR) 10Rush: [V: 032] keep rc-pmtpa name for now [operations/puppet] - 10https://gerrit.wikimedia.org/r/136755 (owner: 10Rush) [14:32:01] (03CR) 10Cmjohnson: [C: 032] disabling/removing users kwang and csalvia from admins, site.pp, data.yaml, icinga contact groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136753 (owner: 10Cmjohnson) [14:34:09] (03CR) 10Manybubbles: [C: 031] "Looks good for inclusion in SWAT in half an hour." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: 10Odder) [14:35:15] (03CR) 10Manybubbles: [C: 031] "Looks good for inclusion in SWAT in half an hour." 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: 10Jean-Frédéric) [14:44:30] (03PS6) 10Rush: send user and channel count to statsd for ircd [operations/puppet] - 10https://gerrit.wikimedia.org/r/135074 [14:44:45] (03PS7) 10Rush: send user and channel count to statsd for ircd [operations/puppet] - 10https://gerrit.wikimedia.org/r/135074 [14:44:57] (03CR) 10Rush: [C: 032 V: 032] "want to get this in before migration" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135074 (owner: 10Rush) [14:48:09] (03PS1) 10Rush: ircd_stats path fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/136756 [14:48:20] (03PS2) 10Rush: ircd_stats path fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/136756 [14:48:25] (03CR) 10Rush: [C: 032 V: 032] ircd_stats path fix [operations/puppet] - 10https://gerrit.wikimedia.org/r/136756 (owner: 10Rush) [14:51:29] !log chown -R datasets /data/xmldatadumps/public/other/pagecounts-ez on dataset1001 to accompany 70a7f61, fixing bug 66005 [14:51:34] Logged the message, Master [14:56:05] (03Abandoned) 10Manybubbles: Install kuromoji [operations/software/elasticsearch/plugins] - 10https://gerrit.wikimedia.org/r/133833 (owner: 10Manybubbles) [14:57:14] (03PS1) 10BBlack: txqueuelen 10K for 10Gbps LVS interfaces [operations/puppet] - 10https://gerrit.wikimedia.org/r/136758 [15:00:04] manybubbles, anomie: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140602T1500) [15:00:27] manybubbles: You going to handle SWAT today? [15:00:35] anomie: yeah! [15:00:46] new bot? [15:01:08] twkozlowski: on for me to deploy your changes? 
[15:01:49] (03CR) 10Manybubbles: [C: 032] Change banned Elasticsearch plugins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134585 (owner: 10Manybubbles) [15:01:57] (03CR) 10Manybubbles: [C: 032] Reduce enwiki to 6 Elasticsearch shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135768 (owner: 10Manybubbles) [15:02:21] * manybubbles has the conch [15:03:31] https://gerrit.wikimedia.org/r/#/c/134400/ is also ready fyi ;) [15:04:00] (03CR) 10RobH: [C: 031] "Looks good to me, leaving for Chase to merge as noted by commit message." [operations/dns] - 10https://gerrit.wikimedia.org/r/136752 (owner: 10Rush) [15:05:51] anomie: have you grokked https://gerrit.wikimedia.org/r/#/c/134400/ ? I haven't and it non-trivial enough in size that it deserves some love. [15:08:14] manybubbles: It *seems* ok, but needs a +1 from someone familiar with stuff (which might be me, later when I have time to test) before I'd SWAT it. [15:08:14] hashar: looks like zuul is a bit stuck? [15:08:50] anomie: fair. I'm pretty sure I'm not the appropriate person to +1 it so if that is you then cool. [15:08:59] Or Nemo_bis could visit Special:ListGroupRights on all those wikis and verify it works after deploy. :) [15:09:45] manybubbles: Plenty of others in mwcore who could review it too. But since it doesn't have that review, I'd say it's not SWATtable right now. [15:10:05] (No urgency.) [15:10:21] manybubbles: grrr [15:10:45] manybubbles: yeah it is. Workaround is to unpool the labs slaves manually in jenkins. doing so [15:13:30] (03CR) 10BBlack: [C: 032 V: 032] txqueuelen 10K for 10Gbps LVS interfaces [operations/puppet] - 10https://gerrit.wikimedia.org/r/136758 (owner: 10BBlack) [15:16:46] (03PS1) 10BBlack: Add path param for txqueuelen check [operations/puppet] - 10https://gerrit.wikimedia.org/r/136761 [15:17:50] hashar: should I kick something to unstick it? 
[15:19:17] well the Ui is unresponsive :( [15:19:34] * hashar blames SF waking up [15:20:15] !log Jenkins/Zuul stuck. Depooling/Repooling some slaves to reregister jobs with Zuul [15:20:20] Logged the message, Master [15:22:40] (03CR) 10BBlack: [C: 032 V: 032] Add path param for txqueuelen check [operations/puppet] - 10https://gerrit.wikimedia.org/r/136761 (owner: 10BBlack) [15:25:08] !log Zuul stuck in a loop reporting a change :-( [15:25:13] Logged the message, Master [15:26:04] !log restarted Zuul. All jobs lists :-( [15:26:09] Logged the message, Master [15:28:27] (03CR) 10Manybubbles: "recheck." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134585 (owner: 10Manybubbles) [15:28:33] (03CR) 10Manybubbles: "recheck" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135768 (owner: 10Manybubbles) [15:29:40] (03CR) 10Manybubbles: [C: 031] Change banned Elasticsearch plugins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134585 (owner: 10Manybubbles) [15:29:43] (03CR) 10Manybubbles: [C: 032] Change banned Elasticsearch plugins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134585 (owner: 10Manybubbles) [15:29:49] (03CR) 10Manybubbles: [C: 031] Reduce enwiki to 6 Elasticsearch shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135768 (owner: 10Manybubbles) [15:29:53] (03CR) 10Manybubbles: [C: 032] Reduce enwiki to 6 Elasticsearch shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135768 (owner: 10Manybubbles) [15:30:34] manybubbles: fixed :) [15:30:39] tanks! 
[15:31:04] I have to find the cause of the loop now :-( [15:31:05] (03Merged) 10jenkins-bot: Change banned Elasticsearch plugins [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/134585 (owner: 10Manybubbles) [15:31:10] (03Merged) 10jenkins-bot: Reduce enwiki to 6 Elasticsearch shards [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/135768 (owner: 10Manybubbles) [15:33:11] !log manybubbles Synchronized wmf-config/: SWAT deploy changing some search settings (duration: 00m 05s) [15:33:11] !log attempting to powercycle analytics1015, it is not responding to pings, no output on console [15:33:11] Logged the message, Master [15:33:14] Logged the message, Master [15:33:17] twkozlowski: around for you swat deploy? [15:33:51] (03PS5) 10BBlack: fix for faulty BGP session collisions [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/134833 [15:34:10] manybubbles: Yes. [15:34:13] !log reindexing all hebrew wikis to switch them from the hebrew analyzer to proper unicode normalization [15:34:18] Logged the message, Master [15:34:24] (03CR) 10Manybubbles: [C: 032] Add wgImportSources entries for Wikimedia Poland wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: 10Odder) [15:34:32] (03Merged) 10jenkins-bot: Add wgImportSources entries for Wikimedia Poland wiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136291 (https://bugzilla.wikimedia.org/65908) (owner: 10Odder) [15:34:39] (03PS2) 10Manybubbles: Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: 10Jean-Frédéric) [15:34:50] (03CR) 10BBlack: "PS5 removes all of the releaseResources bits and just handles the actual issue with the in/out connections lists" [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/134833 (owner: 10BBlack) [15:35:02] (03CR) 10Manybubbles: [C: 
032] Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: 10Jean-Frédéric) [15:35:08] (03Merged) 10jenkins-bot: Add French Ministry for Culture to wgCopyUploadsDomains [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136154 (https://bugzilla.wikimedia.org/65905) (owner: 10Jean-Frédéric) [15:36:07] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms [15:36:51] !log manybubbles Synchronized wmf-config/: SWAT deploy - more import sources and upload domains (duration: 00m 04s) [15:36:56] Logged the message, Master [15:36:59] twkozlowski: you are live [15:37:15] I confirm. [15:37:17] :-D [15:40:07] RECOVERY - Disk space on analytics1022 is OK: DISK OK [15:40:16] thanks for the configm [15:40:24] * manybubbles puts down the conch [15:44:27] paravoid: should we schedule a weekly swift check in so that we're forced to actually budget time and think about that on a regular basis? (Since I have thought about it ~ not at all so far) [15:45:24] I've love to help you get up to speed, but do you think that we could do it at least partly offline? [15:45:35] I have enough meetings (for my taste) as it is :) [15:45:57] no need for a call or anything, just a habitual reminder to think about it is all I need :) [15:46:07] Having a scheduled time might prevent procrastination though [15:46:16] oh yeah, sure [15:46:34] <^d> Hmm, swift. We need that in labs still. [15:46:37] it's time for your weekly "feel bad about swift" moment! [15:46:43] please take a moment to feel bad about swift [15:46:46] ori: Yep, that's pretty much the idea. [15:47:07] ^d: yeah one of the things we discussed was andrewbogott setting it up in labs [15:47:16] as a learning experience that will have a useful outcome :) [15:47:36] ok, maybe I'll work on that this afternoon. 
[15:47:47] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [15:48:03] I have a long todo of items for swift, the one I was telling you in Zurich [15:48:13] I'll make to record it all somewhere [15:48:15] wikitech probably [15:48:28] paravoid: Sounds good. And, just pick a 10-minute window that you have free (if there are any of those left) and send me an invite. [15:51:17] (03PS1) 10Ottomata: Not saving Kafka jmx metrics via jmxtrans to an outfile on disk [operations/puppet] - 10https://gerrit.wikimedia.org/r/136762 [15:51:33] (03PS2) 10Ottomata: Not saving Kafka jmx metrics via jmxtrans to an outfile on disk [operations/puppet] - 10https://gerrit.wikimedia.org/r/136762 [15:53:28] ^d, hashar, paravoid, should 'swift in labs' live within the beta project? Or elsewhere? [15:53:46] There's a fossilized 'swift' project already set up but no one is maintaining it and the instances are all mothballed. [15:53:51] andrewbogott: I think that project got setup to test out stage swift itself [15:53:59] we were discussing the possibility of hosting an instance externally [15:54:05] i.e. in production, integrated with keystone [15:54:18] ah, as a service for labs rather than... [15:54:20] basically providing the "labstore" service, only with swift [15:54:25] ok, then where i test it is probably moot :) [15:54:26] for the beta cluster mark told us recently there was a plan to provide a dedicated swift cluster on labs. 
Probably on baremetal [15:54:30] we were having that discussion in zurich if you recall ;) [15:54:34] what faidon said :) [15:54:48] yes, I recall, just didn't put that conversation together with this one for some reason :) [15:54:48] (03CR) 10Ottomata: [C: 032 V: 032] Not saving Kafka jmx metrics via jmxtrans to an outfile on disk [operations/puppet] - 10https://gerrit.wikimedia.org/r/136762 (owner: 10Ottomata) [15:55:00] :) [15:55:17] Anyway, I'll learn how to build the service and then worry about hardware. [15:57:07] paravoid: what does 'fe' and 'be' represent in ms-*? [15:57:13] I presume ms is mediastore [15:57:28] frontend (proxy) and backend (container/object/account server etc.) [15:58:12] so the frontend has, like, caches and 404-handler and such? Or are those terms that are meaningful within swift itself? [15:58:30] * andrewbogott could just read the code [15:58:45] it doesn't cache, but it routes the request to the appropriate backend(s) [15:58:55] and runs the 404 handler too, yes [15:58:58] ok [16:00:18] And things like 'ms1001.wikimedia.org' that are neither fe or be… unrelated to swift entirely? (Those look to be pretty much unpuppetized) [16:00:25] correct :) [16:00:33] swift is just fe/be [16:00:39] great [16:01:13] not to be confused with fubu [16:01:27] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 120.866669 [16:01:47] * ori will crack terrible jokes until all his patches are merged [16:04:17] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:04:33] matanya: fix pushed to all hebrew wikis [16:04:42] <^d> paravoid: On the subject of swift, do you think we could get rt 7418 resolved (new key for elastic snapshots)? Would like to test it on the testwikis. 
[16:04:57] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 33.599998 [16:05:01] I haven't resolved it on purpose, as I don't want any more data in swift yet [16:05:17] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 16.799999 [16:05:17] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 50.400002 [16:05:31] http://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&m=part_max_used&s=by+name&c=Swift+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 [16:05:47] see the downward slope [16:06:03] <^d> Hmm [16:06:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:06:09] oooo [16:06:20] first time in like a month! [16:06:20] also see the month view [16:07:17] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 206.766663 [16:07:41] !log rebuilding all english non-wikipedias with unicode normalization [16:07:46] Logged the message, Master [16:08:17] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:08:17] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:10:07] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 58.799999 [16:10:51] <^d> paravoid: Oh, were we still waiting for all the servers to get fully in rotation? [16:10:51] <^d> Or something else? 
[16:10:51] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 314.766663 [16:10:51] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 315.566681 [16:10:51] (03CR) 10BBlack: [C: 04-1] "I'm still digging through this, there's a lot in this patch. But one simple thing I noticed so far: "record" in geoip_lookup() and geoip" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) (owner: 10Ori.livneh) [16:11:17] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 284.466675 [16:11:17] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 352.799988 [16:11:30] ah the occasional zookeeper timeout -> leader election :/ [16:12:01] _joe__: just introducing you to nuria, one of the analytics devs, she's looking at your graphite anomaly detection script [16:12:21] hola __joe__ [16:12:47] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1081.766724 [16:14:47] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:15:12] greg-g, what would be the proper procedure to push this patch? 
https://gerrit.wikimedia.org/r/#/c/136503/ [16:15:17] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:15:17] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:15:24] it deploys new extensions to the betalabs [16:15:47] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:15:47] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:15:57] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:16:17] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:18:27] PROBLEM - Varnishkafka Delivery Errors on cp3022 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2109.166748 [16:18:47] PROBLEM - Varnishkafka Delivery Errors on cp1056 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 362.333344 [16:18:47] PROBLEM - Varnishkafka Delivery Errors on cp1069 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 208.666672 [16:18:59] (03CR) 10Ori.livneh: "@BBlack: cool, will amend. 
I also noticed this on rereview: geo_sanitize_for_cookie(geo_get_top_cookie_domain(host)) can modify host, whic" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) (owner: 10Ori.livneh) [16:19:17] PROBLEM - Varnishkafka Delivery Errors on cp1070 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3219.899902 [16:19:17] PROBLEM - Varnishkafka Delivery Errors on cp3019 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 622.466675 [16:19:47] PROBLEM - Varnishkafka Delivery Errors on cp1057 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 2288.266602 [16:20:07] PROBLEM - Varnishkafka Delivery Errors on cp3020 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 3940.199951 [16:20:17] PROBLEM - Varnishkafka Delivery Errors on cp3021 is CRITICAL: kafka.varnishkafka.kafka_drerr.per_second CRITICAL: 1206.400024 [16:20:27] RECOVERY - Varnishkafka Delivery Errors on cp3022 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:21:47] RECOVERY - Varnishkafka Delivery Errors on cp1069 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:22:21] yurikR: JFDI, mostly [16:22:26] thanks manybubbles [16:22:27] RECOVERY - Varnishkafka Delivery Errors on cp3019 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:22:45] matanya: sorry to leave it broken so long - let me know if it isn't fully healed [16:22:47] RECOVERY - Varnishkafka Delivery Errors on cp1057 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:22:47] RECOVERY - Varnishkafka Delivery Errors on cp1056 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:22:59] ^d: https://gerrit.wikimedia.org/r/#/c/136750/ please ? 
[16:23:04] will do manybubbles thanks for all those efforts [16:23:07] RECOVERY - Varnishkafka Delivery Errors on cp3020 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:23:23] RECOVERY - Varnishkafka Delivery Errors on cp1070 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:24:02] (03CR) 10Chad: [C: 032] (bug 66033) Throttle for Library of Israel editathon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136750 (owner: 10Matanya) [16:24:10] (03Merged) 10jenkins-bot: (bug 66033) Throttle for Library of Israel editathon [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136750 (owner: 10Matanya) [16:24:43] !log demon Synchronized wmf-config/throttle.php: Library of Israel editathon (duration: 00m 04s) [16:24:47] <^d> matanya: Done. [16:24:48] Logged the message, Master [16:24:57] thanks a lot ^d [16:25:06] <^d> yw [16:26:01] oh, wow [16:26:13] sync-file now shows duration :P [16:26:21] :) [16:27:08] (03PS1) 10BBlack: only re-apply RPS on change [operations/puppet] - 10https://gerrit.wikimedia.org/r/136763 [16:27:17] RECOVERY - Varnishkafka Delivery Errors on cp3021 is OK: kafka.varnishkafka.kafka_drerr.per_second OKAY: 0.0 [16:27:35] !log ran preferred-replica-election to fix vk delivery errors [16:27:40] Logged the message, Master [16:33:27] yeah, sync has gotten nicer [16:33:34] bd808: I assume you are to thank [16:33:35] (03PS3) 10Ori.livneh: Make GeoIP lookup code safer [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) [16:36:34] manybubbles: he is! 
[16:37:53] (03CR) 10BBlack: [C: 032 V: 032] only re-apply RPS on change [operations/puppet] - 10https://gerrit.wikimedia.org/r/136763 (owner: 10BBlack) [16:38:05] (03CR) 10Ori.livneh: "haven't tested this yet, will stage this on labs later" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) (owner: 10Ori.livneh) [16:38:08] (03PS1) 10BBlack: Only run ifup for tagged if something changed [operations/puppet] - 10https://gerrit.wikimedia.org/r/136764 [16:39:05] ori: have you seen my question from this morning? [16:39:12] matanya: nope [16:39:18] but running to a dentist appt right now! [16:39:24] PM me and i'll see it for sure when i get back [16:39:37] sure, thanks, good luck there [16:40:30] (03CR) 10BBlack: [C: 032 V: 032] Only run ifup for tagged if something changed [operations/puppet] - 10https://gerrit.wikimedia.org/r/136764 (owner: 10BBlack) [16:41:39] <_joe__> ori: rcstream, tomorrow I'll set up lvs, ok? [16:42:12] <_joe__> oh he is at the dentist, which is usually unpleasant [16:42:30] <_joe__> this will cheer him up at least :) [16:43:34] (03CR) 10Jforrester: [C: 04-1] "The wider "Add Sister Projects sidebar Beta Feature to whitelist" commit is presumably outstanding? That will need some careful work to al" (031 comment) [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [16:45:28] _joe__: looking for labs puppet 3 help ? [16:46:00] <_joe__> matanya: yes, I need someone with root access though :( [16:46:23] will never have that, sorry [16:48:39] <_joe__> matanya: also, Coren already volunteered [16:48:44] <_joe__> :D [16:48:57] great, way better than i can do [16:49:46] which reminds me... [16:50:03] chasemp: i should abandon my sudo module now, right ? 
[16:50:09] (03PS13) 10Yuvipanda: toollabs: Initial work for the mongo role [operations/puppet] - 10https://gerrit.wikimedia.org/r/135442 [16:50:11] (03PS13) 10Yuvipanda: mongo: Support newer yaml style configuration [operations/puppet] - 10https://gerrit.wikimedia.org/r/135499 [16:50:24] matanya: seems reasonable [16:50:29] greg-g, seems like the schedule is pretty empty today, could i push https://gerrit.wikimedia.org/r/#/c/136503/ after chasemp (their window ends at 11 pdt and gwicke doesn't start until 13 pdt) [16:51:00] ori: updated the mongodb nit, do plus two when you can [16:51:00] (03Abandoned) 10Matanya: sudo: convert into a module [operations/puppet] - 10https://gerrit.wikimedia.org/r/111189 (owner: 10Matanya) [16:52:31] (03PS2) 10Tpt: Sets otherProjectsLinksByDefault to true for fr and it wikisources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 [16:54:20] (03PS2) 10Matanya: hafnium: add firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/134304 [16:55:11] (03PS3) 10Tpt: Sets otherProjectsLinksByDefault to true for fr and it wikisources. [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 [16:55:36] (03PS5) 10Matanya: dns recurses: add firewll [operations/puppet] - 10https://gerrit.wikimedia.org/r/133515 [16:57:51] (03CR) 10Matanya: "Anybody home? this patch is waiting for a very long time in the queue. Can one please comment/merge/abandon?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/107567 (owner: 10Matanya) [16:58:03] (03CR) 10Tpt: "@Jforrester" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [16:58:57] (03CR) 10Jforrester: [C: 031] Sets otherProjectsLinksByDefault to true for fr and it wikisources. 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/132611 (owner: 10Tpt) [16:59:16] (03Abandoned) 10Matanya: coredb_mysql: puppet 3 compatibility fix: fully qualify variable [operations/puppet] - 10https://gerrit.wikimedia.org/r/108313 (owner: 10Matanya) [17:00:04] chasemp: The time is nigh to deploy Ops (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140602T1700) [17:00:06] (03CR) 10Matanya: "Alex, can you please merge your fixes to the manifest into this module, or is it too much work?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/108498 (owner: 10Matanya) [17:00:29] (03CR) 10Rush: [C: 032 V: 032] irc.wikimedia.org: ekrem => argon [operations/dns] - 10https://gerrit.wikimedia.org/r/136752 (owner: 10Rush) [17:01:40] (03CR) 10Alexandros Kosiaris: "It probably is going to be difficult, but I can at least try." [operations/puppet] - 10https://gerrit.wikimedia.org/r/108498 (owner: 10Matanya) [17:02:37] (03CR) 10Matanya: "andrew? any further comment/reply to the question ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [17:03:53] !log moving irc.wikimedia.org to argon [17:03:58] Logged the message, Master [17:04:46] (03CR) 10Chad: [C: 032] Lower search suggestion cache to 3 hours [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136449 (owner: 10Chad) [17:04:51] (03CR) 10Andrew Bogott: "yes, sorry, commented inline." 
(031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [17:04:53] (03Merged) 10jenkins-bot: Lower search suggestion cache to 3 hours [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136449 (owner: 10Chad) [17:05:18] !log demon Synchronized wmf-config/InitialiseSettings.php: Cache search suggestions for 3 hours instead of 6 (duration: 00m 04s) [17:05:22] Logged the message, Master [17:12:21] (03PS5) 10Matanya: webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 [17:13:58] (03CR) 10Matanya: "Is my interpretation above correct ?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/117698 (owner: 10Matanya) [17:15:25] !log disabling ircd on ekrem [17:15:30] Logged the message, Master [17:17:37] (03CR) 10Andrew Bogott: [C: 04-1] webserver: fixing duplicate declaration of apache-mpm (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [17:20:36] (03PS2) 10Matanya: sunfire hosts decommed, removed from dns [operations/dns] - 10https://gerrit.wikimedia.org/r/118480 [17:22:32] (03PS6) 10Matanya: webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 [17:23:36] (03CR) 10Andrew Bogott: [C: 031] webserver: fixing duplicate declaration of apache-mpm [operations/puppet] - 10https://gerrit.wikimedia.org/r/112423 (owner: 10Matanya) [17:26:40] (03PS1) 10Rush: ircd stats enabled vs enable [operations/puppet] - 10https://gerrit.wikimedia.org/r/136771 [17:26:54] (03CR) 10Rush: [C: 032 V: 032] ircd stats enabled vs enable [operations/puppet] - 10https://gerrit.wikimedia.org/r/136771 (owner: 10Rush) [17:29:51] (03CR) 10Rush: [C: 031] "seems like there are no refs to admins::ldap" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 (owner: 10Dzahn) [17:51:12] yurikR: hey, how soon do you need that new extension stuff in beta cluster? 
[17:56:40] (03PS7) 10Rush: admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 [17:56:48] (03CR) 10MaxSem: "Eh, I actually already committed the message names from PS1 to FeturedFeeds proper. Thoughts?" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136316 (https://bugzilla.wikimedia.org/66015) (owner: 10Whym) [17:59:12] Coren: heya! can you look at https://gerrit.wikimedia.org/r/#/c/135442/ and https://gerrit.wikimedia.org/r/#/c/135442/ [18:01:30] (03CR) 10BBlack: [C: 032 V: 032] fix for faulty BGP session collisions [operations/debs/pybal] - 10https://gerrit.wikimedia.org/r/134833 (owner: 10BBlack) [18:01:37] (03PS5) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [18:01:39] (03PS1) 10Rush: ensure csalvia absent [operations/puppet] - 10https://gerrit.wikimedia.org/r/136790 [18:08:47] PROBLEM - twemproxy process on mw1053 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [18:08:47] PROBLEM - Apache HTTP on mw1053 is CRITICAL: Connection refused [18:09:08] PROBLEM - twemproxy port on mw1053 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker [18:10:19] (03CR) 10GWicke: [C: 031] "I was wondering if we could set this up in a way that makes the sources.list line compatible with either reprepo or mini-dinstall, but it " [operations/puppet] - 10https://gerrit.wikimedia.org/r/136128 (owner: 10Filippo Giunchedi) [18:10:40] ah, 1053 has been like that for 4 days [18:10:45] (thanks mutante|away ) [18:11:43] csteipp: please answer https://gerrit.wikimedia.org/r/132393 [18:12:35] jzerebecki: I will get to it as soon as I can [18:12:58] thx [18:15:35] mw1053 had a new hard drive installed but not added back because of the mw-sync issues last week. is that fixed? 
[18:15:52] same with mw1151 [18:16:16] waiting to revert this https://gerrit.wikimedia.org/r/#/c/136238/ [18:18:08] ACKNOWLEDGEMENT - Apache HTTP on mw1053 is CRITICAL: Connection refused daniel_zahn sync problems after reinstall #7408 [18:18:08] ACKNOWLEDGEMENT - twemproxy port on mw1053 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker daniel_zahn sync problems after reinstall #7408 [18:18:08] ACKNOWLEDGEMENT - twemproxy process on mw1053 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 65534 (nobody), command name nutcracker daniel_zahn sync problems after reinstall #7408 [18:28:48] does flow team know about fatals? [18:30:32] manybubbles: in what way? [18:31:07] manybubbles: aka: is there an issue they need to address? [18:31:19] greg-g: https://logstash.wikimedia.org/#/dashboard/elasticsearch/fatalmonitor has a bunch of fatals from something in flow - not sure what the use effect is [18:31:35] PHP Warning: array_merge() [function.array-merge]: Argument #1 is not an array in /usr/local/apache/common-local/php-1.24wmf7/extensions/Flow/includes/ReferenceClarifier.php on line 71 [18:31:41] maybe not fatals, actually [18:31:42] just warning [18:31:53] * greg-g nods [18:31:57] all warnings, but a pretty sizeable number of them [18:32:04] ebernhardson, ^^ [18:32:48] i'm just getting back from 3 weeks off, but i think that will be fixed by todays backport [18:33:16] k, I just pinged in -corefeatures [18:33:30] well, person-less/general ping [18:36:03] greg-g, would it be ok to push config changes that i proposed? [18:37:02] yurikR: so, max and I were thinking.... there's this deployment training session on Wednesday, can you wait until then? It'd be a good example of how to do new extension in beta cluster. [18:37:15] sure [18:37:16] If you need it before then.... [18:37:26] cool, then you can just have max do it for you :) [18:37:26] greg-g, well, i wanted to start testing [18:37:29] but it could wait [18:37:33] sure? 
[18:38:35] yeah, lets wait, i might learn something ) [18:39:08] hehe [18:39:53] greg-g: heya! when we first deployed MobileApp, you wanted me to let you know if it starts having any significant PHP code. It does have a small bit of PHP now (rather trivial code, adds edit tags) https://gerrit.wikimedia.org/r/#/c/135988/. just a fyi [18:40:02] MaxSem ^^ [18:41:08] (03PS4) 10Ori.livneh: Make GeoIP lookup code safer [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) [18:47:13] yurikR, cool [18:48:47] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [18:48:48] (03CR) 10Ori.livneh: "staged on labs (deployment-cache-text02, serving beta cluster text)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136655 (https://bugzilla.wikimedia.org/64582) (owner: 10Ori.livneh) [18:50:36] * bd808 notices that people noticed some of the sync-* changes and is happy [18:51:45] !log csteipp Synchronized php-1.24wmf7/includes/upload/UploadBase.php: (no message) (duration: 00m 06s) [18:51:51] Logged the message, Master [18:53:54] cajoel: peerings are activated [18:58:20] !log csteipp Synchronized php-1.24wmf6/includes/upload/UploadBase.php: (no message) (duration: 00m 04s) [18:58:25] Logged the message, Master [19:03:14] (03CR) 10Faidon Liambotis: "Ori, let's remove logged_daemon etc. per the plan we discussed, to be introduced at some point later via a different module?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [19:03:19] (03CR) 10Rush: [C: 032 V: 032] admin yaml stat* and analytics fixups [operations/puppet] - 10https://gerrit.wikimedia.org/r/136426 (owner: 10Rush) [19:03:32] paravoid: sure, submitting an update now [19:04:00] (03CR) 10Faidon Liambotis: [C: 032] "As much as I hate this..." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136251 (owner: 10Ori.livneh) [19:04:35] (03CR) 10Faidon Liambotis: [C: 032] mediawiki::web: Remove $::lsbdistrelease guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/136344 (owner: 10Ori.livneh) [19:05:17] PROBLEM - Varnish HTTP upload-frontend on cp3003 is CRITICAL: Connection timed out [19:05:38] (03PS7) 10Ori.livneh: Add rsyslog module and port existing usage [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 [19:06:17] (03Abandoned) 10Ori.livneh: Add 'deployment' service alias [operations/dns] - 10https://gerrit.wikimedia.org/r/136227 (owner: 10Ori.livneh) [19:06:17] RECOVERY - Varnish HTTP upload-frontend on cp3003 is OK: HTTP OK: HTTP/1.1 200 OK - 308 bytes in 6.875 second response time [19:06:19] (03PS2) 10Faidon Liambotis: Get rid of role::mediawiki::appserver::test [operations/puppet] - 10https://gerrit.wikimedia.org/r/136346 (owner: 10Ori.livneh) [19:06:37] PROBLEM - Host upload-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [19:06:42] (03CR) 10Faidon Liambotis: [C: 032] "Technically there's also a difference in the system role text, but wth :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136346 (owner: 10Ori.livneh) [19:06:49] (03CR) 10Faidon Liambotis: [V: 032] "Technically there's also a difference in the system role text, but wth :)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136346 (owner: 10Ori.livneh) [19:07:04] wooooo [19:07:25] what's with cp3003? 
[19:07:27] RECOVERY - Host upload-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 106.09 ms [19:07:34] (03CR) 10Chad: [C: 031] monitoring: monitor mediawiki jobs [operations/puppet] - 10https://gerrit.wikimedia.org/r/136292 (owner: 10Giuseppe Lavagetto) [19:07:50] [33232745.341830] Out of socket memory [19:08:22] something's really bad [19:08:26] (03CR) 10Ori.livneh: monitoring: monitor mediawiki jobs (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136292 (owner: 10Giuseppe Lavagetto) [19:08:47] mobile caches esams is also on an upwards slope [19:09:07] <_joe_> hey [19:09:11] csteipp's sync? [19:09:11] <_joe_> just got paged [19:10:12] what was csteipp's sync? [19:10:36] (03PS1) 10Rush: absent csalvia admin yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136800 [19:10:43] paravoid: Is there a way I can see the members of ldap/wmf so I can raise RT tickets for people missing from it? [19:10:54] James_F: not now, outage [19:11:00] Oh, sure. Sorry. [19:11:00] (03CR) 10Rush: [C: 032 V: 032] absent csalvia admin yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136800 (owner: 10Rush) [19:11:10] <_joe_> I doubt that could create direct problems just to esams varnishes [19:11:49] (03Abandoned) 10Rush: ensure csalvia absent [operations/puppet] - 10https://gerrit.wikimedia.org/r/136790 (owner: 10Rush) [19:13:12] James_F: i'll get you a list in a few [19:13:24] mutante: Thanks! [19:13:34] paravoid: traffic through upload lb seems a little higher than normal, but I'm at a loss so far .. 
[19:13:53] (03CR) 10Ottomata: [C: 031] Add service check for rcstream backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/136502 (owner: 10Ori.livneh) [19:14:09] I distinctly remember dealing with something like this before [19:14:09] but I don't remember anything about it anymore [19:14:40] (03CR) 10Ottomata: [C: 031] rcstream: add 'stream' subcommand to rcstreamctl [operations/puppet] - 10https://gerrit.wikimedia.org/r/136622 (owner: 10Ori.livneh) [19:14:55] (03PS2) 10Ori.livneh: Add service check for rcstream backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/136502 [19:15:01] (03CR) 10Ori.livneh: [C: 032 V: 032] Add service check for rcstream backends [operations/puppet] - 10https://gerrit.wikimedia.org/r/136502 (owner: 10Ori.livneh) [19:15:12] <_joe_> are we still down? doesn't seem the case to me [19:15:13] seems better [19:15:55] (03CR) 10Ottomata: [C: 031] rcstream: enroll in ganglia; add system role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136549 (owner: 10Ori.livneh) [19:15:56] Jun 2 19:11:29 cp3003 kernel: [33232974.378174] Out of socket memory [19:15:57] something external maybe? [19:15:59] was the last one [19:16:01] yeah could be [19:16:04] http://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Upload%20caches%20esams&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1401736534&g=network_report&z=large [19:16:13] also see http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Mobile+caches+esams&m=cpu_report&s=by+name&mc=2&g=network_report [19:16:18] (03PS2) 10Ori.livneh: rcstream: enroll in ganglia; add system role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136549 [19:16:23] (03CR) 10Ori.livneh: [C: 032 V: 032] rcstream: enroll in ganglia; add system role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136549 (owner: 10Ori.livneh) [19:16:35] <_joe_> bblack: that does not make sense [19:16:44] <_joe_> ori: why ganglia? 
[19:16:52] <_joe_> I strongly disagree :) [19:17:10] what it's external doesn't make sense? [19:17:19] _joe_: i added a diamond module too [19:17:21] (03CR) 10Ottomata: Add resource for server metrics site (031 comment) [operations/puppet/nginx] - 10https://gerrit.wikimedia.org/r/136550 (owner: 10Ori.livneh) [19:17:23] <_joe_> bblack: no the network vs cpu [19:17:23] _joe_: but ganglia is still useful [19:17:44] <_joe_> ori: for old services maybe, please don't duplicate things [19:18:04] also, re: out of socket mem, for LVS I'm looking at raising vm.min_free_kbytes in some kind of programmatic way, which is related. Might be helpful on varnishes as well. [19:18:07] (03CR) 10Ori.livneh: Add resource for server metrics site (031 comment) [operations/puppet/nginx] - 10https://gerrit.wikimedia.org/r/136550 (owner: 10Ori.livneh) [19:18:09] http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Bits+caches+esams&m=cpu_report&s=by+name&mc=2&g=network_report as well [19:19:09] (03CR) 10Ottomata: [C: 031] "Ok, if you say so! 
:)" [operations/puppet/nginx] - 10https://gerrit.wikimedia.org/r/136550 (owner: 10Ori.livneh) [19:19:13] so, bits, mobile and upload got a network spike [19:19:15] (03CR) 10Ottomata: [C: 031] diamond: add diamond::collector::nginx resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136551 (owner: 10Ori.livneh) [19:19:40] but not text [19:19:48] https://graphite.wikimedia.org/render/?title=Wiki%20Pageviews/min%20Holt%20Winters%20Forecast%20-1hours&from=-1hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=alias%28color%28holtWintersForecast%28reqstats.pageviews%29,%22green%22%29,%22pageview/min%20Forecast%22%29&target=alias%28dashed%28color%28holtWintersConfidenceBands%28reqstats.pageviews%29,%22black%22%29%29,%22pageview/min%2 [19:19:58] pageview/min [19:19:58] (03CR) 10Ori.livneh: [C: 032] Add resource for server metrics site [operations/puppet/nginx] - 10https://gerrit.wikimedia.org/r/136550 (owner: 10Ori.livneh) [19:20:07] from 2450 to 2700 [19:20:33] text too, just lower [19:20:41] less of change, but still apparent [19:20:44] global event? [19:20:48] google doodle or whatever [19:20:59] log digging time [19:21:06] yeah, but [19:21:13] (03CR) 10Ottomata: [C: 031] Add custom Diamond collector for RCStream [operations/puppet] - 10https://gerrit.wikimedia.org/r/136621 (owner: 10Ori.livneh) [19:21:25] bits+upload+mobile are all amslvs2/4 lvs3002/4, right? [19:21:29] that's the other connection [19:21:37] but we only moved the upload IP, right? [19:21:49] we have a pageview/min spike, we're far higher in layers than lvs :) [19:22:27] well, it's what's on my brain, and the traffic that spiked exactly correlates with BGP stuff being messed for lvs[24].esams [19:22:47] (03PS1) 10Ori.livneh: Fix /a/common duplicate def'n [operations/puppet] - 10https://gerrit.wikimedia.org/r/136807 [19:22:52] messed how? 
[19:23:13] mess with in general, as in it's something that's been changed recently in general [19:23:28] there's nothing specific going on, I haven't touched those hosts since before the meeting [19:25:01] (03PS1) 10Ori.livneh: Update nginx submodule for I9450b8cf0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136810 [19:26:12] (03Abandoned) 10Ori.livneh: Update nginx submodule for I9450b8cf0 [operations/puppet] - 10https://gerrit.wikimedia.org/r/136810 (owner: 10Ori.livneh) [19:26:54] "Indien", "Mexiko", "Standard_time_zones_of_the_world.png", "Zeitzone" [19:27:11] 7 of the top 10 [19:29:23] there was some update to ticketing for mexico vs portual game today? [19:29:27] portugal [19:30:04] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I think this is simple and well thought, just need some robustness." (033 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136621 (owner: 10Ori.livneh) [19:30:20] a search with a referer of "how many timezones in india" in German [19:30:48] (03PS2) 10Ori.livneh: diamond: add diamond::collector::nginx resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136551 [19:31:40] On WWM they're asking about a country with 3 time zones. Since India is my personal hobbyhorse, of course I knew it right away! [19:31:52] tweet from 31 minutes ago [19:32:08] WWM = Wer wird Millionär [19:32:30] aka Who Wants to Be a Millionaire [19:32:54] http://commons.wikimedia.org/wiki/File:Standard_time_zones_of_the_world.png is number 2 on the list [19:33:38] <_joe_> oh so it's a tv-quiz induced swarm of requests? [19:33:42] yes. 
[19:33:45] haha [19:33:59] (03CR) 10Dzahn: [C: 031] admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 (owner: 10Rush) [19:34:04] another one coming in just now [19:34:11] https://graphite.wikimedia.org/render/?title=Wiki%20Pageviews/min%20Holt%20Winters%20Forecast%20-1hours&from=-1hours&width=1024&height=500&until=now&areaMode=none&hideLegend=false&lineWidth=1&lineMode=connected&target=alias%28color%28holtWintersForecast%28reqstats.pageviews%29,%22green%22%29,%22pageview/min%20Forecast%22%29&target=alias%28dashed%28color%28holtWintersConfidenceBands%28reqstats.pageviews%29,%22black%22%29%29,%22pageview/min%2 [19:34:11] traffic's still rising, it's proving to be interesting for tweaking lvs3002 anyways [19:35:34] (03PS3) 10Dzahn: rm admins::ldap, replaced by ldap-admins yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 [19:37:25] <_joe_> bblack: well then it's good :) [19:37:26] (03CR) 10Rush: [C: 031] "chasemp: so you want to fix prod and leave labs funky and tbd labs stuff that would be broken by this?" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136807 (owner: 10Ori.livneh) [19:38:18] it made the rx fifo errors a lot more noticeable (but still very small). 
finally realized the "obvious": bump the rx ring buffer size in ethtool, it was less than 1/10th its max setting [19:38:26] (03PS2) 10Ori.livneh: Fix /a/common duplicate def'n [operations/puppet] - 10https://gerrit.wikimedia.org/r/136807 [19:38:32] (03CR) 10Ori.livneh: [C: 032 V: 032] Fix /a/common duplicate def'n [operations/puppet] - 10https://gerrit.wikimedia.org/r/136807 (owner: 10Ori.livneh) [19:39:56] the second spike is for [19:39:56] 533 http://de.m.wikipedia.org/wiki/WM_66 [19:40:06] I've opened the stream via http://www.stream2watch.me/live-tv/rtl-deutschland-live-stream [19:40:10] I see a question about WM 66 [19:40:17] this is so awesome [19:41:26] (03PS1) 10Ottomata: Add deployment config for analytics/refinery [operations/puppet] - 10https://gerrit.wikimedia.org/r/136813 [19:41:36] (03PS2) 10Ori.livneh: rcstream: add 'stream' subcommand to rcstreamctl [operations/puppet] - 10https://gerrit.wikimedia.org/r/136622 [19:41:43] (03CR) 10jenkins-bot: [V: 04-1] Add deployment config for analytics/refinery [operations/puppet] - 10https://gerrit.wikimedia.org/r/136813 (owner: 10Ottomata) [19:42:07] (03PS3) 10Ori.livneh: diamond: add diamond::collector::nginx resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136551 [19:42:27] (let me know if the grrrit-wm spam is making debugging the site issues harder, i'll stop) [19:42:28] (03CR) 10Ottomata: [C: 031] diamond: add diamond::collector::nginx resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136551 (owner: 10Ori.livneh) [19:42:50] PROBLEM - Puppet freshness on ekrem is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 16:42:36 UTC [19:42:53] (03PS6) 10Rush: admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 [19:43:03] * YuviPanda should implement some sort of spam control for grrrit-wm [19:43:23] (03CR) 10Rush: [C: 032 V: 032] admin yaml sanger [operations/puppet] - 10https://gerrit.wikimedia.org/r/136464 (owner: 10Rush) [19:43:37] paravoid: that 
sounds like a German quiz show on TV made people search or something [19:43:45] yes that's exactly what happened [19:43:51] it's a washing machine from East Germany .. duh [19:44:12] the WM 66 was a smaller spike compared to country with 3 timezones [19:44:16] (03PS2) 10Ottomata: Add deployment config for analytics/refinery [operations/puppet] - 10https://gerrit.wikimedia.org/r/136813 [19:44:25] with top contenders being india and mexico [19:44:32] (03CR) 10Ori.livneh: [C: 032] diamond: add diamond::collector::nginx resource [operations/puppet] - 10https://gerrit.wikimedia.org/r/136551 (owner: 10Ori.livneh) [19:44:49] (03CR) 10Ori.livneh: [C: 032] rcstream: add 'stream' subcommand to rcstreamctl [operations/puppet] - 10https://gerrit.wikimedia.org/r/136622 (owner: 10Ori.livneh) [19:48:00] (03PS1) 10Rush: admin yaml /virt100[8-9].eqiad.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136815 [19:48:08] (03CR) 10Dzahn: [C: 032] rm admins::ldap, replaced by ldap-admins yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136480 (owner: 10Dzahn) [19:50:11] (03PS3) 10Ottomata: Add deployment config for analytics/refinery [operations/puppet] - 10https://gerrit.wikimedia.org/r/136813 [19:50:16] (03CR) 10Ottomata: [C: 032 V: 032] Add deployment config for analytics/refinery [operations/puppet] - 10https://gerrit.wikimedia.org/r/136813 (owner: 10Ottomata) [19:50:18] (03PS1) 10Rush: admin yaml labstore* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136816 [19:50:36] ACKNOWLEDGEMENT - Puppet freshness on ekrem is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 16:42:36 UTC daniel_zahn IRC server moved [19:51:01] ekrem is me folks, will fix [19:51:04] ^ [19:51:22] (03PS1) 10Ori.livneh: include ::diamond in ::diamond::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/136817 [19:51:45] (03PS1) 10Ottomata: Use proper analytics/refinery git url [operations/puppet] - 10https://gerrit.wikimedia.org/r/136818 [19:51:51] (03PS2) 
10Ottomata: Use proper analytics/refinery git url [operations/puppet] - 10https://gerrit.wikimedia.org/r/136818 [19:52:23] (03CR) 10Ottomata: [C: 032 V: 032] Use proper analytics/refinery git url [operations/puppet] - 10https://gerrit.wikimedia.org/r/136818 (owner: 10Ottomata) [19:53:22] (03CR) 10Ottomata: [C: 031] include ::diamond in ::diamond::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/136817 (owner: 10Ori.livneh) [19:53:32] (03CR) 10Ori.livneh: [C: 032] include ::diamond in ::diamond::collector [operations/puppet] - 10https://gerrit.wikimedia.org/r/136817 (owner: 10Ori.livneh) [19:54:05] (03PS1) 10Rush: cleanup old ircd and ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/136819 [19:54:45] (03PS2) 10Rush: cleanup old ircd and ekrem [operations/puppet] - 10https://gerrit.wikimedia.org/r/136819 [19:55:03] (03CR) 10Rush: [C: 032 V: 032] "this is just cleanup we already moved away from this server" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136819 (owner: 10Rush) [19:56:30] RECOVERY - Puppet freshness on ekrem is OK: puppet ran at Mon Jun 2 19:56:24 UTC 2014 [19:58:58] (03CR) 10Andrew Bogott: [C: 04-2] "What ever happens here, it needs to be separate from the code that's used on the stats boxes." [operations/puppet] - 10https://gerrit.wikimedia.org/r/135499 (owner: 10Yuvipanda) [19:59:32] hmm, chasemp, stats group puppet error on analytics1026 [19:59:38] stats is not a valid group name [19:59:41] k looking [20:00:04] gwicke, subbu: The time is nigh to deploy Parsoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140602T2000) [20:01:07] (03PS1) 10Ori.livneh: Disenroll rcstream from Ganglia [operations/puppet] - 10https://gerrit.wikimedia.org/r/136820 [20:01:12] ^ _joe_ [20:01:20] ottomata: ok weird...troubleshooting may take a minute [20:01:32] k [20:01:33] np [20:01:34] ooh, we have a fancy bot now. [20:02:55] (03CR) 10Ori.livneh: "@AaronSchulz: let's find out? 
:)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136347 (owner: 10Ori.livneh) [20:08:05] (03PS1) 10Ori.livneh: Comment out diamond::collector::nginx on rcs100x [operations/puppet] - 10https://gerrit.wikimedia.org/r/136822 [20:08:32] !log Jenkins unpolled integration-slave1003 npm is outdated there and does not trust npmregistry.org ( {{bug|61508}} ) [20:08:37] Logged the message, Master [20:08:51] (03CR) 10Ori.livneh: "@Faidon: done" [operations/puppet] - 10https://gerrit.wikimedia.org/r/135447 (owner: 10Ori.livneh) [20:08:55] (03PS1) 10Rush: admin yaml analytics1026 refs 'analytics-users' [operations/puppet] - 10https://gerrit.wikimedia.org/r/136823 [20:09:08] (03CR) 10Rush: [C: 032 V: 032] admin yaml analytics1026 refs 'analytics-users' [operations/puppet] - 10https://gerrit.wikimedia.org/r/136823 (owner: 10Rush) [20:09:11] !log deployed Parsoid 04a4bf2b [20:09:18] Logged the message, Master [20:09:28] waiting for the restart to finish.. [20:10:58] ottomata: fixed [20:11:40] danke! [20:12:30] (03CR) 10Rush: [C: 031] "'tis true" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136822 (owner: 10Ori.livneh) [20:12:48] (03CR) 10Ori.livneh: [C: 032] Comment out diamond::collector::nginx on rcs100x [operations/puppet] - 10https://gerrit.wikimedia.org/r/136822 (owner: 10Ori.livneh) [20:14:23] * Coren emerges from the traffic nightmare. [20:15:40] top [20:15:44] oops [20:16:15] (03CR) 10coren: [C: 032] "Is sane." 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/135442 (owner: 10Yuvipanda) [20:16:31] bicycle > traffic_jam : True [20:16:40] Coren: actually, andrewbogott -2'd the dependent patch, so I need to redo them [20:16:43] (03PS2) 10Ori.livneh: role::mediawiki::common: remove if $::realm == production guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 [20:17:15] (03CR) 10Ori.livneh: [C: 032 V: 032] role::mediawiki::common: remove if $::realm == production guard [operations/puppet] - 10https://gerrit.wikimedia.org/r/136355 (owner: 10Ori.livneh) [20:17:23] YuviPanda: Ah, yeah, just saw. Well, the tools part was sane. [20:17:33] we are done with the Parsoid deploy [20:17:34] Coren: yeah, I've to redo them [20:19:50] (03PS2) 10Ori.livneh: Move beta-specific configs from role::mediawiki::common to role::mediawiki::appserver::beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/136353 [20:20:26] (03CR) 10Ori.livneh: [C: 032 V: 032] Move beta-specific configs from role::mediawiki::common to role::mediawiki::appserver::beta [operations/puppet] - 10https://gerrit.wikimedia.org/r/136353 (owner: 10Ori.livneh) [20:33:01] PROBLEM - Puppet freshness on mw1053 is CRITICAL: Last successful Puppet run was Sat 31 May 2014 23:20:24 UTC [20:39:28] mw1053 is: err: Could not run Puppet configuration client: Could not retrieve local facts: Cannot allocate memory - which arp 2>/dev/null [20:40:44] (03PS1) 10Ori.livneh: Move Apache gmond module to ::apache::monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/136830 [20:41:16] (03PS2) 10Rush: admin yaml /virt100[8-9].eqiad.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136815 [20:41:23] (03CR) 10Rush: [C: 032 V: 032] admin yaml /virt100[8-9].eqiad.wmnet/ [operations/puppet] - 10https://gerrit.wikimedia.org/r/136815 (owner: 10Rush) [20:42:14] (03CR) 10jenkins-bot: [V: 04-1] Move Apache gmond module to ::apache::monitoring [operations/puppet] - 10https://gerrit.wikimedia.org/r/136830 
(owner: 10Ori.livneh) [20:43:36] (03CR) 10Ori.livneh: "the jenkins -1 is over pep8 violations in the python source, but i am merely moving it, not changing its contents." [operations/puppet] - 10https://gerrit.wikimedia.org/r/136830 (owner: 10Ori.livneh) [20:46:14] (03PS2) 10Rush: admin yaml labstore* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136816 [20:46:30] (03CR) 10Rush: [C: 032 V: 032] admin yaml labstore* [operations/puppet] - 10https://gerrit.wikimedia.org/r/136816 (owner: 10Rush) [20:46:39] (03PS3) 10Dzahn: add yaml group for PDF QA users, switch tantalum [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 [20:47:31] (03PS4) 10Dzahn: add yaml group for PDF QA users, switch tantalum [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 [20:48:22] (03CR) 10Jgreen: [C: 031] add yaml group for PDF QA users, switch tantalum [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 (owner: 10Dzahn) [20:48:49] Can anyone explain where the files in puppet:///volatile come from? [20:49:52] (03CR) 10Dzahn: [C: 032] add yaml group for PDF QA users, switch tantalum [operations/puppet] - 10https://gerrit.wikimedia.org/r/136440 (owner: 10Dzahn) [20:49:58] in prod: stafford, /var/lib/puppet/volatile [20:52:10] paravoid: Ah, sorry, by 'come from' I mean… what generates them? [20:52:22] depends on which files specifically :) [20:52:24] Or are they just hand-written files that live outside of git for some reason? 
[20:52:33] ok, then, the swift ring definitions :) [20:52:34] that's exactly it [20:52:44] the swift ring definitions are being generated using swift-ring-builder [20:52:59] and then scp'ed over [20:53:10] so volatile is essentially being used as a file distribution mechanism [20:53:13] I want to kill that [20:53:14] there's two alternatives [20:53:52] one is, the swift puppet module approach: designate one of the boxes a master, set up an rsync server, set up cronjobs that rsync that [20:54:03] the second one is https://github.com/pandemicsyn/swift-ring-master which kinda looks interesting [20:54:28] Those files aren't automatically generated, though, right? It's just something you do sometimes? [20:54:39] yes [20:54:56] in particular, when I want to pool or depool a disk or a server [20:55:28] Hm, doesn't pooling or depooling involve a change to puppet code in git? [20:56:07] nope! [20:56:33] ok [20:57:09] what does swift-ring-builder take as its input? [20:57:22] the ring file, plus some commands [20:57:26] like set_weight, delete, add etc. [20:57:34] try it on one of the swift boxes [20:57:41] run swift-ring-builder /etc/swift/object.builder [20:57:55] do not run modify commands on that filepath obviously :) [20:58:03] if you want to play with modifying, copy it somewhere [20:58:12] Ben had some instructions for all that up in wikitech [20:58:24] they might be a bit outdated by now [20:58:47] So this is circular, right? The swift boxes are built with puppet which depends on files in volatile, which are generated on the swift boxes... [20:58:58] they don't have to [20:59:08] you can generate them anywhere you have swift-ring-builder installed [20:59:17] we could do it on the puppetmaster themselves, I just don't want to bother installing python-swift there [20:59:23] ah, I see. So they don't actually gather ambient info from the swift box, only from the commandline? [20:59:29] yes [20:59:45] ok, great. [20:59:46] thanks! 
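(Editor's sketch, not part of the channel log.) The advice above boils down to: inspect the builder file freely, but do any modifying on a copy. A minimal Python sketch of that habit follows, assuming `swift-ring-builder` is on the PATH of whatever box you use. The helper names are made up for illustration, and the `d0` device id mentioned afterwards is a hypothetical placeholder, not a real device from these rings. The helpers only copy the file and build argument lists; nothing is executed against Swift.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def scratch_copy(builder: str) -> Path:
    """Copy a ring builder file into a fresh temp directory; return the copy's path."""
    src = Path(builder)
    dest_dir = Path(tempfile.mkdtemp(prefix="ring-scratch-"))
    return Path(shutil.copy2(src, dest_dir / src.name))


def inspect_cmd(builder: Path) -> List[str]:
    # With no subcommand, swift-ring-builder only prints the ring's state.
    return ["swift-ring-builder", str(builder)]


def set_weight_cmd(builder: Path, search_value: str, weight: float) -> List[str]:
    # search_value selects the device(s) to change; callers should pass the
    # scratch copy here, never the live builder file.
    return ["swift-ring-builder", str(builder), "set_weight", search_value, str(weight)]
```

For example, `copy = scratch_copy("/etc/swift/object.builder")` followed by `subprocess.run(set_weight_cmd(copy, "d0", 0.0), check=True)` would change a weight on the copy only. As far as I know a `rebalance` run on the same builder file is then needed before a new ring file is produced, so treat this as a starting point rather than the procedure actually used here.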
[20:59:53] no worries :) [21:00:00] fwiw, adding drives to swift rings https://wikitech.wikimedia.org/wiki/User_talk:Dzahn [21:02:54] (03PS1) 10Dr0ptp4kt: Add zerodot support for 520-18. [operations/puppet] - 10https://gerrit.wikimedia.org/r/136896 [21:05:55] (03CR) 10BBlack: [C: 032 V: 032] Add zerodot support for 520-18. [operations/puppet] - 10https://gerrit.wikimedia.org/r/136896 (owner: 10Dr0ptp4kt) [21:06:17] (03PS13) 10Matanya: Move logs to /var/log/mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:06:51] (03PS2) 10Dzahn: rm old admins::parsoid class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136476 [21:07:03] (03CR) 10jenkins-bot: [V: 04-1] Move logs to /var/log/mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:08:47] (03CR) 10Aaron Schulz: [C: 031] "I suppose" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136347 (owner: 10Ori.livneh) [21:11:21] (03PS14) 10Matanya: Move logs to /var/log/mediawiki [operations/puppet] - 10https://gerrit.wikimedia.org/r/83574 (owner: 10Reedy) [21:15:38] (03PS2) 10Ori.livneh: Move Apache gmond module to ::apache::monitoring; pep8 fixes [operations/puppet] - 10https://gerrit.wikimedia.org/r/136830 [21:16:12] (03CR) 10Dzahn: [C: 032] rm old admins::parsoid class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136476 (owner: 10Dzahn) [21:27:17] (03PS1) 10Matanya: rm old admins::pmacct class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136904 [21:27:34] (03CR) 10jenkins-bot: [V: 04-1] rm old admins::pmacct class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136904 (owner: 10Matanya) [21:27:49] Reedy: ^d, can i rm php-1.24wmf5 on search indexer? [21:27:56] low disk [21:29:01] mutante: That *should* be safe, but it will come back on the next scap [21:29:04] <^d> Bleh, I just nuked a ton of those. 
[21:29:15] <^d> wmf5 was the only one I left because 6/7 were missing. [21:29:59] !log searchidx1001 - low disk space, gzip MegaSAS.log, delete old kernel headers [21:30:04] Logged the message, Master [21:30:07] i think it's already better without doing that ^ [21:30:11] RECOVERY - Disk space on searchidx1001 is OK: DISK OK [21:30:14] there [21:30:29] (03PS2) 10Matanya: rm old admins::pmacct class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136904 [21:32:27] (03CR) 10Dzahn: [C: 032] rm old admins::pmacct class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136904 (owner: 10Matanya) [21:37:01] PROBLEM - Puppet freshness on analytics1010 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 18:36:09 UTC [21:42:13] ACKNOWLEDGEMENT - Puppet freshness on analytics1010 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 18:36:09 UTC daniel_zahn stats is not a valid group name - related to work-around for special stats group [21:44:56] (03CR) 10Ori.livneh: [C: 031] run all maintenance crons as apache user [operations/puppet] - 10https://gerrit.wikimedia.org/r/136118 (owner: 10Dzahn) [21:45:01] PROBLEM - Puppet freshness on mw1153 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 18:44:05 UTC [21:45:26] anyone looking into 1153? [21:45:40] oh, it was 1053 that had the issue [21:45:42] * ori looks [21:46:01] (03PS1) 10Dzahn: stat1010: stats group -> analytics user group [operations/puppet] - 10https://gerrit.wikimedia.org/r/136910 [21:46:03] argh [21:46:52] ori: it got reinstalled.. 
it's not pooled but messed up in several ways (try starting apache) [21:47:05] ori: it had the disk replaced recently [21:48:56] (03CR) 10Dzahn: [C: 032] stat1010: stats group -> analytics user group [operations/puppet] - 10https://gerrit.wikimedia.org/r/136910 (owner: 10Dzahn) [21:49:01] PROBLEM - Puppet freshness on labstore1001 is CRITICAL: Last successful Puppet run was Fri 30 May 2014 18:25:33 UTC [21:49:50] (03PS1) 10Ori.livneh: Remove declaration of File['/a'] from imagescaler manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/136912 [21:49:55] mutante: ^ [21:49:58] that should fix mw1153 [21:50:23] (03PS1) 10Matanya: rm old admins::mortals class, replaced by yaml [operations/puppet] - 10https://gerrit.wikimedia.org/r/136913 [21:50:25] (03CR) 10Ori.livneh: "note that the require => File['/a'] is not needed since Puppet autorequires parent dirs when they are managed by it" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136912 (owner: 10Ori.livneh) [21:50:51] RECOVERY - Puppet freshness on analytics1010 is OK: puppet ran at Mon Jun 2 21:50:42 UTC 2014 [21:51:06] ori: ah!.. so it was just down during the switch ? 
[21:51:32] mutante: no, this will affect other imagescalers too [21:52:03] mw1153 "The last Puppet run was at Mon Jun 2 18:43:41 UTC 2014 (188 minutes ago).", i.e., broke due to my change [21:53:01] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 18:51:58 UTC [21:54:23] (03CR) 10Dzahn: [C: 031] "yep, now defined in mediawiki::sync and this should fix puppet runs on imagescalers" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136912 (owner: 10Ori.livneh) [21:55:59] (03CR) 10Ori.livneh: [C: 032] Remove declaration of File['/a'] from imagescaler manifest [operations/puppet] - 10https://gerrit.wikimedia.org/r/136912 (owner: 10Ori.livneh) [21:56:45] ori: what i don't get is Could not retrieve kernel: Cannot allocate memory - which uname 2>/dev/null [21:56:54] that's a different host [21:57:01] 1053 [21:57:03] the reinstalled one [21:57:11] yeah, no idea what that's about either [21:57:14] ok [21:58:28] (03CR) 10Ori.livneh: "Giuseppe, note that we can't get Diamond metrics until Diamond is packaged for Trusty, so there may be cause for keeping Ganglia a bit lon" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136820 (owner: 10Ori.livneh) [21:58:31] RECOVERY - Puppet freshness on mw1153 is OK: puppet ran at Mon Jun 2 21:58:23 UTC 2014 [21:59:16] mutante: kswapd is going nuts it seems [21:59:37] mutante: there are a bijillion root 32662 0.0 0.3 373420 46760 ? 
Ssl May30 1:24 python /usr/local/sbin/grain-ensure contains deployment_target mediawiki procs [21:59:41] killing them [22:00:01] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 18:59:02 UTC [22:00:51] RECOVERY - Puppet freshness on mw1053 is OK: puppet ran at Mon Jun 2 22:00:46 UTC 2014 [22:01:08] cgrulesengd ..hmm ..ok [22:01:31] ori: already looks better, thx [22:04:01] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:03:36 UTC [22:04:12] (03PS3) 10Dzahn: hafnium: add firewall [operations/puppet] - 10https://gerrit.wikimedia.org/r/134304 (owner: 10Matanya) [22:06:21] (03PS1) 10Dzahn: use analytics-users group vs. stats group [operations/puppet] - 10https://gerrit.wikimedia.org/r/136919 [22:07:01] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:06:50 UTC [22:08:00] bd808: did our deploy just move some things around? [22:08:18] rather, I think my symlinks are funky [22:08:25] (03CR) 10Dzahn: [C: 032] "should not need any firewall holes because the connections it makes are outgoing (per log from matanya and Alex chatting about it)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/134304 (owner: 10Matanya) [22:09:04] manybubbles@terbium:~$ mwscript maintenance/eval.php --wiki enwiki [22:09:06] Could not open input file: /a/common/multiversion/MWScript.php [22:09:36] ^d: I think we broke mwscript [22:10:04] manybubbles: is /a gone? 
[22:10:24] manybubbles: gr, i think i know what this is about [22:10:25] just a moment [22:10:40] /a/common is an empty directory [22:10:40] thanks [22:11:01] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:10:41 UTC [22:12:28] * mutante adds iptables to hafnium (eventlogging) [22:13:01] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:12:17 UTC [22:13:33] manybubbles: why are you running mwscript on terbium rather than tin? [22:14:09] ori: its where ^d had me running cirrus' command scripts [22:14:13] since forever [22:14:33] manybubbles: it was not puppetized for it [22:14:37] manybubbles: could you run them on tin instead? [22:15:04] ori: probably - aren't other scripts run on terbium for this? [22:16:04] I can't find it now, but there used to be wikidata scripts on that machine [22:16:23] i'm looking to see if anything depends on /a/common [22:17:01] PROBLEM - Puppet freshness on analytics1009 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:16:05 UTC [22:17:01] PROBLEM - Puppet freshness on tmh1001 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:15:55 UTC [22:17:01] RECOVERY - Puppet freshness on tmh1001 is OK: puppet ran at Mon Jun 2 22:16:58 UTC 2014 [22:18:09] manybubbles: wikidata scripts run as crons, they are in maintenance and terbium is indeed the maintenance host (but not really deployment like tin) [22:18:30] (03PS1) 10Ori.livneh: remove Deployment::Target['mediawiki'] from the app servers [operations/puppet] - 10https://gerrit.wikimedia.org/r/136920 [22:18:41] so it has all the things that are run as cron jobs [22:18:49] as opposed to manual deployer scripts [22:18:53] mutante: yeah, i'll need to puppetize that [22:18:59] i don't think it was ever puppetized [22:19:12] ori: gotcha [22:20:01] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:19:33 UTC 
[22:20:46] mutante: wld appreciate reviews on https://gerrit.wikimedia.org/r/#/c/136712/ , https://gerrit.wikimedia.org/r/#/c/136351/ , https://gerrit.wikimedia.org/r/#/c/136349/ , https://gerrit.wikimedia.org/r/#/c/136347/ , https://gerrit.wikimedia.org/r/#/c/136830/ (however many you can do) if you have the bandwidth. they're mostly small [22:23:01] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:22:40 UTC [22:23:59] this /a/common thing is going to suck [22:24:01] fix incoming [22:24:01] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:23:15 UTC [22:25:01] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:24:16 UTC [22:29:01] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Last successful Puppet run was Mon 02 Jun 2014 19:28:14 UTC [22:37:24] !log Hack-patching integration-slave1003.eqiad.wmflabs per https://bugzilla.wikimedia.org/show_bug.cgi?id=61508#c2 [22:37:28] Logged the message, Master [22:37:53] (03CR) 10Dzahn: [C: 031] "just 2 inline comments" (032 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 (owner: 10Ori.livneh) [22:46:34] (03CR) 10Ori.livneh: "sure: re comment" (031 comment) [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 (owner: 10Ori.livneh) [22:48:38] (03PS2) 10Ori.livneh: Clean up system::role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 [22:48:51] (03CR) 10jenkins-bot: [V: 04-1] Clean up system::role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 (owner: 10Ori.livneh) [22:49:01] (03PS3) 10Ori.livneh: Clean up system::role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 [22:49:42] !log Repooled integration-slave1003 in Jenkins. 
[22:49:47] Logged the message, Master [22:51:42] (03CR) 10Ori.livneh: [C: 032] Clean up system::role [operations/puppet] - 10https://gerrit.wikimedia.org/r/136712 (owner: 10Ori.livneh) [22:53:18] ori: I think when I created it there was some thought that we might put other things in the 'system' module besides role. [22:53:26] But 'what' was never specified [22:56:08] andrewbogott: i'm not really sure where to put it either, so i'm not doing any better [22:56:18] yep, that's where I landed as well [23:00:04] mwalker, ori, MaxSem, spagewmf: The time is nigh to deploy SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20140602T2300) [23:00:31] RoanKattouw: i can do it [23:00:36] mwalker, MaxSem [23:00:44] springle: are you swatting now? [23:00:52] oops, not springle, spagewmf [23:00:58] I can do too [23:01:10] MaxSem: ok, i'm sorta in the middle of something so if you could that'd be awesome [23:01:36] on it [23:02:07] spage, yt? preparing to deploy your stuff [23:03:21] ori: Sorry, I have a meeting [23:03:35] RoanKattouw: no problem, i was volunteering to do it [23:03:42] RoanKattouw: but then MaxSem volunteered harder :P [23:03:54] nonononon [23:03:57] nonono [23:04:04] I simply volunteered:) [23:04:16] (03PS1) 10Ori.livneh: File['/a/common']: symlink to common-local; replace => False [operations/puppet] - 10https://gerrit.wikimedia.org/r/136940 [23:09:18] (03CR) 10Andrew Bogott: [C: 031] "Famous last words: "With replace => false I can't see what this would break that isn't broken already"" [operations/puppet] - 10https://gerrit.wikimedia.org/r/136940 (owner: 10Ori.livneh) [23:09:48] (03PS2) 10Ori.livneh: File['/a/common']: symlink to common-local; replace => False [operations/puppet] - 10https://gerrit.wikimedia.org/r/136940 [23:10:24] (03CR) 10Ori.livneh: [C: 032 V: 032] File['/a/common']: symlink to common-local; replace => False [operations/puppet] - 10https://gerrit.wikimedia.org/r/136940 (owner: 10Ori.livneh) [23:17:44] 
MaxSem: can you sync-dir php-1.24wmf7/extensions/Flow first so I can sanity-test the fixes on mw.org? [23:18:05] (03PS1) 10BBlack: Bump vm.min_free_kbytes on LVS/cache nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/136943 [23:18:07] manybubbles: terbium's fine now [23:18:11] !log maxsem Synchronized php-1.24wmf7/extensions/Flow/: https://gerrit.wikimedia.org/r/#/c/136936/ (duration: 00m 05s) [23:18:16] Logged the message, Master [23:18:49] oh wow new sync-dir bd808! :P [23:19:03] MaxSem: You like? [23:19:13] It's "scap lite" now [23:19:20] (03CR) 10jenkins-bot: [V: 04-1] Bump vm.min_free_kbytes on LVS/cache nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/136943 (owner: 10BBlack) [23:19:52] No l10n update and limited sync but all the tasty python logging and other new scap goodness [23:20:02] !log maxsem Synchronized php-1.24wmf7/extensions/VisualEditor/: (no message) (duration: 00m 04s) [23:20:07] Logged the message, Master [23:20:16] also, [23:20:23] FRIGGIN FAST [23:20:24] !!! [23:20:27] It also touches InitialiseSettings.php on each run [23:20:54] Cool. 
That would be the fanout rsync that Reedy asked for I think [23:21:55] !log maxsem Synchronized php-1.24wmf6/extensions/VisualEditor/: (no message) (duration: 00m 03s) [23:21:58] Logged the message, Master [23:22:11] !log maxsem Synchronized php-1.24wmf6/extensions/Flow/: (no message) (duration: 00m 04s) [23:22:15] Logged the message, Master [23:22:30] yay, you got MaxSem on board, you know you're doing it right ;) [23:23:35] spage and RoanKattouw, your extensions were updated, please verify:) [23:26:22] MaxSem: looking good, enwiki Special:WhatLinksHere errors seem gone [23:29:53] (03PS1) 10BBlack: Set rx ring params for bnx2x on 10GbE LVS [operations/puppet] - 10https://gerrit.wikimedia.org/r/136944 [23:32:11] (03CR) 10MaxSem: [C: 032] Enable Flow on mw:Talk:Cite-from-id [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136428 (owner: 10Spage) [23:32:22] (03Merged) 10jenkins-bot: Enable Flow on mw:Talk:Cite-from-id [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/136428 (owner: 10Spage) [23:34:43] !log maxsem Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/136428/ (duration: 00m 03s) [23:34:47] (03PS2) 10BBlack: Bump vm.min_free_kbytes on LVS/cache nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/136943 [23:34:47] Logged the message, Master [23:34:52] spage, ^^^ [23:38:21] spage, https://www.mediawiki.org/wiki/Talk:Cite-from-id looks the same for me [23:38:56] (03PS2) 10Ori.livneh: remove Deployment::Target['mediawiki'] [operations/puppet] - 10https://gerrit.wikimedia.org/r/136920 [23:39:08] (03CR) 10BBlack: [C: 032 V: 032] Bump vm.min_free_kbytes on LVS/cache nodes [operations/puppet] - 10https://gerrit.wikimedia.org/r/136943 (owner: 10BBlack) [23:39:29] MaxSem: I guess something cached the redirect, https://www.mediawiki.org/w/index.php?title=Talk:Cite-from-id&redirect=no is Flow-enabled [23:47:00] (03CR) 10BryanDavis: [C: 031] remove Deployment::Target['mediawiki'] 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/136920 (owner: 10Ori.livneh) [23:47:43] MaxSem: all looks good, thanks! Flow's plan for world domination advances. 6,999,986 talk pages to go :)