[00:34:22] (PS1) Springle: rebalance group loads while db1050 out [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103326
[00:36:06] (CR) Springle: [C: 2] rebalance group loads while db1050 out [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103326 (owner: Springle)
[00:37:10] !log springle synchronized wmf-config/db-eqiad.php 'rebalance group loads while db1050 out'
[00:37:26] Logged the message, Master
[00:43:14] (PS1) Springle: assign db1033 to s1 [operations/puppet] - https://gerrit.wikimedia.org/r/103327
[00:45:38] (CR) Springle: [C: 2] assign db1033 to s1 [operations/puppet] - https://gerrit.wikimedia.org/r/103327 (owner: Springle)
[00:50:02] !log xtrabackup clone db1049 to db1033
[00:50:18] Logged the message, Master
[01:59:41] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[01:59:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[02:07:23] !log LocalisationUpdate completed (1.23wmf7) at Mon Dec 23 02:07:23 UTC 2013
[02:07:42] Logged the message, Master
[02:13:09] !log LocalisationUpdate completed (1.23wmf8) at Mon Dec 23 02:13:09 UTC 2013
[02:13:24] Logged the message, Master
[02:26:35] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon Dec 23 02:26:35 UTC 2013
[02:26:52] Logged the message, Master
[04:00:22] PROBLEM - RAID on searchidx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:01:11] RECOVERY - RAID on searchidx1001 is OK: OK: optimal, 1 logical, 4 physical
[05:00:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[05:00:41] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[05:17:31] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[05:54:35] (PS1) Springle: warm up db1033 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103339
[05:56:28] (CR) Springle: [C: 2] warm up db1033 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103339 (owner: Springle)
[05:57:39] !log springle synchronized wmf-config/db-eqiad.php 'warm up db1033 in s1'
[05:57:55] Logged the message, Master
[06:18:31] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[06:24:58] (CR) BryanDavis: [C: 1] "Using the "official" logstash packages works for me. I haven't looked at the startup scripts, etc that they provide, but we should be able" [operations/puppet] - https://gerrit.wikimedia.org/r/103112 (owner: Faidon Liambotis)
[07:30:31] PROBLEM - MySQL Idle Transactions on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:30:31] PROBLEM - MySQL InnoDB on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:31:41] PROBLEM - MySQL Recent Restart on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:32:01] PROBLEM - mysqld processes on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:32:09] hmm
[07:32:11] PROBLEM - MySQL Slave Running on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:32:21] PROBLEM - MySQL Processlist on db1016 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:34:31] RECOVERY - MySQL Recent Restart on db1016 is OK: OK 10541341 seconds since restart
[07:34:51] RECOVERY - mysqld processes on db1016 is OK: PROCS OK: 1 process with command name mysqld
[07:35:01] RECOVERY - MySQL Slave Running on db1016 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error:
[07:35:11] RECOVERY - MySQL Processlist on db1016 is OK: OK 0 unauthenticated, 0 locked, 0 copy to table, 0 statistics
[07:35:21] RECOVERY - MySQL InnoDB on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[07:35:21] RECOVERY - MySQL Idle Transactions on db1016 is OK: OK longest blocking idle transaction sleeps for 0 seconds
[07:43:44] !log taking down db1016 for xfs_check, locked up and: XFS (dm-0): xlog_space_left: head behind tail
[07:43:58] ouch
[07:44:01] Logged the message, Master
[07:44:16] paravoid: know something about that?
[07:44:24] * springle never seen before
[07:44:24] nope
[07:44:52] sounds like a kernel bug
[08:01:01] PROBLEM - MySQL Slave Delay on db1050 is CRITICAL: CRIT replication delay 45460 seconds
[08:01:11] PROBLEM - MySQL Replication Heartbeat on db1050 is CRITICAL: CRIT replication delay 45386 seconds
[08:01:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[08:01:41] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[08:31:43] (CR) Faidon Liambotis: [C: 2] reprepro: import from elasticsearch/logstash apt [operations/puppet] - https://gerrit.wikimedia.org/r/103112 (owner: Faidon Liambotis)
[08:33:15] (CR) Faidon Liambotis: [C: 2] "Ugh." [operations/puppet] - https://gerrit.wikimedia.org/r/103198 (owner: Yurik)
[08:33:57] (CR) Faidon Liambotis: [C: 2] Zero: Keep things DRY - removed duplicate IDs [operations/puppet] - https://gerrit.wikimedia.org/r/103199 (owner: Yurik)
[08:35:10] thanks paravoid, i will rebase the other patches
[08:36:17] which ones?
[08:41:56] (CR) Faidon Liambotis: [C: -2] "Per our meeting." [operations/puppet] - https://gerrit.wikimedia.org/r/102316 (owner: Yurik)
[08:48:55] paravoid, https://gerrit.wikimedia.org/r/#/c/102887/
[08:49:34] yup, I saw that
[08:51:08] (PS3) Yurik: Handle the change with netmapper and varnish 3.0.4 & later [operations/puppet] - https://gerrit.wikimedia.org/r/102887
[08:52:11] paravoid, actually - no rebase conflicts ^. Any objections to that patch?
[08:52:17] yes
[08:52:22] I'd like to hear from Brandon
[08:52:32] to see if it'd be possible to fix in the vmod itself
[08:53:51] last time i spoke with him he said its beyond vmod - its the new feature of the varnish. But maybe he overlooked something
[08:54:50] my only concern at this point is the fact that now i can't test anything in betalabs, because the varnish there is the latest, and I would have to downgrade it, which precludes me from ESI testing based on the patch you submitted
[08:55:02] ??
[08:55:10] I don't understand
[08:55:15] beta cluster uses the same VCLs
[08:55:24] well, then don't
[08:55:33] brandon migrated it to the 3.0.5+ alpha
[08:55:35] run a separate varnish instance from betalabs
[08:55:54] or wait until brandon wakes up, it's only a few hours
[08:56:07] i thought he was out for 2 weeks?
[08:56:14] oh? is he?
[08:56:19] that's what he said
[08:56:55] checking log...
[08:57:22] PROBLEM - MySQL Slave Delay on db69 is CRITICAL: CRIT replication delay 305 seconds
[08:57:36] sorry, no log - chatzilla doesn't log the new channels
[08:58:01] PROBLEM - MySQL Replication Heartbeat on db69 is CRITICAL: CRIT replication delay 342 seconds
[09:03:01] RECOVERY - MySQL Replication Heartbeat on db69 is OK: OK replication delay -0 seconds
[09:03:21] RECOVERY - MySQL Slave Delay on db69 is OK: OK replication delay 0 seconds
[09:04:37] paravoid: https://gerrit.wikimedia.org/r/#/c/92288/ ... :)
[09:05:38] (PS5) Faidon Liambotis: removing cache clean up patch [operations/puppet] - https://gerrit.wikimedia.org/r/92288 (owner: Matanya)
[09:06:06] (PS6) Faidon Liambotis: removing cache clean up patch [operations/puppet] - https://gerrit.wikimedia.org/r/92288 (owner: Matanya)
[09:07:11] (PS7) Faidon Liambotis: varnish: removing temp cache fix for zero [operations/puppet] - https://gerrit.wikimedia.org/r/92288 (owner: Matanya)
[09:07:58] (CR) Faidon Liambotis: [C: 2] varnish: remove temp cache fix for zero [operations/puppet] - https://gerrit.wikimedia.org/r/92288 (owner: Matanya)
[09:09:39] thanks paravoid
[09:10:20] paravoid, how about i do the same thing in zero.vcl as we did in mobile.vcl -- surround those if statements with the <% if cluster_options.fetch( "enable_esi", false ) -%>
[09:10:30] which statements?
[09:10:40] why can't you just setup a separate varnish instance to play with EIS
[09:10:43] *ESI
[09:10:53] paravoid, https://gerrit.wikimedia.org/r/#/c/102887/3/templates/varnish/zero.inc.vcl.erb
[09:10:54] so that you don't have to go through a production review process to perform tests?
[09:12:18] labs instance? can't do it very easily - all domain names are targeted there
[09:12:28] so i would have to migrate to older varnish
[09:12:39] which i would rather not do as that has high chance of killing betalabs
[09:12:43] wait, what?
[09:12:59] http://en.m.wikipedia.beta.wmflabs.org/wiki/Main_Page
[09:13:06] this is served by the new varnish
[09:13:32] as well as all other *.m.*.beta & *.zero.*.beta
[09:16:20] your idea kind of defeats the point of labs I think
[09:17:10] it's kinda silly to go via a production code review process to test VCL changes in labs
[09:24:03] (CR) Faidon Liambotis: Varnish: switch to W3C standard headers for ESI (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/102633 (owner: Faidon Liambotis)
[09:26:42] (PS3) Faidon Liambotis: Varnish: switch to W3C standard headers for ESI [operations/puppet] - https://gerrit.wikimedia.org/r/102633
[09:27:09] (PS4) Faidon Liambotis: Varnish: switch to W3C standard headers for ESI [operations/puppet] - https://gerrit.wikimedia.org/r/102633
[09:30:12] (CR) Yurik: [C: 1] "No need to hold this off until backend, this way i will receive correct signals in the backend from varnish from the start" [operations/puppet] - https://gerrit.wikimedia.org/r/102633 (owner: Faidon Liambotis)
[09:32:34] (CR) Faidon Liambotis: [C: 2] Varnish: switch to W3C standard headers for ESI [operations/puppet] - https://gerrit.wikimedia.org/r/102633 (owner: Faidon Liambotis)
[09:34:55] (CR) Faidon Liambotis: [C: -2] "I don't see the point of an m->desktop redirect, as I pointed out on the bug report." [operations/apache-config] - https://gerrit.wikimedia.org/r/101787 (owner: John F. Lewis)
[09:43:44] (Abandoned) Murfel: Update favicon spcom.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103249 (owner: Murfel)
[09:46:21] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms
[09:47:36] that would be me
[09:53:01] PROBLEM - Host cp1065 is DOWN: PING CRITICAL - Packet loss = 100%
[09:54:33] !log rebooting cp1065 with Linux 3.2
[09:54:50] Logged the message, Master
[09:55:11] RECOVERY - Host cp1065 is UP: PING OK - Packet loss = 0%, RTA = 1.04 ms
[10:22:37] (PS2) Murfel: Update favicon spcom.ico [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103203
[10:38:11] RECOVERY - DPKG on lanthanum is OK: All packages OK
[10:39:03] akosiaris: was that you?
[10:39:22] paravoid: indirectly yes
[10:39:49] !log uploaded php5_5.3.10-1ubuntu3.9+wmf1 to apt.wikimedia.org
[10:40:04] Logged the message, Master
[11:02:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[11:02:41] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[11:11:31] PROBLEM - Disk space on labstore4 is CRITICAL: DISK CRITICAL - /srv is not accessible: Input/output error
[11:11:42] oh no ^
[11:17:44] ugghh
[11:18:38] that cannot be good
[11:18:45] [2289504.725663] XFS (dm-0): xfs_log_force: error 5 returned.
[11:18:47] ok yay
[11:18:50] *oh yay
[11:19:48] that's a secondary error
[11:19:51] what's the primary one?
[11:20:25] there's a lot of cruft in dmesg, I'm still looking
[11:20:56] ffff880436b22000: 80 00 20 3c c3 04 a5 b2 00 00 00 01 00 00 00 00 .. <............
[11:20:56] [2278571.852351] XFS (dm-0): Internal error xfs_dir2_data_reada_verify at line 226 of file /build/buildd/linux-lts-raring-3.8.0/fs/xfs/xfs_dir2_data.c. Caller 0xffffffffa036e33f
[11:21:13] and after the stacktrace [2278571.869978] XFS (dm-0): Corruption detected. Unmount and run xfs_repair
[11:21:13] [2278571.881017] XFS (dm-0): metadata I/O error: block 0x3c0082bb0 ("xfs_trans_read_buf_map") error 117 numblks 8
[11:21:53] [2288970.235855] XFS (dm-0): I/O Error Detected. Shutting down filesystem
[11:21:56] [2288970.238569] Buffer I/O error on device dm-0, logical block 85086316
[11:22:01] [2288970.163666] XFS (dm-0): Corruption detected. Unmount and run xfs_repair
[11:22:04] yay
[11:22:09] check-raid says ok...
[11:22:18] so much fun :/
[11:22:26] it seems like this has been going on for awhile
[11:22:28] it's corrupted
[11:22:33] [39175.745798] XFS (dm-0): Internal error xfs_dir2_data_reada_verify at line 226 of file /build/buildd/linux-lts-raring-3.8.0/fs
[11:22:42] why are we using xfs again ? :P
[11:23:00] root@labstore4:~# mount | wc -l
[11:23:01] 39302
[11:23:02] wtf.
[11:23:24] isn't that supposed to have a backup labstoer?
[11:23:32] yes it is
[11:24:04] don't know which one though
[11:36:07] text coren maybe?
[11:39:02] so labstore3 would be the backup, we would have to move the labsnfs ip addr of10.0.0.45 (currently on labstore4) to labstore3 and do the startup
[11:39:09] I'm not confident in getting that right myself
[11:39:31] is that correct ? cause they see different sets of disks
[11:39:38] at least according to check-raid
[11:40:04] and I remember coren saying the share the disk shelves now
[11:40:25] that is what I thought (they share them)
[11:40:35] so something could be different now (again)
[11:40:51] texting is probably best
[11:41:02] well one sees 25 logical and 36 physical and the other 12 logical, 12 physical
[11:42:30] if labstore 3 is just missing some, that might be intentional
[11:42:46] since it's supposed to be the standby host
[11:43:50] the outputs of mount are ridiculous on both hosts
[13:24:33] !log starting xtrabackup clone db1001 to db1016
[13:24:49] Logged the message, Master
[13:39:25] anyone with corens cellular number around?
[13:40:10] apergos was handling the issue afaik
[13:40:28] he's been paged and voicemail left
[13:40:35] apergos: thanks
[13:45:13] Can someone find out what happened to a seemingly missing password reset email?
[13:53:16] https://tools.wmflabs.org/guc/?user=79.22.0.97 < WTF?
[13:53:35] Vito: #wikimedia-labs please
[14:03:41] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[14:03:41] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[14:34:41] PROBLEM - Host labstore4 is DOWN: PING CRITICAL - Packet loss = 100%
[14:35:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:37:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:39:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:41:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:43:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:43:55] RECOVERY - Host labstore4 is UP: PING OK - Packet loss = 0%, RTA = 35.40 ms
[14:44:26] RECOVERY - Disk space on labstore4 is OK: DISK OK
[14:45:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:47:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:49:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:51:56] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:53:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:55:56] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:57:47] ignre the lvs1003 issue please
[14:57:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[14:59:56] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 02:29:59 PM UTC
[15:00:25] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Mon Dec 23 15:00:17 UTC 2013
[15:01:55] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 03:00:17 PM UTC
[15:03:02] that will be cleared up next puppet run
[15:28:46] qchris, this is part of the ssl discussion:
[15:28:47] https://rt.wikimedia.org/Ticket/Display.html?id=859
[15:28:49] can you read that?
[15:29:00] Let me try :-)
[15:29:33] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Mon Dec 23 15:29:26 UTC 2013
[15:29:51] ottomata: Thanks. I can read it :-)
[15:30:40] !log labnet1001 relocating racks C4 to B3 eqiad
[15:30:56] Logged the message, Master
[15:31:27] cool, the relevant bit there are towards the end of the discussion
[15:31:33] although i wish there were links to gerrit changes
[15:31:41] it doesn't really say why we still have them in the stream
[15:36:23] hiyaaa paravoid!
[15:36:46] nuria wants to do some testing of varnish log changes for eventlogging, she's wondering if she could/should use the beta cluster
[15:36:49] who should she ask about that?
[15:37:53] oh maybe antoine, she'll figure it out I think
[15:44:23] PROBLEM - Puppet freshness on searchidx1001 is CRITICAL: Last successful Puppet run was Mon 23 Dec 2013 12:44:05 PM UTC
[15:56:33] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: reqstats.5xx [crit=500.000000
[16:05:45] Jeff_Green:
[16:05:49] i just merged your OCG comment
[16:05:51] s'ok?
[16:12:33] RECOVERY - Puppet freshness on searchidx1001 is OK: puppet ran at Mon Dec 23 16:12:30 UTC 2013
[16:21:38] ottomata: yes, thank you
[16:21:51] i got interrupted and forgot
[16:24:19] k no probs
[16:44:29] ottomata: hey
[16:44:35] you forgot to close #6535
[16:45:13] paravoid: does this change make sense to you? https://gerrit.wikimedia.org/r/#/c/103376? I was trying to use MobileFrontend on beta labs today and login would give me 503s
[16:45:13] thanks
[16:45:35] oh actually, i guess it isn't entirely closed, paravoid
[16:45:42] https://gerrit.wikimedia.org/r/#/c/103375/
[16:45:48] see comment there ^
[16:46:45] chrismcmahon: hey, nice catch
[16:47:12] chrismcmahon: although the fix is incomplete, you need to make it on mobile-backend change too
[16:47:17] chrismcmahon: I'll fix it and merge, no worries
[16:48:00] paravoid: great thanks, varnish params is not my area of expertise
[16:48:24] chrismcmahon: not that 35s is a reasonable amount of time to wait for a page...
[16:49:50] paravoid: something is going wrong with beta labs this morning, was getting stack traces from login, now getting 500 errors from Varnish, not sure way. but before that I was getting 503s trying to log in. whatever's going on is affecting page load times also
[16:52:00] chrismcmahon: I linked to https://bugzilla.wikimedia.org/show_bug.cgi?id=57026 in my comment
[16:52:03] and it has been reopened since
[16:52:19] with the cause identified and a proposed, yet unmerged, fix
[16:52:43] so it was probably this that is causing 57249
[16:53:33] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: reqstats.5xx [warn=250.000
[16:55:17] ottomata: analytics1001 is gone ...correct? I replaced it with virt1001 and removing puppet entries. Just want to confirm...thx
[16:56:53] cmjohnson1: it is still alive
[16:57:03] but afaik i think slated for repurposing
[16:57:26] i just sshed in anyway
[16:58:06] ottomata: okay..yeah I haven't pushed dns changes yet
[16:58:20] just want to be clear
[16:58:42] k
[16:58:43] cool
[16:59:41] !log dns update
[16:59:56] Logged the message, Master
[17:04:23] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:28:53 PM UTC
[17:04:23] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Last successful Puppet run was Wed 18 Dec 2013 10:29:11 PM UTC
[17:10:23] paravoid, can/should I just include admins::roots on vanadium?
[17:10:31] all I really need is groups::wikidev
[17:10:37] I already commented on the patchset
[17:10:40] oh sorry
[17:10:44] we've lost grrrit-wm
[17:10:54] ok thanks
[17:30:27] gwicke: around?
[17:41:36] cmjohnson1: Anything I can help with in gwicke's absence?
[17:48:16] james_f did you get my last...i was disconnected
[17:50:09] cmjohnson1: No.
[17:51:02] I want to take down cerium or xenon ...they're reporting failed raid but gwicke thinks it's a red herring but...still reports so I will either try to fix or replace disk
[17:51:18] * James_F nods.
[17:52:03] cmjohnson1: De-pooling should work fine, I think.
[17:52:30] there's no depooling, it's the cassandra test hosts
[17:52:46] right...i just want approval
[17:52:54] Oh, those.
[17:53:09] * James_F looses track of what cerium is being used for this week. :-)
[17:53:22] subbu should be able to tell for sure, but that sounds OK.
[17:54:55] cmjohnson1: don't worry about the raid
[17:55:18] I manually failed one of the disks each so that I could use them as linear stripe
[17:55:38] okay...can i fail the drive so it stops reporting
[17:56:25] cmjohnson1: I did mdadm -f and -r
[17:56:36] the drive is used in lvm
[17:57:35] cmjohnson1: do you mean editing mdadm.conf or the like?
[17:58:06] I can disable reporting there if that helps
[17:58:37] that would help
[18:02:25] cmjohnson1: it looks like AUTOCHECK=false in /etc/default/mdadm should do it
[18:03:10] changed that now, please let me know if mdadm continues to spam
[18:03:35] cool..i will thanks for doing that
[18:04:11] sorry for the spam
[18:30:17] what's up with grrrt-wm?
[18:30:52] wow
[18:31:09] nice job!
[18:31:10] (PS1) Yatinmaan: Added commons.ico with more resolutions. \n Buzilla Link - review/yatinmaan/103107 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355
[18:31:34] (CR) Aklapper: "Superseded by https://gerrit.wikimedia.org/r/#/c/103355/ according to comments in GCI task" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103107 (owner: Yatinmaan)
[18:31:35] (CR) Aklapper: [C: -1] "The commit message is currently broken and needs fixing. See https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:31:40] (PS1) Jgreen: comment out OCG icinga check for now [operations/puppet] - https://gerrit.wikimedia.org/r/103363
[18:31:41] (CR) Jgreen: [C: 2 V: 1] comment out OCG icinga check for now [operations/puppet] - https://gerrit.wikimedia.org/r/103363 (owner: Jgreen)
[18:31:51] (CR) Qgil: [C: 1] "Thank you! Now everything looks good to me. Odder?" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103203 (owner: Murfel)
[18:32:14] (PS1) Ottomata: Removing sudo_user otto from vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103372
[18:32:15] (PS1) Ottomata: Giving Nuria access to vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103373
[18:32:17] (CR) Ottomata: [C: 2 V: 2] Removing sudo_user otto from vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103372 (owner: Ottomata)
[18:32:20] (CR) Ottomata: [C: 2 V: 2] Giving Nuria access to vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103373 (owner: Ottomata)
[18:32:21] (PS1) Ottomata: Including admins::roots on vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103375
[18:32:23] (CR) Ottomata: "Should admins::roots be everywhere?" [operations/puppet] - https://gerrit.wikimedia.org/r/103375 (owner: Ottomata)
[18:32:37] (PS1) Cmcmahon: varnish: adjust first_byte_timeout to 180s (MobileFrontend) [operations/puppet] - https://gerrit.wikimedia.org/r/103376
[18:32:39] (PS1) Cmjohnson: Removing old analytics dns files (an1001,1002,1005,1006,1008). Replacing mgmt entries with virt100[1-3][7-8]. [operations/dns] - https://gerrit.wikimedia.org/r/103377
[18:32:40] (PS2) Cmcmahon: varnish: adjust first_byte_timeout to 185s (MobileFrontend) [operations/puppet] - https://gerrit.wikimedia.org/r/103376
[18:32:43] (CR) Cmjohnson: [C: 2] Removing old analytics dns files (an1001,1002,1005,1006,1008). Replacing mgmt entries with virt100[1-3][7-8]. [operations/dns] - https://gerrit.wikimedia.org/r/103377 (owner: Cmjohnson)
[18:32:45] (PS1) Faidon Liambotis: install-server: further cleanups to apt-repository [operations/puppet] - https://gerrit.wikimedia.org/r/103378
[18:32:45] (PS2) Faidon Liambotis: install-server: further cleanups to apt-repository [operations/puppet] - https://gerrit.wikimedia.org/r/103378
[18:32:46] (CR) Faidon Liambotis: [C: 2 V: 2] install-server: further cleanups to apt-repository [operations/puppet] - https://gerrit.wikimedia.org/r/103378 (owner: Faidon Liambotis)
[18:32:51] (PS3) Faidon Liambotis: varnish: adjust first_byte_timeout to 185s (MobileFrontend) [operations/puppet] - https://gerrit.wikimedia.org/r/103376 (owner: Cmcmahon)
[18:32:54] (PS4) Faidon Liambotis: varnish: adjust first_byte_timeout for mobile too [operations/puppet] - https://gerrit.wikimedia.org/r/103376 (owner: Cmcmahon)
[18:32:57] (CR) Faidon Liambotis: [C: 2 V: 2] varnish: adjust first_byte_timeout for mobile too [operations/puppet] - https://gerrit.wikimedia.org/r/103376 (owner: Cmcmahon)
[18:32:57] (CR) Qgil: "No, let's continue amending this change here, please. There is a chain of dependencies that needs to be resolved anyway. Yatinmaan, this c" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103107 (owner: Yatinmaan)
[18:33:00] (PS1) Cmjohnson: Changing dhcpd file entries to reflect server name change (analytics to virts) [operations/puppet] - https://gerrit.wikimedia.org/r/103380
[18:33:01] (CR) Qgil: [C: -1] "Please, let's keep amending https://gerrit.wikimedia.org/r/#/c/103107/ . I have left some comments there about why it is good to amend pat" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:33:11] (CR) Faidon Liambotis: [C: -1] "No, it shouldn't be everywhere, not until it's overhauled and fully automated. Just include groups::wikidev for now." [operations/puppet] - https://gerrit.wikimedia.org/r/103375 (owner: Ottomata)
[18:33:14] (PS2) Cmjohnson: Updating puppet entries to reflect server name changes (analytics to virts) [operations/puppet] - https://gerrit.wikimedia.org/r/103380
[18:33:18] (PS2) Ottomata: Including groups::wikidev on vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103375
[18:33:19] (PS3) Ottomata: Including groups::wikidev on vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103375
[18:33:20] (CR) Ottomata: "Ok, I changed this to just include groups::wikidev" [operations/puppet] - https://gerrit.wikimedia.org/r/103375 (owner: Ottomata)
[18:33:20] ottomata1: analytics multicast should be fixed, can you check?
[18:33:21] (CR) Ottomata: [C: 2 V: 2] "Ok, I changed this to just include groups::wikidev" [operations/puppet] - https://gerrit.wikimedia.org/r/103375 (owner: Ottomata)
[18:33:22] (PS3) Cmjohnson: Updating puppet entries to reflect server name changes (analytics to virts) [operations/puppet] - https://gerrit.wikimedia.org/r/103380
[18:33:23] (CR) Cmjohnson: [C: 2] Updating puppet entries to reflect server name changes (analytics to virts) [operations/puppet] - https://gerrit.wikimedia.org/r/103380 (owner: Cmjohnson)
[18:33:33] (PS1) Cmjohnson: Removing public ip for analytics1001 [operations/dns] - https://gerrit.wikimedia.org/r/103382
[18:33:33] (CR) Cmjohnson: [C: 2] Removing public ip for analytics1001 [operations/dns] - https://gerrit.wikimedia.org/r/103382 (owner: Cmjohnson)
[18:33:36] (CR) Qgil: "Ok, so the 16x16 and 32x32 versions of the original favicon look more sharp than yours. Since the original ones are correct, you can just " [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103355 (owner: Yatinmaan)
[18:33:43] (PS1) Ottomata: Adding Nuria so to admins::restricted [operations/puppet] - https://gerrit.wikimedia.org/r/103383
[18:33:44] (CR) Ottomata: [C: 2 V: 2] Adding Nuria so to admins::restricted [operations/puppet] - https://gerrit.wikimedia.org/r/103383 (owner: Ottomata)
[18:34:30] (PS1) Ori.livneh: Revert "Hack: cron job to clean up tifs from /tmp on app servers" [operations/puppet] - https://gerrit.wikimedia.org/r/103390
[18:35:35] paravoid, will do in just a sec
[18:35:50] (check multicast that is)
[18:37:36] hm, or not
[18:38:08] oh nevermind, it's probably fixed
[18:38:27] ottomata: nuria's account should have sudo to be meaningful (otherwise can't stop / start eventlogging services), paravoid's OK with it, so i'm submitting another patch on top of yours
[18:39:58] ok
[18:40:01] thanks ori
[18:40:29] ori that one has been merged so you'll need to submit a new change
[18:41:49] (PS1) Hoo man: Add the file namespace to Wikibase Client excludeNamespaces [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103394
[18:48:08] (PS1) Ori.livneh: Add nuria to sudoers on vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103397
[18:56:44] (PS2) Hoo man: Add file and translate NS to Wikibase Client excludeNamespaces [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/103394
[18:57:35] (CR) Faidon Liambotis: [C: 2] Add nuria to sudoers on vanadium [operations/puppet] - https://gerrit.wikimedia.org/r/103397 (owner: Ori.livneh)
[18:58:20] ori-l, nuria: ^^
[18:59:24] (CR) Aaron Schulz: [C: 1] Revert "Hack: cron job to clean up tifs from /tmp on app servers" [operations/puppet] - https://gerrit.wikimedia.org/r/103390 (owner: Ori.livneh)