[00:00:10] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:00:29] New patchset: Ryan Lane; "Ensure default site is gone." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2872
[00:01:02] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2872
[00:01:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2872
[00:02:07] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:02:25] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 00:02:15 UTC 2012
[00:04:04] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:06:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:07] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:16] RECOVERY - RAID on db40 is OK: OK: 1 logical device(s) checked
[00:09:38] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:11:25] New patchset: Ryan Lane; "Add manganese to ldap firewall rules on virt0" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2874
[00:11:35] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:13:32] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:13:48] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2874
[00:13:48] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2874
[00:15:29] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:17:26] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:19:23] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:21:20] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:23:17] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:25:14] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:27:11] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:29:08] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:31:05] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:32:53] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 00:32:42 UTC 2012
[00:33:02] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:34:59] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:36:29] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.46059248 (gt 8.0)
[00:36:56] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:37:14] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.60164352 (gt 8.0)
[00:38:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:39:11] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[00:40:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:42:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:44:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:46:14] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[00:46:14] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[00:46:41] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:48:56] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:53] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:56] so what is up with db1040 ?
[00:51:24] it's doing puppet checks
[00:52:04] i blame nagios
[00:52:05] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 00:51:36 UTC 2012
[00:52:23] RECOVERY - MySQL Slave Running on db24 is OK: OK replication
[00:52:50] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:53:12] yeah
[00:53:13] heh
[00:54:47] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:56:44] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[00:57:56] PROBLEM - MySQL Slave Delay on db24 is CRITICAL: CRIT replication delay 11499 seconds
[00:58:41] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:00:25] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:00:14 UTC 2012
[01:00:34] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:02:22] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:01:49 UTC 2012
[01:02:49] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:03:25] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:03:11 UTC 2012
[01:04:55] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:06:52] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:07:25] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2514
[01:07:26] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2514
[01:07:55] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.23574895161 (gt 8.0)
[01:08:58] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:09:45] New patchset: Ryan Lane; "Adding sumanah back onto manganese" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2876
[01:11:04] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:12:09] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2876
[01:12:10] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2876
[01:12:16] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:11:59 UTC 2012
[01:13:01] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:14:58] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:16:55] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:19:10] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:21:07] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:21:52] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:21:32 UTC 2012
[01:23:04] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:25:01] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:26:58] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:28:55] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:30:37] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:30:37] RECOVERY - MySQL Slave Delay on db24 is OK: OK replication delay 27 seconds
[01:31:22] RECOVERY - MySQL Replication Heartbeat on db24 is OK: OK replication delay 0 seconds
[01:32:34] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:34:04] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Wed Feb 29 01:33:38 UTC 2012
[01:34:31] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:36:28] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[01:51:22] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.14903610169
[02:02:31] !log manually set large rmem_max and rmem_default on locke and restarted udp2log to stem packet loss, opened an rt ticket to fix the (lost) fix
[02:02:36] Logged the message, Master
[02:17:46] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:18:04] RECOVERY - Packetloss_Average on emery is OK: OK: packet_loss_average is 3.20444134454
[02:27:49] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[02:36:49] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[02:36:49] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[02:53:24] New patchset: Bhartshorne; "added first iteration of the swift cleaner" [operations/software] (master) - https://gerrit.wikimedia.org/r/2877
[02:53:25] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2877
[02:54:47] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2877
[02:54:47] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/2877
[03:21:48] TimStarling: do you remember the magic word to deploy the squid configs to only upload squids?
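The 02:02 `!log` entry above raises the kernel's UDP receive buffers (`net.core.rmem_max` / `net.core.rmem_default`) so udp2log can absorb traffic bursts without dropping packets. A minimal sketch of generating the corresponding sysctl.conf lines — the 32 MB value is an illustrative assumption, not the value actually set on locke:

```python
# Hedged sketch: render sysctl.conf lines for larger UDP receive buffers,
# as in the 02:02 !log entry above. The 32 MB figure is illustrative only.
def udp_buffer_sysctls(bytes_=32 * 1024 * 1024):
    """Return sysctl.conf lines raising the default and maximum receive buffer."""
    return [
        f"net.core.rmem_default = {bytes_}",  # buffer newly created sockets get
        f"net.core.rmem_max = {bytes_}",      # ceiling for SO_RCVBUF requests
    ]

for line in udp_buffer_sysctls():
    print(line)
```

Putting the values in sysctl.conf (rather than setting them by hand, as the log entry admits to doing) is what keeps the fix from being "lost" across reboots.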
[03:21:56] just 'upload'?
[03:23:28] PROBLEM - Disk space on db1025 is CRITICAL: DISK CRITICAL - free space: / 284 MB (3% inode=80%): /var/lib/ureadahead/debugfs 284 MB (3% inode=80%):
[03:24:26] I think you would have to specify each cluster name as it appears in generated/clusters
[03:24:31] PROBLEM - MySQL disk space on db1025 is CRITICAL: DISK CRITICAL - free space: / 284 MB (3% inode=80%): /var/lib/ureadahead/debugfs 284 MB (3% inode=80%):
[03:24:45] ah, that looks right.
[03:24:52] I was trying to read the perl and my eyes burned.
[03:25:13] thank you
[03:25:27] !log took swift out of rotation - thumbnails now served by ms5
[03:25:30] :(
[03:25:31] Logged the message, Master
[03:25:54] oh, and TimStarling, you were right. My heuristic for assessing bad thumbnails sucked ass.
[03:31:25] RECOVERY - Disk space on db1025 is OK: DISK OK
[03:32:28] RECOVERY - MySQL disk space on db1025 is OK: DISK OK
[06:21:55] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours
[07:58:55] New patchset: ArielGlenn; "move rsync to external mirrors off to download mirror host" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2879
[07:59:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2879
[08:03:10] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2879
[08:03:10] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2879
[08:50:41] New patchset: ArielGlenn; "download host kernel settings for eth buffer allocs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2880
[08:51:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2880
[08:52:47] New review: ArielGlenn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2880
[08:52:48] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2880
[09:06:46] PROBLEM - Puppet freshness on hooper is CRITICAL: Puppet has not run in the last 10 hours
[09:07:49] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[09:08:43] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours
[10:17:21] PROBLEM - Puppet freshness on virt0 is CRITICAL: Puppet has not run in the last 10 hours
[10:30:01] PROBLEM - Puppet freshness on mw70 is CRITICAL: Puppet has not run in the last 10 hours
[10:30:01] PROBLEM - Puppet freshness on mw1098 is CRITICAL: Puppet has not run in the last 10 hours
[10:40:58] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[10:48:01] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[10:48:01] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[10:55:04] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours
[11:00:01] PROBLEM - Puppet freshness on srv278 is CRITICAL: Puppet has not run in the last 10 hours
[11:13:04] PROBLEM - Puppet freshness on db25 is CRITICAL: Puppet has not run in the last 10 hours
[11:13:58] PROBLEM - Puppet freshness on amssq42 is CRITICAL: Puppet has not run in the last 10 hours
[11:13:58] PROBLEM - Puppet freshness on sq76 is CRITICAL: Puppet has not run in the last 10 hours
[11:13:58] PROBLEM - Puppet freshness on mw71 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on amslvs2 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on cp1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on db10 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on dataset1001 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on es1 is CRITICAL: Puppet has not run in the last 10 hours
[11:15:01] PROBLEM - Puppet freshness on db1030 is CRITICAL: Puppet has not run in the last 10 hours
[12:12:10] New patchset: Mark Bergsma; "Cleanup with hierarchy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2881
[12:12:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2881
[12:12:44] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2881
[12:19:24] New patchset: Mark Bergsma; "Move misc::install-server into a separate misc/ file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2882
[12:19:52] New patchset: Mark Bergsma; "Fix mode of /srv/autoinstall" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2883
[12:20:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2882
[12:20:21] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2882
[12:20:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2882
[12:20:22] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2883
[12:20:22] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2883
[12:20:31] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2883
[12:20:32] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2883
[12:41:12] New patchset: Mark Bergsma; "Add new, simple partman recipe for LVM on hw raid, root/swap LVs only" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2884
[12:41:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2884
[12:41:50] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2884
[12:41:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2884
[12:50:59] !log upgraded mwlib to 0.13.5 on pdf cluster
[12:51:02] Logged the message, Master
[13:05:17] I wanna update twinkle at hi-wp. I'm just worried that if I mess it up and someone messes an MW update up, I wont be able to tell the difference if it was me who messed up. So when will it be a good time to update twinkle sometime in the next week?
[13:14:53] anyone?
[13:23:26] New patchset: Mark Bergsma; "Make lvm.cfg recipe fully automatic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2885
[13:23:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2885
[13:24:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2885
[13:24:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2885
[13:27:07] New patchset: Hashar; "rt: force HTTPS protocol" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2446
[13:42:38] New patchset: Mark Bergsma; "Make partman recipes fully automatic" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2886
[13:43:45] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2886
[13:43:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2886
[13:46:02] mark: do you happen to know how I we can get gerrit to reverify a change? https://gerrit.wikimedia.org/r/#change,2682
[13:46:34] should I just resubmit a dummy patchset ?
[13:46:49] that doesn't work I believe
[13:46:52] I don't think it's possible
[13:53:28] New patchset: Hashar; "git-setup script no more use "git config --global"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2682
[13:53:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2682
[13:54:11] that would mean we only lint one time
[13:55:07] !log Reinstalled strontium and palladium with hw raid1 and fully automatic lvm based partman recipe
[13:55:12] Logged the message, Master
[14:02:36] New patchset: Demon; "Push script for extensions" [operations/software] (master) - https://gerrit.wikimedia.org/r/2887
[14:02:37] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2887
[14:03:08] New review: Demon; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2887
[14:03:09] Change merged: Demon; [operations/software] (master) - https://gerrit.wikimedia.org/r/2887
[14:38:03] domas: any suggestions how to do a find and replace on an unindexed column on a table with over 116 million rows?
[14:38:20] Wanting to replace " " with "_" as someone made an issue in MW...
[14:42:54] depends on how scattered rows are
[14:43:38] No idea unfortunately
[14:43:48] Relatively it's not going to be not that many rows
[14:43:56] 1%? 0.1%?
[14:44:24] Any page title (on any wiki) that has a space in it, a picture from commons, and the page has been moved
[14:44:29] you can write a script that reads all rows, then issues replace statements =)
[14:44:46] I was just thinking use one of the eqiad slaves
[14:44:49] they're sat there doing nothing
[14:44:56] you sure can do a batch query before
[14:44:59] so I can run a long slow query, build a list of PKs and replace on the master from there?
[14:45:03] right
[14:46:16] I'll do that then, thanks :)
[14:51:54] New patchset: Hashar; "Bug 28469 - Make SVN Documentation be indexed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2888
[15:19:25] New patchset: Demon; "Add rule for RL2 gadgets branch" [operations/software] (master) - https://gerrit.wikimedia.org/r/2889
[15:19:26] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2889
[15:38:45] domas: just over 4 million rows
[15:38:56] 3.5% or s
[15:38:58] so
[15:42:01] heh
[15:42:09] if new rows are not being written, just prepare a batch and run a script :)
[15:44:32] Yup, that's what I was going to do now
[15:48:00] just do the regular waitforslaves every X mutations
[15:48:09] you can use REPLACE() and LIMIT 100 or something
[15:50:39] New review: Demon; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2889
[15:50:39] Change merged: Demon; [operations/software] (master) - https://gerrit.wikimedia.org/r/2889
[16:35:44] robh: cisco arrived please update rt 2499
[16:35:59] ok, will in a moment, on dell rma call at the moment =]
[16:36:08] that will make ryan lane really happy
[16:45:18] cmjohnson1: ok, finding a place for it.
[16:47:34] i hate where cr2pmtpa is
[16:47:42] its low neough in the rack that now the rack has a lot of power and no space
[16:47:51] hrmm
[16:48:30] ok, found a place.
[16:49:50] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=2499 updated
[16:50:36] cool
[16:55:50] So, it appears puppet "magically" restarted itself in the middle of the night and undid all my work on reportcard1.
[16:55:58] This is frowny-face.
[16:56:00] :(
[17:08:23] dschoon: it does do that. root's crontab.
[17:08:43] maplebed: ah! yes, that makes a great deal of sense.
[17:08:48] ty. will fix.
[17:09:00] I think it's because puppet was crashing randomly so this "fixed" it.
[17:09:15] *nod* just don't have time to puppetize changes atm.
[17:09:26] next week. need box in a certain state now tho.
[17:12:28] oh Lucene, why are you evil? Why oh why.
[17:12:34] morning AaronSchulz - any chance you could give me a code review on the swift cleaning stuff?
[17:12:53] Jeff_Green: it secretly wants to be a dairy product.
[17:13:02] hehehe.
[17:13:45] when searches start coming up "Have you seen me?" I will really worry
[17:14:42] maplebed: sure
[17:14:51] thanks!
[17:22:16] New patchset: Bhartshorne; "explaining why I don't use the normal URL to HEAD objects in swift" [operations/software] (master) - https://gerrit.wikimedia.org/r/2891
[17:22:33] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2891
[17:22:34] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/2891
[17:51:39] New review: Danakim; "Looks fine to me. Will this be merged in?" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2682
[18:22:26] New patchset: Bhartshorne; "grumble." [operations/software] (master) - https://gerrit.wikimedia.org/r/2892
[18:22:46] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2892
[18:22:46] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/2892
[18:39:19] Is there a way to make [[Special:Version]] update right after a deploy? https://bugzilla.wikimedia.org/34794
[18:39:33] oops, wrong bug
[18:40:21] special:version caching: https://bugzilla.wikimedia.org/34796
[18:44:51] <^demon> Special:Version isn't cached?
[18:47:38] hexmode: cant they just put ?action=purge
[18:47:40] ;p
[18:48:12] <^demon> On a special page? :p
[18:48:16] i dunno
[18:48:20] RobH: I've no clue :{
[18:48:22] im asking
[18:48:38] i assumed purging would work on any cached page
[18:48:39] special or no
[18:48:44] but its an assumption
[18:48:49] ^demon: do they not?
[18:48:56] <^demon> The special pages that are cached are recached manually.
[18:49:03] <^demon> Most special pages aren't cached at all.
[18:49:18] so is this guy who reported the bug crazy?
[18:49:23] <^demon> Possibly.
[18:49:36] its always possible, i dunno them
[18:49:38] heh
[18:50:06] hexmode: may wanna make sure he knows about action purge so if he sees it again, he can try that.
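The 14:44–15:48 exchange above sketches a standard bulk-update pattern: run the slow scan on an idle slave to collect primary keys, then apply small `REPLACE()` batches on the master, waiting for slaves to catch up between batches. A hedged sketch of the statement generation only — the table, column, and key names (`page`, `page_title`, `page_id`) are illustrative assumptions, not necessarily the table being discussed:

```python
# Hedged sketch of the batched find-and-replace discussed above.
# Table/column names are assumptions; the caller supplies the PK list
# gathered from a slow SELECT on a slave, and would wait for slave lag
# to settle between batches ("the regular waitforslaves").
def batched_updates(pks, table="page", col="page_title", key="page_id", batch=100):
    """Yield UPDATE statements replacing spaces with underscores, `batch` PKs at a time."""
    for i in range(0, len(pks), batch):
        chunk = ",".join(str(pk) for pk in pks[i:i + batch])
        yield (f"UPDATE {table} SET {col} = REPLACE({col}, ' ', '_') "
               f"WHERE {key} IN ({chunk})")
```

Driving the updates by primary key keeps each statement cheap even though the column itself is unindexed; only the one-off scan on the slave pays the full-table cost.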
[18:50:12] but sounds like it shouldnt be cached form what chad says
[18:50:33] oh wait
[18:50:38] he states he was logged in
[18:50:43] so he gets no cache, so this cannot be right
[18:50:46] hexmode: ^
[18:50:57] <^demon> And the svn data's not cached in memc or anything.
[18:51:33] browser cache?
[18:52:10] <^demon> Possibly. I think it's confusion most likely.
[18:53:27] ^demon: maplebed: RobH: Ask saper in #mediawiki
[18:53:40] hexmode: doesnt matter, its invalid
[18:53:43] he says he was logged in
[18:53:51] logged in users skip caching.
[18:54:34] * ^demon goes back to his work
[19:06:08] robh: any mgmt info for virt5?
[19:06:23] lemme make some right now
[19:08:11] cmjohnson1: +virt5 1H IN A 10.1.8.78
[19:08:23] thx
[19:08:28] !log dns update for virt5 mgmt
[19:08:30] Logged the message, RobH
[19:18:33] oooooooo
[19:18:37] it's in and racked!?
[19:21:04] New patchset: Bhartshorne; "changing the user for swift's rewrite stuff so I can use a new password" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2893
[19:21:34] New review: Bhartshorne; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2893
[19:21:35] Change merged: Bhartshorne; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2893
[19:23:47] ryan_lane: yes it is....do you want more than one nic connected?
[19:24:02] yes, please
[19:24:05] I need two connected
[19:24:19] \o/
[19:24:24] okay
[19:26:29] Ryan_Lane: i said that today would be your favorite day this week
[19:26:32] because of that server
[19:28:16] today is :)
[19:28:26] I'm very happy about this
[19:28:39] I'm also moving over to the new gerrit server today! :)
[19:45:59] maplebed: the cleaner scripts seems to look OK so far otherwise
[19:46:15] excellent.
[19:46:20] I already found one bug... ;)
[19:46:39] the try/except catching keyboard interrupts wouldn't have stopped execution when multiple files are passed in.
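The swiftcleaner bug described at 19:46 — a KeyboardInterrupt handler in the per-file worker, later moved up to main() — matters because a handler inside the loop body swallows Ctrl-C and simply proceeds to the next file. A minimal illustration of the fixed shape; `delegate` and `main` are stand-in names echoing the discussion, not the actual swiftcleaner code:

```python
def delegate(item, log):
    """Per-item worker; deliberately does NOT catch KeyboardInterrupt.

    If the except were here, Ctrl-C would abort only the current item
    and the loop in main() would continue with the next one.
    """
    log.append(item)

def main(items):
    """Catching the interrupt at the loop level stops the whole run."""
    processed = []
    try:
        for item in items:
            delegate(item, processed)
    except KeyboardInterrupt:
        pass  # stop cleanly; work completed so far is preserved
    return processed
```

The general rule: handle an interrupt at the level you actually want to abort, which for a batch tool is usually the outermost loop.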
[19:46:45] I moved it from delegate() to main()
[19:47:08] I appreciate you reading over it for me. Thanks!
[19:48:27] robh: do we want to keep nic redundancy active-active or none?
[19:48:43] where is this setting?
[19:48:44] or active-standby
[19:48:53] in the cimc configuration
[19:49:09] hrmm, i have to recall wtf i did in eqiad, lemme pull on eup
[19:49:47] if none the eth ports operate independently
[19:50:03] active-active used simultaneously
[19:50:26] active standby
[19:50:31] okay
[19:50:46] these have two mgmt connections is why if i recall correctly
[19:50:57] but we dont care about that, we only wire one on servers.
[19:51:04] only core routers get dual mgmt connections
[19:51:33] cmjohnson1: when you go to test, on these its not root, but admin
[19:51:41] and it wont let you change it, so better to just use admin
[19:51:52] just figured that out
[19:51:55] we could add a root, but that leaves it open to errors, since its an added account
[19:51:59] wondering how to do that
[19:52:01] best to just adapt to that
[19:52:08] nah, just leave admin, dont add root
[19:52:23] if we decide to do that in the future, we will do it in a mass update on all the cisco servers
[19:52:38] for now i prefer they all operate on the defaults like that
[19:52:45] ok..sounds good
[19:52:53] all set up...sending network ticket now
[19:53:02] Ryan_Lane: ^
[19:53:07] since you took juniper classes
[19:53:09] heh
[19:53:13] you may wanna snag that ticket if you dont wanna wait ;]
[19:53:18] * Ryan_Lane nods
[19:53:24] my classes were canceled, i now have them next month =P
[19:53:29] awww
[19:53:31] took online version this time
[19:56:50] you got to take one?
[19:56:58] oh. the ones next month will be online. got it
[19:57:26] yea, i dont wanna have another cancellation due to class size
[19:57:32] online courses are a lot less prone to that issue
[19:57:48] ok
[19:58:01] I am going to have to update my parallels windows copy
[19:58:07] i have not run it in months
[19:58:17] and even then its for one specific windows task, i never run updates =P
[19:58:29] the only thing i didn't care for in the online course was having to use windows and putty for everything.
[19:58:35] seemed like that caused more issues
[19:58:50] well, luckily i have my air running os x
[19:58:57] i have parallels copies of linux and windows
[19:59:07] and i plan to load my old laptop up with bootcamp before the course
[19:59:15] so will have a nice mix of options to do the course.
[19:59:34] that should help
[20:06:40] New patchset: Demon; "Changing permissions" [operations/software] (master) - https://gerrit.wikimedia.org/r/2894
[20:07:27] New review: Demon; "(no comment)" [operations/software] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2894
[20:07:27] Change merged: Demon; [operations/software] (master) - https://gerrit.wikimedia.org/r/2894
[20:32:58] New review: Reedy; "This change should also be made to the api appserver config too..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2578
[20:34:18] Do we have a way of adding a Vary: Accept-Encoding header to svg files?
[20:40:18] a user tells me new mailing lists can't be created? really?
[20:58:17] !log restarting nova-compute on virt4
[20:58:21] Logged the message, Master
[20:58:40] iptables restore was failing
[21:01:12] maplebed: are we using swauth for swift?
[21:01:23] yes
[21:07:41] maplebed: I looked at the swiftcleaner code earlier, it looked like it should work as advertised. I did not run the numbers
[21:07:55] apergos: awesome. thanks for the review.
[21:08:00] sure ething
[21:15:14] ryan_lane: I did not find out anything wrong with the test of labstore 1 but had Dell send me a new hdd anyway. rt 2441
[21:18:04] cmjohnson1: hrmmm
[21:18:17] tested ok in the backplane slot it was in?
[21:18:22] or tested ok removed from that slot?
[21:18:28] (the former is better than the latter in this case)
[21:18:46] i just wanna be sure it doesnt sound like backplane issue
[21:19:47] between the dell rep and myself we don't think it is the backplane at this time. the easiest thing for us to do is swap the drive. if that doesn't work than it may be something more serious like the backplane.
[21:21:11] * Ryan_Lane nods
[21:21:25] cmjohnson1: have you checked out the raidset since the drive switch?
[21:21:55] I have not switched the drive yet? it will arrive tomorrow
[21:22:24] oh ok
[21:22:56] yeah, are you still having issue with disk 2?
[21:23:18] yes
[21:25:32] ok...i will replace it and we'll go from there.
[21:35:53] Ryan_Lane: not sure he can really do that
[21:36:02] do what?
[21:36:03] all he can tell is the mgmt feedback, which is lacking compared to raid utils
[21:36:12] checking out raidsets
[21:36:17] the raid controller showed the drive as being bad
[21:36:24] in fact, it would go between missing and bad
[21:36:28] right but how is he going to read that?
[21:36:29] well, missing a rebuilding
[21:36:33] *and
[21:36:38] he can see if the disk is good or bad in the drac lom most of the time
[21:36:40] boot into the raid controller?
[21:36:46] but he cannot determine raid health outside of looking at nagios
[21:36:50] sure if its down
[21:36:55] sorry, didnt realize it was fully down
[21:36:57] it's not doing anything right now
[21:37:00] it isn't
[21:37:09] I can shut it down or reboot it
[21:37:15] this is a dell right?
[21:37:19] yeah
[21:37:23] just recalling what the raid bios shows
[21:37:36] * Ryan_Lane nods
[21:37:39] it shows if its healthy or not
[21:37:50] but i dont really recall if it shows rebuiliding, cuz i always use command line tools, heh
[21:38:07] so cmjohnson1 if it doesnt show that feel free to ping a root
[21:38:22] but would be nice to know if it does.
[21:38:48] if the server is up and online, then chris cannot confirm raidset, so was just making sure everuyone was on same page
[21:38:49] sorry ;]
[21:39:21] maplebed: meh, we should probably just make mw:mediawiki an admin
[21:40:32] * AaronSchulz looks at bug 34814
[21:41:35] ^demon: in 30 minutes I'm going to switch gerrit
[21:49:40] <^demon> Ok, I'm done with the stuff I'm doing. Fire away.
[21:50:19] can somebody review my proposed apache conf changes for RT #2488? on fenari diff /root/redirects.conf /home/w/conf/httpd/redirects.conf
[21:54:50] meanwhile . . . back in a few minutes, gonna make a quick snow shovelling pass before it gets too heavy
[22:01:21] ^demon: ok
[22:02:35] Gerrit is now down. I'm moving to the new server
[22:02:50] !log stopped gerrit service on formey, moving to manganese
[22:02:53] Logged the message, Master
[22:03:11] TimStarling: what's with the pipe stuff in populateImageSha1.php and how are old/archived files handled?
[22:04:27] that's old isn't it?
[22:05:07] one could almost just call upgradeRow() on everything (though that redoes MIME)
[22:06:50] crap. different ssh host key....
[22:06:59] mmm, r25134
[22:08:41] damn it, where does gerrit store its key?
[22:09:27] AaronSchulz: it allows the SHA-1 operation to be done in parallel with the DB write
[22:09:27] !log stopping gerrit on manganese
[22:09:30] Logged the message, Master
[22:09:43] !log replacing ssh_host_key on manganese for gerrit with the same one on formey
[22:09:46] Logged the message, Master
[22:10:08] in the original version without the pipe, it was spending about half of its time in each, so doing the pipe improved performance by a factor of 2 or so
[22:10:51] what about the other two file tables?
[22:11:46] ok. gerrit's back up
[22:12:03] !log gerrit moved to manganese.
[22:12:06] Logged the message, Master
[22:12:34] !log reversing gerrit replication from formey -> manganese to manganese -> formey
[22:12:37] Logged the message, Master
[22:13:25] the version in r25134 had an unused variable $oldimageTable, obviously I intended to initialise it too but didn't get around to it
[22:13:56] filearchive always had a sha1 field
[22:14:08] so it didn't need populating
[22:14:09] ah, yes
[22:18:00] !log installed python-paramiko on manganese. needs to be puppetized
[22:18:03] Logged the message, Master
[22:18:39] !log stopped ircecho on formey and started it on manganese
[22:18:42] Logged the message, Master
[22:18:51] pity about brion's comment from r54410
[22:19:13] Ryan_Lane: doesn't formey still ircecho svn commits?
[22:19:19] no
[22:19:22] I thought I explained the reason for it to him at the time, but the comment was written two years on
[22:19:33] Ryan_Lane: ok.
[22:19:36] that happens elsewhere
[22:21:23] TimStarling: maybe it can call getHistory() and hit all the old versions too. It also needs a '-force' param
[22:21:45] am I going to cause a conflict for you if I edit that comment?
[22:21:53] lol
[22:22:00] probably not
[22:22:51] gerrit-wm: hmm
[22:24:18] New review: Ryan Lane; "Missing lint check..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2899
[22:24:21] better
[22:28:10] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2899
[22:28:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2899
[22:30:29] !log restarting gerrit on manganese to enable replication
[22:30:32] Logged the message, Master
[22:32:08] hm. replication isn't working
[22:32:18] Failed replicate of refs/notes/review to gerrit2@formey.wikimedia.org:/var/lib/gerrit2/review_site/git/operations/puppet.git, reason: failed to lock
[22:32:32] maybe gerrit needs to be started on it....
[22:32:40] crap. short read of block.
[22:37:52] bah, I started this saying it was late and now here it is after midnight again and I am caught in the middle of network testing with a peer
[22:37:53] hate
[22:41:30] we're deploying to en.wp shortly, aren't we?
[22:41:38] I believe so
[22:46:19] looks like brion beat me to OKing tim's commit
[22:48:26] \o/ i win
[22:55:06] New patchset: Bhartshorne; "added delete to swiftcleaner" [operations/software] (master) - https://gerrit.wikimedia.org/r/2900
[22:59:54] AaronSchulz: dunno if you're up for it, but I added the delete and purge stuff to the swiftcleaner and I'd love another review.
[23:00:01] patch set 2900 up there.
[23:00:24] oh, you're probably busy with 1.19 deploy.
[23:00:25] nevermind.
[23:00:59] * AaronSchulz was looking
[23:01:15] sweet!
[23:01:34] it isn't showing up in git log though
[23:03:05] maybe because I didn't merge it yet?
[23:08:20] ;)
[23:08:36] I like how fast viewing logs is on git (hot the disk)
[23:10:02] I tested it with a list of 500 images, but I'm hesitant to test it with the full a2 container just when enwiki is getting deployed.
[23:10:08] I think I might go for a run then start it up.
[23:10:24] but the run on 500 images worked nicely.
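[Editor's note: Tim's explanation above — the pipe in populateImageSha1.php lets the SHA-1 hashing run in parallel with the DB writes, roughly doubling throughput. The actual script's mechanism is not shown in this log; the following is a hypothetical Python sketch of the same producer/consumer idea, with a stand-in `write_row` callback instead of a real DB update.]

```python
import hashlib
import queue
import threading

def pipelined_sha1(files, write_row):
    """Hash file contents in one thread while the main thread consumes
    the results and performs the (slow) row writes, so the two stages
    overlap instead of running strictly one after the other."""
    q = queue.Queue(maxsize=100)

    def hasher():
        for name, data in files:
            q.put((name, hashlib.sha1(data).hexdigest()))
        q.put(None)  # sentinel: no more work

    t = threading.Thread(target=hasher)
    t.start()
    while True:
        item = q.get()
        if item is None:
            break
        write_row(*item)  # e.g. UPDATE image SET img_sha1 = ... in the real script
    t.join()

# toy usage: collect rows in a list instead of writing to a DB
rows = []
pipelined_sha1([("a.png", b"hello"), ("b.png", b"world")],
               lambda name, sha1: rows.append((name, sha1)))
```

(In-process threads only approximate the benefit because of the GIL; two separate processes connected by a pipe, as the chat implies, overlap the CPU work fully.)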
[23:11:06] I ran it twice; the first time it registered a range of errors (filename and ms5 mostly), the second time every file that had errored out the first time errored with the 'does not exist on swift' check, indicating successful deletion!
[23:11:07] \o/
[23:11:56] TimStarling: I'd love it if you review it once the deploy is over, as I want to leave it running even after we clear out old chaff to catch new errors.
[23:12:19] I promise it won't make you curse as much as delete-stuff did with its byts and pixs.
[23:13:16] https://gerrit.wikimedia.org/r/gitweb?p=operations/software.git;a=blob;f=swiftcleaner/swiftcleaner;hb=HEAD
[23:13:28] Tim cursed?
[23:13:53] I could see the lasers coming out of his eyes all the way across the ocean.
[23:13:55] * AaronSchulz was under the impression that we just thought the swears in his head and didn't say anything
[23:14:03] *he just
[23:15:27] ok, I'll look at it
[23:15:33] thanks.
[23:16:13] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2900
[23:16:16] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/2900
[23:17:20] New patchset: Bhartshorne; "whitespace only change to test gerrit." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2901
[23:17:33] Ryan_Lane: ^^^
[23:17:40] ok. cool
[23:17:48] probably just my own locally screwed up repo
[23:17:56] would you like me to either abandon or merge it to test other bits?
[23:18:02] New patchset: Ryan Lane; "Ensure replica is readonly" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2902
[23:18:07] heh. there we go
[23:18:16] nah. I had already done so once
[23:18:19] I'll test again with this one
[23:18:27] ok, barring anything else I'll abandon the change.
[23:18:45] ok
[23:18:49] replication is failing to formey
[23:18:56] I probably need to rsync the data across
[23:19:04] Change abandoned: Bhartshorne; "just testing stuff." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2901
[23:19:19] I think it screwed up earlier because a file was locked
[23:19:28] and now it's broken :(
[23:19:38] oh well. easily solved
[23:19:53] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2902
[23:19:57] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2902
[23:20:46] Ryan_Lane, bits is running nginx right? can we configure it to do transparent gzip compression on .svg files? WebFonts SVG fonts specifically could benefit from this for now https://bugzilla.wikimedia.org/show_bug.cgi?id=34810
[23:21:01] bits is running varnish
[23:21:10] hmmmm
[23:21:23] backend is apache
[23:21:32] ah then apache we can def configure it that way
[23:21:43] and varnish should pass it through fine
[23:22:21] Reedy, that sound about right for that webfonts svg thing?
[23:22:50] That'd be fine, yup
[23:23:03] excellent, i'll add a note on it
[23:23:30] Google says something about adding Vary: Accept-Encoding too
[23:24:12] *nod*
[23:25:12] It's actually got some reasonable suggestions
[23:25:30] Though, moving images from wikipages to embedded CSS seems a little extreme ;)
[23:26:04] let's just embed everything into a single data uri ;)
[23:31:12] brion: we wouldn't need image squids then! :p
[23:32:08] hehe
[23:35:36] !log trying a run of swiftcleaner against the commons a2 shard on swift.
[23:35:40] Logged the message, Master
[23:55:36] New patchset: Ryan Lane; "No need for gid 550 on manganese, or wikidev" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2903
[23:57:06] woosters: I added the ability to delete stuff to the swiftcleaner script. Running with 20 threads it looks like it's doing about 120 checks per second, putting total running time for one shard estimated at about 20 minutes (I'll find out soon enough if that's accurate).
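[Editor's note: the gzip-for-SVG idea discussed above is easy to demonstrate — SVG is verbose, repetitive XML, so it compresses extremely well. A quick sketch (the sample markup is made up, not a real SVG font; Python's stdlib gzip stands in for Apache's mod_deflate):]

```python
import gzip

# A repetitive SVG-font-like snippet; real SVG fonts are much larger
# and at least as repetitive, so they compress at least this well.
svg = ('<svg xmlns="http://www.w3.org/2000/svg">'
       + '<glyph unicode="a" d="M0 0 L10 10 Z"/>' * 200
       + '</svg>').encode('utf-8')

compressed = gzip.compress(svg)
ratio = len(compressed) / len(svg)
print(f"{len(svg)} bytes -> {len(compressed)} bytes ({ratio:.1%})")
```

(The `Vary: Accept-Encoding` header Reedy mentions matters here because a cache such as varnish sits in front of apache: without it, a cache could serve the gzipped body to a client that never sent `Accept-Encoding: gzip`.)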
[23:57:24] It looks like that added between 10 and 15% to the io load on ms5
[23:57:36] so I think I can probably run it at double that and still have ms5 be happy.
[23:57:46] triple and ms5 would be working but probably still responsive.
[23:57:48] that is encouraging news indeed
[23:58:46] ..aaaand it's done. 17 minutes to do the a2 shard.
[23:59:07] at that rate, 72hrs for all of commons.
[23:59:12] rob la's team is deploying 1.19, which should not have an impact on that, but better let him know
[23:59:13] half that if I double the rate, etc.
[23:59:40] yeah, I know. I won't start the real thing until they're done.
[23:59:48] I figured a 5-20 minute test would be ok.
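[Editor's note: maplebed's estimate checks out if commons is split into 256 shards named by two hex characters like "a2" — the shard count is my assumption, not stated in the log:]

```python
# Sanity-check the back-of-the-envelope estimate above.
minutes_per_shard = 17        # measured time for the a2 shard
shards = 256                  # assumed: two-hex-char shards, 00..ff
total_hours = minutes_per_shard * shards / 60
print(f"{total_hours:.1f} hours")        # ~72.5, matching "72hrs for all of commons"
print(f"{total_hours / 2:.1f} at 2x")    # "half that if I double the rate"
```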