[00:27:46] we've had someone waiting for a labs account for 3.5 days. can someone please do it for me?
[00:28:34] https://www.mediawiki.org/w/index.php?title=Developer_access&diff=580651&oldid=580536 ; thanks
[00:29:08] (i think i've already asked at least a couple times...)
[00:50:58] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:58] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:58] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:58] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:58] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[00:50:59] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[00:50:59] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[00:51:00] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[00:51:00] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[01:41:58] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 231 seconds
[01:42:25] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 304 seconds
[01:45:25] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 6 seconds
[01:46:37] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 18 seconds
[02:38:04] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:04] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:04] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:04] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:04] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:05] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:05] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:06] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:06] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:07] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:07] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:08] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:08] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:09] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:09] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:10] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:10] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:11] PROBLEM - Puppet freshness on es1007 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:11] PROBLEM - Puppet freshness on es1008 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:12] PROBLEM - Puppet freshness on es1010 is CRITICAL: Puppet has not run in the last 10 hours
[02:38:12] PROBLEM - Puppet freshness on es1009 is CRITICAL: Puppet has not run in the last 10 hours
[02:46:01] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[03:37:01] PROBLEM - Puppet freshness on oxygen is CRITICAL: Puppet has not run in the last 10 hours
[03:42:25] RECOVERY - Puppet freshness on spence is OK: puppet ran at Mon Sep 10 03:42:05 UTC 2012
[04:49:29] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[04:49:29] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[04:49:29] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[04:55:29] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[06:22:18] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[06:22:18] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[07:24:36] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours
[09:27:29] PROBLEM - Puppet freshness on mw8 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:26] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:26] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:26] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:26] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:26] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[10:52:27] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:27] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:28] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[10:52:28] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:09] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:09] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:09] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:09] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:09] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:10] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:10] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:11] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:11] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:12] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:12] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:13] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:13] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:14] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:14] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:15] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:15] PROBLEM - Puppet freshness on es1007 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:16] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:16] PROBLEM - Puppet freshness on es1010 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:17] PROBLEM - Puppet freshness on es1009 is CRITICAL: Puppet has not run in the last 10 hours
[12:39:17] PROBLEM - Puppet freshness on es1008 is CRITICAL: Puppet has not run in the last 10 hours
[12:46:47] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[13:37:47] PROBLEM - Puppet freshness on oxygen is CRITICAL: Puppet has not run in the last 10 hours
[14:51:01] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours
[14:51:01] PROBLEM - Puppet freshness on ms-be1011 is CRITICAL: Puppet has not run in the last 10 hours
[14:51:01] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours
[14:57:01] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[15:02:01] New review: Dereckson; "Shellpolicy. This configuration change is waiting a local consensus." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/23068
[15:29:26] Hey Jeff_Green
[15:34:44] drdee_ hey--i just read your email
[15:34:50] k
[15:35:44] do you think it's possible that library changes (i.e. via dist upgrade) could have caused the compiled filters to misbehave?
[15:36:19] the file timestamps on the filters, udp2log itself, and the udp2log config files are several months old or older
[15:36:45] what library changes have happened recently?
[15:37:09] not sure, offhand. i'll see if apt logs are informative
[15:37:40] AFAIK, locke and the other logging boxes are still running ubuntu 10, not precise
[15:38:25] locke is 10.04.4
[15:40:15] running zcat /a/squid/archive/sampled-1000.log-20120908.gz | ./bi-filter | wc -l also gives 19 lines, so the results mimic udp-filter
[15:40:50] that's interesting
[15:40:52] maybe the random sampling is not so random in udp2log...
[15:41:25] you know...I haven't investigated the URIs used for the banners. could *those* have changed such that the filter no longer matches?
[15:41:26] it seems less of a filter problem to me (given the current information)
[15:41:39] that's a good point to look at :)
[15:42:20] the bi-filter looks for "http://meta.wikimedia.org/wiki/Special:BannerLoader" or "?title=Special:BannerLoader"
[15:43:09] in particular, the first string also looks for the meta domain; i am not sure on what domain the banners are hosted
[15:43:44] i'm not either--but I can get the fr-tech folks to go through that aspect with me
[15:43:54] k
[15:54:52] Jeff_Green: do you have an example URL with a banner?
[16:06:34] * jeremyb waves robla. have a couple mins?
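A minimal sketch of the "apt logs" check mentioned above (15:37), assuming only the stock dpkg log location on the 10.04 install locke runs; the cutoff date and the ldd call are illustrative, not commands recorded in this log:

    # list package upgrades dpkg has recorded recently, to see whether any library
    # the compiled filters link against changed (older entries rotate to dpkg.log.1, ...)
    awk '$3 == "upgrade" && $1 >= "2012-08-01"' /var/log/dpkg.log
    # and the shared libraries bi-filter actually loads:
    ldd /a/squid/fundraising/bi-filter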
[16:06:54] i've been looking for someone to do https://www.mediawiki.org/w/index.php?title=Developer_access&diff=580651&oldid=580536
[16:07:13] bbl
[16:09:04] drdee_: at the moment the best I've got is what we've historically logged
[16:09:06] we know who you really are :-P
[16:09:14] ok heading in to the office, see folks soon
[16:09:37] Jeff_Green: how about deploying the same bi-filter on oxygen and running them in parallel to see what happens?
[16:10:34] sure
[16:10:52] why don't we do your new filter there?
[16:13:09] drdee_: i have a new angle to investigate too
[16:13:30] I think banner impressions are logged via javascript subrequest (?), did that change/break?
[16:13:59] can you show me a URL?
[16:16:32] i wish I could, afaik there are no banners up atm
[16:16:44] we're talking about running a small test today so we have something to work with
[16:16:49] or a link to source code in gerrit/svn?
[16:19:07] :(
[16:19:28] meanwhile we can set up a new filter on oxygen to see if there are differences in traffic received
[16:20:13] they should be around soon, at which point I will pounce and get them to run tests with me. I've got until mid-afternoon pacific to focus on this
[16:20:26] ok. are the o2 filters puppetized?
[16:20:44] yes they are
[16:27:45] Jeff_Green: https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=templates/udp2log/filters.oxygen.erb;h=75f71b36c47c5f70f2a784b6efef146c57e990af;hb=8d3f3d5a2355b0a876a4406ad53c5831b4ff2d3b
[16:38:37] drdee_: pgehres is here
[16:38:44] great
[16:38:46] hey all
[16:38:49] hi hi
[16:38:53] yo
[16:39:01] so can we look at some examples of banners that are currently up?
[16:39:09] yes
[16:39:18] http://meta.wikimedia.org/w/index.php?title=Special:NoticeTemplate/view&template=B12_InPaJwKsRn
[16:39:27] but not all code is there, some is done by CN
[16:39:28] I'm going to adjust the filter to be 1:1 instead of 1:100 for now
[16:39:47] forced, it would look like http://en.wikipedia.org/?banner=B12_InPaJwKsRn
[16:39:57] Jeff_Green: okay, WLM should be logging as well
[16:41:33] alright, filter is adjusted--we're logging 1:1 to locke:/a/squid/fundraising/logs/bannerImpressions.log temporarily
[16:41:41] cool
[16:41:50] can you point me to the filter?
[16:42:13] amssq33.esams.wikimedia.org 364602071 2012-09-10T16:41:49.450 0 77.195.90.140 TCP_MISS/200 2035 GET http://meta.wikimedia.org/w/index.php?title=Special:BannerLoade
[16:42:16] that looks truncated
[16:42:28] that does
[16:42:45] i see complete lines
[16:42:45] pgehres: it's /a/squid/fundraising/bi-filter
[16:42:51] but not the last one
[16:43:26] can i log into locke?
[16:43:30] sec
[16:43:51] could WikiLovesMonuments be crowding out the fundraiser banners?
[16:44:01] WLM should be logging as well
[16:44:10] AFAIK the BI filter logs them all
[16:44:19] pgehres: you appear to have an account on locke
[16:44:25] it's locke.wikimedia.org
[16:44:30] well, hey, so it is
[16:46:30] but i mean WLM banners are shown on every page, right? and so the number of WLM hits is gigantic compared to fundraiser banners, so when you start to sample at 1:100 then the probability of hitting a fundraiser banner is very small
[16:46:41] also it seems that nothing has changed
[16:46:50] well, when we run fundraising, we pause WLM
[16:46:54] okay, I have it
[16:46:56] with respect to the filters, udp2log, etc
[16:47:12] it looks like the filter is looking for "http://meta.wikimedia.org/wiki/Special:BannerLoader"
[16:47:19] yes
[16:47:24] it's looking for one or the other right?
[16:47:29] or is it both?
[16:47:29] but that is now loading at "http://meta.wikimedia.org/w/index.php?title=Special%3ABannerLoader"
[16:47:40] tadaaa
[16:47:44] that's the solution
[16:48:06] that's assuming I am reading C properly
[16:48:17] lemme check with K4-713 who wrote this
[16:48:25] who wrote the filter?
[16:48:29] yes, there are two strings defined and it will match one of them
[16:49:21] PROBLEM - udp2log log age for locke on locke is CRITICAL: CRITICAL: log files /a/squid/fundraising/logs/bannerImpressions-sampled100.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days.
[16:49:38] but what caused this change? when did the colon start to be URL-encoded?
[16:50:47] kaldari made some changes recently
[16:50:49] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active
[16:50:58] Jeff_Green: i think this is also a good moment to deprecate bi_filter and migrate to udp-filter
[16:51:07] drdee: i concur
[16:51:22] that way you can just read from the config what it is matching
[16:51:29] you can also enable geocoding
[16:51:37] and puppetize it :)
[16:51:37] drdee_ agreed
[16:51:55] okay, i have to go to a meeting, so feel free to do whatever if it fixes it
[16:51:59] ok
[16:52:15] i think we just need to match the string "title=Special%3ABannerLoader"
[16:53:27] drdee_ you're going to puppetize for locke?
[16:53:51] i can't; either you or ottomata has to do it
[16:53:57] ah ok. i can do
[16:54:21] i can do a +1 in gerrit
[16:57:08] zack makes an interesting suggestion--why not do a specific string just for fundraising banners? apparently we've had URI vs filter issues in the past
[16:57:30] &string=FRBNRKTHX
[16:57:43] Jeff_Green: all our banners are prefixed with B12
[16:58:03] pgehres: that's a bit short to trust as unique though
[16:58:08] :-)
[16:58:48] well the string would be something like title=Special%3ABannerLoader&banner=B12
[16:58:53] right?
[16:59:09] that seems awfully long for a regex filter
[16:59:10] that should work
[16:59:14] can we compromise :-)
[16:59:15] it's not a regex filter
[16:59:48] so string length doesn't affect performance?
[16:59:55] barely
[17:00:01] ok
[17:00:13] or we can drop the title= part at the start
[17:01:02] let's configure the filter by hand on locke for now, I also want to fix the log rotation script to reload the process differently
[17:01:23] I can't run the script as root since it's rsync-ing to a root-squashed nfs mount
[17:01:45] k
[17:02:06] does udp-filter handle string matches? i see examples piped through awk (shudder)
[17:02:29] huh? where?
[17:02:58] don't worry--not through udp-filter
[17:03:10] oh yes
[17:03:20] we should kill those awk scripts with fire
[17:03:21] trying to find your proposed syntax in my email
[17:03:53] found it
[17:04:13] pipe 100 /usr/bin/udp-filter -p Special:BannerLoader >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
[17:06:12] so . . . how about this:
[17:06:17] pipe 100 /usr/bin/udp-filter -p BannerLoader&banner=B12 >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
[17:06:36] this still leaves me disturbed about all the truncated lines
[17:06:48] it looks good
[17:07:53] k. now of course we're logging nothing b/c it was all wlm before
[17:08:27] :)
[17:08:45] !log adjusted locke:/etc/udp2log/squid to fix pattern match for fundraiser banners
[17:08:48] want some fr banners
[17:08:49] ?
[17:08:52] yep
[17:08:55] Logged the message, Master
[17:09:01] bring the noise!
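The root cause identified above is that the banner subrequest now carries a URL-encoded colon, so the bi-filter's literal "Special:BannerLoader" strings no longer match. A quick way to confirm that against an archived sample, assuming the same file used in the earlier zcat test; the grep counts are an illustrative check, not output recorded in this log:

    # how often each literal appears in the archived 1:1000 sample
    zcat /a/squid/archive/sampled-1000.log-20120908.gz | grep -c 'Special:BannerLoader'
    zcat /a/squid/archive/sampled-1000.log-20120908.gz | grep -c 'Special%3ABannerLoader'
    # and the narrower fundraising-only string proposed in the discussion
    zcat /a/squid/archive/sampled-1000.log-20120908.gz | grep -c 'title=Special%3ABannerLoader&banner=B12'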
[17:09:41] on
[17:10:19] Jeff_Green: let me know when you are satisfied
[17:10:57] still piddly input
[17:11:02] hmmm
[17:11:04] where can I see a banner?
[17:11:11] en.wikipedia.org
[17:11:16] splitting with WLM
[17:11:45] zomg I just thought of something . . .
[17:12:11] &
[17:12:22] i want to stab this entire architecture in the soul.
[17:13:08] ?
[17:13:21] it's straight shell scripting
[17:13:38] & shoves the process in the background
[17:13:43] [sh]
[17:13:44] joy.
[17:14:16] * pgehres is missing something
[17:14:21] much better
[17:14:44] logs flowing?
[17:14:48] udp2log reads /etc/udp2log/squid
[17:15:08] pipe 100 /usr/bin/udp-filter -p BannerLoader&banner=B12 >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
[17:15:41] that says: run this in the background "pipe 100 /usr/bin/udp-filter -p BannerLoader" and gobbledygook thk mr shell
[17:15:48] pipe 100 /usr/bin/udp-filter -p BannerLoader\&banner=B12 >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
[17:15:50] that's better
[17:15:50] yes
[17:16:00] heh
[17:16:07] i see now
[17:16:27] yeah logs are flowing, you can take it down
[17:17:18] I'm going to restart udp2log and clean up the mess
[17:17:32] while you're in there, can you run a 1:100 against a 1:1 ?
[17:17:43] or maybe a 1:10 instead of a 1:1
[17:19:19] so we can test to make sure that the sampling is fairly accurate
[17:20:42] it is looking good
[17:21:30] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:21:39] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:21:39] PROBLEM - Varnish traffic logger on cp1021 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:21:39] PROBLEM - Varnish traffic logger on cp1027 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:21:48] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:15] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:24] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:24] PROBLEM - Varnish traffic logger on cp1024 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:24] PROBLEM - Varnish traffic logger on cp1022 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:24] PROBLEM - Varnish traffic logger on cp1044 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:22:33] PROBLEM - Varnish traffic logger on cp1043 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[17:23:18] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:23:45] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:23:59] hi apergos, if you have a second could you take a look at https://gerrit.wikimedia.org/r/#/c/23107/ ?
[17:25:33] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours
[17:26:58] drdee thanks for your help!
[17:27:57] Jeff_Green, very welcome and send me the gerrit url once you have committed the puppet change. maybe it is an idea to annotate that B12 is a unique identifier for fundraising banners
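The broken and fixed pipe lines above differ only in the escaped ampersand: udp2log hands each "pipe" command to the shell, so an unescaped & backgrounds the filter and leaves the rest of the line to be parsed as a separate command. A sketch of the equivalent quoted form and of the sampling sanity check drdee asks for, assuming the log paths already named above; the quoting style and the wc comparison are illustrative, not what was actually deployed:

    # /etc/udp2log/squid -- single-quoting the pattern works like escaping the &
    pipe 100 /usr/bin/udp-filter -p 'BannerLoader&banner=B12' >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
    # rough check that 1:100 sampling tracks the unsampled stream over the same window:
    # the first count should be roughly 100x the second
    wc -l /a/squid/fundraising/logs/bannerImpressions.log /a/squid/fundraising/logs/bannerImpressions-sampled100.log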
[17:27:57] RECOVERY - Varnish traffic logger on cp1021 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:31:42] RECOVERY - Varnish traffic logger on cp1024 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:34:33] RECOVERY - Varnish traffic logger on cp1022 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:35:54] will do
[17:38:18] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:39:03] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:41:47] TomDaley: ping
[17:41:54] RECOVERY - Varnish traffic logger on cp1027 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:42:39] RECOVERY - Varnish traffic logger on cp1044 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:45:12] RECOVERY - Varnish traffic logger on cp1043 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:53:36] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[17:56:56] TomDaley: (unping, got it taken care of)
[18:05:39] drdee, how much space are these logs going to take?
[18:09:52] apergos: between 500 and 700mb per day, we can have a rolling window of 3 months
[18:11:44] ok
[18:12:05] New patchset: Andrew Bogott; "Update wiki instance-status pages automatically." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23155
[18:12:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23155
[18:13:05] cmjohnson1: updating the meeting notes
[18:13:08] on memcached
[18:13:12] we have all the parts right?
[18:13:21] (old notes say awaiting sfp+ arrival)
[18:13:25] we still waiting?
[18:13:33] yes, we have everything ow
[18:13:34] now
[18:13:40] cool, thx
[18:14:40] so RobH woosters the 520s do let us select the H310 raid controller (which has a "no raid" mode) on the customization screen
[18:17:04] I am unclear about getting 12 drives in the chassis though
[18:18:21] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK HTTP/1.1 200 OK - 454 bytes in 0.021 seconds
[18:18:38] apergos: http://www.dell.com/us/enterprise/p/poweredge-r520/pd
[18:18:43] so the 520 is only 8 disks.
[18:18:45] that's shitty.
[18:18:59] seems like the 720 is the way we have to go.
[18:19:38] well what's the difference in price if we
[18:19:51] get more 520s to make up the difference in storage?
[18:20:54] RECOVERY - Puppet freshness on mw8 is OK: puppet ran at Mon Sep 10 18:20:49 UTC 2012
[18:24:27] notpeter: after install on mw8 and puppet update I see this error:
[18:24:29] err: /Stage[main]/Apaches::Service/Service[apache]: Failed to call refresh: Could not start Service[apache]: Execution of '/etc/init.d/apache2 start' returned 1: at /var/lib/git/operations/puppet/manifests/apaches.pp:154
[18:25:30] cmjohnson1: | cmjohnson1 onsite @ sdtpa hehe
[18:25:32] lies!
[18:25:35] rsync -a 10.0.5.8::httpdconf/ /usr/local/apache/conf
[18:25:38] cmjohnson1: ^
[18:26:02] robh: i am there in spirit!
[18:27:39] PROBLEM - Apache HTTP on mw8 is CRITICAL: Connection refused
[18:32:20] !log authdns-update for wicipediacymraeg.org
[18:32:29] Logged the message, RobH
[18:32:45] New patchset: Andrew Bogott; "Update wiki instance-status pages automatically." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23155
[18:33:44] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23155
[18:36:17] yeah RobH so the upshot is: the R720 takes 8 3.5" or 16 2.5" drives, and has the H310 option. The R520 takes 8 2.5" or 3.5" drives, and also has the H310 option. The R510, while it takes 12 2.5" or 3.5" drives, only has the H200 or the H700.
[18:36:39] RECOVERY - Apache HTTP on mw8 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.028 second response time
[18:36:59] all in 2u form factor
[18:41:00] RECOVERY - NTP on mw8 is OK: NTP OK: Offset -0.01140093803 secs
[18:41:21] New patchset: Aaron Schulz; "Removed cruft swift.php file." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23308
[18:42:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23308
[18:42:18] paravoid and binasher: did you guys decide anything about the container servers? I saw you were talking about them a little bit
[18:44:06] New review: Pyoungmeister; "diederik has check with apergos about space requirements of this." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/23107
[18:44:54] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23107
[18:55:05] New patchset: Aaron Schulz; "Reverted 421cc6c changes; this won't work well short thumbnail names." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23309
[18:55:05] New review: Aaron Schulz; "See https://gerrit.wikimedia.org/r/#/c/22930/ and related." [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/23309
[18:55:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23309
[18:55:24] RECOVERY - Puppet freshness on oxygen is OK: puppet ran at Mon Sep 10 18:55:08 UTC 2012
[19:01:04] apergos: we were just brainstorming, didn't decide anything
[19:01:09] ok
[19:02:13] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23308
[19:05:03] RobH: maybe the 720 instead of the 720xd is more affordable
[19:05:31] (it takes 16 2.5" drives, don't know sizes, it does say max 24T of storage, and the H310)
[19:40:58] Change merged: Dzahn; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23175
[19:47:40] mutante: in the mood for some more shell reqs? ;)
[19:50:40] hrmm, the "noc copy doesn't match the gerrit master copy" thing seems to be fixed
[19:50:50] (for InitialiseSettings.php)
[19:50:58] Brooke: ^
[19:59:08] * jeremyb was thinking of gerrit 2328[67] if anyone wants to take them
[19:59:14] have to run very shortly
[20:04:30] AaronSchulz: so the reason I saw no 503s from thumb purges is that according to the latest container ring layout the container listings are only on 4 hosts (all with ssds)
[20:04:49] I don't really know what the container server processes on the other hosts do, maybe nothing :-D
[20:15:30] labsconsole: There was either an authentication database error or you are not allowed to update your external account.
[20:17:28] i logged out and back in and it's still happening
[20:17:54] Ryan_Lane: andrewbogott_afk: paravoid: ping
[20:18:11] (this is when trying to create a user)
[20:19:39] bbl
[20:53:36] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:36] PROBLEM - Puppet freshness on ms-be1009 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:36] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:36] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:36] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours
[20:53:37] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:37] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:38] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[20:53:38] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours
[21:08:14] jeremyb: when trying to add a user?
[21:08:23] jeremyb: what's the shell account name?
[21:08:26] maybe it already exists?
[21:08:44] error reporting for auth plugins in mediawiki sucks
[21:28:09] oh. let me check that
[21:29:43] Ryan_Lane: no current user in LDAP, no user in wiki. both names = theKaramanukian
[21:31:35] Ryan_Lane: seems the key was that the shell name had to be all lowercase (I thought it was just the first letter that mattered?)
[21:31:40] created now
[21:35:00] jeremyb: Excellent.
[21:37:42] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100%
[21:39:03] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms
[21:39:34] jeremyb: shell account name must be all lowercase
[21:39:46] jeremyb: the form shows the allowed characters
[21:40:33] !log added salt and dependent packages to lucid-wikimedia and precise-wikimedia repos
[21:40:42] Logged the message, Master
[21:43:15] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[21:43:24] paravoid: since I need to puppetize salt for deployment, I'm going to get it working between virt0 and the labs instances
[21:43:43] then we'll also be able to do remote calls to the instances if needed
[21:44:37] uh
[21:44:40] ?
[21:44:53] it seems a bit premature but... let's see :)
[21:44:59] btw, puppet has a REST call for a kick
[21:45:03] http://docs.puppetlabs.com/guides/rest_api.html
[21:45:23] I need to be able to sign salt keys on the puppet server
[21:45:33] so, I thought it better to test it on labs before production ;)
[21:50:06] apergos: btw, I just ran swift-init stop all on ms-be6
[21:50:33] no reason for that crap to be running over there anyways
[21:50:35] we should do that on reboots and such, to avoid having it serve stale objects and such
[21:50:46] in theory it's not in the ring right? but still
[21:50:48] *rings
[21:51:00] yeah, in theory
[21:52:57] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.023 second response time
[22:40:21] PROBLEM - Puppet freshness on analytics1011 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:21] PROBLEM - Puppet freshness on analytics1013 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:21] PROBLEM - Puppet freshness on analytics1014 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:21] PROBLEM - Puppet freshness on analytics1015 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:21] PROBLEM - Puppet freshness on analytics1012 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:22] PROBLEM - Puppet freshness on analytics1016 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:22] PROBLEM - Puppet freshness on analytics1017 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:23] PROBLEM - Puppet freshness on analytics1019 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:23] PROBLEM - Puppet freshness on analytics1021 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:24] PROBLEM - Puppet freshness on analytics1018 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:24] PROBLEM - Puppet freshness on analytics1023 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:25] PROBLEM - Puppet freshness on analytics1022 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:25] PROBLEM - Puppet freshness on analytics1024 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:26] PROBLEM - Puppet freshness on analytics1020 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:26] PROBLEM - Puppet freshness on analytics1025 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:27] PROBLEM - Puppet freshness on analytics1026 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:27] PROBLEM - Puppet freshness on es1007 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:28] PROBLEM - Puppet freshness on es1008 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:28] PROBLEM - Puppet freshness on es1009 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:29] PROBLEM - Puppet freshness on analytics1027 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:29] PROBLEM - Puppet freshness on es1010 is CRITICAL: Puppet has not run in the last 10 hours
[22:40:50] !log swift: weight 66->100 for ms-be9, 11, 12
[22:40:59] Logged the message, Master
[22:48:18] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours
[22:53:32] !log adding all current labsconsole users to mediawiki shell group
[22:53:41] Logged the message, Master
[23:16:37] New patchset: Aaron Schulz; "Reverted 421cc6c; this won't work with short thumbnail names." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23309
[23:18:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/23309
[23:23:06] faidon: you are using ms3 for swift, correct? and not having any issues w/it? i have an old ticket that doesn't seem to be an issue any longer.
[23:27:07] is ms3 in any pool? I thought not
[23:27:28] it is set up for swift
[23:27:58] but if it is not in any ring then we aren't actually using it for anything (i.e. it gets no traffic)
[23:28:38] i don't know much more than checking the drives on the sys
[23:28:50] i see /dev/sdao1 on /srv/swift-storage/sdao1 type xfs (rw,noatime,nodiratime,nobarrier,logbufs=8)
[23:28:51] and so on
[23:29:32] well
[23:30:07] no GETs in the log
[23:30:13] for today or yesterday
[23:31:05] the only thing it apparently ever tries to do is sync
[23:31:13] i think ben was using it as a lab
[23:31:30] not for production
[23:31:42] I see
[23:35:33] apergos: https://gerrit.wikimedia.org/r/#/c/3766/
[23:35:48] yeah well I knew it wasn't in them
[23:35:58] I look at the rings at least a couple times a week for one thing or another
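Both the weight bump !logged at 22:40 and the question of whether ms3 is in any ring come down to the swift ring builder files. A minimal sketch of that workflow, assuming the standard swift-ring-builder CLI; the builder filename and device id here are placeholders rather than values taken from the cluster:

    # list the devices in a ring along with their zones and current weights;
    # a host that appears in no builder (like ms3 above) gets no traffic
    swift-ring-builder object.builder
    # raise one device's weight (e.g. from 66 to 100), then redistribute partitions
    swift-ring-builder object.builder set_weight d9 100
    swift-ring-builder object.builder rebalance
    # the rebuilt object.ring.gz then has to be pushed out to every proxy and storage node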