[00:08:27] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:27] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[00:08:27] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[00:57:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:59:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 2.126 seconds
[01:24:30] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[01:24:30] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[01:33:57] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:37:06] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.971 seconds
[01:41:27] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[01:41:27] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[02:12:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:24:44] !log LocalisationUpdate completed (1.21wmf5) at Mon Dec 10 02:24:44 UTC 2012
[02:24:53] Logged the message, Master
[02:25:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.703 seconds
[02:59:31] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[03:05:22] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused
[03:15:49] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.040 second response time
[03:44:10] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours
[04:25:10] New patchset: Ori.livneh; "Yet another pmpta -> pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37378
[04:36:55] basile: blog.wm.o has non-secure resources when fetched by HTTPS (at least a few images)
[05:52:38] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[06:00:35] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[07:44:03] PROBLEM - Puppet freshness on ms-be3002 is CRITICAL: Puppet has not run in the last 10 hours
[08:45:11] hello
[09:02:35] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused
[09:13:50] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.007 second response time on port 11000
[09:32:28] New patchset: Stefan.petrea; "Added parsing modules for wikistats testing" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37800
[09:34:59] New review: Hashar; "Seems fine to me :)" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/37800
[09:36:22] hashar: ping
[09:36:29] yeah there :-]
[09:36:56] apergos: Hello 8) would you have time to get a few packages installed on the cont int server please ? The change is https://gerrit.wikimedia.org/r/37800
[09:37:15] average_drifter: it is quiet during eu morning but we have a few ops floating around to assist us nonetheless
[09:37:25] depending how much they are busy with other projects though
[09:39:42] im just looking at bug 42860 somehow some webm files have Content-Type: text/plain
[09:40:04] anyone knows what layer this could go wrong?
[09:40:22] example url: https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Le_Voyage_dans_la_Lune_%28Georges_M%C3%A9li%C3%A8s%2C_1902%29.ogv/Le_Voyage_dans_la_Lune_%28Georges_M%C3%A9li%C3%A8s%2C_1902%29.ogv.480p.webm
[09:40:33] !b 42860
[09:40:33] https://bugzilla.wikimedia.org/42860
[09:40:43] sometimes i get text/plain sometimes an ok webm type
[09:41:37] j^: I guess the content type is set by the mediawiki extension isn't it ?
[09:41:45] could be that some servers do not properly set it
[09:41:46] or
[09:42:04] its set while putting it into swift yes
[09:42:04] the file might be cached on different cache and one of the copy has a wrong content type
[09:42:33] since sometimes it returns ok i am wondering if some other part of the cache chain also sets it
[09:43:41] hashar: shall we ask for second opinion on https://gerrit.wikimedia.org/r/#/c/37800/ ?
[09:44:21] j^: I seem to always get it with text/plain though if I append a query parameter ( url/.…webm?ohihatecache ) I get audio/webm
[09:44:41] j^: so it looks like to me the cache has the wrong info
[09:45:12] average_drifter: I can't merge changes in operations/puppet.git , only roots can.
[09:45:21] average_drifter: so we need someone from ops to look at it :-]
[09:45:23] j^:
[09:45:52] hashar: alright :)
[09:46:39] hashar: i guess as soon as https://gerrit.wikimedia.org/r/#/c/35574/ is in production it should be easier to reset those caches
[09:50:07] j^: I have updated the bug report listing the curl command and the headers
[09:50:20] j^: nothing I can do about it, I have no idea how to purge that from the upload caches
[10:00:42] j^: that bug was fixed a while ago
[10:00:45] but some cache content may still have it
[10:00:54] !g I38bb589f10ab3253472fd5b5fbb0a19b80b4d9e1
[10:00:54] https://gerrit.wikimedia.org/r/#q,I38bb589f10ab3253472fd5b5fbb0a19b80b4d9e1,n,z
[10:02:17] mark: any way to purge the cache if a file has been identified?
[10:02:38] this is only happening in esams right?
[10:03:38] yes
[10:03:42] only in esams
[10:03:59] and mediawiki doesn't seem to purge those
[10:04:42] i changed that, https://gerrit.wikimedia.org/r/#/c/35574/
[10:04:45] that was the discussion about purging vs removing the scaled files, remember
[10:04:47] but its not deployed
[10:04:48] ah
[10:06:09] so in the future it should be possible to fix sending ?action=purge
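
The check described above — the same thumbnail returning text/plain from the cache but audio/webm once a cache-busting query parameter is appended — is easy to script. A minimal sketch in Python (the query parameter name is arbitrary; any query string the cache has not seen will bypass the cached copy):

```python
# Compare the cached object's Content-Type against a cache-busted
# fetch of the same object, as described above. The parameter name
# "nocache" is arbitrary; any unseen query string bypasses the cache.
import urllib.request

URL = ("https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/"
       "Le_Voyage_dans_la_Lune_%28Georges_M%C3%A9li%C3%A8s%2C_1902%29.ogv/"
       "Le_Voyage_dans_la_Lune_%28Georges_M%C3%A9li%C3%A8s%2C_1902%29.ogv.480p.webm")

def content_type(url):
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Content-Type")

print("as cached:   ", content_type(URL))
print("cache-busted:", content_type(URL + "?nocache=1"))
```

If the two lines differ, the stale copy lives in the cache layer rather than in swift, which is consistent with the esams-only behaviour noted above.
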
[10:09:16] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[10:09:16] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[10:09:16] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[10:14:12] hashar: this link you gave me, it talks about wikistat testing, that's the one you want merged?
[10:15:17] New review: Siebrand; "The problem here is "shellpolicy"? What does that mean?" [operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/31823
[10:20:36] apergos: yup
[10:20:59] apergos: https://gerrit.wikimedia.org/r/37800 the perl modules are required by the analytics team
[10:21:05] apergos: their tests are written in perl :-]
[10:21:24] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37800
[10:21:42] you just linked me the same change, did you mean to?
[10:21:51] yup
[10:22:01] so you don't have to look it up in case you closed your browser :-]
[10:22:11] oh. no I had the tab open
[10:22:13] thanks though
[10:22:21] average_drifter: Ariel merged the change that get the new perl packages on gallium !
[10:22:28] tabs are wonderful things
[10:22:50] wait for it please
[10:23:03] I still need to merge on sockpuppet (just done) and run on gallium
[10:26:42] hm
[10:26:57] ah there it is
[10:27:06] your change is live, have fun with the perl scripts
[10:27:13] apergos: thanks!!!
[10:27:24] I'm going to be afk for a bit, (visit to lawyer), back in a little while
[10:48:52] PROBLEM - Host ms-be3002 is DOWN: PING CRITICAL - Packet loss = 100%
[11:06:25] RECOVERY - Host ms-be3002 is UP: PING OK - Packet loss = 0%, RTA = 109.32 ms
[11:21:43] RECOVERY - SSH on ms-be3002 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[11:25:35] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[11:25:35] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:42:14] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[11:42:14] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[12:02:16] New patchset: MaxSem; "Temporarily raise $wgMaxCoordinatesPerPage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37823
[12:10:55] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37823
[12:13:34] !log maxsem synchronized wmf-config/CommonSettings.php 'https://gerrit.wikimedia.org/r/#/c/37823/'
[12:13:44] Logged the message, Master
[13:00:10] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[13:04:35] New review: Dereckson; "Shellpolicy issue has been mitigated (cf. bug 41757, comments 8 and 9)." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/31823
[13:06:46] New patchset: Dereckson; "(bug 42737) Rights configuration on ur.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37411
[13:15:47] jeremyb: I think I've fixed all of the theme's URLs to be protocol-independent, but I can't fix all the images that people insert into individual posts.
[13:16:48] guillom: well can we at least start by figuring out a way to make images correct for new posts?
[13:17:00] guillom: the problems i saw looked pretty recent
[13:18:32] * jeremyb has to run in a min
[13:19:27] jeremyb: I don't know if it's a WordPress bug (in which case we can't fix it) or if it only happens when we hotlink, e.g. from Commons (in which case the only way is to fix them manually).
[13:20:35] guillom: i do know it definitely happens with uploads to blog.wm.o
[13:20:51] the 2 that i noticed were http://blog.wm.o/...
[13:23:35] It seems that there are workarounds ( http://www.deluxeblogtips.com/2012/06/relative-urls.html ) but I don't get why this wouldn't be implemented in WP itself.
[13:23:42] Anyway, not my highest priority atm :)
[13:23:58] guillom: anyway, i guess that means no WMF person has worked on it yet. /me will click that link later
[13:24:10] bbl
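
The workaround guillom links above comes down to making embedded URLs protocol-relative, so the same post renders cleanly over both HTTP and HTTPS. A minimal sketch of that rewrite, assuming post bodies are available as HTML strings (the sample image URL is illustrative, not a real upload):

```python
# Rewrite hard-coded http:// or https:// src/href attributes to
# scheme-relative // URLs, the workaround discussed above.
import re

ATTR_URL = re.compile(r'\b(src|href)=(["\'])https?://', re.IGNORECASE)

def make_protocol_relative(html):
    return ATTR_URL.sub(r'\1=\2//', html)

post = '<img src="http://blog.wikimedia.org/files/example.png">'
print(make_protocol_relative(post))
# -> <img src="//blog.wikimedia.org/files/example.png">
```
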
[13:45:50] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours
[14:22:46] apergos , hashar thanks !
[14:23:26] yw (mostly hashar)
[14:24:31] team work ! ;-D
[14:25:10] :-)
[14:47:20] RECOVERY - Puppet freshness on ms-be3002 is OK: puppet ran at Mon Dec 10 14:47:02 UTC 2012
[14:49:43] hi guys
[14:49:50] what's up with Gerrit ?
[14:49:55] I need to do a git review but I can't
[14:50:03] I get
[14:50:18] error: The requested URL returned error: 403 while accessing https://gerrit.wikimedia.org/r/p/analytics/wikistats.git/info/refs
[14:54:24] Hello ?
[14:54:34] I can't push my changes in Gerrit
[14:54:38] average_drifter: i think san fransisco's asleep :)
[14:54:39] Any changes
[14:54:45] jeremyb: ok
[14:56:17] what can I do ?
[15:00:04] ok got it working
[15:07:53] RECOVERY - NTP on ms-be3002 is OK: NTP OK: Offset -0.02130961418 secs
[15:53:32] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours
[16:01:37] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours
[16:43:11] New patchset: Jgreen; "set tmp suffix on in-progress files so we can ignore them for rsync" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37851
[16:44:04] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37851
[17:22:43] New patchset: Demon; "Hook for Special:Version" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37854
[17:23:21] New patchset: Demon; "Hook for Special:Version" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37854
[17:47:36] PROBLEM - Host constable is DOWN: CRITICAL - Host Unreachable (208.80.152.151)
[17:49:24] RECOVERY - Host constable is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms
[17:58:08] !log demon synchronized wmf-config/extdist/svn-invoker.conf 'More futile efforts to fix extdist'
[17:58:17] Logged the message, Master
[18:12:48] can someone please review https://gerrit.wikimedia.org/r/#/c/35298/
[18:25:13] !log temp. depooling ssl4
[18:25:20] Logged the message, Master
[18:34:23] did you want to add IEMobile to that preg match, MaxSem? I see it's in devicedetection, but maybe you already test for 'mobile' in the string someplace/
[18:34:26] ?
[18:34:42] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:36:25] RoanKattouw_away: where you at?
[18:36:32] I found our issue with git-deploy
[18:36:34] apergos, I ve removed it in the second patchset as it was optimised out of DeviceDetection too
[18:36:44] ok
[18:37:30] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35298
[18:38:13] tyvm apergos
[18:38:45] what hosts need a puppet run now?
[18:39:30] MaxSem:
[18:39:41] varnish, but it's not urgent so a usual puppet run would suffice
[18:39:53] ok
[18:42:57] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.270 seconds
[18:43:41] apergos, can you also take a look at https://gerrit.wikimedia.org/r/#/c/35931/ ?:)
[18:43:51] in just a sec, yes
[18:48:23] heya, LeslieCarr or other networking knowledgeables
[18:48:35] whats up
[18:48:40] got a sec to help me understand linux udp packet loss re udp2log?
[18:49:03] sure, though the packet loss stat isn't actually packet loss
[18:49:07] it's loss in the log processing
[18:49:23] right
[18:49:24] right, well
[18:49:32] except, i see drops in /proc/net/udp
[18:49:35] or
[18:49:51] i see, that just means that udp2log isn't able to remove items from the buffer fast enough?
[18:49:52] oh? increasing ?
[18:49:53] yes
[18:49:58] this is on analytics machines, btw
[18:50:02] not the main udp2log machines
[18:50:05] those all run unsampled
[18:50:16] i'm trying to do stuff with unsampled logs
[18:50:17] ah
[18:50:33] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37854
[18:50:53] which machines ?
[18:51:05] i can check on the interfaces to see if there's drops there
[18:51:06] just in case
[18:51:26] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35931
[18:51:35] !log demon synchronized wmf-config/CommonSettings.php 'New link for Special:Version'
[18:51:42] welp, i've got 2 i'm playing with right now,
[18:51:43] Logged the message, Master
[18:51:51] analytics1026 is the one that is easier to play with for debugging
[18:51:57] apergos, you're my hero!
[18:52:12] for at least the next 5 minutes :-D
[18:52:13] ok
[18:52:14] but, I think this is application related somehow, leslieCarr
[18:52:15] because
[18:52:31] I know how it goes here: "what has ops done for us lately?" ;-)
[18:52:32] i'm using Tim's packet-loss log code to print out a packet loss report
[18:52:44] that code just looks at hostnames and seq numbers in the log stream
[18:52:52] and then prints out a report if it seems missing seqs
[18:53:21] so, when I run the unsampled log through packet-loss log with no other udp2log processes
[18:53:28] packet loss is close to nothing, or 0
[18:53:54] but, if I add another unsampled process (this one is using udp-filter and then sending data out to another machine)
[18:54:19] all the sudden I get lots of dropped packets, and packet loss log reports ~60% loss
[18:54:47] well the interface itself is getting very little traffic with no lost packets
[18:54:57] on analytics1026?
[18:54:59] how can you tell?
[18:55:07] tcpdump?
[18:55:26] ifconfig?
[18:56:11] the switch information
[18:56:17] oh
[18:56:26] its getting very little traffic?
[18:56:35] yeah
[18:56:40] dstat --net shows ~50MB recv / sec
[18:56:54] (this is multicast, btw)
[18:57:01] reading from multicast ip
[18:57:46] yeah, i'm seeing a max of 400mbit a second
[18:57:55] and 0 drops
[18:58:07] aye ok, sounds about right,
[18:58:33] ok 0 drops at switch, so that means that its all on an26, right?
[18:58:35] MaxSem: your change is not live on yttrium
[18:58:40] yep
[18:58:43] ok, ja
[18:58:48] *now
[18:59:26] that's what I've figured so far, because if I run only the packet-loss filter to measure loss at the log line level, i get very little loss
[18:59:35] but if I run another process, I start getting lots
[18:59:43] and i'd like to understand why
[18:59:47] so
[18:59:58] from netstat -u
[19:00:00] netstat -su
[19:00:01] 2377440535 packet receive errors
[19:00:10] /proc/net/udp says
[19:00:13] drops
[19:00:17] 13195408
[19:00:22] and that keeps increasing
[19:00:59] now that is a good question that i don't have the answer to yet
[19:01:01] i'm suspecting that there is some buffer issue with multiple unsampled processes
[19:01:21] we have some production machines that are running unsampled processes, but I think that there is only one when that happens
[19:01:22] yeah
[19:01:25] there were also changes to /etc/solr/conf/solrconfig.xml and /etc/default/jetty
[19:01:43] and I don't understand all of the network buffers that the nic or the kernel have in between the application, especially in context of udp2log, since it forks a process for each defined filter
[19:02:00] hrm, is there a local buffer in the application ?
[19:02:05] yes i think so
[19:02:10] // Process received packets
[19:02:11] const size_t bufSize = 65536;
[19:02:11] ?
[19:02:19] yeah
[19:02:19] bytesRead = socket.Recv(receiveBuffer, bufSize);
[19:03:29] hrm, want to try increasing that ?
[19:03:42] ergh, would have to recompile it, but i could
[19:04:07] hm, before I do lemme see if I can understand how sockets and buffers work with forked processes like this
[19:04:19] apergos, that's ok - these changes are made by the solr class
[19:04:26] ok great
[19:04:27] ok
[19:04:52] also, this stackoverflow question may be relevant - http://stackoverflow.com/questions/6627702/multicast-packet-loss-running-two-instances-of-the-same-application
[19:05:24] (stackoverflow -- making sysadmins seem like magical fountains of knowledge for years!)
[19:05:43] haha, i've been googling all morning, haven't read this one yet...
[19:05:52] read many a stack overflow posts, hehe
[19:11:00] !log fixing test-star vs. old-star and new-star WP certificate/key confusion on ssl hosts.. arrg
[19:11:08] Logged the message, Master
[19:17:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:22:37] New patchset: Jgreen; "redo for new environment" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37864
[19:24:40] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37864
[19:28:51] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.649 seconds
[19:32:21] ok, LeslieCarr, as far as I can tell from reading udp2log code
[19:32:38] there is a single process that reads from the socket, and writes to an application buffer
[19:33:09] and epoll is used to notify each process that they need to read from the app buffer
[19:33:30] so, it seems like this is a different problem than the stack overflow link you sent
[19:33:48] <^demon> Just started scap. Hang on to your pants everybody.
[19:33:55] the OP there was trying to read from the same multicast ip in different processes
[19:34:05] udp2log should only use a single process to read off the network socket
[19:38:08] !log repooling ssl4
[19:38:18] Logged the message, Master
[19:39:53] ottomata: I'm catching up on the backlog here. what's the new line you're adding to the config?
[19:40:21] ...and which machine are you debugging this on?
[19:40:22] oh? i'm not adding new lines
[19:40:26] this is on analytics machines
[19:40:30] not on any of the prod instanes
[19:40:32] instances
[19:40:42] !log restarting nginx on all ssl hosts
[19:40:46] i'm dealing with unsampled udp2logs, trying to understand why I can't process them without dropping packets
[19:40:50] Logged the message, Master
[19:41:13] this is happening in two cases for me right now
[19:41:23] the easiest to debug is analytics1026
[19:41:39] i'm using an1026 to help drdee and average_drifter test webstatscollector changes
[19:41:50] webstatscollector operates on unsampled udp2logs
[19:41:55] https://en.wikipedia.org isn't working for me. Known issue?
[19:42:09] I get "Firefox can't establish a connection to the server at en.wikipedia.org."
[19:42:43] so i'm trying to get them an unsampled stream that I can pipe through filter (now running as udp-filter -o), and then pipe that over to stat1 so stefan can test collector results and make sure they work
[19:42:57] but, whenever I try to use udp2log with an unsampled pipe process
[19:43:18] i start losing packets. the packet-loss log reports around 60% loss
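
The packet-loss report referred to here was described earlier in the channel: it watches hostnames and sequence numbers in the log stream and flags gaps. The accounting amounts to roughly the following (a sketch only; hostname in field 0 and sequence number in field 1 is an assumption, not necessarily the production log format):

```python
# Per-host sequence-number accounting of the kind described above:
# remember the last seq seen per host and count any gap as loss.
import sys
from collections import defaultdict

last_seq = {}
received = defaultdict(int)
lost = defaultdict(int)

for line in sys.stdin:
    fields = line.split()
    if len(fields) < 2 or not fields[1].isdigit():
        continue
    host, seq = fields[0], int(fields[1])
    received[host] += 1
    prev = last_seq.get(host)
    if prev is not None and seq > prev + 1:
        lost[host] += seq - prev - 1
    last_seq[host] = seq

for host in sorted(received):
    total = received[host] + lost[host]
    print("%s: %.1f%% lost" % (host, 100.0 * lost[host] / total))
```
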
[19:43:43] Marybelle: working for me as well, at least on Chrome. Trying Firefox in a sec
[19:43:45] ^demon: Hmm, it was working for me and then suddenly stopped.
[19:43:54] Other sites seem to load fine...
[19:44:11] robla: we also need the unsampled streams to start supporting funnel analysis for the different product teams
[19:44:36] PROBLEM - HTTPS on ssl1004 is CRITICAL: Connection refused
[19:44:50] <^demon> I wish downforeveryoneorjustme did ssl.
[19:45:37] Firefox working for me as well
[19:46:06] i think it was related to this: mutante: !log restarting nginx on all ssl hosts
[19:46:11] It might be an issue with XO...
[19:46:55] Another report that https isn't working, but http is.
[19:47:05] I'm able to reproduce. http is fine; https isn't working.
[19:47:43] Hi Terry, Sumana.
[19:47:56] I can connect to en.wp over http but not https right now -- but wikisource https works
[19:47:56] sumanah sees the problem as well
[19:48:00] (hi Marybelle)
[19:48:12] maybe HTTPS everywhere is the issue....
[19:48:15] weirdly, the person right next to me (Katie) is connecting over https and is fine
[19:48:26] robla: It's failing for me in Chrome without HTTPSEverywhere installed.
[19:48:40] It may be a provider issue and not an issue with Wikimedia.
[19:48:40] so much for that then....
[19:48:45] Though I just saw nagios complain about SSL...
[19:48:45] well, I tried just manually in epiphany and without HTTPS Everywhere and it failed
[19:49:24] ipv6 maybe?
[19:49:27] Wiktionary works with & without HTTPS.
[19:49:37] It's working again now.
[19:49:46] ok, problem is fixed
[19:49:55] https://en.wikipedia.org is fine for me now
[19:49:59] Clogged tubes, I guess.
[19:50:16] RECOVERY - HTTPS on ssl1004 is OK: OK - Certificate will expire on 07/19/2016 06:51.
[19:50:38] !log replaced wikipedia SSL cert with new DigiCert (rather than RapidSSL) (RT-3639)
[19:50:46] Logged the message, Master
[19:51:37] aha, ok
[19:57:52] !log demon ran sync-common-all
[19:58:18] Logged the message, Master
[19:59:20] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki to 1.21wmf6
[19:59:33] Logged the message, Master
[20:01:24] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki back to 1.21wmf5
[20:01:37] Logged the message, Master
[20:01:46] PHP fatal error in /home/wikipedia/common/wmf-config/CommonSettings.php line 2861:
[20:01:46] require() [function.require]: Failed opening required '/home/wikipedia/common/php-1.21wmf6/../wmf-config/ExtensionMessages-1.21wmf6.php' (include_path='/home/wikipedia/common/php-1.21wmf6/extensions/TimedMediaHandler/handlers/OggHandler/PEAR/File_Ogg:/home/wikipedia/common/php-1.21wmf6:/home/wikipedia/common/php-1.21wmf6/lib:/usr/local/lib/php:/usr/share/php')
[20:02:48] overzealous search and replace for s/wmf/wmf6/ in CommonSettings.php?
[20:03:03] <^demon> Yeah, I rolled back to wmf5.
[20:03:13] <^demon> No, we need l10n for 1.21wmf6.
[20:03:26] !log demon Started syncing Wikimedia installation... :
[20:03:32] <^demon> I forgot sync-common-all doesn't do l10n.
[20:03:33] Logged the message, Master
[20:04:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:04:46] * robla goes to grab lunch while scap is running
[20:05:34] tmh1 enwiki Error connecting to 10.0.6.73: User 'wikiadmin' has exceeded the 'max_user_connections' resource (current value: 80)
[20:05:42] well I guess it was hit then :)
[20:06:58] woo
[20:07:02] it works!
[20:07:38] !log demon Finished syncing Wikimedia installation... :
[20:08:02] Logged the message, Master
[20:10:49] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours
[20:10:49] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours
[20:10:49] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours
[20:11:02] New patchset: Asher; "returning pc1 to svc" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37874
[20:11:46] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files:
[20:12:03] Logged the message, Master
[20:15:49] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37874
[20:17:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.172 seconds
[20:19:31] !log asher synchronized wmf-config/CommonSettings.php 're-enabling pc1 for sql: the bag o stuffening'
[20:19:46] Logged the message, Master
[20:20:18] New patchset: Jgreen; "add conf for civicrm amazon audit script" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37875
[20:20:53] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37875
[20:26:19] New patchset: Demon; "Various updates for 1.21wmf6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37878
[20:27:29] !log demon Started syncing Wikimedia installation... :
[20:27:36] Logged the message, Master
[20:36:39] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37878
[20:38:08] New patchset: Jgreen; "typo fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37936
[20:39:43] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37936
[20:40:38] notpeter: around?
[20:41:14] hey
[20:44:30] notpeter: can you manually re-add the stuff at http://wikitech.wikimedia.org/view/Cron_jobs#manual_cron_jobs to hume
[20:44:37] it has the crontab code snippets
[20:44:50] right now, none of that stuff is run anymore
[20:45:06] ...probably would help to be puppetized or something ;)
[20:45:20] sure
[20:46:50] why was none of that put into puppet?
[20:47:30] AaronSchulz: all of them?
[20:47:34] including the svn shit?
[20:48:12] let's just put it all into puppet instead of any manual stuff ?
[20:48:14] please?
[20:49:09] notpeter: this is https://bugzilla.wikimedia.org/show_bug.cgi?id=42152
[20:50:57] LeslieCarr: oh, definitely
[20:51:05] but I'm trying to at least figure out what needs to go in
[20:51:31] AaronSchulz: ok. but what I'm asking is "is that page up to date?"
[20:51:44] "are all of those crons needed at present"
[20:51:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:52:27] notpeter: that doxygen one might be useless...should use git
[20:52:44] might want to ask Ryan_Lane about the ldap one maybe
[20:53:11] lol, zwinger
[20:53:21] notpeter: anyway the top three are fine
[20:53:43] New patchset: Alex Monk; "(bug 42921) Add upload_by_url permission to commons image-reviewers" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37939
[20:53:52] ok
[20:55:28] <^demon> The svn crap should not be on hume--that's on formey as it should be.
[20:55:46] ok
[20:56:23] <^demon> (And I *think* it's mostly puppetized, although all 3 of those are going away in the semi-near-ish-future)
[20:57:22] New patchset: Dzahn; "make mobile wikipedia use new SSL cert and kill test-star afterwards" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37940
[20:58:39] New review: Dzahn; "ok now,after manual fix to key mismatch... and about time ..would have expired soon" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/37940
[20:58:40] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37940
[20:59:47] New patchset: Demon; "test2wiki also on 1.21wmf6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37941
[21:00:08] Change merged: Demon; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/37941
[21:01:04] New review: Ryan Lane; "After discussion with Asher, it may not be possible to properly send a vary header for the redirects..." [operations/apache-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/13293
[21:03:27] !log demon Finished syncing Wikimedia installation... :
[21:03:35] Logged the message, Master
[21:03:47] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: test2wiki also on 1.21wmf6
[21:03:58] New review: Asher; "We should move all of this rewrite logic to varnish, once varnish replaces squid. Hopefully that ca..." [operations/apache-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/13293
[21:04:04] Logged the message, Master
[21:04:58] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 7.259 seconds
[21:09:23] AaronSchulz: should the update flagged revs script possibly not be run as root?
[21:10:12] I don't see why it would have to be
[21:10:42] ok
[21:10:53] setting it to run as apache user, then
[21:14:08] New patchset: Lcarr; "l10nupdate user required on these hosts for deployment rt # 4052" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37942
[21:14:12] Reedy: ^^ this should fix your problem
[21:14:15] notpeter: can you close that bug now?
[21:18:27] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: wikidatawiki & mediawikiwiki to 1.21wmf6
[21:18:40] Logged the message, Master
[21:19:34] AaronSchulz: yes, I'm writing the puppetz now
[21:20:10] New patchset: Ori.livneh; "Yet another pmpta -> pmtpa" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37378
[21:20:26] \o/
[21:20:36] ^^^ ops, if you'd like a free access switch
[21:20:53] see two character change above
[21:26:19] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours
[21:26:19] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours
[21:29:40] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37378
[21:29:53] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37942
[21:30:08] thanks LeslieCarr
[21:31:32] too bad all that does is start monitoring a management aggregation switch ori-l
[21:31:40] if it actually gave us a free access switch that would rock
[21:33:26] New review: Alex Monk; "Doesn't look like I2c6ab07d is ready for deployment, see Ryan Lane's -2 comment. Can you remove your..." [operations/apache-config] (master) C: 0; - https://gerrit.wikimedia.org/r/34113
[21:35:23] LeslieCarr: i think using puppet specs to generate network equipment on a 3D-printer might be a few major puppet releases away :)
[21:38:03] New patchset: Pyoungmeister; "puppetizing the unpuppetized crons for hume in http://wikitech.wikimedia.org/view/Cron_jobs#manual_cron_jobs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37946
[21:38:58] AaronSchulz: how does that look ^
[21:40:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:42:04] LeslieCarr, I know you don't know this stuff, but I'm bugging you cause you have network brain, and I just need some bouncing
[21:42:04] so.
[21:42:08] this udp2log thing
[21:42:34] i'm not entirely clear on how network kernel buffers work
[21:43:16] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours
[21:43:16] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours
[21:43:32] New review: Siebrand; "Per comment above" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/31823
[21:44:15] please correct this summary: packets come into nic, and get copied somewhere in kernel mem space, kernel is notified that there are packets to read
[21:45:40] PROBLEM - check_minfraud_secondary on payments3 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:46:00] udp2log calls read() in an infinite loop, which reads from the socket buffer (egh, not clear on this bit I think)
[21:46:38] then, udp2log writes the bytes that it read from the socket into its own buffer, which has an epoll call back attached to it, which causes each process to read from the udp2log buffer
[21:46:42] so, i'm trying to figure out
[21:47:03] how is it that running more unsampled udp2log procs causes kernel to drop more packets
[21:47:59] i guess, by running more packets, there is more to do, so the main udp2log process doesn't get enough processor time to read off of the socket fd?
[21:48:00] hm
[21:48:18] if that was true, if I niced up the main udp2log process, it should help
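
One way to test the hypothesis above is to watch whether the drops counter keeps climbing while adding or removing filter processes. A small sketch that polls the last column of /proc/net/udp for one socket (PORT is a placeholder for whatever port udp2log is actually bound to; the drops column is present on the kernels of this era):

```python
# Poll the per-socket "drops" counter in /proc/net/udp (last column;
# the local port is hex in column 1). PORT is a placeholder for the
# port udp2log is actually bound to.
import time

PORT = 8420  # placeholder

def drops_for_port(port):
    with open("/proc/net/udp") as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            if int(fields[1].split(":")[1], 16) == port:
                return int(fields[-1])
    return None

prev = drops_for_port(PORT)
while True:
    time.sleep(5)
    cur = drops_for_port(PORT)
    if prev is not None and cur is not None:
        print("drops: %d (+%d in 5s)" % (cur, cur - prev))
    prev = cur
```
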
[21:50:10] RECOVERY - check_minfraud_secondary on payments3 is OK: HTTP OK: HTTP/1.1 302 Found - 120 bytes in 0.422 second response time
[21:53:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.169 seconds
[21:54:44] AaronSchulz: well, I'm going to merge, as I think it's about right. please do have a look over when you get a chance
[21:55:41] binasher, you around?
[21:56:13] New patchset: Pyoungmeister; "puppetizing the unpuppetized crons for hume in http://wikitech.wikimedia.org/view/Cron_jobs#manual_cron_jobs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37946
[21:56:28] ottomata: i was getting food but that all sounds logical
[21:56:42] just tried renicing, no dice
[21:56:44] no nice no dice
[21:57:50] !log one more nginx restart on ssl boxes for mobile cert
[21:57:58] Logged the message, Master
[21:58:49] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37946
[22:00:57] but, LeslieCarr, just so i'm not misleading myself
[22:01:05] the reason why /proc/net/udp would show dropped packets
[22:01:21] is if the packets are not read off of the network buffer fast enough
[22:01:24] right?
[22:02:26] ottomata: is the box thread thrashing?
[22:02:40] err i mean like excessive context switching
[22:02:45] no
[22:03:08] this test one has 400MB free, but I have another box that has 192G Ram and it is doing the same thing
[22:03:15] oh
[22:03:18] hmm
[22:03:49] fwiw, at CL we ended up having to use multiple collector boxes
[22:04:44] with less traffic
[22:04:46] how'd you do that? filtered by mod seq number?
[22:04:47] aye
[22:04:48] ottomata: yes
[22:05:01] ok cool, thanks LeslieCarr
[22:05:15] Jeff_Green, don't think I can do this in this case
[22:05:21] ottomata: I wasn't super involved, but I believe they split the proxy hosts into different multicast streams
[22:05:24] Jeff_Green: correct me if i am wrong, but i believe it's that we had machines logging to different multicast ip's and the logger boxes only listened on X ip
[22:05:43] i mean, we want to eventually be able to get the sources to log directly to kafka
[22:05:50] right, if we could change where the sources sent, we could do that
[22:06:04] RECOVERY - Host ms-be1001 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms
[22:06:12] the nice thing about kafka in this case, is that it uses zookeeper to configure itself, so if you need more boxes listening you don't have to modify the source config files
[22:06:21] buuuuut, that's irrelevant to this problem
[22:07:04] i just can't get udp2log to write to multiple unsampled pipe processes
[22:07:21] well, i mean, it can, but when it does it won't read from the socket fast enough
[22:08:28] !log mobile and non-mobile Wikipedia now use just ONE certificate.. a new "star.wp" with an ".m." in it, valid until 2016 by DigiCert
[22:08:36] Logged the message, Master
[22:08:42] ottomata: so here's a thought, which i may be wrong on
[22:09:03] what about if you split up udp2log into two processes - 1 just spits out the log into some sort of temp file
[22:09:17] the second then does the analysis and creates the actual files we want
[22:09:43] well, in this case I'm not even creating any files
[22:09:58] PROBLEM - swift-object-server on ms-be1001 is CRITICAL: Connection refused by host
[22:09:58] PROBLEM - swift-container-replicator on ms-be1001 is CRITICAL: Connection refused by host
[22:09:58] PROBLEM - swift-object-replicator on ms-be1001 is CRITICAL: Connection refused by host
[22:09:58] PROBLEM - swift-account-reaper on ms-be1001 is CRITICAL: Connection refused by host
[22:10:11] this is sending the data out on another socket, buuuut, you are right, maybe I should experiment with other proc types (like writing to a file)
[22:10:16] PROBLEM - swift-account-server on ms-be1001 is CRITICAL: Connection refused by host
[22:10:16] PROBLEM - swift-container-updater on ms-be1001 is CRITICAL: Connection refused by host
[22:10:16] PROBLEM - SSH on ms-be1001 is CRITICAL: Connection refused
[22:10:16] PROBLEM - swift-object-updater on ms-be1001 is CRITICAL: Connection refused by host
[22:10:17] to see if the same thing happens
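
The stream-splitting idea above depends on each collector joining only its own multicast group. Together with the receive-buffer question from earlier, a minimal listener might look like this (GROUP and PORT are hypothetical values, and the kernel caps the requested buffer at net.core.rmem_max):

```python
# Minimal multicast listener: join one specific group, so different
# collector boxes can each subscribe to their own stream, and request
# a large receive buffer. GROUP and PORT are hypothetical values.
import socket
import struct

GROUP, PORT = "239.192.0.1", 8420  # hypothetical

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
# ask for a larger kernel receive buffer (capped by net.core.rmem_max)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8 * 1024 * 1024)
sock.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    data, addr = sock.recvfrom(65536)
    # hand each datagram off to the processing pipeline here
```
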
[22:10:25] PROBLEM - swift-container-server on ms-be1001 is CRITICAL: Connection refused by host
[22:10:43] PROBLEM - swift-container-auditor on ms-be1001 is CRITICAL: Connection refused by host
[22:10:43] PROBLEM - swift-object-auditor on ms-be1001 is CRITICAL: Connection refused by host
[22:10:52] PROBLEM - swift-account-replicator on ms-be1001 is CRITICAL: Connection refused by host
[22:10:52] PROBLEM - swift-account-auditor on ms-be1001 is CRITICAL: Connection refused by host
[22:11:52] ottomata: this is pretty well over my head but have you looked at different kernels--like the low latency or preemptable ones?
[22:13:50] no
[22:13:53] over my head too :)
[22:14:09] but i am verrryyyy suspicious that it is simpler than this
[22:14:45] it's a lot to process
[22:14:51] because, locke is even running 4 unsampled pipe processes right now
[22:14:53] and its doing ok
[22:15:42] we're sending 100% of our hits to locke?
[22:16:53] i guess at CL we were writing files, that added a lot of work for the collector
[22:24:07] speaking of locke, i can't ssh into it directly ...
[22:24:10] checking it out on console
[22:25:54] oh nm, i'm having some weird problem from ssh'ing through
[22:27:49] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:29:09] (Jeff_Green, what's CL?)
[22:29:40] craigslist
[22:30:13] PROBLEM - NTP on ms-be1001 is CRITICAL: NTP CRITICAL: No response from NTP server
[22:30:33] thought so, they use udp2log at craigslist?
[22:32:38] PROBLEM - Swift HTTP on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:32:38] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:36:19] i believe cl used their own thing
[22:36:40] PROBLEM - Host ms-be1001 is DOWN: PING CRITICAL - Packet loss = 100%
[22:36:46] (it's been a while ago for me so my memory is more faint than jeff's, plus i was only working on their tubes)
[22:41:46] RobH: do you think i should move the parsoid request ticket to core-ops or to procurement ?
[22:42:22] the actual server procurement should go in procurement
[22:42:33] for all the installs/deploys i have been making new tickets and linking
[22:43:27] i know the tickets may be a bit of a discussion
[22:43:30] and a bit of hardware procurement
[22:43:35] Can someone please do this on fenari: rm -rf /home/wikipedia/common/php-1.21wmf1
[22:43:40] in which case i may leave in core-ops but then make a procurement ticket and link.
[22:43:53] okay
[22:44:00] Reedy: i will do that as well
[22:44:05] the previous okay was to RobH
[22:44:07] thanks
[22:44:10] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds
[22:44:15] even if its just 'need to get hardware for the linked core-ops discussion'
[22:44:15] screwy permissions on old git objects
[22:44:18] and it gets filled out later
[22:45:03] will make that ticket
[22:45:07] being ticket bitch is tough work !
[22:45:39] Reedy: done
[22:45:48] Cheers
[22:45:53] !log removed /home/wikipedia/common/php-1.21wmf1/ from fenari
[22:46:00] Logged the message, Mistress of the network gear.
[22:46:14] LeslieCarr: well i certainly appreciate it =]
[22:48:33] ottomata: you might know this - does udp2log work in varnish or only squid ?
[22:49:21] varnish, squid and nginx
[22:49:39] sequence numbering doesn't work properly in nginx
[22:49:42] cool, wikitech page out of date :)
[22:49:43] but varnish and squid work well
[22:49:51] now the real question is, how to set it up :)
[22:53:39] set up, varnish udp logging?
[22:54:26] LeslieCarr ^?
[22:55:31] yes, specifically on the blog
[22:55:35] (sorry, going through tickets)
[22:57:21] that's done
[22:57:33] who's that assigned to?
[22:57:47] RT#?
[22:58:53] https://rt.wikimedia.org/Ticket/Display.html?id=4049
[22:59:00] wrong one
[22:59:01] oops
[22:59:06] binasher: Is there any way to get ishmael aggregate slow queries etc across multiple hosts (ie a cluster)?
[22:59:07] here we go - https://rt.wikimedia.org/Ticket/Display.html?id=4027
[23:01:33] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[23:02:11] ottomata: ^^
[23:02:30] oh yeah, totally done
[23:02:32] i will resolve
[23:02:38] ottomata: yay thank you :)
[23:02:53] * LeslieCarr hugs ottomata
[23:03:27] tfinc: so who was working with you on https://rt.wikimedia.org/Ticket/Display.html?id=3738 ?
[23:04:07] LeslieCarr: Max, Asher, Rob, Niklas
[23:04:22] i'm just looking for someone to assign the ticket to
[23:04:23] mwhahaha
[23:04:33] i'm going to go with binasher for this one
[23:04:48] RobH: are you interested in doing https://rt.wikimedia.org/Ticket/Display.html?id=4049 ?
[23:06:23] notpeter: I was in a meeting, but that seems ok
[23:06:39] ok, cool
[23:10:51] New patchset: Pyoungmeister; "coredb: a bit more re-merging" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37963
[23:10:56] RobH: I am giving this to you but feel free to be like "woah" or ask for any help
[23:11:16] it should be pretty simple - adding the new package using reprepro and then coordinating a time to do the upgrade
[23:14:18] robh: rt4064 for the idrac licenses
[23:14:22] in eqiad
[23:14:27] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37963
[23:16:15] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:17:04] !log pgehres synchronized php-1.21wmf5/extensions/ContributionTracking/ 'Updating ContributionTracking to master'
[23:17:11] Logged the message, Master
[23:21:09] New patchset: Cmjohnson; "Updating MAC's addresses to reflect h/w change to 720xd's for ms-be10xx" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37965
[23:21:47] lesliecarr, thanks for your help today, i think we figured out the cause, i gotta run
[23:21:49] talk to you laters
[23:21:54] bye
[23:22:16] can someone +2 my change ^^^
[23:22:45] cmjohnson1: are you not having +2 permissions ?
[23:22:51] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37965
[23:23:04] notpeter: up for a few tickets? :)
[23:23:07] and merged on sockpuppet
[23:23:35] lesliecarr: there is a ticket in to fix that but I think it needs ryan_lane approval
[23:23:37] LeslieCarr: I'm trying to finish up db module for asher. I can if you need me too, but I would really like to focus on this
[23:23:52] LeslieCarr: what needs doing?
[23:23:52] notpeter: thx
[23:23:55] nothing is pressing for today
[23:24:06] some poking at rt that has been around since december
[23:24:27] i mean september
[23:24:31] so definitely unurgent
[23:24:35] heh, ok
[23:24:38] i'm just going through all the ops-requests tickets
[23:24:43] too much RT activity, cant find the stuff from a few hours ago anymore. but cool :)
[23:24:59] gotcha. ping again tomorrow or weds :)
[23:25:32] now that it's reassigned to you i don't care --- it's my dirty not-so-secret
[23:26:31] ok, gotcha
[23:26:32] ok, cool
[23:27:46] New patchset: Pyoungmeister; "flip flopping db61 manifests for another test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37967
[23:28:17] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968
[23:28:42] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37967
[23:30:33] Reedy: house => "2",
[23:32:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.307 seconds
[23:33:16] AaronSchulz: Yes, I want them to have 2 houses
[23:34:40] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968
[23:35:39] Reedy, failagain
[23:36:34] Not really
[23:36:38] That fail was there in patchset 1
[23:36:47] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968
[23:37:30] I give up
[23:37:44] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968
[23:46:33] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours
[23:51:03] https://gerrit.wikimedia.org/r/37973
[23:51:32] * Reedy kicks fenari
[23:59:54] New patchset: Pyoungmeister; "Revert "flip flopping db61 manifests for another test"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37980