[00:04:13] !log catrope Finished syncing Wikimedia installation... : Updating VisualEditor to master [00:04:18] Logged the message, Master [00:06:22] (03PS2) 10BryanDavis: Add *_delta stats for vhtcpd ganglia. [operations/puppet] - 10https://gerrit.wikimedia.org/r/80151 [00:11:21] !log catrope synchronized php-1.22wmf13/extensions/VisualEditor/modules/ve/ui/tools/buttons/ve.ui.UnderlineButtonTool.js 'touch' [00:11:26] Logged the message, Master [00:13:40] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [00:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [00:26:21] hi Risker :) [00:26:26] (03PS1) 10Ori.livneh: Tee centralauth wfDebug()s to vanadium for Gangliafication [operations/puppet] - 10https://gerrit.wikimedia.org/r/80164 [00:27:01] hi ori-l! we were supposed to talk about something last week but we both wound up getting on planes instead. Was it performance? [00:27:20] RECOVERY - check_job_queue on hume is OK: JOBQUEUE OK - all job queues below 10,000 [00:27:38] Eep -- I may have forgotten, too. [00:28:19] StevenW was playing matchmaker there. We can talk about it later this week, I'm kind of overloaded for the next day or two [00:29:23] Yes, performance! [00:29:26] Thanks for the reminder. [00:30:30] PROBLEM - check_job_queue on hume is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[00:31:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:32:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.141 second response time [00:33:04] TimStarling: https://gerrit.wikimedia.org/r/80164 is almost an exact duplicate of a change you reviewed in the past which configured the udp2log instance on fluorine to relay exceptions/fatals to vanadium. This time it's centralauth, as an expedient hack to get a metric going to accompany the coming deployments. Could you review? [00:37:28] (03CR) 10Greg Grossmeier: [C: 031] Tee centralauth wfDebug()s to vanadium for Gangliafication [operations/puppet] - 10https://gerrit.wikimedia.org/r/80164 (owner: 10Ori.livneh) [00:39:20] (03CR) 10Tim Starling: [C: 04-1] "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/80164 (owner: 10Ori.livneh) [00:41:29] (03PS2) 10Ori.livneh: Tee centralauth wfDebug()s to vanadium for Gangliafication [operations/puppet] - 10https://gerrit.wikimedia.org/r/80164 [00:51:13] ori-l: do you want me to deploy it? [00:51:39] TimStarling: yes, I'd appreciate that. Thanks. [00:51:47] TimStarling: that'd be great (not sure if he's still here). 
getting more data the better at this point [00:52:00] well then [00:52:01] ;) [00:58:02] !log allow engineering and ops group members in RT to create new saved searches [00:58:07] Logged the message, Master [01:02:23] !log mwalker synchronized php-1.22wmf13/extensions/CentralNotice/special 'Applying CentralNotice fix for bug 53032' [01:02:28] Logged the message, Master [01:02:58] !log mwalker synchronized php-1.22wmf12/extensions/CentralNotice/special 'Applying CentralNotice fix for bug 53032' [01:03:03] Logged the message, Master [01:22:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:06:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:07:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [02:13:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [02:15:12] !log LocalisationUpdate completed (1.22wmf13) at Wed Aug 21 02:15:12 UTC 2013 [02:15:18] Logged the message, Master [02:22:02] AaronSchulz: it's a tribute? 
[02:28:33] !log LocalisationUpdate completed (1.22wmf12) at Wed Aug 21 02:28:33 UTC 2013 [02:28:39] Logged the message, Master [02:34:50] !log [02:35:19] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 21 02:35:19 UTC 2013 [02:35:24] Logged the message, Master [02:46:02] !log RT - grant global right to see system dashboards to privileged users [02:46:07] Logged the message, Master [02:46:40] !log RT - create new system dashboard 'wikimedia default' that lists open ops-requests, quick ticket creation, reminders and bookmarked tickets [02:46:45] Logged the message, Master [02:50:03] PROBLEM - Puppet freshness on zirconium is CRITICAL: No successful Puppet run in the last 10 hours [02:56:03] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: No successful Puppet run in the last 10 hours [03:16:29] (03CR) 10Dzahn: "(1 comment)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/79955 (owner: 10Mattflaschen) [03:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:23:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [03:28:49] Coren: I'm around now. my jet lag caused me to pass out for most of the day [03:30:29] just "jet lag"? [03:32:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:33:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [03:38:06] (03PS3) 10Jalexander: Replace public key for jamesofur [operations/puppet] - 10https://gerrit.wikimedia.org/r/79304 [03:47:42] Ryan_Lane: Yeay jet lag. Go to sleep, I'll talk to you tomorrow. 
:-) (I'm off to bed to rest that effing flu away) [03:48:58] heh, well now I'm not tired :) [03:50:05] Ryan_Lane: work through the night, it'll be fun ;) [03:50:12] :D [03:51:28] Ryan_Lane and greg-g (and ori-l and TimStarling and the others who have been working on it), thank you for your work on the HTTPS issue. [03:51:41] yw [03:51:50] Risker: you're welcome. [03:52:03] Tim's doing the hard part, I'm just writing emails and blog posts and wiki pages ;) [03:52:35] yeah, that wikipage needs a good copy edit and comb-out, but at least it's there :) [03:52:44] please do! [03:52:47] I'm doing it piecemeal right now [03:53:12] also, if you haven't seen, the Blog post draft has more up to date info that I'll copy over to the HTTPS metawiki page in the morning [03:53:16] * Risker tries to find it again....darn watchlist.... [03:53:23] https://meta.wikimedia.org/wiki/Wikimedia_Blog/Drafts/HTTPS_by_default_for_logged_in_users [03:53:57] * greg-g goes to check in on family [03:53:59] see ya later [04:22:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:23:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [05:14:05] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [05:22:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:23:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [05:30:54] greg-g, did some copy editing for you, also left some CE suggestions on the talk page [05:36:26] yes, that's a good point [05:52:13] PROBLEM - Puppet freshness on srv281 is CRITICAL: No successful Puppet run in the last 10 hours [05:52:13] PROBLEM - Puppet freshness on virt2 is CRITICAL: No successful Puppet run in the last 10 hours [05:55:13] PROBLEM - Puppet freshness on 
virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [06:00:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:02:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time [06:14:09] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [06:49:24] PROBLEM - search indices - check lucene status page on search32 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern found - 480 bytes in 0.057 second response time [07:02:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:03:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.130 second response time [07:05:43] (03PS1) 10Tim Starling: Proposed configuration for wgSecureLogin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80175 [07:05:59] (03CR) 10Tim Starling: "Untested." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80175 (owner: 10Tim Starling) [07:15:36] (03PS1) 10ArielGlenn: fix canonical ip addr for snmptrap [operations/puppet] - 10https://gerrit.wikimedia.org/r/80176 [07:16:46] (03PS2) 10ArielGlenn: fix canonical ip addr for snmptrap [operations/puppet] - 10https://gerrit.wikimedia.org/r/80176 [07:18:41] (03CR) 10ArielGlenn: [C: 032] fix canonical ip addr for snmptrap [operations/puppet] - 10https://gerrit.wikimedia.org/r/80176 (owner: 10ArielGlenn) [07:22:38] RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Wed Aug 21 07:22:37 UTC 2013 [07:27:31] (03CR) 10MaxSem: [C: 031] Proposed configuration for wgSecureLogin [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80175 (owner: 10Tim Starling) [07:31:38] apergos: that's wrong [07:32:03] please explain [07:32:05] you're assuming eth0 has an IP which can easily not be the case [07:32:15] e.g. 
all machines with bonds [07:32:39] what would you propose? [07:34:00] set it to ipaddress_bond0 if that variable has a value, otherwise to eth0? [07:34:04] or are there more exceptions? [07:34:56] who knows really :) [07:37:18] well I could do that for now [07:47:32] can I update ULS to master on 1.22wmf13 now? [07:51:07] I'm the only one having deployment windows today [07:51:18] and I'm saying it's okay :) [07:51:19] go ahead [07:51:26] thanks [07:51:34] yw [07:54:07] RECOVERY - Puppet freshness on virt1005 is OK: puppet ran at Wed Aug 21 07:54:00 UTC 2013 [07:57:19] (03CR) 10Faidon: [C: 032] "Having compatibility with what's in precise is a good thing, so thanks for that." [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/80127 (owner: 10Edenhill) [07:57:25] (03CR) 10Faidon: [V: 032] "Having compatibility with what's in precise is a good thing, so thanks for that." [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/80127 (owner: 10Edenhill) [08:01:08] !log nikerabbit synchronized php-1.22wmf13/extensions/UniversalLanguageSelector/ 'ULS to master' [08:01:20] Logged the message, Master [08:01:23] I'm done [08:12:12] Nischayn22|Away: thanks [08:12:26] (03CR) 10Faidon: [C: 04-1] "(4 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/79329 (owner: 10Akosiaris) [08:19:24] (03PS5) 10Faidon: exim: add DKIM for wikimedia.org domains [operations/puppet] - 10https://gerrit.wikimedia.org/r/79754 [08:19:25] (03PS1) 10Faidon: mailman: remove DKIM headers [operations/puppet] - 10https://gerrit.wikimedia.org/r/80181 [08:20:16] (03CR) 10Faidon: [C: 032] mailman: remove DKIM headers [operations/puppet] - 10https://gerrit.wikimedia.org/r/80181 (owner: 10Faidon) [08:20:23] (03CR) 10Faidon: [C: 032] exim: add DKIM for wikimedia.org domains [operations/puppet] - 10https://gerrit.wikimedia.org/r/79754 (owner: 10Faidon) [08:40:05] (03PS1) 10ArielGlenn: one more try for snmp trap ip client ip address 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/80182 [08:41:51] (03CR) 10ArielGlenn: [C: 032] one more try for snmp trap ip client ip address [operations/puppet] - 10https://gerrit.wikimedia.org/r/80182 (owner: 10ArielGlenn) [08:44:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:46:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [09:20:50] (03PS1) 10Faidon: exim: s/content/source/ on DKIM keys [operations/puppet] - 10https://gerrit.wikimedia.org/r/80184 [09:21:04] (03CR) 10Faidon: [C: 032 V: 032] exim: s/content/source/ on DKIM keys [operations/puppet] - 10https://gerrit.wikimedia.org/r/80184 (owner: 10Faidon) [09:21:07] ffs [09:21:10] SO MUCH TIME [09:27:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:29:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.145 second response time [09:31:57] 2013-08-21 12:31:25 1VC4lA-0007tq-Pn DKIM: d=wikimedia.org s=wikimedia c=relaxed/relaxed a=rsa-sha256 [verification succeeded] [09:32:00] 2013-08-21 12:31:42 1VC4lR-0007tx-UT DKIM: d=lists.wikimedia.org s=wikimedia c=relaxed/relaxed a=rsa-sha256 [verification succeeded] [09:32:03] finally [10:13:51] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [10:23:41] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:24:25] (03PS1) 10Faidon: exim: revert DKIM signing for wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/80189 [10:24:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.535 second response time [10:26:19] (03CR) 10Faidon: [C: 032] exim: revert DKIM signing for wikimedia.org [operations/puppet] - 10https://gerrit.wikimedia.org/r/80189 
(owner: 10Faidon) [10:58:44] and finally starting my windows [10:58:44] (03PS1) 10Faidon: filebackend: switch master to Ceph [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80195 [11:00:52] (03CR) 10Faidon: [C: 032] filebackend: switch master to Ceph [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80195 (owner: 10Faidon) [11:02:08] !log faidon synchronized wmf-config/filebackend.php 'promote ceph to master' [11:02:13] Logged the message, Master [11:54:46] good luck! [12:03:07] (03PS1) 10Yuvipanda: Add appropriate support for websocket proxying [operations/puppet] - 10https://gerrit.wikimedia.org/r/80201 [12:03:21] it's fiiine [12:08:18] !log Running extensions/CentralAuth/maintenance/populateHomeDB.php in a screen on tin [12:08:23] Logged the message, Master [12:08:54] :-) [12:12:32] (03PS9) 10Yuvipanda: Route requests based on data from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [12:12:34] (03PS9) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [12:12:34] (03PS2) 10Yuvipanda: Add appropriate support for websocket proxying [operations/puppet] - 10https://gerrit.wikimedia.org/r/80201 [12:12:36] (03PS1) 10Yuvipanda: Remove useless proxy_redirect directive [operations/puppet] - 10https://gerrit.wikimedia.org/r/80203 [12:23:17] apergos: so, what did you do with ::ipaddress after all? 
[12:23:52] if bond0 is set use that, if not see if eth0 is set and use that, otherwise fall back to ipaddress [12:24:05] I tried checking with salt which hss didn't have eth0 [12:24:10] those all seemed to have bond0 [12:24:28] *which hosts [12:45:25] I think we should just use $::ipaddress [12:45:36] virt2 must be the only real exception [12:45:46] but anyway, let's talk over a gerrit changeset :) [12:50:56] PROBLEM - Puppet freshness on zirconium is CRITICAL: No successful Puppet run in the last 10 hours [12:56:56] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: No successful Puppet run in the last 10 hours [13:08:40] (03PS1) 10Faidon: Prepare rubidium/mexia for authdns migration [operations/puppet] - 10https://gerrit.wikimedia.org/r/80211 [13:10:06] (03CR) 10Faidon: [C: 032] Prepare rubidium/mexia for authdns migration [operations/puppet] - 10https://gerrit.wikimedia.org/r/80211 (owner: 10Faidon) [13:31:04] okay, here goes [13:38:16] !log switching ns1 traffic to mexia (new authdns) [13:38:21] Logged the message, Master [13:40:00] (03CR) 10Akosiaris: "(4 comments)" [operations/puppet] - 10https://gerrit.wikimedia.org/r/79329 (owner: 10Akosiaris) [13:46:25] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:46:29] when is the move to HTTPS happening? [13:46:48] (03PS5) 10Akosiaris: Refactoring nrpe module (round 2/??) 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/79329 [13:47:55] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:15] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.005 second response time [13:49:45] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:47] PROBLEM - Apache HTTP on mw1154 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:47] PROBLEM - Apache HTTP on mw1159 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:49:47] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 2.950 second response time [13:49:50] aww crap [13:50:35] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 60995 bytes in 0.230 second response time [13:50:37] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.063 second response time [13:50:37] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.064 second response time [13:50:37] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:51:10] wth [13:51:13] there's nothing wrong [13:51:26] malafaya: Answered in -tech. 
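The fallback order described at 12:23:52 (bond0 if set, else eth0 if set, else the default ipaddress) can be sketched roughly as below. This is only an illustration, not the Puppet change that was merged: the function name `pick_trap_ip` is hypothetical, the facts are modelled as a plain dictionary, and the key `ipaddress_eth0` is assumed by analogy with the `ipaddress_bond0` variable mentioned in the channel.

```python
from typing import Mapping, Optional


def pick_trap_ip(facts: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the first non-empty address, trying bond0, then eth0, then the default.

    `facts` stands in for the host's facts; the key names are assumptions
    modelled on the variables mentioned in the discussion.
    """
    for key in ("ipaddress_bond0", "ipaddress_eth0", "ipaddress"):
        value = facts.get(key)
        if value:  # skips missing keys, None and empty strings
            return value
    return None


# Example hosts using documentation-range addresses (RFC 5737), not real ones.
bonded_host = {"ipaddress_bond0": "192.0.2.10", "ipaddress": "192.0.2.10"}
plain_host = {"ipaddress_eth0": "192.0.2.20", "ipaddress": "192.0.2.20"}
odd_host = {"ipaddress": "192.0.2.30"}  # neither bond0 nor eth0 has an address

print(pick_trap_ip(bonded_host))
print(pick_trap_ip(plain_host))
print(pick_trap_ip(odd_host))
```

The later suggestion at 12:45:25 to rely on $::ipaddress alone would correspond to dropping the first two keys from the tuple.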
[13:54:25] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [13:55:38] ganglia says ms-be1008 was unhappy there for a bit [13:56:06] it's been unhappy in general [13:56:12] it was like that before [13:56:32] there is a spike at that time though [13:56:35] hmmm [13:56:36] weird [13:56:43] thanks, that was a nice hint [13:56:46] sure [13:56:59] I had it on my todo to ask #ceph on how to debug this high cpu usage [13:57:02] look at 2/4hr [13:57:26] look at day, it's even worse [13:57:28] pretty interesting [13:58:31] !log stopping ceph-osd 88 & 95 (ms-be1008), evidence of unexplainable high cpu usage [13:58:36] Logged the message, Master [14:00:05] PROBLEM - Ceph on ms-fe1003 is CRITICAL: Ceph HEALTH_WARN 708 pgs degraded: 395 pgs stuck unclean: recovery 12860568/908502459 degraded (1.416%): 2/143 in osds are down [14:00:15] PROBLEM - Ceph on ms-fe1001 is CRITICAL: Ceph HEALTH_WARN 708 pgs degraded: 396 pgs stuck unclean: recovery 12860572/908502633 degraded (1.416%): 2/143 in osds are down [14:00:25] PROBLEM - Ceph on ms-fe1004 is CRITICAL: Ceph HEALTH_WARN 708 pgs degraded: 406 pgs stuck unclean: recovery 12860574/908502678 degraded (1.416%): 2/143 in osds are down [14:03:14] CPU is now increased for normal reasons [14:03:37] the OSDs were out'ed, so now other OSDs on that box are getting all of its data [14:05:48] holy crap [14:05:52] ? [14:05:58] http://ganglia.wikimedia.org/latest/?c=Ceph%20eqiad&h=ms-be1001.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [14:06:01] gigabit's full [14:06:09] yeow [14:06:22] take that swift [14:08:52] so why is there a spike in traffic and load like that in the last ten mins? that seems pretty sharp [14:09:00] (only if you are not busy) [14:09:01] because I stopped osd.88 & osd.95 [14:09:10] two osds have that much impact? 
[14:09:14] ms-be1001/2 have the other two copies [14:09:25] so now it's reconstructing the third copy out of the other two [14:09:33] for all of 88/95's data [14:09:37] that's good! [14:09:43] as long as I can limit it somehow [14:09:47] as fast as it can go you mean [14:09:48] it's nice that it can go at it full speed [14:09:58] unlike say swift [14:10:09] where it took weeks for you to depool machines :) [14:10:16] it might be nice to be able to throttle it some [14:10:26] (03PS1) 10Akosiaris: Tab cleanup in site.pp. Also fix vim modeline [operations/puppet] - 10https://gerrit.wikimedia.org/r/80213 [14:14:12] amazing [14:16:18] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [14:16:18] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out [14:16:27] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:28] PROBLEM - Apache HTTP on mw1160 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:28] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out [14:16:34] fuck [14:16:37] PROBLEM - HTTP radosgw on ms-fe1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:16:37] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection timed out [14:16:57] PROBLEM - Apache HTTP on mw1157 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:17] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [14:17:28] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:17:28] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out [14:17:37] grumble [14:20:17] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [14:20:18] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 4.601 second response time [14:20:18] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 
bytes in 0.156 second response time [14:20:18] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 9.751 second response time [14:20:27] RECOVERY - HTTP radosgw on ms-fe1001 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.018 second response time [14:20:37] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.638 second response time [14:21:07] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.057 second response time [14:21:18] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.112 second response time [14:21:27] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 60995 bytes in 0.146 second response time [14:21:30] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:21:47] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [14:22:27] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.068 second response time [14:24:17] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.008 second response time [14:38:40] (03PS1) 10Ottomata: Preparing to remove private webrequest logs from stat1. [operations/puppet] - 10https://gerrit.wikimedia.org/r/80214 [14:42:41] (03CR) 10Ottomata: [C: 032 V: 032] Preparing to remove private webrequest logs from stat1. 
[operations/puppet] - 10https://gerrit.wikimedia.org/r/80214 (owner: 10Ottomata) [14:49:40] Haha 28.6M rows in globalnames [14:58:34] PROBLEM - SSH on pdf1 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:59:22] (03PS1) 10Faidon: Ceph: tune config knobs in response to mini-outage [operations/puppet] - 10https://gerrit.wikimedia.org/r/80219 [14:59:25] RECOVERY - SSH on pdf1 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0) [15:00:44] (03CR) 10Faidon: [C: 032] Ceph: tune config knobs in response to mini-outage [operations/puppet] - 10https://gerrit.wikimedia.org/r/80219 (owner: 10Faidon) [15:14:09] PROBLEM - Puppet freshness on mw1126 is CRITICAL: No successful Puppet run in the last 10 hours [15:20:09] PROBLEM - HTTP radosgw on ms-fe1004 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:21:49] PROBLEM - HTTP radosgw on ms-fe1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:22:49] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out [15:22:59] RECOVERY - HTTP radosgw on ms-fe1004 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.003 second response time [15:22:59] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out [15:23:19] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection timed out [15:23:19] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out [15:23:19] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out [15:23:30] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out [15:23:39] PROBLEM - Apache HTTP on mw1156 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:24:39] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out [15:24:42] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection timed out [15:24:42] RECOVERY - HTTP radosgw on ms-fe1003 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.003 second response time [15:25:24] (03PS2) 10Reedy: Remove long-buried $wgLogAutocreatedAccounts 
[operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79059 (owner: 10Nemo bis) [15:25:28] (03CR) 10Reedy: [C: 032] Remove long-buried $wgLogAutocreatedAccounts [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79059 (owner: 10Nemo bis) [15:25:40] (03Merged) 10jenkins-bot: Remove long-buried $wgLogAutocreatedAccounts [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79059 (owner: 10Nemo bis) [15:25:59] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:26:04] (03PS2) 10Reedy: Set Wikibase sort order to alphabetic for ilowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79990 (owner: 10TTO) [15:26:09] (03CR) 10Reedy: [C: 032] Set Wikibase sort order to alphabetic for ilowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79990 (owner: 10TTO) [15:26:19] (03Merged) 10jenkins-bot: Set Wikibase sort order to alphabetic for ilowiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79990 (owner: 10TTO) [15:26:30] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 60992 bytes in 5.537 second response time [15:26:33] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.420 second response time [15:26:37] (03CR) 10Raimond Spekking: "Thanks for merging. But https://de.planet.wikimedia.org/ still shows the old URL. Anything more to do? 
Or is it related to the failure abo" [operations/puppet] - 10https://gerrit.wikimedia.org/r/79466 (owner: 10Raimond Spekking) [15:27:09] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.058 second response time [15:27:09] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.047 second response time [15:27:49] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.176 second response time [15:28:12] (03CR) 10Reedy: [C: 04-1] "Needs rebasing" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78502 (owner: 10TTO) [15:28:19] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.987 second response time [15:28:20] (03CR) 10Akosiaris: [C: 032] Tab cleanup in site.pp. Also fix vim modeline [operations/puppet] - 10https://gerrit.wikimedia.org/r/80213 (owner: 10Akosiaris) [15:28:29] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.857 second response time [15:28:46] (03PS2) 10Reedy: Add WikiProject namespace for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78648 (owner: 10TTO) [15:28:49] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.004 second response time [15:28:53] (03CR) 10Reedy: [C: 032] Add WikiProject namespace for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78648 (owner: 10TTO) [15:28:59] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.176 second response time [15:29:08] (03Merged) 10jenkins-bot: Add WikiProject namespace for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78648 (owner: 10TTO) [15:29:20] (03PS3) 10Reedy: Create five additional namespaces for pflwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78624 (owner: 10TTO) [15:29:26] 
(03CR) 10Reedy: [C: 032] Create five additional namespaces for pflwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78624 (owner: 10TTO) [15:29:38] (03Merged) 10jenkins-bot: Create five additional namespaces for pflwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78624 (owner: 10TTO) [15:30:14] (03PS2) 10Reedy: NewUserMessage extension configuration on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78662 (owner: 10Dereckson) [15:30:17] (03CR) 10Reedy: [C: 032] NewUserMessage extension configuration on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78662 (owner: 10Dereckson) [15:30:29] (03Merged) 10jenkins-bot: NewUserMessage extension configuration on ckb.wikipedia [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78662 (owner: 10Dereckson) [15:31:19] PROBLEM - Apache HTTP on mw1155 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:30] (03PS2) 10Reedy: skwikisource: Project name localization [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79016 (owner: 10Danny B.) [15:31:30] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out [15:31:39] (03CR) 10Reedy: [C: 032] skwikisource: Project name localization [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79016 (owner: 10Danny B.) [15:31:39] PROBLEM - Apache HTTP on mw1154 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:39] PROBLEM - Apache HTTP on mw1160 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:49] (03Merged) 10jenkins-bot: skwikisource: Project name localization [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79016 (owner: 10Danny B.) [15:31:49] PROBLEM - Apache HTTP on mw1157 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:51] paravoid: flapping a bit? 
[15:31:59] PROBLEM - HTTP radosgw on ms-fe1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:59] PROBLEM - Apache HTTP on mw1159 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:32:04] (03PS1) 10Faidon: Revert "filebackend: switch master to Ceph" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80221 [15:32:29] (03CR) 10Faidon: [C: 032] Revert "filebackend: switch master to Ceph" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/80221 (owner: 10Faidon) [15:32:49] (03PS3) 10Reedy: Add several additional user groups for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79197 (owner: 10TTO) [15:32:59] (03CR) 10Reedy: [C: 032] Add several additional user groups for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79197 (owner: 10TTO) [15:33:08] (03Merged) 10jenkins-bot: Add several additional user groups for ckbwiki [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79197 (owner: 10TTO) [15:33:19] PROBLEM - Apache HTTP on mw1153 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:33:19] PROBLEM - Apache HTTP on mw1158 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:33:30] (03PS2) 10Reedy: Add namespace aliases (shortcuts) for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79550 (owner: 10TTO) [15:33:37] (03CR) 10Reedy: [C: 032] Add namespace aliases (shortcuts) for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79550 (owner: 10TTO) [15:33:47] (03Merged) 10jenkins-bot: Add namespace aliases (shortcuts) for dewikivoyage [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/79550 (owner: 10TTO) [15:33:58] Reedy: are you deploying? [15:34:15] I need to revert this but diff is dirty [15:34:20] has some wikisource stuff [15:34:25] Was going to when I'd finished merging stuff... 
[15:34:32] I've not pulled anything onto tin yet [15:34:41] I need to though [15:34:44] I fetched [15:34:49] RECOVERY - HTTP radosgw on ms-fe1002 is OK: HTTP OK: HTTP/1.1 200 OK - 311 bytes in 0.032 second response time [15:35:06] Right, I'll stop merging while you do yours [15:35:15] can I merge? I'm just going to sync one file [15:35:31] yeah, that's fine [15:36:19] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.721 second response time [15:36:28] !log faidon synchronized wmf-config/filebackend.php 'revert ceph promotion to master' [15:36:30] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 5.659 second response time [15:36:33] Logged the message, Master [15:37:09] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.062 second response time [15:37:09] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.064 second response time [15:37:29] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.091 second response time [15:37:29] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 60992 bytes in 0.220 second response time [15:37:33] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.043 second response time [15:37:40] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.060 second response time [15:37:59] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.232 second response time [15:38:04] (PS2) Reedy: Set up flood flag on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79324 (owner: TTO) [15:38:09] (CR) Reedy: [C: 2] Set up flood flag on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79324 (owner: TTO) 
[15:38:18] (Merged) jenkins-bot: Set up flood flag on shwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79324 (owner: TTO) [15:38:52] (CR) Reedy: [C: 2] Enable "block" action for AbuseFilter on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79458 (owner: TTO) [15:39:17] (PS2) Reedy: Clean up abusefilter.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78951 (owner: TTO) [15:40:04] (CR) Reedy: [C: -1] "Needs rebasing" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/76277 (owner: TTO) [15:40:29] (PS2) Reedy: Dereference unused category from ArticleFeedbackToolv5 en.wiki config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79832 (owner: Nemo bis) [15:40:34] (CR) Reedy: [C: 2] Dereference unused category from ArticleFeedbackToolv5 en.wiki config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79832 (owner: Nemo bis) [15:40:45] (Merged) jenkins-bot: Dereference unused category from ArticleFeedbackToolv5 en.wiki config [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79832 (owner: Nemo bis) [15:41:23] (PS3) Reedy: Add missing HTTP error pages on bits.wm.o [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78507 (owner: TTO) [15:42:17] (CR) Reedy: [C: -1] "Needs rebasing" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78637 (owner: TTO) [15:42:48] (CR) Reedy: [C: -1] (bug 52997) $wgCategoryCollation to 'uca-ru' on all Russian-language [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79770 (owner: Andrey Kiselev) [15:43:47] (PS3) Reedy: Clean up abusefilter.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78951 (owner: TTO) [15:43:51] (CR) Reedy: [C: 2] Clean up abusefilter.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78951 (owner: TTO) [15:44:08] (Merged) 
jenkins-bot: Clean up abusefilter.php [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78951 (owner: TTO) [15:44:29] (PS2) Reedy: Enable "block" action for AbuseFilter on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79458 (owner: TTO) [15:44:34] (CR) Akosiaris: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/79329 (owner: Akosiaris) [15:44:38] (CR) Reedy: [C: 2] Enable "block" action for AbuseFilter on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79458 (owner: TTO) [15:44:47] (Merged) jenkins-bot: Enable "block" action for AbuseFilter on meta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/79458 (owner: TTO) [15:46:01] !log reedy synchronized wmf-config/ [15:46:06] Logged the message, Master [15:46:12] That was very quick [15:46:38] !log reedy synchronized wmf-config/ [15:47:08] what's wrong? [15:47:33] Nothing [15:47:38] It just seemed to be very quick [15:47:44] real 0m23.814s [15:48:40] I suspect the dsh list being updated helped [15:51:39] !log switching ns0 to rubidium [15:51:44] Logged the message, Master [15:53:10] PROBLEM - Puppet freshness on srv281 is CRITICAL: No successful Puppet run in the last 10 hours [15:59:52] (PS4) Reedy: Add missing HTTP error pages on bits.wm.o [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78507 (owner: TTO) [16:00:06] (CR) Reedy: [C: 2] Add missing HTTP error pages on bits.wm.o [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78507 (owner: TTO) [16:00:18] (Merged) jenkins-bot: Add missing HTTP error pages on bits.wm.o [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78507 (owner: TTO) [16:04:29] !log reedy synchronized docroot and w [16:04:34] Logged the message, Master [16:15:08] PROBLEM - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours [16:17:42] (PS1) Akosiaris: Adding brewster to 
new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 [16:17:59] (CR) jenkins-bot: [V: -1] Adding brewster to new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 (owner: Akosiaris) [16:19:27] (PS1) Faidon: authdns: migrate ns2 to eeden [operations/puppet] - https://gerrit.wikimedia.org/r/80224 [16:19:28] (PS1) Faidon: authdns: remove IP/dns::authserver from old NS [operations/puppet] - https://gerrit.wikimedia.org/r/80225 [16:20:21] (CR) Faidon: [C: 2] authdns: migrate ns2 to eeden [operations/puppet] - https://gerrit.wikimedia.org/r/80224 (owner: Faidon) [16:21:02] !log switching ns2 to eeden [16:21:07] Logged the message, Master [16:23:18] PROBLEM - Host ns2.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [16:23:30] that's expected [16:23:32] damn you puppet [16:25:38] RECOVERY - Host ns2.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 89.19 ms [16:27:16] !log DNS migration complete [16:27:21] Logged the message, Master [16:27:24] bblack: ^ [16:27:35] paravoid: yay! [16:27:45] http://ganglia.wikimedia.org/latest/?c=Miscellaneous%20pmtpa&h=mexia.wikimedia.org&m=cpu_report&r=hour&s=descending&hc=4&mc=2 [16:27:48] pretty graphs too [16:30:57] So: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=mexia.wikimedia.org&m=cpu_report&s=descending&mc=2&g=cpu_report&c=Miscellaneous+pmtpa [16:31:12] that's our DNS cpu utilization? something's clearly broken, nothing's happening :) [16:31:30] dude you should put a busy loop somewhere in your code [16:31:38] it's far too efficient [16:32:02] (PS1) Brian Wolff: Follow-up ccde789d5c24. Accidental change of crat -> sysop. 
[operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80226 [16:32:06] (PS2) Akosiaris: Adding brewster to new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 [16:32:09] We could have it generate its own realtime graphs to waste more CPU [16:32:56] (CR) Akosiaris: [C: 2] Adding brewster to new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 (owner: Akosiaris) [16:32:58] (CR) Faidon: [C: 2] authdns: remove IP/dns::authserver from old NS [operations/puppet] - https://gerrit.wikimedia.org/r/80225 (owner: Faidon) [16:33:32] (PS3) Akosiaris: Adding brewster to new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 [16:33:43] (CR) Akosiaris: [C: 2] Adding brewster to new backup system [operations/puppet] - https://gerrit.wikimedia.org/r/80223 (owner: Akosiaris) [16:33:55] gerrit hates me ... [16:34:22] (CR) Reedy: [C: 2] Follow-up ccde789d5c24. Accidental change of crat -> sysop. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80226 (owner: Brian Wolff) [16:34:37] (Merged) jenkins-bot: Follow-up ccde789d5c24. Accidental change of crat -> sysop. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80226 (owner: Brian Wolff) [16:36:17] !log reedy synchronized wmf-config/InitialiseSettings.php [16:36:22] Logged the message, Master [16:36:48] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:37:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [16:55:33] heya paravoid, Snaps has some recent changes in librdkafka [16:55:38] could you pull those into your debian branch? 
[16:55:45] busy [16:55:45] later [16:55:46] also, note that he has moved 0.8 stuff to master [16:55:48] ok np [16:55:51] I know [16:55:53] he told me :) [16:56:11] hah, ok cool :) [16:59:32] (PS1) Demon: Turning secure login back off for beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80227 [17:00:22] paravoid: I'll add a fix for issue #14 tonight, I'll tell you when I'm done, okay? [17:01:10] ok [17:01:34] wait [17:01:39] you have 0.8 users already? [17:01:41] cool! [17:02:40] seems so :) [17:03:22] (CR) Demon: [C: 2] Turning secure login back off for beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80227 (owner: Demon) [17:03:30] (Merged) jenkins-bot: Turning secure login back off for beta [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80227 (owner: Demon) [17:38:10] RECOVERY - Ceph on ms-fe1001 is OK: Ceph HEALTH_OK [17:38:20] RECOVERY - Ceph on ms-fe1004 is OK: Ceph HEALTH_OK [17:38:50] RECOVERY - Ceph on ms-fe1003 is OK: Ceph HEALTH_OK [17:42:25] greg-g: got a sec? [17:47:49] ksnider: sure thing, just scrambling to move the https deploy to next week ;) [18:17:37] (PS1) Aaron Schulz: Enabled $wgMWOAuthSecureTokenTransfer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80234 [18:20:37] (PS1) coren: Tool Labs: Move dev packages to dev_environ [operations/puppet] - https://gerrit.wikimedia.org/r/80236 [18:22:09] (CR) coren: [C: 2] "Simple change." 
[operations/puppet] - https://gerrit.wikimedia.org/r/80236 (owner: coren) [18:22:55] (CR) Aaron Schulz: [C: 2] Enabled $wgMWOAuthSecureTokenTransfer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80234 (owner: Aaron Schulz) [18:23:29] (Merged) jenkins-bot: Enabled $wgMWOAuthSecureTokenTransfer [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80234 (owner: Aaron Schulz) [18:29:52] !log aaron synchronized wmf-config/CommonSettings.php 'Enabled $wgMWOAuthSecureTokenTransfer' [18:29:58] Logged the message, Master [18:32:29] scary variable names are scary [18:38:48] !log reedy synchronized wmf-config/InitialiseSettings.php 'Enable Collection on testwikidatawiki' [18:38:53] Logged the message, Master [18:38:58] (PS6) Petr Onderka: Implemented diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/79808 [18:41:03] !log reedy synchronized wmf-config/InitialiseSettings.php 'Disable Collection on testwikidatawiki. It's mucho brokenp' [18:41:08] Logged the message, Master [18:42:55] lol Reedy [18:43:14] See https://www.strongspace.com/shared/6vrn7lwl00 for the PDF [18:43:23] (CR) Petr Onderka: [C: 2 V: 2] Implemented diff dumps [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/79808 (owner: Petr Onderka) [18:43:56] lol Collection. [18:44:46] even the license is wrong [18:50:50] Anyone from ops want to update the php5-fss package so we can get it deployed to the Apaches? [18:50:56] https://bugzilla.wikimedia.org/show_bug.cgi?id=51551 [18:51:03] Changelog etc is already committed [18:51:18] Though, we are apparently running a version that isn't in the git repo changelog, which is slightly scary [19:48:53] apergos: "snapshot1004 ndswikibooks Error connecting to 10.64.16.42" spam...any idea what that is on about? 
[20:13:57] PROBLEM - Puppet freshness on cp1063 is CRITICAL: No successful Puppet run in the last 10 hours [20:17:01] ottomata: heads up, analytics1023-1025 are all warning about just a few hundred MBs of disk left [20:17:48] oo danke, looking. those are zookeepers [20:20:10] !log dist-upgrading zirconium [20:20:14] Logged the message, Master [20:22:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.136 second response time [20:32:57] RECOVERY - Puppet freshness on zirconium is OK: puppet ran at Wed Aug 21 20:32:50 UTC 2013 [20:35:04] !log re-enabled puppet runs on zirconium which was "admin down" [20:35:09] Logged the message, Master [20:36:40] !log installing package upgrades on iron [20:36:45] Logged the message, Master [20:38:34] RECOVERY - Disk space on analytics1024 is OK: DISK OK [20:41:09] ottomata: :) [20:44:27] yeah, making partitions for zk data [20:44:32] guess it was piling up [20:44:33] ! [20:44:42] these machines will be reinstalled pretty soon anyway (next week perhaps?) 
:) [20:44:52] cool [20:52:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:53:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [20:59:41] !log cp1061,cp1062 have dpkg issue upgrading varnish packages [20:59:46] Logged the message, Master [21:09:02] ACKNOWLEDGEMENT - Puppet freshness on mw1046 is CRITICAL: No successful Puppet run in the last 10 hours daniel_zahn hard disk, read-only fs, RT #5628 [21:11:53] ACKNOWLEDGEMENT - Apache HTTP on srv281 is CRITICAL: Connection refused daniel_zahn flaky hardware, pybal disabled, decom RT #5647 [21:11:53] ACKNOWLEDGEMENT - Puppet freshness on srv281 is CRITICAL: No successful Puppet run in the last 10 hours daniel_zahn flaky hardware, pybal disabled, decom RT #5647 [21:14:14] job queue on terbium: enwiki (1660342), uzwiki (32411), zhwiki (100804), commonswiki (14348), frwiki (26611), Total (1855913) [21:18:56] (PS1) Dzahn: monitoring: accept 1 OR 2 processes with args carbon-cache.py on professor [operations/puppet] - https://gerrit.wikimedia.org/r/80297 [21:19:21] RECOVERY - Disk space on analytics1025 is OK: DISK OK [21:20:01] RECOVERY - Disk space on analytics1023 is OK: DISK OK [21:20:08] (CR) Dzahn: [C: 2] "fix https://icinga-admin.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=professor&service=carbon-cache.py" [operations/puppet] - https://gerrit.wikimedia.org/r/80297 (owner: Dzahn) [21:20:47] (CR) Raimond Spekking: "Never mind. It's working now." [operations/puppet] - https://gerrit.wikimedia.org/r/79466 (owner: Raimond Spekking) [21:22:07] (CR) Dzahn: "ah yeah, sorry, puppet wasn't running on the host unrelated to this change, so it wasn't applied and i just fixed that" [operations/puppet] - https://gerrit.wikimedia.org/r/79466 (owner: Raimond Spekking) [21:22:29] AaronSchulz: no idea. 
I can look tomorrow I guess [21:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [21:27:34] paravoid, looks like m.wikimediafoundation.org has disappeared from DNS [21:28:59] also m.mediawiki.org [21:29:44] they are not resolving [21:32:35] i don't know the new system at all, but checking out the new operations/dns.git now to see what i can find [21:34:10] ty binasher [21:40:25] binasher: Any chance you can find out how long the DNS entries have been missing? [21:41:20] K4-713: whenever paravoid switched over the new infra [21:41:47] 13:39 paravoid: switching ns1 traffic to mexia (new authdns) [21:41:51] 15:53 paravoid: switching ns0 to rubidium [21:42:10] try root@mexia:/srv/authdns/git/templates# [21:42:18] I don't know the new system either, just looking on mexia [21:42:36] seems like you ought to be able to poke around there [21:43:46] binasher: ^^ [21:46:06] bblack: around at all? [21:47:16] yes [21:49:13] so it's just those two reported hostnames? [21:50:08] who knows... [21:50:09] i haven't managed to find any others yet [21:50:18] dns/templates/wikimediafoundation.org looks correct with [21:50:26] m 1H IN CNAME m.wikimedia.org. [21:50:36] yes [21:50:42] the line before is: store 1H IN CNAME wikimedia-lb.wikimedia.org. [21:50:42] I was just staring at that [21:50:45] so why? [21:50:57] and store.wikimediafoundation.org resolves [21:51:51] ok yeah [21:52:01] hush gerrit [21:52:03] grrrr [21:52:03] I think paravoid's template system found a parser "bug" [21:52:21] the final lines of these files don't have newlines. 
EOF just "happens" after the final dot on that line [21:52:54] it would be easier to just add the final newline to the output for now, though [21:53:28] I'm not sure if that's the template tool or its inputs [21:53:28] bblack: that seems to be the case with the wikipedia.org template as well though, where the final line is for zu.zero [21:53:36] and zu.zero.wikipedia.org resolves [21:55:57] I manually tried adding a newline on mexia for m.wmf.org and it worked [21:56:11] perhaps the bug has other subtleties to it [21:57:35] actually, i misspoke, wikipedia.org does have a newline [21:58:57] the input template for wmf.org ends with a newline, so it must be the generator [21:59:05] I guess mediawiki.org would be the same deal, from looking at it [22:00:58] wow, the metaphorical newline [22:01:36] yay bblack [22:01:49] one could just tack on a newline I guess in /usr/local/bin/authdns-gen-zones it looks like [22:04:11] yes, I think that's the way to do it for now [22:04:32] I'm just poking at things a little to see how we force regeneration and all that [22:04:51] keep_trailing_newline it seems is an argument to jinja but [22:05:00] then you have to know what version and does it work right and meh [22:05:02] http://jinja.pocoo.org/docs/api/ [22:05:09] and given I have 0 experience with it... [22:05:44] yeah, I'd opt for just + "\n" on every file [22:06:24] yep [22:06:50] it seeeems like [22:06:52] # Write zonefile [22:06:52] open(zonepath, 'w').write(HEADER + output) [22:07:04] you could ack a newline onto the output just before that maybe [22:07:08] *tack [22:07:26] otoh it is past my bedtime and I won't be around if it breaks, so... [22:07:58] well I'm trying to push that, but gerrit is being slow [22:08:10] ah yeah see the other channel [22:08:28] there was some long running clone, it should be getting better soon [22:08:35] hmm [22:08:56] can't connect to the pedia atm [22:09:06] "the pedia" ? [22:09:14] which pedia is that? 
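Editor's note on the trailing-newline workaround discussed above: the following is a minimal sketch, not the actual contents of authdns-gen-zones. It assumes the generator holds the rendered zone text in a string and writes it with a single `write()` call, as the quoted `open(zonepath, 'w').write(HEADER + output)` line suggests. The helper names `ensure_trailing_newline` and `write_zonefile` are hypothetical and exist only for illustration.

```python
def ensure_trailing_newline(text: str) -> str:
    """Return text unchanged if it already ends with a newline, else append one."""
    return text if text.endswith("\n") else text + "\n"


def write_zonefile(zonepath: str, header: str, output: str) -> None:
    """Write header + rendered zone data, guaranteeing the file ends with a newline.

    Hypothetical stand-in for the write step quoted in the log; the real
    script may be structured differently.
    """
    with open(zonepath, "w") as zonefile:
        zonefile.write(ensure_trailing_newline(header + output))
```

Appending only when the newline is missing keeps the step idempotent, so templates that already end in a newline are written unchanged. The other route mentioned in the chat, Jinja2's `keep_trailing_newline` environment option, depends on the installed Jinja2 version, which is the concern raised in the log.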
[22:09:32] th was klingon I think but 'the'? [22:09:34] k, it loaded now after a minute [22:09:42] whew [22:09:57] go gerrit go, you can do it [22:12:42] still waiting on the same push, but I'm afraid to control-C it and leave things in some mixed up gerrit state [22:12:45] gerrit is not going :( [22:13:43] yeah just let it run [22:13:59] (PS1) BBlack: add newlines to every zonefile at the end [operations/puppet] - https://gerrit.wikimedia.org/r/80304 [22:14:18] ah there it goes [22:14:42] I just got on the host and was getting ready to give it a talking to [22:14:52] must have decided discretion was the better part of valor :-P [22:15:18] (CR) BBlack: [C: 2] add newlines to every zonefile at the end [operations/puppet] - https://gerrit.wikimedia.org/r/80304 (owner: BBlack) [22:16:02] hey [22:16:30] newlines? really? [22:16:31] unbelievable [22:16:40] newlines. [22:17:07] I did extensive diffs between old and new generated zones [22:17:12] ...with diff -uwb [22:17:20] wow, the metaphorical newline [22:17:21] (w= whitespace, b = blank lines) [22:17:24] hah, yeah [22:18:48] it's weird, bblack just tried it worked for him [22:19:07] paravoid: it's still an upstream bug. the parser can't silently drop the final line just because it doesn't like the lack of a newline. it should at least fail [22:19:28] nod, I was about to say this [22:19:42] I filed a bug, hopefully the author will fix it for us :P [22:19:47] :D [22:19:53] he's awesome, I wouldn't worry [22:20:31] in the meantime, once puppet pushes the updated python zone generator, just run authdns-update and it will fix it, or? [22:20:40] why didn't we stacktrace it? ;) [22:20:47] hahaha [22:20:55] bblack: hm. maybe. [22:21:42] no [22:21:45] I check mtimes [22:21:49] ok [22:21:54] authdns-check-zones needs --force [22:22:00] er, gen [22:22:23] can I just run gen-zones --force + send sighup? 
[22:22:29] authdns-gen-zones --force /srv/authdns/git/templates /etc/gdnsd/zones; gdnsd reload [22:22:32] yes [22:22:44] on all three boxes [22:27:28] the reason it checks mtime btw is [22:27:32] real 0m4.783s [22:28:50] bblack: on it? [22:29:03] yes, eeden isn't finished yet [22:29:43] puppet is very slow over there, so I just did it manually [22:29:58] and... 10 minutes of ncache TTL in theory, but who knows in practice [22:30:07] but all 3 are updated now [22:30:08] right [22:30:14] awjr: ^^ [22:30:41] bblack: want to mail the list or should I? [22:30:53] reply to arthur's mail on-list I mean [22:30:58] I'll reply yeah [22:31:13] awesome thanks paravoid; m.mediawiki.org and m.wikimediafoundation.org are now resolving for me :D [22:31:25] !log opened up wikiadmin grants on es1 shard to accommodate snapshot1004 [22:31:30] Logged the message, Master [22:31:46] thanks [22:35:25] binasher: \o/ [22:36:47] still log noise though :( [22:37:05] thanks for the investigation [22:38:06] binasher: http://pastebin.com/nCUV1JMT is bugging me [22:38:59] it's not Gadget::loadStructuredList()...and the rest of the method is just wfMessage and isAllowed() calls [22:39:01] hmm, dig m.wikivoyage.org @ns0.wikimedia.org can't really confirm yet [22:39:28] AaronSchulz: is the show master inside that transaction for chronologyprotector? [22:39:58] mutante: I don't think this exists [22:40:12] binasher: that's what I assumed, yes [22:40:37] AaronSchulz: lol deleting all user_properties rows for the user, inserting new ones, deleting again, and reinserting within the same tx [22:40:52] AaronSchulz: weren't we talking about that at lunch a few weeks back? [22:41:14] sounds familiar [22:41:36] holding the locks for 20 seconds seems like the greater evil though [22:42:34] AaronSchulz: does GadgetHooks::getPreferences read those same rows? or attempt to? 
[22:43:50] binasher: nope [22:43:50] delete where non_unique_key = foo isn't great lock wise for a long tx like that either [22:44:14] I actually hacked it to not do the DELETE if there were no rows on non-locking select [22:44:26] that was a while ago and cut down on 95% of those deadlocks [22:44:30] paravoid: ah, hm, just slightly related https://bugzilla.wikimedia.org/show_bug.cgi?id=48318 [22:44:32] that's nice - it was doing that on new user creation [22:44:38] yep [22:45:19] can we kill off the second round of delete/insert [22:46:21] mutante: looks like just a missing cname [22:46:40] want to add it? I can, but maybe you'd like to try out the new DNS while I'm around :) [22:47:23] paravoid: yea, i wonder if we really want to add www.m and then put an Apache redirect so that the mobile devices get redirect back from there to www. [22:47:36] AaronSchulz: plus getting the master position before commit provides a position that doesn't reflect the tx, so why bother with chronologyprotector [22:47:50] why back to www? [22:47:57] because there is no m. [22:48:33] so they want to show the desktop site as well afaict [22:48:36] i guess there's value with giving chronologyprotector a newer position if its already initialized with an old position [22:48:40] just not end at a dead end page [22:49:34] awjr: ^ do you know if it would serve a different mobile site at wikivoyage? [22:50:01] i think it's just about showing the desktop site and not breaking on mobile so far ? [22:50:36] mutante not quite following - lemme read scrollback. but real quick, there is a .m for wikivoyage; eg http://en.m.wikivoyage.org/wiki/Main_Page [22:51:05] I don't see how the SHOW STATUS is before the COMMIT looking at the code [22:51:08] oh i see for the wwww [22:51:09] er [22:51:10] www [22:51:18] yeah [22:51:24] $factory->shutdown(); is like the last thing to be called [22:51:29] it breaks when it tries to find www.m. 
[22:51:34] we don't have a separate mobile version of www.wikipedia.org either [22:51:39] and that just happens on mobile devices [22:51:41] $this->chronProt->shutdown(); [22:51:42] $this->commitMasterChanges(); [22:51:45] although we were just discussing that we might like to do something about that [22:51:51] (PS1) Faidon: Revert "Allow ops to force push on ops/dns.git" [operations/dns] (refs/meta/config) - https://gerrit.wikimedia.org/r/80309 [22:51:57] hmm, could be that but the commit call is already done before that in wiki.php [22:52:04] (CR) Faidon: [C: 2 V: 2] Revert "Allow ops to force push on ops/dns.git" [operations/dns] (refs/meta/config) - https://gerrit.wikimedia.org/r/80309 (owner: Faidon) [22:52:20] awjr: so if we just added www.m in DNS now, and then put Apache redirects to redirect back from there to www. ? [22:52:25] it's probably best for mobile devices to get www.wikivoyage.org for now (no .m variant) until we figure out what to do about www.wikipedia.org so we can do the same thing there [22:52:33] that's what i thought people want us to do on that ticket at least [22:52:43] www.m.wikipedia.org doesnt resolve either [22:52:49] lemme read the bug [22:52:54] did it before? [22:52:55] yea, paravoid just said we could add it though [22:53:03] no i dont think so bblack [22:53:25] no, I said that if that's the request, we can :) [22:54:08] eh - wikivoyage.org does not redirect to www.m.wikivoyage.org for me... [22:54:26] awjr: heh, that's what i commented first but [22:54:32] bblack: btw, I tried using dnsreplay to find such occurrences [22:54:36] @Daniel: Yes but try www.wikivoyage.org/wiki/Paris for instance. 
[22:54:40] bblack: I didn't have much luck with it :) [22:55:02] bblack: subtle differences in responses between pdns/gdnsd [22:55:35] yeah it's the subtle ones that are the scariest of course [22:55:55] "hey my project is missing a hostname" are the easy bugs :) [22:55:58] blech how does the redirect happen for www.wikivoyage.org/wiki/Paris? is there a geo lookup and user gets redirected to en.? [22:56:07] all the cases I saw, gdnsd was better [22:56:25] and why are people getting links for www.wikivoyage.org/wiki/Paris? [22:56:25] you mean things like response bits/flags diffs? [22:56:32] rather than en.wikivoyage.org/wiki/Paris? [22:56:58] PROBLEM - Puppet freshness on lvs1001 is CRITICAL: No successful Puppet run in the last 10 hours [22:57:05] I mean like asking en.wikipedia.org A from pdns and pdns replying CNAME wikipedia-lb.wikimedia.org and then the response for wikipedia-lb.wikimedia.org :) [22:57:12] (CNAME and finally A) [22:57:37] ah yes [22:58:01] and dnsreplay is in powerdns flavored C++ [22:58:06] I couldn't easily filter them out [22:58:17] there's still this bugfix unpublished as well, I guess it and the newline thing will go in 1.10.0: https://github.com/blblack/gdnsd/commit/22b0dcf8a19aaeb1e6f32ad9f0aad95ab26b8a61 [22:58:24] I entertained the idea of writing a similar thing in e.g. perl [22:58:34] but I really don't think the CNAME+ANY thing is a production issue in the general sense [22:58:43] but there was another important issue as well [22:58:50] replaying the traffic with 127.0.0.1 as src ip [22:58:55] is kinda pointless with geodns :) [22:59:22] awjr: "google still finds www.wikivoyage.org urls for some of it's [22:59:22] searches (I assume because of old link juice) especially if you're looking for [22:59:31] wikivoyage specific urls with a "wikivoyage X" type search." [22:59:44] awjr: because they ran www. 
on their old server before migration [23:01:09] mutante: ah so it's a matter of time for the crawlers to catch up - the canonical URL i see on wikivoyage pages looks correct [23:01:11] bblack: I've seen the bug yeah. I think it explains the weird bind9-host output [23:02:11] awjr: yea, so is it worth it, maybe analytics has numbers? [23:02:36] * awjr shrugs [23:02:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:02:49] maybe - i dunno how long it takes for everything to get recrawled [23:02:55] bblack: include file support is nice too :) [23:02:55] gotta run to a mtg [23:03:16] awjr: kk, culater [23:03:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [23:26:51] mmm, didn't know mwlib was so much fun [23:27:53] '+skwikisource' => array( [23:27:53] 'Wikisource' => NS_PROJECT, [23:27:53] 'Wikisource_talk' => NS_PROJECT_TALK, [23:27:53] ), [23:27:58] sorry [23:28:38] this was the right paste: [23:28:42] Code Review - Error [23:28:43] Server Unavailable [23:29:17] aka gerrit is not working [23:32:32] (PS1) Danny B.: skwikibooks: Project name localization [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/80313 [23:38:48] (PS1) Dzahn: redirect http->https on etherpad.wikimedia.org [operations/puppet] - https://gerrit.wikimedia.org/r/80314 [23:40:58] anyone on the gerrit issue? [23:41:20] VE needs to do an emergency deploy to fix a content corruption issue [23:41:23] greg-g: I thought Tim restarted the server? [23:41:26] and that it's OK now? [23:41:31] works for me? [23:42:26] oh, awesome, didn't see any !log [23:42:51] was just noticing the comments in other channels [23:45:00] RECOVERY - carbon-cache.py on professor is OK: PROCS OK: 2 processes with args carbon-cache.py [23:45:29] binasher: ^ minor but fixed, bbl [23:45:40] mutante: thanks!