[00:03:38] greg-g: Are you still ok with my update going in parallel? [00:04:23] bd808: yeah [00:04:37] Excellent. I'll get it done then [00:04:42] * greg-g nods [00:06:16] greg-g: {{done}} [00:06:49] sweet [00:07:33] git-deploy is quick like a bunny when you are only updating a single host :) [00:08:59] !log mholmquist synchronized php-1.23wmf8/extensions/MultimediaViewer/ [00:09:06] Logged the message, Master [00:10:51] !log mholmquist synchronized php-1.23wmf9/extensions/MultimediaViewer/ [00:10:54] MaxSem: Done! Will test on mw.org but it should be fine. [00:10:57] Logged the message, Master [00:11:28] Yeah, looks good [00:11:38] wee [00:11:47] Oh, sorry [00:11:51] LIGHTENING DEPLOYYYYY [00:11:59] * marktraceur apologises to greg-g [00:12:01] (CR) MaxSem: [C: 2] Enable beta mobile diff on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106236 (owner: MaxSem) [00:12:15] (Merged) jenkins-bot: Enable beta mobile diff on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106236 (owner: MaxSem) [00:12:33] (PS1) Aaron Schulz: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106465 [00:14:31] (CR) jenkins-bot: [V: -1] Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106465 (owner: Aaron Schulz) [00:15:21] !log maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/106236' [00:15:28] Logged the message, Master [00:15:44] I hate that fucking submodule [00:19:12] I actually can't remember the command to make that diff go away [00:21:21] !log maxsem synchronized php-1.23wmf8/extensions/Collection/ 'https://gerrit.wikimedia.org/r/106293' [00:21:28] Logged the message, Master [00:25:25] !log maxsem synchronized php-1.23wmf8/extensions/Collection/ 'Shit hit fan' [00:25:32] Logged the message, Master [00:25:57] I'm done [00:26:06] 'Shit hit
fan'? [00:26:09] haha [00:26:20] MaxSem: that's it? "I'm done"? [00:26:27] :) [00:26:43] hmm, maybe ori knows [00:26:51] * AaronSchulz wishes that was just a symlink or something [00:27:01] knows what? [00:27:11] ori: how do I remove the submodule change from f12b7b2122797c0602c10a1f902955998143f3e4 ? [00:27:39] and I can't checkout nor reset to f12b7b2122797c0602c10a1f902955998143f3e4 [00:27:48] it's sooo convenient in TortoiseGit:P [00:28:00] ori: I meant https://gerrit.wikimedia.org/r/106465 [00:28:26] if you don't submodule update before commit, that crap gets tossed in [00:29:33] you mean, retain only the CommonSettings change? [00:29:43] right, I couldn't care less about the other thing [00:32:03] i dunno, checkout, git reset --soft 'HEAD^', git reset -- submodule_thigny, git commit -c ORIG_HEAD, git review? [00:32:39] (03PS2) 10MaxSem: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106465 (owner: 10Aaron Schulz) [00:32:53] ^^^:) [00:33:01] AaronSchulz: I checked it out, `git reset HEAD^` and checked in again and the `git show` looks clean [00:33:20] Max to the rescue [00:33:56] MaxSem: Nothing wrong I hope [00:34:50] bd808: checked in what? [00:35:31] The file that was dirty from the reset HEAD^ [00:35:47] Max fixed it for you in gerrit [00:36:00] I know but I want to know how to deal with this crap in the future [00:36:07] aaron has to insert quarters into his computer for each git operation [00:36:31] * ori kids [00:37:40] So I did `git review --download 106465; git reset HEAD^; git commit --all` and that seemed to drop the submodule change for me [00:38:08] !log ori synchronized php-1.23wmf8/extensions/EventLogging 'Update EventLogging to master' [00:38:15] Logged the message, Master [00:39:18] bd808: that doesn't work for me [00:40:01] Hmmm… oh. I don't have the submodule initialized !? 
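The sequence ori suggests at 00:32:03 can be sketched as a small shell function. This is a minimal sketch, not a fix confirmed in the channel: the function name `drop_path_from_last_commit` is hypothetical, `-C` (reuse the old message without opening an editor) is swapped in for the `-c` from the chat, and it assumes the unwanted submodule pointer change sits in the most recent commit.

```shell
# Hypothetical helper (name is an assumption, not from the log):
# rewrite the most recent commit so that it no longer touches the given
# path, for example an accidentally committed submodule pointer.
drop_path_from_last_commit() {
    path="$1"
    # Step back one commit but keep that commit's changes staged.
    # This also records the old commit as ORIG_HEAD.
    git reset --soft 'HEAD^' || return 1
    # Unstage only the unwanted path; a path-limited reset does not move
    # HEAD, so ORIG_HEAD still names the original commit.
    git reset -q -- "$path" || return 1
    # Recommit what is still staged, reusing the original commit message.
    git commit -q -C ORIG_HEAD
}
```

Called as `drop_path_from_last_commit docroot/bits/WikipediaMobileFirefoxOS` (the submodule path that appears later in the log), this should leave the path showing as modified in the working tree but out of the commit. Whether it also avoids the `reference is not a tree` error from `git submodule update` is not established in the log.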
[00:40:21] I remember Roan staring at this for an hour last time [00:40:26] * AaronSchulz can't recall what he did [00:40:32] MaxSem: so, what happened? [00:41:46] git submodule update [00:41:47] fatal: reference is not a tree: f12b7b2122797c0602c10a1f902955998143f3e4 [00:41:49] Unable to checkout 'f12b7b2122797c0602c10a1f902955998143f3e4' in submodule path 'docroot/bits/WikipediaMobileFirefoxOS' [00:41:56] ugh, so much for stackoverflow suggestions ;) [00:42:22] greg-g, I didn't realise wmf8 was so old, deployed master and was met with exception due to https://gerrit.wikimedia.org/r/103915 [00:43:18] yeah, it is :/ [00:43:22] ah, that [00:43:32] MaxSem: so, reverted or fixed something? [00:43:37] reverted [00:43:39] * greg-g nods [00:44:32] (PS1) Subramanya Sastry: WIP: Update parsoid puppet config to use new repository [operations/puppet] - https://gerrit.wikimedia.org/r/106471 [00:46:47] (CR) Subramanya Sastry: "This is my first puppet patch ever and an initial attempt to update parsoid config to use the new repo.
So, not sure if I got everything t" [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry) [01:12:27] (PS1) Springle: repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 [01:12:51] (CR) Springle: [C: 2] repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 (owner: Springle) [01:12:59] (Merged) jenkins-bot: repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 (owner: Springle) [01:14:07] !log springle synchronized wmf-config/db-eqiad.php 'repool db1041' [01:14:16] Logged the message, Master [01:57:47] (PS1) Springle: disable index_merge_sort_union [operations/puppet] - https://gerrit.wikimedia.org/r/106478 [02:02:22] springle: just a heads up that i'm running a big export query on db1047 (research slave), should finish in less than 10 mins [02:02:32] i don't expect any issues / alerts [02:02:55] * AaronSchulz wonders what was wrong with index_merge_sort_union [02:02:58] eh, time to go [02:08:01] ori: ok tnx [02:08:17] (now that almost 10mins have passed before i noticed :) [02:08:33] Query OK, 39178348 rows affected (3 min 11.23 sec) [02:08:36] faster than i thought [02:08:42] nice [02:11:15] (CR) Springle: [C: 2] disable index_merge_sort_union [operations/puppet] - https://gerrit.wikimedia.org/r/106478 (owner: Springle) [02:15:04] Bsadowski1: I don't know what you're pinging me about. [02:21:17] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:17] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:28] !log partition logging tables on logpager slaves: s2 db1002, s4 db1004, s5 db1026, s6 db1040 [02:50:11] wikitech down...
[02:52:17] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 6.802 second response time [02:52:58] should be back up ;) [02:52:59] !log LocalisationUpdate completed (1.23wmf8) at Thu Jan 9 02:52:59 UTC 2014 [02:53:44] !log wikitech nada. restarted apache on virt0 [02:54:17] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.126 second response time [02:54:37] Reedy: did you do something too, or was it just the apache restart? [02:56:31] I'm not sure the bot can log to wikitech when wikitech is down. [02:56:57] wikitech was back up for seconds. now nada again [02:57:09] Yeah, it isn't loading for me. [02:57:17] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:17] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:20] I'm not sure what virt0 is, but it seems awfully fickle. [02:57:39] It's virtualisation host 0 [02:57:53] server reached MaxClients setting [02:58:07] springle: Nothing. Just RECOVERY - HTTP being on the host where it's located [02:58:26] is maxclients really small? [02:59:03] 150 [02:59:24] Sounds pretty busy [02:59:59] puppet is busiest on top [03:00:17] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 1.948 second response time [03:00:39] here we go [03:01:23] phusion_passenger exception then apache came back [03:03:07] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.179 second response time [03:03:28] !log wikitech down. restarted apache on virt0. 
phusion_passenger exception + MaxClients hit [03:03:31] * springle tries again [03:03:34] Logged the message, Master [03:03:38] yay [03:03:48] !log partition logging tables on logpager slaves: s2 db1002, s4 db1004, s5 db1026, s6 db1040 [03:03:54] Logged the message, Master [03:33:56] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jan 9 03:33:55 UTC 2014 [03:34:02] Logged the message, Master [04:29:31] (PS1) Springle: Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 [04:30:16] (CR) Springle: [C: 2] Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 (owner: Springle) [04:30:25] (Merged) jenkins-bot: Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 (owner: Springle) [04:31:41] !log springle synchronized wmf-config/db-eqiad.php 'watchlist/recnetchangeslinked LB on s[234567]' [04:31:48] Logged the message, Master [09:23:57] PROBLEM - Host pdf1 is DOWN: PING CRITICAL - Packet loss = 100% [10:28:07] akosiaris: pm?
[10:34:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:36:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:38:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:40:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:42:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:44:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:46:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:48:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:50:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:52:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:54:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:56:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:57:05] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [10:58:05] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.37 ms [10:58:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [11:00:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [11:00:45] RECOVERY - Puppet freshness on wtp1019 is OK: puppet ran at Thu Jan 9 11:00:40 
UTC 2014 [11:01:46] hello [11:02:09] RobH: good morning [11:02:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 11:00:40 AM UTC [11:05:03] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4014 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4008 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:05:23] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:c [11:05:29] that's ulsfo, ignore [11:05:33] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:33] PROBLEM - Host lvs4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:34] pheew.ok [11:05:36] (03PS1) 10Matanya: wmclient moved to git. 
correcting README [operations/debs/adminbot] - 10https://gerrit.wikimedia.org/r/106502 [11:05:36] I'll have a look, but ignore for now [11:05:42] thx for info [11:05:43] PROBLEM - Host lvs4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:06:03] RECOVERY - Host cp4004 is UP: PING OK - Packet loss = 0%, RTA = 75.12 ms [11:06:04] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 86.67 ms [11:06:04] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 75.07 ms [11:06:04] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [11:06:04] RECOVERY - Host cp4008 is UP: PING OK - Packet loss = 0%, RTA = 75.12 ms [11:06:04] RECOVERY - Host cp4014 is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [11:06:13] RECOVERY - Host lvs4001 is UP: PING OK - Packet loss = 0%, RTA = 75.11 ms [11:06:13] RECOVERY - Host lvs4003 is UP: PING OK - Packet loss = 0%, RTA = 75.15 ms [11:06:13] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 84.76 ms [11:06:13] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 85.09 ms [11:06:33] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 86.74 ms [11:09:48] mutante: have a sec? [11:31:53] RECOVERY - Puppet freshness on wtp1019 is OK: puppet ran at Thu Jan 9 11:31:49 UTC 2014 [11:38:11] http://reportcard.wikimedia.org/ appears to be down [11:38:36] are we only using http://reportcard.wmflabs.org/ now? 
[11:44:16] I think so [11:44:49] cert error The certificate is only valid for metrics.wikimedia.org [11:45:25] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4020 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:28] ori: what's behind the login on https://reportcard [11:45:40] The site says: "WMF E3 Metrics API" [11:45:43] PROBLEM - Host bits-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:a [11:45:46] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:c [11:45:49] PROBLEM - Host upload-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::2:b [11:45:51] PROBLEM - Host text-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1 [11:45:54] PROBLEM - Host cp4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4017 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4010 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4006 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4005 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4015 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:56] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:56] PROBLEM - Host cp4008 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:57] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:57] PROBLEM - Host cp4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host lvs4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host cp4013 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host cp4009 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:59] 
PROBLEM - Host cp4007 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:59] PROBLEM - Host lvs4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:00] PROBLEM - Host upload-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:01] PROBLEM - Host cp4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:01] PROBLEM - Host cp4014 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:02] PROBLEM - Host lvs4002 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:02] PROBLEM - Host cp4002 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:03] PROBLEM - Host bast4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:03] PROBLEM - Host bits-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:04] PROBLEM - Host lvs4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:15] * hashar waves at ulsfo [11:46:15] dear vendor who I won't name: fuck you [11:46:24] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:34] PROBLEM - Host text-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:43] mutante: dunno [11:47:06] who can create a git repo for me? [11:47:10] we have no traffic going to ulsfo right now so... [11:47:31] I would like to push gangilos to gerrit [11:47:34] PROBLEM - Host backup4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:47:39] operations/software/ganglios would be the name [11:48:29] matanya: i can do [11:48:36] what is ganglios about ? [11:48:39] thanks hash [11:48:52] ganglios is a collection of tools that allow nagios to trigger alerts based on data it pulls from ganglia. 
[11:49:05] RT ticket 6602 [11:49:22] ahh [11:49:24] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 80.58 ms [11:49:24] RECOVERY - Host cp4020 is UP: PING OK - Packet loss = 0%, RTA = 87.83 ms [11:49:24] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 80.01 ms [11:49:24] RECOVERY - Host upload-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.00 ms [11:49:27] RECOVERY - Host text-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.04 ms [11:49:31] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 80.03 ms [11:49:31] RECOVERY - Host bits-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 87.72 ms [11:49:34] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.07 ms [11:49:36] RECOVERY - Host lvs4003 is UP: PING OK - Packet loss = 0%, RTA = 80.85 ms [11:49:36] RECOVERY - Host lvs4002 is UP: PING OK - Packet loss = 0%, RTA = 80.17 ms [11:49:36] RECOVERY - Host upload-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.02 ms [11:49:39] RECOVERY - Host cp4006 is UP: PING OK - Packet loss = 0%, RTA = 87.61 ms [11:49:39] RECOVERY - Host cp4002 is UP: PING OK - Packet loss = 0%, RTA = 80.05 ms [11:49:39] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 80.83 ms [11:49:39] RECOVERY - Host cp4003 is UP: PING OK - Packet loss = 0%, RTA = 87.95 ms [11:49:39] RECOVERY - Host cp4015 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:39] RECOVERY - Host cp4009 is UP: PING OK - Packet loss = 0%, RTA = 88.20 ms [11:49:39] RECOVERY - Host cp4014 is UP: PING OK - Packet loss = 0%, RTA = 88.01 ms [11:49:40] RECOVERY - Host cp4010 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:41] RECOVERY - Host cp4001 is UP: PING OK - Packet loss = 0%, RTA = 80.19 ms [11:49:41] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 80.76 ms [11:49:42] RECOVERY - Host cp4017 is UP: PING OK - Packet 
loss = 0%, RTA = 80.78 ms [11:49:42] RECOVERY - Host cp4013 is UP: PING OK - Packet loss = 0%, RTA = 87.97 ms [11:49:42] RECOVERY - Host cp4008 is UP: PING OK - Packet loss = 0%, RTA = 88.10 ms [11:49:43] RECOVERY - Host cp4004 is UP: PING OK - Packet loss = 0%, RTA = 88.28 ms [11:49:43] RECOVERY - Host cp4005 is UP: PING OK - Packet loss = 0%, RTA = 80.18 ms [11:49:44] RECOVERY - Host cp4007 is UP: PING OK - Packet loss = 0%, RTA = 80.75 ms [11:49:45] RECOVERY - Host bits-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 87.86 ms [11:49:45] RECOVERY - Host bast4001 is UP: PING OK - Packet loss = 0%, RTA = 88.03 ms [11:49:46] RECOVERY - Host lvs4001 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:46] RECOVERY - Host lvs4004 is UP: PING OK - Packet loss = 0%, RTA = 80.59 ms [11:49:47] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.00 ms [11:49:47] matanya: and I thought we already had that hehe [11:50:00] not in git [11:50:01] RECOVERY - Host text-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.58 ms [11:50:11] subversion? [11:50:21] RECOVERY - Host backup4001 is UP: PING OK - Packet loss = 0%, RTA = 80.64 ms [11:50:28] * twkozlowski has spoken the cursed word [11:50:45] /ban twkozlowski No more subversion, we use hyperversion now. [11:52:58] matanya: created it : git clone ssh://gerrit.wikimedia.org:29418/operations/software/ganglios [11:53:05] matanya: rights inherits from operations/software [11:53:12] good, thanks! [11:53:23] i'll let you know if i have issues [11:53:40] hashar: warning: remote HEAD refers to nonexistent ref, unable to checkout. [11:53:53] matanya: Andrew Otto can follow up I guess [11:53:56] yeah the repo is empty [11:54:07] oh, ok. 
[11:54:25] I guess whoever import the code will force push [11:54:38] i have it, and can push it [11:54:53] unless you do it differntly [11:54:58] I have no clue [11:55:11] seems the code is in a mercurial repository at ttps://bitbucket.org/maplebed/ganglios [11:55:18] yes [11:55:21] that's normal error when it's empty [11:55:40] oh mercurial.. import from git would have been easier i suppose [11:55:49] ah, maplebed [11:56:06] git hg works for us :) [11:56:14] cool [11:56:28] you want to add the .gitreview file first [11:56:33] to the empty repo [11:56:34] i'll do it. unless there is some rule against it [11:56:39] yes, right [11:58:00] just import the hg > git repository [11:58:09] then we can send a commit that adds .gitreview [11:58:14] you will need push rights though [12:01:11] matanya: wanna push now ? [12:01:35] hashar: i'm converting atm [12:01:42] give me 5 mins [12:02:01] hmm, how do I publish something to noc.wikimedia.org/~username/ these days? [12:02:13] mkdir ~/public_html on fenari [12:02:58] just kidding, you use heroku to upload a docker container with jekyll [12:04:23] matanya: you should have force push rights on operations/software/ganglios [12:04:35] thanks a lot hashar [12:04:49] ori, do you ever sleep? [12:05:00] he sleeps at the office [12:05:10] makes sense:P [12:05:12] you should relocate there ori [12:05:46] i'm in my kitchen [12:06:28] I am laying on my couch still in pyjamas (at least I am wearing a pant) [12:07:58] hmm, does that mean that I should allow reading my ~? sounds scary:P [12:09:03] depends on what you keep in there [12:09:45] nothing valuable, actually [12:09:54] but still... [12:09:55] meh [12:10:47] hashar: it is git push /somethinghere/ origin/master [12:11:15] git push ssh://gerrit.wikimedia.org:29418/operations/software/ganglios master:master [12:11:17] I guess [12:11:59] didn't work hashar : ! 
[remote rejected] master -> master (prohibited by Gerrit) [12:12:00] error: failed to push some refs to 'ssh://gerrit.wikimedia.org:29418/operations/software/ganglios' [12:12:07] ahah [12:12:17] try with -f ? [12:12:20] git push -f ... [12:12:33] same [12:12:50] ahh [12:12:54] I created the master branch [12:12:58] hey akosiaris, I've just run diff tests on test wiki using the new package, looks good: http://noc.wikimedia.org/~maxsem/test.html [12:12:59] can you git push -f ? [12:13:23] yes, thanks! [12:13:32] MaxSem: nice :-) [12:14:18] I'll upgrade the wikidiff2 package today then [12:14:29] awesome, thanks:) [12:14:33] matanya: don't you want to push the mercurial converted repo ? [12:14:52] i did, i started with .gitreview before [12:15:16] now i'll verify the hg stuff, and if it is all sane, i'll push that too [12:17:41] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host cp4020 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host text-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:17:51] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [12:18:11] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 80.94 ms [12:18:12] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 80.93 ms [12:18:12] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 81.44 ms [12:18:16] (PS1) Matanya: imported Mercurial ganglios from https://bitbucket.org/maplebed/ganglios/overview [operations/software/ganglios] - https://gerrit.wikimedia.org/r/106505 [12:18:21] RECOVERY - Host cp4020 is UP: PING OK - Packet loss = 0%, RTA = 80.94 ms [12:19:01] RECOVERY - Host text-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 75.04 ms [12:19:10] ok, this looks ok. I'll let otto review this [12:19:16] what do we need ganglios for?
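The initial import discussed in this stretch of the log (a freshly created, empty Gerrit repository followed by a forced push to master) can be sketched as a shell function. This is only a sketch under assumptions: the function name `push_initial_master` is hypothetical, the remote URL is whatever the caller supplies, and the forced push presumes the pusher has force-push rights on a repository with no history worth keeping.

```shell
# Hypothetical helper (name is an assumption, not from the log):
# publish the current commit as the master branch of a new remote.
push_initial_master() {
    remote_url="$1"
    # -f overwrites whatever the remote master currently points at, so
    # this is only appropriate for a freshly created repository.
    git push -q -f "$remote_url" HEAD:refs/heads/master
}
```

With the URL hashar gives in the chat, the call would be `push_initial_master ssh://gerrit.wikimedia.org:29418/operations/software/ganglios`. As the log shows, Gerrit can still reject the push if the account lacks the needed rights.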
[12:20:12] paravoid: https://rt.wikimedia.org/Ticket/Display.html?id=6602 [12:30:32] oh, mutante https://gerrit.wikimedia.org/r/#/c/102629/ holidays are over [12:57:11] Could someone review? https://gerrit.wikimedia.org/r/#/c/101820/ [13:14:03] (PS1) Alexandros Kosiaris: Removed the unneeded know wikidiff2.ini override [operations/puppet] - https://gerrit.wikimedia.org/r/106510 [13:15:25] akosiaris: s/know/now/ [13:15:47] duh... thanks [13:17:22] (PS2) Alexandros Kosiaris: Removed the unneeded now wikidiff2.ini override [operations/puppet] - https://gerrit.wikimedia.org/r/106510 [13:28:43] paravoid: can you please explain the status of dns within puppet? there is the authdns module and a separate manifest named dns.pp [13:29:21] matanya: dns.pp is the old one, it doesn't have authoritative dns classes for production anymore [13:29:39] matanya: it has the recursors, which is an entirely different piece of infrastructure and that I need to fix [13:29:50] matanya: and it also has the auth server for labs, based on powerdns & ldap [13:30:19] matanya: but from what I heard, this will change very soon with the introduction of https://wiki.openstack.org/wiki/Designate [13:30:27] very soon = with the transition to eqiad [13:30:54] which all need to be stripped from that pp and spread into the dns module/other modules? [13:32:10] no [13:32:38] dns::auth-server::ldap will go soon [13:32:48] dns::recursor needs overhauling [13:34:39] thanks paravoid. aside from this, any help needed in the debian DSA team?
:) [13:34:55] haha [13:35:22] this sounds like a yes to me [13:35:49] you can't be DSA if you're not a member of the project [13:36:01] and we have nothing like labs to experiment with [13:36:19] and we have no code review system and no code reviews for the most part [13:36:44] so suffice to say, it's a little difficult [13:36:47] but thanks for the offer :) [13:37:03] i see a lot of space for helping :P [13:37:58] deploy gerrit/code review, create labs etc [14:52:07] (CR) Milimetric: [C: 2 V: 2] Restructuring code from bin/logster into logster.logster module [operations/debs/logster] - https://gerrit.wikimedia.org/r/101021 (owner: Ottomata) [15:25:47] (PS1) Jgreen: remove star cert from aluminium [operations/puppet] - https://gerrit.wikimedia.org/r/106519 [15:30:37] (CR) Jgreen: [C: 2 V: 1] remove star cert from aluminium [operations/puppet] - https://gerrit.wikimedia.org/r/106519 (owner: Jgreen) [16:17:21] PROBLEM - RAID on db1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [16:31:15] pdf1 is down (http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=PDF+servers+pmtpa&h=pdf1.wikimedia.org&tab=m&vn=&hide-hf=false&metric_group=). can someone reboot it please? [16:33:05] * Jeff_Green looking [16:33:25] schmir: I will as soon as I remember how to reboot with DRAC5. :-) [16:33:45] Coren: racadm serveraction powercycle ? [16:33:46] Coren: thanks! [16:33:58] reboot? reuse? recycle? [16:34:36] Jeff_Green: hardreset. [16:34:45] wow never seen that one [16:35:34] schmir: Should be rebooting now. [16:53:38] Coren: thanks, but it doesn't seem to come up again. it's been 17 minutes and the machine has rather small disk (<100G). [16:54:09] schmir: Indeed not. Lemme see if there is a hardware fail.
[16:58:25] (PS1) Aude: add wikisource site link group for wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106531 [16:58:26] (PS1) Aude: cleanup old Wikibase settings which are same as default now [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106532 [17:02:29] database locked again? [17:02:54] Which what where? [17:03:11] "The database has been automatically locked while the slave database servers catch up to the master" [17:03:27] Which wiki? [17:03:33] enwp [17:03:44] 1-2 minutes lagged [17:04:10] huh [17:05:01] schmir: I have bad news; it looks like the hardware is quite deat. [17:05:03] dead* [17:05:05] Looks like it's increasing [17:05:11] it just unlocked for me [17:05:19] en.wiki is read-only for me. :-( [17:05:27] mmm [17:05:33] schmir: Even a power cycle doesn't wake it up. [17:05:46] db63 looks strange [17:06:06] schmir: Need to open a ticket to have someone go and take a look physically. [17:06:08] Coren: what system died? [17:06:15] RobH: pdf1 [17:06:21] depending on location there are some tricks we may be able to try, lemme see [17:06:45] RobH: In pmtpa [17:07:00] Reedy: Seems better now. Edits are still coming through. [17:07:04] yea, its not in one of the managed power port racks though [17:07:05] =[ [17:07:06] Guess it was temporary. [17:07:10] lemme take a poke at it [17:07:22] but i imagine if you didnt get it to work, i prolly wont be able to either [17:07:23] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&m=&tab=ch&vn=&hreg%5B%5D=db63 [17:07:26] "No matching metrics" [17:07:51] All lag just disappeared... [17:07:57] Coren: are you on its com port? [17:08:08] or is it just timing out from old connection like drac5 likes to do? [17:08:10] RobH: I was up to ~3 min ago. [17:08:16] but no longer? [17:08:31] then its drac5 bein shitty and i have to restart it, so wanted to make sure you werent doin somethin [17:08:31] RobH: Not anymore no.
[17:08:33] cool [17:08:54] resetting its drac [17:08:57] RobH: But that's not going to help you very much; I was on the console indeed to notice that the box doesn't even emerge from POST. [17:09:05] does it get to post at all? [17:09:10] Nope. [17:09:35] ok, gonna check the drac log and see if it detects any hw faults [17:09:51] since this is in tampa, we need to do all the major troubleshooting [17:09:53] Coren: please do, thanks! the machines are going to be put out of service soon [17:10:17] Coren: Description: PCI Parity Err: Critical Event sensor, PCI PERR (Bus 3 Device 0 Function 0) was asserted [17:10:25] sweet... sounds like mainboard. [17:10:33] or memory [17:10:42] but prolly board [17:10:48] timestamp [17:10:48] Date/Time: 01/10/2014 08:37:58 [17:10:55] so looks like our fail event [17:11:23] and wait for it.... the system was purchased on 2009-02-06 [17:11:28] so there won't be any warranty support [17:11:35] its olllllld [17:12:03] if it is memory, then we have on site spare to swap and resurrect, but i bet its the mainboard [17:12:23] if so the system is just dead, we'd harvest off the good parts for spare and toss the rest [17:12:33] well, toss, recycle, whatevs [17:13:18] so it looks like the old pdf rendering cluster is simply going to be a system down. Those are the unpuppetized pdf generation stuff that no one really can easily spin up replacement (hence the sprints underway to correct that issue) [17:13:21] =[ [17:13:42] cuz every pdf system in that cluster is slightly different! (or it used to be) [17:14:00] RobH: the latter [17:14:23] can we switch to rendering on pdf2 and pdf3 only? [17:14:30] !log reedy synchronized php-1.23wmf10 [17:14:36] we should be able to depool it in whatever configs use it [17:14:37] Logged the message, Master [17:16:16] though it was the only listing in commonsettings [17:16:26] !log reedy updated /a/common to {{Gerrit|I5f2816468}}: Segregate watchlist and recentshangeslinked queries on all shards. 
[17:16:29] i wonder if we can push pdf2 there or if pdf1 was doing something unique that pdf2 doesnt do [17:16:31] (03PS1) 10Reedy: Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106536 [17:16:32] Logged the message, Master [17:16:58] (03CR) 10Reedy: [C: 032] Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106536 (owner: 10Reedy) [17:17:06] Reedy: hey, what's your travel schedule Arch Summit? [17:17:07] (03Merged) 10jenkins-bot: Add/update symlinks [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106536 (owner: 10Reedy) [17:17:45] greg-g: I arrive on teh 20th [17:17:57] !log reedy synchronized docroot and w [17:18:04] Logged the message, Master [17:18:40] schmir: you happen to know the pdf system roles? i can see where we reference it in our config [17:18:48] but we only point things on collections to pdf1 [17:18:51] PROBLEM - MySQL Replication Heartbeat on db69 is CRITICAL: CRIT replication delay 303 seconds [17:18:54] RobH: I would need to start some additional services. would someone be able to change $wgCollectionMWServeURL to pint to pdf2 [17:19:01] i have no idea if pdf1 did something uniwue that 2 doesnt [17:19:07] i can change it right now [17:19:15] (03PS1) 10Manybubbles: Turn off automatically searching commons [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106537 [17:19:25] just $wgCollectionMWServeURL = "http://pdf2.wikimedia.org:8080/mw-serve/"; right? [17:19:39] Reedy: cool [17:19:44] its been years since i touched pdf stuff =P [17:19:51] RECOVERY - MySQL Replication Heartbeat on db69 is OK: OK replication delay -1 seconds [17:20:54] (03PS1) 10RobH: pdf1 died, replace with pdf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106538 [17:22:09] RobH: yes [17:22:13] RobH, what's wrong with pdf1? [17:22:26] OuKB: It's dead, Jim. 
[17:22:34] lolol [17:22:49] hw failure [17:22:56] it couldn't wait for 2 more weeks while we're coding a replacement [17:22:59] seems like memory (my wishful thinking) or mainboard (prolly that) [17:23:00] nope [17:23:20] :) [17:23:36] okay, if it's broken let's try fining the crap out of it [17:23:41] *fixing [17:24:05] (03CR) 10RobH: [C: 032] pdf1 died, replace with pdf2 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106538 (owner: 10RobH) [17:26:04] RobH: thanks. [17:27:30] RobH, are you deploying that change or should I? [17:27:36] i am now [17:27:45] i had to refresh how to do mediawiki changes, its been a very long time. [17:27:47] =P [17:31:18] !log robh synchronized wmf-config/ [17:31:25] Logged the message, Master [17:32:14] ok, so that should be live now [17:33:04] and he left irc =P [17:33:55] !log reedy updated /a/common to {{Gerrit|Iaada76f43}}: pdf1 died, replace with pdf2 [17:34:00] (03PS1) 10Reedy: Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106540 [17:34:01] Logged the message, Master [17:34:18] (03CR) 10Reedy: [C: 032] Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106540 (owner: 10Reedy) [17:34:26] (03Merged) 10jenkins-bot: Update php symlink [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106540 (owner: 10Reedy) [17:35:13] schmir: hey, that change should be live now [17:35:22] so mediawikiconfig points to pdf2 now [17:35:32] RobH: yes, thanks. pdf generation is already working again. [17:35:41] awesome! [17:35:49] thanks for the help [17:36:05] RobH: thanks for the switch [17:36:19] * RobH had not done a mediawiki config change in a very long time, now that all the devs do that stuff themselves [17:36:46] mwalker: while I'm here, where does the current code for the replacement live (git repo)? I'd like to take a look at it... 
[17:40:10] schmir, in http://git.wikimedia.org/?wicket:interface=:0:1::: everything that starts with mediawiki/extensions/Collection/OfflineContentGenerator [17:41:02] MaxSem: "Our servers are currently experiencing a technical problem."...hope they didn't die too :) [17:41:17] thanks will take a look [17:42:26] need to leave now. bye! [17:45:08] ACKNOWLEDGEMENT - Host elastic1007 is DOWN: PING CRITICAL - Packet loss = 100% Chris Johnson RT 6517 [17:47:14] MaxSem: is docroot/bits/WikipediaMobileFirefoxOS missing an upstream or something? [17:47:29] !log reedy synchronized php-1.23wmf10 [17:47:34] Logged the message, Master [17:47:35] no idea, ask dr0ptpakt [17:50:37] ^d: how do you get an older patch version in git-review? [17:51:01] <^d> Dunno [17:51:12] use git checkout or similar [17:51:49] git fetch https://reedy@gerrit.wikimedia.org/r/mediawiki/core refs/changes/41/106541/1 && git checkout FETCH_HEAD [17:53:12] * Aaron|home was trying to avoid going into a detached head, though I guess checkout -b can handle that [17:53:32] I could have sworn there was git-review command for that, but I don't see it, meh [17:53:33] !log reedy started scap: testwiki to 1.23wmf10 build l10n cache [17:53:39] Logged the message, Master [17:53:44] git review -d #number,#patchnumber ? [17:54:58] 'change can be changeNumber as obtained using ---list option, or it can be changeNumber,patchsetNumber ' [17:55:08] ah, so that works [17:55:15] gah, wmf10 [17:55:33] * aude wasting time updating wmf9 [17:56:22] heh [17:56:32] Still need 9 presumably? [17:56:35] 8 is dying in an hour [17:56:36] no [17:56:45] we have something for test wikidata [17:57:08] ah [17:57:16] happy new year present? [17:57:26] yes! 
[17:57:29] wikisource [17:57:45] scary [17:57:45] ^d: if you do git review -d 106465,1 you can see a submodule change...I'm curious how one undoes that (other than use some tgit magic with happened in ps2) [17:57:47] it's taking over [17:57:50] :) [17:58:06] ^d: the normal commands to reset or checkout the submodule to the old version fail since git can find it [17:58:15] maybe there is some remote that I don't have that others do [17:59:03] <^d> Huh? [17:59:09] <^d> How is 106465 a submodule change? [18:04:28] (03PS3) 10BryanDavis: [WIP] Add logstash config for udp2log [operations/puppet] - 10https://gerrit.wikimedia.org/r/106154 [18:05:39] Reedy: so we need https://gerrit.wikimedia.org/r/#/c/106545/ ideally before you build localisation cache [18:05:52] then https://gerrit.wikimedia.org/r/#/c/106531/ [18:05:58] ^d: this is mediawiki-config [18:06:06] nice to have https://gerrit.wikimedia.org/r/#/c/106532/ [18:06:25] aude: too late :P [18:06:30] awww :( [18:06:41] Started 13 minutes or so ago [18:06:55] Updating should in theory be quicker [18:06:58] we can find out :) [18:07:05] it's just one message [18:07:16] no big deal if it's broken for a while [18:07:27] The changes should mean that's pretty quick [18:07:32] diff of json file and push [18:07:33] ok, hope so [18:07:39] and rebuild [18:08:39] <^d> Aaron|home: Oh, PS1 on it, I see it now. What are you trying to do? [18:09:01] ^d: figure out how to fix that myself [18:09:25] I don't want to have to ask some mobile person or redo it the hard way each time [18:09:41] Roan stared at it for long time and figure something out too [18:09:53] <^d> Removing submodules is a pain. You have to remove them from .gitmodules, .git/modules/ and .git/config [18:10:11] <^d> If it's gone in the gerrit version, it's technically gone, you just have to do local cleanup. [18:10:19] <^d> $hateSubmodules++ [18:10:21] what is being removed? 
[18:10:24] it had to do with the repo being in github and then moved to gerrit, with some commits getting merged in the past [18:11:10] removing submodules normally is possible in some newer git version [18:11:10] <^d> Yeah, because we were deploying straight from github which was evil++ [18:11:23] <^d> The submodule's already been updated. [18:11:36] <^d> Aaron|home's problem is that local clones still point to the old repo and barf. [18:12:14] * aude rage [18:12:49] <^d> Delete local repo, re-clone from gerrit :P [18:12:57] <^d> It's just mediawiki-config, not like core or something [18:13:51] yeah, already doing that [18:13:55] that was getting super annoying [18:16:42] ^d: btw, that should be ++$hateSubmodules, it's faster [18:16:59] <^d> meh [18:17:00] * Aaron|home is safe from nerf darts at home [18:17:17] <^d> You're lucky, I'm in a darting mood today :p [18:18:18] (03CR) 10Chad: [C: 031] "lgtm, will merge during LD." [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106537 (owner: 10Manybubbles) [18:19:28] (03PS2) 10Reedy: add wikisource site link group for wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106531 (owner: 10Aude) [18:19:33] (03CR) 10Reedy: [C: 032] add wikisource site link group for wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106531 (owner: 10Aude) [18:19:42] (03Merged) 10jenkins-bot: add wikisource site link group for wikidata [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106531 (owner: 10Aude) [18:20:04] scap is nearly done [18:20:19] !log reedy finished scap: testwiki to 1.23wmf10 build l10n cache (duration: 31m 21s) [18:20:25] Logged the message, Master [18:20:43] (03PS2) 10Reedy: cleanup old Wikibase settings which are same as default now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106532 (owner: 10Aude) [18:20:48] (03CR) 10Reedy: [C: 032] cleanup old Wikibase settings which are same as default now [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/106532 (owner: 10Aude) [18:20:57] (03Merged) 10jenkins-bot: cleanup old Wikibase settings which are same as default now [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106532 (owner: 10Aude) [18:22:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: testwiki back to 1.23wmf9 till window [18:22:10] Logged the message, Master [18:22:38] !log reedy updated /a/common to {{Gerrit|Ib6f2c4d96}}: cleanup old Wikibase settings which are same as default now [18:22:43] Logged the message, Master [18:23:11] (03PS2) 10Reedy: Enable AssertEdit extension only on 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105729 [18:24:36] (03PS3) 10Reedy: Enable AssertEdit extension only on 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105729 [18:24:37] (03PS1) 10Reedy: Wikipedias to 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106549 [18:24:38] (03PS1) 10Reedy: phase1 wikis to 1.23wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106550 [18:31:18] (03PS1) 10Aude: Fix $wgExtensionEntryPointListFiles, getRealmSpecificFilename does not work for this [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106551 [18:31:30] :( [18:31:32] * aude not touching MWRealm.php [18:31:41] not to try and fix that [18:32:01] or we can rename the files with a . 
[18:32:01] Probably a good idea [18:32:43] (03CR) 10Reedy: [C: 032] Fix $wgExtensionEntryPointListFiles, getRealmSpecificFilename does not work for this [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106551 (owner: 10Aude) [18:33:14] (03Merged) 10jenkins-bot: Fix $wgExtensionEntryPointListFiles, getRealmSpecificFilename does not work for this [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106551 (owner: 10Aude) [18:55:44] (03PS2) 10Reedy: Wikipedias to 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106549 [18:55:49] (03CR) 10Reedy: [C: 032] Wikipedias to 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106549 (owner: 10Reedy) [18:56:01] (03Merged) 10jenkins-bot: Wikipedias to 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106549 (owner: 10Reedy) [18:56:27] (03PS4) 10Reedy: Enable AssertEdit extension only on 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105729 [18:56:39] (03CR) 10Reedy: [C: 032] Enable AssertEdit extension only on 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105729 (owner: 10Reedy) [18:56:56] (03Merged) 10jenkins-bot: Enable AssertEdit extension only on 1.23wmf9 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/105729 (owner: 10Reedy) [19:04:31] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: Wikipedias to 1.23wmf9 [19:04:38] Logged the message, Master [19:07:04] !log reedy synchronized wmf-config/ 'AssertEdit only on 1.23wmf9' [19:07:10] Logged the message, Master [19:07:33] (03PS2) 10Reedy: phase1 wikis to 1.23wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106550 [19:07:56] (03CR) 10Reedy: [C: 032] phase1 wikis to 1.23wmf10 [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106550 (owner: 10Reedy) [19:08:04] (03Merged) 10jenkins-bot: phase1 wikis to 1.23wmf10 [operations/mediawiki-config] - 
10https://gerrit.wikimedia.org/r/106550 (owner: 10Reedy) [19:08:22] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:09:39] Reedy, bits look hosed on enwiki [19:10:10] test wikidata also (not that it's important) [19:10:27] also some 503 from bits for dewiki (e.g. for watchlist) [19:10:31] PROBLEM - Apache HTTP on mw1150 is CRITICAL: Connection timed out [19:11:06] app servers have a process hike [19:11:07] no stylesheet from Italy [19:11:13] caches have network spike [19:11:31] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:12:31] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 1.682 second response time [19:13:13] ^ relevant server is relevant [19:13:22] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 8.441 second response time [19:13:31] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 7.363 second response time [19:13:44] gah [19:14:12] lol, width issue on enwiki from betafeatures [19:14:27] Most things are recovering [19:14:30] :/ [19:15:07] Width issue? 
[19:15:36] Oh, the layout change [19:15:41] yeaaah [19:15:59] * marktraceur looks around for jdlrobson [19:16:23] Gloria: It's not a bug, it's a feature [19:16:31] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:18:31] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 6.981 second response time [19:20:31] PROBLEM - Apache HTTP on mw1150 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:21:31] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 3.490 second response time [19:21:31] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:22:15] 1151 is still apparently unhappy [19:22:22] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 808 bytes in 0.055 second response time [19:23:13] bits cache traffic seems to have settled at about up a third [19:24:14] do we have a RL fragmentation problem? [19:29:32] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Bits%2520caches%2520eqiad&tab=m&vn=&hide-hf=false [19:29:41] https://ganglia.wikimedia.org/latest/?r=hour&cs=&ce=&s=by+name&c=Bits%2520application%2520servers%2520eqiad&tab=m&vn=&hide-hf=false [19:30:20] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1wikis to 1.23wmf10 [19:30:26] Logged the message, Master [19:31:31] Lol fatals [19:31:36] stripped symbols :-/ [19:31:43] how do people run websites with stripped symbols? 
[19:31:47] I know, blindly [19:32:17] 50 PHP Fatal error: Call to a member function matchStartAndRemove() on a non-object in /usr/local/apache/common-local/php-1.23wmf10/includes/parser/Parser.php on [19:32:17] line 3262 [19:32:18] Ug [19:32:25] Reedy: #0 /usr/local/apache/common-local/php-1.23wmf10/includes/MagicWord.php(240): MagicWord->load('cascadingsource...') [19:32:25] wth [19:32:32] ah, you are seeing it [19:33:02] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: 1.23wmf10 looks brokened [19:33:10] Logged the message, Master [19:33:36] ^d: Fatals on mediawiki.org [19:33:46] marktraceur: See above? [19:33:50] Oh, nvm. [19:33:57] Ignore me! [19:34:06] Seemingly just mw.org [19:34:17] <^d> marktraceur: Ignoring you, as requested! [19:34:21] But not test? Weird. [19:34:40] !log reedy started scap: Update 1.23wmf10 l10n cache [19:34:47] Logged the message, Master [19:34:58] Most of them are load.php requests [19:36:51] aude: Re-running scap there is no message changes... [19:36:57] (03PS1) 10Cmjohnson: Adding new ip's for barium [operations/dns] - 10https://gerrit.wikimedia.org/r/106562 [19:36:59] huh [19:37:10] did you merge my patch [19:37:19] I merged 2.. [19:37:19] https://gerrit.wikimedia.org/r/#/c/106545/ [19:37:27] Missed that one [19:38:13] !log reedy started scap: Update 1.23wmf10 l10n cache [19:38:19] Logged the message, Master [19:38:32] aude: Still no changes [19:38:38] really? [19:38:47] wait for jenkins? [19:38:53] Nothing to wait for [19:39:02] Updating ExtensionMessages-1.23wmf10.php... [19:39:02] k [19:39:02] done [19:39:02] Copying to local copy... [19:39:02] done [19:39:03] Updating LocalisationCache for 1.23wmf10... done [19:39:11] pull latest git [19:39:21] ? 
[19:39:22] Merge "Add wikisource sitelink section message for Wikidata" [19:39:32] It's already at the top of the WM git logs [19:39:39] * aude looks [19:40:36] ok, i see it in WikimediaMessages [19:40:51] (03CR) 10Jgreen: [C: 031 V: 031] Adding new ip's for barium [operations/dns] - 10https://gerrit.wikimedia.org/r/106562 (owner: 10Cmjohnson) [19:41:04] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: phase1 wikis minux mediawikiwiki to 1.23wmf10 [19:41:11] Logged the message, Master [19:41:25] (03CR) 10Cmjohnson: [C: 032] Adding new ip's for barium [operations/dns] - 10https://gerrit.wikimedia.org/r/106562 (owner: 10Cmjohnson) [19:41:25] [c44af903] /wiki/Special:Version Exception from line 317 of /usr/local/apache/common-local/php-1.23wmf10/includes/MagicWord.php: Error: invalid magic word 'cascadingsources' [19:41:32] Who owns cascadingsources? [19:41:35] Wikidata? [19:41:41] never heard of it! [19:42:27] It's in core [19:42:36] ugh [19:42:44] Reedy: jackmcbarn [19:42:45] https://gerrit.wikimedia.org/r/104999 [19:42:52] * jackmcbarn looks [19:43:09] oh [19:43:12] was that a one-time error? [19:43:23] probably because one of the files got pushed a split-second before the other [19:43:27] No [19:43:33] https://test2.wikipedia.org/wiki/Special:Version [19:43:38] Mash F5 to your hearts content [19:43:44] hmm [19:43:51] The message isn't in the localisation cache is the problem [19:44:18] <^d> That fatal is so lame. It should fall back to the i18n file if the cache lacks an entry. 
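^d's suggestion above (fall back to the i18n file when the localisation cache lacks an entry) could be sketched roughly as follows. This is an illustration in Python, not MediaWiki's actual MagicWord code: `lookup_magic_word`, `l10n_cache` and `i18n_defaults` are hypothetical names, and plain dicts stand in for the cache and the i18n file.

```python
def lookup_magic_word(word_id, l10n_cache, i18n_defaults):
    """Return the synonyms for a magic word, preferring the cache.

    Both arguments are plain dicts here (an assumption for illustration):
    l10n_cache stands in for the built localisation cache, i18n_defaults
    for the definitions shipped in the source i18n file.
    """
    synonyms = l10n_cache.get(word_id)
    if synonyms is None:
        # Cache is stale or incomplete: fall back to the source definitions
        # instead of failing immediately.
        synonyms = i18n_defaults.get(word_id)
    if synonyms is None:
        # Unknown in both places: only now is it a genuine error.
        raise KeyError(f"invalid magic word {word_id!r}")
    return synonyms


# Hypothetical data: a cache built before 'cascadingsources' was added.
stale_cache = {"pagename": ["PAGENAME"]}
i18n_file = {
    "pagename": ["PAGENAME"],
    "cascadingsources": ["CASCADINGSOURCES"],
}

print(lookup_magic_word("cascadingsources", stale_cache, i18n_file))
```

With a lookup shaped like this, a cache rebuilt before the new message existed would cost an extra read rather than a fatal; whether that trade-off suits the real code path is a separate question.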
[19:44:23] PHP fatal error in /usr/local/apache/common-local/php-1.23wmf10/includes/parser/Parser.php line 3262: [19:44:26] Call to a member function matchStartAndRemove() on a non-object [19:44:27] on test wikidata [19:44:43] I've logged a bug for that one [19:44:45] works on beta cluster so i assume it's not a code issue [19:45:02] lol [19:45:30] I can't even remember what they do for message caches [19:45:46] seems related to magic word [19:45:56] parser error [19:46:11] new MagicWordArray( $substIDs ); is null or something [19:46:20] https://bugzilla.wikimedia.org/show_bug.cgi?id=59875 [19:48:47] Updating LocalisationCache for 1.23wmf10... is taking a while [19:49:20] Which suggests "something" has changed [19:49:24] * Reedy waits [19:50:03] !log reedy started scap [19:50:09] Logged the message, Master [19:50:09] Updating LocalisationCache for 1.23wmf10... Updated 366 JSON file(s) in '/a/common/php-1.23wmf10/cache/l10n'. [19:50:30] * aude can't see test wikidata so no idea [19:51:16] gotta wait for it to propogate [19:52:01] if i purge, then i get an exception [19:52:42] Yeah [19:52:44] wiki/Q22?action=purge Exception from line 317 of /usr/local/apache/common-local/php-1.23wmf10/includes/MagicWord.php: Error: invalid magic word 'cascadingsources' [19:52:53] Need to wait for scap to finish [19:52:54] fix one and i think we fix both [19:54:48] how is mediawiki.org up? isn't it supposed to be the same as test and test2? [19:55:17] We can change it on a per wiki basis [19:55:28] I purposely excluded it when I moved the test wikis back to 1.23wmf10 [19:56:07] PHP Warning: array_map() [function.array-map]: An error occurred while invoking the map callback in /usr/local/apache/common-local/php-1.23wmf10/includes/Exception.php on line 397 [19:58:39] Reedy: so, status? need help? [19:58:42] Waiting [19:58:56] There was some change in the localisation cache [19:59:12] Took a while to rebuild.. 
So need to wait for this to push then see where we are [19:59:21] k [20:03:48] <^d> Reedy: That array_map() warning happens when an exception is thrown inside an array_map() callback. [20:04:06] Yeahh [20:04:08] awesomes [20:04:20] Rebuilding CDB files from /upstream... [20:04:22] "halfway" [20:05:30] https://test2.wikipedia.org/wiki/Main_Page [20:05:31] Yup [20:05:37] Hit an updated apache and all looks fine [20:06:36] trololololol [20:07:10] !log reedy finished scap (duration: 19m 42s) [20:07:14] PHP Warning: http_build_query() expects at most 3 parameters, 4 given in /usr/local/apache/common-local/php-1.23wmf10/includes/libs/MultiHttpClient.php on line 182 [20:07:16] Logged the message, Master [20:07:26] PHP Warning: htmlspecialchars() expects parameter 1 to be string, array given in /usr/local/apache/common-local/php-1.23wmf9/includes/parser/CoreParserFunctions.php on line 212 [20:07:54] looks like it falls apart as we speak [20:08:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki to 1.23wmf10 aswell [20:08:16] test wikidata looks good [20:08:17] Logged the message, Master [20:09:25] is mutante around? [20:09:47] He's /away [20:09:52] EU evening [20:09:56] probably not [20:10:14] aude: Should be all good [20:10:14] thanks Reedy [20:10:19] yep [20:11:14] Reedy: are we like all good all good? [20:11:39] greg-g: I think so. Bit of noise in the apache logs but nothing serious [20:11:45] awesome-sauce [20:11:57] any follow up needed? [20:12:00] bug reports whatever? [20:12:11] Just doing those [20:12:19] awesome-sauce [20:12:30] mind cc'ing me? 
mostly curiouus [20:12:32] -u [20:12:46] * greg-g goes to get some lunch real quick [20:13:55] * aude going home [20:18:25] !log aaron synchronized php-1.23wmf10/includes/filebackend 'e45db51adeddcc71a97e6547a6f4ecaf5f320a8c' [20:18:31] Logged the message, Master [20:19:27] !log aaron synchronized php-1.23wmf10/includes/filebackend 'e45db51adeddcc71a97e6547a6f4ecaf5f320a8c' [20:19:33] Logged the message, Master [20:20:19] AaronSchulz: are you fixing Argument 1 passed to FileBackendStore::normalizeXAttributes() must be an array, null given ? [20:20:38] no, just getting a custom header fix in [20:20:43] can look at whatever that is too [20:20:44] !log aaron synchronized php-1.23wmf10/includes/filebackend 'e45db51adeddcc71a97e6547a6f4ecaf5f320a8c' [20:20:50] Logged the message, Master [20:21:18] oh, those succeeded [20:21:39] * AaronSchulz was trying cygwin and it just seemed to bail after "copying to apaches" [20:21:45] * AaronSchulz went back to putty [20:22:02] Reedy: was it you I talked to about the error with "cascadingsource" and MagicWord? I just saw some of them on test2wiki. [20:22:25] Reedy: looks like this: https://saucelabs.com/jobs/69ec8db606194393bcfc68cffa6b4b87 [20:25:22] The localisation cache didn't seem to build properly first time around [20:25:25] Not quite sure why [20:40:02] Reedy: the cascadingsources issue did the same thing on beta labs as on test2wiki: got that error everywhere for a short time, then back to normal. [20:51:05] !log aaron synchronized php-1.23wmf10/includes/filebackend '94cb6f164ed2b9f7dbdf809e2d861ae04ab1df21' [20:51:11] Logged the message, Master [21:01:08] Reedy: was mediawikiwiki on 1.23wmf9 up until "[20:08:11] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: mediawikiwiki to 1.23wmf10 aswell" ? 
[21:02:54] (Flow got a Fatal on mediawikiwiki at [09-Jan-2014 20:06:05] that should have been fixed in 1.23wmf10) [21:05:40] <^d> spage: There were bunches of fatals in wmf10 due to magic word localization. [21:05:44] <^d> Hence the rollback [21:08:47] ^d OK, so the answer is "yes at 20:06 mediawikiwiki was on wmf9" [21:09:03] <^d> Yeah, as far as I know. [21:09:11] <^d> Easily verified by checking wikiversions.dat :) [21:13:35] ^d: any more comments on https://gerrit.wikimedia.org/r/#/c/100760/ ? [21:16:25] <^d> matanya: lgtm, but I'm not ops so best I can do is +1 :) [21:16:45] (03CR) 10Chad: [C: 031] "lgtm." [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [21:16:52] ^d: that would be good too :) Thanks a lot! [21:25:58] (03PS1) 10Chad: Remove searchidx2 from dsh groups [operations/puppet] - 10https://gerrit.wikimedia.org/r/106622 [21:28:15] (03CR) 10Ori.livneh: [C: 04-1] "I made some points of criticism inline, but don't let demoralize you -- this is shaping up nicely." (034 comments) [operations/puppet] - 10https://gerrit.wikimedia.org/r/100760 (owner: 10Matanya) [22:02:19] Reedy: did you see the VE bug on mw.org? [22:02:43] Reedy: they have a proposed fix, but only rmoen is around (no one to review/merge), so we might have to rollback mw.org *again* until they can fix that [22:10:55] here's the patch? [22:11:06] ori: https://gerrit.wikimedia.org/r/#/c/106621/ [22:11:12] just merged by Jamesofur [22:11:14] er other james [22:11:57] ori: now associated cherry-pick https://gerrit.wikimedia.org/r/106627 [22:15:34] does the cherry pick need review? those are usually self-merged [22:15:48] ori: I don't have deploy rights, so I can't self-merge. [22:15:56] ori: (Helpful.) [22:16:22] James_F: ok. is it safe? [22:16:33] ori: Looks OK to us. [22:16:40] ori: Frankly, it can't be much worse than current state. [22:17:50] ori: Thanks. [22:17:52] James_F, ori: its completely safe [22:17:59] rmoen: boring! 
[22:18:03] :) [22:18:06] <^d> Completely safe? [22:18:07] ori only touches dangerous code [22:18:08] <^d> We're doomed. [22:18:09] should i deploy it? [22:18:19] <^d> I'll just say it. [22:18:20] ori: yes please, [22:18:21] Please. [22:18:22] <^d> NOTHING CAN POSSIBLY GO WRONG [22:18:24] can we at least pretend that it's risky? [22:18:29] shush chad! [22:18:33] sweet, now we're challenging the gods, i like that [22:18:36] :) [22:18:49] Cant break something that's already broken.. [22:18:54] * greg-g knocks on wood, throws salt over left shoulder, turns around twice [22:18:56] challenge accepted! [22:19:00] hah [22:19:00] <^d> rmoen: You're not trying hard enough :) [22:19:01] or was it thrice? WE'RE DOOMED [22:19:43] !log aaron synchronized php-1.23wmf10/includes/libs '1a5ac00f8991905d8fd643d53cb744024143c558' [22:19:49] Logged the message, Master [22:19:49] oh good ;) [22:20:10] greg-g: It's thrice, widdershins. [22:20:58] that's not en_US [22:21:40] James_F: i'm preparing the submodule update, hence the delay [22:22:07] ori: Yeah, sadly I'm familiar with how slow it can be. No worries. :-) [22:28:55] !log ori synchronized php-1.23wmf10/extensions/VisualEditor/ApiVisualEditor.php 'Update VisualEditor for cherry pick I5cc44c5ef35 (bug 59867)' [22:29:00] Logged the message, Master [22:29:42] ^ James_F [22:30:00] and rmoen [22:30:38] Thanks ori. [22:30:51] did it fix things? [22:31:17] ori: Yes. [22:31:18] * ori types 'scap' repeatedly into an empty vim buffer [22:31:26] seems to [22:31:35] thanks much ori [22:32:03] ori: Thank you sir [22:32:05] greg-g: it's just self-serving, thrill-seeking behavior on my part, you know that by now :P [22:32:26] ori: but when your self-serving is also public-benefiting, I appreciate it ;) [22:32:45] * James_F grins. [22:32:57] ori: I owe you drinks on Monday when I get back to the office. [22:33:00] alright, I think I might get other things done now... 
[22:52:40] (03PS1) 10MaxSem: Enable new diff everywhere now that wikidiff2 has been upgraded [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/106629 [22:57:36] greg-g, can I LD ^^^ ? [22:58:54] MaxSem: has it been tested in betacluster? [22:59:07] yep, and on testwiki too [22:59:54] betadiff doesn't mean a betafeature, right? [23:00:49] it's a MobileFrontend beta, not BetaFeatures one [23:00:58] ah [23:01:17] MaxSem: yes then [23:01:23] thanks:) [23:06:04] * aude wonders why bodyContent on enwiki has a max-width of 715 px? [23:06:06] * MatmaRex reminds everyone how he said BetaFeatures is a bad name [23:06:14] meaning huge white space on the side [23:06:17] aude: because of one of the beta features [23:06:19] everyone hates it [23:06:22] wtf! [23:06:27] (typographt refresh) [23:06:30] just disable it [23:06:36] * aude opted in to all beta [23:07:10] MatmaRex: The '80s sitcom audience behind me just yelled "WE KNOW!", thought you should know [23:07:41] heh [23:24:19] ori, paravoid can i poke you to take a look at my puppet patch? :) [23:24:33] sure, which one? [23:24:58] ori, https://gerrit.wikimedia.org/r/#/c/106471/ [23:25:03] thanks [23:25:10] I had a look at it earlier today [23:25:18] but it said WIP, and had some SSS FIXMEs [23:25:29] so I thought I should wait :) [23:25:38] right .. i wanted feedback about all that .. ;) [23:25:48] ok, let me put in my review [23:25:55] that would be great, thanks. [23:26:28] subbu: the '# SSS FIXME: Who/what uses this script?' comment reminded me of http://www.youtube.com/watch?v=HstmAnXY5r8 [23:27:11] ori, i'll have to wait to get home to see that ... at a coffee shop without headphones :) [23:28:47] ori, paravoid the other qn. I had about the patch was how would this be tested ... [23:52:48] (03PS1) 10Aaron Schulz: Make temp URLs actually work (e.g. 
for private containers) [operations/puppet] - 10https://gerrit.wikimedia.org/r/106635 [23:56:02] paravoid: not sure how that was missed [23:56:20] http://docs.openstack.org/trunk/config-reference/content/object-storage-tempurl.html [23:56:52] I remember seeing the middleware and wondering [23:57:02] but I thought the feature was in production [23:57:07] they "work" for public files since they are public anyway [23:57:21] not sure if any callers would have hit this then (e.g. TMH) [23:57:47] I was testing around to make sure some MW refactoring didn't break them...then I realized that must never have worked [23:57:56] hah [23:58:50] once everything is on wmf10 I can disable the CF extension \o/ [23:59:01] CF? [23:59:21] CloudFiles [23:59:26] oh [23:59:33] because you've rewritten it?