[00:03:38] greg-g: Are you still ok with my update going in parallel? [00:04:23] bd808: yeah [00:04:37] Excellent. I'll get it done then [00:04:42] * greg-g nods [00:06:16] greg-g: {{done}} [00:06:49] sweet [00:07:33] git-deploy is quick like a bunny when you are only updating a single host :) [00:08:59] !log mholmquist synchronized php-1.23wmf8/extensions/MultimediaViewer/ [00:09:06] Logged the message, Master [00:10:51] !log mholmquist synchronized php-1.23wmf9/extensions/MultimediaViewer/ [00:10:54] MaxSem: Done! Will test on mw.org but it should be fine. [00:10:57] Logged the message, Master [00:11:28] Yeah, looks good [00:11:38] wee [00:11:47] Oh, sorry [00:11:51] LIGHTNING DEPLOYYYYY [00:11:59] * marktraceur apologises to greg-g [00:12:01] (CR) MaxSem: [C: 2] Enable beta mobile diff on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106236 (owner: MaxSem) [00:12:15] (Merged) jenkins-bot: Enable beta mobile diff on testwiki [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106236 (owner: MaxSem) [00:12:33] (PS1) Aaron Schulz: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106465 [00:14:31] (CR) jenkins-bot: [V: -1] Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106465 (owner: Aaron Schulz) [00:15:21] !log maxsem synchronized wmf-config 'https://gerrit.wikimedia.org/r/106236' [00:15:28] Logged the message, Master [00:15:44] I hate that fucking submodule [00:19:12] I actually can't remember the command to make that diff go away [00:21:21] !log maxsem synchronized php-1.23wmf8/extensions/Collection/ 'https://gerrit.wikimedia.org/r/106293' [00:21:28] Logged the message, Master [00:25:25] !log maxsem synchronized php-1.23wmf8/extensions/Collection/ 'Shit hit fan' [00:25:32] Logged the message, Master [00:25:57] I'm done [00:26:06] 'Shit hit
fan'? [00:26:09] haha [00:26:20] MaxSem: that's it? "I'm done"? [00:26:27] :) [00:26:43] hmm, maybe ori knows [00:26:51] * AaronSchulz wishes that was just a symlink or something [00:27:01] knows what? [00:27:11] ori: how do I remove the submodule change from f12b7b2122797c0602c10a1f902955998143f3e4 ? [00:27:39] and I can't checkout nor reset to f12b7b2122797c0602c10a1f902955998143f3e4 [00:27:48] it's sooo convenient in TortoiseGit:P [00:28:00] ori: I meant https://gerrit.wikimedia.org/r/106465 [00:28:26] if you don't submodule update before commit, that crap gets tossed in [00:29:33] you mean, retain only the CommonSettings change? [00:29:43] right, I couldn't care less about the other thing [00:32:03] i dunno, checkout, git reset --soft 'HEAD^', git reset -- submodule_thigny, git commit -c ORIG_HEAD, git review? [00:32:39] (PS2) MaxSem: Disable wgMaxBacklinksInvalidate now that throttling is used [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106465 (owner: Aaron Schulz) [00:32:53] ^^^:) [00:33:01] AaronSchulz: I checked it out, `git reset HEAD^` and checked in again and the `git show` looks clean [00:33:20] Max to the rescue [00:33:56] MaxSem: Nothing wrong I hope [00:34:50] bd808: checked in what? [00:35:31] The file that was dirty from the reset HEAD^ [00:35:47] Max fixed it for you in gerrit [00:36:00] I know but I want to know how to deal with this crap in the future [00:36:07] aaron has to insert quarters into his computer for each git operation [00:36:31] * ori kids [00:37:40] So I did `git review --download 106465; git reset HEAD^; git commit --all` and that seemed to drop the submodule change for me [00:38:08] !log ori synchronized php-1.23wmf8/extensions/EventLogging 'Update EventLogging to master' [00:38:15] Logged the message, Master [00:39:18] bd808: that doesn't work for me [00:40:01] Hmmm… oh. I don't have the submodule initialized !?
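The soft-reset sequence suggested at 00:32:03 (soft reset, unstage the submodule path, recommit, then `git review`) can be tried safely in a throwaway repository. The following is only a minimal sketch under stated assumptions, not what anyone in the channel actually ran: the temporary repository, the path `sub`, the file contents, and both 40-character hashes are made up for illustration, and the submodule is faked as a bare gitlink index entry so that nothing has to be cloned. It also uses `git commit -C` (reuse the message without opening an editor) where the chat suggests `-c`.

```shell
#!/bin/sh
# Sketch: drop an accidental submodule pointer bump from the most recent commit.
# Every name and hash below is a hypothetical placeholder.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.org"

old=1111111111111111111111111111111111111111
new=2222222222222222222222222222222222222222

# Base commit: one config file plus a gitlink entry (mode 160000),
# which is how a submodule pointer is stored in the index.
echo 'first' > CommonSettings.php
git add CommonSettings.php
git update-index --add --cacheinfo 160000,"$old",sub
git commit -q -m 'base'

# The "bad" commit: the intended config change plus an unintended pointer bump.
echo 'second' >> CommonSettings.php
git add CommonSettings.php
git update-index --add --cacheinfo 160000,"$new",sub
git commit -q -m 'config change'

# The fix: undo the commit but keep everything staged, put the submodule
# entry back to what the parent commit had, then recommit with the old message.
git reset -q --soft HEAD^
git reset -q -- sub
git commit -q -C ORIG_HEAD

# List what the rewritten commit touches relative to its parent.
changed=$(git diff --name-only HEAD^ HEAD)
echo "$changed"
```

In a real checkout the rewritten commit would then be re-uploaded with `git review`. The shorter `git reset HEAD^; git commit --all` variant mentioned at 00:37:40 apparently depended on the submodule not being initialized in that checkout, so it may not carry over to other setups.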
[00:40:21] I remember Roan staring at this for an hour last time [00:40:26] * AaronSchulz can't recall what he did [00:40:32] MaxSem: so, what happened? [00:41:46] git submodule update [00:41:47] fatal: reference is not a tree: f12b7b2122797c0602c10a1f902955998143f3e4 [00:41:49] Unable to checkout 'f12b7b2122797c0602c10a1f902955998143f3e4' in submodule path 'docroot/bits/WikipediaMobileFirefoxOS' [00:41:56] ugh, so much for stackoverflow suggestions ;) [00:42:22] greg-g, I didn't realise wmf8 was so old, deployed master and was met with exception due to https://gerrit.wikimedia.org/r/103915 [00:43:18] yeah, it is :/ [00:43:22] ah, that [00:43:32] MaxSem: so, reverted or fixed something? [00:43:37] reverted [00:43:39] * greg-g nods [00:44:32] (PS1) Subramanya Sastry: WIP: Update parsoid puppet config to use new repository [operations/puppet] - https://gerrit.wikimedia.org/r/106471 [00:46:47] (CR) Subramanya Sastry: "This is my first puppet patch ever and an initial attempt to update parsoid config to use the new repo.
So, not sure if I got everything t" [operations/puppet] - https://gerrit.wikimedia.org/r/106471 (owner: Subramanya Sastry) [01:12:27] (PS1) Springle: repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 [01:12:51] (CR) Springle: [C: 2] repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 (owner: Springle) [01:12:59] (Merged) jenkins-bot: repool db1041 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106476 (owner: Springle) [01:14:07] !log springle synchronized wmf-config/db-eqiad.php 'repool db1041' [01:14:16] Logged the message, Master [01:57:47] (PS1) Springle: disable index_merge_sort_union [operations/puppet] - https://gerrit.wikimedia.org/r/106478 [02:02:22] springle: just a heads up that i'm running a big export query on db1047 (research slave), should finish in less than 10 mins [02:02:32] i don't expect any issues / alerts [02:02:55] * AaronSchulz wonders what was wrong with index_merge_sort_union [02:02:58] eh, time to go [02:08:01] ori: ok tnx [02:08:17] (now that almost 10mins have passed before i noticed :) [02:08:33] Query OK, 39178348 rows affected (3 min 11.23 sec) [02:08:36] faster than i thought [02:08:42] nice [02:11:15] (CR) Springle: [C: 2] disable index_merge_sort_union [operations/puppet] - https://gerrit.wikimedia.org/r/106478 (owner: Springle) [02:15:04] Bsadowski1: I don't know what you're pinging me about. [02:21:17] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:26:17] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:28:28] !log partition logging tables on logpager slaves: s2 db1002, s4 db1004, s5 db1026, s6 db1040 [02:50:11] wikitech down...
[02:52:17] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 6.802 second response time [02:52:58] should be back up ;) [02:52:59] !log LocalisationUpdate completed (1.23wmf8) at Thu Jan 9 02:52:59 UTC 2014 [02:53:44] !log wikitech nada. restarted apache on virt0 [02:54:17] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 8.126 second response time [02:54:37] Reedy: did you do something too, or was it just the apache restart? [02:56:31] I'm not sure the bot can log to wikitech when wikitech is down. [02:56:57] wikitech was back up for seconds. now nada again [02:57:09] Yeah, it isn't loading for me. [02:57:17] PROBLEM - puppetmaster https on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:17] PROBLEM - HTTP on virt0 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:57:20] I'm not sure what virt0 is, but it seems awfully fickle. [02:57:39] It's virtualisation host 0 [02:57:53] server reached MaxClients setting [02:58:07] springle: Nothing. Just RECOVERY - HTTP being on the host where it's located [02:58:26] is maxclients really small? [02:59:03] 150 [02:59:24] Sounds pretty busy [02:59:59] puppet is busiest on top [03:00:17] RECOVERY - HTTP on virt0 is OK: HTTP OK: HTTP/1.1 302 Found - 457 bytes in 1.948 second response time [03:00:39] here we go [03:01:23] phusion_passenger exception then apache came back [03:03:07] RECOVERY - puppetmaster https on virt0 is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.179 second response time [03:03:28] !log wikitech down. restarted apache on virt0. 
phusion_passenger exception + MaxClients hit [03:03:31] * springle tries again [03:03:34] Logged the message, Master [03:03:38] yay [03:03:48] !log partition logging tables on logpager slaves: s2 db1002, s4 db1004, s5 db1026, s6 db1040 [03:03:54] Logged the message, Master [03:33:56] !log LocalisationUpdate ResourceLoader cache refresh completed at Thu Jan 9 03:33:55 UTC 2014 [03:34:02] Logged the message, Master [04:29:31] (PS1) Springle: Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 [04:30:16] (CR) Springle: [C: 2] Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 (owner: Springle) [04:30:25] (Merged) jenkins-bot: Segregate watchlist and recentshangeslinked queries on all shards. [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106483 (owner: Springle) [04:31:41] !log springle synchronized wmf-config/db-eqiad.php 'watchlist/recnetchangeslinked LB on s[234567]' [04:31:48] Logged the message, Master [09:23:57] PROBLEM - Host pdf1 is DOWN: PING CRITICAL - Packet loss = 100% [10:28:07] akosiaris: pm?
[10:34:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:36:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:38:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:40:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:42:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:44:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:46:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:48:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:50:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:52:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:54:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:56:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [10:57:05] PROBLEM - Host mw27 is DOWN: PING CRITICAL - Packet loss = 100% [10:58:05] RECOVERY - Host mw27 is UP: PING OK - Packet loss = 0%, RTA = 35.37 ms [10:58:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [11:00:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 10:31:15 AM UTC [11:00:45] RECOVERY - Puppet freshness on wtp1019 is OK: puppet ran at Thu Jan 9 11:00:40 
UTC 2014 [11:01:46] hello [11:02:09] RobH: good morning [11:02:35] PROBLEM - Puppet freshness on wtp1019 is CRITICAL: Last successful Puppet run was Thu 09 Jan 2014 11:00:40 AM UTC [11:05:03] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4014 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4008 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:04] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:05:23] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:c [11:05:29] that's ulsfo, ignore [11:05:33] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:33] PROBLEM - Host lvs4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:05:34] pheew.ok [11:05:36] (PS1) Matanya: wmclient moved to git.
correcting README [operations/debs/adminbot] - https://gerrit.wikimedia.org/r/106502 [11:05:36] I'll have a look, but ignore for now [11:05:42] thx for info [11:05:43] PROBLEM - Host lvs4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:06:03] RECOVERY - Host cp4004 is UP: PING OK - Packet loss = 0%, RTA = 75.12 ms [11:06:04] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 86.67 ms [11:06:04] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 75.07 ms [11:06:04] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [11:06:04] RECOVERY - Host cp4008 is UP: PING OK - Packet loss = 0%, RTA = 75.12 ms [11:06:04] RECOVERY - Host cp4014 is UP: PING OK - Packet loss = 0%, RTA = 75.03 ms [11:06:13] RECOVERY - Host lvs4001 is UP: PING OK - Packet loss = 0%, RTA = 75.11 ms [11:06:13] RECOVERY - Host lvs4003 is UP: PING OK - Packet loss = 0%, RTA = 75.15 ms [11:06:13] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 84.76 ms [11:06:13] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 85.09 ms [11:06:33] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 86.74 ms [11:09:48] mutante: have a sec? [11:31:53] RECOVERY - Puppet freshness on wtp1019 is OK: puppet ran at Thu Jan 9 11:31:49 UTC 2014 [11:38:11] http://reportcard.wikimedia.org/ appears to be down [11:38:36] are we only using http://reportcard.wmflabs.org/ now?
[11:44:16] I think so [11:44:49] cert error The certificate is only valid for metrics.wikimedia.org [11:45:25] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4020 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:25] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:28] ori: what's behind the login on https://reportcard [11:45:40] The site says: "WMF E3 Metrics API" [11:45:43] PROBLEM - Host bits-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:a [11:45:46] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1:c [11:45:49] PROBLEM - Host upload-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::2:b [11:45:51] PROBLEM - Host text-lb.ulsfo.wikimedia.org_ipv6 is DOWN: /bin/ping6 -n -U -w 15 -c 5 2620:0:863:ed1a::1 [11:45:54] PROBLEM - Host cp4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4017 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4010 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4006 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4005 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:55] PROBLEM - Host cp4015 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:56] PROBLEM - Host cp4016 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:56] PROBLEM - Host cp4008 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:57] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:57] PROBLEM - Host cp4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host lvs4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host cp4013 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:58] PROBLEM - Host cp4009 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:59] 
PROBLEM - Host cp4007 is DOWN: PING CRITICAL - Packet loss = 100% [11:45:59] PROBLEM - Host lvs4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:00] PROBLEM - Host upload-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:01] PROBLEM - Host cp4003 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:01] PROBLEM - Host cp4014 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:02] PROBLEM - Host lvs4002 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:02] PROBLEM - Host cp4002 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:03] PROBLEM - Host bast4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:03] PROBLEM - Host bits-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:04] PROBLEM - Host lvs4004 is DOWN: PING CRITICAL - Packet loss = 100% [11:46:15] * hashar waves at ulsfo [11:46:15] dear vendor who I won't name: fuck you [11:46:24] PROBLEM - Host mobile-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:34] PROBLEM - Host text-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [11:46:43] mutante: dunno [11:47:06] who can create a git repo for me? [11:47:10] we have no traffic going to ulsfo right now so... [11:47:31] I would like to push gangilos to gerrit [11:47:34] PROBLEM - Host backup4001 is DOWN: PING CRITICAL - Packet loss = 100% [11:47:39] operations/software/ganglios would be the name [11:48:29] matanya: i can do [11:48:36] what is ganglios about ? [11:48:39] thanks hash [11:48:52] ganglios is a collection of tools that allow nagios to trigger alerts based on data it pulls from ganglia. 
[11:49:05] RT ticket 6602 [11:49:22] ahh [11:49:24] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 80.58 ms [11:49:24] RECOVERY - Host cp4020 is UP: PING OK - Packet loss = 0%, RTA = 87.83 ms [11:49:24] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 80.01 ms [11:49:24] RECOVERY - Host upload-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.00 ms [11:49:27] RECOVERY - Host text-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.04 ms [11:49:31] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 80.03 ms [11:49:31] RECOVERY - Host bits-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 87.72 ms [11:49:34] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 80.07 ms [11:49:36] RECOVERY - Host lvs4003 is UP: PING OK - Packet loss = 0%, RTA = 80.85 ms [11:49:36] RECOVERY - Host lvs4002 is UP: PING OK - Packet loss = 0%, RTA = 80.17 ms [11:49:36] RECOVERY - Host upload-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.02 ms [11:49:39] RECOVERY - Host cp4006 is UP: PING OK - Packet loss = 0%, RTA = 87.61 ms [11:49:39] RECOVERY - Host cp4002 is UP: PING OK - Packet loss = 0%, RTA = 80.05 ms [11:49:39] RECOVERY - Host cp4016 is UP: PING OK - Packet loss = 0%, RTA = 80.83 ms [11:49:39] RECOVERY - Host cp4003 is UP: PING OK - Packet loss = 0%, RTA = 87.95 ms [11:49:39] RECOVERY - Host cp4015 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:39] RECOVERY - Host cp4009 is UP: PING OK - Packet loss = 0%, RTA = 88.20 ms [11:49:39] RECOVERY - Host cp4014 is UP: PING OK - Packet loss = 0%, RTA = 88.01 ms [11:49:40] RECOVERY - Host cp4010 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:41] RECOVERY - Host cp4001 is UP: PING OK - Packet loss = 0%, RTA = 80.19 ms [11:49:41] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 80.76 ms [11:49:42] RECOVERY - Host cp4017 is UP: PING OK - Packet 
loss = 0%, RTA = 80.78 ms [11:49:42] RECOVERY - Host cp4013 is UP: PING OK - Packet loss = 0%, RTA = 87.97 ms [11:49:42] RECOVERY - Host cp4008 is UP: PING OK - Packet loss = 0%, RTA = 88.10 ms [11:49:43] RECOVERY - Host cp4004 is UP: PING OK - Packet loss = 0%, RTA = 88.28 ms [11:49:43] RECOVERY - Host cp4005 is UP: PING OK - Packet loss = 0%, RTA = 80.18 ms [11:49:44] RECOVERY - Host cp4007 is UP: PING OK - Packet loss = 0%, RTA = 80.75 ms [11:49:45] RECOVERY - Host bits-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 87.86 ms [11:49:45] RECOVERY - Host bast4001 is UP: PING OK - Packet loss = 0%, RTA = 88.03 ms [11:49:46] RECOVERY - Host lvs4001 is UP: PING OK - Packet loss = 0%, RTA = 87.93 ms [11:49:46] RECOVERY - Host lvs4004 is UP: PING OK - Packet loss = 0%, RTA = 80.59 ms [11:49:47] RECOVERY - Host mobile-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.00 ms [11:49:47] matanya: and I thought we already had that hehe [11:50:00] not in git [11:50:01] RECOVERY - Host text-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 80.58 ms [11:50:11] subversion? [11:50:21] RECOVERY - Host backup4001 is UP: PING OK - Packet loss = 0%, RTA = 80.64 ms [11:50:28] * twkozlowski has spoken the cursed word [11:50:45] /ban twkozlowski No more subversion, we use hyperversion now. [11:52:58] matanya: created it : git clone ssh://gerrit.wikimedia.org:29418/operations/software/ganglios [11:53:05] matanya: rights inherits from operations/software [11:53:12] good, thanks! [11:53:23] i'll let you know if i have issues [11:53:40] hashar: warning: remote HEAD refers to nonexistent ref, unable to checkout. [11:53:53] matanya: Andrew Otto can follow up I guess [11:53:56] yeah the repo is empty [11:54:07] oh, ok. 
[11:54:25] I guess whoever imports the code will force push [11:54:38] i have it, and can push it [11:54:53] unless you do it differently [11:54:58] I have no clue [11:55:11] seems the code is in a mercurial repository at https://bitbucket.org/maplebed/ganglios [11:55:18] yes [11:55:21] that's normal error when it's empty [11:55:40] oh mercurial.. import from git would have been easier i suppose [11:55:49] ah, maplebed [11:56:06] git hg works for us :) [11:56:14] cool [11:56:28] you want to add the .gitreview file first [11:56:33] to the empty repo [11:56:34] i'll do it. unless there is some rule against it [11:56:39] yes, right [11:58:00] just import the hg > git repository [11:58:09] then we can send a commit that adds .gitreview [11:58:14] you will need push rights though [12:01:11] matanya: wanna push now ? [12:01:35] hashar: i'm converting atm [12:01:42] give me 5 mins [12:02:01] hmm, how do I publish something to noc.wikimedia.org/~username/ these days? [12:02:13] mkdir ~/public_html on fenari [12:02:58] just kidding, you use heroku to upload a docker container with jekyll [12:04:23] matanya: you should have force push rights on operations/software/ganglios [12:04:35] thanks a lot hashar [12:04:49] ori, do you ever sleep? [12:05:00] he sleeps at the office [12:05:10] makes sense:P [12:05:12] you should relocate there ori [12:05:46] i'm in my kitchen [12:06:28] I am laying on my couch still in pyjamas (at least I am wearing a pant) [12:07:58] hmm, does that mean that I should allow reading my ~? sounds scary:P [12:09:03] depends on what you keep in there [12:09:45] nothing valuable, actually [12:09:54] but still... [12:09:55] meh [12:10:47] hashar: it is git push /somethinghere/ origin/master [12:11:15] git push ssh://gerrit.wikimedia.org:29418/operations/software/ganglios master:master [12:11:17] I guess [12:11:59] didn't work hashar : !
[remote rejected] master -> master (prohibited by Gerrit) [12:12:00] error: failed to push some refs to 'ssh://gerrit.wikimedia.org:29418/operations/software/ganglios' [12:12:07] ahah [12:12:17] try with -f ? [12:12:20] git push -f ... [12:12:33] same [12:12:50] ahh [12:12:54] I created the master branch [12:12:58] hey akosiaris, I've just run diff tests on test wiki using the new package, looks good: http://noc.wikimedia.org/~maxsem/test.html [12:12:59] can you git push -f ? [12:13:23] yes, thanks! [12:13:32] MaxSem: nice :-) [12:14:18] I 'll upgrade the wikidiff2 package today then [12:14:29] awesome, thanks:) [12:14:33] matanya: don't you want to push the mercurial converted repo ? [12:14:52] i did, i started with .gitreview before [12:15:16] now i'll verify the hg stuff, and if it is all sane, i'll push that too [12:17:41] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host cp4012 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host cp4020 is DOWN: PING CRITICAL - Packet loss = 100% [12:17:41] PROBLEM - Host text-lb.ulsfo.wikimedia.org is DOWN: PING CRITICAL - Packet loss = 100% [12:17:51] PROBLEM - Host cp4019 is DOWN: PING CRITICAL - Packet loss = 100% [12:18:11] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 80.94 ms [12:18:12] RECOVERY - Host cp4019 is UP: PING OK - Packet loss = 0%, RTA = 80.93 ms [12:18:12] RECOVERY - Host cp4012 is UP: PING OK - Packet loss = 0%, RTA = 81.44 ms [12:18:16] (PS1) Matanya: imported Mercurial ganglios from https://bitbucket.org/maplebed/ganglios/overview [operations/software/ganglios] - https://gerrit.wikimedia.org/r/106505 [12:18:21] RECOVERY - Host cp4020 is UP: PING OK - Packet loss = 0%, RTA = 80.94 ms [12:19:01] RECOVERY - Host text-lb.ulsfo.wikimedia.org is UP: PING OK - Packet loss = 0%, RTA = 75.04 ms [12:19:10] ok, this looks ok. I'll let otto review this [12:19:16] what do we need ganglios for?
[12:20:12] paravoid: https://rt.wikimedia.org/Ticket/Display.html?id=6602 [12:30:32] oh, mutante https://gerrit.wikimedia.org/r/#/c/102629/ holidays are over [12:57:11] Could someone review? https://gerrit.wikimedia.org/r/#/c/101820/ [13:14:03] (PS1) Alexandros Kosiaris: Removed the unneeded know wikidiff2.ini override [operations/puppet] - https://gerrit.wikimedia.org/r/106510 [13:15:25] akosiaris: s/know/now/ [13:15:47] duh... thanks [13:17:22] (PS2) Alexandros Kosiaris: Removed the unneeded now wikidiff2.ini override [operations/puppet] - https://gerrit.wikimedia.org/r/106510 [13:28:43] paravoid: can you please explain the status of dns within puppet? there is the authdns module and a separate manifest named dns.pp [13:29:21] matanya: dns.pp is the old one, it doesn't have authoritative dns classes for production anymore [13:29:39] matanya: it has the recursors, which is an entirely different piece of infrastructure and that I need to fix [13:29:50] matanya: and it also has the auth server for labs, based on powerdns & ldap [13:30:19] matanya: but from what I heard, this will change very soon with the introduction of https://wiki.openstack.org/wiki/Designate [13:30:27] very soon = with the transition to eqiad [13:30:54] which all need to be stripped from that pp and spread into the dns module/other modules? [13:32:10] no [13:32:38] dns::auth-server::ldap will go soon [13:32:48] dns::recursor needs overhauling [13:34:39] thanks paravoid. aside from this, any help needed in the debian DSA team?
:) [13:34:55] haha [13:35:22] this sounds like a yes to me [13:35:49] you can't be DSA if you're not a member of the project [13:36:01] and we have nothing like labs to experiment with [13:36:19] and we have no code review system and no code reviews for the most part [13:36:44] so suffice to say, it's a little difficult [13:36:47] but thanks for the offer :) [13:37:03] i see a lot of space for helping :P [13:37:58] deploy gerrit/code review, create labs etc [14:52:07] (CR) Milimetric: [C: 2 V: 2] Restructuring code from bin/logster into logster.logster module [operations/debs/logster] - https://gerrit.wikimedia.org/r/101021 (owner: Ottomata) [15:25:47] (PS1) Jgreen: remove star cert from aluminium [operations/puppet] - https://gerrit.wikimedia.org/r/106519 [15:30:37] (CR) Jgreen: [C: 2 V: 1] remove star cert from aluminium [operations/puppet] - https://gerrit.wikimedia.org/r/106519 (owner: Jgreen) [16:17:21] PROBLEM - RAID on db1001 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [16:31:15] pdf1 is down (http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=PDF+servers+pmtpa&h=pdf1.wikimedia.org&tab=m&vn=&hide-hf=false&metric_group=). can someone reboot it please? [16:33:05] * Jeff_Green looking [16:33:25] schmir: I will as soon as I remember how to reboot with DRAC5. :-) [16:33:45] Coren: racadm serveraction powercycle ? [16:33:46] Coren: thanks! [16:33:58] reboot? reuse? recycle? [16:34:36] Jeff_Green: hardreset. [16:34:45] wow never seen that one [16:35:34] schmir: Should be rebooting now. [16:53:38] Coren: thanks, but it doesn't seem to come up again. it's been 17 minutes and the machine has rather small disk (<100G). [16:54:09] schmir: Indeed not. Lemme see if there is a hardware fail.
[16:58:25] (PS1) Aude: add wikisource site link group for wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106531 [16:58:26] (PS1) Aude: cleanup old Wikibase settings which are same as default now [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/106532 [17:02:29] database locked again? [17:02:54] Which what where? [17:03:11] "The database has been automatically locked while the slave database servers catch up to the master" [17:03:27] Which wiki? [17:03:33] enwp [17:03:44] 1-2 minutes lagged [17:04:10] huh [17:05:01] schmir: I have bad news; it looks like the hardware is quite deat. [17:05:03] dead* [17:05:05] Looks like it's increasing [17:05:11] it just unlocked for me [17:05:19] en.wiki is read-only for me. :-( [17:05:27] mmm [17:05:33]