[00:34:22] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[00:43:30] New patchset: Ryan Lane; "Add sudo rights needed to manage gluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2969
[00:43:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2969
[00:44:18] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2969
[00:44:21] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2969
[00:49:00] I cannot seem to see the edit filter log for filter 213 on en.wiki
[00:49:20] it's also showing an empty number of hits, when I know it's been tracking things for at least 2 years
[00:49:22] http://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=213
[00:49:47] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[00:58:47] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[01:07:47] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[01:07:47] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[01:26:16] !log running fixBug34995 on all wikis
[01:26:20] Logged the message, Master
[01:32:24] !log fixBug34995.php done
[01:32:27] Logged the message, Master
[01:32:41] PROBLEM - Puppet freshness on search1019 is CRITICAL: Puppet has not run in the last 10 hours
[01:37:20] PROBLEM - Puppet freshness on search1020 is CRITICAL: Puppet has not run in the last 10 hours
[01:54:44] RECOVERY - MySQL Replication Heartbeat on db42 is OK: OK replication delay 0 seconds
[01:54:53] RECOVERY - MySQL Slave Delay on db42 is OK: OK replication delay 0 seconds
[01:55:36] New patchset: Bhartshorne; "adding the ability to read optionsfrom a config file and modify running behavior so we can throttle up and down the cleaner as its running." [operations/software] (master) - https://gerrit.wikimedia.org/r/2970
[01:55:39] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2970
[01:58:36] New patchset: Bhartshorne; "adding the ability to read optionsfrom a config file and modify running behavior so we can throttle up and down the cleaner as its running." [operations/software] (master) - https://gerrit.wikimedia.org/r/2970
[01:58:39] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2970
[01:59:08] New patchset: Bhartshorne; "adding the ability to read optionsfrom a config file and modify running behavior so we can throttle up and down the cleaner as its running." [operations/software] (master) - https://gerrit.wikimedia.org/r/2970
[01:59:10] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/2970
[02:00:36] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2970
[02:00:39] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/2970
[02:18:02] !log LocalisationUpdate completed (1.19) at Thu Mar 8 02:18:02 UTC 2012
[02:18:06] Logged the message, Master
[02:38:11] PROBLEM - Host cp1044 is DOWN: PING CRITICAL - Packet loss = 100%
[02:43:53] RECOVERY - Puppet freshness on search1020 is OK: puppet ran at Thu Mar 8 02:43:39 UTC 2012
[02:48:45] New patchset: Ryan Lane; "Ensure a run directory exists for the glustermanager" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2971
[02:48:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2971
[02:49:11] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2971
[02:49:14] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2971
[02:54:34] okay
[02:54:41] I need to know why that edit filter isn't showing results
[02:54:48] because the vandal it's meant to block is active now
[03:01:44] RECOVERY - Puppet freshness on search1019 is OK: puppet ran at Thu Mar 8 03:01:31 UTC 2012
[03:07:39] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 0.18 ms
[04:24:27] Commons seems extraordinarily slow at present. :(
[04:52:08] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours
[05:32:47] New patchset: Dzahn; "nagios - move snmp stuff into it's own class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2972
[05:32:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2972
[05:47:29] Reedy: Any chance you're around?
[05:54:43] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2972
[05:54:46] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2972
[06:01:20] http://lists.wikimedia.org/pipermail/toolserver-l/2012-March/004794.html <-- possible Toolserver corruption?
[06:11:47] https://bugzilla.wikimedia.org/show_bug.cgi?id=35054
[06:11:54] That wouldn't be Swift-related, right?
[06:51:04] !log tstarling rebuilt wikiversions.cdb and synchronized wikiversions files:
[06:51:08] Logged the message, Master
[06:52:31] !log tstarling synchronized multiversion/MWMultiVersion.php
[06:52:34] Logged the message, Master
[06:55:39] New patchset: Dzahn; "add iptables rules - let only production network send snmp-traps to nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2973
[06:55:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2973
[07:09:39] New patchset: Dzahn; "add iptables rules - let only production network send snmp-traps to nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2973
[07:09:50] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2973
[07:10:47] New patchset: Dzahn; "add iptables rules - let only production network send snmp-traps to nagios" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2973
[07:10:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2973
[07:11:14] Can someone perhaps monitor commons somehow ? action=delete submission are going very slow (bug 35047)
[07:11:22] perhaps we can somehow track the cause
[07:11:51] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2973
[07:11:54] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2973
[07:16:57] New patchset: Dzahn; "sort iptables ports and protocols alphabetically" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2974
[07:17:05] PROBLEM - Disk space on ms1004 is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=95%): /var/lib/ureadahead/debugfs 0 MB (0% inode=95%):
[07:17:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2974
[07:22:56] RECOVERY - Disk space on ms1004 is OK: DISK OK
[07:35:59] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours
[07:52:06] New patchset: Dzahn; "add groups::wikidev to cadmium, puppet broke due to dependency Group[500] for User[catrope] without it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2975
[07:52:17] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2975
[07:52:56] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2974
[07:52:59] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2974
[07:53:32] New review: Dzahn; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2975
[07:53:35] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2975
[07:54:50] RECOVERY - Puppet freshness on cadmium is OK: puppet ran at Thu Mar 8 07:54:46 UTC 2012
[08:10:08] Has the image server been successfully replicated?
[08:10:18] or whatever was being done to it yesterday?
[08:14:11] New review: Dzahn; "looking at that docroot as it is now, there are some files and dirs owned by groups "svn" and "svnad..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/2888
[08:16:53] PROBLEM - Host cp1019 is DOWN: PING CRITICAL - Packet loss = 100%
[08:23:02] RECOVERY - Host cp1019 is UP: PING OK - Packet loss = 0%, RTA = 26.42 ms
[08:27:14] PROBLEM - Backend Squid HTTP on cp1019 is CRITICAL: Connection refused
[08:28:08] PROBLEM - Frontend Squid HTTP on cp1019 is CRITICAL: Connection refused
[08:36:59] RECOVERY - Backend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.161 seconds
[08:38:02] RECOVERY - Frontend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27545 bytes in 0.108 seconds
[08:42:59] PROBLEM - Backend Squid HTTP on cp1019 is CRITICAL: Connection refused
[09:11:02] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[09:13:52] RECOVERY - Backend Squid HTTP on cp1019 is OK: HTTP OK HTTP/1.0 200 OK - 27400 bytes in 0.161 seconds
[09:17:10] RECOVERY - Puppet freshness on mw1010 is OK: puppet ran at Thu Mar 8 09:16:56 UTC 2012
[09:17:52] !log running puppet on mw1010 - finished quickly without problems - uh, wonder why Nagios reported puppet freshness then
[09:17:55] Logged the message, Master
[09:19:52] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[09:19:52] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[09:28:54] New patchset: Hashar; "Bug 28469 - Make SVN Documentation be indexed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2888
[09:29:06] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2888
[09:31:21] New review: Hashar; "Good point! I have changed the recursive declaration so it put files in the svnadm group which most..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2888
[09:31:50] New review: Hashar; "And I also rebased the patch set :-)" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/2888
[10:17:10] New review: Dzahn; "alright, convinced after we talked more about this. e.g. "everything is in svnadm group already besi..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2888
[10:17:13] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2888
[10:20:48] hashar: permissions changed by puppet
[10:21:03] great! thanks :-)
[10:21:09] looks a lot cleanr for sure
[10:21:53] and yes http://svn.wikimedia.org/robots.txt
[10:26:02] we just have to wait a few days now
[10:26:27] PROBLEM - Lucene on mw1010 is CRITICAL: Connection refused
[10:27:48] hashar: did you know, that if you click on TESFILE in gerrit, in change 2959, you get "Not Found" even though the change is still there? was that the test?
[10:28:36] https://gerrit.wikimedia.org/r/#change,2959 ?
[10:28:41] well I manage to see the diff
[10:28:55] https://gerrit.wikimedia.org/r/#patch,sidebyside,2959,1,TESTFILE
[10:28:56] how do you get the not found error?
[10:29:14] works for me :)
[10:29:48] The page you requested was not found.
[10:29:50] weird
[10:30:13] I have used that change to trigger a notification to jenkins
[10:30:25] same here https://gerrit.wikimedia.org/r/#patch,sidebyside,2954,1,manifests/site.pp
[10:30:48] must be an issue on your side :-(
[10:39:01] i wonder how it can be.. but anyways.. off for dinner first.. it only happens on those 2 changes that somehow have "test" n their name anyways
[10:39:15] cya around
[10:42:12] PROBLEM - Host ms-be5 is DOWN: PING CRITICAL - Packet loss = 100%
[10:50:45] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[10:59:45] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[11:08:45] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[11:08:45] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[11:35:29] PROBLEM - RAID on searchidx2 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:37:17] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[12:13:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:15:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.914 seconds
[12:50:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:56:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.687 seconds
[13:20:59] Joan: I am now
[13:30:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:36:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.888 seconds
[14:01:21] New patchset: Mark Bergsma; "Fix up squid/varnish partitioning" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2976
[14:01:32] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2976
[14:02:41] Reedy: Hi pumpkin.
[14:02:51] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2976
[14:02:55] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2976
[14:03:01] Reedy: http://lists.wikimedia.org/pipermail/toolserver-l/2012-March/004794.html
[14:03:05] I was going to bother you about that.
[14:05:57] Joan: first is empty, 2nd isn't
[14:06:42] Hmm.
[14:07:33] Now it appears to be fine.
[14:07:39] lol
[14:08:02] I mean, the file page is still fucked up.
[14:08:04] There's a bug about that.
[14:08:15] But the Hexen page is now showing up the DB.
[14:08:17] up in the
[14:09:13] http://en.wikipedia.org/w/index.php?title=Hexen:_Beyond_Heretic&action=history
[14:09:16] lawl
[14:11:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:15:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 3.222 seconds
[14:24:05] !log reedy synchronized wmf-config/InitialiseSettings.php 'Bug 35012 - Namespace aliases for wikipedia and wikipedia-talk namespaces on Sanskrit wiki'
[14:24:08] Logged the message, Master
[14:30:46] Hi,
[14:30:59] still no dev is able to investigate my bug ? https://bugzilla.wikimedia.org/show_bug.cgi?id=17618
[14:49:30] I'll have a look when these uploads finish
[14:51:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:53:26] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours
[14:55:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 5.994 seconds
[15:04:44] Reedy: thank you :)
[15:10:27] Greatgib: are you florent?
[15:10:48] Reedy: yes
[15:11:18] Right, I'm gonna close of this bug, as it's easier to keep these things seperate
[15:12:12] Reedy: ? but i think that it is the same bug that the previous person suffered even if he finally managed to get rid of it
[15:12:38] Yes it is (seemingly), but this isn't a general bug
[15:12:42] It's one persons issues
[15:13:42] Reedy: ok, I think that it is certainly to fix my case, but the idea was more for you to be able to investigate what happened to trigger this case
[15:14:15] But there's no fix, the user seems to have sorted it, but we don't know, and they haven't said how they did if they did
[15:14:17] Reedy: but sure that i'm in a hurry to be able to use my account again :p
[15:14:31] New patchset: Mark Bergsma; "Make a specific partman file for varnish with xfs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2977
[15:14:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2977
[15:14:58] let me see what hte password hashes look like
[15:15:00] Reedy: vito already encountered such a case, and the easy fix looks like to rename my account en enwiki to something different and then to try to sul my account again
[15:16:22] ^^ but that would not help to find the root cause of the issue
[15:16:43] right, your home wiki is frwiki
[15:17:44] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2977
[15:17:47] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2977
[15:17:49] Reedy: yes, and my account is successfully merged on all other wiki: de, mediacommon etc...
[15:19:23] Greatgib: try logging into enwiki directly now
[15:23:22] Reedy: That works :)
[15:23:39] Reedy: did you just fixed my case or found what was the cause?
[15:23:42] Great
[15:23:45] I just fixed your case
[15:23:48] yeh great :p
[15:23:54] We get errors like this from time to time
[15:24:09] Reedy: ok, sad that you didn't found the origin of this :(
[15:24:32] Reedy: anyway, thank you :)
[15:24:51] Unless many many people start complaining, it's not usually worth the effort to investigate
[15:25:15] Just tried it on one of my test accounts, and visiting zhwiki directly worked
[15:25:41] and I can login directly to zhwiki
[15:27:40] Greatgib: you might want to attempt to reattach your enwiki account too
[15:31:01] Reedy: It is merged and everything looks to be fine now :)
[15:31:18] Yay
[15:31:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:31:57] Reedy: just for information, it was not on the first other wiki that i logged in that this bug happened but more during my first trial to edit an article in enwiki:
[15:32:08] looking like this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=23295
[15:35:29] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.191 seconds
[16:11:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:17:02] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.128 seconds
[16:28:52] New patchset: Mark Bergsma; "Initial puppetization of varnish upload cluster in eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2978
[16:29:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2978
[16:30:16] !log reedy synchronized php-1.19/extensions/ExtensionDistributor/ExtensionDistributor_body.php 'Test for bug 27246'
[16:30:20] Logged the message, Master
[16:31:23] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2978
[16:31:25] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2978
[16:31:43] !log reedy synchronized php-1.19/extensions/ExtensionDistributor/ExtensionDistributor_body.php 'Revert live hack because it works, will come in properly'
[16:31:46] Logged the message, Master
[16:35:45] New patchset: Mark Bergsma; "Add text/upload cache groups to ganglia" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2979
[16:35:56] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2979
[16:36:08] !log reedy synchronized php-1.19/extensions/ExtensionDistributor/ExtensionDistributor_body.php 'r113368'
[16:36:11] Logged the message, Master
[16:36:17] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2979
[16:36:19] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2979
[16:43:02] New patchset: Mark Bergsma; "Add empty upload varnish VCL templates" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2980
[16:43:13] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2980
[16:44:36] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2980
[16:44:38] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2980
[16:45:19] PROBLEM - Host db1033 is DOWN: PING CRITICAL - Packet loss = 100%
[16:45:59] RECOVERY - Disk space on search10 is OK: DISK OK
[16:50:56] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:56:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.031 seconds
[16:59:25] hi all
[16:59:50] hi
[17:00:17] New patchset: Mark Bergsma; "Add fstype" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2981
[17:00:29] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2981
[17:00:40] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2981
[17:00:43] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2981
[17:01:19] how to work from home for wikimedia in the tech field ?
[17:09:47] New patchset: Mark Bergsma; "Fix storage file name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2982
[17:09:59] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2982
[17:10:33] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2982
[17:10:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2982
[17:12:54] New patchset: Mark Bergsma; "Dashes are ILLEGAL!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2983
[17:13:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2983
[17:13:22] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2983
[17:13:24] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2983
[17:29:04] PROBLEM - Host db18 is DOWN: PING CRITICAL - Packet loss = 100%
[17:30:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:38:40] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.050 seconds
[17:46:52] !log reedy synchronized wmf-config/InitialiseSettings.php 'ptwikipedia to ptwiki'
[17:46:55] Logged the message, Master
[18:05:35] Hey, anyone here who I can ask about NOINDEX tags on the mobile versions of EN Wikipedia pages?
[18:10:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:16:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 8.002 seconds
[18:27:28] !log Pushing new AFTv5 code to testwiki, do not sync to the live site just yet
[18:27:31] Logged the message, Mr. Obvious
[18:29:11] !log Applying AFTv5 schema changes on testwiki
[18:29:14] Logged the message, Mr. Obvious
[18:30:09] PROBLEM - Varnish HTTP upload-backend on cp1021 is CRITICAL: Connection refused
[18:34:57] PROBLEM - Lucene on search1008 is CRITICAL: Connection refused
[18:40:31] !log preilly synchronized php-1.19/extensions/MobileFrontend/ApplicationTemplate.php 'remove ROBOTS metatag'
[18:40:35] Logged the message, Master
[18:43:27] !log preilly synchronized php-1.19/extensions/MobileFrontend/ApplicationTemplate.php 'remove ROBOTS metatag'
[18:43:31] Logged the message, Master
[18:43:50] !log needed to fix a google issue with robots
[18:43:53] Logged the message, Master
[18:52:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[18:53:15] RECOVERY - Host cp1044 is UP: PING OK - Packet loss = 0%, RTA = 26.45 ms
[18:53:55] !log Running rebuildLocalisationCache.php
[18:53:58] Logged the message, Mr. Obvious
[18:58:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.025 seconds
[18:59:06] PROBLEM - Varnish traffic logger on cp1044 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[19:01:03] RECOVERY - Varnish traffic logger on cp1044 is OK: PROCS OK: 2 processes with command name varnishncsa
[19:04:48] !log Clearing message blobs
[19:04:50] Logged the message, Mr. Obvious
[19:08:15] PROBLEM - Host cp1019 is DOWN: PING CRITICAL - Packet loss = 100%
[19:09:19] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero'
[19:09:22] Logged the message, Master
[19:09:38] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.i18n.php 'changes for zero'
[19:09:40] !log push zero rated changes
[19:09:41] Logged the message, Master
[19:09:43] Logged the message, Master
[19:18:27] PROBLEM - Puppet freshness on mw1010 is CRITICAL: Puppet has not run in the last 10 hours
[19:21:27] PROBLEM - Puppet freshness on mw1020 is CRITICAL: Puppet has not run in the last 10 hours
[19:21:27] PROBLEM - Puppet freshness on mw1110 is CRITICAL: Puppet has not run in the last 10 hours
[19:26:51] !log Applying AFTv5 schema changes to en_labswikimedia
[19:26:54] Logged the message, Mr. Obvious
[19:29:19] !log catrope synchronized wmf-config/InitialiseSettings.php '$wgArticleFeedbackv5OversightEmails'
[19:29:22] Logged the message, Master
[19:29:46] !log catrope synchronized wmf-config/CommonSettings.php '$wgArticleFeedbackv5OversightEmails'
[19:29:49] Logged the message, Master
[19:30:16] !log Running AFTv5 schema changes on enwiki
[19:30:19] Logged the message, Mr. Obvious
[19:30:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:34:20] !log Running scap for ArticleFeedbackv5 updates
[19:34:23] Logged the message, Mr. Obvious
[19:36:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 6.400 seconds
[19:38:55] !log catrope synchronizing Wikimedia installation... : ArticleFeedbackv5 updates
[19:38:58] Logged the message, Master
[19:51:07] sync done.
[20:01:58] New patchset: Lcarr; "Modifying nagios config files for icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2989
[20:02:09] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2989
[20:12:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:18:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.019 seconds
[20:26:52] PROBLEM - Host ms-be3 is DOWN: PING CRITICAL - Packet loss = 100%
[20:40:25] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.php 'changes for zero'
[20:40:29] Logged the message, Master
[20:40:41] !log preilly synchronized php-1.19/extensions/ZeroRatedMobileAccess/ZeroRatedMobileAccess.body.php 'changes for zero'
[20:40:45] Logged the message, Master
[20:42:10] PROBLEM - Host ps1-c1-sdtpa is DOWN: PING CRITICAL - Packet loss = 100%
[20:42:46] RECOVERY - Host ps1-c1-sdtpa is UP: PING OK - Packet loss = 0%, RTA = 2.65 ms
[20:52:40] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[20:52:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:57:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 9.951 seconds
[21:01:40] PROBLEM - Puppet freshness on owa3 is CRITICAL: Puppet has not run in the last 10 hours
[21:10:40] PROBLEM - Puppet freshness on owa1 is CRITICAL: Puppet has not run in the last 10 hours
[21:10:40] PROBLEM - Puppet freshness on owa2 is CRITICAL: Puppet has not run in the last 10 hours
[21:24:08] New patchset: Lcarr; "Modifying nagios config files for icinga" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2989
[21:24:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2989
[21:30:05] Ryan_Lane: you must be very busy. Can you add yourself a memo to review https://gerrit.wikimedia.org/r/#change,2863 ?
[21:30:15] it is supposed to remove ANSI characters from puppet linter
[21:30:54] New patchset: Reedy; "Change doxygen checkout of core to checkout/update via git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2990
[21:31:05] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2990
[21:31:14] hashar: that's what broke the lint checks before, remember?
[21:31:27] it's deprecated
[21:31:31] Ryan_Lane: yup . Because I used: puppet --color
[21:31:38] oh
[21:31:39] ok
[21:31:42] let's try it
[21:31:42] the --color had to be passed to the
[21:31:46] I see
[21:31:47] not directly to puppet
[21:32:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:33:21] New review: Demon; "Looks good, but let's not merge it until the core mirror is 100% in sync with svn. Then we can go ah..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2990
[21:36:14] New review: Hashar; "(no comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/2990
[21:36:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 2.585 seconds
[21:38:11] New patchset: Reedy; "Change doxygen checkout of core to checkout/update via git" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2990
[21:38:23] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2990
[21:38:56] New review: Reedy; "Yeah, indeed. No point breaking stuff prematurely? ;)" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2990
[21:39:49] Reedy: just amend your change 2990
[21:39:59] I just did
[21:40:04] Reedy: then we can just keep rebasing it for the next two weeks or
[21:40:26] What do you mean?
[21:40:29] <^demon> Well as soon as you give me the green light and we push core to its final home we can go ahead and change that over.
[21:41:06] we will do that after your holidays anyway
[21:41:52] nighty o/
[21:43:56] New review: Hashar; "(no comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/2990
[21:44:16] Reedy: you missed my second comment in the patch set :-D
[21:44:46] "I would prefer that you didn't submit this"
[21:44:55] That really feels passive aggressive :p
[21:45:54] yeah :-/
[21:46:20] I would prefer something like "can you please gently review your change because we are afraid it might produce blank pages on live site. Kthx? "
[21:46:49] or "Well done dude!!!! A little more change and it will be perfect" ;-D
[21:47:40] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2863
[21:47:42] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2863
[21:48:11] * hashar hopes he did not made a mistake
[21:48:59] we'll see :)
[21:49:40] let me rebase one of my old changes
[21:50:40] New patchset: Hashar; "redirect some missing Swift syslog messages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2820
[21:50:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2820
[21:51:04] hashar: seems to be working
[21:51:09] yeah :-)
[21:51:35] lemme test a failure
[21:51:41] New patchset: Ryan Lane; "Test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2991
[21:51:44] could do that in the test branch probably
[21:51:52] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/2991
[21:52:02] meh, I'm just going to abandon the change
[21:52:16] much better!
[21:52:38] looks good [21:52:52] next thing would be to make the path shorter but that would be for another time [21:53:15] if you are missing python and have some free time, you could get a look at https://bugzilla.wikimedia.org/34764 [21:53:26] Change abandoned: Ryan Lane; "But he said it's just a test, and he said it's just a test." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2991 [21:53:27] it is about gerrit running puppet linter on every repo [21:53:51] since it is in the general patchset-submitted hook and executed unconditionally for every repo :) [21:53:56] will probably poke it next week [21:54:06] yeah [21:54:10] I can likely handle it right now [21:54:50] I will probably have a jenkins job to lint PHP, probably for python too [21:55:26] New patchset: Ryan Lane; "Only run this for the puppet repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2992 [21:55:37] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2992 [21:55:49] hashar: review? [21:56:04] sounds too easy :-))) [21:56:11] I was looking at the set-verify hook [21:56:18] set-verify function [21:57:12] heh [21:57:20] well, the lint check will only run for one repo, now [21:57:36] we wanted a more generic solution, but if you guys are going to handle your stuff in jenkins, there's no point [21:58:25] well I will surely try to use jenkins [21:59:08] New review: Hashar; "Clean and easy way to fix that bug. Kudos!" [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/2992 [21:59:22] Ryan_Lane: looks good [21:59:42] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2992 [21:59:45] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2992 [21:59:55] I am probably going to get a PHP linting job in jenkins next week [22:00:37] I have closed bug. Thanks ! 
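Editor's note: the exchange above covers two fixes to the Gerrit lint hook: stripping ANSI colour codes from the puppet linter output, and running the check only for the puppet repo (change 2992, bug 34764). The sketch below illustrates both ideas in Python. It is not the code that was merged. The function names, the `LINTED_PROJECT` constant, and the project name `mediawiki/core` in the usage note are hypothetical; only `operations/puppet` and the two review messages come from the log.

```python
import re
from typing import Optional

# Matches ANSI CSI escape sequences such as "\x1b[0;31m" (colour codes).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

# Assumption: the single project whose patchsets should be linted.
LINTED_PROJECT = "operations/puppet"


def strip_ansi(text: str) -> str:
    """Remove ANSI escape sequences so the output is readable in a review comment."""
    return ANSI_CSI.sub("", text)


def should_lint(project: str) -> bool:
    """Return True only for the repository the lint check is meant for."""
    return project == LINTED_PROJECT


def review_message(project: str, lint_output: str, passed: bool) -> Optional[str]:
    """Build the review comment for a patchset, or None when the project is skipped."""
    if not should_lint(project):
        return None
    verdict = "Lint check passed." if passed else "Change did not pass lint check."
    details = strip_ansi(lint_output).strip()
    return verdict if not details else f"{verdict}\n{details}"
```

With this sketch, `review_message("mediawiki/core", "", True)` returns `None`, so a hook built on it would post nothing for other repos, which is the behaviour the bug asked for.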
[22:00:48] yw [22:00:55] I've been wanting to fix that for a while [22:01:23] Change abandoned: Ryan Lane; "..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2954 [22:04:21] I am heading to bed / weekend :-) have fun and see you all monday! [22:12:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:13:15] !log catrope synchronized php-1.19/includes/WebRequest.php 'r113411' [22:13:18] Logged the message, Master [22:13:25] !log catrope synchronized php-1.19/includes/Cdb.php 'r113411' [22:13:29] Logged the message, Master [22:13:36] !log catrope synchronized php-1.19/includes/MagicWord.php 'r113411' [22:13:39] Logged the message, Master [22:18:50] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 0.025 seconds [22:26:47] New patchset: Asher; "syncing up with https://github.com/asher/gdash" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2993 [22:26:58] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2993 [22:28:17] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2993 [22:28:20] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2993 [22:39:50] New patchset: Pyoungmeister; "script cleanup" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2994 [22:40:01] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2994 [22:40:45] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2994 [22:40:47] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2994 [22:40:50] !log aaron synchronized php-1.19/includes/filerepo/backend 'deployed r113413, r113414' [22:40:53] Logged the message, Master [22:46:37] New patchset: RobH; "added other locke items to oxygen as it is the locke replacement" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2995 [22:46:48] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/2995 [22:52:14] !log asher synchronized wmf-config/db.php 'pulling db18' [22:52:18] Logged the message, Master [22:52:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:54:20] New patchset: RobH; "damned typo added other locke items to oxygen as it is the locke replacement" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2995 [22:54:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2995 [22:56:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 4.585 seconds [23:01:11] New review: RobH; "looks right to me, though the puppet class names are a bit servername specific to locke. chekcing t..." 
[operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/2995 [23:01:13] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2995 [23:01:46] !log asher synchronized wmf-config/db.php 'returning db24 to service' [23:01:49] Logged the message, Master [23:02:02] RECOVERY - Host ms-be3 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms [23:07:44] RECOVERY - Host db1033 is UP: PING OK - Packet loss = 0%, RTA = 26.77 ms [23:07:47] !log tstarling synchronized php-1.19/extensions/ExtensionDistributor/svn-invoker.conf [23:07:50] Logged the message, Master [23:08:20] New patchset: Ryan Lane; "Add gluster to instances by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2996 [23:08:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2996 [23:08:39] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2996 [23:08:41] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2996 [23:11:56] PROBLEM - mysqld processes on db1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:13:53] RECOVERY - mysqld processes on db1033 is OK: PROCS OK: 1 process with command name mysqld [23:16:48] !log tstarling synchronizing Wikimedia installation... : [23:16:51] Logged the message, Master [23:17:20] * AaronSchulz waits for 15min :) [23:19:05] !log started changing the php symlink to 1.19 instead of 1.18, but then changed my mind and changed it back. 
[23:19:07] Logged the message, Master [23:19:35] PROBLEM - MySQL Replication Heartbeat on db1033 is CRITICAL: CRIT replication delay 23854 seconds [23:20:02] PROBLEM - MySQL Slave Running on db1033 is CRITICAL: CRIT replication Slave_IO_Running: No Slave_SQL_Running: No Last_Error: Rollback done for prepared transaction because its XID was not in the [23:21:32] RECOVERY - MySQL Replication Heartbeat on db1033 is OK: OK replication delay seconds [23:21:40] New patchset: Ryan Lane; "Adding base directory for labs automounts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2998 [23:21:51] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/2998 [23:21:58] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/2998 [23:21:59] RECOVERY - MySQL Slave Running on db1033 is OK: OK replication [23:22:00] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/2998 [23:25:53] PROBLEM - mysqld processes on db1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:29:02] sync done. [23:33:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:36:35] New patchset: Ryan Lane; "Adding gluster options for autofs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3000 [23:36:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/3000 [23:36:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 335 bytes in 1.925 seconds [23:37:09] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/3000 [23:37:12] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/3000 [23:37:32] !log awjrichards synchronizing Wikimedia installation... 
: Updating MobileFrontend per https://www.mediawiki.org/wiki/Extension:MobileFrontend/Deployments [23:37:35] Logged the message, Master [23:39:40] sync done. [23:45:41] RECOVERY - Host ms-be5 is UP: PING OK - Packet loss = 0%, RTA = 1.65 ms [23:55:02] !log tstarling synchronized wmf-config/InitialiseSettings.php 'per-wiki memory limit configuration, with extra memory for zh* for converter tables' [23:55:05] Logged the message, Master [23:56:06] !log tstarling synchronized wmf-config/CommonSettings.php 'per-wiki memory limit configuration, with extra memory for zh* for converter tables' [23:56:09] Logged the message, Master [23:57:37] hmm [23:57:50] !log awjrichards synchronized php-1.19/extensions/MobileFrontend/templates/ApplicationTemplate.php 'r113428' [23:57:53] Logged the message, Master