[00:05:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:08:32] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 00:08:23 UTC 2013
[00:08:52] RECOVERY - Puppet freshness on labstore1001 is OK: puppet ran at Tue Apr 9 00:08:46 UTC 2013
[00:09:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:09:22] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 00:09:13 UTC 2013
[00:10:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:10:42] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 00:10:33 UTC 2013
[00:11:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:11:22] PROBLEM - SSH on caesium is CRITICAL: Server answer:
[00:11:22] PROBLEM - SSH on cp1044 is CRITICAL: Server answer:
[00:12:02] PROBLEM - SSH on lvs1001 is CRITICAL: Server answer:
[00:13:22] RECOVERY - SSH on cp1044 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:14:13] PROBLEM - SSH on cp1043 is CRITICAL: Server answer:
[00:15:22] RECOVERY - SSH on caesium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:16:22] PROBLEM - SSH on cp1044 is CRITICAL: Server answer:
[00:17:22] RECOVERY - SSH on cp1044 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:18:12] RECOVERY - SSH on cp1043 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:18:22] PROBLEM - SSH on caesium is CRITICAL: Server answer:
[00:18:55] <^demon> !log jenkins is hung, looking into it.
[00:19:02] RECOVERY - SSH on lvs1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:19:02] Logged the message, Master
[00:21:22] RECOVERY - SSH on caesium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[00:26:07] <^demon> !log jenkins is back. java gc took awhile.
[00:26:14] Logged the message, Master
[00:26:54] New patchset: Odder; "(bug 46153) Subject namespace to thwikibooks, change wgSitename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58251
[00:32:52] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 00:32:50 UTC 2013
[00:33:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[00:40:47] Now I think of it, it might have been useful to ask them if they wanted to have this namespaces included in wgContentNamespaces and wgNamespacesToBeSearchedDefault
[00:41:13] s/this/these; I haven't seen anyone else asking those questions on the bugs I checked before committing though
[00:45:09] New patchset: Dzahn; "add snuggle.wm SSL cert per RT #4473" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58254
[00:47:24] New review: Dzahn; "renamed to .pem from .crt, from tridge in /data/ssl-certs/snuggle.wikimedia.org/" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/58254
[01:01:29] hi, i need to add a new namespace as part of the new extension for meta. The extension is not fully ready yet, but I would like to start creating some pages and don't think they should be in nz 0
[01:02:17] should i do them as regular NS 0, and later re-save them once the extension declares that namespace name, or is there another way?
[01:04:12] yurik: Just create the namespace?
[01:04:30] It's certainly not something you need to ask operations about
[01:04:36] Reedy, not sure how i can do that? I thought NS are defined in localsettings
[01:04:53] and someone said ops are the ones assigning numbers
[01:04:53] https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php
[01:05:06] No, they're not
[01:05:07] https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php
[01:05:09] bah
[01:05:14] Search for wgExtraNamespaces
[01:05:19] First entry there is metawiki
[01:05:41] any ops around? i need some quick query proxying to the OTRS DB
[01:05:54] so that i can fix the RT 4430, etc. query
[01:06:34] Reedy, I am modeling mine on Schema extension, and its not defined in there, but instead in the extension itself
[01:06:43] And?
[01:06:46] i guess i will define it here first, wait for it to deploy
[01:06:56] edit the pages, and once the extension is ready, remove here
[01:07:02] is that a good plan?
[01:07:08] That's what I'd do, yeah
[01:07:33] just put an accompanying comment saying what it's for etc
[01:07:49] mutante: can you chat about rt 822? or see my req ~10 lines back
[01:08:01] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[01:08:05] Reedy, thanks!
[01:40:20] New patchset: Yurik; "Merge "$wgEventLoggingFile: emery => vanadium"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58264
[01:40:54] Change abandoned: Yurik; "git review is misbehaving" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58264
[01:41:29] yurik: Probably not git-review's fault
[01:41:36] You most likely have a dirty master branch
[01:41:42] Try running git pull --rebase on master
[01:42:12] RoanKattouw, i just did a fresh clone, and typed git-review hoping it would initialize the hook. Instead it commited
[01:42:18] Oh, hehe
[01:42:20] i guess i am at fault :)
[01:42:21] That's git review -s
[01:42:38] heh, still, it shouldn't have commited something already in gerrit
[01:43:12] Oh, haha I see why
[01:43:15] Interesting failure mode
[01:43:19] That is a git-review bug yes
[01:45:52] New patchset: Yurik; "Added "Zero" & "Zero talk" namespaces to metawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265
[01:46:02] RoanKattouw, lol, i just switched to another branch, from fresh clone
[01:46:07] made a change, commited
[01:46:16] Yeah
[01:46:21] and it requested to do the "yes" to two changes :-P
[01:46:21] If you actually make a change first, then it works fine
[01:46:27] Oh, that
[01:46:29] See my wikitech-l post
[01:46:36] yeah, i saw something about it
[01:46:40] so i typed yes
[01:46:41] Annoying bug which I fixed in git-review a year ago, they decided to unfix it recently
[01:46:45] hope it didn't break anythnig? :)
[01:46:49] No
[01:46:52] Also
[01:47:00] If you're concerned, say "no", then run "git fetch gerrit", then try again
[01:47:10] But if gerrit-wm reports only one patchset, you're fine
[01:47:23] If it floods the channel or otherwise reports spurious things in your name, that's when you panic ;)
[01:47:34] RoanKattouw, https://gerrit.wikimedia.org/r/#/c/58265/
[01:47:43] Oh hah
[01:47:48] Based on an abandoned commit
[01:48:01] bleh!
[01:48:05] git rebase --onto origin/master HEAD~!
[01:48:06] Ahm
[01:48:08] HEAD~1
[01:48:34] git review -d 58265
[01:48:46] Yup, then that rebase, then git review
[01:49:15] New patchset: Yurik; "Added "Zero" & "Zero talk" namespaces to metawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265
[01:49:27] fixed
[01:49:28] bleh!!!
[01:50:03] RoanKattouw, thanks :)
[01:50:06] Hm. Funny whitespace error reminder :)
[01:54:24] New patchset: Yurik; "Added "Zero" & "Zero talk" namespaces to metawiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265
[01:55:26] odder, thx, fixed
[02:03:05] PROBLEM - Disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 679 MB (3% inode=87%):
[02:06:43] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours
[02:07:51] yurik: Are you able to deploy that yourself? Or do you want me to do it for you?
[02:17:23] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours
[02:18:54] !log LocalisationUpdate completed (1.22wmf1) at Tue Apr 9 02:18:53 UTC 2013
[02:19:08] Logged the message, Master
[02:20:23] PROBLEM - Apache HTTP on mw1057 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:33] PROBLEM - Apache HTTP on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:33] PROBLEM - MySQL Slave Delay on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
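The plan yurik and Reedy settle on above (declare the namespace in the shared config now, remove it once the extension registers it itself) would take roughly this shape in InitialiseSettings.php. This is only an illustrative sketch: the array layout follows the wgExtraNamespaces pattern Reedy points at, and the namespace IDs 100/101 are placeholders, not the IDs in the actual committed change.

```php
// Hypothetical wgExtraNamespaces-style entry for metawiki.
// Convention: even ID = subject namespace, odd ID = its talk namespace.
// Temporary until the extension declares the namespace itself, with an
// accompanying comment as suggested in the discussion above.
'wgExtraNamespaces' => array(
	'metawiki' => array(
		100 => 'Zero',      // placeholder ID
		101 => 'Zero_talk', // placeholder ID
	),
),
```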
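RoanKattouw's recovery recipe above (git review -d to fetch the change, git rebase --onto origin/master HEAD~1 to replay only the tip commit without its abandoned parent, then git review to resubmit) hinges on the --onto form of rebase. A runnable sketch in a throwaway local repo, with hypothetical commits standing in for the abandoned change and the change built on top of it:

```shell
set -e
# Throwaway repo: "abandoned" stands in for the abandoned Gerrit change,
# "wanted" for the new change accidentally committed on top of it.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git checkout -qb master 2>/dev/null || true   # normalize branch name
git config user.email demo@example.invalid
git config user.name demo
echo base > f && git add f && git commit -qm base
git checkout -qb work
echo abandoned > f && git commit -qam abandoned
echo wanted > g && git add g && git commit -qm wanted
# Replay only the tip commit (the range HEAD~1..HEAD) onto master,
# dropping the abandoned parent -- the "git rebase --onto" step above.
git rebase --onto master HEAD~1 >/dev/null 2>&1
git log --format=%s   # prints: wanted, then base
```

After the rebase, "wanted" sits directly on "base"; in the real flow the rewritten commit is then pushed back with git review.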
[02:20:33] PROBLEM - Apache HTTP on mw1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:43] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:43] PROBLEM - Apache HTTP on mw1051 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:43] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:43] PROBLEM - Apache HTTP on mw1186 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:43] PROBLEM - Apache HTTP on mw1028 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:44] PROBLEM - Apache HTTP on mw1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:46] (Cannot contact the database server: Unknown error (10.64.16.6))
[02:20:49] ^ s1 master
[02:20:53] PROBLEM - Apache HTTP on mw1178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:53] PROBLEM - Apache HTTP on mw1086 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:53] PROBLEM - Apache HTTP on mw1110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:53] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:53] PROBLEM - Apache HTTP on mw1091 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:54] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:54] PROBLEM - MySQL Slave Running on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:20:55] PROBLEM - Apache HTTP on mw1171 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:55] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:56] PROBLEM - Apache HTTP on mw1078 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:56] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:57] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:57] PROBLEM - Apache HTTP on mw1095 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:58] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:58] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:59] notpeter: TimStarling^
[02:20:59] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:20:59] PROBLEM - Apache HTTP on mw1104 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:00] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:00] PROBLEM - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 1600 bytes in 2.221 second response time
[02:21:01] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:01] PROBLEM - Apache HTTP on mw1172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:02] PROBLEM - MySQL Recent Restart on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:21:02] PROBLEM - Apache HTTP on mw1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:03] PROBLEM - Apache HTTP on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:21:03] PROBLEM - Apache HTTP on mw1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:02] This outage was caused by a server monkey or kitten.
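Each of the alerts above comes from an HTTP probe with a hard 10-second socket timeout: any response is OK, a connect failure or a response slower than the limit is CRITICAL with exit status 2. A minimal curl-based sketch of that behaviour (check_http_lite is a hypothetical name; this is not the real Icinga check_http plugin):

```shell
# Minimal stand-in for the HTTP probes generating the OK/CRITICAL lines
# above. Exit 0 and "HTTP OK" on any response; exit 2 and a CRITICAL
# message when the connection fails or exceeds the time limit.
check_http_lite() {
  url=$1
  limit=${2:-10}
  if code=$(curl -s -m "$limit" -o /dev/null -w '%{http_code}' "$url" 2>/dev/null); then
    echo "HTTP OK: status $code"
    return 0
  fi
  echo "CRITICAL - connect failed or no response within ${limit} seconds"
  return 2
}

check_http_lite http://127.0.0.1:1/ 1 || true   # nothing listens on port 1
```

Exit status 2 matches the Nagios/Icinga convention for CRITICAL, which is why the bot renders these probes as PROBLEM lines.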
[02:22:07] ^___^
[02:22:13] Those kittens
[02:22:13] PROBLEM - Apache HTTP on mw1150 is CRITICAL: Connection timed out
[02:22:13] RECOVERY - Apache HTTP on mw1093 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.556 second response time
[02:22:13] RECOVERY - Apache HTTP on mw1035 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.798 second response time
[02:22:13] RECOVERY - Apache HTTP on mw1096 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.930 second response time
[02:22:13] PROBLEM - Apache HTTP on mw1151 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:13] PROBLEM - Apache HTTP on mw1063 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:22:23] RECOVERY - Apache HTTP on mw1049 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.679 second response time
[02:22:33] RECOVERY - Apache HTTP on mw1054 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 0.059 second response time
[02:22:33] RECOVERY - Apache HTTP on mw1186 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[02:22:43] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 0.060 second response time
[02:22:43] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.044 second response time
[02:22:43] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 0.061 second response time
[02:22:43] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[02:22:44] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.504 second response time
[02:22:44] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 1.317 second response time
[02:22:44] RECOVERY - Apache HTTP on mw1086 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.704 second response time
[02:22:44] RECOVERY - Apache HTTP on mw1033 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[02:22:45] RECOVERY - Apache HTTP on mw1030 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time
[02:22:45] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time
[02:22:46] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.659 second response time
[02:22:53] RECOVERY - Apache HTTP on mw1104 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 6.022 second response time
[02:22:53] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 9.129 second response time
[02:22:53] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.254 second response time
[02:22:53] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 7.585 second response time
[02:22:53] RECOVERY - MySQL Recent Restart on db1017 is OK: OK seconds since restart
[02:22:53] RECOVERY - MySQL Slave Running on db1017 is OK: OK replication
[02:22:54] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 2.941 second response time
[02:22:54] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time
[02:22:55] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.053 second response time
[02:22:55] RECOVERY - Apache HTTP on mw1062 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[02:22:56] RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time
[02:22:56] RECOVERY - Apache HTTP on mw1079 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time
[02:22:57] RECOVERY - Apache HTTP on mw1032 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[02:22:57] RECOVERY - Apache HTTP on mw1048 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time
[02:22:58] RECOVERY - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63409 bytes in 0.514 second response time
[02:22:58] RECOVERY - Apache HTTP on mw1082 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.610 second response time
[02:22:59] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63126 bytes in 7.974 second response time
[02:22:59] RECOVERY - Apache HTTP on mw1064 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 8.664 second response time
[02:23:00] RECOVERY - Apache HTTP on mw1055 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.362 second response time
[02:23:00] RECOVERY - Apache HTTP on mw1043 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.534 second response time
[02:23:01] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.441 second response time
[02:23:01] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 4.588 second response time
[02:23:02] RECOVERY - MySQL Idle Transactions on db1017 is OK: OK longest blocking idle transaction sleeps for seconds
[02:23:02] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time
[02:23:03] RECOVERY - Apache HTTP on mw1112 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[02:23:03] RECOVERY - Apache HTTP on mw1213 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[02:23:27] does it do that a lot? :)
[02:23:45] When stuff is broken..
[02:24:33] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.215 second response time
[02:24:57] * moogsi prods icinga-wm
[02:25:05] New review: Tim Starling; "What's wrong with #wikimedia-dev? Why does it need to die, and why can't these bots be moved to it?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752
[02:25:16] I think it's shut up because everything is back again..
[02:25:29] jolly good
[02:27:40] New review: Peachey88; "> But more importantly, from my experience lurking and replying on #mediawiki, it seems like the bot..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752
[02:30:59] !log LocalisationUpdate completed (1.21wmf12) at Tue Apr 9 02:30:59 UTC 2013
[02:31:07] Logged the message, Master
[02:32:53] PROBLEM - Apache HTTP on mw1110 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:32:53] PROBLEM - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 1595 bytes in 2.218 second response time
[02:32:56] PROBLEM - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 1600 bytes in 2.294 second response time
[02:33:03] PROBLEM - Apache HTTP on mw1167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:03] PROBLEM - Apache HTTP on mw1173 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:03] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:03] PROBLEM - Apache HTTP on mw1052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:03] PROBLEM - Apache HTTP on mw1220 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:03] PROBLEM - Apache HTTP on mw1219 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:04] PROBLEM - Apache HTTP on mw1217 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:04] PROBLEM - Apache HTTP on mw1092 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:04] PROBLEM - Apache HTTP on mw1211 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:05] PROBLEM - MySQL Idle Transactions on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:33:13] PROBLEM - Apache HTTP on mw1164 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1039 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1105 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:13] PROBLEM - Apache HTTP on mw1037 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:14] PROBLEM - Apache HTTP on mw1026 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:14] PROBLEM - Apache HTTP on mw1093 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:15] PROBLEM - Apache HTTP on mw1181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:15] PROBLEM - MySQL Replication Heartbeat on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:33:16] PROBLEM - Apache HTTP on mw1050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:23] PROBLEM - Apache HTTP on mw1175 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:33] PROBLEM - Apache HTTP on mw1094 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:33] PROBLEM - Apache HTTP on mw1061 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:33] PROBLEM - MySQL Slave Delay on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
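Floods like the one above are easier to read once folded into per-service outage windows. A small awk sketch (fold_outages is an assumed helper name, not an existing tool) that pairs each PROBLEM with the next RECOVERY for the same "check on host" key, over the bracketed-timestamp line format used in this channel:

```shell
# Pair lines of the form
#   [HH:MM:SS] PROBLEM - <check> on <host> is CRITICAL: ...
#   [HH:MM:SS] RECOVERY - <check> on <host> is OK: ...
# into "<check> on <host>: start -> end" windows. Sketch only: repeated
# flaps print as separate windows, unresolved PROBLEMs are dropped.
fold_outages() {
  awk '
    /^\[[0-9:]+\] (PROBLEM|RECOVERY) - / {
      ts = substr($1, 2, length($1) - 2)   # strip the [ ] brackets
      state = $2
      key = $0
      sub(/^\[[^]]*\] (PROBLEM|RECOVERY) - /, "", key)
      sub(/ is .*/, "", key)               # keep "<check> on <host>"
      if (state == "PROBLEM" && !(key in start))
        start[key] = ts
      else if (state == "RECOVERY" && (key in start)) {
        print key ": " start[key] " -> " ts
        delete start[key]
      }
    }
  '
}
```

Piping the log through fold_outages would reduce the hundreds of bot lines above to one summary line per affected service.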
[02:33:33] PROBLEM - Apache HTTP on mw1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:43] PROBLEM - Apache HTTP on mw1098 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:43] PROBLEM - Apache HTTP on mw1051 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:43] PROBLEM - Apache HTTP on mw1088 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:43] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 MediaWiki exception - 1600 bytes in 2.351 second response time
[02:33:46] PROBLEM - Apache HTTP on mw1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - Apache HTTP on mw1041 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - Apache HTTP on mw1059 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - MySQL Slave Running on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:33:53] PROBLEM - Apache HTTP on mw1097 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - Apache HTTP on mw1038 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - Apache HTTP on mw1113 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:53] PROBLEM - Apache HTTP on mw1184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:54] PROBLEM - Apache HTTP on mw1067 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:54] PROBLEM - Apache HTTP on mw1168 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:55] PROBLEM - Apache HTTP on mw1078 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:55] PROBLEM - Apache HTTP on mw1106 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:56] PROBLEM - Apache HTTP on mw1095 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:56] PROBLEM - LVS HTTP IPv4 on m.wikimedia.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - pattern not found - 863 bytes in 0.002 second response time
[02:33:57] PROBLEM - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:58] PROBLEM - Apache HTTP on mw1064 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:58] PROBLEM - Apache HTTP on mw1109 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:59] PROBLEM - Apache HTTP on mw1172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:59] PROBLEM - Apache HTTP on mw1176 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:33:59] PROBLEM - MySQL Recent Restart on db1017 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[02:34:00] PROBLEM - Apache HTTP on mw1033 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:01] PROBLEM - Apache HTTP on mw1069 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1183 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1099 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:03] PROBLEM - Apache HTTP on mw1163 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:04] PROBLEM - Apache HTTP on mw1101 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:04] PROBLEM - Apache HTTP on mw1102 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:05] PROBLEM - Apache HTTP on mw1071 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:05] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:06] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:06] PROBLEM - Apache HTTP on mw1077 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:07] PROBLEM - Apache HTTP on mw1180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:07] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:08] PROBLEM - Apache HTTP on mw1174 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:09] RECOVERY - MySQL Idle Transactions on db1017 is OK: OK longest blocking idle transaction sleeps for seconds
[02:34:09] RECOVERY - MySQL Replication Heartbeat on db1017 is OK: OK replication delay seconds
[02:34:09] PROBLEM - Apache HTTP on mw1212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:10] PROBLEM - Apache HTTP on mw1108 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:10] PROBLEM - Apache HTTP on mw1044 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:11] PROBLEM - Apache HTTP on mw1213 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:11] PROBLEM - Apache HTTP on mw1210 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:12] PROBLEM - Apache HTTP on mw1177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:12] PROBLEM - Apache HTTP on mw1214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:34:13] RECOVERY - Apache HTTP on mw1056 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 7.304 second response time
[02:34:23] RECOVERY - Apache HTTP on mw1061 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.042 second response time
[02:34:23] RECOVERY - Apache HTTP on mw1094 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[02:34:23] RECOVERY - MySQL Slave Delay on db1017 is OK: OK replication delay seconds
[02:34:23] RECOVERY - Apache HTTP on mw1075 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 5.652 second response time
[02:34:33] RECOVERY - Apache HTTP on mw1098 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.959 second response time
[02:34:33] RECOVERY - Apache HTTP on mw1051 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.628 second response time
[02:34:33] RECOVERY - Apache HTTP on mw1088 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time
[02:34:33] RECOVERY - Apache HTTP on mw1022 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.054 second response time
[02:34:43] RECOVERY - Apache HTTP on mw1110 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.065 second response time
[02:34:43] RECOVERY - Apache HTTP on mw1041 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.045 second response time
[02:34:43] RECOVERY - Apache HTTP on mw1059 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time
[02:34:43] RECOVERY - MySQL Slave Running on db1017 is OK: OK replication
[02:34:43] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63126 bytes in 0.216 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1113 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1184 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1067 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1097 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 0.065 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1106 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time
[02:34:47] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time
[02:34:48] RECOVERY - Apache HTTP on mw1078 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[02:34:48] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[02:34:49] RECOVERY - LVS HTTP IPv4 on appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63126 bytes in 0.202 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1039 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1105 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.069 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1164 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1037 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time
[02:35:02] RECOVERY - Apache HTTP on mw1162 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[02:35:03] RECOVERY - Apache HTTP on mw1093 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.081 second response time
[02:35:12] RECOVERY - MySQL Recent Restart on db1017 is OK: OK seconds since restart
[02:35:12] RECOVERY - Apache HTTP on mw1181 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.048 second response time
[02:35:12] RECOVERY - Apache HTTP on mw1050 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 742 bytes in 0.063 second response time
[02:35:12] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.058 second response time
[02:35:12] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time
[02:35:12] RECOVERY - Apache HTTP on mw1033 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time
[02:35:12] RECOVERY - Apache HTTP on mw1109 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time
[02:35:13] RECOVERY - Apache HTTP on mw1092 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[02:35:13] RECOVERY - Apache HTTP on mw1101 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time
[02:35:14] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.052 second response time
[02:35:14] RECOVERY - Apache HTTP on mw1177 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time
[02:35:15] RECOVERY - Apache HTTP on mw1210 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[02:35:15] RECOVERY - Apache HTTP on mw1175 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time
[02:35:22] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[02:35:22] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time
[02:35:22] RECOVERY - Apache HTTP on mw1187 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[02:35:22] RECOVERY - Apache HTTP on mw1212 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.053 second response time
[02:35:22] RECOVERY - Apache HTTP on mw1172 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[02:35:22] RECOVERY - LVS HTTP IPv4 on rendering.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63409 bytes in 0.528 second response time
[02:35:32] RECOVERY - LVS HTTP IPv4 on m.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 20652 bytes in 0.005 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1064 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.092 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1174 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1183 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1176 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time
[02:35:34] RECOVERY - Apache HTTP on mw1220 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[02:35:42] RECOVERY - Apache HTTP on mw1071 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time
[02:35:42] RECOVERY - Apache HTTP on mw1102 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time
[02:35:42] RECOVERY - Apache HTTP on mw1213 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time
[02:35:42] RECOVERY - Apache HTTP on mw1211 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.055 second response time
[02:35:52] RECOVERY - LVS HTTP IPv4 on appservers.svc.pmtpa.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 63410 bytes in 0.504 second response time
[02:35:54] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time
[02:35:54] RECOVERY - Apache HTTP on mw1052 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.049 second response time
[02:35:54] RECOVERY - Apache HTTP on mw1077 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time
[02:35:54] RECOVERY - Apache HTTP on mw1044 is OK: HTTP OK: HTTP/1.1
301 Moved Permanently - 747 bytes in 0.075 second response time [02:35:54] RECOVERY - Apache HTTP on mw1108 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.063 second response time [02:35:54] RECOVERY - Apache HTTP on mw1180 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.060 second response time [02:35:54] RECOVERY - Apache HTTP on mw1219 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.056 second response time [02:35:55] RECOVERY - Apache HTTP on mw1217 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.067 second response time [02:36:02] RECOVERY - Apache HTTP on mw1042 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [02:36:02] RECOVERY - Apache HTTP on mw1069 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.062 second response time [02:36:02] RECOVERY - Apache HTTP on mw1099 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.066 second response time [02:36:02] RECOVERY - Apache HTTP on mw1173 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.059 second response time [02:36:02] RECOVERY - Apache HTTP on mw1214 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.051 second response time [02:42:16] New review: MZMcBride; "> What's wrong with #wikimedia-dev? Why does it need to die, and why can't these bots be moved to it?" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752 [02:46:35] New review: Krinkle; "As pointed out on bug 46322, moving them out of #mediawiki is good because it is supposed to be a su..." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/57752 [02:49:30] New review: Krinkle; "And creating more channels is not a solution." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57752 [02:51:39] TimStarling: I'm wondering why you always rebase a change in mediawiki/core before merging. 
[02:52:00] so that the parent of the change will be correct [02:52:08] so that it won't look like a total dog's breakfast in gitk [02:52:30] I figured (I already knew the answer) [02:52:39] However seems like a losing battle since nobody else does it [02:52:52] if that's what we want (and I'm with you on this one) we can simply configure gerrit to do this for you [02:52:58] e.g. cherry-pick instead of merge on submission. [02:53:12] PROBLEM - Squid on brewster is CRITICAL: Connection refused [02:53:15] but when manually rebasing it creates more notifications, more patchsets and runs the tests again 2 times. [02:54:12] especially the latter causes it to take like 10 minutes to merge. In which case you might even shoot yourself in the foot. If you approve two changes within 10 minutes, one will still be a merge. [02:56:42] RECOVERY - mysqld processes on db1054 is OK: PROCS OK: 3 processes with command name mysqld [02:57:08] TimStarling: btw, I was wondering if you could take a look at some changes I have for lint.php https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/tools/code-utils+branch:master+topic:lint,n,z [02:57:30] When those are merged I can change the mw-core-phplint job to use that instead (it is still using php -l, which makes it slow) [02:57:59] I'm a bit busy at the moment working out why the site went down 15 minutes ago [02:58:10] right now it can't as hashar only wants it to test changed files, but that requires it accept a variadic list of parameters including files. [02:58:12] OK [03:07:56] Number one sign that your filesystem is BIG: mkfs takes forever to complete. [03:08:12] heh [03:08:30] what's the latest incarnation in layers/layout/etc.? [03:08:42] oooh, it's Rob's week [03:08:45] data = bsize=4096 blocks=9764192256, imaxpct=5 [03:08:51] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [03:09:21] 09 01:05:41 < jeremyb_> any ops around? 
i need some quick query proxying to the OTRS DB [03:09:31] ^^^ could still use that... [03:10:10] jeremyb_: I'm not entirely sure I'd even know where the OTRS DB /is/, let alone its credentials. I can try to look it up if you want. [03:10:16] jeremyb_: probably waiting till they work out what crashed the site would be a ideal solution [03:10:34] * Coren is like ops-lite: all the bits, 3% of the ops knowledge. :-) [03:10:36] p858snake|l: i didn't realize that was still ongoing [03:10:57] Coren: let me see [03:10:59] Coren: db48 and replicated on 49 [03:11:09] p858snake|l: a little more complicated than that [03:11:50] Coren: i'd say use db48 per https://rt.wikimedia.org/Ticket/Display.html?id=4430#txn-108884 [03:12:44] jeremyb_: I can do that then. Got a RT to the ticket? [03:12:52] s/ticket/query/ [03:13:00] no ticket yet... was going to just throw in IRC [03:13:04] * jeremyb_ was typing [03:13:19] That also works, but is iffier for posting the result. :-) [03:13:43] Coren: i want to run SELECT * FROM ticket where tn='ticket id goes here'; for 1 specific ticket (doesn't really matter which one but i'll specify for you i guess). and then i'll change it in the web interface and then you rerun the query again [03:13:52] and we compare to see what my change did [03:14:00] Ah, kk [03:14:17] Single row results are IRC-compatible. :-) [03:14:27] right :) [03:14:40] could go in /msg though [03:15:25] * Coren is standing by. [03:15:38] Coren: 2013040810009632 seems like as good as any [03:15:51] it's spam about a music station in denver :-P [03:16:02] err, s/music/radio/ [03:16:41] Oh, eww. There are a LOT of columns. [03:17:17] hehe [03:18:41] hah [03:18:56] or not wise? 
[03:31:24] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [03:32:56] So, /dev/mapper/store-nfs0 37T 34M 37T 1% /srv [03:34:10] Coren: now i did make a ticket, https://rt.wikimedia.org/Ticket/Display.html?id=4913 [03:34:16] Reedy, what do you mean by deploy it myself? i don't think i have the rights yet [03:34:28] at some point i might need them, please deploy [03:34:33] Coren: i like those df #s :D :D [03:35:36] TimStarling, if you have a moment, https://gerrit.wikimedia.org/r/#/c/58265/ [03:35:46] simple addition of a meta namespace [03:37:24] jeremyb_: {{done}} [03:40:04] Coren: woot, 2008011610000328 is different between the 2 tickets [03:40:16] * jeremyb_ is happy [03:40:27] (between the RTs) [03:40:39] PROBLEM - Disk space on analytics1010 is CRITICAL: DISK CRITICAL - free space: / 712 MB (3% inode=87%): [03:48:14] yurik: you don't need Tim for that, actually. You can merge it yourself, since you have +2 in core. But as courtesy to other deployers (who probably do not want to be the unwitting deployers of your changes), you should wait for a deployment window so you can merge and the change and then sync it afterwards. [03:48:38] In the meantime, you can add the namespaces here: http://www.mediawiki.org/wiki/Extension_namespace_registration [03:50:33] And you should also let meta admins know and solicit their approval on http://meta.wikimedia.org/wiki/Meta:Babel [03:52:23] New review: Peachey88; "I thought the desire was not to have more configuration for extensions done on/in on wikipages?" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265 [03:56:20] New patchset: Krinkle; "docroot/noc: Update and re-run createTxtFileSymlinks.sh" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58267 [03:57:11] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58267 [03:57:24] ori-l, i can't +2 in any ops project [03:57:28] only core & extensions [04:01:31] New patchset: Krinkle; "docroot/noc: Fix path level depth" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58268 [04:02:03] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58268 [04:04:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:06:46] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [04:08:46] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:08:38 UTC 2013 [04:09:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:09:46] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:09:40 UTC 2013 [04:10:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:10:37] yurik, how about now? (Don't merge it, just check.) [04:10:46] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:10:36 UTC 2013 [04:11:13] ori-l, yei! 
[04:11:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:11:26] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:11:25 UTC 2013 [04:12:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:12:27] yurik: ok, you weren't in 'wmf-deployment', but as a staff member with +2 in core, you ought to be [04:12:46] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:12:45 UTC 2013 [04:12:54] thanks :) [04:13:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:13:33] the points about waiting for a window and notifying stewards remains [04:13:52] now, i will post on meta and notify [04:14:08] i don't want to deploy any extensions or anything else just yet [04:14:40] just a setting so that when i create a new meta page, it will be in the proper namespace, not NS 0 [04:15:23] why meta, btw? [04:16:36] you might have an easier time doing this on another wiki, like foundationwiki [04:17:28] we went with metawiki for eventlogging because data analysts and researchers were already in the habit of describing their data models there [04:19:06] ori-l: he can't actually deploy it without being a mortal, right? [04:20:20] he's not? [04:20:21] ...he's not. [04:20:36] ori-l, any wiki will work actually [04:21:01] foundationwiki might be good too, but isn't it readonly? [04:21:15] readonly?
no [04:21:17] the goal (eventually) is to have telcos edit settings themselves [04:21:36] then not on foundationwiki for now [04:22:44] yurik: i removed you from the deployment group, sorry :-/ i feel crummy about that now, but jeremyb_ is right [04:23:10] hehe [04:23:14] anyways, i think you just need to have tomasz or whomever request that you be added to the deployment group [04:23:20] to mortals, i mean [04:23:35] it might be approved already. idk what his RT says [04:23:51] i'm not exactly sure why we need to have a community discussion about these private pages though... only admins & possibly special telco group should be able to edit [04:24:06] i seem to need more sleep than yurik though [04:24:12] heh [04:24:15] meta-pedians dont like you clogging up their namespaces [04:24:16] its 12:30am here [04:24:36] heh, but we wouldn't want a separate wiki set up for this :) [04:24:49] Susan: ^ [04:25:36] as described in http://www.mediawiki.org/wiki/Requests_for_comment/Zero_Architecture#Partner_Configuration [04:25:50] each Zero NS page will be a json blob [04:25:58] just like ori-l 's schema [04:26:08] Why not use Schema? [04:26:17] because the structure is very different [04:26:29] How so? [04:26:41] these are settings for ZERO banners [04:26:54] And? [04:27:02] Susan, are you looking at the spec? [04:27:08] just so we are on the same page :) [04:27:39] I think if you want a namespace for JSON blobs on Meta-Wiki, it would make sense to use the one that's already there. [04:27:50] We just finished cleaning up the mess of namespaces on Meta-Wiki. [04:27:55] We really don't need to rebuild it. [04:28:16] there are several reasons - just because its json, shouldn't be bunched up together. [04:28:28] a new namespace will give several very important abilities: [04:28:33] It's not bunched up together, it's separated a high level using a namespace. 
[04:29:09] you couldn't use Schema [04:29:26] because Schema is stricter than JSON; it's JSON Schema [04:29:32] it's not for data; it's for data models [04:29:42] New review: MZMcBride; "You should discuss this with the Meta-Wiki community first." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265 [04:29:46] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [04:29:51] Susan, with a new namespace we could have a custom security group, we could have custom validators for IP ranges overlapping other IP ranges, etc [04:30:08] we could have very strict general validation for the settings [04:30:31] preventing telcos really goofing up settings that affect billions, not millions :) [04:30:45] (zero has the goal of reaching every phone on the planet :)) [04:30:51] far reaching goal, i agree [04:30:57] now you really sound like dr. evil [04:31:25] You should talk to the Meta-Wiki community. [04:32:06] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [04:32:10] yes, thx, will do [04:32:25] Does this type of configuration change very often? [04:32:30] ori-l, muahahahaha [04:32:52] considering that us govt finally agreed to install laser on my yacht... [04:32:56] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 04:32:46 UTC 2013 [04:33:25] > you couldn't use Schema [04:33:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [04:33:48] I'm not sure this is true. [04:34:51] The MediaWiki namespace is already pretty badly abused on Meta-Wiki. Maybe it makes sense to throw it in there. [04:35:05] That's what CentralNotice uses. [04:35:07] For now. [04:35:39] Susan, it can't be a free-form namespace, because it has to be very rigorously tested (by automated tools, not humans) [04:35:45] and prevent saving of incorrect data [04:36:08] It sounds very similar to the Schema namespace. 
[04:36:38] Though if you're so worried about people fucking it up, I'm not sure why you'd not just do it in PHP. [04:36:47] That's why I was asking about how often the configuration changes. [04:36:58] Susan, sorry missed the question [04:37:24] php is not good because changes happen all the time, and very often a new partner comes and wants to begin testing within days, if not hours [04:37:37] with the new system we can get them on the way almost immediatelly [04:37:48] Actual input validation would come from an actual input form, of course. [04:38:03] possibly, but not only [04:38:24] Susan, it looks like schema NS because both require validation. But this is fundamentally different validations, different logic [04:38:24] No, you can hack something up with a namespace, I'm just not sure what the advantage is. [04:38:55] not sure what you mean [04:39:27] If it's a configuration interface that you want to have strict access controls and validation for, I think that would suggest putting it in the Special namespace. [04:39:32] And using a form. [04:39:37] An HTMLForm. [04:39:46] ....why? [04:39:49] that's exactly how i plan it [04:39:58] ori-l: Why would you use a form for inputs? [04:39:59] in the sense of - having a form that simplifies editing it [04:40:20] Susan: sorry, I forgot you were a troll for a moment, never mind. [04:40:38] ori-l: Whatever. 
[04:40:59] the problem though, is that it will be much more work, whereas initially i plan to have very limited exposure to it until the model proovs to be working [04:41:16] so the validation logic is already in place (i have all the code for it) [04:41:24] which i had to do in order to have it working [04:41:47] in any case, susan, i still fail to see what your objections are [04:41:55] i do think we should have everything organized [04:42:01] and i try to achieve that [04:42:36] this is how current settings system look like: http://en.wikipedia.org/wiki/MediaWiki:Zero-rated-mobile-access-carrier-options [04:42:40] and http://en.wikipedia.org/wiki/MediaWiki:Zero-rated-mobile-access-language-options-for-carrier [04:43:15] this is unacceptable as it becomes unmanageable very easily, not strongly typed, and caching cannot be easily reset [04:43:27] so changing it usually requires several days just for the cache to clear [04:43:42] lastly, we can't have telcos edit that [04:43:46] New patchset: Tim Starling; "Add UDP log to bits" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58269 [04:43:47] they are sure to break it [04:44:10] Okay. [04:44:16] Well, you should propose it at Meta-Wiki. [04:45:43] how is storing it on a mediawiki instance better than say, a proper php configuration file stored in git? [04:45:46] Namespaces still don't really scale well and I'm not sure people at Meta-Wiki want more of them. [04:47:41] p858snake|l, storing config file in git does not get automatically validated (only by php), and it takes considerable effort to deploy it [04:48:07] plus i wouldn't want hundreds of telcos going to GIT to change a few settings [04:49:11] You want hundred of telcos editing Meta-Wiki? [04:49:17] hundreds, rather. [04:49:38] I doubt you'd get either, but both seem unpleasant. 
[04:49:46] i want them to edit the few pages they will have access to, yes [04:50:22] instead of building a dedicated portal, which you would agree would take considerable engineering efforts that could be spend on fixing much more important problems [04:50:44] yurik: I don't know too much about this, but it all seems complicated and I'm not sure all the complication is needed. [04:51:28] The distinction between providers is some HTML banner on the pages? [04:51:29] the goal of this project is to make it simple [04:51:29] it's actually a really good solution [04:51:35] and i'm sure the deplyment setup could be changed for it as well [04:51:36] just not sure meta is the best wiki [04:51:39] Susan, and tons of settings [04:51:42] but it's elegant and simple. [04:51:50] yurik: The settings are the part I don't understand. [04:52:12] From a high level, you'd set up zero.wikipedia.org and then carriers would choose to whitelist it or not. [04:52:13] well, telcos allow different subset of languages for one [04:52:23] images vs no images [04:52:29] links to their home site [04:52:41] And we build out that support for them? [04:52:44] which might be free or may need to be verified [04:52:48] we already have that support [04:52:54] built * [04:53:15] if you navigate to zero wiki, there is code that does all that [04:53:20] depending on which wiki you come from [04:53:25] i mean, which telco [04:53:51] The overall architecture seems complicated when the end result is giving away the content for free. ;-) [04:54:04] Which we already do on the primary domains. [04:54:13] I dunno. [04:54:17] I'm going to sleep. [04:54:38] Susan, please read all the requirements later, and you will get the complexity [04:54:55] https://www.mediawiki.org/wiki/Requests_for_comment/Zero_Architecture [04:54:58] Are there other pages to read? [04:55:10] I understand that a complex system has been built up.
I understand that. [04:55:17] I'm just still not sure I understand why. [04:56:28] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [05:00:28] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [05:05:55] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [05:15:22] New patchset: Krinkle; "noc: Clean up conf viewer" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58272 [05:15:54] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58272 [05:33:15] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 05:33:07 UTC 2013 [05:33:55] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [05:41:39] !log on mw1149: enabled apache access log [05:41:46] Logged the message, Master [05:46:24] New review: Yurik; "Proposed at http://meta.wikimedia.org/wiki/Meta:Babel#Zero_configuration_namespace_coming_to_meta_ne..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58265 [05:57:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:58:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.128 second response time [06:01:22] RECOVERY - Squid on brewster is OK: TCP OK - 0.027 second response time on port 8080 [06:02:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:03:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [06:07:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [06:27:59] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 06:27:53 UTC 2013 [06:28:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [06:32:49] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 06:32:48 UTC 2013 [06:32:59] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [06:47:08] RECOVERY - udp2log log age for aft on emery is OK: OK: all log files active [06:56:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:58:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [07:01:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:02:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [07:07:37] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [07:49:08] PROBLEM - udp2log log age for aft on emery is CRITICAL: CRITICAL: log files 
/var/log/squid/aft/clicktracking.log, have not been written in a critical amount of time. For most logs, this is 4 hours. For slow logs, this is 4 days. [07:50:58] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:58] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [07:50:58] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [07:56:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:57:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [08:00:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:00:53] New patchset: Nikerabbit; "(bug 43359) Enable WebFonts on Javanese projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578 [08:02:27] New patchset: Nikerabbit; "(bug 43359) Enable WebFonts on Javanese projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578 [08:02:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [08:02:31] New review: Nemo bis; "jv.quote doesn't exist and they never asked to disable it by default" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/39578 [08:02:52] Nikerabbit: you forgot to remove jv.quote [08:02:58] not that it harms [08:03:43] Nemo_bis: of [08:03:59] roger [08:04:49] New patchset: Nikerabbit; "(bug 43359) Enable WebFonts on Javanese projects" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578 [08:06:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:07:26] New patchset: Nikerabbit; "WebFonts to dv* and Narayam for bhwiki" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58282 [08:08:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 08:08:12 UTC 2013 [08:09:04] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:09:16] Nikerabbit: happy Day of the Finnish language :) [08:09:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 08:09:15 UTC 2013 [08:09:56] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578 [08:09:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:10:53] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58282 [08:13:20] Krinkle|detached: you have merged code which you have not deployed [08:16:29] also local changes [08:23:38] Nikerabbit: either deploy them [08:23:43] Nikerabbit: or revert them in gerrit :D [08:29:04] hashar: I have no interest in trying to sort this out [08:30:45] Nikerabbit: I thought you were blocked in a deployment :-] [08:31:22] hashar: I am [08:31:31] unless magic happens, my deployments wont happen today [08:31:57] I told you [08:32:10] revert the undeployed change [08:32:16] just list them, press revert in Gerrit [08:32:25] I will be more than happy to approve/merge the reverts [08:32:46] Nikerabbit: you can't be blocked because someone forgot to deploy some code :-] [08:32:50] hashar: and what about local changes? [08:32:58] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 08:32:51 UTC 2013 [08:33:13] oh my F*** god [08:33:43] Nikerabbit: revert them [08:33:50] and !log what you did :-] [08:33:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [08:34:03] Nikerabbit: they are safe to revert. 
I can do it if you want [08:34:50] fine, I can do it myself [08:35:03] New patchset: Nikerabbit; "Revert "noc: Clean up conf viewer"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58284 [08:35:58] approved :-] [08:36:11] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58284 [08:36:48] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [08:37:35] Nikerabbit: and git diff 0518aef..cd16dc9 will show you what changed :-] [08:37:52] (that is really : git diff master..origin/master [08:37:56] so I just do a git remote update [08:38:01] then do the git diff master..origin/master [08:38:12] and maybe before I will do a git log master..origin/master [08:38:42] hashar: yup [08:39:16] I know the commands, I just am not interested in figuring out if there significant changes locally [08:40:58] !log nikerabbit synchronized wmf-config/InitialiseSettings.php 'Narayam and WebFonts' [08:41:05] Logged the message, Master [08:43:23] !log Got rid of local changes and undeployed commit related to noc [08:43:31] Logged the message, Master [08:47:59] Nikerabbit: yeah that the process, if in doubt: revert. [08:49:27] hashar: how about adding it to the wikitech manual for deploying [08:49:44] go ahead :-] [08:53:38] !log Jenkins tests for mediawiki/core are broken ({{bug|47031}}) due to a faulty change that has been merged in master branch. Root cause is the gating job not testing the proper change {{bug| 46723}} :/ [08:53:45] Logged the message, Master [09:05:40] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [09:27:34] hashar: https://wikitech.wikimedia.org/w/index.php?title=How_to_deploy_code&diff=65946&oldid=65942 [09:29:54] Nemo_bis: thank you :) [09:40:16] New review: Bennylin; "many thanks!" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39578 [09:57:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:58:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [10:05:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [10:08:58] PROBLEM - Puppet freshness on db1051 is CRITICAL: No successful Puppet run in the last 10 hours [10:08:58] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours [10:26:08] !log Deployed experimental varnish build (3.0.3plus~rc1-wm10) on dysprosium and cp1022 [10:26:15] Logged the message, Master [11:01:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:02:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time [11:05:12] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [11:06:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:07:22] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [11:27:22] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [11:32:22] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [11:56:29] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [12:03:29] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [12:05:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:08:38] RECOVERY - 
Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:08:36 UTC 2013 [12:09:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:09:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:09:38 UTC 2013 [12:10:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:10:38] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:10:34 UTC 2013 [12:11:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:12:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:12:11 UTC 2013 [12:12:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:12:58] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:12:48 UTC 2013 [12:13:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:17:58] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [12:32:48] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 12:32:44 UTC 2013 [12:33:38] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [12:37:32] !log jenkins plugins seems to have been upgraded overnight :( [12:37:39] Logged the message, Master [12:54:13] New patchset: Mark Bergsma; "Add cp3007 to the pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58296 [12:54:50] New patchset: Mark Bergsma; "Add cp3007 to the pool" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58296 [12:56:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58296 [12:58:49] New patchset: Ottomata; "Remove AFT ClickTracking set-up from emery" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58123 [12:58:55] Change merged: Ottomata; 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/58123 [13:07:09] !log jenkins plugin upgrade is tracked in {{bug|47040}}. Luckily git plugin has been fixed just in time ;-D Just need to migrate the config history. [13:07:16] Logged the message, Master [13:10:33] mark, you got a couple of minutes to wrap up the RFP for the Kafka contractor? [13:13:02] yes [13:17:55] you wanna go to the google doc? [13:19:31] ok [13:25:46] mark, paravoid: i am chatting in the google doc not sure if you see it [13:26:17] i'm not seeing it [13:26:23] (click on '2 other viewers') [13:26:45] ah [13:27:18] I just have it open to see mark's comments out of curiosity [13:27:22] doing other stuff now :) [13:28:45] k [13:31:56] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [13:31:56] * Coren says some very evil things about pip and python installing in general. [13:32:56] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 13:32:48 UTC 2013 [13:33:36] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [13:34:34] New review: Helder.wiki; "https://pt.wikipedia.org/wiki/WP:Esplanada/an%C3%BAncios#Remo.C3.A7.C3.A3o_do_modo_emergencial_do_CA..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [13:36:11] we need to come up with some other group besides wmf to give graphite access to. or at least give a reason why it was limited so much. (Denny_WMDE1 probably could use to log in to it right about now... ; see #-tech) [13:36:15] (was limited to 'wmf' at the same time as ishmael. 
i can understand ishmael may need to be more limited than graphite but IMHO both should not necessarily be just wmf people) [13:37:04] there is a long-standing request for creating a signed-nda group [13:37:12] wmf was just a compromise because that is taking too long [13:37:15] there's more than one kind of nda [13:37:24] that's one of the problems apparently [13:37:36] Ryan might know more [13:37:42] I'd sign any NDA, and I am actually already working for a chapter if that helps :) [13:37:53] this is re: ishmael and a few other things (analytics), not sure about graphite [13:38:14] gdash is open though [13:38:19] https://gdash.wikimedia.org/ [13:38:29] what is non-public in graphite? [13:38:32] and if it's something generally useful I think it could be added there [13:39:02] in any case graphite is what's needed atm [13:39:17] paravoid: https://rt.wikimedia.org/Ticket/Display.html?id=3770#txn-102554 FYI [13:40:00] gdash is a frontend to graphite [13:40:08] what is it that's needed? [13:40:18] paravoid: see #-tech [13:40:31] as for the RT, better chat with ottomata/Ryan [13:41:35] hi wha? [13:43:45] nda ldap grop [13:43:48] group even [13:44:26] i think we don't need that anymore, since we've removed the fancy ldap stuff from the kraken things [13:44:35] we might want it one day [13:45:08] ottomata: see me 9 mins ago above. [13:46:17] aye ok, yeah [13:46:25] this would probably be different than the data nda though, right? [13:46:27] (i don't know) [13:47:09] mark, can we flip the switch on testwiki only this week to test device compatibility? [13:47:43] ottomata: well i was thinking (and MaxSem maybe was too per his question above?) that graphite wouldn't need an NDA [13:49:16] I'm not sure why graphite is restricted but I'm sure there's a good reason [13:49:24] I think gdash is a reasonable compromise though [13:49:31] we should ask asher for more. 
[13:49:48] we as in jeremyb I guess :) [13:50:28] MaxSem: yes [13:50:29] paravoid: not sure if i asked him directly before (but i did ask before while he was around) [13:50:32] you don't need me present for that right [13:50:45] whee, thanks [13:50:59] i just want to be present for any significant deployment [13:52:07] sure [13:54:43] RECOVERY - search indices - check lucene status page on search1018 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.008 second response time [14:01:43] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:02:33] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time [14:04:11] any ideas why http://noc.wikimedia.org/conf/ lacks InitialiseSettings.php and CommonSettings.php? [14:06:10] i was just wondering the same thing in the last ~10 mins [14:06:45] jeremyb_: a long wondering! [14:07:26] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [14:07:32] well i didn't start investigating until the last 15 secs [14:08:58] gah, hashar's gone. Krinkle's detached [14:09:12] Nikerabbit: Krinkle|detached: ^^ [14:09:39] Nemo_bis: https://gerrit.wikimedia.org/r/58272 https://gerrit.wikimedia.org/r/58284 [14:09:48] err, odder* [14:11:11] what? [14:11:50] Nikerabbit: probably just mistaken tab-completion :) [14:12:21] Nemo_bis: not mistaken [14:13:02] ah that one [14:13:28] jeremyb_: someone left repo in unclean state so I just reverted everything not deployed [14:14:02] Nikerabbit: huh. well who knows when this started (i don't) but it would be nice to get those files visible again. 
:) [14:19:27] jeremyb_: it will probably be solved when somebody handles https://gerrit.wikimedia.org/r/#/c/58272/ [14:33:06] PROBLEM - Puppet freshness on virt1000 is CRITICAL: No successful Puppet run in the last 10 hours [14:40:36] New review: Alchimista; "Local consensus was reached in 2008 [0], where local community approved the activation for indetermi..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [14:57:05] New review: Faidon; "(15 comments)" [operations/debs/kafka] (master) C: -1; - https://gerrit.wikimedia.org/r/53170 [14:57:08] ottomata: ^ [14:57:35] yeah!!!!! reviews! [14:57:45] :) [14:57:48] thank you [14:58:06] i was going to poke you on that in a day or two yay, yeah, and i think i've learned a bit since I did that too, so with that and your comments we'll get there [14:58:09] these fell through the cracks tbh [14:58:21] I was about to say so too [14:58:38] I was reading this and I was "but he knows this by now -- oh wait, this was before" [14:59:35] thanks paravoid! [15:01:22] New review: Nemo bis; "Not true: "foi decidido activar temporariamente"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [15:04:08] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [15:11:19] New review: Faidon; "(5 comments)" [operations/puppet/kafka] (master) C: -1; - https://gerrit.wikimedia.org/r/50385 [15:11:28] ottomata: ^ too [15:11:33] anything else for me? 
[15:12:25] there's the geoip stuff which should be ready to be merged but I asked brandon to also have a look so that he starts to become acquainted with our puppet [15:12:27] i responded to this: but i think we just need to talk about that [15:12:27] https://gerrit.wikimedia.org/r/#/c/56064/ [15:12:31] (i'm not ready atm) [15:12:32] oh yeah [15:12:33] geoip [15:12:34] cool [15:12:46] agh i really need to fix my email filters for gerrit [15:12:55] haha [15:13:50] ottomata: I'm a bit confused with that one [15:13:58] jsonschema [15:14:05] do you have a minute to discuss it? [15:15:00] New review: Ottomata; "The module doesn't depend on files:///. It expects you to set that up, either manually or with the ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53714 [15:15:15] uuum, yeah i think so [15:15:23] haven't looked at it in a bit though [15:15:30] i think that's just a consequence of me not knowing what to do [15:15:40] so I mentioned importing the *Debian* git repo [15:15:41] so, how do I take their project and create a deb for the 1.1.0 tag? [15:15:43] not the upstream one [15:16:04] i think that's the one I did….? [15:16:11] anonscm.debian.org [15:16:13] not that one? [15:16:22] that one [15:16:26] yeah I did that [15:16:31] but [15:16:38] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: Timeout while attempting connection [15:16:40] you did? [15:16:44] I see one single commit that adds all files [15:16:44] i got conflicts when I tried to merge the 1.1.0 tag into the debian/experimental branch [15:16:51] yeahhhhhh, i did something nasty to compensate :p [15:16:53] instead [15:17:07] i recreated debian/experimental from 1.1.0 tag…and manually copied the debian/ directory in [15:17:14] nooo :) [15:17:29] what's the proper thing to do there? [15:17:29] no no no [15:17:37] i know what I did is improper [15:17:38] but [15:17:48] how do I take a tag and apply the debian/ stuff? [15:17:51] other way around? 
[15:18:00] recreate from tag and then merge from debian/? [15:18:07] wait [15:18:09] k [15:19:00] so, is the Debian repo importing the upstream git? [15:19:04] or is it importing tarballs? [15:19:14] * paravoid checks [15:19:35] the full source and tags are in that repo [15:19:49] oh but maybe they are importing tags from tarballs hm [15:20:18] yeah how do you tell? [15:20:42] New patchset: Reedy; "Don't use stdout of mergeMessageFileList.php" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57697 [15:20:53] git log and see if it's multiple upstream commits or one bulk importing 1.x.y [15:20:53] it appears it's the former [15:21:24] aye hm, i think that's what I did [15:21:33] yeah, i added the github remote and grabbed the tag from there [15:21:50] https://github.com/Julian/jsonschema [15:21:59] yeah that could work [15:23:25] so yeah, i added the remote, created debian/1.1.0-1 from the 1.1.0 tag, then re-created debian/experimental directly from debian/1.1.0-1, and then manually copied the debian/ directory from the old debian/experimental branch [15:23:28] but, instead [15:23:31] what should I do? [15:23:42] git clone git://anonscm... [15:23:57] git remote add -f upstream git://github.com/Julian/jsonschema [15:25:00] git merge v1.1.0 [15:25:04] resolve the trivial conflict [15:25:22] merge onto debian/experimental? [15:25:39] git commit -m "New upstream release, v1.1.0" [15:25:45] debian-branch = debian/experimental [15:25:47] so i guess so [15:25:51] oh [15:25:52] upstream-branch = master [15:25:54] well, yeah, git checkout -b wikimedia debian/experimental first [15:26:01] hm [15:26:02] and change gbp.conf to reflect that too [15:26:05] ok [15:26:05] hm [15:26:55] got ahold of the maintainer on irc [15:27:01] I'll let him push this to anonscm [15:27:15] oh, the tag? [15:27:20] then we don't have to add the remote? [15:27:37] the github remote? 
[15:27:48] no [15:27:52] do everything I said above [15:28:09] I'll try to make him prepare 1.1.0 [15:28:14] and upload it to Debian and everything [15:28:21] but until then, let's just work on it ourselves [15:28:35] so after that, dch -v 1.1.0-1 [15:28:45] k [15:30:38] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 182 seconds [15:31:24] oh mark, i forgot to ask, when do you think you will have time to install network ACL's for the analytics machines? [15:32:03] New patchset: Reedy; "Move scap source location from fenari to tin" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [15:33:38] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [15:34:55] New patchset: Reedy; "Remove some node lists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [15:35:11] New review: Reedy; "Removed yaseo lists too for obvious reasons" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [15:35:36] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 227 seconds [15:35:59] cool paravoid that worked! [15:36:18] i did it in debian/wikimedia branch though, i hope that's ok [15:38:36] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 2 seconds [15:39:08] yo Jeff_Green [15:39:15] hey drdee [15:39:34] New review: Alchimista; ""Assim, foi decidido activar temporariamente (por tempo indefinido) o sistema Captcha " ->" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [15:39:45] i am looking for a parental guardian for Locke, any interest? [15:40:01] ha, what does that even mean? 
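paravoid's recipe above (clone the packaging repo, add the upstream remote, branch off debian/experimental, then *merge* the release tag instead of hand-copying debian/) can be rehearsed locally; in this sketch two throwaway repos stand in for the anonscm Debian repo and the github upstream, and every path, file, and version string is invented for the demo:

```shell
# Throwaway repos: "upstream" plays github.com/Julian/jsonschema,
# "packaging" plays the anonscm Debian repo. All contents are fake.
set -e
tmp=$(mktemp -d)

git init -q -b master "$tmp/upstream"
cd "$tmp/upstream"
git config user.email demo@example.org
git config user.name Demo
echo '__version__ = "1.0.0"' > jsonschema.py
git add . && git commit -q -m 'upstream 1.0.0'

# Packaging repo is created while upstream is still at 1.0.0.
git clone -q "$tmp/upstream" "$tmp/packaging"

# Upstream then releases 1.1.0.
echo '__version__ = "1.1.0"' > jsonschema.py
git commit -q -am 'upstream 1.1.0' && git tag v1.1.0

cd "$tmp/packaging"
git config user.email demo@example.org
git config user.name Demo
git checkout -q -b debian/experimental
mkdir debian && echo 'python-jsonschema (1.0.0-1) unstable; urgency=low' > debian/changelog
git add debian && git commit -q -m 'Debian packaging'

# The recipe itself: local work branch, fetch upstream, merge the tag.
git checkout -q -b wikimedia debian/experimental
git remote add -f upstream "$tmp/upstream" >/dev/null 2>&1
git merge -q -m 'New upstream release, v1.1.0' v1.1.0
git log --oneline --graph | head -5
```

The merge keeps the upstream history and the debian/ commits on one branch, which is what the earlier copy-the-directory shortcut threw away; after this, `dch -v 1.1.0-1` opens the changelog for the new revision as described above.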
[15:40:12] ottomata: that's fine [15:42:28] Jeff_Green: it means that the analytics team no longer has any use for Locke (ottomata correct me if I am wrong) and that we either retire the box (it's 7 years old) or you take it on for fundraising purposes [15:42:36] ok, should I just push that branch only to gerrit for review then? [15:42:49] drdee: yes, the plan is to migrate fundraising off of it [15:43:39] i suppose this gets at the question of where analytics ends and fundraising begins [15:44:11] ottomata: yep [15:44:32] drdee: i don't really care either way though, there isn't much to do with locke in the meantime until we get the netapp reconfigured so we can start using the replacement [15:44:44] is there an RT ticket for that? [15:45:15] for fixing the netapp? no, this came up last minute last week and I didn't think to RT it [15:45:30] i was all set to switch and realized we couldn't write to the eqiad netapp [15:45:39] b/c replication goes pmtpa-->eqiad [15:45:43] :( [15:46:32] and we have banners up again, so I just need to line up ma.rk and fundraising for an hour or so of log collection outage [15:48:04] maybe we should have a dedicated box for fundraising related udp2log streams because it seems that as you guys are going to be running this basically all year, it severely restricts our flexibility in scheduling maintenance, upgrades etc etc [15:48:35] yeahhhhh, just keep locke ?:) [15:49:50] ha [15:49:55] i look forward to not having this log stream at all [15:55:26] Who broke noc? https://noc.wikimedia.org/conf/ [15:55:30] There's no php config files [15:58:41] Timo or Niklas [15:59:25] * Reedy cries [15:59:50] at least the wikis aren't broken again [15:59:59] Again? [16:00:06] I really CBA fixing 23 symlinks manually [16:00:34] Reedy: maybe a live hack that was on fenari and not merged / deployed [16:00:53] Reedy: that was blocking Niklas deployment this morning. I instructed him to revert the change, clean out fenari and proceed. 
[16:00:56] Reedy: well, the outage last evening Pacific time at 02:00 UTC [16:01:02] It was only small [16:01:08] Post l10nupdate [16:01:16] right [16:01:21] hashar: How was stuff in docroot/noc/conf breaking deployment? [16:01:39] Reedy: here is the revert https://gerrit.wikimedia.org/r/#/c/58284/ :D [16:01:52] I don't know we did not even look at the change. [16:02:02] It's all in docroot [16:02:03] I have looked at it just now to find out if it was related to your issue :-] [16:02:16] aka: not merged/deployed -> blindly revert [16:02:21] When it's checked out on fenari it's live [16:02:49] then the change should have been pulled :] [16:02:57] Reedy: there were uncommitted local changes, and a patch that touches same files [16:03:04] Ah [16:03:33] So Krinkle self-merged, but hadn't put it on fenari? [16:04:04] Yay for scripts [16:04:15] Noc fixed, for now [16:04:48] \O/ [16:06:20] New review: Reedy; "Do not merge stuff into a repository if you're not going to make sure that they are correctly checke..." 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58272 [16:06:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:07:38] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [16:08:28] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 16:08:21 UTC 2013 [16:08:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:09:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 16:09:10 UTC 2013 [16:09:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:09:58] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 16:09:54 UTC 2013 [16:10:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:11:08] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 16:10:59 UTC 2013 [16:11:41] Reedy: thanks [16:11:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:16:49] Whee [16:20:21] New patchset: Ori.livneh; "Change 'mongofork' param to false by default" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58310 [16:32:01] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58310 [16:32:24] ori-l ^ :) [16:33:18] RECOVERY - Puppet freshness on xenon is OK: puppet ran at Tue Apr 9 16:33:12 UTC 2013 [16:33:58] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [16:39:13] New patchset: Ottomata; "Upstream release 1.1.0-1" [operations/debs/python-jsonschema] (debian/wikimedia) - https://gerrit.wikimedia.org/r/58311 [16:53:44] RoanKattouw_away: ping [16:54:31] New review: Faidon; "Use 1.1.0-1~wmf1 (or perhaps ~precise1) as the version instead, in case 1.1.0-1 gets uploaded into D..." 
[operations/debs/python-jsonschema] (debian/wikimedia) C: -1; - https://gerrit.wikimedia.org/r/58311 [16:54:49] PROBLEM - RAID on cp1041 is CRITICAL: Timeout while attempting connection [16:55:49] RECOVERY - RAID on cp1041 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [16:57:10] paravoid / ottomata: thank youuuuuuuu [16:57:19] what did I do? [16:57:45] lots of patient code review [16:57:46] guided! [17:01:05] oh I didn't know Wikipedias had "share" links too https://id.wikipedia.org/wiki/MediaWiki:Sidebar [17:02:00] :( [17:03:39] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [17:05:00] Oh noes, not again. [17:05:25] Didn't I remove Facebook links from SiteNotice on one Wikipedia already? [17:05:50] lol [17:06:02] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [17:09:16] 6. What single evolution would improve your answer to #6 the most? [17:09:33] oh, self-referential [17:11:19] New patchset: Dzahn; "add snuggle.wm SSL cert per RT #4473" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58254 [17:11:36] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58254 [17:13:10] had to rebase that even though it was a newly added file.. shrug [17:13:39] AaronSchulz: hah, I hadn't started that yet... wonderful ;) [17:13:42] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [17:14:19] meh, all the fields were required, even text forms that I have nothing to say about [17:14:27] I just thought "fuck it" and left [17:14:57] oops [17:15:30] if no effort is put into making the thing why should I bother putting any into filling it out? [17:16:25] odder: you did? which wikipedia? 
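Faidon's `1.1.0-1~wmf1` suggestion in the review above works because dpkg sorts a tilde before anything, including the end of the string, so a later official `1.1.0-1` upload still supersedes the local build. A quick way to check the ordering (GNU `sort -V` follows the same tilde rule; the `dpkg` query is only run where dpkg is installed):

```shell
# "~" sorts before the empty string, so 1.1.0-1~wmf1 < 1.1.0-1
printf '1.1.0-1\n1.1.0-1~wmf1\n' | sort -V

# Ask dpkg itself, when available
if command -v dpkg >/dev/null; then
    dpkg --compare-versions '1.1.0-1~wmf1' lt '1.1.0-1' && echo 'local build sorts older'
fi
```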
[17:16:37] well, right, but they probably will action based on the responses [17:16:39] some Indonesian [17:16:43] * Nemo_bis realises is wrong channel [17:16:51] * greg-g points AaronSchulz to -staff for a more on-topic channel ;) [17:16:54] there's only one Indonesian wikipedia AFAIK [17:17:10] you could ask John Vanderberg what he thinks [17:17:12] there are more languages spoken in Indonesia than Indonesian [17:17:20] true that [17:17:44] but nobody calls "Italian Wikipedias" the "younger sisters" of it.wiki, dunno id.wiki [17:17:45] Bahasa Indonesia = id [17:17:59] mutante: can you deploy the last bugzilla changes? :) [17:17:59] AaronSchulz: ok, now I'm at the point you are at, number 8, which is required, asks me to compare against the cubicles that were here before... I wasn't. ugh. [17:18:14] mutante: hopefully they'll reduce wikibugs spam by 50 % [17:18:35] New patchset: RobH; "RT 4796 adding Matt Flaschen to stat1 and alphabetizing the include accounts on stat1 entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58315 [17:19:14] Nemo_bis: https://min.wikipedia.org/w/index.php?title=Templat:AdvancedSiteNotices&diff=prev&oldid=17946 [17:19:44] Nemo_bis: they broke my logo :-( [17:19:45] Nemo_bis: "Make Bugzilla 4.2's BugMail.pm not trigger bugmail for CC field only changes" ? [17:19:48] New patchset: RobH; "RT 4796 adding Matt Flaschen to stat1 and alphabetizing the include accounts on stat1 entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58315 [17:19:49] yay gerrit is being fast today \o/ [17:20:20] mutante: yes [17:20:26] Nikerabbit: Are you sure? I'm pretty sure I did deploy that change [17:20:27] odder: I saw :< [17:20:29] Nemo_bis: RobH is already on it, on his list [17:20:36] mutante: oh wonderful :) [17:20:40] Perhaps git got the working copy confused. The local changes were equal to the commit. 
[17:20:43] thanks Rob [17:20:58] New patchset: Krinkle; "Revert "Revert "noc: Clean up conf viewer""" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58316 [17:21:01] Nemo_bis: no worries, i chatted with mutante about this and then didnt get to it yet, but can do now [17:21:03] New patchset: Krinkle; "Revert "Revert "noc: Clean up conf viewer""" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58316 [17:21:05] since you are about to test [17:21:14] (or someone is i assume) [17:21:17] :) [17:21:17] andre__: ^ [17:21:21] yes I'll test [17:21:33] just ping me when done [17:21:58] here. What can I do? :) [17:22:13] Yes, deploy something. I'll see if it breaks. [17:22:45] :) [17:22:53] Change merged: jenkins-bot; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58316 [17:24:10] Nemo_bis (and andre__) [17:24:15] deployed, give it a test please? [17:24:35] andre__: yep, the BugMail.pm is deployed now [17:24:48] Krinkle: as far as I could see the local changes did not match the commit (commit had more files changed), but again I didn't investigate deeply [17:24:53] alright, thanks! [17:25:01] Nikerabbit: ok, no problem. [17:25:15] lemme know if it works so i can resolve the bugs [17:25:19] and RT ticket [17:25:32] a cc was ignored [17:25:54] \o/ [17:26:05] however I didn't see any mail sent to andre__ , have you added yourself to globalwatchers with preferences set to receive cc changes? [17:26:58] http://p.defau.lt/?Qhtckegn58qyotJgOeiH0w [17:27:59] Nemo_bis: see the excluding list. 
[17:28:12] globalwatcher excluded for CC only changes, that [17:28:16] s what the deployed patch was about [17:28:18] New patchset: Ori.livneh; "Upstream release 1.1.0-1" [operations/debs/python-jsonschema] (debian/wikimedia) - https://gerrit.wikimedia.org/r/58311 [17:29:55] andre__: right, somehow I thought it was only for wikibugs [17:30:02] unfortunately not :) [17:30:09] New patchset: RobH; "RT 4796 adding Matt Flaschen to stat1 and alphabetizing the include accounts on stat1 entry" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58315 [17:30:23] yeah, seems to work. I've tried with a second email account, on one report I'm reporter with my primary account, and a "normal" one. Behaves as expected. [17:30:26] Thanks everybody! [17:30:30] cool [17:30:31] So does someone with actual gerrit knowledge want to check this out? [17:30:35] https://gerrit.wikimedia.org/r/#/c/58315/ [17:30:43] it wouldnt let me merge patch set 2, even though it passed code review [17:30:50] andre__: you could file a bug for that though, in that foreach (@watchers) an additional condition to check if it's wikibugs or another account [17:31:04] rebase makes it so i can merge [17:31:09] (changes from can merge no to yes) [17:31:12] so odd.... [17:31:27] RobH: that's a bit weird yea, just like on my recent one, cant merge until after another rebase, when there seems to be no reason for having to rebase (like a newly added file) [17:31:43] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58315 [17:31:55] the odd part was this is site.pp, nonnew file [17:31:59] but same behavior as your new file issue [17:32:15] thanks RobH for deploying! [17:32:29] next time it happens, before i merge im gonna bribe hashar to look at it. [17:32:34] andre__: quite welcome! 
RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [17:37:46] mutante, did you see my response to https://bugzilla.wikimedia.org/show_bug.cgi?id=46022 - I'm not sure that type of forward will quite do what we want [17:39:46] Thehelpfulone: since the bug links to a bug on launchpad, and that links to http://sourceforge.net/tracker/index.php?func=detail&aid=1220144&group_id=103&atid=300103 [17:40:15] Thehelpfulone: that mentions the "accept_these_nonmembers" option in mailman.. i think just use that, yea [17:40:49] if what you want is what is described in https://bugs.launchpad.net/mailman/+bug/266839 [17:41:58] hmm I'm not sure that's the same thing - okay so Sumana this morning posted an email to WikimediaAnnounce-l with the WMF engineering report for March [17:42:41] if I'm understanding that bug correctly, we want that email to automatically be forwarded to Wikimedia-l so that the audience to that list are able to a) see it and b) comment on it/discuss it in an email thread because WikimediaAnnounce-l is restricted posting [17:43:29] why does it have to be so complicated :p is cross-posting evil? [17:43:52] i added another comment to the bugzilla bug.. but besides that.. theres only one way to find out.. [17:45:01] hmm Nemo_bis did I understand that bug correctly ^ (wikimediaannounce-l -> wikimedia-l forwarder?) [17:47:59] Thehelpfulone: yes [17:48:22] mutante: simply, people forget or don't want to subscribe to wikimedia-l [17:48:30] Thehelpfulone: give me any list name you have admin on [17:48:34] New patchset: RobH; "removing old users giovannia & akhanna from cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58318 [17:48:46] mutante, one for testing you mean? 
[17:49:00] Thehelpfulone: i just want to give you a link [17:49:08] https://lists.wikimedia.org/mailman/admin/ops/?VARHELP=privacy/recipient/acceptable_aliases [17:49:13] replace "ops" with another list name [17:49:38] ooh I've not seen that one before [17:49:39] * Thehelpfulone reads [17:50:00] because people forget to subscribe to a list, but subscribe to another list, you want to forward mail from list a to list b... o ..k.. [17:50:38] yeah that might work, mutante can you look at the subscriber list for wikimediaannounce-l - is wikimedia-l already subscribed? [17:50:59] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [17:50:59] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [17:50:59] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [17:52:04] Thehelpfulone: there is nothing in that option on wikimediaannounce-l [17:52:23] oh , subscriber list, wait [17:52:26] yeah [17:53:20] wikimedia-blog is subscribed, and other external lists, but not wikimedia-l afaict [17:54:23] Nemo_bis, do you remember how it was done in the past? [17:54:55] i can just confirm we use the "acceptable_alias" and a redirect like this https://gerrit.wikimedia.org/r/#/c/4693/1/files/exim/exim4.listserver_aliases.conf succesfully when renaming lists, which seems to be the same thing [17:55:23] add the old list email address to "acceptable aliases" on the new list web ui (insert details) [17:55:26] merge a mail alias to redirect mail to the old list [17:56:21] <-- that is when you actually "rename" a list and everything should go to new list [17:56:32] mutante, okay, can you subscribe me to test@ then add cabal-l as an acceptable_alias? [17:56:39] Thehelpfulone: it never worked [17:57:18] oh wow, i'm admin of test@ :p.. 
forgot ,, yeah [17:57:35] heh, yeah cabal-l is the test list I was told I could use :D [17:58:35] subscribed [17:59:08] added cabal-l to acceptable aliases [18:00:01] okay, now can you send an email to test@? [18:00:08] i just typed the full email address, not a special regex or anything [18:00:12] that field takes regex though [18:00:54] okay [18:00:56] Thehelpfulone: done [18:01:05] Krenair: o_0 [18:01:57] okay, so I received the one to test@ but nothing came through on cabal-l? [18:03:06] Thehelpfulone: for that cabal-l needs to be subscribed [18:04:18] of course, so please can you subscribe cabal-l to test@ then send another email (I don't even think we'll need the acceptable_alias for this to work) [18:04:36] yea, done and sent another mail [18:04:54] Nemo_bis: so I had a look at your question about [[mw:Manual:$wgEmergencyCaptcha]] [18:05:40] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [18:05:54] Thehelpfulone: right, if that is all you want then you probably dont need it [18:06:47] yeah, https://lists.wikimedia.org/mailman/private/cabal-l/2013-April/000000.html it shows up in the archives but I didn't receive it myself (I subscribed myself to cabal-l) [18:06:49] odder: what question^ [18:07:21] It was more of a statement really. https://bugzilla.wikimedia.org/show_bug.cgi?id=41745#c4 [18:07:36] Thehelpfulone: if you're a gmail address and subscribed to both, gmail drops duplicates [18:08:20] ah, so that probably wouldn't help with forwarding? 
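The web-UI steps tried above amount to: subscribe the receiving list to the announce list, and put the announce list's address in the receiving list's `acceptable_aliases` so the relayed post is not held as an "implicit destination" violation. The same knob can be set from the shell with Mailman 2.1's `config_list`; this is only a sketch reusing the test@/cabal-l names from the experiment, and the `bin/` path varies per install:

```
# fragment for: bin/config_list -i this_file cabal-l   (Mailman 2.1)
# Treat mail addressed to test@ as explicitly addressed to cabal-l,
# so forwarded posts are not held for "implicit destination".
acceptable_aliases = """
test@lists.wikimedia.org
"""
```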
[18:09:50] Nemo_bis, the subject line is different though, there's "[Cabal-l] [Test] test mail for THO" compared to "[Test] test mail for THO" [18:11:04] New patchset: RobH; "removing old user akhanna from cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58318 [18:12:40] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [18:12:56] New review: Demon; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56104 [18:24:29] paravoid: jenkins isn't configured for that repo, so i think you have to submit it manually (re: https://gerrit.wikimedia.org/r/#/c/58311/) [18:33:17] Coren: So your tools.wmflabs.org cert is back [18:33:22] im putting on tridge and resolving ticket for you [18:33:30] Ooo. Danke! [18:33:40] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [18:33:42] quite welcome, sorry it took so long [18:33:51] i accidentally inputted my personal email to send it to [18:33:52] not my work [18:33:53] =P [18:33:57] didnt find till last night [18:33:58] ... have you ever /worked/ in corporate? :-) [18:34:26] Took less than a week = omgfast! [18:34:27] :-) [18:34:29] yep, i just hold myself to higher standard ;] [18:37:10] PROBLEM - Puppet freshness on virt1005 is CRITICAL: No successful Puppet run in the last 10 hours [18:38:52] Is full of workies. Thanks. [18:39:37] Thehelpfulone: i gave you admin on the test@ list, test away.., bbl [19:05:56] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [19:07:29] heya Ryan_Lane, you there? [19:07:34] oo, wait, i want to ping you in #labs [19:15:43] New patchset: MaxSem; "New Wikipedia apple-touch-icon from our design cabal" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58323 [19:50:20] anyone with +sysop on wikitech? 
https://wikitech.wikimedia.org/wiki/Special:Contributions/Fadili [19:54:17] uuuuggghhh [19:54:47] Special:Nuke? [19:55:02] blocked [19:55:27] determined spammer [19:56:29] !log installed Nuke on wikitech [19:56:36] Logged the message, Master [19:58:51] it seems like this user may not have been a spammer [19:59:06] but someone who thinks wikitech was the right place to make an omegawiki clone [19:59:33] nuked [20:00:15] https://wikitech.wikimedia.org/wiki/New_Project_Request/A_wiktionary_for_social_sciences [20:00:21] yep [20:00:37] all right, mobile deployment time [20:03:06] I emailed him [20:04:01] lol Author: Max Semenik [20:04:16] git rocks! [20:04:24] yeah, *that* Max [20:04:42] I love that Max, such a wide range of personalities he has [20:05:18] but srsly, git merges could've looked saner [20:06:44] PROBLEM - Varnish HTTP mobile-frontend on cp1041 is CRITICAL: Connection timed out [20:07:44] RECOVERY - Varnish HTTP mobile-frontend on cp1041 is OK: HTTP OK: HTTP/1.1 200 OK - 707 bytes in 0.819 second response time [20:09:34] New patchset: Pyoungmeister; "Update TTMServer Solr schema" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57498 [20:09:41] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [20:10:01] PROBLEM - Puppet freshness on db1051 is CRITICAL: No successful Puppet run in the last 10 hours [20:10:01] PROBLEM - Puppet freshness on db1058 is CRITICAL: No successful Puppet run in the last 10 hours [20:12:56] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/57498 [20:16:42] New patchset: Ottomata; "Importing exists method from os.path in e3-metrics.settings.py.erb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58328 [20:16:47] New patchset: Pyoungmeister; "defining labsdbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58222 [20:17:08] Change merged: Ottomata; [operations/puppet] (production) - 
https://gerrit.wikimedia.org/r/58328 [20:18:27] New patchset: Ottomata; "Importing more python modules for e3-metrics.settings.py.erb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58329 [20:18:36] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58329 [20:21:20] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58043 [20:21:42] Change merged: MaxSem; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58323 [20:23:33] csteipp, could you have a look at https://bugzilla.wikimedia.org/show_bug.cgi?id=28976 for any security issues with enabling the API for OTRS? This shouldn't require the OTRS upgrade if I understand correctly so some people would like it enabled soon (i.e. now if possible) [20:33:22] New patchset: Pyoungmeister; "defining labsdbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58222 [20:36:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:37:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.135 second response time [20:39:19] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58222 [20:53:09] New patchset: Reedy; "In sync-dir, actually perform the syntax check" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56105 [20:53:16] New patchset: Reedy; "Basic puppetization of dsh" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56107 [20:53:20] New patchset: Reedy; "Remove some node lists" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56108 [20:53:49] heya ops, mind reviewing this change: https://gerrit.wikimedia.org/r/#/c/56577/, it's adding the package lilypond to deploys to support the Score extension, which we hope to deploy soon.
[20:56:42] New review: Lcarr; "(1 comment)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/56107 [20:56:55] greg-g: i'll check it out [20:58:02] thanks much LeslieCarr [20:58:51] ewww midi [20:58:52] ;) [20:59:34] hey, don't we all want WP pages to autoplay midi songs? [20:59:41] yeah… in 1997 [20:59:55] csteipp: so do you plan on turning on the xff stuff? [21:00:06] New patchset: Lcarr; "Install lilypond on Apache nodes (used by Score extension)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56577 [21:00:29] New review: Lcarr; "as much as i hate hate hate ensure => latest , since it's already on the other packages, i'll let it..." [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/56577 [21:00:36] New review: Lcarr; "as much as i hate hate hate ensure => latest , since it's already on the other packages, i'll let it..." [operations/puppet] (production); V: 2 - https://gerrit.wikimedia.org/r/56577 [21:00:37] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/56577 [21:00:38] LeslieCarr: aww, it sounds so cute, "lilypond" [21:00:41] AaronSchulz: I'm planning to sometime this week. After the stuff I'm working on. 
[21:00:55] hrm, "latest" isn't so cute though [21:01:17] LeslieCarr: midi is *awesome* [21:01:26] * Ryan_Lane isn't kidding [21:01:33] general midi sucks though [21:01:36] yes [21:01:38] general midi sucks [21:01:54] New patchset: Pyoungmeister; "rmeoving db1012 from unused db list" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58417 [21:01:54] but midi itself is awesome [21:02:06] and bad sound cards gave midi a bad rep [21:02:15] even though in theory it could sounds like anything [21:02:15] New patchset: Pyoungmeister; "rmeoving db1012 from unused db list" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58417 [21:02:18] it's simple, does what it needs to do and is incredibly standard [21:02:32] *sound [21:02:59] yep. really midi is just for controlling things [21:03:34] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58417 [21:04:53] "There were grammatical errors even in his silence." -- Stanisław Jerzy Lec [21:04:57] lol, bz quip [21:05:25] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [21:05:48] :D [21:07:25] PROBLEM - MySQL Recent Restart Port 3306 on labsdb1001 is CRITICAL: NRPE: Command check_mysql_recent_restart_3306 not defined [21:07:35] PROBLEM - MySQL Slave Delay Port 3306 on labsdb1001 is CRITICAL: NRPE: Command check_mysql_slave_delay_3306 not defined [21:07:45] PROBLEM - MySQL Slave Running Port 3306 on labsdb1001 is CRITICAL: NRPE: Command check_mysql_slave_running_3306 not defined [21:07:55] PROBLEM - mysqld processes on labsdb1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:08:15] PROBLEM - MySQL Idle Transactions Port 3306 on labsdb1001 is CRITICAL: NRPE: Command check_mysql_idle_transaction_3306 not defined [21:08:37] "Penyulap, if this were run by people with a polite background, you would be bounced from it in a second. 
The fact that you would say such things and upload the first coprophilia file to Commons is quite a bit hypocritical." [21:08:47] Ryan_Lane: Commons:Village_pump is fun [21:09:03] he speaks crazy talk [21:09:15] RECOVERY - MySQL Idle Transactions Port 3306 on labsdb1001 is OK: OK longest blocking idle transaction sleeps for seconds [21:09:25] RECOVERY - MySQL Recent Restart Port 3306 on labsdb1001 is OK: OK seconds since restart [21:09:29] :) [21:09:35] RECOVERY - MySQL Slave Delay Port 3306 on labsdb1001 is OK: OK replication delay seconds [21:09:45] RECOVERY - MySQL Slave Running Port 3306 on labsdb1001 is OK: OK replication [21:09:50] * Damianz looks at labsdb1001 and drools a little [21:11:20] AaronSchulz: Commons folks generally learnt to ignore this guy. [21:11:23] New patchset: RobH; "removing old user akhanna from cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58318 [21:11:23] New patchset: RobH; "RT 4600 Adding Michelle Grover to cluster access / stat1 access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58419 [21:11:41] bleh, i didnt mean to new patchset the first but wasnt sure how to selectively yank it out [21:11:42] oh well [21:11:44] doesnt hurt. [21:13:40] i take back all the nice shit i said about gerrit speed this morning =P [21:23:45] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [21:30:25] PROBLEM - SSH on cp1041 is CRITICAL: Connection timed out [21:31:15] RECOVERY - SSH on cp1041 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [21:31:45] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [21:36:59] New patchset: RobH; "added wikimaps.net to act like the wikimaps.com/org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54582 [21:43:47] Change abandoned: Dzahn; "already done. 
duplicate of 58117 (RT-4747)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/55920 [21:44:47] !log Zuul is slowly dequeuing a large number of jobs. There is currently roughly 30 minutes lag between change submission and actual processing of the job. No idea why but investigating. [21:44:54] Logged the message, Master [21:44:57] !log maxsem Started syncing Wikimedia installation... : Weekly mobile deployment [21:45:04] Logged the message, Master [21:46:46] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [21:59:15] New review: Nemo bis; "awjrichards, issue addressed in bug report." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/23112 [22:03:46] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [22:04:24] LeslieCarr: how do I know when lilypond is installed on the machines? I just tried the test page for that bug and it is still erroring and I want to make sure I put the blame in the right place :) [22:04:47] lemme check a machine [22:04:50] thanks [22:05:54] installed on several machines [22:06:01] so it's been >30 minutes, yah ? [22:06:17] PROBLEM - Puppet freshness on db1012 is CRITICAL: No successful Puppet run in the last 10 hours [22:06:27] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [22:07:02] installed on everything i checked [22:07:10] so … i'm guessing it's the extension [22:07:15] you can wait an extra half hour to be extra safe [22:07:23] LeslieCarr: thanks, will do to be extra safe [22:09:30] New patchset: RobH; "removing old user akhanna from cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58318 [22:14:13] Heja ops, do you want to have an RT ticket for https://bugzilla.wikimedia.org/show_bug.cgi?id=46976 (Thumbnail cache purging issues (Varnishes only?)), or how can I get somebody to look into it?
[22:14:13] New patchset: MaxSem; "Enable $wgMFVaryResources on testwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58427 [22:16:32] andre__: I think LeslieCarr looked into that yesterday, mark was also aware of it, I believe. Not sure of status, though. [22:16:41] !log maxsem Finished syncing Wikimedia installation... : Weekly mobile deployment [22:16:48] Logged the message, Master [22:17:05] greg-g, I see, thanks [22:18:41] andre__: what I know is LeslieCarr couldn't see anything network-wise that was the issue. [22:18:54] https://bugzilla.wikimedia.org/show_bug.cgi?id=15716 [22:18:57] old bugs are old! [22:18:57] PROBLEM - Puppet freshness on cp3003 is CRITICAL: No successful Puppet run in the last 10 hours [22:19:16] fuck wrong window [22:19:33] greg-g: need better example [22:19:38] andre__: need better example [22:19:44] the two shown are like the same age [22:19:51] and i am not going to look for image differences [22:19:54] different age and size plz [22:21:45] LeslieCarr: I can still reproduce the problem by going to https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/NRHP_Illustrated_Counties.svg/400px-NRHP_Illustrated_Counties.svg.png and reloading a few times (Ctrl+F5 in Firefox) [22:21:52] hence wondering how to provide useful info. [22:22:08] i need size and ages [22:22:14] not visual differences [22:22:30] New patchset: Reedy; "Update fywiki sort order, add note about default Wikibase settings" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56367 [22:22:42] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/56367 [22:22:44] or at least different sizes [22:22:48] LeslieCarr: okay. how do I do this? :) [22:23:03] find some image that is broken that has different sizes in the broken and unbroken version [22:24:05] LeslieCarr, that's the case for aforementioned link. 
If I download both versions, one is 133,559, the other is 127,301 [22:24:23] (bytes, obviously) [22:24:25] the examples given in the bug were 133k [22:24:26] both [22:25:10] so I wonder how to "reach" the 127301 one via a command. [22:26:16] ah, got lucky this time. I'll update the bug report [22:26:29] ah there we go [22:26:38] cp1028 seems to have a bad varnishhtcpd thing going on [22:26:41] LeslieCarr, https://bugzilla.wikimedia.org/show_bug.cgi?id=46976#c4 [22:28:03] New patchset: Reedy; "(bug 41745) Remove ptwiki, ptwikinews from EmergencyCaptcha" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [22:28:14] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58081 [22:29:21] New patchset: Reedy; "Set branch.autosetuprebase to always when setting up new deployment" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58236 [22:29:26] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58236 [22:30:24] New patchset: Reedy; "(bug 46990) Add the 'editor' restriction level on pl.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58038 [22:30:38] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58038 [22:30:57] New patchset: Reedy; "(bug 46153) Subject namespace to thwikibooks, change wgSitename" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58251 [22:31:06] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/58251 [22:33:41] !log clearing a few varnish caches - fyi if db load is seen to increase [22:33:48] Logged the message, Mistress of the network gear. [22:35:02] New patchset: Reedy; "Disable ClickTracking extension" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57769 [22:35:57] Does Quim ever use IRC?
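The reproduction method andre__ uses above (downloading the thumbnail repeatedly and comparing byte counts, 133,559 vs 127,301) amounts to grouping responses by size and checksum: more than one group means some cache node is still serving a stale copy after a purge. A minimal sketch of that check (a hypothetical helper, not an actual Wikimedia tool; the fetched bodies here are stand-ins with the sizes from the bug report):

```python
import hashlib

def classify_variants(bodies):
    """Group response bodies by (length, sha1). More than one group means
    inconsistent copies are being served, i.e. the purge did not reach
    every cache node."""
    groups = {}
    for body in bodies:
        key = (len(body), hashlib.sha1(body).hexdigest())
        groups[key] = groups.get(key, 0) + 1
    return groups

# Two fetches returned the fresh 133,559-byte copy, one the stale 127,301-byte copy:
fetches = [b"f" * 133559, b"f" * 133559, b"s" * 127301]
assert len(classify_variants(fetches)) == 2  # stale copy detected
```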
[22:36:08] !log reedy synchronized wmf-config/ [22:36:16] Logged the message, Master [22:37:25] odder: yes [22:37:39] idle : 0 days 0 hours 41 mins 22 secs [signon: Tue Apr 9 22:04:29 2013] [22:38:26] andre__: ok, other examples ? [22:39:16] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58318 [22:39:31] odder: He's in #wikimedia-dev as qgil [22:39:31] New patchset: RobH; "RT 4600 Adding Michelle Grover to cluster access / stat1 access" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58419 [22:43:09] LeslieCarr, the only other *useful* testcase on the VPs was https://bugzilla.wikimedia.org/show_bug.cgi?id=46976#c6 [22:43:44] 4.1k vs 10k, cool [22:47:09] fixed [22:47:24] yay just finding a few cases where it looks like varnishhtcpd froze silently or something [22:47:28] any more cases andre__ ? [22:48:24] LeslieCarr: not that I'm aware of (yet). Thanks a lot for looking into these! What would decrease the number of such problems? Fixing https://bugzilla.wikimedia.org/show_bug.cgi?id=43449 ? [22:48:33] (Monitor effectiveness of HTCP purging) [22:49:03] yes that would help [22:52:50] !log maxsem synchronized php-1.22wmf1/extensions/MobileFrontend/ [22:52:57] Logged the message, Master [22:55:13] !log maxsem synchronized php-1.21wmf12/extensions/MobileFrontend/ [22:55:20] Logged the message, Master [22:55:56] MaxSem: little late, eh? :) [22:56:05] greg-g, bugzzz [22:56:10] New patchset: Odder; "(bug 46712) Set a different favicon for iswiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [22:56:47] MaxSem: anything in particular?
we don't want to go over windows without letting people know (especially by an hour) [22:57:49] we discovered a bug after a scap [22:58:37] MaxSem: ok, reasonable, just next time, ping me please, I wasn't reading -mobile [22:58:47] ok, sorry [22:59:23] New patchset: Odder; "(bug 46712) Set a different favicon for iswiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [22:59:26] New review: RobH; "picard" [operations/puppet] (production) C: 2; - https://gerrit.wikimedia.org/r/58419 [22:59:27] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58419 [22:59:29] LeslieCarr: darn, one more after going through the VPs again: https://bugzilla.wikimedia.org/show_bug.cgi?id=46976#c8 [22:59:44] New review: Odder; "Proper indentation." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/57683 [22:59:46] MaxSem: thanks man, just don't want to give someone else the go ahead and find out the window isn't actually open, is all :) [23:04:17] New patchset: RobH; "missed updating class name for mgrover user" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58435 [23:04:44] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58435 [23:05:04] PROBLEM - Puppet freshness on db1012 is CRITICAL: No successful Puppet run in the last 10 hours [23:05:14] PROBLEM - Puppet freshness on xenon is CRITICAL: No successful Puppet run in the last 10 hours [23:17:46] andre__: fixed [23:19:04] !log running upgrades on db64 and rebooting [23:19:11] Logged the message, notpeter [23:20:58] !log running upgrades on db56 and rebooting [23:21:05] Logged the message, notpeter [23:21:30] andre__: any more ?
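The fix LeslieCarr suggests earlier, bug 43449 ("Monitor effectiveness of HTCP purging"), boils down to a check of this shape: purge an object, refetch it, and confirm that no cache node still reports a hit. A sketch of the decision step (an assumption: the per-node hit/miss reporting is modelled here on an X-Cache style header like "cp1028 miss (0), cp1041 miss (0)"; the real header format varies per deployment):

```python
def purge_took_effect(x_cache_header):
    """Return True if no node in a comma-separated X-Cache style header
    reports a hit, i.e. the object was evicted everywhere."""
    entries = [entry.strip() for entry in x_cache_header.split(",")]
    return not any(" hit" in " " + entry for entry in entries)

# After a successful purge, every node should report a miss:
assert purge_took_effect("cp1028 miss (0), cp1041 miss (0)")
# A frozen varnishhtcpd shows up as a node still serving a hit:
assert not purge_took_effect("cp1028 hit (7), cp1041 miss (0)")
```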
[23:21:42] ironically all the messed up boxes, in the US [23:21:51] i guess americans just don't report the bugs [23:22:32] we're lazy [23:23:07] greg-g: yeah, but we feel really bad about being lazy [23:23:09] and that's what counts [23:23:44] PROBLEM - Host db64 is DOWN: PING CRITICAL - Packet loss = 100% [23:23:47] andre__: can you also update all the village pumps [23:23:47] please [23:24:02] notpeter: stop killing the databases plz [23:24:14] did you do anything to db64 or … ? [23:24:20] LeslieCarr: I logged [23:24:22] is in pmtpa [23:24:26] upgrade and reboot [23:24:30] oh there too [23:24:33] i say db56 [23:24:34] saw [23:24:39] ja [23:25:25] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [23:25:54] PROBLEM - Host db56 is DOWN: PING CRITICAL - Packet loss = 100% [23:26:24] RECOVERY - Varnish traffic logger on cp1041 is OK: PROCS OK: 3 processes with command name varnishncsa [23:27:04] RECOVERY - Host db56 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [23:27:34] RECOVERY - Host db64 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms [23:27:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:28:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.159 second response time [23:29:00] !log tstarling synchronized php-1.22wmf1/extensions/Nostalgia [23:29:06] Logged the message, Master [23:32:54] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [23:37:05] greg-g, can I push another fix?
[23:44:24] gwicke: https://www.varnish-cache.org/trac/wiki/VCLExampleEnableForceRefresh [23:45:16] New review: Jeremyb; "fu Ibf891307ab2d3ac76ca" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58419 [23:46:00] New review: Jeremyb; "fixed Iffb9a5e7c2c96600f" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/58435 [23:47:03] anybody deploying right now? I need to push a fix [23:50:20] PROBLEM - Varnish traffic logger on cp1041 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa [23:57:20] PROBLEM - SSH on gadolinium is CRITICAL: Server answer: [23:58:20] RECOVERY - SSH on gadolinium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
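The flapping "Varnish traffic logger" alerts throughout this log are threshold checks on a process count; the OK messages show three varnishncsa processes are expected per host. A sketch of that style of check follows (a simplified stand-in, not the actual Nagios check_procs plugin; the process list would really come from the process table):

```python
def check_procs(running, name, minimum):
    """Nagios-style process-count check: CRITICAL when fewer than
    `minimum` processes with command name `name` are running."""
    count = sum(1 for proc in running if proc == name)
    status = "OK" if count >= minimum else "CRITICAL"
    return "PROCS %s: %d processes with command name %s" % (status, count, name)

# Two of the expected three varnishncsa processes -> CRITICAL:
msg = check_procs(["varnishncsa", "varnishncsa", "sshd"], "varnishncsa", 3)
assert msg == "PROCS CRITICAL: 2 processes with command name varnishncsa"
```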