[00:02:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 00:02:39 UTC 2013
[00:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[00:04:00] RECOVERY - DPKG on db1048 is OK: All packages OK
[00:15:21] bd808: There's a bot that's capable of relaying changes.
[00:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 00:33:26 UTC 2013
[00:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[01:06:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 01:06:45 UTC 2013
[01:07:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[01:25:28] (as I mentioned on #wikimedia-dev) git.wikimedia.org seems down, proxy error reading a blob URL and timeout accessing https://git.wikimedia.org
[01:30:46] spagewmf: yeah, chad knows :/
[01:30:53] spagewmf: apparently bots aren't being nice
[01:31:00] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:00] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:01] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[01:31:12] greg-g thx for the update, no worries
[01:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 01:32:45 UTC 2013
[01:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:00] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:01] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[02:03:20] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 02:03:19 UTC 2013
[02:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[02:08:47] !log LocalisationUpdate completed (1.22wmf12) at Wed Aug 7 02:08:47 UTC 2013
[02:08:59] Logged the message, Master
[02:22:30] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Aug 7 02:22:30 UTC 2013
[02:22:41] Logged the message, Master
[02:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 02:33:21 UTC 2013
[02:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 03:02:38 UTC 2013
[03:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:33:30] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 03:33:19 UTC 2013
[03:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[03:35:03] (CR) Dzahn: "(1 comment)" [operations/puppet] - https://gerrit.wikimedia.org/r/71968 (owner: Hashar)
[03:56:00] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[04:05:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 04:05:45 UTC 2013
[04:06:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[04:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 04:32:51 UTC 2013
[04:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[04:35:56] (cross-posted from #wikimedia-dev) Are production wfDebug logs archived, and for how long?
[04:36:53] I'd like to help a Wikisym researcher answer questions about edit conflicts, and the debug logs seem to be the only place I can find this information.
[04:37:27] Obviously, they are sensitive and we would have to review any data before publishing.
[05:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 05:02:39 UTC 2013
[05:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[05:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 05:32:46 UTC 2013
[05:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[05:56:10] PROBLEM - RAID on snapshot3 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[06:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 06:02:41 UTC 2013
[06:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[06:03:43] (PS1) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[06:04:19] (CR) jenkins-bot: [V: -1] Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002 (owner: Yuvipanda)
[06:21:40] PROBLEM - Disk space on analytics1023 is CRITICAL: DISK CRITICAL - free space: / 1069 MB (3% inode=90%):
[06:28:54] (PS1) Rangilo Gujarati: Request filed at https://meta.wikimedia.org/w/index.php?title=Planet_Wikimedia&oldid=5668583 [operations/puppet] - https://gerrit.wikimedia.org/r/78009
[06:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 06:32:42 UTC 2013
[06:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[06:37:00] PROBLEM - Puppet freshness on neon is CRITICAL: No successful Puppet run in the last 10 hours
[06:48:00] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[07:01:16] (PS2) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[07:02:40] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 07:02:35 UTC 2013
[07:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[07:07:22] (PS1) ArielGlenn: switch primary and secondary in rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78011
[07:08:06] (PS3) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[07:08:14] git.wikimedia.org appears to be down. is that a known known?
[07:08:26] it is?
[07:08:39] 503 with guru meditation from varnish
[07:09:05] https://dpaste.de/XWJEA/raw/
[07:10:57] I take it retrying gets you the same thing?
[07:11:23] * apergos tries looking at a random commit
[07:13:04] not giving me an error, nor doing anything else useful
[07:15:05] yeah, and actually i heard someone grumbling about that earlier
[07:15:27] at the time i assumed s/he reported it, but then i thought i had better check
[07:15:44] now I got HTTP/1.1 503 Service Temporarily Unavailable
[07:16:07] !log restarted gitblit, (btw the init script did not stop it properly)
[07:16:13] try again.
[07:16:18] you were too quick on the trigger
[07:16:19] Logged the message, Master
[07:16:29] worked.
[07:16:37] thanks apergos
[07:16:48] thanks for reporting it. ganglia showed nothing unusual
[07:16:48] i'll file a bug for the init script
[07:16:54] which means no load spike or any of that
[07:17:01] it might not be the init script
[07:17:13] i didn't think it was, but that's a bug regardless
[07:17:34] it could be that
[07:17:43] Didn't Chad fix that recently?
[07:17:48] Or at least, made a changeset to do so
[07:17:53] well, in that case I'll leave it to you to file the bug
[07:18:14] simply because it sounds like the console log might be handy
[07:18:19] Reedy: dunno
[07:18:24] ori-l: apergos: This might be related https://gerrit.wikimedia.org/r/77909
[07:18:42] yeah, they were talking about this in the channel yesterday
[07:18:53] now that you bring it up
[07:19:04] we were kind of hesitant though to block google....
[07:19:12] understandable
[07:19:16] Reedy: google is hammering it
[07:19:21] I wonder if we can get 'em to just slow down their crawler
[07:19:23] I saw that earlier
[07:19:23] that would be enough
[07:19:41] * apergos looks around at their google contacts... wrong tz but I'll ping someone
[07:19:49] unless your "did d fix that" was talking about the init script
[07:20:28] apergos: well gitblit does support lucene indexing (it's just broken on our install afaik) so google doesn't really need to index it at all
[07:20:43] and there is also github for full text search as well
[07:22:08] I think we might as well have the results show up there, anyways I'll see what people say over there
[07:22:20] it will take a few days likely to get feedback
[07:23:46] (PS1) Lcarr: updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015
[07:23:49] hmmm maybe crawling that zip|gz|bzip2 link for every dir in every commit (or wherever else) is what is killing it
[07:29:52] (PS3) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[07:31:24] (PS1) TTO: Give testwiki some custom namespaces [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78016
[07:32:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 07:32:49 UTC 2013
[07:33:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[07:38:48] LeslieCarr: what were you saying about esams/europe issues? I have an odd one, images not showing up for me on the projects (either originals or scaled) but works for retrieval from the u.s. (failure to retrieve anything from upload.wm)
[07:40:54] and I see a new core dump on cr2 from 10am but who knows what that actually does
[07:42:44] (CR) ArielGlenn: [C: 2] switch primary and secondary in rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78011 (owner: ArielGlenn)
[07:43:47] (CR) TTO: [C: 1] "All languages for which VE is enabled have these messages (except ru lacking 'visualeditor-beta-appendix' - but I don't think that is used" [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/77269 (owner: Catrope)
[07:46:10] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:51:10] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[07:55:56] (PS1) ArielGlenn: wikivoyage-lb using text-varnish so updated checks accordingly (rt 5581) [operations/puppet] - https://gerrit.wikimedia.org/r/78017
[08:02:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 08:02:40 UTC 2013
[08:03:40] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[08:07:39] (CR) Hashar: [C: 1 V: 2] "This will essentially produce the same console output to scap operators, so as far as I am concerned this is not going to disrupt my workf" [operations/puppet] - https://gerrit.wikimedia.org/r/77838 (owner: Pyoungmeister)
[08:08:08] (PS1) TTO: Set up flood flag for zhwikinews [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78018
[08:09:09] (PS4) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[08:11:48] (PS5) Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - https://gerrit.wikimedia.org/r/78002
[08:12:18] (CR) ArielGlenn: [C: 2] wikivoyage-lb using text-varnish so updated checks accordingly (rt 5581) [operations/puppet] - https://gerrit.wikimedia.org/r/78017 (owner: ArielGlenn)
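The gitblit outage above was diagnosed by retrying the URL and seeing whether the 503 persisted before restarting the service. That retry-and-check pattern can be sketched in Python; the URL, attempt count, and delay below are illustrative, not what anyone in the channel actually ran:

```python
import time
import urllib.request
import urllib.error

def check_with_retries(url, attempts=3, delay=2.0):
    """Return the final HTTP status after up to `attempts` tries.

    A 503 that persists across retries suggests the backend (e.g. gitblit
    behind varnish) is actually down, not just momentarily overloaded.
    Returns None on connection-level failures (timeout, refused).
    """
    status = None
    for i in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status  # 2xx/3xx: service is answering again
        except urllib.error.HTTPError as e:
            status = e.code  # e.g. 503 "guru meditation" from varnish
        except urllib.error.URLError:
            status = None  # no HTTP response at all
        if i < attempts - 1:
            time.sleep(delay)
    return status
```

For example, `check_with_retries("https://git.wikimedia.org/")` would have kept returning 503 until the gitblit restart, then 200.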
[08:26:00] RECOVERY - Puppet freshness on neon is OK: puppet ran at Wed Aug 7 08:25:50 UTC 2013
[08:32:41] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 08:32:36 UTC 2013
[08:33:41] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[08:49:13] (CR) coren: [C: 1] "The tool labs stuff is okay by me." [operations/puppet] - https://gerrit.wikimedia.org/r/75087 (owner: Ori.livneh)
[08:51:05] (PS1) Yuvipanda: Read routing tables from Redis [operations/puppet] - https://gerrit.wikimedia.org/r/78025
[08:54:41] PROBLEM - Disk space on analytics1023 is CRITICAL: DISK CRITICAL - free space: / 1069 MB (3% inode=90%):
[08:55:38] (CR) Tim Landscheidt: "What's the licence of modules/labsproxy/files/redis.lua?" [operations/puppet] - https://gerrit.wikimedia.org/r/78002 (owner: Yuvipanda)
[09:02:51] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 09:02:43 UTC 2013
[09:03:41] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:19:54] (PS4) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[09:20:25] (Abandoned) Dzahn: RT #5464 - apply etherpad-lite live hack fix by apergos [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76661 (owner: Dzahn)
[09:21:38] (PS5) Dzahn: the existing etherpad-lite_1.0-wm2 package in a operations/debs repo for completeness [operations/debs/etherpad-lite] - https://gerrit.wikimedia.org/r/76654
[09:30:54] (PS1) ArielGlenn: re-enable rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78036
[09:32:47] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 09:32:37 UTC 2013
[09:33:13] (CR) ArielGlenn: [C: 2] re-enable rsync between dump servers [operations/puppet] - https://gerrit.wikimedia.org/r/78036 (owner: ArielGlenn)
[09:33:37] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[09:38:07] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:39:08] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[09:42:07] PROBLEM - SSH on pdf2 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:43:07] RECOVERY - SSH on pdf2 is OK: SSH OK - OpenSSH_4.7p1 Debian-8ubuntu3 (protocol 2.0)
[09:47:10] (PS2) Lcarr: updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015
[09:49:48] (CR) Lcarr: [C: 2] updated to text-varnish [operations/puppet] - https://gerrit.wikimedia.org/r/78015 (owner: Lcarr)
[09:50:16] LeslieCarr: what is in there besides a commit message?
[09:50:27] oh
[09:50:36] haha someone else fixed that in the meantime
[09:50:38] me
[09:50:39] :-D
[09:50:42] yay
[09:50:45] you rock
[09:50:48] thanks!
[10:06:27] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:22:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:23:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time
[10:32:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:32:57] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 10:32:52 UTC 2013
[10:33:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.122 second response time
[10:33:27] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[10:40:52] (PS1) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[10:41:32] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:02:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:03:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[11:04:14] (PS2) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:04:57] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:08:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[11:12:52] RECOVERY - search indices - check lucene status page on search19 is OK: HTTP OK: HTTP/1.1 200 OK - 60075 bytes in 0.110 second response time
[11:15:15] (PS3) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:15:52] (CR) jenkins-bot: [V: -1] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[11:18:03] (PS4) ArielGlenn: puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043
[11:21:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:24:12] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[11:25:37] @info dbkey
[11:25:38] RoanKattouw_away: Unknown identifier (dbkey)
[11:25:45] @info wikimania2013wiki
[11:25:45] RoanKattouw_away: [wikimania2013wiki: s3 (DEFAULT)] db1019: 10.64.16.8, db1003: 10.64.0.7, db1010: 10.64.0.14, db1035: 10.64.16.24
[11:25:54] @info centralauth
[11:25:54] RoanKattouw_away: [centralauth: s7] db1041: 10.64.16.30, db1007: 10.64.0.11, db1024: 10.64.16.13, db1028: 10.64.16.17
[11:26:04] @info db1024
[11:26:04] RoanKattouw_away: [db1024: s7] 10.64.16.13
[11:26:12] @replag s7
[11:26:12] RoanKattouw_away: [s7] db1041: 0s, db1007: 0s, db1024: 0s, db1028: 0s
[11:31:32] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:32] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours
[11:31:33] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours
[11:33:52] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 11:33:45 UTC 2013
[11:34:22] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[11:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:53:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[12:02:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:03:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[12:03:28] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours
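The @replag reply above follows a simple "[shard] host: lag, host: lag" shape. A minimal parser for that output might look like this (the format is inferred from the bot replies above; this is an illustration, not the bot's actual code):

```python
import re

def parse_replag(line):
    """Parse a replag bot line like '[s7] db1041: 0s, db1007: 0s'.

    Returns (shard, {host: lag_seconds}). Assumes lag is always given
    as an integer number of seconds with an 's' suffix, as seen above.
    """
    m = re.match(r"\[(?P<shard>\w+)\]\s*(?P<hosts>.+)", line)
    if not m:
        raise ValueError("unrecognized replag line: %r" % line)
    lags = {}
    for part in m.group("hosts").split(","):
        host, lag = part.strip().split(":")
        lags[host.strip()] = int(lag.strip().rstrip("s"))
    return m.group("shard"), lags
```

Feeding it the s7 line from the log yields `("s7", {"db1041": 0, "db1007": 0, "db1024": 0, "db1028": 0})`.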
[12:03:28] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:28] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours
[12:03:29] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours
[12:08:48] !log removing 91.198.174.7/32 from ssl3001 & maerlant, old/deprecated ipv6 testing IP
[12:09:00] Logged the message, Master
[12:10:13] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:22:53] PROBLEM - Host mw31 is DOWN: PING CRITICAL - Packet loss = 100%
[12:23:43] RECOVERY - Host mw31 is UP: PING OK - Packet loss = 0%, RTA = 26.61 ms
[12:26:55] (CR) ArielGlenn: [C: 2] puppetize and enable production xml dumps rsync to gluster public labs share [operations/puppet] - https://gerrit.wikimedia.org/r/78043 (owner: ArielGlenn)
[12:33:43] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 12:33:42 UTC 2013
[12:34:13] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[12:38:52] (PS1) ArielGlenn: fix rsync dup package declaration in mirror.pp/download.pp [operations/puppet] - https://gerrit.wikimedia.org/r/78051
[12:42:08] (CR) ArielGlenn: [C: 2] fix rsync dup package declaration in mirror.pp/download.pp [operations/puppet] - https://gerrit.wikimedia.org/r/78051 (owner: ArielGlenn)
[12:46:12] (PS1) ArielGlenn: fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053
[12:46:29] (CR) jenkins-bot: [V: -1] fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053 (owner: ArielGlenn)
[12:48:22] (PS2) ArielGlenn: fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053
[12:51:41] (CR) ArielGlenn: [C: 2] fix ensure typo in dump gluster rsync [operations/puppet] - https://gerrit.wikimedia.org/r/78053 (owner: ArielGlenn)
[13:07:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:32:56] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 13:32:50 UTC 2013
[13:33:36] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[13:38:57] (PS1) Dereckson: (bug 52578) New user group 'botadmin' on ckb.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78056
[13:56:56] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours
[14:10:42] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:29:15] (PS1) Jgreen: change shell for user otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78058
[14:32:02] (CR) Jgreen: [C: 2 V: 1] change shell for user otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78058 (owner: Jgreen)
[14:33:12] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 14:33:02 UTC 2013
[14:33:42] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[14:50:19] <^d> manybubbles: I've almost finished reviewing your index-splitting change.
[14:50:31] Thanks!
[14:50:35] I'm sorry there is so much of it
[14:51:42] <^d> No worries.
[14:52:37] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:53:27] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time
[15:03:47] <^d> manybubbles: Review posted. tl;dr: "Mostly good, need some minor fixes & clarification"
[15:04:08] will fix!
[15:09:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[15:21:39] (PS2) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:25:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[15:26:07] <^d> manybubbles: And I did that all before breakfast...I should eat now :)
[15:26:23] ^d: I was thinking it is early there
[15:27:19] * ^d is an early riser
[15:27:22] <^d> I'm usually up and about by 6:30 at the latest.
[15:30:05] (PS3) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:32:24] (PS1) Ori.livneh: EventLogging: Specify 'raw=1' in raw logger input stream URIs [operations/puppet] - https://gerrit.wikimedia.org/r/78070
[15:32:35] yo manybubbles
[15:32:47] any chance you could merge that? ^
[15:32:50] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 15:32:42 UTC 2013
[15:33:00] it's a config change for a software / server i administer
[15:33:06] really trivial.
[15:33:20] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[15:33:21] manybubbles merging on ops/puppet
[15:33:33] he doesn't have permissions to do that :)
[15:33:45] hey paravoid
[15:33:47] * ^d will give everyone permissions just for fun :)
[15:33:49] yeah - I was gonna say: "0 chance. I can't click +2"
[15:33:51] (CR) Faidon: [C: 2] EventLogging: Specify 'raw=1' in raw logger input stream URIs [operations/puppet] - https://gerrit.wikimedia.org/r/78070 (owner: Ori.livneh)
[15:33:58] thanks :)
[15:34:18] (CR) Faidon: [C: 2] Rename 'redis.py' to 'redis_monitoring.py' to avoid conflict [operations/puppet] - https://gerrit.wikimedia.org/r/77657 (owner: Ori.livneh)
[15:34:31] bonus round!
[15:34:43] ^d: fun one: https://bugzilla.wikimedia.org/show_bug.cgi?id=52612
[15:34:52] for after breakfast
[15:35:33] <^d> Ouch.
[15:35:34] <^d> Yeah
[15:35:58] <^d> paravoid: Another easy one :) https://gerrit.wikimedia.org/r/#/c/76884/
[15:40:23] (PS2) Faidon: svn::server: remove some unused packages [operations/puppet] - https://gerrit.wikimedia.org/r/76884 (owner: Demon)
[15:40:57] there's nothing inherent with svn not working when e.g. graphviz is installed
[15:41:20] puppet question: any way/examples of a class that adds content to an apache vhost's config file?
[15:41:20] so I don't think an ensure => absent is appropriate; it might be installed due to other packages in the system, for example
[15:41:48] (CR) Faidon: [C: 2 V: 2] svn::server: remove some unused packages [operations/puppet] - https://gerrit.wikimedia.org/r/76884 (owner: Demon)
[15:44:11] (PS1) Jgreen: fixed exim group for otrs delivery pipe [operations/puppet] - https://gerrit.wikimedia.org/r/78072
[15:45:21] (CR) Jgreen: [C: 2 V: 1] fixed exim group for otrs delivery pipe [operations/puppet] - https://gerrit.wikimedia.org/r/78072 (owner: Jgreen)
[15:45:26] <^d> paravoid: So basically I want to remove the bits from the config entirely (they're unused now that svn is r/o).
[15:45:46] <^d> Eventually I need to move the read-only service off formey since it's EOL and in Tampa.
[15:46:00] <^d> And I don't want a bunch of old junk on whatever eqiad box it ends up on :)
[15:46:51] ^d: saw my PS2 above? :)
[15:46:52] I merged that
[15:46:56] and removed the packages manually on formey
[15:47:50] <^d> Hehe, no I didn't :)
[15:47:52] <^d> Thanks then
[15:48:52] (PS2) Demon: Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885
[15:52:20] ^d: ensure => absent for /svnroot/bak won't work
[15:52:27] it won't be recursive
[15:52:35] you need recurse => true
[15:52:36] <^d> Ah, I can just nuke it by hand.
[15:52:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:52:52] but it's the same as before, I'm not sure if I see the point of having puppet clean up this one box
[15:53:07] let's just remove the dump class altogether and clean up manually, it looks very simple
[15:53:37] if it was something on the appservers I'd probably have a different opinion
[15:54:00] <^d> amending.
[15:54:05] (PS4) Petr Onderka: Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906
[15:54:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.134 second response time
[15:54:29] (CR) Petr Onderka: [C: 2 V: 2] Reading from MediaWiki [operations/dumps/incremental] (gsoc) - https://gerrit.wikimedia.org/r/77906 (owner: Petr Onderka)
[15:54:49] (PS3) Demon: Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885
[15:55:17] (CR) Faidon: [C: 2] Remove obsolete backup stuff [operations/puppet] - https://gerrit.wikimedia.org/r/76885 (owner: Demon)
[15:56:18] svndump.php & cron cleaned up
[15:56:27] shall I do /svnroot/bak too?
[15:56:33] <^d> Yeah that's not needed anymore.
[15:56:40] seems to be backups, so I thought to confirm first :)
[15:56:53] done
[15:57:03] <^d> Old backups. I made final backups and gzip'd them.
[15:57:10] perfect
[15:57:26] <^d> I figured the old rotated backups on tridge can be nuked to save some space, and the one-off backups get copied $somewhereSafe.
[15:57:57] ori-l: time to chat about a vagrant+puppet+apache design question? https://www.mediawiki.org/wiki/User:BDavis_(WMF)/Notes/Thumb.php_with_Vagrant#Puppet
[15:58:20] <^d> paravoid: The final backups for wherever we want to stash them are currently at formey:/svnroot/final-backup/
[15:58:22] bd808: oh, are you working on thumb.php?
[15:58:32] paravoid: yes
[15:58:40] ^d: I have zero clue about our backup system(s)
[15:58:48] ^d: akosiaris is starting to ramp up on this, afaik
[15:58:54] I got it working with manual changes. Now I want to turn it into a role for vagrant
[15:58:58] bd808: cool
[15:59:07] bd808: I do media storage among other things
[15:59:07] <^d> paravoid: Cool, I'll sync up with him then :)
[15:59:36] bd808: so I'm available if there's something I can help you with
[15:59:57] and don't worry about asking silly newbie questions, I won't judge at all :)
[16:00:07] I've been there too and it wasn't too far in the past
[16:00:13] At the moment I just need some puppet & apache style guidance I think
[16:00:36] I need to add a block of config to the apache vhost when the role is enabled
[16:00:39] the vagrant puppet stuff is a bit different than production
[16:00:44] and I don't know much about it
[16:01:00] so for example, we don't do LocalSettings.php via puppet on prod
[16:01:01] and I have a couple ideas of how that might be done, but don't know which is "better" or "right"
[16:01:14] or maybe even "possible"
[16:01:44] bd808: I have to finish syncing a change and then run off. I took a very cursory glance and it looks like good progress. I'll take a look in a few hours, if that's cool, unless paravoid figures it out first. (paravoid, thanks for helping.)
[16:02:03] I don't think I can be of much help here...
[16:02:26] ori-l: cool beans. I'll take a stab at something and then you can tell me what I did so very wrong. :)
[16:08:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:11:12] bd808: you probably want something like
[16:11:13] https://dpaste.de/6OGHH/raw/
[16:11:23] (poorly formatted because i did it in a hurry)
[16:13:22] ori-l: my brain was making it much harder than that, but that might work
[16:14:24] if you're feeling up to it, you could factor out of the mediawiki::extension resource type a more generic mediawiki::settings resource type that gives the same nice mapping of puppet hashes to PHP code (paravoid is probably shuddering)
[16:14:42] i've been meaning to do that, and modify mediawiki::extension so that it uses it
[16:14:55] i'm still very jetlagged, so if that makes no sense at all, it's probably my fault
[16:15:11] ori-l: are you in HK?
[16:15:17] yep
[16:15:25] GO TO BED!
[16:15:43] was just about to go out for a while actually :P
[16:16:19] * bd808 shakes head at kids these days
[16:22:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:23:21] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[16:30:16] (PS1) Jgreen: move otrs homedir from /opt/otrs-home to /var/lib/otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78082
[16:31:16] (CR) Jgreen: [C: 2 V: 1] move otrs homedir from /opt/otrs-home to /var/lib/otrs [operations/puppet] - https://gerrit.wikimedia.org/r/78082 (owner: Jgreen)
[16:33:11] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 16:33:07 UTC 2013
[16:33:51] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[16:37:19] (PS1) Demon: Redo search configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78083
[16:39:20] (PS2) Demon: Redo search configuration [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/78083
[16:44:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:46:25] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 4.728 second response time
[16:48:45] PROBLEM - Puppet freshness on db9 is CRITICAL: No successful Puppet run in the last 10 hours
[16:49:24] haaai paravoid, have you created a librdkafka .deb yet?
[16:49:30] no
[16:49:40] well, I did months ago but it's outdated by now
[16:50:02] ok, but the debian/ stuff is there?
[16:50:04] i could build another one?
[16:57:49] /usr/bin/ld.bfd.real: unrecognized option '-Wl,-z,relro'
[17:00:50] ottomata: (working on updating it)
[17:00:55] what do you need it for btw?
[17:02:30] want to play with varnishkafka in labs
[17:05:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:05:48] I haven't started to work on varnishkafka packaging yet
[17:07:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.125 second response time
[17:08:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours
[17:11:11] blergh, makefile is broken
[17:22:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:23:42] !log authdnsiupdate for lists.w.o spf records
[17:23:44] greg-g: ^
[17:23:54] Logged the message, RobH
[17:24:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time
[17:24:56] why is google on that SPF record?
[17:25:00] it shouldn't be
[17:26:00] RobH: wee
[17:26:01] I think 'v=spf1 a mx ~all' should be enough
[17:26:08] the rest are really unneeded for lists
[17:26:21] RobH: ^
[17:26:24] =/
[17:26:37] so dont include ipv4/ipv6 ranges or google?
[17:27:01] (is that since they are included in top level?) [17:27:28] google won't ever send from a @lists.wikimedia.org address [17:27:52] this doesn't belong to google (until we switch to google groups anyway *ducks*) [17:28:19] and the rest aren't needed because all the MTAs that would ever send from @lists.wikimedia.org are on the MX record [17:28:21] * greg-g kicks paravoid  [17:29:03] :D [17:33:00] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 17:32:55 UTC 2013 [17:33:50] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [17:36:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:37:53] RobH: are you fixing? [17:39:48] sorry, was in PMs with folks [17:40:02] paravoid: ok, so lemme fix [17:40:11] this is why the 5 minute ttl rocks [17:40:13] heh [17:41:00] paravoid: mx all? [17:41:14] lists.wikimedia.org. 600 IN TXT "v=spf1 ?all" [17:41:17] yes? [17:41:41] no [17:41:45] lists.wikimedia.org. 600 IN TXT "v=spf1 ?all" [17:41:45] lists.wikimedia.org. 600 IN SPF "v=spf1 ?all" [17:41:53] ok, then i misunderstand. [17:42:07] 20:26 < paravoid> I think 'v=spf1 a mx ~all' should be enough [17:42:14] "a mx" [17:42:38] lists has the mx of lists and mchenry [17:42:46] yes, that's fine [17:43:03] im confused as shit. [17:43:09] what's the issue? [17:43:18] ottomata: https://github.com/paravoid/librdkafka/tree/debian [17:43:36] lists 1H IN A 208.80.154.4 ; service IP on sodium [17:43:36] 1H IN MX 10 lists [17:43:36] 1H IN MX 60 mchenry [17:43:37] ; lists.wikimedia.org SPF txt and rr records [17:43:39] lists.wikimedia.org. 600 IN TXT "v=spf1" [17:43:41] woot, danke paravoid [17:43:41] lists.wikimedia.org. 600 IN SPF "v=spf1" [17:43:43] ? 
[17:43:47] that seems fucked up to me [17:44:06] IN TXT "v=spf1 a mx ~all" [17:44:14] ok [17:45:00] "mx" inside the SPF record means "the mx (mail receivers) of this domain are trusted to send mails on behalf of this domain" [17:45:10] ahh, ok [17:45:28] and i still need both txt and spf with that in it [17:45:33] just confirming before i push ;] [17:45:44] yeah, if powerdns accepts SPF RRs, yes [17:45:45] or you can just svn diff on sockpuppet [17:46:03] I just did, +2 [17:46:03] we will find out i suppose...? ;P [17:46:06] cool [17:46:41] paravoid: thank you for explaining it, its appreciated =] [17:47:03] !log authdns-update (just incase i break it all) [17:47:15] Logged the message, RobH [17:48:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [17:54:26] paravoid: So we have the ip subnets in the wikimedia.org domain since we have individual servers sending mail on behalf of wikimedia.org right? [17:54:41] I guess? :) [17:54:42] where as lists is only from list server, so we restrict to the MX hosts which include that [17:54:59] thats the only reason i can think we would have the IPs in there instead of the pointer to mx records [17:55:03] plus google domain stuff... [17:55:09] but yes, that's the effect, that any server within our networks can be "trusted" to send mail with from: @wikimedia.org [17:55:38] well holy shit i think i get how the spf shit works now.. [17:55:51] or i really dont and have fooled myself [17:55:56] either way i feel accomplished. [17:57:07] ideally we would have two designated smarthosts that would handle all outgoing traffic for @wikimedia.org (plus google, of course) [17:57:32] when will I be able to test it via looking at email headers? 
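[Editor's note: putting paravoid's explanation together with the zone snippets pasted above, the record that was being converged on looks like this (both TXT and SPF RR forms, with the 600s TTL used in the log):]

```
; lists.wikimedia.org SPF: "a" trusts the domain's own A record,
; "mx" trusts its MX hosts (lists and mchenry), "~all" softfails
; everything else. No google/ip4/ip6 mechanisms needed, since only
; the list server ever sends from @lists.wikimedia.org.
lists.wikimedia.org. 600 IN TXT "v=spf1 a mx ~all"
lists.wikimedia.org. 600 IN SPF "v=spf1 a mx ~all"
```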
[17:57:32] maybe we are already, no idea really [17:57:56] greg-g: uhh, its live on our servers now, but who knows [17:57:57] and we'd also do dkim signing [17:58:17] RobH: yeah, just looking at my latest mailing list email and still see "neutral" in the SPF header [17:58:26] 10mins after RobH's change, 17:47 UTC + 10 = 17:57 UTC [17:58:34] i.e. right about now [17:58:43] you still will see neutral [17:58:43] * greg-g nods [17:58:46] oh [17:58:48] that hasn't changed [17:59:24] I don't know why google suggested adding SPF with neutral, not sure how they factor this in their spam scoring [18:02:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:03:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [18:10:33] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:11:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:12:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [18:15:24] I now get: [18:15:25] Received-SPF: softfail (google.com: domain of transitioning wiki-research-l-bounces@lists.wikimedia.org does not designate 129.13.185.202 as permitted sender) client-ip=129.13.185.202; [18:21:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:22:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.129 second response time [18:25:43] grrr [18:26:10] (03PS1) 10Jgreen: cron jobs for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/78089 [18:26:29] mailout.scc.kit.edu how is that sending for us? [18:26:33] greg-g: [18:28:14] greg-g: 129.13.185.202 isnt us.... [18:28:30] so not sure what that exactly means [18:29:23] I have a guess... 
[18:30:01] do tell [18:30:27] well, from the mails I've seen google seems to just ignore our hops for SPF and check the hop before us [18:30:41] and I have a theory on where the bug lies [18:30:47] it ignores the ipv6 hops :) [18:32:14] suck [18:32:22] maybe, who knows [18:32:25] it shouldn't do that [18:33:19] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 18:33:15 UTC 2013 [18:33:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [18:34:34] paravoid: mind if I cc you on ths thread with the google rep? [18:34:44] or should I cc RobH ? [18:34:46] :) [18:35:57] Authentication-Results: mx.google.com; spf=pass (google.com: domain of root@lists.wikimedia.org designates 2620:0:860:2:219:b9ff:fedd:c027 as permitted sender) smtp.mail=root@lists.wikimedia.org [18:36:01] this worked [18:36:20] greg-g: feel free but I don't have high hopes there [18:36:35] hmmm, that's good [18:36:48] and as Tim said, something changed recently, so us hunting SPF issues is kind of orthogonal [18:37:06] (03PS1) 10Mark Bergsma: Setup a misc services varnish cluster [operations/puppet] - 10https://gerrit.wikimedia.org/r/78090 [18:37:27] argh, and just as I check wikitech to find an email [18:37:31] bgp level3 issue [18:37:34] yeah [18:37:49] oh? [18:38:07] (and that message has a softfail and an ipv4 address) [18:38:31] mark: level3 doesn't get our ipv6 announcement, somehow [18:38:45] ok [18:38:49] why is that a problem though? [18:39:04] presumably because level3 is a tier 1? 
:) [18:39:19] doesn't matter [18:39:54] so is tinet [18:41:26] traceroute from level3's lg fails apparently [18:41:35] (03CR) 10Jgreen: [C: 032 V: 031] cron jobs for otrs [operations/puppet] - 10https://gerrit.wikimedia.org/r/78089 (owner: 10Jgreen) [18:41:56] so it doesn't only get our announcement directly, but also not from its peers [18:42:05] yeah, I didn't mean directly befoer [18:42:08] I meant at all [18:42:12] ok [18:42:16] yeah that is a problem ;) [18:42:49] for a DFZ network it is, yeah and esp. when it's a tier 1 :) [18:43:06] interesting [18:43:11] neither US nor european prefix [18:43:50] yeah [18:44:16] although [18:44:22] if I ask for 2620:0:862::/48 I do get a result [18:45:14] also 2620:0:862::/46 [18:45:18] if you don't select 'longer prefixes' [18:45:35] HONESTY-AS, lovely [18:45:41] No routes found for 2620:0:862::/48. [18:45:44] that's what level3 calls our AS [18:45:50] it's the previous owner of our ASN [18:45:52] MANY years ago [18:46:02] that was [18:46:04] Route results for 2620:0:862::/48 from Amsterdam, Netherlands [18:46:21] paravoid, i think librdkafka needs a build depends on zlib1g-dev [18:46:22] oh that was with longer prefixes [18:46:31] Route results for 2620:0:862::/48 from Amsterdam, Netherlands [18:46:31] BGP routing table entry for 2620:0:862::/48 [18:46:32] Paths: (4 available, best #2) [18:46:33] yep [18:46:35] aside from that, it works great [18:46:39] danke [18:46:45] ottomata: thanks, will fix [18:47:02] this was done really quickly obviously :) [18:47:11] aye ja [18:47:50] I don't see HONESTY-AS [18:47:54] wheredid you see tha? 
[18:48:01] AS-path translation: { SWIPNET WIKIMEDIA-EU } [18:48:04] in the bgp output for 2620:0:860 [18:48:11] ah, 860 [18:48:12] HONESTY-AS used to be AS14907 [18:48:25] but we've had that ASN since 2006 or 2007 or so ;) [18:49:03] hm, so they *do* get the route [18:49:08] yup [18:49:12] that longer prefixes checkbox is so broken [18:52:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:53:05] but ping/traceroute don't work, neither to eqiad nor esams [18:54:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.137 second response time [18:57:32] but given that there's a route, it's kind of weird that it's just stars in traceroute [19:02:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:02:40] aaaaaand my theory was right [19:03:10] if there received headers ipv4 (-> ipv4)* -> ipv6, it just ignores all the ipv6 ones altogether for SPF checks [19:03:13] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [19:03:23] and consider the sender as the first ipv4 it will find [19:03:34] even if it's 4 hops below [19:04:05] perfect, just perfect [19:06:23] just remove that SPF [19:07:00] well, the SPF is also wrong, since we don't have AAAA on the MX [19:07:43] but google is definitely buggy [19:09:45] !log authdns-update: removing lists' SPF [19:09:56] Logged the message, Master [19:12:13] watchmouse shows only one monitoring station that can't reach us over ipv6 [19:12:18] and it's in cologne, germany [19:12:39] its ipv6 may well be broken altogether [19:13:24] because another address outside our networks doesn't work either [19:13:38] nlnog to the rescue? [19:14:13] i don't have a login to that [19:14:16] I do [19:14:34] what should I trace? start with eqiad? 
[19:14:40] yes [19:14:41] paravoid: I'll update the BZ ticket about them being removed again, but if they ask why I may have to ask you to comment ;] [19:14:45] much more likely to be behind l3 [19:14:53] ie: spf entries, but its not priority. [19:16:25] haha [19:16:55] or maybe that's because he's in hongkong? ;) [19:17:15] a wikimania with spotty internet connectivity? never. [19:18:07] huh, no output, I wonder what I do wrong [19:18:21] i didn't get output from the irc bot tool either [19:18:31] oh there's a ring-trace though [19:18:35] yes [19:18:46] I was trying ring-all traceroute6 :)_ [19:19:12] hehe [19:19:14] there's ring-ping too [19:19:22] to be fair, the public Wifi at the conference isn't 100%, tfinc's not the only user who's getting issues [19:19:35] you know awfully lot about it for someone without access ;) [19:19:45] Jasper_Deng: i'm not blaming him, just trying to remove some noise atm ;) [19:20:17] paravoid: i just never bothered to get a node or request access, it's not like i haven't seen the project grow ;) [19:20:27] I know, kidding :) [19:21:49] Traceback (most recent call last): File "/usr/local/bin/ring-trace", line 1297, in [19:22:35] nice [19:22:43] I'll fallback to ring-all [19:23:09] (03PS2) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [19:23:13] no level3 anywhere [19:24:48] i'm not surprised [19:24:57] out of 151 traceroutes [19:25:03] ring node participants tend to be peered well ;) [19:26:15] well [19:26:28] that may point to level3 not announcing the route and blackholing though [19:26:39] which is good :) [19:26:42] yes [19:27:02] except for direct l3 customers [19:27:19] singlehomed ones [19:27:24] right [19:27:29] greg-g: The above Received-SPF: line was from your Google Mail account? [19:29:59] so besides the looking glass [19:30:04] does anyone actually see the issue? 
:) [19:31:08] someone reported it on wikitech [19:31:16] which was forwarded from outages@ [19:31:27] http://tty.gr/s/google-ipv6-spf.txt spot the error [19:33:08] i know, but that's the only report i've seen [19:33:12] and it seems based on the LG [19:33:18] although one probably wouldn't ask the LG if not experiencing the problem [19:36:34] now it works again, in the L3 LG [19:36:48] i'll send a response to wikitech [19:40:17] ok [19:41:51] import urlparse; print urlparse.urlparse('tcp://foo.bar/?q=123').query [19:42:04] python 2.7.5: 'q=123' [19:42:08] python 2.7.3: '' [19:42:22] that just ate an hour of my life. [19:42:34] Python 2.7.3 (default, Jan 2 2013, 13:56:14) [19:42:36] IPython 0.13.1 -- An enhanced Interactive Python. [19:42:39] In [1]: import urlparse; print urlparse.urlparse('tcp://foo.bar/?q=123').query [19:42:43] q=123 [19:42:49] * ori-l headdesks. [19:43:11] argh. try on a production host. [19:43:48] huh [19:44:12] empty string [19:44:18] Ubuntu's 2.7.3 is empty, Debian's 2.7.3 is okay [19:44:36] fedora also gives empty string (f18) [19:44:49] 2.7.3 of course [19:44:54] http://bugs.python.org/issue9374 [19:44:59] python2.7 (2.7.3-4) unstable; urgency=low [19:44:59] * Follwup for issue #9374. Restore the removed attributes in the [19:44:59] urlparse module. [19:45:01] -- Matthias Klose Sun, 26 Aug 2012 12:24:31 +0200 [19:45:24] python2.7 (2.7.3-0ubuntu3.2) precise-proposed; urgency=low [19:45:27] fun! [19:45:57] gahhh. 
[19:46:21] (this was backported in Debian patch in 2.7.3-4; Ubuntu precise forked off before 2.7.3 entered Debian) [19:46:38] -- Matthias Klose Thu, 28 Mar 2013 12:40:39 +0100 [19:48:49] ori-l: so the issue is that tcp:// is a custom scheme and python interpreted the RFCs at some point that query strings are protocol-specific, not a URI scheme generic thing [19:49:41] yeah, i came across that, but then tried it in a local python shell to verify that was indeed the case [19:51:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:51:53] and was a touch mystified by the different behavior [19:52:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [19:52:37] paravoid, did you ever set up varnishkafka? [19:52:42] no [19:52:45] hmm, ok [19:52:50] what's up? [19:52:55] i got it working, but just not sending to kafka broker [19:56:43] (03PS3) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [19:56:44] (03PS6) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [19:56:45] (03CR) 10Manybubbles: [C: 04-1] "(1 comment)" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78083 (owner: 10Demon) [19:58:13] (03CR) 10Manybubbles: "Oh, I did forget to say that this is way nicer then when I tried to setup CirrusSearch. I'm just getting nervous as we get closer to prod" [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78083 (owner: 10Demon) [20:01:16] (03PS4) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:01:17] (03PS7) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [20:01:26] Ooooooo its working! 
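[Editor's note: the urlparse discrepancy ori-l hit above is easy to reproduce. Upstream Python originally only split query strings for a whitelist of known schemes, so a custom scheme like `tcp://` got an empty `query`; the fix for issue 9374 (backported into Debian's 2.7.3-4 but not Ubuntu precise's 2.7.3) made it scheme-agnostic, as is every Python 3. A quick check on a modern interpreter:]

```python
# On Python 3 (and patched 2.7.3+), the query string is split off for
# any URI scheme, not just http/ftp/etc. -- unpatched 2.7.3 as shipped
# in Ubuntu precise returned '' here instead.
from urllib.parse import urlparse

parts = urlparse('tcp://foo.bar/?q=123')
print(parts.query)  # 'q=123'
```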
[20:01:36] yay [20:01:37] it looks like varnishkafka will only work if the topic already exits in kafka [20:01:42] it won't create its own topics [20:01:46] scfc_de: yeah (sorry, was getting lunch) [20:01:47] or, at least that's what just happened to me :0 [20:01:54] scfc_de: my @wikimedia.org account [20:03:01] greg-g: Okay, thanks. [20:03:45] (03PS5) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:03:46] (03PS8) 10Yuvipanda: Add redis lua library to labsproxy [operations/puppet] - 10https://gerrit.wikimedia.org/r/78002 [20:06:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:07:53] greg-g: I removed the SPF record again [20:08:11] greg-g: google's spf validator is buggy :) [20:08:14] (03PS6) 10Yuvipanda: Read routing tables from Redis [operations/puppet] - 10https://gerrit.wikimedia.org/r/78025 [20:11:01] (03PS1) 10Danny B.: cswikiquote: Add custom namespace for works [operations/mediawiki-config] - 10https://gerrit.wikimedia.org/r/78142 [20:13:22] paravoid: ugh, and I just emailed OIT saying 'the SPF thing is done, mostly" [20:13:25] :) [20:13:33] * greg-g hates being go between [20:13:45] one issue tracked in 3 different trackers [20:13:47] well, SPF in this case actually made matters worse [20:13:49] OIT, RT, and BZ [20:13:54] oh? huh, awesome [20:14:10] well, yeah, absence of an SPF is "neutral", i.e. shouldn't affect spam scoring [20:14:33] (which is why I've said from the get go that this sudden spam classification has a different reason) [20:14:40] yeah [20:14:46] a correct SPF, may give minus spam points [20:14:57] but in this case due to google's bug, it turned it into a softfail [20:15:11] which actually (presumably) increases the spam score :) [20:15:47] awesome [20:15:50] softfail for lists or google apps mail? 
[20:16:26] lists [20:16:37] greg-g: http://tty.gr/s/google-ipv6-spf.txt :) [20:16:56] (that's with @wikimedia.org, but the same with @lists.wikimedia.org) [20:17:05] *it's [20:18:48] the messages I received passed [20:21:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:23:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.857 second response time [20:25:11] paravoid: alright, I'm passing this off to you, ken, and Michael now. I'm revoking my in-betweener status now :) [20:25:42] do you have sufficient perms to revoke? :) [20:26:07] WONTFIX grants them [20:26:34] jeremyb: luckily, it is my self-hosted, internal issue tracker, so yes. :) [20:28:18] okay, found proper google contacts [20:28:21] I'll try pinging them [20:29:16] the spf issues are orthogonal to the recent problems. [20:29:22] they'd help, sure, but please ignore them for now [20:29:41] paravoid: will do [20:35:49] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 20:35:48 UTC 2013 [20:36:29] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:36:39] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [20:39:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.177 second response time [20:42:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:43:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.140 second response time [20:52:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.132 second response time [20:58:52] (03PS1) 10Ottomata: Updating 
debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 [21:01:33] (03PS2) 10Ottomata: Updating debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 [21:03:17] (03CR) 10Ottomata: [C: 032 V: 032] Updating debian/bin/kafka with new bin scripts and removed obsolete ones. [operations/debs/kafka] (debian) - 10https://gerrit.wikimedia.org/r/78153 (owner: 10Ottomata) [21:07:25] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [21:22:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:23:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.127 second response time [21:25:30] !log authdns-update: adding lists AAAA & (neutral) SPF [21:25:43] Logged the message, Master [21:27:11] :) [21:27:54] just cleaning up house [21:32:15] PROBLEM - Puppet freshness on erzurumi is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on holmium is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on pdf3 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on manutius is CRITICAL: No successful Puppet run in the last 10 hours [21:32:15] PROBLEM - Puppet freshness on sq41 is CRITICAL: No successful Puppet run in the last 10 hours [21:32:16] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: No successful Puppet run in the last 10 hours [21:33:15] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 21:33:14 UTC 2013 [21:33:25] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [21:46:38] !log stopping and restarting gitblit for google testing [21:46:49] Logged the message, Master 
[21:52:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:53:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [21:57:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:03:18] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [22:04:08] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on virt1 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:08] PROBLEM - Puppet freshness on virt3 is CRITICAL: No successful Puppet run in the last 10 hours [22:04:09] PROBLEM - Puppet freshness on virt4 is CRITICAL: No successful Puppet run in the last 10 hours [22:14:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:15:35] hey ^d, I have two questions if you're available [22:28:02] <^d> paravoid: Shoot [22:28:20] first is, how do we delete github projects when the corresponding gerrit project gets deleted? [22:28:31] <^d> By hand at the moment :\ [22:30:12] okay [22:32:12] <^d> The delete action doesn't have a hook (yet), so I can't tie in and clean up replicated copies. [22:32:25] <^d> On my list for things that only bother me really so I've not gotten to it :) [22:33:15] (03PS1) 10Edenhill: Added support for escaping troublesome characters in tag content. 
[operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78157 [22:33:28] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 22:33:20 UTC 2013 [22:33:42] okay [22:34:04] could you delete https://github.com/wikimedia/operations-varnishkafka when you get the chance then? [22:34:08] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [22:34:09] it has been replaced by [22:34:16] https://github.com/wikimedia/operations-software-varnish-varnishkafka [22:34:23] which brings me to my second question [22:34:29] do the mappings have to be 1:1? [22:34:40] or could we make this e.g. github.com/wikimedia/varnishkafka ? [22:34:49] <^d> No, they don't. Christian added some functionality where we can change that 1:1 mapping. [22:34:59] <^d> It just defaults to the 1:1. [22:35:01] cool! [22:35:28] so, what's the process? asking you/chris over irc? mail? bug report? [22:35:52] <^d> It should be puppetized, gimme a sec and I'll look. [22:35:58] <^d> I think we already enabled it once. [22:40:46] (03PS1) 10Demon: Replicate varnishkafka to a custom destination [operations/puppet] - 10https://gerrit.wikimedia.org/r/78158 [22:40:48] <^d> paravoid: ^ [22:41:04] oh, great [22:41:05] thanks! [22:41:24] so I guess after this gets in you need to delete both of the old ones, right? [22:41:50] <^d> Yeah, keeps people from getting confused :) [22:41:53] (03CR) 10Faidon: [C: 032] Replicate varnishkafka to a custom destination [operations/puppet] - 10https://gerrit.wikimedia.org/r/78158 (owner: 10Demon) [22:46:03] <^d> !log reloaded gerrit replication plugin to pick up new config [22:46:14] Logged the message, Master [22:46:14] <^d> I should puppetize that. 
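[Editor's note: the actual override lives in Gerrit change 78158 (not reproduced here). In general, the Gerrit replication plugin's default is the 1:1 name mapping ^d mentions; a custom destination is expressed as an extra remote in `replication.config` scoped to specific projects. The remote name and URL below are illustrative assumptions:]

```
# Hypothetical replication.config fragment -- illustrative only.
# A dedicated remote can match one project and push it to a
# differently-named destination, overriding the default 1:1 mapping.
[remote "github-varnishkafka"]
    url = git@github.com:wikimedia/varnishkafka.git
    projects = operations/software/varnish/varnishkafka
    mirror = true
```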
[22:56:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:57:17] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.124 second response time [23:00:21] (03PS1) 10Edenhill: Added JSON formatter, field name identifers and type casting option. [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78160 [23:00:22] (03PS1) 10Edenhill: Link with -lrt [operations/software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/78161 [23:07:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [23:22:24] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:23:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.157 second response time [23:32:08] (03PS1) 10Faidon: mailman: add IPv6 support to lighttpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 [23:32:54] RECOVERY - Puppet freshness on mexia is OK: puppet ran at Wed Aug 7 23:32:49 UTC 2013 [23:33:14] (03PS2) 10Faidon: mailman: add IPv6 support to lighttpd [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 [23:33:47] (03CR) 10Faidon: [C: 032 V: 032] "Tested." [operations/puppet] - 10https://gerrit.wikimedia.org/r/78164 (owner: 10Faidon) [23:33:54] PROBLEM - Puppet freshness on mexia is CRITICAL: No successful Puppet run in the last 10 hours [23:34:29] warning -- i'll be rebooting cr2-knams in ~10 minutes [23:43:24] !log deactivating external bgp neighbors on cr2-knams [23:43:35] Logged the message, Mistress of the network gear. 
[23:45:34] PROBLEM - Host mobile-lb.esams.wikimedia.org_ipv6_https is DOWN: PING CRITICAL - Packet loss = 100% [23:45:37] PROBLEM - Host mobile-lb.esams.wikimedia.org_ipv6 is DOWN: PING CRITICAL - Packet loss = 100% [23:46:24] RECOVERY - Host mobile-lb.esams.wikimedia.org_ipv6 is UP: PING OK - Packet loss = 0%, RTA = 85.80 ms [23:46:49] !log rebooting cr2-knams [23:46:59] Logged the message, Mistress of the network gear. [23:50:44] RECOVERY - Host mobile-lb.esams.wikimedia.org_ipv6_https is UP: PING OK - Packet loss = 0%, RTA = 85.57 ms [23:51:25] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:53:14] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.126 second response time [23:57:14] PROBLEM - Puppet freshness on mchenry is CRITICAL: No successful Puppet run in the last 10 hours