[00:00:36] !log gallium - restarting Apache, enabling new ServerAlias [00:00:45] Logged the message, Master [00:07:38] New patchset: Pyoungmeister; "test hack coredb_mysql module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36677 [00:09:12] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36677 [00:13:16] New patchset: Pyoungmeister; "Revert "test hack coredb_mysql module"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36680 [00:13:21] New patchset: Asher; "db68 was missing from hostsByName" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36681 [00:13:22] New patchset: Ryan Lane; "Add grains for realm, site and cluster" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36682 [00:13:50] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36682 [00:14:16] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36681 [00:14:27] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36680 [00:15:10] !log asher synchronized wmf-config/db.php 'adding db68 to s7' [00:15:18] Logged the message, Master [00:18:29] New review: Dzahn; "this breaks because of librsvg Ubuntu packages vs. our own. can be fixed manually by removing librsv..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/36583 [00:28:55] New patchset: Ryan Lane; "Get repourls by site before iterating repos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36685 [00:29:15] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36685 [00:30:27] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:46] New patchset: Dzahn; "add librsvg2-2 package to gallium (integration)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36688 [00:35:49] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36688 [00:43:35] !log pgehres synchronized php-1.21wmf5/extensions/CentralNotice/ 'Updating CentralNotice to master. Adding API for CN logs' [00:43:43] Logged the message, Master [00:44:40] Ryan_Lane: stop trying to hax0r tin [00:44:45] heh [00:44:53] I fucked up the permissions on the sudoers file I created :) [00:46:01] :) [00:46:48] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.515 seconds [00:46:57] PROBLEM - MySQL Slave Delay on db1013 is CRITICAL: CRIT replication delay 312 seconds [00:47:31] !log pgehres synchronized php-1.21wmf4/extensions/CentralNotice/ 'Updating CentralNotice to master. Adding API for CN logs' [00:47:39] Logged the message, Master [01:00:45] !log gallium - removing multiarch support (32bit on 64bit systems) which causes issues with librsvg package [01:00:54] Logged the message, Master [01:01:35] ah.. there is the culprit /Stage[main]/Misc::Contint::Android::Sdk/Package[ia32-libs] [01:03:27] New review: Dzahn; "this and removing multiarch support (/etc/dpkg/dpkg.cfg.d/multiarch) fixes the issue with librsvg. w..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/36688 [01:04:09] New patchset: Ryan Lane; "Add deployment pillars to deployment servers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36691 [01:05:34] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36691 [01:05:51] RECOVERY - MySQL Slave Delay on db1013 is OK: OK replication delay 0 seconds [01:13:12] New patchset: Ryan Lane; "Ensure the join sets a value" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36693 [01:13:35] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36693 [01:17:54] pgehres gave E3 permission to sync-dir E3Experiments, starting it now [01:18:16] heh, I only said I wasn't doing anything anymore :-p [01:19:27] improved account creation will ensure happy new users who donate more money. It's the circle of life [01:19:48] except that we don't show banners to logged-in users :-p [01:19:56] it means less money!!! [01:20:06] actually spagewmf -- do you have any retention statistics on all the users we've pushed to sign up? [01:22:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:23:14] mwalker, DarTar would be the one. We haven't encouraged account creation yet (just made it better and given people surveys and improved Community Portal after signup). EE's ArticleFeedbackv5 has a call to action to create an account, as I recall it doesn't generate a lot of account creations. 
[01:30:36] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [01:31:07] !log spage synchronized php-1.21wmf5/extensions/E3Experiments 'E3 backports to wmf5' [01:31:17] Logged the message, Master [01:38:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.033 seconds [01:47:33] New patchset: Ryan Lane; "Fix repourl pulling" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36696 [01:48:52] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36696 [02:11:19] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:23:29] !log LocalisationUpdate completed (1.21wmf5) at Tue Dec 4 02:23:29 UTC 2012 [02:23:39] Logged the message, Master [02:24:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.097 seconds [02:44:31] !log LocalisationUpdate completed (1.21wmf4) at Tue Dec 4 02:44:31 UTC 2012 [02:44:41] Logged the message, Master [03:30:32] New patchset: Ryan Lane; "Various git deploy fixes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36699 [03:34:03] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36699 [03:41:20] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [03:48:53] New patchset: Ryan Lane; "Ensure the repo exists and not just the repodir" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36701 [03:54:01] New patchset: Ryan Lane; "Ensure the repo exists and not just the repodir" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36701 [03:54:42] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36701 [04:10:16] New patchset: Ryan Lane; "Move slots out of common" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36702 [04:10:39] Change 
merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36702 [04:29:14] New patchset: Ryan Lane; "Allow tin to run deployment runners" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36704 [04:29:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36704 [04:40:51] New patchset: Ryan Lane; "Set pillar roots for production" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36706 [04:41:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36706 [04:52:12] New patchset: Ryan Lane; "Make salt master erb properly output runner_dirs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36707 [04:53:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36707 [05:06:06] New patchset: Ryan Lane; "Compatibility changes for deployment runner" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36708 [05:06:24] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36708 [05:08:37] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:08:37] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [05:25:45] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [05:25:45] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:27:27] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [06:41:17] New patchset: Ryan Lane; "Whitespace" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36709 [06:41:36] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36709 [07:49:44] New patchset: ArielGlenn; "expand description of pages-logging.xml, fixes bug #42618" [operations/dumps] (ariel) - 
https://gerrit.wikimedia.org/r/36712 [09:24:40] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [09:28:06] !log nikerabbit synchronized php-1.21wmf5/extensions/Narayam/ 'i18n deploy' [09:28:21] Logged the message, Master [09:28:46] 10.28 <+logmsgbot> !log nikerabbit synchronized php-1.21wmf5/extensions/Narayam/ 'i18n deploy' [09:29:05] can I ignore this: [09:29:06] snapshot1002: rsync: mkdir "/apache/common-local/php-1.21wmf5/extensions/Narayam" failed: No such file or directory (2) [09:29:09] snapshot1002: rsync error: error in file IO (code 11) at main.c(605) [Receiver=3.0.9] [09:34:56] yes [09:35:12] it's in the process of being upgraded [09:35:17] thanks for checking [09:38:46] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [09:40:00] New patchset: ArielGlenn; "ms-be3 gets new setup as 720xd with ssds" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36720 [09:42:00] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36720 [09:43:11] whose runner change in puppet? [09:43:13] - ret = sorted(set(keys['minions_pre']) - set(minions)) [09:43:13] + ret = sorted(set(keys['minions_pre']) - set(minions)) [09:45:20] I guess this is a whitespace change [09:46:43] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [10:50:19] New patchset: Matthias Mullie; "Enable AFTv5 for dewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/34964 [10:51:30] New review: Jarry1250; "Thanks for the pointer Daniel. This is not my area of expertise; hopefully one of the others can act..." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/36583 [11:32:02] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [12:17:22] Change abandoned: Hashar; "no time to work on that" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31575 [12:17:26] Change abandoned: Hashar; "no time to work on that" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31579 [12:17:31] Change abandoned: Hashar; "no time to work on that" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31582 [12:17:38] Change abandoned: Hashar; "no time to work on that" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/31583 [13:42:35] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [14:08:42] New review: preilly; "Thanks." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36709 [14:26:19] New patchset: Mark Bergsma; "Don't software raid ms-fe300x, they have hw RAID" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36764 [14:26:45] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36764 [14:44:59] !log Installed ms-fe3001 [14:45:09] Logged the message, Master [15:09:21] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [15:09:21] PROBLEM - Puppet freshness on sockpuppet is CRITICAL: Puppet has not run in the last 10 hours [15:09:21] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [15:17:50] mark: can you check rt3687 and verify that the switch ports are enabled. 
I am not able to pxe boot..."media test failure" [15:18:00] ok [15:19:43] cmjohnson1: they look fine [15:20:04] hrmm..ok [15:26:27] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [15:26:27] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [15:42:59] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:49:36] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.027 seconds [15:54:32] PROBLEM - SSH on amslvs1 is CRITICAL: Server answer: [15:56:11] RECOVERY - SSH on amslvs1 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [15:58:11] !log authdns update...adding pc1001-1003 [15:58:20] Logged the message, Master [16:13:24] sbernardin: did you set the mgmt ip on ms-be3 [16:14:15] cmjohnson1: yes I did...that was done yesterday [16:14:40] ok..i can't ssh into it...did you fix the idrac7 w/temp license? [16:15:11] cmjohnson1: yes...I was able to log into idrac7 to verify [16:16:40] hmm [16:23:06] sbernardin: can you check the mgmt cable [16:23:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:28:44] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [16:40:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.019 seconds [16:46:57] New patchset: Dereckson; "(bug 42511) Namespaces configuration for ru.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/35663 [16:48:37] !log es1004 replacing bad hdd slot 1 (rt3954) [16:48:46] Logged the message, Master [16:49:45] New review: Dereckson; "PS2: community requested a new alias, T -> NS_TEMPLATE." 
[operations/mediawiki-config] (master) C: 0; - https://gerrit.wikimedia.org/r/35663 [17:13:36] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:24:34] apergos: mgmt is working now on ms-be3 [17:24:39] yay [17:24:49] New patchset: Andrew Bogott; "Move the puppetdoc websites out of /" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36785 [17:24:52] what was the problem, out of curiosity? [17:25:16] New patchset: Dereckson; "(bug 42690) Namespaces configuration for ro.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36787 [17:25:26] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36785 [17:25:53] apergos: from what I gather...there was a conflict with the old c2100 still cabled...steve disconnected and rebooted the new be3 [17:27:32] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.432 seconds [17:28:02] huh, ok [17:28:08] PROBLEM - Host srv278 is DOWN: PING CRITICAL - Packet loss = 100% [17:30:25] well I'm on it now so that's the good thing [17:30:50] RECOVERY - Host srv278 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [17:34:26] PROBLEM - Apache HTTP on srv278 is CRITICAL: Connection refused [17:35:08] really? is that server still being a bleeper? 
[17:35:56] RECOVERY - Apache HTTP on srv278 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.047 second response time [17:40:50] !log cp1031 powering down for fan assembly/main board swap rt3614 [17:40:58] Logged the message, Master [17:43:29] New patchset: ArielGlenn; "update mac for ms-be3 (new host, old name)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36790 [17:43:36] !log reedy synchronized php-1.21wmf5/extensions/GoogleNewsSitemap/ [17:43:44] Logged the message, Master [17:44:04] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36790 [18:00:30] New patchset: Jgreen; "overhauled fundraising db dumper" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36792 [18:00:58] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36792 [18:01:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:04:34] New patchset: Jgreen; "deprecate fundraising dump config files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36794 [18:05:16] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36794 [18:09:50] RECOVERY - Host ms-be3 is UP: PING OK - Packet loss = 0%, RTA = 0.46 ms [18:13:08] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: Connection refused by host [18:13:26] PROBLEM - SSH on ms-be3 is CRITICAL: Connection refused [18:13:26] PROBLEM - swift-object-auditor on ms-be3 is CRITICAL: Connection refused by host [18:13:35] PROBLEM - swift-container-server on ms-be3 is CRITICAL: Connection refused by host [18:13:35] PROBLEM - swift-account-reaper on ms-be3 is CRITICAL: Connection refused by host [18:13:35] PROBLEM - swift-object-replicator on ms-be3 is CRITICAL: Connection refused by host [18:13:44] PROBLEM - swift-container-replicator on ms-be3 is CRITICAL: Connection refused by host [18:13:44] PROBLEM - swift-container-updater on ms-be3 is 
CRITICAL: Connection refused by host [18:13:44] PROBLEM - swift-account-server on ms-be3 is CRITICAL: Connection refused by host [18:14:20] PROBLEM - swift-object-server on ms-be3 is CRITICAL: Connection refused by host [18:14:29] PROBLEM - swift-object-updater on ms-be3 is CRITICAL: Connection refused by host [18:14:29] PROBLEM - swift-account-replicator on ms-be3 is CRITICAL: Connection refused by host [18:14:47] PROBLEM - swift-account-auditor on ms-be3 is CRITICAL: Connection refused by host [18:19:14] New patchset: Andrew Bogott; "Remove the broken 'puppetsource' site." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [18:19:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.022 seconds [18:20:12] !log reedy synchronized php-1.21wmf5/extensions/GoogleNewsSitemap [18:20:20] Logged the message, Master [18:20:34] New review: Andrew Bogott; "Reviewers --" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36796 [18:34:26] PROBLEM - NTP on ms-be3 is CRITICAL: NTP CRITICAL: No response from NTP server [18:39:24] mutante / notpeter - ready for working on icinga ? [18:47:13] !log taking down bast1001 for reimaging [18:47:21] Logged the message, notpeter [18:51:23] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:54:59] PROBLEM - SSH on bast1001 is CRITICAL: Connection refused [18:57:23] PROBLEM - Host ms-be3 is DOWN: PING CRITICAL - Packet loss = 100% [18:59:18] cmjohnson1: you on site today by any chance? [18:59:36] yes in eqiad [19:00:00] I'm not getting console output from bast1001, and this might hamper my upgrade efforts [19:00:05] would you be willing to investigate? [19:00:20] yep i will take a look [19:00:54] thanks! [19:01:29] or, if you don't have much time, can you just plug in a keyboard/monitor and press "ok" to whatever disk partitioning related bullshit is up on the screen?
:) [19:04:22] werrr [19:04:38] * apergos is glad they took a copy of their (tiny) home dir earlier today [19:05:26] apergos: shouldn't affect homedirs [19:05:33] has nfs [19:05:52] I thought bast1001 does not use nfs home for /home [19:06:11] it has the class nfs::netapp::home::othersite [19:06:12] in puppet [19:06:22] so, unless that's extremely misleadingly named.... [19:06:24] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.881 seconds [19:06:44] well my home dir there has much different stuff than the one on fenari [19:06:48] huh [19:06:54] ok [19:07:34] it mounts it but not as /home iirc [19:08:02] RECOVERY - SSH on bast1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:08:18] ah! gotcha [19:08:41] well, then I'm extra glad that you backed stuff up [19:14:20] PROBLEM - NTP on bast1001 is CRITICAL: NTP CRITICAL: No response from NTP server [19:18:14] PROBLEM - Host bast1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:20:38] RECOVERY - Host ms-be3 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [19:21:37] notpeter: you should be good to go...make sure you're using console com2 not connect com2 [19:22:35] RECOVERY - Host bast1001 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [19:24:11] still no output :/ [19:24:17] I mean, the host is back up [19:24:24] so this can totally wait until later [19:24:34] but I just get a blinky cursor [19:24:50] !log upgraded Bugzilla to 4.2.4 [19:24:59] Logged the message, Master [19:25:44] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [19:26:48] notpeter: odd..can you close the console [19:26:57] i will tinker with it some more if that's okay?
[19:28:06] yep [19:28:11] sorry [19:28:14] logged out of terminal now [19:30:31] k..thx [19:31:39] andre__: thanks for the new bugzilla :) [19:31:44] + whoever did the ops stuff :-] [19:32:50] hashar, mutante did [19:35:29] PROBLEM - Host bast1001 is DOWN: PING CRITICAL - Packet loss = 100% [19:36:25] !log committing to pfw1-eqiad - has occasionally lost a few packets in the past [19:36:33] Logged the message, Mistress of the network gear. [19:37:18] mutante: thanks for the new bugzilla :-] [19:37:55] New patchset: Matthias Mullie; "Enable AFTv5 for dewiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/34964 [19:38:06] Change merged: Matthias Mullie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/34964 [19:38:27] hashar: hey, wanna join the talk with Jarry on other channel [19:38:41] mutante: which chan ? [19:38:42] hashar: i am glad it works:) [19:39:01] ah mutante #mediawiki I guess [19:39:05] hashar: yes [19:39:20] or ..actually.. lets get him over here [19:39:41] RECOVERY - SSH on ms-be3 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:39:50] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [19:39:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:39:55] hi [19:40:08] hashar: so... the situation on gallium is like this: [19:40:20] i could fix the librsvg issue... but there was a price to it [19:40:25] it broke something else [19:40:29] Jarry1250: we follow up in this channel. it is quieter [19:40:37] which is: /Stage[main]/Misc::Contint::Android::Sdk/Package[ia32-libs] [19:40:43] mutante: Hi, so, as I say, I'm really quite clueless on these issues, but it seems to me that forcing puppet to keep the version manually placed there would work? 
[19:40:50] the issue comes from having "multiarch" enabled [19:40:57] mutante: I think that package is required by the Sun jdk [19:41:08] or maybe the Android sdk [19:41:26] Jarry1250: maybe we need to use APT pinning via puppet ..yep [19:41:41] hashar: so, do we need 32bit stuff on gallium [19:41:51] because Android SDK is not avail as 64bit or something [19:42:11] or could we get rid of the 32bit stuff .. ia32* are compatibility packages [19:42:43] if i delete this file: /etc/dpkg/dpkg.cfg.d/multiarch it disables that [19:42:50] Re: priorities, Android SDK wins every time. [19:43:02] and then puppet will also install the correct librsvg packages... our packages... as opposed to the Ubuntu packages [19:43:39] apparently we need the ia32-libs to actually build the Android packages [19:43:52] ok, there goes the easiest solution then [19:44:28] actually it is already being told to prefer Wikimedia packages over Ubuntu packages.. hrmmm [19:44:31] $ file platform-tools/aapt [19:44:31] platform-tools/aapt: ELF 32-bit LSB executable, Intel 80386, v [19:44:33] in apt preferences [19:45:11] * andre__ bbl [19:46:44] RECOVERY - Host bast1001 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms [19:46:45] yeah.. let me look at it again in a little while [19:46:49] ahh [19:46:51] mutante: http://andrzejgrzesik.info/2012/08/02/ubuntu-12-04-android-sdk-fun/ [19:47:31] notpeter: i don't know yet...need to look into more. the cfg is correct [19:47:47] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [19:47:59] mutante: his solution is: apt-get --no-install-recommends install ia32-libs-multiarch [19:48:12] mutante: honestly I have no idea what is wrong.
Might try on labs with a Precise instance [19:48:16] New patchset: Dereckson; "(bug 42693) Enable Article Feedback Tool on de.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36810 [19:49:49] !log mlitn synchronized php-1.21wmf4/extensions/ArticleFeedbackv5 'desc' [19:49:57] Logged the message, Master [19:50:12] !log mlitn synchronized php-1.21wmf5/extensions/ArticleFeedbackv5 'desc' [19:50:21] Logged the message, Master [19:50:54] hashar: Removing librsvg2-bin ... [19:50:54] Removing librsvg2-2 ... [19:50:56] :/ [19:50:58] !log mlitn synchronized wmf-config/InitialiseSettings.php [19:51:06] Logged the message, Master [19:51:23] !log mlitn synchronized wmf-config/CommonSettings.php [19:51:31] Logged the message, Master [19:51:37] mutante: :( [19:51:40] hashar: oh wait..just let me try a little more [19:51:56] Change merged: Matthias Mullie; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36810 [19:52:32] !log mlitn synchronized wmf-config/InitialiseSettings.php [19:52:40] Logged the message, Master [19:53:36] New patchset: Cmjohnson; "Replaced main board for cp1031, updating NIC1 MAC" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36811 [19:53:57] from the function "ArticleFeedbackv5Fetch::run". The database reported the error "1146: Table 'dewiki.aft_article_feedback' doesn't exist (10.0.6.54)". [19:54:48] robh: can you +2 https://gerrit.wikimedia.org/r/36811 [19:54:56] cmjohnson1: ok, cool. thanks [19:55:24] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36811 [19:55:37] cmjohnson1: done [19:55:42] thx [19:56:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds [19:57:40] could someone please create the AFTv5 tables for dewiki? [19:57:49] dewiki is broken [19:59:32] wait, someone deployed an extension without deploying the schema?
[19:59:39] seems so [19:59:45] ask Matthias Mullie [19:59:56] he deployed AFTv5 for dewiki a few minutes ago [20:00:04] is he here? [20:00:22] don't know his IRC nick [20:01:12] he's mlitn, just asked him to come in [20:01:18] thanks [20:01:30] hi [20:01:53] mlitn: you just deployed aftv5 to dewiki? [20:02:04] still busy ;) [20:02:14] ? [20:02:38] mlitn: what the fuck are you doing? [20:02:59] mlitn: this seems urgent; de.wikipedia is broken. see https://bugzilla.wikimedia.org/show_bug.cgi?id=42693 . did you deploy an extension without deploying the schema? [20:03:51] sumanah: it's only peculiar pages like the watchlist, all is fine reading and editing [20:03:53] mlitn: when you deploy, you need to be on irc [20:03:56] every time [20:04:13] <^demon> Ok, let's talk about that later. [20:04:16] probably the 11k pages that are in the aftv5 lottery as well [20:04:26] <^demon> New tables either need creating, or we need to shut off Aftv5. [20:04:28] <^demon> I don't care which. [20:04:32] winning the lottery has never been so.. win! [20:05:30] how about we turn off the extension for now [20:05:32] mlitn: fyi for the future, the person deploying an extension for the first time generally deploys the whole thing [20:05:33] ^demon: extensions/ArticleFeedbackv5/sql/ArticleFeedbackv5.sql is what needs to be executed [20:05:38] and then deal with this once we don't have 11k broken articles [20:05:45] se4598: we have to create your MySQL tables and all will be fine again [20:06:00] <^demon> mlitn: Yes, I'm well aware of that. But unless someone's doing it right now, we need to turn the extension back off. [20:06:00] mlitn: as in - you run the create table .sql file before enabling [20:06:03] se4598: (your = those required by the extension) [20:06:55] ok, but seriously, we've been talking about this for 5 minutes now. why is the broken code still live. [20:07:17] <^demon> I'm syncing.
[20:07:18] !log demon synchronized wmf-config/InitialiseSettings.php 'Fix dewiki breakage until tables are created and everyone's bikeshedding over the proper way to deploy' [20:07:26] Logged the message, Master [20:07:26] ok, but table-creating was mentioned in the commit-msg, lots of complaints at dewp :) [20:07:39] <^demon> dewiki fixed. [20:07:46] ^demon: thanks :) [20:07:49] on the other hand, aftv5 is a less offensive product if its db tables are dropped.. [20:08:02] just ran sql [20:09:04] tables are created [20:10:37] so - I'm sorry; I was in the middle of getting tables deployed [20:10:42] obviously should have run that first [20:11:53] mlitn: once you're done with the immediate fixing, I think it would be good to talk about whether there were missing instructions somewhere like at http://wikitech.wikimedia.org/view/How_to_deploy_code [20:11:54] you should also be on irc and communicative before, during, and after any deploy, especially if it has gone wrong [20:13:12] http://wikitech.wikimedia.org/view/How_to_deploy_code mentions schema changes and links to the appropriate doc [20:13:44] it also talks about understanding prerequisites before deploying [20:14:34] I will add the tip in the == Basic common sense == section. It lists the things to be careful of. [20:14:58] sumanah: I'm sure it's all listed on there (though I've got something to make up for and will re-read it tomorrow), it's more of a "I got used to deploying stuff with less impact, that I did not check it as thoroughly as I obviously should have" [20:15:28] Yeah. [20:17:40] wangatlargo: your name makes me think you're chilling out in key largo [20:18:50] mlitn: congrats on your first site outage, it's a badge of honor (not sarcasm) [20:19:08] <^demon> We really should've made t-shirts :\ [20:19:27] create a barnstar for it [20:20:08] I've been talking up the t-shirt idea for years [20:20:09] rofl - can't say I'm feeling proud ;) [20:20:12] but there's no rt ticket so...
[20:20:21] mlitn: and if you think there's anyone else in your shoes (or in the shoes you figuratively were wearing 2 hours ago), might be worth telling them your story as a Confession [20:20:51] sumanah: makes sense ;) [20:21:07] mlitn: "mea culpa and I should have read this doc better: My True Story" [20:21:55] ^demon or LeslieCarr > could you add me in the Editor group? http://wikitech.wikimedia.org/index.php?title=Special%3AUserRights&user=Dereckson [20:22:06] * sumanah has to go to lunch [20:22:13] <^demon> Dereckson: Doned. [20:22:21] Thank you. [20:27:39] so, now that all tables are up, is it ok to re-enable aft on dewiki? [20:28:39] mlitn: sure [20:29:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:31:36] * mlitn crosses fingers [20:31:45] !log mlitn synchronized wmf-config/InitialiseSettings.php 're-enable aftv5 on dewiki after having created missing tables' [20:31:53] Logged the message, Master [20:32:23] uhoh [20:32:25] who broke enwiki ? [20:32:26] http://en.wikipedia.org/wiki/Shooting_an_apple_off_one%27s_child%27s_head [20:32:51] LeslieCarr: you really think this kind of jokes work? [20:33:00] no, not a joke [20:33:22] oh force reload worked [20:33:36] I was about to say [20:33:42] both logged in and logged out looked ok for me [20:33:48] crazy [20:33:51] oh well [20:33:59] maybe one varnish cache has the old css ? [20:34:33] ugh [20:39:18] RoanKattouw_away: https://gerrit.wikimedia.org/r/#/c/2141/ [20:39:32] paravoid: https://gerrit.wikimedia.org/r/#/c/15874/ [20:39:36] (i'm looking at old changes [20:40:21] <^demon> LeslieCarr: Not old, but if you're looking for easy merges... https://gerrit.wikimedia.org/r/#/c/35615/ [20:40:23] <^demon> :) [20:40:38] checking out now [20:41:15] ^demon - why the mixing of tabs and spaces ? https://gerrit.wikimedia.org/r/#/c/35615/1/files/gerrit/hooks/hookhelper_test.py [20:41:36] <^demon> Should all be spaces now, I was removing tabs. 
[20:42:00] there's lots of tabs [20:42:04] look at all the tabs! [20:42:18] <^demon> I don't see them on the right side of the diff. [20:42:26] <^demon> I see spaces in increments of 4. [20:42:34] really ? [20:43:00] <^demon> My diff options are: ignore whitespace all, whitespace errors and show tabs on [20:43:29] <^demon> If you don't have Show Tabs on, it's kind of hard to tell the difference between spaces and tabs. [20:43:53] i have show tabs, ignore whitespace none, whitespace errors [20:44:14] weird, if i update ignore whitespace it appears to be tabs [20:44:28] <^demon> Bah, stupid diff preferences. [20:44:38] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [20:44:38] but when i switch to ignore whitespace none it goes back to tabsa [20:44:47] <^demon> If you really want to verify, you could pull the patch and grep for ^\t [20:44:53] <^demon> I promise I changed all the tabs -> spaces. [20:44:55] that was what i was just about to say [20:44:56] :) [20:44:58] do [20:45:35] yep, when looking at it after pulling it's correct [20:45:40] why does gerrit hate me ? [20:46:11] New review: Lcarr; "FYI on gerrit preferences - ignore whitespace NONE this appears to still have tabs. Pulled the patc..." [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/35615 [20:46:12] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/35615 [20:46:42] wow, sockpuppet has a lot of patches to be merged and ironically has not run puppet in forever [20:47:53] New patchset: Pyoungmeister; "s7: depooling db56 for reimage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36821 [20:47:59] !log upgrading sockpuppet's 57 pending security upgrades [20:48:08] Logged the message, Mistress of the network gear. 
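The `grep for ^\t` check suggested above, done on the pulled patch rather than in the Gerrit diff view, can be sketched in Python roughly as follows. The function name `lines_with_leading_tab` is made up for illustration and is not part of any tool mentioned in the channel.

```python
import re
from typing import Iterable, List, Tuple

# Same pattern as `grep '^\t'`: a tab as the very first character of a line.
LEADING_TAB = re.compile(r"^\t")


def lines_with_leading_tab(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for every line that starts with a tab.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if LEADING_TAB.match(line)
    ]


if __name__ == "__main__":
    sample = ["def f():\n", "\treturn 1\n", "    return 2\n"]
    for number, line in lines_with_leading_tab(sample):
        print(f"{number}: {line!r}")
```

An empty result for a checked-out file would support the "all spaces now" claim regardless of which whitespace preferences the diff viewer is using.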
[20:48:22] Dec 4 03:36:08 sockpuppet puppet-agent[29617]: Finished catalog run in 44.79 seconds [20:48:38] that was after applying Salt_master stuff [20:49:27] New patchset: Reedy; "Upping scap forklimit from 5 to 7 to speed up sync" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9130 [20:50:08] moar horsepower [20:50:08] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36821 [20:50:55] New review: Lcarr; "we're trying the limit of 7 to see the effects. oldest operations gerrit change... merged!" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9130 [20:50:55] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9130 [20:51:25] !log py synchronized wmf-config/db.php 's7: depooling db56 for reimage' [20:51:32] Logged the message, Master [20:53:21] New patchset: Reedy; "Rename *.wikimedia.org.crt to star.wikimedia.org.crt like it is used in files/owa/owa-apache" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/32924 [20:55:50] mlitn: a message is missing: see https://de.wikipedia.org/wiki/Spezial:Artikelr%C3%BCckmeldungen_v5/B%C3%BCschelaffen/19 [20:56:04] New review: Strainu; "See inline comments" [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/36787 [20:57:11] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/23598 [20:58:38] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: Connection refused by host [20:58:57] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:00:44] PROBLEM - Host db56 is DOWN: PING CRITICAL - Packet loss = 100% [21:01:01] Ryan_Lane: you approved but never actually merged https://gerrit.wikimedia.org/r/#/c/33066/ ? [21:01:20] I was assuming that faidon would merge and deploy it [21:02:48] paravoid: ^^ ? [21:05:32] so, does anyone have a reason i should not reboot sockpuppet ? 
[21:06:21] LeslieCarr: can you wait like 10 minutes [21:06:25] no...not a legitimate one...but if you break it you buy it [21:06:26] RECOVERY - Host db56 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [21:06:28] for you, i can wait 11 [21:06:30] :) [21:06:36] good one cmjohnson1 :) [21:06:48] preilly: did you need this ? https://gerrit.wikimedia.org/r/#/c/32866/1 [21:07:23] LeslieCarr: that's awjr change [21:07:37] New patchset: Dereckson; "(bug 42690) Namespaces configuration for ro.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36787 [21:08:23] LeslieCarr: it's not urgent but we'll need it when we finish getting esi support in MobileFrontend [21:08:58] which probably won't be until the end of the year or early next year [21:09:27] is it a "must clear varnish cache on mobile" change or a "when merged will be ok" change ? [21:09:44] New patchset: Pyoungmeister; "mysql::client : using mysqlfb client for precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36874 [21:09:49] New review: Dereckson; "Thanks to have caught that." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36787 [21:10:11] PROBLEM - Full LVS Snapshot on db56 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[21:10:20] PROBLEM - MySQL Idle Transactions on db56 is CRITICAL: Connection refused by host [21:10:20] PROBLEM - mysqld processes on db56 is CRITICAL: Connection refused by host [21:10:30] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36874 [21:10:47] PROBLEM - MySQL Recent Restart on db56 is CRITICAL: Connection refused by host [21:10:47] PROBLEM - MySQL disk space on db56 is CRITICAL: Connection refused by host [21:10:47] PROBLEM - MySQL Slave Running on db56 is CRITICAL: Connection refused by host [21:11:59] apergos: SELECT job_namespace,job_title,COUNT(*) AS count FROM job GROUP BY job_namespace, job_title ORDER BY count DESC LIMIT 10; [21:12:03] pretty funny [21:12:27] a bot keeps editing that top page a bunch of times per day [21:12:35] New patchset: Andrew Bogott; "Remove the broken 'puppetsource' site." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [21:12:56] * AaronSchulz wonders if how this will be in wmf6 [21:13:17] LeslieCarr: ok, thanks [21:13:20] all done [21:13:48] all right [21:13:53] rebooting sockpuppet in 60 seconds [21:15:39] !log rebooting sockpuppet for kernel upgrades [21:15:47] Logged the message, Mistress of the network gear. [21:15:55] hashar, can you confirm that the apache config for puppet docs is somewhat correct? I'm worried about clobbering the other docs on Gallium [21:16:12] (latest patch is here: https://gerrit.wikimedia.org/r/#/c/36796/) [21:16:25] andrewbogott: there are no doc on gallium yet :-]  just http://integration.mediawiki.org/ [21:16:34] andrewbogott: looking :-] [21:16:36] well, clobbering that then. [21:17:05] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:19:39] andrewbogott: so the patch removes an apache configuration file ? :-) [21:20:03] hashar: The patch itself isn't interesting, the file that the patch modifies is what I'm hoping you'll look at. 
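The job-table query pasted above groups queued jobs by namespace and title and lists the most duplicated ones first. A minimal sketch of the same query against a toy in-memory SQLite table is below; the two-column table layout and the sample rows are assumptions for illustration and do not reflect the real MediaWiki `job` schema or data.

```python
import sqlite3

# Query copied from the channel; only the table contents below are invented.
QUERY = (
    "SELECT job_namespace,job_title,COUNT(*) AS count FROM job "
    "GROUP BY job_namespace, job_title ORDER BY count DESC LIMIT 10;"
)


def top_duplicate_jobs(rows):
    """Load (namespace, title) pairs into a toy table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed minimal layout: just the two columns the query touches.
        conn.execute("CREATE TABLE job (job_namespace INTEGER, job_title TEXT)")
        conn.executemany("INSERT INTO job VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(0, "Main_Page")] * 3 + [(2, "Sandbox")] * 2 + [(0, "Other")]
    for namespace, title, count in top_duplicate_jobs(sample):
        print(namespace, title, count)
```

A page that a bot edits many times a day would show up at the top of such a list, because each edit queues another job for the same namespace and title.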
[21:20:24] The patch changes it from creating two sites to one, the one remaining site is the one I care about. [21:20:25] huzzah, sockpuppet alive again [21:20:30] Sorry, that was confusing [21:20:34] ahh [21:20:48] * hashar download patch [21:21:01] somehow I find it easier to review stuff in a terminal than in a browser [21:22:26] hashar: btw… I removed the second 'puppetsource' site because I couldn't make it work. And since I couldn't, that makes me think I'm mostly not understanding how this config will behave. [21:24:17] andrewbogott: sorry I finished handling my pull requests :-d [21:24:27] andrewbogott: so I am looking at the apache configuration file rightnow [21:25:11] and I will give you a secret: I have huge difficulty manually compiling apache files in my brain, but that one does not choke me at least :-] I am not sure you want an alias though. [21:25:13] <^demon> hashar: I filed a bug for gallium :) [21:25:18] <^demon> I need asciidoc. [21:25:43] ^demon: oh come on!! send a puppet patch :-]]]]]] [21:25:54] <^demon> Where should I put it? [21:26:29] <^demon> Maybe with the $CI_FOO_PACKAGES things? A new $CI_DOC_PACKAGES? [21:26:29] ^demon: manifests/misc/contint.pp somewhere under the packages class (please ) [21:26:50] RECOVERY - Host analytics1007 is UP: PING OK - Packet loss = 0%, RTA = 26.50 ms [21:26:52] ^demon: I need to convert that class to a module. That is a huge pile of messy stuff currently :( [21:27:16] andrewbogott: I would set up a virtual host doc.wikimedia.org served from /srv/org/wikimedia/doc/ [21:27:42] andrewbogott: then put the puppet doc in /srv/org/wikimedia/doc/puppet/ which would then be available at doc.wikimedia.org/puppet/ [21:27:50] New patchset: Demon; "Need asciidoc for compiling gerrit" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36882 [21:27:59] andrewbogott: also we probably want to support https:// too but we can always set that up later on. 
Not really important [21:27:59] hashar: I don't think I know how to do that, but otherwise it sounds reasonable :) [21:28:16] ^demon: is asciidoc able to compile markdown ? :-] [21:28:22] <^demon> Nope. [21:28:44] andrewbogott: I must have an apache configuration for it already. Let me find it [21:29:11] Thanks. Partly it bothers me that I can't test virtual host configs on a test machine since they depend on the actual hostname [21:29:18] (or, at least, I believe that to be true. True?) [21:29:25] <^demon> There's talk of moving all the asciidoc docs in gerrit to markdown since we're using markdown in some new places. [21:29:29] <^demon> But it's not a priority. [21:29:43] New review: Hashar; "Looks fine to me. I would approve it if I could :-]" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/36882 [21:29:50] PROBLEM - NTP on db56 is CRITICAL: NTP CRITICAL: No response from NTP server [21:30:11] andrewbogott: you can put the remote hostname in your local /etc/hosts :-] [21:31:10] andrewbogott: or : curl -x localhost:80 http://hostname.to.test/ (curl will ask the last argument to its proxy which happen to be … your local apache listening on port 80). [21:31:22] I think I learned that one from domas [21:31:49] andrewbogott: ahhh an example is files/apache/sites/integration.mediawiki.org [21:32:00] andrewbogott: basically set ServerName doc.wikimedia.org [21:32:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 8.530 seconds [21:32:22] andrewbogott: that should do it :) [21:32:31] brb, daughter crying [21:32:47] hashar, I can just add another section to that existing file, right? [21:32:50] RECOVERY - Puppet freshness on sockpuppet is OK: puppet ran at Tue Dec 4 21:32:40 UTC 2012 [21:32:50] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [21:32:53] Or should it be separate? 
[21:34:33] !log bast1001 back up with clean puppet runs [21:34:41] Logged the message, notpeter [21:35:32] RECOVERY - Full LVS Snapshot on db56 is OK: OK no full LVM snapshot volumes [21:35:41] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay seconds [21:35:59] RECOVERY - MySQL Idle Transactions on db56 is OK: OK longest blocking idle transaction sleeps for seconds [21:36:08] RECOVERY - MySQL disk space on db56 is OK: DISK OK [21:36:08] RECOVERY - MySQL Recent Restart on db56 is OK: OK seconds since restart [21:36:27] RECOVERY - MySQL Slave Running on db56 is OK: OK replication [21:36:35] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay seconds [21:38:05] RECOVERY - NTP on db56 is OK: NTP OK: Offset -0.07356703281 secs [21:38:46] andrewbogott: back sorry. I had to invoke wife() ;-D [21:38:59] np [21:39:03] andrewbogott: yeah I think it is better to get a different apache file [21:39:24] andrewbogott: something like files/apache/sites/doc.wikimedia.org [21:39:39] which would be mostly a copy/paste from another file I guess [21:39:54] up till we start including a puppet module to easy handle all that crap :-] [21:40:21] meanwhile, lets grow the technical debt a bit more. That is not that much of an issue imho [21:40:30] The distinction between files and sections is ignored by apache, right? It's just about controlling granularity when enabling/disabling sites? [21:40:37] yup [21:40:51] I think apache just concatenate all the available conf files [21:40:56] and parse them [21:41:16] the isolate each section properly (or should :p ) [21:41:35] but don't quote me on that :-] [21:41:43] That's how it looks to me too, at least. [21:41:47] Will you be up for a bit longer? [21:42:34] andrewbogott: files/apache/sites/irc.wikimedia.org <-- seems to be a good candidate to base your work upon [21:42:35] PROBLEM - Host db56 is DOWN: PING CRITICAL - Packet loss = 100% [21:42:37] that is a few lines [21:42:52] cool. 
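On testing a virtual host from a test machine: both tips above (the `/etc/hosts` entry and `curl -x localhost:80`) come down to sending a request to the local Apache while naming the real site, since name-based virtual hosts are selected by the request's `Host` header. A rough Python sketch of that idea follows. It sends a direct request with an explicit `Host` header rather than a proxy-style request as curl would, and the function names, default address and port are assumptions for illustration.

```python
import socket


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET that names `hostname` in the Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_from_local(hostname: str, path: str = "/",
                     address: str = "127.0.0.1", port: int = 80,
                     timeout: float = 5.0) -> bytes:
    """Send the request to a local web server and return the raw response.

    Assumes a server is listening on address:port; the defaults are guesses.
    """
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(build_request(hostname, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Only prints the request; call fetch_from_local() against a test Apache.
    print(build_request("doc.wikimedia.org", "/puppet/").decode("ascii"))
```

If the local Apache has a vhost whose `ServerName` is `doc.wikimedia.org`, a call such as `fetch_from_local("doc.wikimedia.org", "/puppet/")` should be answered by that vhost even though the machine's own hostname is different.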
[21:43:05] andrewbogott: one thing is to decide whether you really just wand a file..or maybe a template is better [21:43:17] note there is ./files/apache/sites and ./templates/apache/sites [21:43:23] want [21:43:29] I think I just want a file… previous version ended up as a template just because of the pattern I was following. [21:43:36] *copying [21:43:50] andrewbogott: http://dpaste.org/rwUN8/ that might be good [21:44:10] oh templates!!! you are awesome mutante :-] [21:44:23] and i agree you want to have a separate file.... that way you can disable/enabled the site separately from the "integration" site [21:44:39] in this case I don't think we need to use a template though [21:45:09] or andrew will end up having to write a new puppet class like apache::virtualhost( domain, docroot, https(true|false) ... [21:45:20] it is 2 steps in puppet.. one puts it in ./sites-available/ ., the other creates the link from there to ./sites-enabled/ [21:45:24] which would be nice eventually but it is a lot more work than a few lines of apache conf [21:45:51] hashar: another one ... ?! [21:45:53] PROBLEM - Host analytics1007 is DOWN: PING CRITICAL - Packet loss = 100% [21:46:04] mutante: do we have a puppet class to create an apache ghost ? [21:46:08] ghost -> vhost [21:46:28] hashar: there has been something.. webserver:: [21:46:54] i think part of the problem is resolving the chaos and make it uniform [21:47:19] we should do ops / dev hackaton one day :-] [21:47:21] also..we will want to turn stuff into puppet modules.... [21:47:32] RECOVERY - Host db56 is UP: PING OK - Packet loss = 0%, RTA = 0.41 ms [21:47:37] that will change the directory structure again [21:47:57] andrewbogott: ahh so we have an apache module in puppet :-] modules/apache/README.md [21:48:58] um… for my first pass I will do this the way that I know how :) [21:48:59] and there is webserver.pp with webserver::php5 ... 
ssl = true [21:49:19] Although, hm, I guess right now my class installs a webserver which is definitely not needed [21:49:21] andrewbogott: so you might end up with something like: http://dpaste.org/G3XnL/ [21:49:21] andrewbogott: fair.. we probably have too many ways to do it [21:49:40] all you need to do here is add another apache site [21:49:48] gallium already has the Apache itself of course [21:49:49] Oh, does apache::vhost create the site automatically? That seems good. [21:50:04] * andrewbogott prepares to rewrite everything [21:50:37] andrewbogott: you will have to ask puppet to create each directories that leads to /srv/org/wikimedia/doc (using file {} ) that comes from the README.md [21:50:54] andrewbogott: Ryan_Lane added that apache module last week apparently. [21:51:02] !log starting innoback from db1041 to db56 [21:51:04] i can just tell you this works: class {'webserver::php5': ssl => 'true'; } .. apache_site { ...} file { ... [21:51:10] Logged the message, notpeter [21:51:21] as it is done f.e. in /misc/bugzilla.pp [21:51:35] RECOVERY - Host analytics1007 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [21:51:39] ^demon: I have +1ed your change to get asciidoc on the contint server :D [21:51:47] so if you already have Apache, just one apache_site definition and one file definition [21:51:59] poor andrew :-] [21:52:27] andrewbogott: I am heading to bed, but I guess you have enough different way to create the new vhost with different people able to support you if needed :-] [21:52:52] andrewbogott: if all fail, there is an Air France flight to Paris at 3:45pm :-D [21:53:51] and mutante / ryan taught me everything I know anyway so you should be safe [21:54:46] as a final note, feel free to merge / apply the conf on gallium. Just make sure that http://integration.mediawiki.org/ci/ still work, that is the entry point to reach Jenkins. 
[22:03:32] !log awjrichards synchronized php-1.21wmf5/extensions/MobileFrontend/stylesheets/specials/watchlist.css 'touch file' [22:03:40] Logged the message, Master [22:04:06] New patchset: Andrew Bogott; "Don't set up any apache stuff for puppetdocs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [22:07:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:07:14] andrewbogott: you are deleting templates/apache/sites/labsconsole.wikimedia.org :-D [22:07:26] shit [22:07:30] New review: Hashar; "deleting templates/apache/sites/labsconsole.wikimedia.org ? :-)" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/36796 [22:12:28] New patchset: Andrew Bogott; "Don't set up any apache stuff for puppetdocs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [22:20:05] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.728 seconds [22:21:39] !log taking down hume for upgrade to precise [22:21:42] New patchset: Andrew Bogott; "Add class for a doc site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [22:21:47] Logged the message, notpeter [22:23:01] heading bed, have a good afternoon [22:23:06] New patchset: Andrew Bogott; "Add class for a doc site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [22:23:12] good night hashar [22:23:19] 'night [22:23:30] mutante, does this look roughly as you'd expect? ^ [22:27:16] andrewbogott: i added inline comments [22:27:27] thanks [22:27:56] this: class {'webserver::php5': ssl => 'true'; } instead of the "require" is a parameterized class [22:28:21] one step more "modern way" than the old one.. 
but still one less than module and role class [22:28:58] and a system_role is nice to have [22:29:29] see example manifests/misc/bugzilla.pp misc::bugzilla::server [22:31:03] PROBLEM - SSH on hume is CRITICAL: Connection refused [22:31:11] New patchset: Andrew Bogott; "Add class for a doc site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [22:31:13] andrewbogott: one more.. the directory /srv/org/wikimedia/doc will not exist [22:31:26] there is just /srv/org/mediawiki/integration [22:31:29] Hm, thought that git::clone took care of that [22:31:36] I'll test. [22:31:46] Oh, I guess that's in a different bit... [22:32:14] yeah, it is also mediawiki vs. wikimedia [22:35:30] andrewbogott: dont be too quick or it will work but we have that other issue on gallium :) [22:35:50] which other issue? [22:36:07] librsvg vs. android SDK 32bit [22:36:11] package install issue [22:36:34] Oh, um, I don't know what that is or what it means [22:36:57] Must've missed it in the backscroll [22:37:24] you dont need to.. just dont wonder about it if you see puppet issues related to it [22:38:13] if a puppet run breaks it _could_ keep you from getting the new apache site up [22:39:00] !log awjrichards Started syncing Wikimedia installation... : Updating MobileFrontend per regular weekly deployment [22:39:09] Logged the message, Master [22:39:09] RECOVERY - SSH on hume is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [22:39:12] mutante: OK, good to know [22:42:37] notpeter: what's up with https://gerrit.wikimedia.org/r/#/c/36874/1 [22:44:23] Hm… are puppet files broken in puppetmaster::self? 
[22:44:36] I can see the file but puppet says it can't, possible because it's trying to get it from a server rather than locally [22:45:02] PROBLEM - Host neon is DOWN: PING CRITICAL - Packet loss = 100% [22:45:29] New patchset: Asher; "mysql-client-5.5 for precise hosts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36910 [22:46:12] RECOVERY - Host neon is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [22:48:45] PROBLEM - SSH on hume is CRITICAL: Connection refused [22:49:25] New patchset: Dzahn; "add neon to raid-lvm partman recipe" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36913 [22:49:30] PROBLEM - MySQL disk space on neon is CRITICAL: Connection refused by host [22:49:57] PROBLEM - SSH on neon is CRITICAL: Connection refused [22:49:57] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36913 [22:50:42] PROBLEM - HTTP on neon is CRITICAL: Connection refused [22:50:46] hi - who should I suck up to in order to hopefully be allowed to push another small change out today (fixing a few issues with aftv5 on dewiki)? [22:53:26] New patchset: Ryan Lane; "Performance tweaks for deployment scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36914 [22:54:00] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36910 [22:54:05] New patchset: Ryan Lane; "Performance tweaks for deployment scripts" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36914 [22:54:08] matthiasmullie: no one? [22:54:12] you have deploy access, right? 
[22:55:12] RECOVERY - SSH on hume is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:55:12] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:55:18] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36914 [22:55:24] Ryan_Lane: indeed, want to make sure it's ok though :) [22:58:23] !log mlitn synchronized php-1.21wmf4/extensions/ArticleFeedbackv5 'apply some patches to aft for dewiki' [22:58:31] Logged the message, Master [22:58:33] matthiasmullie: as long as you aren't deploying during someone else's window [22:58:52] !log mlitn synchronized php-1.21wmf5/extensions/ArticleFeedbackv5 'apply some patches to aft for dewiki' [22:59:00] Logged the message, Master [22:59:20] Ryan_Lane: ok, thanks, good to know [22:59:30] you know where to check for windows, righr? [22:59:31] *right [22:59:57] i do [23:01:47] New patchset: Andrew Bogott; "Add class for a doc site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [23:01:51] matthiasmullie, do you realise you've just synched in the middle of a scap? [23:02:37] it does not *seem* to break anything, but still not recommended [23:08:42] PROBLEM - NTP on hume is CRITICAL: NTP CRITICAL: No response from NTP server [23:09:18] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [23:09:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 6.987 seconds [23:10:21] PROBLEM - NTP on neon is CRITICAL: NTP CRITICAL: No response from NTP server [23:11:15] RECOVERY - mysqld processes on db56 is OK: PROCS OK: 1 process with command name mysqld [23:13:04] mutante: once more? 
[23:13:51] mutante: i'm going to reassign you https://rt.wikimedia.org/Ticket/Display.html?id=3201 [23:14:51] PROBLEM - MySQL Replication Heartbeat on db56 is CRITICAL: CRIT replication delay 420 seconds [23:15:09] RECOVERY - NTP on hume is OK: NTP OK: Offset -0.02918851376 secs [23:16:12] PROBLEM - MySQL Slave Delay on db56 is CRITICAL: CRIT replication delay 232 seconds [23:16:19] I'm trying to deploy something for the first time, and sync-file complains at me that it doesn't recognize the identification of the apaches in order to copy things. and wants me to add host keys to my /home/anomie/.ssh/known_hosts. Is there a command to quickly do that? [23:16:41] <^demon|busy> Ugh, *again* [23:16:46] <^demon|busy> We just fixed that crap. [23:17:04] New patchset: Andrew Bogott; "Don't set up any apache stuff for puppetdocs." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [23:17:47] ^demon|busy: perms on that file are correct [23:17:51] RECOVERY - MySQL Slave Delay on db56 is OK: OK replication delay 0 seconds [23:17:54] although deploy to hume is broken right now [23:17:55] andrewbogott: looks good.. just minor things like " vs. ' and those 2 red tabs [23:17:56] as I just reimaged [23:18:00] RECOVERY - MySQL Replication Heartbeat on db56 is OK: OK replication delay 0 seconds [23:18:09] cool, thanks. [23:18:17] (although I'm syncing hume by hand right now [23:18:17] <^demon|busy> notpeter: I wasn't trying to deploy, anomie was. [23:18:23] like in the file section, one uses " , the other uses ' ..see [23:18:39] ^demon|busy: oh, right, sorry [23:18:59] LeslieCarr: hmmm..ok.. no idea yet.. but i shall keep an eye open on that when we are back to puppet runs [23:19:42] MaxSem: was that scap scheduled? 
[23:20:30] I can't wait till our deployment system disallows more than one deploy at a time [23:20:33] yes: http://wikitech.wikimedia.org/view/Deployments [23:20:38] Ryan_Lane: it was, I didn't wait long enough for it to finish [23:20:42] -_- [23:20:51] you didn't check with the people who scheduled it? [23:20:54] New patchset: Andrew Bogott; "Add class for a doc site" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [23:20:56] scap can take up to 30 minutes [23:20:57] or more [23:21:08] like 80 [23:21:13] fucking crazy [23:21:24] 30, I wish! :) [23:21:25] <^demon|busy> It used to not be this bad :( [23:21:34] you guys need to try out the new system :D [23:21:40] there's no such thing as scap anymore [23:21:42] <^demon|busy> This is why scap is bad unless you *really really* need to sync everything. [23:21:43] increased NFS load? [23:21:48] <^demon|busy> This is what sync-dir is for. [23:21:50] everything is a scap and it only pulls changes [23:21:58] I thought you had to scap for any i18n changes [23:22:13] I think you can deploy i18n separately [23:22:24] hm I guess I'm not handling i18n in the current system [23:22:31] <^demon|busy> Prolly not ;-) [23:22:33] the biggest delay in scap is i18n [23:22:35] err [23:22:37] in the new system [23:22:40] i18n is the only part that's really a bear [23:22:52] <^demon|busy> It also rebuilds texvc. [23:22:57] <^demon|busy> Which is almost never actually needed. [23:23:08] ^demon|busy, I thought it's not anymore... [23:23:09] we need to 7z i18n before sending it out [23:23:20] and un7z on the opposite side [23:23:25] I tried to build a script for just updating i18n, but I could never get it to actually work. 
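The wish above for a deployment system that disallows more than one deploy at a time could be approximated with an exclusive lock file. The sketch below is only an illustration of that idea, not how scap, sync-file or the new deployment system work; `deploy_lock`, `DeployInProgress` and the lock path are hypothetical names.

```python
import os
from contextlib import contextmanager


class DeployInProgress(RuntimeError):
    """Raised when another deploy already holds the lock (hypothetical)."""


@contextmanager
def deploy_lock(path: str, owner: str):
    """Hold an exclusive lock file for the duration of a deploy.

    O_CREAT | O_EXCL makes creation fail if the file already exists, so
    only one caller at a time can get past this point.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        raise DeployInProgress(f"another deploy holds {path}") from None
    try:
        # Record who holds the lock so others know whom to ask.
        with os.fdopen(fd, "w") as handle:
            handle.write(owner + "\n")
        yield
    finally:
        os.unlink(path)


if __name__ == "__main__":
    with deploy_lock("example-deploy.lock", "someone"):
        print("deploying...")
```

A sync started while a long scap holds the lock would then fail immediately instead of running in the middle of it. A real implementation would also need a way to clear a stale lock left behind by a crashed process, which this sketch does not attempt.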
[23:23:39] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36796 [23:23:43] there's some voodoo I'm missing [23:23:46] ugh [23:23:51] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36904 [23:23:54] updated wikitech has that fucking mediawiki session bug [23:24:09] New patchset: Pyoungmeister; "Revert "s7: depooling db56 for reimage"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36918 [23:25:45] so is scap finished now or still running? [23:26:06] RECOVERY - SSH on neon is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [23:26:22] Change merged: Pyoungmeister; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36918 [23:27:02] kaldari, you'll see a message in this channel when it will [23:27:24] !log py synchronized wmf-config/db.php 's7: repooling db56' [23:27:33] Logged the message, Master [23:31:06] anomie: looks like Arthur started a scap about an hour ago, should be finished soon [23:32:34] !log re-siging neon on sockpuppet, remove from ssh_known_hosts, run puppet ... [23:32:43] Logged the message, Master [23:39:10] hello [23:41:08] !log awjrichards Finished syncing Wikimedia installation... : Updating MobileFrontend per regular weekly deployment [23:41:16] Logged the message, Master [23:41:32] kaldari, ^^ [23:42:10] finally :) [23:42:31] New patchset: Ryan Lane; "sync_all should either clone or fetch for repos" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36921 [23:42:40] anomie: You're all clear kid, let's blow this thing and go home! [23:43:16] Still the error when it tries to copy to hume. :( [23:43:21] PROBLEM - Puppet freshness on db62 is CRITICAL: Puppet has not run in the last 10 hours [23:43:38] anomie: probably don't need to worry about hume [23:43:51] any other errors? 
[23:44:09] !log anomie synchronized php-1.21wmf4/skins/modern/main.css 'CSS fixes backported from 1.21wmf5' [23:44:17] Logged the message, Master [23:44:17] anomie: I'm on that [23:44:20] don't worry about it [23:44:31] hey opsen, question about udp2log [23:44:35] anomie: I'm seeing the new CSS on de.wiki :) [23:44:38] kaldari- snapshot1002 missing the directory /apache/common-local/php-1.21wmf4/skins, and timeout on srv238 and srv266 [23:44:47] it seems that traffic from blog.wikimedia.org is not send to udp2log [23:45:00] anomie: yeah, that one you can ignore too [23:45:02] !log anomie synchronized php-1.21wmf4/skins/vector/screen.css 'CSS fixes backported from 1.21wmf5' [23:45:10] is that correct and if so is that something that we could easily enable? [23:45:11] Logged the message, Master [23:45:17] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36921 [23:45:18] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:45:20] anomie: srv266 is evil ..always ignore :) [23:45:40] ^ it will not die [23:46:00] where is fatalmonitor in Gerrit? [23:46:05] New patchset: Andrew Bogott; "Install puppet docs on gallium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36922 [23:46:24] notpeter, do you know whether blog.wikimedia.org traffic is send to udp2log? [23:46:28] !log anomie synchronized php-1.21wmf4/extensions/Vector/modules/ext.vector.collapsibleNav.css 'CSS fixes backported from 1.21wmf5' [23:46:36] Logged the message, Master [23:46:49] cmjohnson1: hehee [23:47:17] drdee: no clue, sorry [23:47:18] mutante: One more? https://gerrit.wikimedia.org/r/#/c/36922/1 [23:49:30] andrewbogott: misc::docsite and misc::docs:puppet ? [23:49:41] are those 2 separate things? [23:49:46] yep [23:49:53] notpeter: who might know? 
[23:49:59] One is the apache site (which presumably will contain all kinds of docs) [23:50:00] !log awjrichards synchronized php-1.21wmf5/extensions/MobileFrontend/ 'touch files' [23:50:07] and the other actually generates the particular puppet docs [23:50:08] Logged the message, Master [23:51:20] andrewbogott: it looks ok.. i just dont know if the git cloning and doc generation works like that..it _looks_ like it would [23:51:33] mutante: That part I've tested a bunch. [23:52:40] andrewbogott: ok, i can merge it and see on gallium ... [23:53:15] andrewbogott: is misc::docs::puppet merged though? [23:53:41] mutante: It's part of that same patch [23:55:04] andrewbogott: hmm..it is already in the file ..ok.. but not in this patch..like was there before [23:55:42] mutante: ? I see it in a big green block here: https://gerrit.wikimedia.org/r/#/c/36922/1/manifests/misc/docs.pp [23:55:55] andrewbogott: you merged it in 36904 [23:56:08] drdee: uuuuuhhhhh [23:56:49] andrewbogott: sorry, i meant the other one... misc::docsite [23:56:49] I think that robh has mostly looked after the blog [23:57:05] notpeter; ty [23:57:09] um… yes, was already merged. [23:58:00] New patchset: Jdlrobson; "ensure all photo uploads go to commons" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/36103 [23:58:29] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/36922 [23:59:09] mutante: Thanks! You did the merge on sockpuppet too? [23:59:12] andrewbogott: ok,,now we just have to wait a while... [23:59:19] yes.. but puppet is slow right now [23:59:30] it will take a while until gallium got it [23:59:49] Should I not force a refresh on gallium?