[00:00:09] Nikerabbit: lots of 'MessageCache failed to load messages' in exception.log [00:00:59] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:02:02] Fatal error: wikiversions.cdb has no version entry for `DB connection error: No working slave server: Unknown error (10.0.6.46)`. [00:02:07] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39330 [00:02:08] haha [00:04:17] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.237 second response time [00:06:00] can someone please flush Varnish? [00:08:56] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:09:14] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:10:26] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.333 second response time [00:11:21] notpeter: "Since 1997, as the bureau chief of the 'Institute of Junior Assembly Members Who Think About the Outlook of Japan and History Education'" [00:11:33] http://en.wikipedia.org/wiki/Shinz%C5%8D_Abe#Unpopularity_and_sudden_resignation [00:12:01] maybe that just translated funny [00:12:24] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.358 second response time [00:14:09] MaxSem: I'm going to do it, but as I told jon, if you guys are going to need a flush on deploy you need to poke an ops person *before* you deploy [00:14:24] Ryan_Lane, thanks [00:14:43] and there's no excuse for not knowing. [00:14:54] flushed [00:15:32] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:15:50] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [00:16:35] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:17:02] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.117 second response time [00:17:29] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:18:14] PROBLEM - Varnish HTTP mobile-backend on cp1042 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:18:41] PROBLEM - Varnish traffic logger on cp1042 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [00:19:10] New patchset: Cmjohnson; "Adding tellurium mac address" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39332 [00:19:44] RECOVERY - Varnish HTTP mobile-backend on cp1042 is OK: HTTP OK HTTP/1.1 200 OK - 698 bytes in 0.058 seconds [00:21:32] New patchset: Cmjohnson; "Adding tellurium mac address" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39332 [00:21:50] RECOVERY - Varnish traffic logger on cp1042 is OK: PROCS OK: 3 processes with command name varnishncsa [00:22:15] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39332 [00:23:15] anyone want to check out http://ganglia.wikimedia.org/3.5.4/ before we make it "latest" ? 
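[editor's note] The cache flush Ryan_Lane performs above ("flushed") is a full wipe of the mobile Varnish caches after the MobileFrontend deploy. A minimal sketch of what such a flush can look like, assuming varnishadm access on the cache hosts; the management port, secret path and ban expression are illustrative, and the command name differs between Varnish releases (ban in 3.x, purge in 2.x):

    # Varnish 3.x: invalidate every cached object on one cache host
    varnishadm -T localhost:6082 -S /etc/varnish/secret "ban req.url ~ ."
    # Varnish 2.x spelling of the same thing
    varnishadm -T localhost:6082 -S /etc/varnish/secret "purge req.url ~ ."

As Ryan_Lane points out, this should be arranged with an ops person before the deploy, not requested afterwards.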
[00:23:47] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [00:25:08] * Matthew_ can't even get in, so meh :) [00:25:26] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:25:32] !log modifying project groups [00:25:39] Logged the message, Master [00:26:12] !log make that modifying project groups in labs by running syncProjectGroups.php maintenance script in OpenStackManager [00:26:19] Logged the message, Master [00:29:00] AaronSchulz: maybe we need one of those for the US [00:29:02] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.577 second response time [00:31:26] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 4.383 seconds [00:33:09] !log awjrichards synchronized php-1.21wmf6/extensions/MobileFrontend/javascripts/common/main.js 'touch file' [00:33:17] Logged the message, Master [00:33:32] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.496 second response time [00:33:59] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:34:01] I really should remember to do batch installs when doing package upgrades using salt '*' [00:35:10] New patchset: Lcarr; "Oupgraded ganglia-web to latest version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39334 [00:35:23] notpeter: https://gerrit.wikimedia.org/r/#/c/39334/ [00:35:24] New patchset: Brion VIBBER; "offhost_backups should only copy gzipped db dumps" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39335 [00:40:14] LeslieCarr: LeslieCarr lgtm [00:40:17] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:45:06] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 5.109 second response time [00:45:32] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.275 second response time [00:46:08] !log taking down for another reinstall (this time with raid!) [00:46:17] Logged the message, notpeter [00:46:50] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39334 [00:50:29] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:53:30] New patchset: Ryan Lane; "Also don't specify the top dir" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39338 [00:53:48] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [00:55:39] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39338 [00:56:20] PROBLEM - SSH on hume is CRITICAL: No route to host [01:00:50] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [01:01:44] Is anyone deploying now? 
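[editor's note] The "batch installs" Ryan_Lane wishes he had used above refer to salt's batch mode, which rolls a command across minions a few at a time instead of hitting every host matched by '*' at once. A minimal sketch, assuming a salt version with batch support and the stock pkg module; the batch size and canary target are arbitrary examples:

    # upgrade packages on at most 10 minions at a time
    salt '*' --batch-size 10 pkg.upgrade
    # or canary a small subset first
    salt 'srv22*' pkg.upgrade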
[01:01:53] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.326 second response time [01:05:21] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:06:50] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:08:29] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:09:07] New patchset: Bsitu; "Enable Echo on test server" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39341 [01:09:59] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 2.370 second response time [01:11:02] RECOVERY - SSH on hume is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [01:11:04] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39341 [01:11:09] New patchset: Dfoy; "comment change only revised comments in file" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39342 [01:14:31] !log demon synchronized php-1.21wmf6/extensions/Wikibase/lib/resources/templates.js 'Deploying I641725a2' [01:14:42] Logged the message, Master [01:15:41] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 314 seconds [01:16:03] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Echo on test server' [01:16:12] Logged the message, Master [01:16:53] RECOVERY - Puppetmaster HTTPS on sockpuppet is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.024 seconds [01:17:20] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [01:19:54] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.172 second response time [01:20:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds [01:24:26] PROBLEM - NTP on hume is CRITICAL: NTP CRITICAL: Offset unknown [01:26:05] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:29:14] RECOVERY - NTP on hume is OK: NTP OK: Offset -0.01475405693 secs [01:30:08] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:31:38] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.058 second response time [01:32:41] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.657 second response time [01:37:38] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:41:19] !log awjrichards synchronized php-1.21wmf6/extensions/MobileFrontend/javascripts/modules/mf-cleanuptemplates.js [01:41:27] Logged the message, Master [01:44:01] !log finished upgrading salt on all production minions [01:44:09] Logged the message, Master [01:44:53] !log awjrichards synchronized php-1.21wmf6/extensions/MobileFrontend/stylesheets/modules/mf-cleanuptemplates.css 'touch file' [01:45:01] Logged the message, Master [01:46:29] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:47:24] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 8.920 second response time [01:48:00] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.058 second response time [01:51:44] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:52:29] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 
10 seconds [02:01:11] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:01:22] !log awjrichards synchronized php-1.21wmf5/extensions/MobileFrontend/javascripts/common/main.js [02:01:30] Logged the message, Master [02:02:08] New patchset: Ryan Lane; "Bringing salt master thread count down." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39345 [02:02:28] !log awjrichards synchronized php-1.21wmf5/extensions/MobileFrontend/javascripts/modules/mf-toggle.js 'touch file [02:02:32] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39345 [02:02:36] Logged the message, Master [02:09:44] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds [02:11:14] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.198 second response time [02:15:35] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.662 second response time [02:20:32] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:25:38] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.642 second response time [02:30:44] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:37:20] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.339 second response time [02:42:17] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:46:15] New patchset: Bsitu; "Enable Echo on test2wiki and mediawikiwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39349 [02:47:52] about to run scap [02:48:17] !log LocalisationUpdate completed (1.21wmf6) at Wed Dec 19 02:48:16 UTC 2012 [02:48:26] Logged the message, Master [03:03:53] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [03:10:29] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.215 second response time [03:15:26] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:21:26] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.009 second response time on port 11000 [03:24:24] !log kaldari Started syncing Wikimedia installation... : [03:24:33] Logged the message, Master [03:31:24] !log LocalisationUpdate completed (1.21wmf5) at Wed Dec 19 03:31:24 UTC 2012 [03:31:32] Logged the message, Master [03:36:44] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.189 second response time [03:41:41] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:49:55] Change merged: Bsitu; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39349 [03:59:09] !log kaldari Finished syncing Wikimedia installation... 
: [03:59:17] Logged the message, Master [04:01:20] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.429 second response time [04:03:19] !log bsitu synchronized wmf-config/InitialiseSettings.php 'Enable Echo on test2 and mediawiki' [04:03:27] Logged the message, Master [04:06:17] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:09:27] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.092 second response time [04:19:29] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:21:08] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.228 second response time [04:21:57] bsitu / kaldari: cool! [04:24:09] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: HTTP CRITICAL - No data received from host [04:25:56] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:26:49] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:27:43] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.390 second response time [04:28:19] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.232 second response time [04:31:46] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [04:31:46] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [04:31:47] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [04:31:47] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [04:31:47] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [04:32:40] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:35:58] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.576 second response time [04:40:55] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [04:45:52] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [04:50:49] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.654 second response time [04:55:46] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:22:46] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:25:47] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.769 second response time [05:30:53] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:37:20] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.272 second response time [05:42:26] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:47:14] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 6.508 second response time [05:50:14] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [05:50:15] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [05:52:20] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:53:32] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.163 second 
response time [05:58:29] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:01:47] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.541 second response time [06:06:17] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [06:06:44] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:23:45] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 3.077 second response time [06:52:44] PROBLEM - Apache HTTP on srv221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:53:38] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 9.799 second response time [06:55:53] RECOVERY - Apache HTTP on srv221 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 4.166 second response time [06:58:35] PROBLEM - Apache HTTP on srv223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:00:05] RECOVERY - Apache HTTP on srv223 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.061 second response time [07:06:19] !log shot a bunch of converts on the image scalers, looks like a couple started flapping about 7-8 hours ago [07:06:28] Logged the message, Master [07:40:29] archive.org down since hours ago for power outage... do we have a paging system from outside the datacentres too? [07:42:30] really it is? aww :-( [07:42:49] we have a notification system yes [07:42:58] that is not dependent on us or our dcs [08:08:02] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [08:35:09] PROBLEM - Puppet freshness on erzurumi is CRITICAL: Puppet has not run in the last 10 hours [09:02:09] if you have a power outage, you don't need a paging system though [09:02:14] trust me, you'll know [09:31:21] !log Jenkins: enabled unit test run on mw/core for some whitelisted people {{gerrit|39310}} [09:31:31] Logged the message, Master [09:35:56] RECOVERY - Puppet freshness on ms1004 is OK: puppet ran at Wed Dec 19 09:35:46 UTC 2012 [09:43:25] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39056 [09:43:37] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39057 [09:43:48] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39058 [09:43:59] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39059 [09:44:14] New review: Hashar; "recheck" [operations/mediawiki-config] (master); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39060 [09:59:28] hi is here anyone who can handle some labs issue? [09:59:42] everyone sleeps :/ [10:05:02] zZZ zZZ [10:12:10] petan: paravoid is probably awake and might be able to handle labs issues. [10:12:31] we're already talking in #-labs [10:12:32] I figured out [10:16:35] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [10:24:32] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [10:55:25] hello , no space left on build1.pmtpa.wmflabs [10:55:26] zero [10:55:42] can we please have some space there ? 
I need to build some packages [10:58:17] you may want to ask for that in the labs irc channel [11:01:59] apergos: hi, I just talked to hashar, he re-directed me here [11:02:09] oh :-D [11:02:17] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [11:06:38] ends up the analytics instances have not been rebooted :) [11:07:35] uh do they need to be? (I have no idea how any of this stuff works, I figure the people who do are in the labs channel) [11:09:30] apergos: yup that is needed. /home/ used to be mounted on some NFS file system which has been made readonly. It has been migrated to Gluster so one need to reboot to change the /home/ mount :) [11:13:05] New patchset: Ori.livneh; "Use logrotate to archive log files" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39366 [11:18:04] apergos: got a sec for https://gerrit.wikimedia.org/r/#/c/39366/1? [11:18:37] lemme see [11:20:27] * apergos removes the ? from the end of that link (stupid irc client) [11:21:04] well, probably my fault -- it's a valid URI :) [11:21:47] heh [11:29:47] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39366 [11:30:48] ariel, thanks, whatever your nickname is :) [11:31:51] just a sec, running puppet on stat1 now [11:32:59] The following packages have unmet dependencies: [11:32:59] libmysqlclient-dev : Depends: libmysqlclient18 (= 5.5.28-0ubuntu0.12.04.3) but 5.5.28-mariadb-wmf201212041~precise is to be installed [11:33:11] err: /Stage[main]/Misc::Statistics::Gerrit_stats/Git::Clone[gerrit-stats]/Exec[git_pull_gerrit-stats]/returns: change from notrun to 0 failed: git pull --quiet returned 128 instead of one of [0] at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:679 [11:33:20] and finally [11:33:21] err: /Stage[main]/Misc::Statistics::Mediawiki/Git::Clone[statistics_mediawiki]/Exec[git_pull_statistics_mediawiki]/returns: change from notrun to 0 failed: git pull --quiet returned 1 instead of one of [0] at /var/lib/git/operations/puppet/manifests/generic-definitions.pp:679 [11:33:32] if you aren't already aware of those [11:33:49] the /etc/logrotate.d/eventlogging change applied fine [11:34:49] ori-l: [11:35:07] apergos: none of those are related to my change or my config classes [11:35:14] but i'll let andrew otto know [11:35:28] no, they aren't related to your change, just issues that someone ought to either fix or know can be ignored [11:35:31] thanks [11:35:40] i'll e-mail him right now [11:35:46] thanks for flagging [11:35:49] yup [11:36:03] and for running puppet :) [11:36:09] sure [11:43:21] PROBLEM - Memcached on virt0 is CRITICAL: Connection refused [11:48:18] RECOVERY - Memcached on virt0 is OK: TCP OK - 0.005 second response time on port 11000 [11:53:51] !log Updated solr, cleaning up killlist [11:53:59] Logged the message, Master [12:58:30] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms [12:58:30] RECOVERY - Host ms-be1006 is UP: PING OK - Packet loss = 0%, RTA = 26.56 ms [12:58:39] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [13:01:48] PROBLEM - swift-container-replicator on ms-be1005 is CRITICAL: Connection refused by host [13:01:48] PROBLEM - swift-object-replicator on ms-be1005 is CRITICAL: Connection refused by host [13:01:57] PROBLEM - swift-account-reaper on ms-be1005 is CRITICAL: Connection refused by host [13:01:58] PROBLEM - swift-container-server on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
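[editor's note] The change merged above (Gerrit 39366, "Use logrotate to archive log files") is what later shows up on stat1 as /etc/logrotate.d/eventlogging. The stanza below is a hypothetical illustration of that sort of config, not the contents of the merged change; the log path and retention values are assumptions:

    # hypothetical /etc/logrotate.d/eventlogging, written from a shell purely for illustration
    cat <<'EOF' | sudo tee /etc/logrotate.d/eventlogging
    /var/log/eventlogging/*.log {
        daily
        rotate 30
        compress
        delaycompress
        missingok
        notifempty
    }
    EOF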
[13:02:06] PROBLEM - SSH on ms-be1006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:02:07] PROBLEM - swift-account-reaper on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:07] PROBLEM - swift-account-server on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:07] PROBLEM - swift-object-auditor on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:07] PROBLEM - swift-container-updater on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:15] PROBLEM - swift-container-server on ms-be1005 is CRITICAL: Connection refused by host [13:02:15] PROBLEM - swift-account-replicator on ms-be1005 is CRITICAL: Connection refused by host [13:02:16] PROBLEM - swift-container-server on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:16] PROBLEM - swift-object-replicator on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:16] PROBLEM - swift-object-server on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:16] PROBLEM - swift-container-replicator on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:25] PROBLEM - SSH on ms-be1005 is CRITICAL: Connection refused [13:02:25] PROBLEM - swift-object-updater on ms-be1005 is CRITICAL: Connection refused by host [13:02:25] PROBLEM - swift-container-updater on ms-be1005 is CRITICAL: Connection refused by host [13:02:25] PROBLEM - swift-account-auditor on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:25] PROBLEM - swift-object-updater on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:33] PROBLEM - swift-object-server on ms-be1005 is CRITICAL: Connection refused by host [13:02:34] PROBLEM - swift-container-auditor on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:34] PROBLEM - swift-account-replicator on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:34] PROBLEM - swift-account-server on ms-be1005 is CRITICAL: Connection refused by host [13:02:51] PROBLEM - swift-object-updater on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:51] PROBLEM - swift-object-replicator on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:51] PROBLEM - swift-container-replicator on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:52] PROBLEM - swift-account-auditor on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:52] PROBLEM - swift-container-auditor on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:02:52] PROBLEM - SSH on ms-be1007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [13:02:52] PROBLEM - swift-container-updater on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:03:00] PROBLEM - swift-object-auditor on ms-be1005 is CRITICAL: Connection refused by host [13:03:00] PROBLEM - swift-account-replicator on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:03:01] PROBLEM - swift-account-server on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:03:10] PROBLEM - swift-container-auditor on ms-be1005 is CRITICAL: Connection refused by host [13:03:10] PROBLEM - swift-object-server on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:03:10] PROBLEM - swift-account-reaper on ms-be1006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:03:18] PROBLEM - swift-account-auditor on ms-be1005 is CRITICAL: Connection refused by host [13:03:21] *eyeroll* [13:03:23] apergos: swift seems to have some issue :/ [13:03:26] oh page :) [13:03:37] pages are probably more reliable than irc notification Oo [13:03:37] swift in eqiad. unused. etc. [13:03:46] PROBLEM - swift-object-auditor on ms-be1007 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:06:09] PROBLEM - Host ms-be1005 is DOWN: PING CRITICAL - Packet loss = 100% [13:11:30] it's me [13:11:33] I'm reformatting the boxes [13:11:51] RECOVERY - Host ms-be1005 is UP: PING WARNING - Packet loss = 37%, RTA = 26.64 ms [13:11:53] have fun [13:13:58] RECOVERY - SSH on ms-be1007 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [13:14:16] pep8_1.3.3-0ubuntu1_all.deb !!! [13:14:20] I BACKPORTED A PACKAGE!!!!!!!!!!!!!!!!!!! [13:14:22] oh yeah [13:14:24] \O/ [13:14:42] RECOVERY - SSH on ms-be1006 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [13:21:27] PROBLEM - NTP on ms-be1006 is CRITICAL: NTP CRITICAL: No response from NTP server [13:21:54] RECOVERY - SSH on ms-be1005 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [13:22:39] PROBLEM - NTP on ms-be1007 is CRITICAL: NTP CRITICAL: No response from NTP server [13:24:09] PROBLEM - Swift HTTP on ms-fe1004 is CRITICAL: HTTP CRITICAL - No data received from host [13:28:20] apergos: paravoid : what does it take to get a package uploaded on apt.w.o ? Should I open a rt ticket giving the places where the .deb .gz .changes .dsc files are? [13:28:27] or is it something I can do myself ? [13:28:49] no it needs global root [13:29:03] you can open an RT ticket or you can give it to me now and I'll do it for you :-) [13:29:55] I choose the later if you are available now :-D [13:30:02] The result is available on fenari in /home/hashar/pep8-backport [13:30:11] oo wait [13:30:22] just figured out I could potentially grant myself root by giving a fake package [13:30:26] how would you validate it ? [13:30:37] I'll look at the source [13:30:52] from the .deb ? [13:31:05] well, ideally we'd rebuild them [13:31:07] from source [13:31:12] I don't think we realistically do that [13:31:34] I am too paranoid maybe [13:31:44] no you're right [13:32:26] so that's a straight backport? [13:32:29] have you rebuilt the package? [13:32:55] I used backportpackage from ubuntu-dev-tools package [13:33:03] basically followed the nice guide at https://wikitech.wikimedia.org/view/Backport_packages [13:33:15] it is not signed with a PGP key since I don't have one [13:33:50] So I did: backportpackage --dont-sign -s raring -d precise -w workdir pep8 [13:34:19] then pbuilder --basetgz=/path/to/precise.tgz build pep8.dsc [13:34:21] hmm [13:34:27] I guess that rebuild it from scratch. [13:34:31] why doesn't it have a ~precise1 suffix? [13:35:46] PROBLEM - NTP on ms-be1005 is CRITICAL: NTP CRITICAL: No response from NTP server [13:35:50] backportpackage says it does that [13:36:08] ahh I must have run pbuilder against pep8_1.3.3-0ubuntu1.dsc instead of pep8_1.3.3-0ubuntu1~precise1.dsc [13:36:23] ha [13:36:49] who knew, reviews help! [13:38:00] rebuilding [13:43:24] paravoid: rebuild. I have deleted all files from fenari:/home/hashar/pep8-backport and reuploaded the result. I got a deb named pep8_1.3.3-0ubuntu1~precise1_all.deb [13:45:17] done [13:45:33] is it for gallium? [13:45:40] or do you need me to install it somewhere? 
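[editor's note] Putting hashar's backport recipe from above in one place, including the fix paravoid's review caught: pbuilder has to be pointed at the ~precise1 .dsc that backportpackage generates, not the unmodified -0ubuntu1 one. A sketch with placeholder paths:

    # generate a precise backport of pep8 from the raring source package
    backportpackage --dont-sign -s raring -d precise -w workdir pep8
    # build it in a clean precise chroot -- note the ~precise1 suffix on the .dsc
    sudo pbuilder --basetgz=/path/to/precise.tgz build workdir/pep8_1.3.3-0ubuntu1~precise1.dsc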
[13:47:41] just for gallium, I will update it tthere [13:48:00] cool [13:48:34] !log gallium: apt-get install pep8 v1.3.3 (backported from raring) [13:48:42] Logged the message, Master [13:49:04] (apt-get upgrade would also do it, fwiw) [13:49:48] I am so happy to have been able to backport a package [13:50:01] * hashar strikes achievement "backport an Ubuntu package" [13:50:06] sometimes it's more difficult [13:50:17] but others it's just a 5' work [13:50:21] I have noticed that with the python modules I needed for Zuul :/ [13:50:35] luckily upgrading gallium to Precise fixed it [13:50:39] I can take a stab at them, although probably not this week [13:50:41] oh [13:50:45] even better :-) [13:51:12] Some packages were from Quantal and back porting them to Lucid would have required to backport a tooooon of dependencies [13:51:19] so yeah, fixed :-] [13:51:42] oh yeah, that path is almost always harder [13:51:50] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39039 [13:52:28] so hmm [13:52:34] we have a lot of python issues https://integration.mediawiki.org/ci/job/operations-puppet-pep8/36/violations/? :-] [13:53:29] I'm not surprised [13:55:10] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39049 [13:56:32] New review: Hashar; "recheck" [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/39040 [14:03:07] !log gallium: updated PHPUnit to 3.7.10 thus solving {{bug|42724}} [14:03:16] Logged the message, Master [14:08:31] hashar: incidentally, it occurred to me that the lack of librsvg on gallium means I'll have to take the tests out or jenkins will block the merge. That's right isn't it? [14:08:54] Jarry1250: I thought we had that issue sorted out aren't we ? [14:09:16] Did we? [14:09:29] I can't remember the bug # either :-D [14:10:10] I don't think there ever was a bug [14:10:17] How does one search gerrit by keyword? [14:11:23] New patchset: Reedy; "RT #2295: Run cleanupUploadStash across all wikis daily" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/37968 [14:24:32] hashar? Any ideas? [14:31:03] Jarry1250: i use google :-] [14:31:12] git log :-] [14:31:29] there is some trace at https://gerrit.wikimedia.org/r/#/c/36583/ [14:32:14] Jarry1250: the jenkins box has rsvg-convert : $ rsvg-convert --version [14:32:15] rsvg-convert version 2.36.1 (Wikimedia) [14:32:19] PROBLEM - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours [14:32:19] PROBLEM - Puppet freshness on ms-be1002 is CRITICAL: Puppet has not run in the last 10 hours [14:32:19] PROBLEM - Puppet freshness on ms-be1001 is CRITICAL: Puppet has not run in the last 10 hours [14:32:19] PROBLEM - Puppet freshness on ms-be1007 is CRITICAL: Puppet has not run in the last 10 hours [14:32:20] PROBLEM - Puppet freshness on ms-be1006 is CRITICAL: Puppet has not run in the last 10 hours [14:32:20] PROBLEM - Puppet freshness on ms-be1010 is CRITICAL: Puppet has not run in the last 10 hours [14:32:20] PROBLEM - Puppet freshness on ms-be1005 is CRITICAL: Puppet has not run in the last 10 hours [14:32:21] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [14:32:21] PROBLEM - Host ms-be3004 is DOWN: PING CRITICAL - Packet loss = 100% [14:34:38] hasshar: Oh, okay, great, I just misremembered then. [14:34:42] *hashar [14:34:44] Coolio. 
[14:41:56] New patchset: Demon; "Make github replication config forward compatible" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39385 [14:43:09] New review: Demon; "Replication plugin will just ignore directives it doesn't understand, so this can be merged whenever..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/39385 [14:46:35] PROBLEM - Puppet freshness on ms-be1003 is CRITICAL: Puppet has not run in the last 10 hours [14:55:17] RECOVERY - Host ms-be3004 is UP: PING OK - Packet loss = 0%, RTA = 109.72 ms [15:08:32] Change abandoned: Jgreen; "I thought I'd already abandoned this one--script has been fixed." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39335 [15:27:47] New review: Hashar; "Has Tim said, this is going nowhere. I have logged Tim's idea under https://bugzilla.wikimedia.org/s..." [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15720 [15:29:38] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:31:08] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.262 seconds [15:34:12] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39056 [15:36:38] !log hashar synchronized wmf-config/CommonSettings.php [15:36:47] Logged the message, Master [15:37:35] !log hashar synchronized wmf-config/InitialiseSettings.php [15:37:44] Logged the message, Master [15:37:50] !log hashar synchronized wmf-config/InitialiseSettings.php [15:37:58] Logged the message, Master [15:46:47] New patchset: Hashar; "update puppet-lint rake target" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34850 [15:48:53] may I get a change of ops/puppet rakefile changed ? It is about removing some very noisy puppet lint checks. Does not do any harm since that is not yet run from anywhere :) https://gerrit.wikimedia.org/r/#/c/34850/ [15:48:56] thanks!! [15:50:40] and could use https://gerrit.wikimedia.org/r/#/c/39049/ which configures the python linter to lint *.py.erb files in addition to the *.py file :-] [15:51:32] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [15:51:33] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [15:52:15] New patchset: Hashar; "find filenames based on realm/datacenter" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39191 [15:52:35] New review: Hashar; "Seems good. Will get it merged tonight." 
[operations/mediawiki-config] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/39191 [15:56:02] Change merged: ArielGlenn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/34850 [15:58:53] danke apergos :-] [16:04:17] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:04:40] same old whine about /srv/org/wikimediaq/doc/index.html but that's it so fr (waiting for sockpuppet to hurry up), I need to run now and wil be back in a few hours [16:04:52] see folks in a while [16:04:57] * hashar waves [16:06:54] daughter time [16:07:26] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [16:16:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.057 seconds [16:37:11] Yay, newer git on fenari [16:39:20] !log reedy synchronized php-1.21wmf6/extensions/CategoryTree [16:39:26] Warning: the RSA host key for 'hume' differs from the key for the IP address '2620:0:860:2:21d:9ff:fe33:f235' [16:39:27] Offending key for IP in /etc/ssh/ssh_known_hosts:5732 [16:39:27] Matching host key in /etc/ssh/ssh_known_hosts:603 [16:39:29] Logged the message, Master [16:39:31] ^ Weren't those fixed already? :( [16:48:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:50:01] New patchset: Cmjohnson; "Addin solr servers to yttrium stanza" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39393 [16:51:33] reedy can you look at that and tell me if it looks okay ^^ [16:51:50] New review: RobH; "the rest of site.pp has the FQDN, so please include the (eqiad|wmnet) info in the regex" [operations/puppet] (production); V: 0 C: -1; - https://gerrit.wikimedia.org/r/39393 [16:52:05] cmjohnson1: I know that they say you dont need to include it [16:52:10] but the rest of site.pp has FQDN [16:52:18] so please correct and include to match the rest of site.pp [16:52:45] better to be too narrow scoped when applying manifests [16:55:01] New patchset: Cmjohnson; "Adding solr servers to yttrium stanza" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39393 [16:55:13] robh take a look now [16:56:22] New review: RobH; "much better, thx!" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/39393 [16:56:23] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39393 [16:56:33] cmjohnson1: yer merged on gerrit, up to you on sockpuppet [16:56:37] k [16:56:40] thx [16:56:42] welcome [16:57:23] !log reedy synchronized php-1.21wmf6/extensions/Wikibase [16:57:29] ... 
[16:57:31] Logged the message, Master [16:58:16] so puppet is in progress on yttrium [16:59:18] hrmm, so just running isnt conclusive [16:59:33] but watch the output and ensure its applying stanza specific listing in the output [16:59:42] well, mayne not applying, as its been applied, but lists in output [17:00:05] as any server that we have signed on puppet, but ISNT specifically listed in site.pp gets a standard puppet info [17:00:24] so just running may mean stanza is bad, but its getting stock info [17:00:31] (i may be wrong in this, but its my understanding) [17:00:44] hence why i say in here, someone will correct me im sure [17:06:41] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.031 seconds [17:38:47] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:45:50] PROBLEM - Host es1009 is DOWN: CRITICAL - Network Unreachable (10.64.32.19) [17:45:50] PROBLEM - Host es1008 is DOWN: CRITICAL - Network Unreachable (10.64.32.18) [17:47:11] PROBLEM - Host analytics1016 is DOWN: CRITICAL - Network Unreachable (10.64.36.116) [17:47:11] PROBLEM - Host analytics1015 is DOWN: CRITICAL - Network Unreachable (10.64.36.115) [17:47:11] PROBLEM - Host analytics1011 is DOWN: CRITICAL - Network Unreachable (10.64.36.111) [17:47:29] PROBLEM - Host analytics1024 is DOWN: CRITICAL - Network Unreachable (10.64.36.124) [17:47:29] PROBLEM - Host es1007 is DOWN: CRITICAL - Network Unreachable (10.64.32.17) [17:47:30] PROBLEM - Host es1010 is DOWN: CRITICAL - Network Unreachable (10.64.32.20) [17:47:30] PROBLEM - Host analytics1025 is DOWN: CRITICAL - Network Unreachable (10.64.36.125) [17:47:30] PROBLEM - Host analytics1018 is DOWN: CRITICAL - Network Unreachable (10.64.36.118) [17:47:30] PROBLEM - Host analytics1022 is DOWN: CRITICAL - Network Unreachable (10.64.36.122) [17:47:30] PROBLEM - Host analytics1017 is DOWN: CRITICAL - Network Unreachable (10.64.36.117) [17:47:47] PROBLEM - Host analytics1013 is DOWN: CRITICAL - Network Unreachable (10.64.36.113) [17:47:48] PROBLEM - Host analytics1020 is DOWN: CRITICAL - Network Unreachable (10.64.36.120) [17:47:56] PROBLEM - Host analytics1012 is DOWN: CRITICAL - Network Unreachable (10.64.36.112) [17:47:57] PROBLEM - Host analytics1014 is DOWN: CRITICAL - Network Unreachable (10.64.36.114) [17:48:00] sorry [17:48:05] turns out we have some crossed wires [17:48:06] PROBLEM - Host analytics1027 is DOWN: CRITICAL - Network Unreachable (10.64.36.127) [17:48:06] PROBLEM - Host ms-be1005 is DOWN: CRITICAL - Network Unreachable (10.64.32.10) [17:48:06] PROBLEM - Host ms-be1006 is DOWN: CRITICAL - Network Unreachable (10.64.32.11) [17:48:06] PROBLEM - Host analytics1026 is DOWN: CRITICAL - Network Unreachable (10.64.36.126) [17:48:06] PROBLEM - Host analytics1019 is DOWN: CRITICAL - Network Unreachable (10.64.36.119) [17:48:23] PROBLEM - Host ms-be1010 is DOWN: CRITICAL - Network Unreachable (10.64.32.15) [17:48:23] PROBLEM - Host ms-be1007 is DOWN: CRITICAL - Network Unreachable (10.64.32.12) [17:48:24] PROBLEM - Host analytics1023 is DOWN: CRITICAL - Network Unreachable (10.64.36.123) [17:48:41] PROBLEM - Host analytics1021 is DOWN: CRITICAL - Network Unreachable (10.64.36.121) [17:50:25] uh oh again! [17:50:43] sorry :( [17:50:49] analytics is getting the brunt of this [17:50:53] :( [17:50:57] i owe you whiskey [17:52:03] so'k! better now than later! 
[17:53:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.665 seconds [17:54:55] maxsem@fenari:/home/wikipedia/common/php-1.21wmf6/extensions/GeoData$ ping solr1001 [17:54:55] PING solr1001.eqiad.wmnet (10.64.0.98) 56(84) bytes of data. [17:54:55] 64 bytes from search1004.eqiad.wmnet (10.64.0.98): icmp_req=1 ttl=62 time=26.4 ms [17:54:58] eh? [18:00:32] New review: Hashar; "DO NOT MERGE TILL https://gerrit.wikimedia.org/r/#/c/39191/ is merged." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/39057 [18:09:19] PROBLEM - Puppet freshness on ms-be3 is CRITICAL: Puppet has not run in the last 10 hours [18:23:36] New patchset: Cmjohnson; "dhcpd entry for solr1-3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39397 [18:24:15] Change merged: Cmjohnson; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39397 [18:24:17] cmjohnson1: are you back in the dC ? [18:24:44] lesliecarr: i am not in dc ...do you need me there? will take me about 20mins [18:24:52] yes definitely need you there [18:24:55] if you can [18:25:07] yep..np..omw [18:25:58] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:28:34] LeslieCarr, my guitar is sitting behind me, and a string just snapped all by itself [18:28:40] maybe your ethernet cables are too tight [18:28:57] notpeter: Any chance you could clear up the hume related ssh annoyances on fenari? Thanks [18:29:27] Warning: the RSA host key for 'hume' differs from the key for the IP address '2620:0:860:2:21d:9ff:fe33:f235' [18:29:27] Offending key for IP in /etc/ssh/ssh_known_hosts:5732 [18:29:27] Matching host key in /etc/ssh/ssh_known_hosts:603 [18:30:09] ottomata: hehe [18:30:13] sigh [18:33:17] Reedy: sure. 
not sure why hume is so annoying, but yes, I'll see what I can do [18:36:19] PROBLEM - Puppet freshness on erzurumi is CRITICAL: Puppet has not run in the last 10 hours [18:39:16] ottomata: so unlike previous times, turning the lag back up is not working right now - i am and have been on the phone with juniper [18:39:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.860 seconds [18:39:21] we are trying to get this as soon as possible [18:48:10] RECOVERY - Host analytics1025 is UP: PING WARNING - Packet loss = 54%, RTA = 26.53 ms [18:48:10] RECOVERY - Host analytics1027 is UP: PING WARNING - Packet loss = 54%, RTA = 26.58 ms [18:48:10] RECOVERY - Host analytics1024 is UP: PING WARNING - Packet loss = 54%, RTA = 26.53 ms [18:48:10] RECOVERY - Host analytics1016 is UP: PING OK - Packet loss = 0%, RTA = 26.66 ms [18:48:10] RECOVERY - Host analytics1023 is UP: PING OK - Packet loss = 0%, RTA = 26.69 ms [18:48:11] RECOVERY - Host analytics1018 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:48:11] RECOVERY - Host analytics1026 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:48:12] RECOVERY - Host analytics1013 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [18:48:12] RECOVERY - Host analytics1014 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [18:48:13] RECOVERY - Host analytics1019 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms [18:48:13] RECOVERY - Host analytics1011 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [18:48:14] RECOVERY - Host analytics1012 is UP: PING OK - Packet loss = 0%, RTA = 27.03 ms [18:48:14] RECOVERY - Host analytics1022 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [18:48:15] RECOVERY - Host analytics1020 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:48:15] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:48:19] RECOVERY - Host es1010 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms [18:48:19] RECOVERY - Host es1008 is UP: PING OK - Packet loss = 0%, RTA = 26.54 ms [18:48:20] RECOVERY - Host analytics1017 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:48:20] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [18:48:21] ottomata: yay up now - though fragile [18:48:28] RECOVERY - Host es1009 is UP: PING OK - Packet loss = 0%, RTA = 26.52 ms [18:48:46] RECOVERY - Host es1007 is UP: PING OK - Packet loss = 0%, RTA = 26.51 ms [18:49:04] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:49:04] RECOVERY - Host ms-be1005 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:49:40] RECOVERY - Host ms-be1006 is UP: PING OK - Packet loss = 0%, RTA = 26.55 ms [18:49:45] lesliecarr: hi [18:49:55] my hero! [18:50:16] do you have a spare ex4200 little module ? [18:50:24] the one with the two xfp uplinks [18:50:25] RECOVERY - Host ms-be1010 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [18:50:40] idk [18:50:52] lemme look....maybe robh would know [18:50:55] we think there might be a bad module [18:51:01] RobH ping ^^ [18:51:03] thanks [18:51:09] worst case, we will try reseating modules [18:52:19] we don't normally have spares ...give me 5 mins to check storage [18:53:34] cool [18:55:27] leslicarr: no spare modules [18:56:34] who's the best person to discuss job queue issues with? [18:57:01] AaronSchulz, probably [18:58:00] cmjohnson1, from what I've seen in this channel it looks like the Solr hosts are almost ready? [18:58:06] okay cmjohnson1 -- so can you reseat the module in asw-c1-eqiad ? 
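[editor's note] For the recurring hume host-key warning Reedy pastes further up, the usual cleanup is to drop the stale entry for the offending address from the system-wide known_hosts file on fenari and let it be re-added with the current key. A minimal sketch, assuming root on fenari:

    # remove the stale key recorded for hume's IPv6 address (line 5732 in the warning)
    sudo ssh-keygen -f /etc/ssh/ssh_known_hosts -R '2620:0:860:2:21d:9ff:fe33:f235'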
[18:58:08] New patchset: Demon; "Switching all 'pedias to 1.21wmf6" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39403 [18:58:18] <^demon> AaronSchulz: ^ [18:58:35] maxsem yep almost [18:58:47] will be done today [18:58:49] and thank you for driving out [18:58:52] yep [18:59:07] lesliecarr: yes [18:59:16] and okay to do it now? [18:59:35] np...i live down the street...takes me longer to park and get through the doors than to drive here [19:00:11] ok to do now [19:00:31] anything you do to this switch can't make it more broken ;) [19:00:36] !log demon rebuilt wikiversions.cdb and synchronized wikiversions files: All 'pedias to 1.21wmf6 [19:00:40] ok..doing it now [19:00:44] Logged the message, Master [19:06:47] kaldari: EchoNotificationJob does not set the job ID [19:06:59] the constructor should pass it to super [19:07:28] super? [19:07:28] anyway, the core code was dummy proofed against this, but some old justs never were ack'ed, thus they stayed in the queue [19:07:47] so the jobs were run, just never acked (they were run ~10 or so after being added) [19:07:54] kaldari, super is the java name of the parent [19:07:57] they look to be quite old, eventually they will be deleted [19:07:59] ah [19:08:17] LeslieCarr: sorry saw ping [19:08:18] the job count on those graphs should only count things in the "unclaimed queue" [19:08:18] the ones from an hour ago still haven't run though [19:08:22] was afk making lunch [19:08:34] so we dont have spare SFP addon modules for hte 4200s that i know of [19:08:40] though it would be nice to track those too to make sure they don't balloon [19:08:43] iirc we sent what little we had to go into use in tampa [19:08:50] usualy we order to use, not spare. [19:08:59] at the moment that count is just every row in the table [19:09:01] claimed or not [19:09:03] (though i suppose we need to keep a spare) [19:10:33] kaldari: "call super" [19:10:38] wikipedia it :) [19:10:47] ok, I'll try that [19:14:34] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:16:42] looks like the constructor is passing it to the parent. 
hmm [19:16:57] RobH: lemme open a ticket in procurement for that now :) [19:17:07] lesliecarr: okay reseated [19:17:20] LeslieCarr: include a spare for eqiad and sdtpa [19:17:34] PROBLEM - Host analytics1011 is DOWN: PING CRITICAL - Packet loss = 100% [19:17:43] PROBLEM - Host es1009 is DOWN: PING CRITICAL - Packet loss = 100% [19:17:43] PROBLEM - Host analytics1015 is DOWN: PING CRITICAL - Packet loss = 100% [19:17:52] PROBLEM - Host analytics1013 is DOWN: PING CRITICAL - Packet loss = 100% [19:17:53] PROBLEM - Host es1008 is DOWN: PING CRITICAL - Packet loss = 100% [19:17:58] cool [19:18:01] PROBLEM - Host es1007 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:02] PROBLEM - Host analytics1023 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:10] PROBLEM - Host analytics1021 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:22] cmjohnson1: now, can you swap the fibers on switch asw-c-eqiad.mgmt (xe-8/1/0 and xe-8/1/2) [19:18:28] PROBLEM - Host analytics1019 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:37] PROBLEM - Host analytics1025 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:37] PROBLEM - Host analytics1027 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:55] PROBLEM - Host ms-be1006 is DOWN: PING CRITICAL - Packet loss = 100% [19:18:55] PROBLEM - Host analytics1017 is DOWN: PING CRITICAL - Packet loss = 100% [19:19:00] i mean asw-c8-eqiad (xe-8/1/0 and xe-8/1/2 ) [19:19:50] ok [19:19:58] PROBLEM - MySQL Idle Transactions on es1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:19:58] PROBLEM - SSH on analytics1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:07] PROBLEM - SSH on analytics1022 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:20:53] PROBLEM - Host ms-be1010 is DOWN: PING CRITICAL - Packet loss = 100% [19:20:55] lesliecarr: complete [19:21:10] PROBLEM - MySQL Replication Heartbeat on es1010 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:21:29] RECOVERY - MySQL Idle Transactions on es1010 is OK: OK longest blocking idle transaction sleeps for seconds [19:21:38] thanks cmjohnson1 [19:21:41] looking good to me [19:21:52] asw-c1 okay? 
[19:22:40] PROBLEM - Host ms-be1007 is DOWN: PING CRITICAL - Packet loss = 100% [19:23:16] RECOVERY - SSH on analytics1022 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:24:19] PROBLEM - SSH on analytics1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:24:46] RECOVERY - SSH on analytics1012 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:25:11] cmjohnson1: it is better than before ;) [19:25:31] PROBLEM - SSH on analytics1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:25:39] lesliecarr: i need to remove the fibers again...i didn't close the cover [19:26:00] ok, go for it [19:26:01] from c1 [19:26:10] ok, do it now please [19:26:28] AaronSchulz: it finally ran all the new jobs, but it took an hour [19:26:43] RECOVERY - Host ms-be1006 is UP: PING WARNING - Packet loss = 93%, RTA = 26.61 ms [19:26:43] RECOVERY - Host ms-be1010 is UP: PING WARNING - Packet loss = 86%, RTA = 26.59 ms [19:26:52] RECOVERY - Host es1009 is UP: PING WARNING - Packet loss = 73%, RTA = 26.51 ms [19:27:01] RECOVERY - Host es1008 is UP: PING OK - Packet loss = 0%, RTA = 26.63 ms [19:27:46] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 9.751 seconds [19:27:47] RECOVERY - Host analytics1023 is UP: PING WARNING - Packet loss = 86%, RTA = 26.60 ms [19:27:47] RECOVERY - Host analytics1013 is UP: PING WARNING - Packet loss = 80%, RTA = 26.59 ms [19:27:55] RECOVERY - Host analytics1019 is UP: PING OK - Packet loss = 0%, RTA = 26.70 ms [19:27:55] RECOVERY - Host es1007 is UP: PING OK - Packet loss = 0%, RTA = 26.53 ms [19:27:56] RECOVERY - Host analytics1027 is UP: PING OK - Packet loss = 0%, RTA = 26.59 ms [19:27:56] RECOVERY - Host analytics1017 is UP: PING OK - Packet loss = 0%, RTA = 26.67 ms [19:28:04] RECOVERY - Host analytics1015 is UP: PING OK - Packet loss = 0%, RTA = 26.79 ms [19:28:22] RECOVERY - Host ms-be1007 is UP: PING OK - Packet loss = 0%, RTA = 26.66 ms [19:28:32] RECOVERY - Host analytics1025 is UP: PING OK - Packet loss = 0%, RTA = 26.60 ms [19:28:40] RECOVERY - SSH on analytics1024 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:28:40] RECOVERY - Host analytics1011 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms [19:29:07] RECOVERY - SSH on analytics1014 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:29:08] RECOVERY - MySQL Replication Heartbeat on es1010 is OK: OK replication delay 0 seconds [19:29:16] RECOVERY - Host analytics1021 is UP: PING OK - Packet loss = 0%, RTA = 26.57 ms [19:34:52] maxsem: regarding solr servers...all but solr1002 will be completed today...solr1002 was delivered with h/w issue that will be resolved tomorrow (hopefully) [19:37:36] cmjohnson1, great. thank you [19:42:21] huzzah!!! [19:43:33] so - it looks like the problem was threefold (RobH , mark, paravoid , cmjohnson1 , ottomata may all be interested) -- #1 was that 8/1/0 and 8/1/2 were swapped. #2 is that the module in asw-c1-eqiad is bad, #3 is that junos 11.4 default behavior changed with respect to lldp neighborships and ae bundles must be stated instead of their member interfaces [19:43:51] that seems like a cluster f#@ [19:43:52] cool! [19:44:04] LeslieCarr: so how awesome do you feel right now having figured it out? [19:44:09] hungry [19:44:40] :) [19:44:44] you rock! 
[19:47:21] so basically [19:47:35] so had human, software, and hardware failure [19:47:38] trifecta of awesome [19:48:06] (human error was not chris) [19:48:21] so had to be whoever was in charge of eqiad onsite then, too bad we have no records. [19:48:25] * RobH runs away [19:49:31] need to get some noms , bbiab [19:49:31] hehehe [19:49:32] :) [19:51:58] New patchset: Hashar; "pass pep8 on deployment module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39407 [19:59:06] !log authdns update correcting solr1003 entry [19:59:14] Logged the message, Master [20:03:14] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:05:15] sbernardin: can you put a network ticket in for the solr servers w/ports and switches plz [20:17:20] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [20:17:47] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.030 seconds [20:21:43] cmjohnson1: will do [20:21:51] thx [20:25:17] PROBLEM - Puppet freshness on ssl3001 is CRITICAL: Puppet has not run in the last 10 hours [20:28:39] !log demon synchronized php-1.21wmf6/extensions/ParserFunctions/ 'Rolling back pfuncs to d96a17c' [20:28:48] Logged the message, Master [20:36:10] back [20:39:01] front [20:41:23] left [20:42:27] bottom [20:44:13] !log authdns update solr1001/1002 correction [20:44:21] Logged the message, Master [20:48:59] PROBLEM - Auth DNS on ns0.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [20:49:08] PROBLEM - Host search1001 is DOWN: PING CRITICAL - Packet loss = 100% [20:49:53] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:52:18] cmjohnson1: hey... what's up with ns0? [20:53:07] hey..idk [20:53:46] fuck [20:53:56] RECOVERY - Auth DNS on ns0.wikimedia.org is OK: DNS OK: 0.017 seconds response time. www.wikipedia.org returns 208.80.154.225 [20:54:28] notpeter: i updated twice...probably too close together [20:54:33] ah, was curling when I meant to be digging... man, that was terrifying [20:54:42] cmjohnson1: dunno if oy'ure aware of it [20:54:48] but after pushing new zone file [20:54:50] RECOVERY - Host search1001 is UP: PING OK - Packet loss = 0%, RTA = 26.46 ms [20:54:58] it's best practice to dig @ ns0/1/2 [20:55:08] to make sure they're still working [20:55:14] not aware but i will do that for now on [20:55:33] every so often they fail and the pdns daemon needs to be rebooted [20:55:34] cooL! [20:55:38] and now you know :) [20:55:55] thx [20:56:08] yep! [20:58:08] PROBLEM - Lucene disk space on search1001 is CRITICAL: Connection refused by host [20:58:44] PROBLEM - SSH on search1001 is CRITICAL: Connection refused [20:59:21] search1001 error is my doing [20:59:43] ok :) [20:59:58] was about to check it, but I shall not worry [21:00:29] Does anyone who isn't Asher understand about the new mariadb packages? [21:00:57] andrewbogott: not really :) [21:01:18] I continue to not understand why previously-innocent puppet classes now pull down maria packages and crap out. [21:01:18] I didn't even really know it was happening until i got his email [21:01:33] andrewbogott: oh, i d ounderstand that [21:01:34] Or, more specifically: I understand /why/ that happens but I don't understand why that isn't considered a serious problem [21:02:25] notpeter, can you advise about the right path forward? 
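[editor's note] The post-push check notpeter describes above (dig against ns0/1/2 after every authdns update) is a one-liner; a minimal sketch, querying the same record the Nagios check uses:

    # confirm all three authoritative nameservers still answer after pushing a new zone
    for ns in ns0.wikimedia.org ns1.wikimedia.org ns2.wikimedia.org; do
        dig +short @"$ns" www.wikipedia.org
    done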
[21:02:58] andrewbogott: probably the best path forward is to just remove the packages from our apt-repo for now [21:03:14] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours [21:03:14] PROBLEM - Lucene on search1001 is CRITICAL: Connection refused [21:03:19] we can also fuck around with apt pinning [21:03:32] notpeter: Ok, so you are supporting the argument that the status quo is simply wrong and broken [21:03:34] but, as it's only in testing, i have no problem pulling it [21:03:43] andrewbogott: that is correct [21:03:44] well [21:03:54] maybe not wrong, i mean puppet's doing exactly what we told it to do ;) [21:03:58] but broken, yes [21:03:58] andrewbogott: where were we surprise-switched to mariadb? [21:04:23] Jeff_Green: when you try to install a regular mysql server package [21:04:32] it pulls the mariadb mysql-common [21:04:37] Jeff_Green: I think what happened is that Asher was trying to just stage things for a future migration. But apt mistakes some of his hypothetical/future packages for newer substitutes for existing dependencies. [21:04:47] because it's the newest one of our packages that provided mysql-common [21:04:56] but it doesn't pull all mariadb shit [21:04:59] so it shits the bed [21:05:07] fugly [21:05:10] yeah [21:05:23] When I asked Asher about this before he said something like, "that's what you get for using ensure=>latest" Except today I'm trying to configure a brand new server and can't do that either. [21:05:42] bah. that's what you get for crapping up your repo with conflicting packages [21:05:45] It seems like the maria packages should be orthogonal to the mysql dependency path [21:05:47] so, the problem is when something requires mysql for a package [21:05:53] and so it auto-installs deps [21:06:00] andrewbogott: yes [21:06:03] seems to me that pinning is probably the way to go [21:06:10] long-term anyway [21:06:20] Jeff_Green: yes. [21:06:23] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.016 seconds [21:06:26] these should be able to coexist [21:06:38] but for now, as mariadb is still in testing, pulling them is also legit [21:06:54] would be great if someone would fix with pinning, though :) [21:06:59] ok. Probably I can get by with just pulling that single package; I will try. [21:07:10] I do not know what 'pinning' is so will ignore that portion of this conversation :) [21:07:27] ha [21:07:59] andrewbogott: please pull all mariadb packages, not just common [21:08:02] i think it must be a sewing analogy [21:08:26] I assumed it had something to do with going steady, and also with this being the 1950s. [21:08:45] could be!
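The pinning idea floated above would look roughly like the apt preferences entry below: keep the MariaDB builds out of normal dependency resolution while leaving them installable by hand. The file name is hypothetical, and the version glob assumes the MariaDB build of mysql-common carries "mariadb" in its version string, which would need checking against the actual packages in the repo.

    # /etc/apt/preferences.d/mariadb  (hypothetical path and version pattern)
    # A priority below 100 makes apt prefer the stock Ubuntu build when resolving
    # dependencies; the MariaDB build can still be installed explicitly by version.
    Package: mysql-common
    Pin: version *mariadb*
    Pin-Priority: 50

Afterwards, "apt-cache policy mysql-common" would show which build apt will actually pick as the candidate.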
[21:11:29] New patchset: Ori.livneh; "(Bug 43273) Enable Extension:PostEdit for itwikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39495 [21:14:31] RECOVERY - SSH on search1001 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [21:19:01] PROBLEM - NTP on search1001 is CRITICAL: NTP CRITICAL: No response from NTP server [21:27:19] hi CT [21:28:14] hi [21:36:18] New patchset: Dzahn; "puppetize bugzilla community metrics "active users" stats script RT-3962" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39497 [21:38:40] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:52:06] Ryan_Lane: the pep8 changes I have made are in our Gerrit : for deployment module : https://gerrit.wikimedia.org/r/#/c/39407/ and then Gerrit hooks : https://gerrit.wikimedia.org/r/#/c/39040/ [21:52:16] yep, I saw the email :) [21:52:25] ;-D [21:52:33] will set up puppet lint in january [21:54:08] and pep8 can even be made to lint our erb templates :-D ( https://gerrit.wikimedia.org/r/#/c/39049/ ) [21:54:35] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.025 seconds [22:06:35] New patchset: Dereckson; "(bug 43274) Enable WikiLove on it.wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39498 [22:26:22] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [22:42:34] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds [22:44:00] notpeter: have you run into an instance during install where the puppet stalls during the cfg? [22:44:08] New patchset: Dzahn; "puppetize bugzilla community metrics "active users" stats script RT-3962" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39497 [22:46:02] New patchset: Dzahn; "puppetize bugzilla community metrics "active users" stats script RT-3962" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39497 [22:51:23] cmjohnson1: I've run into lots of problems :) [22:51:33] can you describe more what you're seeing? [22:51:37] maybe pastebin it? [22:52:01] LeslieCarr: question about icinga: where is /usr/lib/nagios/plugins/check_nrpe pulled in from? [22:52:43] notpeter: http://p.defau.lt/?umG1JrWRnnVanuWJO1jAnw [22:53:38] like what packages ? [22:53:51] notpeter: nagios-plugins ? [22:54:04] oh, no wait [22:54:09] it's actually from the files [22:54:10] nagios-nrpe-server [22:54:19] in manifests/misc/icinga.pp [22:54:52] class icinga::monitor::files::nagios-plugins [22:54:53] so, it's compiled against an old version of libssl [22:54:56] starts at line 262 [22:55:01] and I think that's why nrpe is busted [22:55:07] ah :) [22:55:17] nagios-plugins wasn't installing most of the plugins we used [22:55:24] it changed [22:55:26] /usr/lib/nagios/plugins/check_nrpe: error while loading shared libraries: libssl.so.0.9.8: cannot open shared object file: No such file or directory [22:55:29] they split it up into multiple packages! [22:55:40] cmjohnson1: uuuhhhh lemme check sockpuppet [22:55:46] sneaky [22:55:59] nagios-plugins, nagios-plugins-basic, nagios-plugins-standard, nagios-plugins-extra ! [22:56:10] hehehe [22:56:12] mutante: and they're all installed on neon [22:56:13] seriously ?
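Both findings above can be confirmed on the host with stock commands: dpkg tells you whether the binary is owned by any package at all (the copy puppet drops in from files/ will not be), and ldd shows the unresolved library. Paths and package names follow the log; this is a sketch, not what was actually typed on neon.

    # does any installed package own the binary, or did puppet put it there?
    dpkg -S /usr/lib/nagios/plugins/check_nrpe

    # which shared libraries fail to resolve (expect: libssl.so.0.9.8 => not found)
    ldd /usr/lib/nagios/plugins/check_nrpe | grep 'not found'

    # which of the split-up plugin packages, if any, actually ships a check_nrpe
    dpkg -L nagios-plugins nagios-plugins-basic nagios-plugins-standard nagios-plugins-extra 2>/dev/null | grep check_nrpe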
[22:56:18] and I don't think any of them have check_nrpe [22:56:19] :) [22:56:43] nagios-nrpe-plugin - Nagios Remote Plugin Executor Plugin [22:57:00] ah, ok [22:57:04] let's grab that package [22:57:09] not installed on spence though ...hrmm [22:57:14] and then should be good to go [22:57:35] mutante: spence is in a stable state. don't look too closely, or you risk madness [22:57:57] cmjohnson1: try removing /var/lib/puppet/ssl *on the client* [22:58:04] not on sockpuppet [22:58:07] don't be like peter [22:58:13] and try running that command again [22:58:14] notpeter: how about apt-get install nagios-nrpe-plugin on neon and then dpkg -L nagios-nrpe-plugin [22:58:26] heh...ok [22:58:29] sure, I can do that [22:58:54] mutante: looks like you already are :) [22:59:10] actually, no, i was just on spence [22:59:30] but ok ..now;) [22:59:36] so, just make sure you take the relevant files away from puppet or else it will be overwritten [22:59:52] The following NEW packages will be installed: nagios-nrpe-plugin nagios3 nagios3-cgi nagios3-common nagios3-core [22:59:55] wth [23:00:05] dependency on nagios ..arg [23:00:14] hahaha [23:00:18] oh that will fuck shit up [23:01:02] that's some bullshit [23:01:09] well, we can just grab the binary if we have to [23:01:32] is there an icinga-nrpe-plugin ? [23:01:41] sadly, it doesn't look like it [23:02:03] cmjohnson1: working? [23:02:07] no.. but wow . .we have so much other stuff now [23:02:09] check-mk-config-icinga - general purpose nagios-plugin for retrieving data [23:02:26] root@neon:~# apt-cache search check-mk* [23:02:56] so stuff we can't use because we switched ? sigh sigh [23:02:57] i removed it..i have to remove the certificates and do again [23:03:09] cmjohnson1: ok, cool [23:03:15] just lemme know if you want any help [23:03:23] mk-livestatus being packaged is ..cool though [23:03:34] and don't delete /var/lib/puppet/ssl on sockpuppet. srsly. trust me on this one :) [23:03:36] i will thx [23:03:44] apt-cache show check-mk-livestatus [23:03:59] "obsoletes NRPE, check_by_ssh, NSClient and check_snmp. [23:04:03] i won't, i will just leave that to you ...:-] [23:04:14] hehehe [23:04:24] mutante: awesome. [23:07:02] !log aaron synchronized wmf-config/CommonSettings.php 'use swift for captchas for testwiki/mediawikiwiki' [23:07:11] Logged the message, Master [23:07:54] notpeter: i am getting this http://p.defau.lt/?iqJ0ZVOFGiYuT8jFb_i21Q [23:07:56] my Bash script really enjoyed the ; character as part of a password string [23:07:59] :p [23:08:09] i think i know what to do but would like your help [23:08:23] New patchset: Aaron Schulz; "Use swift for testwiki/mediawikiwiki captchas" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39516 [23:08:31] cmjohnson1: ok [23:09:02] http://wikitech.wikimedia.org/view/Build_a_new_server#Get_puppet_running [23:09:08] <-- has the puppet signing part [23:09:40] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39403 [23:09:48] mutante...you are talking about this: find /var/lib/puppet/ssl -type f -exec rm {} \; to clean out the client.
[23:10:13] cmjohnson1: ok, puppet is now running [23:10:15] what I did was [23:10:23] Change merged: Aaron Schulz; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/39516 [23:10:32] puppetca --clean solr1001.eqiad.wmnet [23:10:34] on sockpuppet [23:10:38] then on the box [23:10:44] cmjohnson1: yes, well, both parts, one on sockpuppet , the other on the client [23:10:50] I did rm -rf /var/lib/puppet/ssl [23:11:00] then logged back into solr1001 [23:11:11] ran puppetd --test --ca_server sockpuppet.pmtpa.wmnet [23:11:19] back to sockpuppet and signed the request [23:11:21] back to solr1001 [23:11:24] now is happy [23:11:27] basically [23:11:34] lots of nuking from orbit and trying again [23:11:52] cool..i was concerned w/removing something from sockpuppet [23:11:57] ah [23:12:04] so, using puppetca --clean [23:12:07] is legit [23:12:18] doing that by hand can get really messy [23:12:52] thx for the help and explanation [23:13:00] yep! definitely! [23:15:52] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [23:31:37] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.100 seconds [23:33:44] New review: Dzahn; "removing jenkins bot as reviewer" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/39497 [23:33:46] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39497 [23:36:17] New patchset: Dzahn; "fix syntax error in bugzilla.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39522 [23:37:19] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/39522 [23:38:04] mutante: LeslieCarr so what do we want to do about the nrpe check thing [23:38:08] it seems like our options are [23:38:23] grab the bin from the nagios package and throw it onto neon [23:38:36] or redo the checks in whatever the new recommended way is [23:38:47] which would be a huge pain, but would probably be "the right way" [23:38:49] as much as i hate the grabbing the bin, it is already done now and so won't be any worse .... [23:38:50] TimStarling: how big was your inbox? [23:39:00] do the bad way, open a ticket for the right way [23:39:16] notpeter: what is the new recommended way though if you ask Ubuntu [23:39:30] are they packaging both, Nagios and Icinga [23:39:43] or is it a mess because some things are there for Icinga and others are not [23:40:24] mutante: would check-mk-livestatus be the new icinga way?
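notpeter's recovery steps above, collected in one place. The hostname is the one from the log, the wikitech page linked above remains the canonical reference, and the exact sign invocation is the usual puppetca form rather than a quote from the log, so treat this as a sketch of the procedure.

    # on the puppetmaster (sockpuppet): revoke the client's old certificate
    puppetca --clean solr1001.eqiad.wmnet

    # on the client: throw away its local SSL state
    rm -rf /var/lib/puppet/ssl

    # on the client: generate a fresh certificate request against the CA
    puppetd --test --ca_server sockpuppet.pmtpa.wmnet

    # back on the puppetmaster: sign the pending request, then re-run the client
    puppetca --sign solr1001.eqiad.wmnet
    puppetd --test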
[23:41:09] AaronSchulz: depends on how you count it [23:41:11] LeslieCarr: I'd kinda rather just keep using spence and do it the right way eventually than start using neon and do it the right way eventually [23:41:14] personally [23:41:24] because eventually never comes when there's no need :) [23:41:44] hehe [23:41:50] notpeter: no, i think it's just the new Nagios way [23:41:54] there is "check-mk-config-icinga - general purpose nagios-plugin for retrieving data" [23:41:58] only a few non-spam messages directly addressed to me, and I read most of those while I was away in case they were important [23:42:02] but that is not check-mk-livestatus [23:42:15] but hundreds of mailing list posts to read [23:42:19] i'm not super stuck on either way - though i think i'm more in the camp of "dear god please let's get the migration over" [23:43:11] icli - command line interface for the icinga monitoring system [23:44:01] LeslieCarr: I think you get to make the call :) [23:44:13] let's put the binary in our own package [23:44:26] if there really is none [23:44:31] then it seems less ghetto [23:44:34] i like that compromise [23:45:17] actually, can we just take the Nagios package, unpack it, remove the Nagios package dependencies, rename it to Icinga [23:45:21] and build it again [23:45:27] yes [23:45:30] I like that idea [23:45:34] ftw! [23:45:38] I really like it if you're doing it ;) [23:46:14] heh, or put the binary in puppet /misc/files/others/dontlookhere :) [23:46:50] well we already have the binary in puppet [23:46:58] so, the package idea is better [23:47:39] eh..why does it fail on neon then [23:47:50] because it's the old one i guess [23:48:08] it does not even have Nagios it its name [23:48:12] in [23:48:30] and by i guess i mean i'm pretty sure it was the old one :) [23:48:58] wait.. so how does Icinga think we solve this [23:49:36] what's old about this one [23:49:44] i mean, my suggestion would have also been using the "old" one [23:49:48] the one from the Nagios package [23:49:59] this was from the lucid nagios package [23:50:08] maybe it's been updated, hence the libssl error ? [23:50:14] compiled against an old version of libssl [23:50:25] yeah [23:50:36] so, we definitely need a newer version :) [23:50:37] let's say we never used Nagios ever [23:50:46] and we just want to set up a fresh Icinga server and have NRPE [23:50:59] what would they tell us to do [23:51:31] https://wiki.icinga.org/display/howtos/Setting+up+NRPE+with+Icinga [23:51:49] they are installing nagios-nrpe-server as was the first guess [23:52:16] and then apt-get --no-install-recommends install nagios-nrpe-plugin [23:52:28] they just ignore the recommendations [23:52:33] to get other Nagios packages [23:52:55] Setting up nagios-nrpe-plugin (2.12-5ubuntu1.1) ... [23:52:57] done on neon [23:53:19] !log neon - install nagios-nrpe-plugin with --no-install-recommends [23:53:30] Logged the message, Master [23:53:31] /usr/lib/nagios/plugins/check_nrpe [23:53:38] fixed [23:54:11] woo. [23:54:28] hey, look at all those recovery emails [23:54:58] TimStarling: are you doing cr today? [23:55:55] maybe if there is something urgent to review [23:56:05] hotness [23:56:43] coldness [23:56:45] mutante: maybe leave a note about that in the icinga.pp ? [23:59:22] I mean, or we could define an exec... [23:59:25] but, iunno [23:59:35] that's a different flavor of janky...
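For the "unpack it, drop the Nagios dependency, rename it to Icinga, build it again" idea discussed above, the mechanics with dpkg-deb would look roughly like this. Package version and architecture in the file names are taken from the log and an assumption respectively, the renamed package name is made up, and the team ultimately went with the --no-install-recommends install instead, so this is only a sketch of the alternative.

    # fetch the existing .deb (apt-get download exists on precise; otherwise pull it from the mirror by hand)
    apt-get download nagios-nrpe-plugin

    # unpack the payload and the control information
    dpkg-deb -x nagios-nrpe-plugin_2.12-5ubuntu1.1_amd64.deb nrpe-plugin
    dpkg-deb -e nagios-nrpe-plugin_2.12-5ubuntu1.1_amd64.deb nrpe-plugin/DEBIAN

    # edit nrpe-plugin/DEBIAN/control by hand: rename the package and remove the
    # field that drags in nagios3, then rebuild
    dpkg-deb -b nrpe-plugin icinga-nrpe-plugin_2.12-5ubuntu1.1_amd64.deb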