[00:02:11] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:02:24] but i'm basically like those people who flood police tip hotlines with useless tips (neighbor hasn't shaved!) so i'm going to stop now
[00:05:01] PROBLEM - Puppet freshness on mc1008 is CRITICAL: Puppet has not run in the last 10 hours
[00:12:10] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:14:09] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.276 second response time
[00:20:11] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[00:41:11] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:43:12] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[00:50:10] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[00:55:49] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 185 seconds
[00:56:09] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 188 seconds
[00:57:09] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:03:19] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:06:09] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[01:06:49] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[01:15:20] New review: Tim Starling; "Looks good, thanks for that. Can be merged once the whitespace issues are fixed." [operations/debs/lucene-search-2] (master) C: -1; - https://gerrit.wikimedia.org/r/53299
[01:22:19] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:23:21] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:31:21] PROBLEM - Varnish traffic logger on cp1025 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:31:22] PROBLEM - Varnish traffic logger on cp1033 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:34:19] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:44:20] PROBLEM - Varnish traffic logger on cp1034 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:45:05] New patchset: Ori.livneh; "Add 'eventlogging' puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324
[01:46:10] PROBLEM - Varnish traffic logger on cp1026 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[01:47:19] RECOVERY - Varnish traffic logger on cp1025 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:51:14] ops: I'd appreciate if someone gave RT 4752 a fitting title. I left it out and am not allowed to edit. Something like 'Add database access credentials for EventLogging to puppet-private'.
[01:55:21] RECOVERY - Varnish traffic logger on cp1033 is OK: PROCS OK: 3 processes with command name varnishncsa
[01:58:46] Red herring is my favorite type of herring.
[02:00:26] salted, smoked, or pickled?
[02:03:09] Deep-fried.
[02:05:20] RECOVERY - Varnish traffic logger on cp1034 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:07:09] RECOVERY - Varnish traffic logger on cp1026 is OK: PROCS OK: 3 processes with command name varnishncsa
[02:24:53] New patchset: Ram; "Bug: 45266 Use sequence numbers instead of timestamps" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299
[02:30:04] !log LocalisationUpdate completed (1.21wmf11) at Mon Mar 18 02:30:03 UTC 2013
[02:30:11] Logged the message, Master
[02:32:06] New review: Ram; "Patch set 4 fixes whitespace issues." [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299
[02:43:09] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:47:10] PROBLEM - Varnish traffic logger on cp1028 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[02:49:32] ori-l: how do you like putting eventlogging creds in a my.cnf formatted file?
[02:50:24] funny that ori-l can't edit. /me edits
[02:51:04] it could work; the format is more or less equivalent to the one used by python's ConfigParser module
[02:51:53] ori-l: refresh the ticket
[02:52:21] thanks, jeremyb_
[02:52:59] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[02:53:10] ori-l: you forgot to ask if it's wine sauce or cream sauce
[02:53:32] i knew what the answer would be
[02:53:41] really??
[02:53:51] no :)
[02:56:04] i hear http://www.vitafoodproducts.com/c-68-herring.aspx is a particularly good brand. of course they lose points for .NET
[03:01:10] RECOVERY - Varnish traffic logger on cp1028 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:06:09] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:17:10] PROBLEM - Varnish traffic logger on cp1023 is CRITICAL: PROCS CRITICAL: 2 processes with command name varnishncsa
[03:24:27] New review: Tim Starling; "I don't think it's appropriate to put load balancing code in configuration files. You should set $wg..." [operations/mediawiki-config] (master) C: -2; - https://gerrit.wikimedia.org/r/43029
[03:37:09] RECOVERY - Varnish traffic logger on cp1023 is OK: PROCS OK: 3 processes with command name varnishncsa
[03:39:47] New review: Tim Starling; "Note that the vhost will stay there indefinitely if this change is merged, since there is no ensure=..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/52742
[03:58:20] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:00:20] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 5.360 second response time
[04:04:59] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours
[04:09:59] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours
[04:16:25] New review: Tim Starling; "In general, you can't have different redirects for different protocols using mod_rewrite, see https:..." [operations/apache-config] (master) C: -2; - https://gerrit.wikimedia.org/r/47088
[04:19:21] ori-l: yt?
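The my.cnf/ConfigParser equivalence mentioned at 02:51 can be checked directly. A minimal sketch, using a hypothetical path and placeholder credentials; real my.cnf files can also carry directives (such as !include) that Python 2's ConfigParser does not accept, which is why the formats are only "more or less" equivalent:

    $ cat > /tmp/eventlogging.cnf <<'EOF'    # hypothetical file, placeholder values
    [client]
    user = eventlogging
    password = not-the-real-password
    EOF
    $ python -c '
    import ConfigParser                      # Python 2 module name
    p = ConfigParser.RawConfigParser()
    p.read("/tmp/eventlogging.cnf")
    print p.get("client", "user")'
    eventlogging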
[04:33:50] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 198 seconds
[04:33:50] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 198 seconds
[04:34:30] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:38:19] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK: Status line output matched 400 - 336 bytes in 0.133 second response time
[04:39:59] PROBLEM - Puppet freshness on mw1061 is CRITICAL: Puppet has not run in the last 10 hours
[04:49:49] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[04:50:01] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[04:57:01] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[04:59:05] New review: Tim Starling; "Reviewed the complete diff." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/53125
[05:05:53] ori-l: well, i'm running away. you should send a puppet changeset that makes an eventlogging.conf similarly to how https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=commitdiff;h=6fb59e38e59c88c4938a6ca412598f6a7b3d5741 does
[05:13:00] jeremyb_: ah, that makes total sense.
[05:13:49] New review: Tim Starling; "(8 comments)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299
[05:16:18] New patchset: Tim Starling; "Bug: 45266 Use sequence numbers instead of timestamps" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299
[05:16:56] New review: Tim Starling; "PS5: fixed space indenting" [operations/debs/lucene-search-2] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/53299
[05:16:57] Change merged: Tim Starling; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/53299
[05:23:01] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[05:23:01] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[05:23:01] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[05:23:01] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[05:59:09] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours
[06:02:59] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours
[06:04:59] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[06:21:19] New patchset: ArielGlenn; "bug fix, handle partial buffers that don't start with open parenthesis" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/54330
[06:37:01] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in the last 10 hours
[06:38:59] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[07:22:19] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 187 seconds
[07:22:49] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 195 seconds
[07:30:07] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/54330
[07:33:19] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[07:33:51] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[07:39:29] PROBLEM - LVS HTTPS IPv6 on bits-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[07:40:41] PROBLEM - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:40:41] PROBLEM - LVS HTTPS IPv4 on wikinews-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:40:41] PROBLEM - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:40:41] PROBLEM - LVS HTTP IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[07:40:41] PROBLEM - LVS HTTP IPv4 on bits-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:40:44] PROBLEM - LVS HTTPS IPv6 on wikiquote-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[07:40:44] PROBLEM - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:40:44] PROBLEM - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[07:41:14] PROBLEM - LVS HTTPS IPv4 on wikiquote-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:41:14] PROBLEM - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:41:14] PROBLEM - LVS HTTP IPv6 on wikiversity-lb.eqiad.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[07:41:14] PROBLEM - LVS HTTPS IPv4 on wiktionary-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:41:14] PROBLEM - LVS HTTPS IPv4 on wikidata-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:41:59] PROBLEM - LVS HTTPS IPv4 on wikiversity-lb.eqiad.wikimedia.org is CRITICAL: Connection timed out
[07:42:19] RECOVERY - LVS HTTPS IPv4 on mediawiki-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.012 second response time
[07:42:19] RECOVERY - LVS HTTP IPv6 on mobile-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 15993 bytes in 0.041 second response time
[07:42:44] RECOVERY - LVS HTTP IPv6 on wikiversity-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.006 second response time
[07:42:44] RECOVERY - LVS HTTPS IPv4 on wiktionary-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.021 second response time
[07:42:44] RECOVERY - LVS HTTPS IPv6 on wikiquote-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.021 second response time
[07:42:44] RECOVERY - LVS HTTPS IPv4 on wikinews-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.016 second response time
[07:42:44] RECOVERY - LVS HTTP IPv4 on bits-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 3846 bytes in 0.001 second response time
[07:42:54] RECOVERY - LVS HTTPS IPv4 on wikibooks-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.016 second response time
[07:42:55] RECOVERY - LVS HTTPS IPv4 on wikiversity-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66823 bytes in 0.022 second response time
[07:42:55] RECOVERY - LVS HTTPS IPv4 on wikidata-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 602 bytes in 0.013 second response time
[07:43:22] RECOVERY - LVS HTTPS IPv4 on wikisource-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.021 second response time
[07:43:22] RECOVERY - LVS HTTPS IPv4 on foundation-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66821 bytes in 0.011 second response time
[07:43:22] RECOVERY - LVS HTTPS IPv6 on bits-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 3852 bytes in 0.009 second response time
[07:43:35] RECOVERY - LVS HTTPS IPv4 on wikiquote-lb.eqiad.wikimedia.org is OK: HTTP OK: HTTP/1.1 200 OK - 66822 bytes in 0.012 second response time
[07:43:35] RECOVERY - LVS HTTP IPv6 on foundation-lb.eqiad.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 200 OK - 66822 bytes in 0.007 second response time
[08:00:13] helo
[08:06:16] New review: Hashar; "(1 comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53875
[08:13:22] New patchset: ArielGlenn; "new tool fifo_to_mysql.pl for feeding chunks to LOAD DATA INFILE" [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/54337
[08:29:04] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[08:33:35] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 196 seconds
[08:33:45] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 199 seconds
[08:36:35] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[08:36:45] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[08:41:04] PROBLEM - Puppet freshness on sq70 is CRITICAL: Puppet has not run in the last 10 hours
[08:50:06] Change merged: ArielGlenn; [operations/dumps] (ariel) - https://gerrit.wikimedia.org/r/54337
[08:59:47] New patchset: Krinkle; "Add sudo user "krinkle" on gallium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53861
[09:28:34] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 182 seconds
[09:28:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 182 seconds
[09:31:35] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 193 seconds
[09:31:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 194 seconds
[09:33:29] New patchset: Hashar; "apache confs for nagios.* are no more needed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54341
[09:37:35] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 196 seconds
[09:37:55] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 204 seconds
[09:39:36] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[09:39:36] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[09:39:44] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[09:39:56] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[09:43:52] New patchset: Hashar; "(bug 45926) b/c for nagios URL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54343
[10:05:19] New patchset: Silke Meyer; "Adjusted load balancer settings on Wikidata test repos." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54344
[10:06:04] PROBLEM - Puppet freshness on mc1008 is CRITICAL: Puppet has not run in the last 10 hours
[10:34:44] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 189 seconds
[10:34:55] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 197 seconds
[10:37:55] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[10:38:46] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[11:02:54] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:02:54] PROBLEM - SSH on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:03:45] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 1.466 second response time
[11:03:54] RECOVERY - SSH on niobium is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[11:06:56] PROBLEM - Varnish HTTP bits on niobium is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[11:14:46] RECOVERY - Varnish HTTP bits on niobium is OK: HTTP OK: HTTP/1.1 200 OK - 635 bytes in 0.001 second response time
[11:27:37] !log restarted both the varnishncsas on niobium, they were giant again
[11:27:43] Logged the message, Master
[11:34:04] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 182 seconds
[11:34:55] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 209 seconds
[11:36:54] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[11:37:05] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[11:47:11] varnishncsa was?
[11:47:28] not varnish?
[11:54:02] "again"?
[11:56:50] not varnish
[11:57:09] and yes, again (well recently I only restarted the vanadium one, iirc, I logged it at the time)
[12:13:59] New patchset: Matmarex; "(bug 45911) Set $wgCategoryCollation to 'uca-pt' for the Portuguese Wikipedia and Wikibooks" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/52903
[12:15:42] New patchset: Matmarex; "(bug 45968) Set $wgCategoryCollation to 'uca-pl' on Polish Wikivoyage" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54352
[12:18:48] New patchset: Matmarex; "(bug 45596) Set $wgCategoryCollation to 'uca-hu' on Hungarian Wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54353
[12:26:27] New review: Nikerabbit; "Why is this not done for all wikis in the same language?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54353
[12:26:43] New review: Peachey88; "Possibly causes Bug 46264?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/46826
[12:33:54] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 187 seconds
[12:35:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 3 seconds
[12:39:49] New patchset: Mark Bergsma; "Update streaming range patch to M.B.Grydeland's updated version" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54358
[12:39:49] New patchset: Mark Bergsma; "varnish (3.0.3plus~rc1-wm7) precise; urgency=low" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54359
[12:39:49] New patchset: Mark Bergsma; "Disable internal jemalloc so the system jemalloc can be used" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54360
[12:39:50] New patchset: Mark Bergsma; "varnish (3.0.3plus-rc1-1~1.gbpae5519) UNRELEASED; urgency=low" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54361
[12:39:50] New patchset: Mark Bergsma; "Refresh the varnishncsa udplog patch against 3.0.3plus-rc1" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54362
[12:39:50] New patchset: Mark Bergsma; "Remove escaping of spaces in header lines" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54363
[12:39:51] New patchset: Matmarex; "(bug 46005) Set $wgCategoryCollation to 'uca-be-tarask' on be-x-old.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54364
[12:39:51] New patchset: Matmarex; "(bug 46004) Set $wgCategoryCollation to 'uca-be' on be.wikipedia" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54365
[12:41:11] Change merged: Mark Bergsma; [operations/debs/ganglia] (master) - https://gerrit.wikimedia.org/r/53374
[12:42:23] New patchset: Mark Bergsma; "Update streaming range patch to M.B.Grydeland's updated version" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54358
[12:42:39] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54358
[12:42:59] New patchset: Mark Bergsma; "varnish (3.0.3plus~rc1-wm7) precise; urgency=low" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54359
[12:43:08] Change merged: Mark Bergsma; [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54359
[12:43:38] Change abandoned: Mark Bergsma; "(no reason)" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54360
[12:43:49] Change abandoned: Mark Bergsma; "(no reason)" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54361
[12:45:45] New patchset: Mark Bergsma; "Refresh the varnishncsa udplog patch against 3.0.3plus-rc1" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54362
[12:45:46] New patchset: Mark Bergsma; "Remove escaping of spaces in header lines" [operations/debs/varnish] (testing/3.0.3plus-rc1) - https://gerrit.wikimedia.org/r/54363
[12:48:16] New review: Matmarex; "Because that's how it was done before, because for now I'm just trying to deal with the wikis that s..." [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54353
[12:52:23] apergos: do you have any spare cycles?
[12:52:31] :-D
[12:52:36] no but ask anyways, what's up?
[12:52:36] :)
[12:53:32] New review: Wizardist; "Lacks bewikisource config. Community notice link provided in Bugzilla." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/54365
[12:53:37] I'm building a new Swift version
[12:53:50] to fix a bug that's been holding the deployment of large files
[12:53:52] (according to Aaron)
[12:54:05] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours
[12:54:05] (RT 4499)
[12:54:12] would you like to be involved in the upgrade?
[12:54:21] eh might as well :-D
[12:54:51] I see
[12:55:09] we still have boxes on the old old version
[12:55:20] as they get replaced they get 1.7.x
[12:55:47] but given how slow that process is you may want to do this round differently
[12:57:30] New patchset: Matmarex; "(bug 46004) Set $wgCategoryCollation to 'uca-be' on be.wikipedia and be.wikisource" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54365
[12:57:30] New patchset: Matmarex; "(bug 46081) Set $wgCategoryCollation to 'uca-default' on Polish Wiktionary" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54367
[12:57:55] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 184 seconds
[12:58:01] no, that's for proxies only
[12:58:03] so that should be okay
[12:58:05] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds
[12:58:18] ah good
[12:58:21] New review: Hashar; "I have confirmed that it works properly by crafting a tiny job that creates a single files." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53990
[12:58:27] how's the box replacement going?
[12:58:27] tediously slow
[12:58:44] where are we now?
[12:58:47] and when we finally get rid of all the old ones we still get to pull out the new ones that have the wrong controller and ssds >_<
[12:59:00] https://wikitech.wikimedia.org/wiki/Swift/Deploy_Plan_-_R720xds_in_tampa
[12:59:18] Mon Mar 18 - done remove weight from ms-be9 to 0 add weight to ms-be12 to 66
[12:59:43] after this next box comes out there's still three to go
[12:59:55] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds
[12:59:58] three c2100s?
[13:00:03] yep
[13:00:06] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds
[13:01:47] !log depooling ms-fe1 for new swift version (patched 1.7.4)
[13:01:54] Logged the message, Master
[13:02:17] but you're replacing those with H710 boxes now, right?
[13:02:20] yep
[13:02:29] the ones coming out now get replaced with the right stuff
[13:02:40] May?
[13:02:42] omg
[13:02:45] it's the 710s that got put in before you discovered the controller issue that are the problem
[13:02:46] yeah
[13:02:51] so this is like a full 6 months to replace boxes
[13:02:56] tell me about it, makes me want to shoot something (maybe me)
[13:03:03] yes, horrid is as horrid does
[13:12:46] apergos: I'm restarting the rest
[13:12:54] ok
[13:13:17] poor swift, 1.4k req/s
[13:13:46] how is ceph coming along?
[13:15:16] haven't had the chance to give it the love it needs yet
[13:16:00] would love to try it out sometime
[13:16:20] sure
[13:16:33] are you doing dns first?
[13:16:51] I'd like to, yes
[13:17:03] but ceph is much more important
[13:17:07] considering it's almost there...
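The ring-weight shuffle quoted from the deploy plan at 12:59 (drain ms-be9 to weight 0, raise ms-be12 to 66) is done per builder file with swift-ring-builder. A minimal sketch, with hypothetical builder and device search values; the real procedure also rebalances and pushes the rebuilt ring out to every node:

    # drain the outgoing node, ramp up its replacement, then recompute partitions
    $ swift-ring-builder object.builder set_weight d9 0     # 'd9' is a placeholder device search value
    $ swift-ring-builder object.builder set_weight d12 66
    $ swift-ring-builder object.builder rebalance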
[13:19:01] hm, swift object count stopped working an hour ago
[13:19:13] oh god, I upgraded ganglia-frontend on the boxes
[13:19:27] that means ben's ganglia plugin might have stopped working
[13:19:29] oh dear
[13:19:34] shouldn't
[13:23:27] !log depooled, upgraded, restarted and pooled again all ms-fe[1-4]
[13:23:33] Logged the message, Master
[13:24:14] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Mon Mar 18 13:24:11 UTC 2013
[13:25:54] https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Swift%20pmtpa&h=ms-fe1.pmtpa.wmnet&v=85.6663827975&m=swift_200_hits_%25&r=hour&z=default&jr=&js=&st=1363613021&vl=hps&z=large
[13:25:54] lol
[13:25:54] something's broken since... May last year
[13:25:54] and fixed now
[13:26:41] and yet that broke http://ganglia.wikimedia.org/latest/graph_all_periods.php?m=swift_object_count&z=small&h=Swift+pmtpa+prod&c=Swift+pmtpa&r=hour
[13:29:17] it's a bunch of cronjobs in root crontab
[13:30:35] the logtailer seems to work, somehow
[13:30:52] huh
[13:30:57] I'm trying to see where that swift_object_count is generated
[13:31:16] doesn't seem to be logtailer
[13:31:48] /usr/local/bin/swift-ganglia-report-global-stats does that
[13:32:07] yeah
[13:32:09] just found that :)
[13:32:39] it uses gmetric
[13:32:47] yuck
[13:33:16] logtailer too
[13:33:54] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 191 seconds
[13:34:06] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 191 seconds
[13:34:43] know what it might be?
[13:34:53] gmond no longer runs as nobody:root or ganglia:root now
[13:35:01] the new version actually does setgid()
[13:36:33] ha!
[13:36:35] that's exactly it
[13:36:43] there's a config file with the password
[13:36:58] root:root
[13:37:04] hehe
[13:37:04] hehe
[13:37:12] I found it before I switched back to IRC
[13:37:27] * mark giggles
[13:37:56] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 13 seconds
[13:37:56] RECOVERY - Puppet freshness on mw1061 is OK: puppet ran at Mon Mar 18 13:37:51 UTC 2013
[13:38:05] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 5 seconds
[13:39:18] hm, although
[13:39:28] hm, maybe not
[13:39:33] the script runs from root's crontab
[13:46:33] did you touch ganglia?
[13:46:37] it's throwing php errors now
[13:46:44] I didn't
[13:46:55] ok
[13:46:57] * mark looks at it
[13:49:33] these are some hella error outputs
[13:50:50] yeah i get ganglia probs too, the php files are being served directly rather than executing
[13:51:22] interesting
[13:51:28] looks like the php apache module isn't enabled anymore
[13:51:44] lol?
[13:51:52] there was just an apache USN released
[13:51:57] do we have ensure => latest?
[13:52:02] doubt it
[13:53:14] did you run an upgrade?
[13:53:21] running it now
[13:53:24] iU apache2-utils 2.2.14-5ubuntu8.11 utility programs for webservers
[13:53:27] ii apache2.2-bin 2.2.14-5ubuntu8.11 Apache HTTP Server common binary files
[13:53:31] ah
[13:53:33] okay, wasn't sure if it was puppet or you
[13:53:47] someone just set up https for ganglia, right? did that change something?
[13:54:05] apache2: Syntax error on line 204 of /etc/apache2/apache2.conf: Syntax error on line 1 of /etc/apache2/mods-enabled/php5.load: Cannot load /usr/lib/apache2/modules/libphp5.so into server: /usr/lib/apache2/modules/libphp5.so: cannot open shared object file: No such file or directory
[13:54:30] root@nickel:/var/log# apt-cache policy libapache2-mod-php5
[13:54:30] libapache2-mod-php5:
[13:54:30] Installed: (none)
[13:54:30] Candidate: 5.3.2-2wm1
[13:54:33] wha?!
[13:55:04] 2013-03-18 13:35:00 remove libapache2-mod-php5 5.3.2-2wm1 5.3.2-2wm1
[13:55:06] aha
[13:55:10] mpm-worker
[13:55:12] probably removed php
[13:56:45] Mar 18 13:35:14 nickel puppet-agent[25055]: (/Stage[main]/Webserver::Php5/Package[apache2]/ensure) ensure changed '2.2.14-5ubuntu8.10' to '2.2.14-5ubuntu8.11'
[13:57:02] package { [ "apache2", "libapache2-mod-php5" ]:
[13:57:03] ensure => latest;
[13:57:03] }
[13:57:06] oooof course.
[13:57:50] since you are looking that and here, paravoid, is ensure => latest a good thing to do?
[13:58:06] seems error prone, puppet will upgrade things when you aren't looking
[13:58:09] right?
[13:58:13] that's right
[13:58:18] so you use it only when that's not a problem
[13:58:22] like ganglia ;-p
[13:58:24] I hate ensure => latest
[13:58:30] ok cool, in general me too
[13:58:38] i've seen it in a lot of our manifests and was wondering
[13:59:04] PROBLEM - Puppet freshness on mw1104 is CRITICAL: Puppet has not run in the last 10 hours
[13:59:04] PROBLEM - Puppet freshness on mw1131 is CRITICAL: Puppet has not run in the last 10 hours
[13:59:58] ok ganglia back up
[14:00:06] PROBLEM - Puppet freshness on mw1124 is CRITICAL: Puppet has not run in the last 10 hours
[14:00:52] thank you!
[14:01:23] New patchset: Faidon; "Remove ensure => latest from webserver.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54371
[14:02:37] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54371
[14:02:44] why
[14:02:57] New patchset: Mark Bergsma; "Install apache2-mpm-prefork instead of letting APT install -worker" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54372
[14:02:57] thanks for conflicting
[14:03:53] New patchset: Mark Bergsma; "Revert "Remove ensure => latest from webserver.pp"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54373
[14:04:02] lol
[14:04:04] PROBLEM - Puppet freshness on sq79 is CRITICAL: Puppet has not run in the last 10 hours
[14:04:07] commit wars? :)
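The root cause just isolated: webserver::php5 declared apache2 with ensure => latest (the manifest excerpt quoted at 13:57), so puppet applied the USN security update unattended and apt's resolver pulled in apache2-mpm-worker, which removed libapache2-mod-php5. For contrast, a sketch of the two patterns under discussion — this is not the final state of webserver.pp, which ended up keeping latest (after the revert) plus an explicit apache2-mpm-prefork:

    # what bit here: 'latest' lets puppet apply upgrades (and their dependency changes) unattended
    package { [ "apache2", "libapache2-mod-php5" ]:
        ensure => latest;
    }
    # the conservative alternative: install once, pick the MPM explicitly, upgrade deliberately via apt
    package { [ "apache2-mpm-prefork", "libapache2-mod-php5" ]:
        ensure => present;
    }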
[14:04:23] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54373
[14:04:44] New patchset: Mark Bergsma; "Install apache2-mpm-prefork instead of letting APT install -worker" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54372
[14:05:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54372
[14:06:04] PROBLEM - Puppet freshness on colby is CRITICAL: Puppet has not run in the last 10 hours
[14:06:53] I still think that we shouldn't let puppet upgrade apaches
[14:06:56] and restart them
[14:07:28] until we have better security upgrade processes in place, i think it's a good idea
[14:10:22] New patchset: Mark Bergsma; "Fix dependency" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54385
[14:10:50] New patchset: Mark Bergsma; "Fix dependency" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54385
[14:12:11] back to swift ganglia
[14:12:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54385
[14:13:41] are you guys aware that blog.wm.o is serving up php source?
[14:13:52] * hexmode checks backlog to see
[14:13:56] haha
[14:14:02] yes i am
[14:14:33] good... just wanted to verify :)
[14:16:42] ekrem too
[14:16:48] are you handling all of them mark?
[14:17:20] yes
[14:17:25] k
[14:24:43] New review: Ottomata; "We are not going to use this change, I will eventually abandon it." [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/48041
[14:28:21] good thing ganglia data is not stored by the php
[14:28:59] brb, errands
[14:29:09] good thing production apaches aren't webserver::php5
[14:29:17] hehe
[14:29:30] now that would have been fun
[14:30:17] ha
[14:30:22] found the problem with gmetric
[14:30:30] shot in the dark
[14:30:36] root@ms-fe1:~# /usr/bin/gmetric --name "swift object change" --type int32 --conf /etc/ganglia/gmond.conf --spoof "Swift pmtpa prod:Swift pmtpa prod" --value 5 --units "objects per second"
[14:30:40] root@ms-fe1:~# /usr/bin/gmetric --name "swift_object_change" --type int32 --conf /etc/ganglia/gmond.conf --spoof "Swift pmtpa prod:Swift pmtpa prod" --value 5 --units "objects per second"
[14:30:44] first one works, second one doesn't
[14:30:46] other way around
[14:30:49] er
[14:31:04] first one is what ben's script runs and it fails with the newer gmetric
[14:34:04] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours
[14:35:05] PROBLEM - Puppet freshness on db1011 is CRITICAL: Puppet has not run in the last 10 hours
[14:35:05] PROBLEM - Puppet freshness on search1016 is CRITICAL: Puppet has not run in the last 10 hours
[14:36:16] New patchset: Faidon; "Fix swift-ganglia-report-global-stats for 3.5" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54464
[14:38:29] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54464
[14:42:36] New patchset: Hashar; "contint: update apache conf file headers" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54465
[14:42:37] New patchset: Hashar; "Apache conf for https://zuul.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54466
[14:43:37] nice
[14:50:04] and percentage queries by status also fixed
[14:50:08] hits__ -> hits_%25
[14:50:41] now if only I could merge https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Swift%20pmtpa&h=ms-fe1.pmtpa.wmnet&v=84.7094625632&m=swift_200_hits__&r=hour&z=default&jr=&js=&st=1363617641&vl=hps&z=large history into https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Swift%20pmtpa&h=ms-fe1.pmtpa.wmnet&v=83.6527293844&m=swift_200_hits_%25&r=hour&z=default&jr=&js=&st=1363617641&vl=hps&z=large
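The gmetric regression pinned down at 14:30: the newer gmetric rejects metric names containing spaces, so the spaced name Ben's script used stopped registering while the underscored one works. A sketch of the kind of one-line guard a reporting script can apply (bash parameter expansion; the actual fix is change 54464 above, whose contents are not quoted here):

    name="swift object change"
    /usr/bin/gmetric --name "${name// /_}" --type int32 \
        --conf /etc/ganglia/gmond.conf \
        --spoof "Swift pmtpa prod:Swift pmtpa prod" \
        --value 5 --units "objects per second"    # spaces replaced with underscores before sending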
[14:52:03] hashar: so you're renaming integration.wm.org to zuul.wm.org?
[14:52:53] paravoid: hop adding a new vhost :-]
[14:53:15] paravoid: I will eventually have to pull Zuul out of gallium to a new host. Maybe next year.
[14:53:47] paravoid: and will eventually mimic openstack by providing a nice status page such as http://zuul.openstack.org/ ( with nice performances stats from graphite)
[14:54:01] the aim is to replace https://integration.wikimedia.org/zuul/status entirely
[14:54:04] heh
[14:54:26] for graphite I will need python-statsd to be Debianized for Precise :]
[14:58:05] PROBLEM - Puppet freshness on europium is CRITICAL: Puppet has not run in the last 10 hours
[15:08:06] PROBLEM - Apache HTTP on mw1040 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:08:07] PROBLEM - Apache HTTP on mw1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:08:54] RECOVERY - Apache HTTP on mw1047 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.178 second response time
[15:08:55] RECOVERY - Apache HTTP on mw1040 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 1.135 second response time
[15:21:34] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.0391825 (gt 8.0)
[15:23:55] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 188 seconds
[15:23:56] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 190 seconds
[15:24:04] PROBLEM - Puppet freshness on lvs1004 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:05] PROBLEM - Puppet freshness on lvs1005 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:05] PROBLEM - Puppet freshness on lvs1006 is CRITICAL: Puppet has not run in the last 10 hours
[15:24:05] PROBLEM - Puppet freshness on msfe1002 is CRITICAL: Puppet has not run in the last 10 hours
[15:29:55] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 16 seconds
[15:30:44] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds
[15:33:45] Change merged: Andrew Bogott; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54344
[15:47:34] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.97212751773
[15:51:39] New review: Diederik; "The reason why we haven't installed Panda yet is because of the unbelievable long list of dependencies:" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54116
[15:54:50] New patchset: Ottomata; "Fixing email-blog-pageviews.erb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54476
[15:55:30] New review: Ottomata; "I think this is ok on stat1 (unless another opsen disagrees). stat1 is meant for number crunching ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116
[15:57:11] New review: Ottomata; "Also, if you own this as root:www-data 640, how are Evan and others going to write files into this ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116
[15:57:39] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54476
[16:00:08] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours
[16:00:09] RECOVERY - zuul_service_running on gallium is OK: PROCS OK: 1 process with args zuul-server
[16:01:37] !log Hard restarted Zuul which was dead locked again :-(
[16:01:44] Logged the message, Master
[16:03:16] so, uh, Andre can open https://wikimediafoundation.org/wiki/Staff?showall=1 but I'm getting a "cannot connect to server" error. He's in Czech Republic, I'm in SF.
[16:03:37] ah, nevermind, works now, but I promise, it wasn't working there for at least 3 minutes
[16:03:50] :)
[16:04:12] PROBLEM - Puppet freshness on sq73 is CRITICAL: Puppet has not run in the last 10 hours
[16:04:13] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 181 seconds
[16:04:14] PROBLEM - zuul_service_running on gallium is CRITICAL: PROCS CRITICAL: 2 processes with args zuul-server
[16:04:14] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 184 seconds
[16:04:55] paravoid: going to swap out disk 4 on ms-be1004...any objection?
[16:06:00] yes, give me a sec first
[16:06:16] PROBLEM - Puppet freshness on virt1005 is CRITICAL: Puppet has not run in the last 10 hours
[16:09:41] New patchset: Ottomata; "Fixing email blog pageviews job" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54478
[16:10:09] paravoid: okay...lmk
[16:10:59] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54478
[16:15:34] New patchset: Ottomata; "Need to include passwords::mysql::research" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54481
[16:17:05] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54481
[16:19:12] cmjohnson1: ok, go ahead
[16:19:19] ok...thx
[16:19:33] !log removing/replacing disk 4 ms-be1004
[16:19:41] Logged the message, Master
[16:21:16] New review: Silke Meyer; "OK on a Wikidata client, but repo doesn't get its extensions. I'll have to investigate this further!" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/51797
[16:21:36] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.58248736842 (gt 8.0)
[16:22:24] New patchset: Ottomata; "Removing newline" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54483
[16:22:57] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54483
[16:28:24] cmjohnson1: plugged the new disk?
[16:29:37] so many bad disks in the new ceph servers?
[16:29:41] yeah
[16:29:46] osdmap e170605: 144 osds: 137 up, 137 in
[16:31:18] yes
[16:31:29] mark: there are several; nearly all of them have at least 1
[16:31:36] but the osds show up which is odd
[16:31:47] did you also do the megacli magic?
[16:32:09] the system can't see the disk, I'm guessing there's no array
[16:32:32] paravoid: no, i am not sure how to get that disk back in the right order
[16:36:21] paravoid: if i screw up the order then it will mess up the mapping on the OSDs...correct? any suggestions?
[16:36:49] it won't be a huge deal as ceph doesn't particularly care, but it'll confuse us in the future
[16:36:51] New review: Andrew Bogott; "I cut out those lines based on the assumption that they were there because of copy/paste and didn't ..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/51797
[16:38:05] PROBLEM - Puppet freshness on db56 is CRITICAL: Puppet has not run in the last 10 hours
[16:39:28] hey milimetric
[16:39:51] the mobile reportcards seem to have a reasonable number of caching issues. is that being tracked someplace?
[16:40:00] clearing cache after every update doesn't seem right
[16:40:05] PROBLEM - Puppet freshness on lvs1003 is CRITICAL: Puppet has not run in the last 10 hours
[16:40:33] milimetric: sorry, wrong channel
[16:50:06] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 28 seconds
[16:50:16] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds
[17:05:57] Etherpad Lite seems to be down. Is this known?
[17:08:17] etherpad lite is still a labs project
[17:08:19] ask in #wikimedia-labs
[17:11:42] paravoid: I asked there but received no response.
[17:14:05] RECOVERY - Puppet freshness on sq70 is OK: puppet ran at Mon Mar 18 17:14:02 UTC 2013
[17:16:26] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 187 seconds
[17:17:06] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 200 seconds
[17:24:05] * jeremyb_ has advised valeriej to be a little more vocal in #-labs :)
[17:36:07] New patchset: Dzahn; "add empty password class for eventlogging to passwords, to reflect addition in private repo" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54490
[17:37:55] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54490
[17:38:22] mutante: :D thank you
[17:38:52] ori-l: np, you should be able to use it now, user/pass taken from vanadium as requested
[17:40:52] awesome, thanks
[17:42:01] ori-l: errr, Peter == milimetric ?
[17:42:33] ori-l: Do you happen to know how to properly debian/package ruby software? More specifically a ruby gem? (I'm referring to jsduck, and yes it also has a few other gems as dependencies)
[17:42:43] https://bugzilla.wikimedia.org/show_bug.cgi?id=46236
[17:43:06] I'm currently blocked on this to continue with the jenkins jobs, as they need that bin.
[17:43:11] mutante: ^
[17:43:19] notpeter: ^
[17:43:22] jeremyb_ hm? I'm Dan Andreescu, notpeter is Peter Youngmeister
[17:43:32] Krinkle: not sure. i'll take a look
[17:43:33] milimetric: tell ori-l :-)
[17:43:43] oh he knows :)
[17:43:52] idk... :)
[17:44:25] paravoid: so when might we have 4 ceph frontends running?
[17:45:02] Krinkle: never packaged a ruby gem before, but this might help http://stackoverflow.com/questions/7116377/create-a-debian-package-from-a-ruby-gem
[17:45:17] do we need 4?
[17:45:53] probably not since we didn't *need* 4 swift ones, but last I checked we had 1
[17:46:02] 2 would be nice
[17:46:06] we have two
[17:46:22] actually running and getting a portion of traffic?
[17:46:25] and we have two more needing a reformat
[17:46:37] is using this to package ruby gems a good idea? http://rubygems.org/gems/fpm "Convert directories, rpms, python eggs, rubygems, and more to rpms, debs, solaris packages and more."
[17:46:48] no
[17:47:02] just use gem2deb
[17:47:09] Krinkle: ^ :)
[17:47:51] mutante: I tried several of them, but they all seemed to have crappy details in the end. But then again, I don't know much about this. I recall something problematic about dependencies and the specific versions of the dependencies
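gem2deb, suggested at 17:47, does most of the work: it converts a .gem into a Debian source package (binaries conventionally named ruby-<gemname>) and builds it. A minimal sketch for jsduck; the painful part Krinkle alludes to is that every dependency gem without an existing Debian package needs the same treatment, at a compatible version:

    $ gem fetch jsduck        # download jsduck-<version>.gem from rubygems.org
    $ gem2deb jsduck-*.gem    # generate debian/ packaging and build a ruby-jsduck .deb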
[17:47:53] TCP ms-fe.svc.eqiad.wmnet:http wrr -> ms-fe1001.eqiad.wmnet:http Route 40 11 8 -> ms-fe1002.eqiad.wmnet:http Route 40 9 8
[17:48:07] whitespace damaged, but yes, both are active/load-balanced
[17:48:22] Krinkle: also tried gem2deb as paravoid suggests?
[17:48:30] mutante: Anyhow, I can't try anything because I couldn't verify it properly. I'd be working in the dark.
[17:48:56] I could spend hours/days on it, and it'd be a total waste of effort and time.
[17:49:20] New review: Ori.livneh; "> Also, if you own this as root:www-data 640, how are Evan and others going to write files into this..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116
[17:50:18] I'd like to believe this is a task for operations. We have separate teams so that people can do what they're good at. I'm always curious to learn more, but in this case I think it's fair to say this is out of my reach.
[17:50:48] New patchset: Ori.livneh; "Rsync public data for visualization to stat1001" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54116
[17:50:49] I think that's fair
[17:51:17] that's what I tell people too
[17:51:25] paravoid: is the balancing weighted?
[17:51:34] if you want to do it I'll help/review in the spirit of levelup
[17:51:37] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.697558547 (gt 8.0)
[17:51:41] every script I run seems to just hit fe1002
[17:51:47] if not that's entirely fair and it's a thing for our team
[17:52:15] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 8.27187689922 (gt 8.0)
[17:52:19] AaronSchulz: both have equal weight
[17:52:23] New patchset: Dzahn; "turn planet into a puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54493
[17:52:24] New patchset: Dzahn; "rename planet class, per docs init.pp must exist and contain a class matching the module name" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54494
[17:52:24] New patchset: Dzahn; "move defined resource types into separate files, one definition per file inside a module as recommended by puppet docs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54495
[17:52:24] New patchset: Dzahn; "move package install to own class and file, move generic::locales out of init.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54496
[17:52:24] New patchset: Dzahn; "move webserver setup for planet out of init.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54497
[17:52:24] New patchset: Dzahn; "move locales install and generation into own file, step 1 to making module self contained instead of declaring stuff from generic-definitions.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54498
[17:52:25] New patchset: Dzahn; "move needed planet directories to own class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54499
[17:52:25] New patchset: Dzahn; "move user/theme/apache_sites out of venus.pp, move index_site setup to own file, move webserver setup out of init.pp" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54500
[17:52:25] New patchset: Dzahn; "move language prefixes and translations to own class/file languages. use a qualified variable to access it" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54501
[17:52:26] New patchset: Dzahn; "various fixes to make puppet-lint like it, like wrong quoting, unaligned arrows, variables without explicit scope, lines longer than 80 chars, and more" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54502
[17:52:32] AaronSchulz: where do you see that?
[17:52:34] fe1002?
[17:52:37] ganglia
[17:52:39] mutanteflood
[17:52:45] maybe ganglia is just wrong ;)
[17:52:48] paravoid: I got enough op-related levelup for the time being. Some other time :) I want to make sure operations people still have a job to do :P
[17:53:02] AaronSchulz: persistent connections?
[17:53:14] paravoid: But thanks for the offer, I'll probably take you up on it one day.
[17:53:58] I don't think so
[17:54:37] hrm
[17:55:37] paravoid: would love yr feedback on https://gerrit.wikimedia.org/r/#/c/54324/
[17:56:03] got a meeting in 5'
[17:56:08] so probably later
[17:56:18] np, thanks
[17:56:46] looks good in general, I have a couple of suggestions though
[17:58:43] paravoid: cool, note them when you get a chance and i'll make fixes
[18:00:35] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 184 seconds
[18:01:26] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 198 seconds
[18:01:33] New patchset: Reedy; "Set wgCookieExpiration to 30 days" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54505
[18:02:09] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54505
[18:03:25] !log aaron synchronized wmf-config/PrivateSettings.php 'Removed old testing cruft'
[18:03:33] Logged the message, Master
[18:05:58] !log reedy synchronized wmf-config/CommonSettings.php
[18:06:05] Logged the message, Master
[18:11:35] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.66516654676 (gt 8.0)
[18:24:26] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds
[18:24:36] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds
[18:26:37] New patchset: Ottomata; "add. Some new global settings for metrics-api project." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53868
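The planet series above follows the module conventions its commit messages cite: a module's init.pp must define a class matching the module name, and every other class or define gets its own file, autoloaded by name. A sketch of the resulting layout, with file names inferred from the commit messages and therefore illustrative only:

    modules/planet/manifests/init.pp         # class planet - must match the module name
    modules/planet/manifests/packages.pp     # class planet::packages - one class per file
    modules/planet/manifests/languages.pp    # class planet::languages (prefixes/translations)
    modules/planet/templates/                # ERB templates the classes render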
[18:28:18] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.58356535714 (gt 8.0)
[18:30:04] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours
[18:30:30] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53868
[18:32:22] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 9.38441573643 (gt 8.0)
[18:39:53] !log reedy synchronized php-1.21wmf12 'Initial sync of php-1.21wmf12'
[18:40:01] Logged the message, Master
[18:48:17] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.64687382979 (gt 8.0)
[18:54:11] Reedy: pm
[18:56:26] !log asher synchronized wmf-config/db-eqiad.php 'pulling db1043, db1009 for upgrade'
[18:56:34] Logged the message, Master
[18:58:09] PROBLEM - MySQL Slave Running on db78 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Query partially completed on the master (error on master: 1317) and w
[18:58:19] RECOVERY - Packetloss_Average on emery is OK: OK: packet_loss_average is 3.69825935714
[18:58:20] PROBLEM - MySQL Slave Running on db1025 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Query partially completed on the master (error on master: 1317) and w
[19:02:19] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 8.47387868217 (gt 8.0)
[19:02:46] !log asher synchronized wmf-config/db-eqiad.php 'db1050 for special s1'
[19:02:54] Logged the message, Master
[19:03:46] !log reedy synchronized docroot
[19:04:16] Logged the message, Master
[19:05:24] mark: paravoid soooo
[19:05:32] db78 and db1025 are having issues
[19:05:36] they are fundraising boxes
[19:05:42] neither asher nor I have access....
[19:07:38] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 10.0916172269 (gt 8.0)
[19:07:52] !log asher synchronized wmf-config/db-eqiad.php 'returning db1043 db1009'
[19:07:59] Logged the message, Master
[19:10:02] New patchset: Rfaulk; "add. Flag option to use flask.ext.login for metrics API." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54513
[19:10:26] RECOVERY - Puppet freshness on mw1124 is OK: puppet ran at Mon Mar 18 19:10:25 UTC 2013
[19:10:34] Copying to fenari from 10.0.5.8...rsync: send_files failed to open "/php-1.21wmf11/.git/modules/extensions/FormPreloadPostCache/index.lock" (in common): Permission denied (13)
[19:11:39] New patchset: Demon; "Make hooks-bugzilla less spammy" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54514
[19:13:19] pastebin.com looks interesting today :p
[19:14:47] RECOVERY - Puppet freshness on mw1104 is OK: puppet ran at Mon Mar 18 19:14:38 UTC 2013
[19:15:14] eh?
[19:19:07] RECOVERY - Puppet freshness on search17 is OK: puppet ran at Mon Mar 18 19:19:01 UTC 2013
[19:19:15] New review: Ottomata; "Hm, cool!" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54513
[19:19:47] PROBLEM - MySQL Slave Delay on db39 is CRITICAL: CRIT replication delay 194 seconds
[19:21:18] RECOVERY - Puppet freshness on search1016 is OK: puppet ran at Mon Mar 18 19:21:14 UTC 2013
[19:22:16] RECOVERY - Puppet freshness on mw1131 is OK: puppet ran at Mon Mar 18 19:22:13 UTC 2013
[19:24:57] !log reedy Started syncing Wikimedia installation... : Rebuild message cache for 1.21wmf12
[19:25:03] Logged the message, Master
[19:25:46] RECOVERY - MySQL Slave Delay on db39 is OK: OK replication delay 29 seconds
[19:27:36] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.61299543478 (gt 8.0)
[19:28:07] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 190 seconds
[19:28:07] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 189 seconds
[19:28:17] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.42276690647 (gt 8.0)
[19:32:22] !log upgraded all coredb mariadb replicas to 5.5.30
[19:32:37] Logged the message, Master
[19:36:16] PROBLEM - LVS HTTP IPv6 on wikidata-lb.pmtpa.wikimedia.org_ipv6 is CRITICAL: Connection timed out
[19:36:56] uh?
[19:37:01] RECOVERY - LVS HTTP IPv6 on wikidata-lb.pmtpa.wikimedia.org_ipv6 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 995 bytes in 0.065 second response time
[19:37:08] right....
[19:38:06] icinga's sloooow
[19:42:27] gerrit too
[19:42:40] well it also freezes up my browser
[19:42:44] PROBLEM - Packetloss_Average on oxygen is CRITICAL: CRITICAL: packet_loss_average is 8.91653930233 (gt 8.0)
[19:42:47] (icinga)
[19:43:52] ^demon, is gerrit borky?
[19:43:53] Connection to gerrit.wikimedia.org closed by remote host.
[19:44:01] The proxy server received an invalid response from an upstream server.
[19:44:28] <^demon> !log jetty freaked out again, forcing a gerrit restart
[19:44:36] Logged the message, Master
[19:45:28] danke!
[19:48:23] PROBLEM - swift-account-reaper on ms-be12 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:48:24] PROBLEM - Apache HTTP on mw27 is CRITICAL: Connection refused
[19:48:24] PROBLEM - MySQL Slave Running on rdb1002 is CRITICAL: NRPE: Command check_mysql_slave_running not defined
[19:48:24] PROBLEM - Backend Squid HTTP on knsq17 is CRITICAL: Connection refused
[19:48:24] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 3037 seconds
[19:50:14] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.80697151079 (gt 8.0)
[19:50:23] PROBLEM - swift-account-reaper on ms-be11 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[19:51:25] PROBLEM - Solr on vanadium is CRITICAL: Average request time is 526.92883 (gt 400)
[19:52:09] hrmmm, no maxsem. (for solr)
[19:52:27] jeremyb_, and now?
[19:52:35] (I'm in a meeting, btw)
[19:52:46] MaxSem: there was an icinga alert above
[19:53:22] and mw27/knsq17 ? someone working on them?
[19:53:35] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 185 seconds
[19:53:35] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 185 seconds
[19:54:34] jeremyb_, vanadium is Nikerabbit. it's his crazy requests taking too long:P
[19:54:53] MaxSem: ok
[20:00:17] !log reedy Finished syncing Wikimedia installation... : Rebuild message cache for 1.21wmf12
: Rebuild message cache for 1.21wmf12 [20:00:24] Logged the message, Master [20:02:15] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.0 [20:03:25] chmod g+w /home/wikipedia/common/php-1.21wmf11/.git/modules/extensions/FormPreloadPostCache/index.lock [20:04:12] !log reedy rebuilt wikiversions.cdb and synchronized wikiversions files: [20:04:20] Logged the message, Master [20:05:02] New patchset: Reedy; "1.21wmf12 deployment stuffs" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54520 [20:05:19] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54520 [20:06:02] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [20:06:09] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 1 seconds [20:06:09] PROBLEM - Puppet freshness on mc1008 is CRITICAL: Puppet has not run in the last 10 hours [20:06:14] New patchset: Reedy; "Update php symlink" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54521 [20:07:37] Change merged: Reedy; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54521 [20:12:01] New patchset: Ram; "Bug: 46295 Fix error parsing InitialiseSettings.php" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/54522 [20:12:07] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 9.70951 (gt 8.0) [20:12:43] hia, paravoid, quick .deb q for you, if you are still awake [20:12:47] I"m creating the flask-login deb [20:12:49] LeslieCarr, the noc email address, am I right in think that's an OTRS queue? [20:12:50] someone already created this repo [20:12:51] operations/debs/flask-login [20:12:59] but, the .deb will be called python-flask-login [20:13:04] hah, parsing php in java. that's a good one [20:13:04] should the repo be named the same? [20:13:09] Thehelpfulone: i think so [20:13:36] i noticed that git-buildpackage builds .dsc files named flask-login_0.1.2-1.dsc, and the .deb as python-flask-login_0.1.2-1_all.deb [20:13:39] which seems weird to me [20:13:44] Thehelpfulone: otrs-wiki agrees [20:14:36] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 14.104221223 (gt 8.0) [20:15:25] yeah, I was just looking for someone who could be regarded as the "owner" or contact for the queue jeremyb_, filling out https://otrs-wiki.wikimedia.org/wiki/User:Rjd0060/sandbox [20:16:03] paravoid: was this one already discussed as an option for the ruby gems? "13:21 < FriedBob> mutante: You can ensure on gems, and install gems via the package resource [20:18:12] Thehelpfulone: you could use some gridlines... [20:18:42] hey I didn't make it, blame RD for that ;) [20:18:57] Thehelpfulone: yeah, leslie's probably fine. you can ask RD who else actually works the queue [20:20:13] !log authdns-update [20:20:26] Logged the message, RobH [20:20:57] (most of ops in SF is at lunch, just fyi) [20:21:02] for folks asking questions to them ;] [20:21:06] Thehelpfulone: noc is just an alias for root [20:21:17] mutante, it's also a queue in OTRS [20:21:28] i dont think it gets any email then [20:21:36] or does it [20:21:43] it used to years ago, i dunno if it still odes [20:21:46] yeah that's why I was asking, things go through mchenry first don't they? [20:21:48] noc? 
I think it does [20:21:50] i know that no one in ops logs into otrs regularly ;] [20:21:56] I see mail to noc pretty often (but also a lot of spam) [20:21:57] (for ops work that is) [20:22:07] all i can say that from mchenry point of view it is nothing but an alias for root [20:22:13] I dunno about the queue, I don't think I have access to that one [20:22:22] man... dont make me login to otrs [20:22:27] i have not done in months! [20:22:36] I do just often enough to keep my account [20:22:47] well once in a while I get asked to lok at something so.. [20:22:58] huh, i dont even see noc. [20:23:17] prettu sure cuz its empty... [20:23:19] i dunno otrs at all. [20:23:28] redirects help@ to thehelpfulone [20:23:30] I recall that it used to go into OTRS years ago [20:23:33] not :) [20:23:44] guys [20:23:50] woops about to be wrong channel [20:24:05] https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketSearch&Subaction=Search&QueueIDs=5 [20:24:07] I know that Leslie does a few tickets from there because I've seen emails in OTRS [20:25:03] Is the tampa network ill? [20:27:04] jeremyb_: i see no poen tickets though [20:27:14] 6 eqix-dc2.wikimediafound.com (206.126.236.221) 29.226 ms 28.552 ms 28.531 ms [20:27:14] 7 ae0.cr1-eqiad.wikimedia.org (208.80.154.193) 24.947 ms 24.494 ms 24.463 ms [20:27:14] 8 xe-0-0-1.cr1-sdtpa.wikimedia.org (208.80.154.210) 59.379 ms 58.536 ms 58.519 ms [20:27:14] 9 * * * [20:27:14] 10 * * * [20:27:17] RobH: but how old are the closed ones? [20:27:27] RobH: and https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketQueue&QueueID=5 [20:27:42] im still not sure what the question is [20:27:46] noc@ doesnt go there [20:27:50] so i have no idea how things are going in there [20:27:59] RobH: the question is "is it really used in any way" [20:28:09] not that i know of. [20:28:56] * jeremyb_ is thinking it should go the way of the legal [20:29:13] if that means kill it [20:29:15] i agree. [20:29:44] notpeter, fyi, I took the flask-login RT ticket [20:29:46] i think I got it [20:30:04] ottomata: you are awesome [20:31:15] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 14 seconds [20:31:16] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 11 seconds [20:31:30] RobH: well there was a maybe 6-12 month period where legal@ was an alias instead of going to OTRS but the queue was still there. there were several announcements to forward to legal instead of using the queue but people kept moving stuff from other queues into legal. (which then got ignored). [20:32:01] RobH: idk if anyone moves stuff to noc or if we have anywhere telling people too but if it's unused then we should just disable the ability to move stuff there [20:32:11] (like was eventually done with legal) [20:32:45] telling people to* [20:34:59] ^demon, you around?, i'm about to do some gerrit stuff with a new repo and I want to be sure I don't mess it up and have to ask for help later :) [20:35:08] <^demon> Yes, I am. 
[20:35:15] PROBLEM - MySQL Slave Delay on db32 is CRITICAL: CRIT replication delay 186 seconds [20:35:15] PROBLEM - MySQL Replication Heartbeat on db32 is CRITICAL: CRIT replication delay 187 seconds [20:35:40] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/53624 [20:35:44] so, i have a new repo [20:35:55] operations/debs/python-flask-login (I know you created flask-login, but i thikn it is named incorrectly) [20:35:58] jeremyb_: legal is an Office IT thing nowadays [20:36:00] it does not have any commits on the remote [20:36:13] my local has 3 branches [20:36:25] mutante: huh? [20:36:33] I want to push everything except for the latest commit on master directly [20:36:35] !log authdns-update rt4694 [20:36:39] and then push the latest commit on master for review [20:36:42] mutante: oh, you mean legal is google apps? [20:36:46] jeremyb_: yes [20:36:50] Logged the message, RobH [20:37:10] jeremyb_: mchenry just has alias TO legal, but the target is google apps [20:37:11] mutante: very recently? it wasn't google apps when luis was added. iirc [20:37:16] ^demon, is that possible? [20:37:19] <^demon> ottomata: Yes. [20:37:33] <^demon> `git push origin [branchname]` for the branches you want to directly push all of. [20:37:55] <^demon> `git push origin HEAD~1:refs/heads/master` for the one you want to push all-but-latest-to-master [20:38:04] jeremyb_: December 2011 [20:38:06] ooooo [20:38:08] cool [20:38:15] <^demon> And then `git push origin HEAD:refs/for/master` for the one you want reviewed. [20:38:19] <^demon> Or git-review, if that suits you :) [20:39:03] mwalker: hi [20:39:11] jeremyb_: and legal != legalteam, which is a mailman list :) there is always confusion about it no matter how it's done [20:39:33] <^demon> ottomata: So generally speaking, a full git push is `git push [remote] [local ref]:[remote ref]` [20:39:48] mutante: yeah, i have 4642 open :) [20:40:15] RECOVERY - MySQL Slave Delay on db32 is OK: OK replication delay 0 seconds [20:40:15] RECOVERY - MySQL Replication Heartbeat on db32 is OK: OK replication delay 0 seconds [20:40:16] <^demon> notpeter: I came across a couple more wikis with borked indices. I filed https://rt.wikimedia.org/Ticket/Display.html?id=4757 [20:40:56] ^demon: cool! will take a look today :) [20:40:57] thanks! [20:40:59] mutante: huh, 2166. i wonder what i was thinking [20:41:31] jeremyb_: maybe you thought of legalteam [20:41:44] or roaaary or roaaaary :) [20:42:17] New patchset: Ottomata; "Creating debian/ directory using stdeb. python setup.py --command-packages=stdeb.command debianize See: https://pypi.python.org/pypi/stdeb" [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [20:42:27] stdeb is bad [20:42:27] YEEHAW [20:42:28] so is fpm [20:42:31] thanks ^demon [20:42:36] :-D [20:42:38] <^demon> yw. [20:42:39] sigh [20:42:43] -1 [20:42:47] paravoid, whaaaaa, but I only used it to create the debian/ [20:42:51] that's bad? [20:42:58] i'm still using git-buildpackage for repo structure and building [20:42:58] yeah use mkdir && vim :-] [20:43:18] ottomata: you will have to train me on git-buildpackage :-] [20:43:24] mwalker: you about? [20:43:37] they usually produce nonsense [20:44:05] this one doesn't look too bad, although the description seems borked, maintainer is wrong [20:44:11] btw, for wikimedia-task-appserver, clone, debuild -us -uc , and i got a .deb [20:44:16] preinst is redundant [20:44:27] paravoid, is this bad? 
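Collecting ^demon's push walkthrough above into one runnable sequence. A sketch only: the branch names upstream and pristine-tar are assumed from the usual git-buildpackage repo layout, they are not named in the channel.

    # push the branches that need no review exactly as they are
    git push origin upstream pristine-tar
    # push master directly, but only up to and excluding the tip commit
    git push origin HEAD~1:refs/heads/master
    # then submit the remaining tip commit to Gerrit for review
    git push origin HEAD:refs/for/master   # or: git-review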
[20:44:27] https://gist.github.com/ottomata/5190642 [20:44:27] debian/source/format is better to be 3.0 (quilt) [20:44:38] New patchset: RobH; "adding redirection for wikiipedia.org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54526 [20:44:48] build-deps & standards version are old [20:44:54] debhelper is 7, which is minor [20:45:17] other than that, okay [20:45:37] ok, ha, this is the fastest review ever!? [20:45:53] :) [20:46:54] hm, paravoid, when you say build-debs and standars are old [20:46:56] what should they be? [20:47:48] python-support is >= 0.8.4 [20:47:58] that's even pre-lucid [20:48:01] New review: Demon; "This needs upstream https://gerrit-review.googlesource.com/#/c/43280/ merged before we can deploy. J..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54514 [20:48:14] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 181 seconds [20:48:24] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 185 seconds [20:51:19] hmm, ok I can change them [20:51:21] q though, paravoid [20:51:38] those are minimum deps that stdeb picked out, if it thinks thats all it needs to build the package [20:51:42] why would we require a higher version? [20:52:15] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 10.9044645714 (gt 8.0) [20:52:21] hello [20:54:38] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 23 seconds [20:55:17] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [20:55:24] New review: RobH; "jenkins is now amazingly slow to update:" [operations/apache-config] (master); V: 2 C: 2; - https://gerrit.wikimedia.org/r/54526 [20:55:25] Change merged: RobH; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54526 [20:57:27] ottomata: just remove the versions if they're <= lucid [20:57:36] hm ok [20:58:03] mwalker: hi, I'd really love to ask you some questions about what you're doing on db1008 [20:58:13] replication is currently broken to the backup slaves [20:58:18] because of some of your activity [20:58:21] can you please contact me [20:58:26] robh is doing a graceful restart of all apaches [20:58:45] !log robh gracefulled all apaches [20:58:52] Logged the message, Master [20:59:18] RECOVERY - Apache HTTP on mw27 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.163 second response time [20:59:31] New patchset: Rfaulk; "mod. Check for existence of flask-login first." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/54529 [21:00:16] robh is doing a graceful restart of all apaches [21:00:25] !log troubleshooting erros in apache restart script [21:00:32] Logged the message, RobH [21:00:35] !log robh gracefulled all apaches [21:00:42] Logged the message, Master [21:01:50] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54341 [21:02:35] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54343 [21:02:37] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 9.93969067797 (gt 8.0) [21:08:09] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection refused [21:08:18] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection refused [21:08:18] PROBLEM - Apache HTTP on mw1152 is CRITICAL: Connection refused [21:08:18] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection refused [21:08:18] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection refused [21:08:18] PROBLEM - Apache HTTP on mw1151 is CRITICAL: Connection refused [21:08:19] PROBLEM - Apache HTTP on mw1150 is CRITICAL: Connection refused [21:08:19] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection refused [21:08:19] PROBLEM - Apache HTTP on mw115 is CRITICAL: Connection refused [21:08:19] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection refused [21:08:21] PROBLEM - Apache HTTP on mw1153 is CRITICAL: Connection refused [21:08:38] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 0.0 [21:08:50] New patchset: Rfaulk; "add. Flag option to use flask.ext.login for metrics API." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54513 [21:08:51] wth? [21:08:57] graceful on apaches [21:09:03] i pushed an apache change that passed tests [21:09:03] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection refused [21:09:06] needs more grace [21:09:06] and those didnt restart [21:09:11] ah I see it now [21:09:12] so that shouldnnnnt be me. [21:09:23] hm [21:09:26] as those are not restarted by apache-graceful-all [21:09:30] RobH: the jenkins slowness I am aware of it. Going to hack something tomorrow morning. [21:09:42] hashar: cool, for now i go and find the actual test, and link it in my comment [21:09:54] cuz it did pass tests, just the handoff back to gerrit to update comments is slow [21:10:12] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.064 second response time [21:10:29] RobH: yeah it sometime take up to 1hour to report back. But I think I got the patch that fix the issue :-] [21:10:47] Change abandoned: Rfaulk; "fixed in https://gerrit.wikimedia.org/r/#/c/54513/ .. error with the amend." 
[operations/puppet] (production) - https://gerrit.wikimedia.org/r/54529 [21:11:14] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.097 second response time [21:11:35] PROBLEM - MySQL Replication Heartbeat on db53 is CRITICAL: CRIT replication delay 183 seconds [21:12:13] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.096 second response time [21:12:13] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.139 second response time [21:12:14] PROBLEM - MySQL Slave Delay on db53 is CRITICAL: CRIT replication delay 199 seconds [21:12:20] Ok, so the gracefull all gave me an error on pubkey for those servers [21:12:23] notpeter: want to work together to get the fundraising db slaving happy ? [21:12:25] and for some reason those servers also are now dead. [21:12:32] LeslieCarr: turns out I have access [21:12:36] oh cool [21:12:37] and I have found the problem [21:12:41] tanks, though! [21:12:42] what's the problem ? [21:12:43] okay [21:12:55] notpeter: problem with current rendering cluster? [21:13:08] its not running apache on anything, and i think its my fault from the graceful all [21:13:15] (well, my fault in that i ran it, not my fault it broke) [21:13:42] PROBLEM - Apache HTTP on mw116 is CRITICAL: Connection refused [21:13:52] New patchset: Ottomata; "Creating debian/ directory using stdeb. python setup.py --command-packages=stdeb.command debianize See: https://pypi.python.org/pypi/stdeb" [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [21:13:53] PROBLEM - Apache HTTP on mw1149 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:14:04] !log started apache on all imagescalers [21:14:10] Logged the message, Master [21:14:12] PROBLEM - Apache HTTP on mw1166 is CRITICAL: Connection refused [21:14:13] PROBLEM - Apache HTTP on mw1163 is CRITICAL: Connection refused [21:14:13] PROBLEM - Apache HTTP on mw1168 is CRITICAL: Connection refused [21:14:13] PROBLEM - Apache HTTP on mw1162 is CRITICAL: Connection refused [21:14:13] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.100 second response time [21:14:13] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.100 second response time [21:14:13] PROBLEM - Apache HTTP on mw1164 is CRITICAL: Connection refused [21:14:14] PROBLEM - Apache HTTP on mw1165 is CRITICAL: Connection refused [21:14:14] PROBLEM - Apache HTTP on mw1167 is CRITICAL: Connection refused [21:14:15] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 66437 bytes in 0.239 second response time [21:14:24] RECOVERY - Apache HTTP on mw1153 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.071 second response time [21:14:24] I'm on mw1151 and it claims it took your key [21:14:32] lookng in auth.log [21:14:34] RECOVERY - MySQL Slave Running on db78 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [21:14:34] well, it did when i just logged in [21:14:40] but refused it awhile abck, lemme see [21:14:44] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.096 second response time [21:14:46] well I mean about 15 mins ago [21:15:15] PROBLEM - Apache HTTP on mw1169 is CRITICAL: Connection refused [21:15:15] PROBLEM - Apache HTTP on mw1161 is CRITICAL: Connection refused [21:15:16] RECOVERY - Apache 
HTTP on mw1164 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.080 second response time [21:15:16] it has the graceful in the log right after that [21:15:23] PROBLEM - Apache HTTP on mw1152 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:15:33] RECOVERY - MySQL Slave Running on db1025 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [21:15:40] apergos: where ? [21:15:44] RECOVERY - Apache HTTP on mw1149 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 3.141 second response time [21:15:45] im looking at tail end of the file [21:16:03] New patchset: Asher; "upgrading one db per core shard in eqiad to mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54575 [21:16:15] RECOVERY - Apache HTTP on mw1169 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.054 second response time [21:16:16] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.061 second response time [21:16:16] RECOVERY - Apache HTTP on mw1168 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.050 second response time [21:16:16] RECOVERY - Apache HTTP on mw1152 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time [21:16:16] PROBLEM - MySQL Slave Delay on db78 is CRITICAL: CRIT replication delay 5415 seconds [21:16:21] in auth.log on mw1151 [21:16:44] RECOVERY - Apache HTTP on mw116 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.174 second response time [21:16:49] I did tail -200 but anyways it was like about 2 minutes to the hour [21:17:03] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54513 [21:17:14] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.084 second response time [21:17:34] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 6852 seconds [21:17:49] * apergos gets off [21:17:49] it [21:18:14] RECOVERY - Apache HTTP on mw1150 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.074 second response time [21:18:14] RECOVERY - Apache HTTP on mw1161 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time [21:18:23] !log service apache2 start on the mw-eqiad group via dsh [21:18:29] Logged the message, Mistress of the network gear. [21:18:37] mw1151: Mar 18 21:05:46 10.64.16.131 apache2[28796]: [notice] caught SIGTERM, shutting down [21:18:55] LeslieCarr notpeter; don't know which one of you is still poking at fundraising's DBs; but I'm guessing I just broke replication again with a crappy update query -- [21:19:15] RECOVERY - Apache HTTP on mw1162 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.057 second response time [21:19:15] RECOVERY - Apache HTTP on mw1166 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.070 second response time [21:19:15] RECOVERY - Apache HTTP on mw1165 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.068 second response time [21:19:15] RECOVERY - Apache HTTP on mw1167 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.088 second response time [21:19:25] mwalker: we shall see soon... [21:19:47] they're still catching up to that point in the binlog [21:20:00] looks like they're back now :) [21:20:13] RECOVERY - Apache HTTP on mw115 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 747 bytes in 0.199 second response time [21:21:01] ... 
and today Matt learns that queries that work on his local will break replication when they take 2 hours on the cluster :p [21:21:21] !log robh synchronized docroot [21:21:25] did anyone manually stop puppet on lvs's ? [21:21:28] Logged the message, Master [21:21:35] New patchset: Asher; "pulling db per shard for upgrads" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54576 [21:22:06] !log gracefuling all apaches resulted in rendering cluster overload, they didnt restart apache, odd [21:22:14] RECOVERY - Packetloss_Average on oxygen is OK: OK: packet_loss_average is 3.8211 [21:22:24] Logged the message, RobH [21:22:36] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 11.2982573729 (gt 8.0) [21:22:53] * Damianz thinks RobH needs moar gracefullness [21:23:10] mwalker: you're good. slaving totally caught up on db78 [21:23:18] New patchset: Jeremyb; "make ircecho config sane (not just very long strings)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8344 [21:23:18] New patchset: Jeremyb; "change all $ircecho_server to use the chat record" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [21:23:18] RECOVERY - MySQL Slave Delay on db78 is OK: OK replication delay 0 seconds [21:23:28] Mar 18 21:12:49 mw1169 apache2[7723]: [notice] caught SIGTERM, shutting down [21:23:30] paravoid, how's it look now? [21:23:30] https://gerrit.wikimedia.org/r/#/c/54525/ [21:24:15] RECOVERY - Puppet freshness on lvs1003 is OK: puppet ran at Mon Mar 18 21:24:08 UTC 2013 [21:24:35] New review: Jeremyb; "unrotted" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8344 [21:24:41] ottomata: nope :) [21:24:42] !log authdns-update rt4683 [21:24:44] New review: Jeremyb; "unrotted" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [21:24:48] Logged the message, RobH [21:24:50] !log ran "ddsh -g apaches -cM '/etc/init.d/apache2 start'" [21:24:55] dzahn is doing a graceful restart of all apaches [21:24:56] Logged the message, Master [21:25:06] ^^ started around 30% of apaches [21:25:29] LeslieCarr: look up :-) [21:25:45] ok, really have to go, bbl [21:25:52] paravoid: nope as in it looks no good or nope as in not right now? 
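On Matt's observation above about the two-hour query: with statement-based replication, a single long UPDATE stalls every slave for its full runtime. The standard mitigation, not something spelled out in the channel, is to chunk the write by primary key; the table and column names below are made up, and it assumes a non-empty table.

    # walk the primary key in fixed-size chunks so each statement is short
    # and deterministic (an unordered UPDATE ... LIMIT is unsafe to replicate)
    max=$(mysql -N -e "SELECT MAX(id) FROM donations")
    for ((i=0; i<=max; i+=5000)); do
      mysql -e "UPDATE donations SET flagged=1 WHERE id BETWEEN $i AND $((i+4999))"
      sleep 1   # give the slaves room to catch up between chunks
    done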
[21:25:54] thanks jeremyb_ [21:26:47] New review: Faidon; "See http://wiki.debian.org/Python/TransitionToDHPython2" [operations/debs/python-flask-login] (master) C: -1; - https://gerrit.wikimedia.org/r/54525 [21:27:36] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 191 seconds [21:28:01] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54576 [21:28:06] notpeter: ^ interesting that db78 is OK; but db1025 isn't [21:28:10] I told you, stdeb is bad [21:28:15] very old practices [21:28:31] we'll get there, it'd be just quicker to make it from scratch :-) [21:28:47] mwalker: I started slaving on db1025 later :) [21:28:47] it'll catch up [21:28:59] dzahn is doing a graceful restart of all apaches [21:29:09] RobH: testing [21:29:36] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 1 seconds [21:29:39] !log dzahn gracefulled all apaches [21:29:47] RobH: no key errors, just the expected [21:29:47] Logged the message, Master [21:29:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54575 [21:30:34] !log authdns-update: removing *.ts.wikimedia.org records [21:30:35] oh [21:30:40] Logged the message, Master [21:30:41] i thought you said "other than that, okay" [21:30:51] meaning stdeb left in some bad stuff, but we should change it and it would be ok [21:31:06] paravoid^ [21:31:21] New patchset: Asher; "enabling extended_keys since https://mariadb.atlassian.net/browse/MDEV-4220 has been fixed" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54578 [21:31:44] yeah, I had a closer look :) [21:31:48] we'll get there! [21:31:55] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54578 [21:32:06] ah i see your review, thanks [21:36:13] Make the build-dep >= 8 (no minor version needed) and debian/compat to 8. <= that's for debhelper [21:36:35] !log asher synchronized wmf-config/db-eqiad.php 'pulling a db from s3-7 for upgrade' [21:36:36] New patchset: Ottomata; "Creating debian/ directory using stdeb. python setup.py --command-packages=stdeb.command debianize See: https://pypi.python.org/pypi/stdeb" [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [21:36:43] Logged the message, Master [21:37:01] ja, wasn't sure but assumed, how's that? ^ [21:37:15] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8344 [21:37:20] you ignored my first line :) [21:37:27] http://wiki.debian.org/Python/TransitionToDHPython2 [21:38:19] BAH [21:38:19] sorry [21:38:25] while you're at it [21:38:26] i opened the link but then did the others [21:38:27] reading [21:38:33] fix the commit author, it has your personal gmail [21:38:36] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.8922340678 (gt 8.0) [21:38:38] (git commit --amend --reset-author) [21:38:51] remove the stdeb from commit message and README to not mislead others into thinking this is a good idea [21:39:02] and changelog, just make it "Initial release." [21:39:12] censorship! [21:39:22] it has my personal email? [21:39:46] ah, author is right, commiter is personal [21:39:50] acotto :) [21:40:03] where? [21:40:05] i'm looking for that [21:40:10] https://gerrit.wikimedia.org/r/#/c/54525/3//COMMIT_MSG [21:41:08] PROBLEM - mysqld processes on db1010 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:41:19] hm. 
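Taken together, paravoid's review comments above amount to a few small edits. The Build-Depends and X-Python-Version lines follow the linked TransitionToDHPython2 page rather than anything quoted here, so treat them as an assumption.

    # compat level 8 to match the new debhelper build-dependency
    echo 8 > debian/compat
    # in debian/control, roughly:
    #   Build-Depends: debhelper (>= 8), python-all (>= 2.6.6-3~)
    #   X-Python-Version: >= 2.6
    # fix the author on the pending commit, then resubmit it for review
    git commit --amend --reset-author
    git push origin HEAD:refs/for/master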
[21:41:19] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/22698 [21:41:52] New patchset: Jeremyb; "followup Ibb454f8883bfa8" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54581 [21:42:03] LeslieCarr: ^ [21:42:15] okay thanks for uncrufting :) [21:42:21] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.51332992857 (gt 8.0) [21:42:45] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54581 [21:43:32] ottomata: btw, when you get a chance I've added you as a reviewer on https://gerrit.wikimedia.org/r/#/c/53714/ [21:43:38] RECOVERY - MySQL Replication Heartbeat on db53 is OK: OK replication delay 0 seconds [21:43:51] oh awesome, thanks paravoid, would love to review that [21:44:18] RECOVERY - MySQL Slave Delay on db53 is OK: OK replication delay 0 seconds [21:44:31] it needs a rewrite basically :) [21:44:48] aye [21:45:33] hey, I'd really like to push this along: https://gerrit.wikimedia.org/r/#/c/51668 [21:45:42] anyone have opinions? [21:45:53] notpeter: why is the module named "coredb_mysql" and not coredb? [21:45:57] New patchset: RobH; "added wikimaps.net to act like the wikimaps.com/org" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54582 [21:45:58] (just curious) [21:46:18] PROBLEM - mysqld processes on db1011 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:46:24] notpeter: yes, both me and mark have reject this in the past :) [21:46:44] paravoid: okie dokie [21:46:47] paravoid: namespace [21:46:56] if it was just coredb [21:46:57] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Mon Mar 18 21:46:49 UTC 2013 [21:47:06] it gets confused with role::coredb [21:47:07] PROBLEM - mysqld processes on db1026 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:47:08] RECOVERY - mysqld processes on db1010 is OK: PROCS OK: 1 process with command name mysqld [21:47:08] PROBLEM - mysqld processes on db1027 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [21:47:11] Why does http://www.wikimediafoundation.org/ redirect to https://www.wikipedia.org/ ? [21:47:20] notpeter: it shouldn't [21:47:22] did you use ::coredb? [21:47:28] Krinkle: thats bad. [21:47:30] checking. [21:47:34] paravoid: I believe so [21:47:51] paravoid: Ryan_Lane has also noted have similar namespace issues combining classes and modules [21:47:59] https://gerrit.wikimedia.org/r/#/c/16661/ [21:48:07] but yes, I agree that it shouldn't happen :) [21:48:08] http://www.wikimediafoundation.com seems fine. [21:48:31] paravoid: ok [21:48:59] mutante: https://gerrit.wikimedia.org/r/#/c/54526/ [21:48:59] you said you'll cleanup base, didn't you? :D [21:49:19] RECOVERY - mysqld processes on db1011 is OK: PROCS OK: 1 process with command name mysqld [21:49:37] PROBLEM - MySQL Slave Running on db1010 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Table ./ruwiktionary/flaggedpage_pending is marked as crashe [21:49:54] paravoid: yep :) [21:49:58] Krinkle: Can you file a bug? 
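Krinkle's redirect report above is easy to reproduce from a shell; the expected target is an assumption, since the intended canonical host for www.wikimediafoundation.org isn't stated here.

    # -I sends a HEAD request; we only care about the redirect target
    curl -sI http://www.wikimediafoundation.org/ | grep -i '^Location:'
    # buggy behaviour: a wikipedia.org URL; expected: a wikimediafoundation.org one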
[21:50:04] RobH: that second [OR] [21:50:05] I'll add it to my list :) [21:50:34] New patchset: Faidon; "appserver: don't install a particular PHP version" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54585 [21:50:37] notpeter: ^^^ [21:50:45] New patchset: RobH; "fixing or statement" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54586 [21:51:04] paravoid: sounds reasonable to me [21:51:06] Change merged: RobH; [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54586 [21:51:36] yes, classes will conflict with modules [21:51:37] Susan: https://bugzilla.wikimedia.org/show_bug.cgi?id=46297 [21:51:42] for silly reasons [21:52:05] New review: Krinkle; "Should fix bug 46297." [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/54586 [21:52:06] it's definitely a bug, but who else is really doing what we are doing? [21:52:09] RECOVERY - mysqld processes on db1026 is OK: PROCS OK: 1 process with command name mysqld [21:52:09] RECOVERY - mysqld processes on db1027 is OK: PROCS OK: 1 process with command name mysqld [21:52:09] PROBLEM - Host db1011 is DOWN: PING CRITICAL - Packet loss = 100% [21:52:25] Ryan_Lane: you mean foo::bar::baz will conflict with ::baz? [21:52:34] let me find an example [21:52:38] Krinkle: good catch, i had a bad [OR] in redirects [21:52:44] and our apache linting doesnt throw an error for that [21:52:57] RobH: Yeah [21:52:58] i fixed my procedure for pushing to fix so it doesnt happen again, but argh [21:53:06] syncing out the fix now [21:53:15] RobH: Looks like a common error though, happened at least two more times in the last few weeks [21:53:16] * RobH added all top level domains to his apache-fast-test input file [21:53:19] paravoid: role::salt::master will conflict with salt::master, if salt::master is in a module [21:53:26] and role is not [21:53:28] yea, my addition to my test file fixes it for me [21:53:28] causing weird cases to fall through and redirect a whole bunch of domains [21:53:28] RECOVERY - Host db1011 is UP: PING OK - Packet loss = 0%, RTA = 0.32 ms [21:53:29] ok, thanks for the reviews paravoid, i'll try to get the dh_python stuff working tomorrow [21:53:41] ottomata: I can do it too btw [21:53:47] PROBLEM - Host db1026 is DOWN: PING CRITICAL - Packet loss = 100% [21:53:59] BAH [21:53:59] ottomata: I just prefer to do reviews so that you know how to do the next :) [21:54:06] ran sync-common by accident, means i get to wait on it [21:54:07] oh!, if you would that would be delightful, then i'd also have somethign to go on in the future [21:54:10] yeah totally [21:54:24] rfaulkner is hoping to have this up before they present their thing at some conference [21:54:27] paravoid: which is why I've changed role::salt::master to role::salt::masters [21:54:28] ottomata: the dh_python2 changes are like 3 lines btw [21:54:34] yeah i thought [21:54:35] ok ok ok [21:54:37] i'll see if I can do it now [21:54:40] i just wasn't sure [21:54:40] it really fucks up the whole namespace [21:54:41] ok [21:55:07] role::salt::master::production <— also doesn't work [21:55:14] robh is doing a graceful restart of all apaches [21:55:14] basically: remove crap from control, switch to dh_python2? 
[21:55:18] PROBLEM - Host db1027 is DOWN: PING CRITICAL - Packet loss = 100% [21:55:30] not sure about this though: [21:55:30] export DH_OPTIONS=--buildsystem=python_distutils [21:55:31] ditch it [21:55:56] !log robh gracefulled all apaches [21:55:57] jenkins is borked [21:56:05] Logged the message, Master [21:56:22] do I chagne dh to dh_python2? [21:56:26] dzahn is doing a graceful restart of all apaches [21:56:28] RECOVERY - Host db1026 is UP: PING OK - Packet loss = 0%, RTA = 0.28 ms [21:56:41] no [21:56:42] New review: Faidon; "23:51 < notpeter> paravoid: sounds reasonable to me" [operations/puppet] (production); V: 2 C: 2; - https://gerrit.wikimedia.org/r/54585 [21:56:44] dh --with python2 [21:56:50] RECOVERY - Host db1027 is UP: PING OK - Packet loss = 0%, RTA = 0.34 ms [21:56:52] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54585 [21:57:12] !log dzahn gracefulled all apaches [21:57:19] Logged the message, Master [21:57:34] paravoid: I am learning that present is always the right answer to anything that we build ourselves :) [21:57:41] PROBLEM - MySQL Recent Restart on db1028 is CRITICAL: Connection refused by host [21:57:41] PROBLEM - MySQL Slave Delay on db1028 is CRITICAL: Connection refused by host [21:57:51] PROBLEM - NTP peers on linne is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:57:58] !log purging wikimediafoundation.org from squid [21:57:58] New review: Alex Monk; "(1 comment)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15698 [21:58:00] we had that discussion earlier today when puppet decided to break ganglia, blog, apple-dictionary-bridge and whatever else [21:58:04] Logged the message, Master [21:58:26] New patchset: Rfaulk; "mod. e3_analysis_path for use in metrics api package." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54590 [21:58:38] RECOVERY - NTP peers on linne is OK: NTP OK: Offset -0.001398 secs [21:58:45] paravoid: gotcha [21:59:09] we didn't exactly agreed, although this case is a bit different :) [21:59:16] *agree [21:59:18] PROBLEM - Host db1028 is DOWN: PING CRITICAL - Packet loss = 100% [21:59:40] eh [21:59:44] it's generally safer [22:00:34] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54590 [22:01:22] New patchset: Ottomata; "Creating debian/ directory using stdeb. python setup.py --command-packages=stdeb.command debianize See: https://pypi.python.org/pypi/stdeb" [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [22:01:29] RECOVERY - Host db1028 is UP: PING OK - Packet loss = 0%, RTA = 0.27 ms [22:01:44] RECOVERY - MySQL Recent Restart on db1028 is OK: OK seconds since restart [22:01:44] RECOVERY - MySQL Slave Delay on db1028 is OK: OK replication delay seconds [22:03:27] paravoid : https://gerrit.wikimedia.org/r/#/c/54525/ [22:04:27] oops [22:04:31] sorry commit message and readme [22:04:32] one sec [22:04:46] drop readme completely [22:04:54] and changelog :) [22:05:01] ja [22:05:04] I mean, change the line in changelog, not dropping it [22:05:11] right [22:05:15] and drop the commented lines in rules [22:05:19] right [22:05:21] New review: Alex Monk; "(1 comment)" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/15698 [22:05:23] and the trailing whitespace [22:05:32] soryr, poked you before I did my own review :p [22:05:44] i built the package with dh_python2 and was too excited! [22:06:05] why drop README.debian completely? 
especially for noobs like me that stuff is helpful [22:06:25] it's very repetitive to have that in every package [22:06:50] hmmm, ok [22:06:57] also, it refers to stdeb [22:07:02] right, meant to change that [22:07:07] have a look at PS1 and PS5 and tell me how much of stdeb was left :) [22:07:13] heh [22:07:52] really, it's 5 files, 47 lines in total [22:08:00] why do people say it's hard, I'll never get ;) [22:08:59] i thikn because there are SO many ways to do it [22:10:35] robh is doing a graceful restart of all apaches [22:10:54] !log robh gracefulled all apaches [22:11:03] Logged the message, Master [22:11:32] New patchset: Ottomata; "Creating debian/ directory using stdeb. python setup.py --command-packages=stdeb.command debianize See: https://pypi.python.org/pypi/stdeb" [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [22:12:33] paravoid, ooook what else? [22:13:57] robh is doing a graceful restart of all apaches [22:14:02] ok, lets try this on wired. [22:14:22] !log robh gracefulled all apaches [22:14:38] Logged the message, Master [22:15:21] New review: Faidon; "(3 comments)" [operations/puppet] (production) C: -1; - https://gerrit.wikimedia.org/r/54324 [22:16:11] ottomata: commit message ;-) [22:16:16] you can do this from within gerrit too [22:16:36] bah doh [22:16:44] down to 35 lines [22:17:03] New patchset: Ottomata; "Initial debianization." [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [22:17:39] New review: Faidon; "That's great!" [operations/debs/python-flask-login] (master) C: 2; - https://gerrit.wikimedia.org/r/54525 [22:17:43] there! [22:17:53] New patchset: Lcarr; "removing second instance of mysql client" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54594 [22:19:20] Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/53490/ [22:19:45] yeehaw! thanks paravoid! [22:19:50] thank you [22:19:53] oh! [22:19:54] for putting up with my reviews :) [22:20:04] hm, wait i just thought about one more thing, because i just thought about adding this to apt [22:20:12] does this need a wikimedia distributiont tag? [22:20:28] nah that's fine, you can do --ignore=wrongdistribution [22:20:34] !log intradatacenter link is flapping , switching links, this may cause some higher latency [22:20:40] Logged the message, Mistress of the network gear. [22:20:44] know the worst part of the flapping ? that i have to call fpl [22:21:27] reprepro —ignore=-wrongdistribution -C main include …changes [22:21:29] ? [22:21:41] yes [22:21:55] that wasn't too hard, was it? [22:22:05] the debianization I mean [22:22:19] PROBLEM - Packetloss_Average on emery is CRITICAL: CRITICAL: packet_loss_average is 8.40279857143 (gt 8.0) [22:22:33] ok, rebooted, once more with feeling! [22:22:38] robh is doing a graceful restart of all apaches [22:22:46] naw, not too bad, thanks to dh_python, [22:22:57] most stuff are automated like this [22:22:57] i think the kafka one won't be so bad if we are both reviewing and fixing at the same time [22:22:57] !log robh gracefulled all apaches [22:23:03] Logged the message, Master [22:23:03] like gem2deb [22:23:10] or dh-make-perl [22:23:13] or dh-make-pear/pecl [22:23:28] scala / java? [22:23:31] :) [22:23:41] there are some maven helpers [22:23:45] hm, eah [22:23:53] one of them is being developed right about now [22:23:59] look at the debian-java mailing list for more [22:27:12] paravoid, what is the distribution name on this? [22:27:13] all? [22:27:26] hm? 
[22:27:28] precise-wikimedia [22:27:37] oh ok [22:27:38] i see [22:27:50] we also have lucid-wikimedia [22:27:55] but I surely hope this isn't lucid :) [22:27:59] no its precise [22:28:03] great [22:28:09] thought that since we were using ignore it should match in the package or something [22:28:11] but i get it [22:28:35] !log added python-flask-login-0.1.2-1 to apt [22:28:37] does it pass lintian? [22:28:39] the package? [22:28:42] Logged the message, Master [22:28:43] you know about lintian, right? [22:28:46] PROBLEM - Packetloss_Average on locke is CRITICAL: CRITICAL: packet_loss_average is 8.71119228571 (gt 8.0) [22:28:48] barely [22:29:08] hmm, i get some warnings and an error about no copyright file [22:29:15] what are the warnings? [22:29:50] I think it's okay for our packages to not have debian/copyright [22:30:08] ACKNOWLEDGEMENT - NTP on analytics1007 is CRITICAL: NTP CRITICAL: No response from NTP server daniel_zahn RT-3946, issue with RAID, has never been up [22:30:08] ACKNOWLEDGEMENT - Puppet freshness on analytics1007 is CRITICAL: Puppet has not run in the last 10 hours daniel_zahn RT-3946, issue with RAID, has never been up [22:30:09] ACKNOWLEDGEMENT - SSH on analytics1007 is CRITICAL: Connection timed out daniel_zahn RT-3946, issue with RAID, has never been up [22:30:09] W: flask-login source: changelog-should-mention-nmu [22:30:10] W: flask-login source: source-nmu-has-incorrect-version-number 0.1.2-1 [22:30:10] W: flask-login source: no-debian-copyright [22:30:10] W: python-flask-login: new-package-should-close-itp-bug [22:30:10] E: python-flask-login: no-copyright-file [22:30:18] ok [22:30:28] the first two are because last line in changelog != maintainer line [22:30:36] ah (WMF) [22:30:36] ? [22:30:39] Mar 18 22:30:09 mw1179 sshd[1196]: Postponed publickey for root from 208.80.152.165 port 44778 ssh2 [preauth] [22:30:46] wtf is it postponing for. [22:30:49] probably [22:30:50] I use that to match my keyname [22:32:30] yeah, fix it in git, don't rebuild [22:32:50] just so that it's less noisy on the next upload [22:33:03] anyway [22:33:03] good job [22:33:05] ok [22:33:06] cool [22:33:16] New patchset: Ottomata; "Initial debianization." [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [22:33:21] tahnks! thanks for being around, its super late there I bet [22:33:35] not super late [22:33:39] just 30' after midnight [22:34:01] Change merged: Ottomata; [operations/debs/python-flask-login] (master) - https://gerrit.wikimedia.org/r/54525 [22:35:54] paravoid: why not '/var/eventlogging'? [22:36:23] ori-l: http://www.pathname.com/fhs/pub/fhs-2.3.html [22:36:49] Applications must generally not add directories to the top level of /var. Such directories should only be added if they have some system-wide implication, and in consultation with the FHS mailing list. [22:36:59] New patchset: Ottomata; "Installing python-flask-login package for metrics.wikimedia.org" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54599 [22:37:06] not that e.g. /a obeys the FHS [22:37:16] but for new things, better use /srv or /var/lib [22:38:04] right, for consistency with PDP-11 disk structure [22:38:29] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54599 [22:41:53] yayyyyy paravoid rfaulkner sends you many thanks! [22:41:53] notice: /Stage[main]/Misc::Statistics::Sites::Metrics/Package[python-flask-login]/ensure: ensure changed 'purged' to 'present' [22:41:58] and me too! 
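The tail end of the packaging thread, assembled into one sequence. The rules file is the dh plus dh_python2 style paravoid describes; the build command and the .changes filename are assumptions based on the version under review, and the recipe line under % must be a literal tab.

    # debian/rules in the minimal dh style, with the python2 addon
    cat > debian/rules <<'EOF'
    #!/usr/bin/make -f
    %:
    	dh $@ --with python2
    EOF
    chmod +x debian/rules
    # build it, lint the result, then import into the apt repo
    git-buildpackage -us -uc
    lintian ../python-flask-login_0.1.2-1_amd64.changes
    reprepro --ignore=wrongdistribution -C main include precise-wikimedia \
        ../python-flask-login_0.1.2-1_amd64.changes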
[22:42:16] RECOVERY - Packetloss_Average on emery is OK: OK: packet_loss_average is 2.77332957143 [22:42:17] paravoid: personal thanks :D [22:42:39] ottomata did all the work, you shouldn't be thanking me [22:42:42] and big thanks to ottomantatoo! [22:42:44] RECOVERY - Packetloss_Average on locke is OK: OK: packet_loss_average is 3.4027893617 [22:42:46] ottomata [22:44:41] RECOVERY - MySQL Slave Running on db1010 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [22:45:05] !log converted rogue s3 tables (moodbar_feedback, trackbacks, transcache) to innodb [22:45:14] Logged the message, Master [22:46:22] PROBLEM - MySQL Slave Delay on db1010 is CRITICAL: CRIT replication delay 3013 seconds [22:50:07] New patchset: Asher; "Revert "pulling db per shard for upgrads"" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54601 [22:50:19] New patchset: Rfaulk; "add. handle VersionConflict and require flask-login ver 0.1.2." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54602 [22:51:04] Change merged: Ottomata; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54602 [22:51:13] New patchset: Dzahn; "puppet-lint: fix all "ERROR: tab character found", :retab with 2-space soft tabs per puppet style guide" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54603 [22:51:13] New patchset: Dzahn; "puppet-lint: fix all "double quoted string containing no variables"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54604 [22:51:13] New patchset: Dzahn; "puppet-lint: fix most "=> on line isn't properly aligned for resource", and all "unquoted file mode" and "ensure found on line but it's not the first attribute"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54605 [22:53:22] RECOVERY - MySQL Slave Delay on db1010 is OK: OK replication delay 0 seconds [22:54:12] ottomata: I have a task for you [22:54:22] ottomata: https://gerrit.wikimedia.org/r/#/c/44408/ [22:54:25] yessuh (i'm about to sign out for the day) [22:54:31] haha ok [22:54:32] cool [22:54:34] ;-) [22:54:58] there's also python-jsonschema that ori-l wanted [22:55:01] PROBLEM - Puppet freshness on ms1004 is CRITICAL: Puppet has not run in the last 10 hours [22:55:08] * needed :) [22:55:19] but: https://bugzilla.wikimedia.org/show_bug.cgi?id=46233 [22:55:24] someone merged that commit but I think it was not ready to be pushed yet or something [22:55:30] vague recollection [22:55:35] thanks woosters! better get outta her quick! [22:55:51] yes, i know more about python debs now [22:55:58] python jsonschema should be more doable [22:56:25] ottomata ;-P [22:56:36] * paravoid is evil [22:56:41] gerrit in 3, 2, 1.. [22:56:45] New patchset: Ori.livneh; "Add 'eventlogging' puppet module" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324 [22:57:02] ottomata: :-) [22:57:06] heheh [22:57:59] ori-l: what about the rest of my comments? [22:58:04] in return for that, could you push on the puppet-merge stuff? :D [22:58:24] wasn't mark reviewing that with you? 
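For the record, asher's table conversion above is a plain ALTER per table; the database name is a placeholder, since the log doesn't say which s3 wiki carried the rogue tables.

    for t in moodbar_feedback trackbacks transcache; do
      mysql some_s3_wiki -e "ALTER TABLE $t ENGINE=InnoDB"   # db name assumed
    done

Worth noting: ENGINE=InnoDB rewrites the whole table, so on large tables the conversion is itself a long, replication-visible statement.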
[22:58:41] New patchset: Dzahn; "puppet-lint: fix "unquoted file mode" and "unquoted resource title" warnings" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54608 [22:58:42] New patchset: Dzahn; "puppet lint: fix "double quoted string containing no variables" and "quoted boolean value"s" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54609 [22:58:42] New patchset: Dzahn; "puppet-lint: fix "ensure found on line but it's not the first attribute"" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54610 [22:58:42] New patchset: Dzahn; "puppet-lint: fix "tab character found on line" by using :retab in vim with 2-space soft tabs (tabstop=2,shiftwidth=2,expandtab)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54611 [22:58:42] New patchset: Dzahn; "puppet-lint: fix "two-space soft tabs not used on line" errors" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54612 [22:58:44] paravoid: i replied. for retaining the role-class, i don't want to do that, simply because i'm deferring a more granular role structure to a later commit [22:58:57] regarding git-deploy: gerrit compromise == puppet/private compromise. I don't want to have to migrate to an entirely new deployment system to get this out. [22:59:24] replied where? [22:59:41] paravoid: he was, afaik know he needed to talk with you about it [22:59:49] paravoid: gah, stupid drafts. my bad. i did not submit the comments [22:59:54] :) [23:00:12] ottomata: okay, let's sync up tomorrow or so [23:00:17] cool, danke [23:00:23] submitted [23:00:34] i'll poke you in the morning here when you are usually both online, thanks! [23:01:01] ori-l: still can't see it [23:01:30] New review: Ori.livneh; "(3 comments)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54324 [23:02:32] Change merged: Asher; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54601 [23:03:06] ori-l: so, we can't have puppet run code from gerrit as root [23:04:03] puppet pulling from git is so and so, running setup.py is just not acceptable from a security PoV [23:04:27] I think git-deploy is in a workable state right now and being used by other teams [23:04:35] but it doesn't have to be git-deploy [23:04:40] is this for limn? [23:04:45] no [23:04:46] eventlogging [23:04:52] ah [23:05:05] https://gerrit.wikimedia.org/r/#/c/54324/2/modules/eventlogging/manifests/init.pp [23:05:13] if you'd like to add another repo to be deployed, just let me know. [23:06:17] I should replace the frontend with the python one we wrote at some point :) [23:06:48] New patchset: Krinkle; "doc.mediawiki.org: Redirect to canonical wikimedia.org and fix invalid SSL" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54614 [23:07:51] Krinkle: why meeee [23:08:12] Because you merged the other series of contint patches last week [23:08:18] * paravoid adds mutante [23:08:24] I figured you'd be somewhat familiar with the context [23:08:24] paravoid: "you touched it" [23:08:32] but no worries, any merge is a good merge :) [23:08:33] ;) [23:08:41] I am, I'm just a reviewer in too many things these days [23:09:32] paravoid: But ahm, much more than reviewing something I did for opertoins, I actually need a patch written _from_ operations. [23:09:38] A debian package [23:10:09] https://bugzilla.wikimedia.org/show_bug.cgi?id=46236 [23:10:22] ah, wait, paravoid == Faidon, you already know. 
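dzahn's lint series above can be reproduced locally; the vim settings are the ones his own commit messages cite.

    # surface the same classes of warnings across the manifests
    find . -name '*.pp' | xargs -n1 puppet-lint
    # in vim, retab an offending file to 2-space soft tabs:
    #   :set tabstop=2 shiftwidth=2 expandtab
    #   :retab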
[23:11:30] !log asher synchronized wmf-config/db-eqiad.php 'returning db10[10-11],[26-28] at low weights' [23:11:37] Logged the message, Master [23:11:43] jsduck has been used a lot by us, and I'm familiar with the code base (submitted various patchses upstream myself). And it also isn't public facing (running locally on gallium from jenkins jobs) [23:13:05] I just need it to be packaged properly in a way that works for it. [23:23:02] Krinkle: want to submit another patch to upstream for me? [23:23:15] drop the require 'rubygems' from ./lib/jsduck/doc_formatter.rb [23:24:18] paravoid: Hm.. Are you sure that won't affect anything? [23:24:59] google that [23:25:09] http://www.rubyinside.com/why-using-require-rubygems-is-wrong-1478.html [23:25:12] etc. [23:25:29] also the gemspec could use a better description [23:25:34] I know a fair bit ruby, but not really about gems (other than installing gems) [23:25:57] I'll file it and submit a pull request [23:26:00] Thx [23:28:03] New patchset: Asher; "db69,71 to mariadb" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54615 [23:28:34] sigh, embedded extjs [23:28:48] that would never get into debian [23:29:06] paravoid: "since RubyGems is included with Ruby 1.9 and loaded by default." [23:29:18] -rwxr-xr-x root/root 901 2013-03-19 01:28 ./usr/bin/jsduck [23:29:18] -rwxr-xr-x root/root 10079 2013-03-19 01:28 ./usr/bin/compare [23:29:18] -rwxr-xr-x root/root 1133 2013-03-19 01:28 ./usr/bin/graph [23:29:18] Would wmf unload it? [23:29:20] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54615 [23:29:21] srsly? [23:29:28] /usr/bin/compare? [23:29:37] * paravoid rants about ruby programmers [23:29:58] Krinkle: it's wrong to require it [23:30:03] sure [23:30:07] paravoid: Uhm, I don't get those bins in my path when I install it [23:30:07] Krinkle: I've patched it locally for the package [23:30:10] only jsduck is installed [23:30:16] the others are for local usage in the repo [23:31:01] paravoid: ok [23:31:14] paravoid: Hm.. so yeah, I find it odd indeed why it would be in the code in the first place [23:31:44] Or does rubygem have some way to download dependencies at run-time or something that it would need those classes to be available :super-evil: [23:32:17] just remove that line [23:32:24] I know, already done locally [23:32:37] I'm trying to make sense of why someone would do this in the first place (originally) [23:32:53] Or is it just sloppy ignorant crap dump? [23:33:04] I have no idea [23:34:30] !log i <3 running "dpkg -P mysql-server-5.1 mysqlfb-client-5.1 mysqlfb-common mysqlfb-server-5.1 mysqlfb-server-core-5.1 libmysqlfbclient16 libmysqlclient16" [23:34:36] Logged the message, Master [23:34:42] htat's just a crappy gem [23:34:48] breaks in other ways [23:34:57] sigh [23:35:47] Krinkle: the bins also require rubygems [23:35:58] PROBLEM - mysqld processes on db69 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:36:01] binasher: awesome :) [23:36:54] paravoid: Indeed, they shouldn't. "gem" will put that in the wrapper it puts in /usr/bin when (if!) the user does gem-install. [23:36:55] PROBLEM - mysqld processes on db71 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [23:37:18] New patchset: Asher; "woops, incorrect use of quotes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54616 [23:37:34] # Runs Esprima.js through V8. 
[23:37:37] oh god [23:38:16] that needs therubyracer [23:38:27] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54616 [23:38:36] that's embedding v8 into ruby [23:38:37] wth [23:38:41] :D [23:38:41] I'm not packaging that [23:38:53] paravoid: https://github.com/senchalabs/jsduck/issues/338 [23:38:57] paravoid: hey, that'll get you one gem closer to gitlab [23:39:02] !log paravoid is now the ruby subject matter expert in addition to search [23:39:31] hehehehe [23:39:41] * paravoid throws a Ryan_Lane to binasher [23:39:43] paravoid: therubyracer was already listed in the gemspec fyi. [23:39:50] yes it is [23:39:53] but it's not packaged either [23:39:59] and I don't *want* to package that [23:40:09] what do you mean, but it "isn't". The tool you use doesn't catch it? [23:40:35] paravoid: How so? [23:40:35] no, the "tool I use" doesn't recursively build debs for all of the possible gems out there :) [23:41:16] well, I'd think it would include only recurse the tree relevant for the package at hand. [23:41:48] It seems like a broken tool if it purposely ignores dependencies, they are dependencies because it depends on it... [23:42:59] New patchset: Asher; "fix hostname regex" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54617 [23:44:02] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/54617 [23:44:22] for the love of god [23:44:23] paravoid: v8 is one of the best javascript engines, and esprisma one of the best js parsers. [23:44:32] the gem BUNDLES the whole v8 library [23:44:59] you mean it should allow for installing v8 separately [23:45:07] wth is wrong with those people [23:45:14] e.g. from another debian package in our case [23:47:46] sorry I don't think this should go into production [23:48:00] i just had an amazing idea. paravoid: what if you could get jruby to run on v8 via a java->script compiler, then try to get esprisma running on that, so you can v8 on your v8? [23:48:55] RECOVERY - mysqld processes on db71 is OK: PROCS OK: 1 process with command name mysqld [23:49:17] binasher: you think you're being funny, but https://github.com/cowboyd/therubyrhino [23:49:26] Embed the Mozilla Rhino JavaScript interpreter into Ruby [23:49:36] REQUIREMENTS: [23:49:37] JRuby >= 1.6 [23:49:40] INSTALL: [23:49:40] jruby -S gem install therubyrhino [23:49:50] JRUBY -S GEM INSTALL FTW [23:50:23] paravoid: can you get parsoid running in that? [23:50:47] PROBLEM - MySQL Slave Delay on db71 is CRITICAL: CRIT replication delay 998 seconds [23:50:47] RECOVERY - mysqld processes on db69 is OK: PROCS OK: 1 process with command name mysqld [23:51:09] is ruby trying to become the new emacs? [23:51:24] there's opal, a ruby to javascript compiler [23:51:27] I'm sure someone would create the opposite [23:53:02] binasher: gitlab was using execjs, which is an abstraction layer for therubyracer, therubyrhino and ... [23:53:05] drumroll [23:53:07] Node.js! 
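The upstream change paravoid asked Krinkle for is a one-liner, and it's cheap to see the dependency fan-out that worries him before promising a package; jsduck is the real gem name, the rest is a sketch.

    # rubygems is loaded by default from ruby 1.9 onwards, so drop the require
    sed -i "/require 'rubygems'/d" lib/jsduck/doc_formatter.rb
    # list the runtime dependencies the gem drags in (therubyracer bundles v8)
    gem dependency jsduck --remote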
[23:53:45] PROBLEM - MySQL Slave Delay on db69 is CRITICAL: CRIT replication delay 1127 seconds [23:53:57] gitlab was also using rubypython, to run Python from within Ruby [23:54:04] so that they could run pygments [23:54:14] I'm not shitting you [23:54:28] those were just three of the 98 gems it wanted to be installed [23:54:45] a bunch of them native code too [23:55:25] paravoid: Looks like they require ruby 1.9 also [23:55:41] I tried installing it on a labs vm in a local directory (instead of in /usr) but it filed [23:55:43] failed* [23:56:15] paravoid: https://gist.github.com/Krinkle/ce9ba9f726d8daead7f1 [23:56:54] This is really annoying, as it has been running from another labs machine for months without troubles [23:57:04] (granted, installed via gem install, not puppet - but on labs) [23:57:45] PROBLEM - Puppet freshness on formey is CRITICAL: Puppet has not run in the last 10 hours [23:58:02] and if we want to stay on schedule, we need these built postmerge on doc.wikimedia.org within a few weeks [23:58:49] er? [23:58:53] New patchset: Asher; "pulling dbs for upgrades, returning others" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/54619 [23:59:00] whose schedule? [23:59:06] paravoid: crazyness aside, what are the main blockers? [23:59:17] therubyracer is the main blocker. [23:59:32] this is just too complicated [23:59:33] paravoid: Well, we're using jsduck syntax in our repositories. And normal people have no issue running gem-install locally and it all works fine. [23:59:43] so? [23:59:45] what's your point?
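On Krinkle's failed local install: the usual way to keep a gem out of /usr is a private GEM_HOME, roughly as below. The paths are assumed, and whether this sidesteps his particular failure is untested.

    export GEM_HOME="$HOME/gems"
    export PATH="$GEM_HOME/bin:$PATH"
    gem install jsduck
    jsduck --version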