[00:02:10] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:14:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.021 seconds [00:24:58] New review: Helder.wiki; "Any hints on how to fix this?" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/21475 [00:46:32] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [00:56:53] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 3.747 seconds [01:31:50] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [01:41:35] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 253 seconds [01:42:11] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.235 seconds [01:42:11] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 292 seconds [01:44:35] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 8 seconds [01:46:41] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 22 seconds [02:17:03] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:21:15] PROBLEM - Puppet freshness on ssl2 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on mw24 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on mw47 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on ms-fe1004 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on cp1036 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on srv301 is CRITICAL: Puppet has not run in the last 10 hours [02:24:15] PROBLEM - Puppet freshness on sq67 is CRITICAL: Puppet has not run in the last 10 hours [02:24:16] PROBLEM - Puppet freshness on sq68 is CRITICAL: Puppet has not run in the last 10 hours [02:24:16] PROBLEM - Puppet freshness on ssl1004 is CRITICAL: Puppet has not run in the last 10 hours [02:25:18] PROBLEM - Puppet freshness on search34 is CRITICAL: Puppet has not run in the last 10 hours [02:28:54] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.042 seconds [02:33:15] PROBLEM - Puppet freshness on virt2 is CRITICAL: Puppet has not run in the last 10 hours [02:33:15] PROBLEM - Puppet freshness on sq71 is CRITICAL: Puppet has not run in the last 10 hours [02:33:15] PROBLEM - Puppet freshness on sq55 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on analytics1008 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on analytics1006 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on cadmium is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on labstore3 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on cp1029 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on cp1034 is CRITICAL: Puppet has not run in the last 10 hours [02:34:18] PROBLEM - Puppet freshness on lvs4 is CRITICAL: Puppet has not run in the last 10 hours [02:34:19] PROBLEM - Puppet freshness on search13 is CRITICAL: Puppet has not run in the last 10 hours [02:34:19] PROBLEM - Puppet freshness on search1011 is CRITICAL: Puppet has not run in the last 10 hours [02:34:20] PROBLEM - Puppet freshness on search1017 is CRITICAL: Puppet has not run in the last 10 hours [02:34:20] PROBLEM - Puppet freshness on search1007 is CRITICAL: Puppet has not run in the last 10 hours [02:34:21] PROBLEM - Puppet freshness on manutius is CRITICAL: Puppet has not run in the last 10 hours [02:34:21] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [02:34:22] PROBLEM - Puppet freshness on sq48 is CRITICAL: Puppet has not run in the last 10 hours [02:34:22] PROBLEM - Puppet freshness on sq51 is CRITICAL: Puppet has not run in the last 10 hours [02:34:23] PROBLEM - Puppet freshness on sq54 is CRITICAL: Puppet has not run in the last 10 hours [02:34:23] PROBLEM - Puppet freshness on sq60 is CRITICAL: Puppet has not run in the last 10 hours [02:34:24] PROBLEM - Puppet freshness on sq81 is CRITICAL: Puppet has not run in the last 10 hours [02:34:24] PROBLEM - Puppet freshness on sq78 is CRITICAL: Puppet has not run in the last 10 hours [02:34:25] PROBLEM - Puppet freshness on sq86 is CRITICAL: Puppet has not run in the last 10 hours [02:34:25] PROBLEM - Puppet freshness on sq84 is CRITICAL: Puppet has not run in the last 10 hours [02:35:12] PROBLEM - Puppet freshness on ocg1 is CRITICAL: Puppet has not run in the last 10 hours [02:36:22] New patchset: Helder.wiki; "(bug 39652) Fix "autoreviewer" restriction level on ptwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23997 [02:38:12] RECOVERY - Puppet freshness on analytics1006 is OK: puppet ran at Mon Sep 17 02:37:54 UTC 2012 [02:38:39] RECOVERY - Puppet freshness on ssl2 is OK: puppet ran at Mon Sep 17 02:38:30 UTC 2012 [02:40:09] RECOVERY - Puppet freshness on search13 is OK: puppet ran at Mon Sep 17 02:39:44 UTC 2012 [02:40:09] RECOVERY - Puppet freshness on search20 is OK: puppet ran at Mon Sep 17 02:39:55 UTC 2012 [02:40:09] RECOVERY - Puppet freshness on search1011 is OK: puppet ran at Mon Sep 17 02:40:03 UTC 2012 [02:40:27] RECOVERY - Puppet freshness on cp1029 is OK: puppet ran at Mon Sep 17 02:40:15 UTC 2012 [02:40:45] RECOVERY - Puppet freshness on mw24 is OK: puppet ran at Mon Sep 17 02:40:38 UTC 2012 [02:40:54] RECOVERY - Puppet freshness on labstore3 is OK: puppet ran at Mon Sep 17 02:40:44 UTC 2012 [02:41:12] RECOVERY - Puppet freshness on mw47 is OK: puppet ran at Mon Sep 17 02:41:01 UTC 2012 [02:41:39] RECOVERY - Puppet freshness on manutius is OK: puppet ran at Mon Sep 17 02:41:19 UTC 2012 [02:42:15] RECOVERY - Puppet freshness on sq81 is OK: puppet ran at Mon Sep 17 02:41:54 UTC 2012 [02:42:42] RECOVERY - Puppet freshness on srv253 is OK: puppet ran at Mon Sep 17 02:42:18 UTC 2012 [02:42:51] RECOVERY - Puppet freshness on sq54 is OK: puppet ran at Mon Sep 17 02:42:40 UTC 2012 [02:43:45] RECOVERY - Puppet freshness on db1019 is OK: puppet ran at Mon Sep 17 02:43:27 UTC 2012 [02:43:45] RECOVERY - Puppet freshness on labstore4 is OK: puppet ran at Mon Sep 17 02:43:38 UTC 2012 [02:43:54] RECOVERY - Puppet freshness on sq86 is OK: puppet ran at Mon Sep 17 02:43:40 UTC 2012 [02:45:15] RECOVERY - Puppet freshness on cp1036 is OK: puppet ran at Mon Sep 17 02:45:00 UTC 2012 [02:45:42] RECOVERY - Puppet freshness on search1017 is OK: puppet ran at Mon Sep 17 02:45:32 UTC 2012 [02:47:12] RECOVERY - Puppet freshness on ms-fe1004 is OK: puppet ran at Mon Sep 17 02:47:04 UTC 2012 [02:47:39] RECOVERY - Puppet freshness on williams is OK: puppet ran at Mon Sep 17 02:47:19 UTC 2012 [02:47:39] RECOVERY - Puppet freshness on sq60 is OK: puppet ran at Mon Sep 17 02:47:23 UTC 2012 [02:47:39] RECOVERY - Puppet freshness on cp1003 is OK: puppet ran at Mon Sep 17 02:47:29 UTC 2012 [02:47:57] RECOVERY - Puppet freshness on lvs4 is OK: puppet ran at Mon Sep 17 02:47:41 UTC 2012 [02:48:15] RECOVERY - Puppet freshness on analytics1008 is OK: puppet ran at Mon Sep 17 02:47:59 UTC 2012 [02:49:09] RECOVERY - Puppet freshness on searchidx2 is OK: puppet ran at Mon Sep 17 02:48:51 UTC 2012 [02:49:09] RECOVERY - Puppet freshness on sq51 is OK: puppet ran at Mon Sep 17 02:48:54 UTC 2012 [02:49:09] RECOVERY - Puppet freshness on srv301 is OK: puppet ran at Mon Sep 17 02:48:58 UTC 2012 [02:50:12] RECOVERY - Puppet freshness on cp1034 is OK: puppet ran at Mon Sep 17 02:49:59 UTC 2012 [02:51:15] RECOVERY - Puppet freshness on sq48 is OK: puppet ran at Mon Sep 17 02:50:51 UTC 2012 [02:51:15] RECOVERY - Puppet freshness on ssl3 is OK: puppet ran at Mon Sep 17 02:50:59 UTC 2012 [02:53:12] RECOVERY - Puppet freshness on virt2 is OK: puppet ran at Mon Sep 17 02:52:58 UTC 2012 [02:55:09] RECOVERY - Puppet freshness on sq78 is OK: puppet ran at Mon Sep 17 02:54:58 UTC 2012 [02:55:18] PROBLEM - Puppet freshness on search1024 is CRITICAL: Puppet has not run in the last 10 hours [02:55:18] PROBLEM - Puppet freshness on erzurumi is CRITICAL: Puppet has not run in the last 10 hours [02:55:18] PROBLEM - Puppet freshness on sq58 is CRITICAL: Puppet has not run in the last 10 hours [02:55:18] PROBLEM - Puppet freshness on srv278 is CRITICAL: Puppet has not run in the last 10 hours [02:55:27] RECOVERY - Puppet freshness on sq68 is OK: puppet ran at Mon Sep 17 02:55:12 UTC 2012 [02:56:03] RECOVERY - Puppet freshness on search34 is OK: puppet ran at Mon Sep 17 02:55:53 UTC 2012 [02:56:12] RECOVERY - Puppet freshness on sq84 is OK: puppet ran at Mon Sep 17 02:56:05 UTC 2012 [02:56:39] RECOVERY - Puppet freshness on sq71 is OK: puppet ran at Mon Sep 17 02:56:20 UTC 2012 [02:56:57] RECOVERY - Puppet freshness on search1024 is OK: puppet ran at Mon Sep 17 02:56:41 UTC 2012 [02:56:57] RECOVERY - Puppet freshness on ocg1 is OK: puppet ran at Mon Sep 17 02:56:45 UTC 2012 [02:58:45] RECOVERY - Puppet freshness on search1007 is OK: puppet ran at Mon Sep 17 02:58:17 UTC 2012 [03:00:15] RECOVERY - Puppet freshness on cadmium is OK: puppet ran at Mon Sep 17 02:59:52 UTC 2012 [03:00:15] RECOVERY - Puppet freshness on sq55 is OK: puppet ran at Mon Sep 17 02:59:56 UTC 2012 [03:01:09] RECOVERY - Puppet freshness on srv278 is OK: puppet ran at Mon Sep 17 03:00:48 UTC 2012 [03:01:45] RECOVERY - Puppet freshness on erzurumi is OK: puppet ran at Mon Sep 17 03:01:16 UTC 2012 [03:03:42] RECOVERY - Puppet freshness on sq58 is OK: puppet ran at Mon Sep 17 03:03:32 UTC 2012 [03:05:21] RECOVERY - Puppet freshness on sq67 is OK: puppet ran at Mon Sep 17 03:05:10 UTC 2012 [03:06:15] RECOVERY - Puppet freshness on ssl1004 is OK: puppet ran at Mon Sep 17 03:06:05 UTC 2012 [03:08:12] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [03:08:12] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [03:08:12] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [03:08:12] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [03:08:12] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [03:08:12] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [03:47:29] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [03:51:17] Ryan_Lane: hope that answers the question [03:55:37] ori-l: :D [03:55:40] yeah. that does [03:55:40] heh [03:56:05] Ryan_Lane: how i imagine my .bash_history makes me look: http://imgur.com/XKejZ [03:56:08] I always check on odd commands, as it could indicate bad things ;) [03:56:15] hahaha [04:38:32] New patchset: Nemo bis; "Run stylize.php on InitialiseSettings.php" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/23998 [04:39:40] wow, that's extreme ori-l [04:39:52] jeremyb: ? [04:40:00] ori-l: imgur ;) [04:40:10] jeremyb: not for my own .bash_history surely [04:41:06] you never brute-force command-line args? :) [04:41:20] i don't quite know what you mean [04:41:45] i like shell-fu so whenever possible i read and re-read man pages if i forget how to do something [04:42:04] ohhh, that kind of brute force [04:42:05] but periodically i'm focussed on doing something else and some command or tool won't give me what i want [04:42:09] what about --help ? [04:42:19] so i guess / fudge / take drastic measures [04:44:15] yeah, i spend more time on #bash and http://mywiki.wooledge.org/ than is sane [04:44:46] heh [04:45:01] * jeremyb is usually happy to discuss bash questions [04:45:55] cool, i'll add you to gerrit reviews of shell scripts in the future [04:50:27] that works ;) [04:52:11] ori-l: see e.g. https://gerrit.wikimedia.org/r/#/c/17964/3/files/misc/wlm/update_from_toolserver.sh,unified [04:53:13] cd -> pushd [04:53:40] * jeremyb does usually use pushd personally... [04:54:08] `` -> $() [04:54:37] yes, that too [04:55:16] ori-l: btw, are you working tomorrow? [04:55:27] yeah [04:55:43] there's a guy that showed up friday night looking for answers about clicktracking... he may come back again [04:56:03] * ori-l cries. [04:56:06] actually he's in #-analytics now. average_drifter [04:56:19] i told him to ask again monday [04:56:40] oh, thanks for letting me know [04:56:41] (his own instance of mediawiki with clicktracking) [04:56:56] i inherited the extension but didn't write it [04:57:07] hah. didn't even know if it was yours [04:57:46] * jeremyb sleeps [04:58:06] peace [05:03:45] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [05:54:58] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [07:12:46] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [08:21:27] !log swap was using 7gb on ms-fe1... restarted swift-proxy over there. wlll prolly restart it on ms-fe2 through 4 over the next day or two [08:21:39] Logged the message, Master [08:38:17] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [08:38:17] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [08:39:11] PROBLEM - Puppet freshness on mw22 is CRITICAL: Puppet has not run in the last 10 hours [08:42:43] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 203 seconds [08:43:01] PROBLEM - MySQL Slave Delay on db33 is CRITICAL: CRIT replication delay 210 seconds [08:48:43] PROBLEM - MySQL Replication Heartbeat on db33 is CRITICAL: CRIT replication delay 189 seconds [08:50:49] RECOVERY - MySQL Slave Delay on db33 is OK: OK replication delay 0 seconds [08:51:43] RECOVERY - MySQL Replication Heartbeat on db33 is OK: OK replication delay 0 seconds [09:41:53] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [09:54:56] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [11:23:38] hey there [11:23:41] just woke up (1pm) [11:25:01] will do some accounting this afternoon and be there later tonight :-D [13:09:04] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [13:09:04] PROBLEM - Puppet freshness on singer is CRITICAL: Puppet has not run in the last 10 hours [13:09:04] PROBLEM - Puppet freshness on virt1001 is CRITICAL: Puppet has not run in the last 10 hours [13:09:04] PROBLEM - Puppet freshness on virt1002 is CRITICAL: Puppet has not run in the last 10 hours [13:09:04] PROBLEM - Puppet freshness on virt1004 is CRITICAL: Puppet has not run in the last 10 hours [13:09:04] PROBLEM - Puppet freshness on virt1003 is CRITICAL: Puppet has not run in the last 10 hours [13:30:12] New review: Burthsceh; "This is correct change to the request of bug 40270 of mine. Please merge this change and expand in J..." [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/23927 [13:43:41] New patchset: Krinkle; "docroot/noc: Sync configuration viewer" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24003 [13:44:01] Change merged: Krinkle; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/24003 [13:48:31] PROBLEM - Puppet freshness on ms-be10 is CRITICAL: Puppet has not run in the last 10 hours [14:05:28] RECOVERY - udp2log log age for locke on locke is OK: OK: all log files active [14:10:44] New patchset: Ottomata; "Checking proper path of banner impressions log when checking log age." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24006 [14:11:41] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24006 [14:35:49] New patchset: Ottomata; "Giving access to stat1 to Maryana and Dan." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24008 [14:36:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24008 [14:37:03] New patchset: Ottomata; "Giving access to stat1 to Maryana and Dan." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24008 [14:37:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24008 [14:44:56] New review: Ottomata; "Comments in line" [operations/debs/lucene-search-2] (master) C: 0; - https://gerrit.wikimedia.org/r/23583 [15:01:45] New patchset: Diederik; "Incorporated feedback Ottomata, filter sensitive searchterms from queries" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/24009 [15:04:52] PROBLEM - Puppet freshness on zhen is CRITICAL: Puppet has not run in the last 10 hours [15:10:28] robh: are srv's up to srv250 in a different location than 10.in-addr.arpa? [15:16:46] hiyooo, anybody around to approve? [15:16:48] https://gerrit.wikimedia.org/r/#/c/24006/ [15:17:12] New patchset: Diederik; "Incorporated feedback Ottomata, filter sensitive searchterms from queries" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/24009 [15:17:13] i want to push somethign else but I'm afraid I need to wait for this approval , or I will get merge conflicts [15:19:46] cmjohnson1: check line 101 in the 10 file [15:19:52] its part of a generated range [15:24:22] you're simply changing a file path? [15:24:33] seems trivial enough [15:28:56] yup [15:28:58] just a path [15:29:02] it was incorrect previously [15:34:06] thx robh [15:38:23] cmjohnson1: is dell showing up tomorrow? just trying to figure out when I need to be around [15:40:01] Platonides, can you +2? [15:40:26] apergos: no, nothing has been confirmed [15:40:33] not certain they will ever show up [15:40:56] hmm have you been in touch with them since that email of theirs last week? [15:41:29] no, i have not...the last they needed on thursday was info on the working systems but nothing since [15:41:38] i will be following up w/ them [15:42:09] ok, thanks. maybe they think we are supposed to call them and we think they are supposed to call us or something [15:55:50] PROBLEM - Puppet freshness on ms-be7 is CRITICAL: Puppet has not run in the last 10 hours [16:23:20] New review: Jeremyb; "Please keep all of the discussion about whether or not this is ready for merge in one place. (bugzil..." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/23927 [16:35:08] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:40:59] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.023 seconds [16:43:05] yo ^demon, would you be so kind to migrate wikistats from subversion to analytics/wikistats.git? [16:43:52] <^demon> drdee: Please put it on the list. Won't happen today--I'm still trying to catch up on everything I missed last week. [16:44:10] k, what is the URL? [16:45:01] <^demon> http://www.mediawiki.org/wiki/Git/New_repositories like normal, just mention that I'm converting from SVN in the notes. [16:45:07] thx [16:46:57] anyone know the status on jenkins with regards to open registration? [17:00:32] New patchset: Pyoungmeister; "re-adding dsc and diederik access to vanadium." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24016 [17:01:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24016 [17:02:02] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24016 [17:04:32] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24006 [17:04:45] ottomata: +2'd the thing you marked for me to review [17:12:28] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:12:31] Is there any reason lists.wikimedia.org doesn't have an AAAA record? Mchenry has an ipv6 address so mail comes over ipv6 but the dns check fails because the ptr for the ipv6 address is lists. with a missing forward looking record :( [17:14:07] PROBLEM - Puppet freshness on neon is CRITICAL: Puppet has not run in the last 10 hours [17:22:30] New patchset: Ottomata; "Adding new zero filters, also udp-filter now uses cidr ranges." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24018 [17:23:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24018 [17:24:28] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.066 seconds [17:31:32] Change merged: Pyoungmeister; [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/22829 [17:32:44] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24008 [17:33:04] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24018 [17:33:44] oo, thank you [17:33:45] notpeter [17:34:25] no pro [17:34:29] b [17:38:25] PROBLEM - SSH on lvs6 is CRITICAL: Server answer: [17:39:55] RECOVERY - SSH on lvs6 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [17:43:09] Change abandoned: Diederik; "see https://gerrit.wikimedia.org/r/#/c/24009/" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/23583 [17:50:16] PROBLEM - mysqld processes on es6 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:50:43] RECOVERY - mysqld processes on es8 is OK: PROCS OK: 1 process with command name mysqld [17:53:15] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/11963 [17:54:55] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:56:34] PROBLEM - mysqld processes on es8 is CRITICAL: PROCS CRITICAL: 0 processes with command name mysqld [17:59:53] Change abandoned: Diederik; "(no reason)" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/24009 [18:03:04] New patchset: Diederik; "Filter creditcard numbers, email addresses and social security numbers from searchterms" [operations/debs/lucene-search-2] (master) - https://gerrit.wikimedia.org/r/24021 [18:05:38] Change restored: Jeremyb; "this is a test" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17041 [18:06:00] Change abandoned: Jeremyb; "test done" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/17041 [18:07:05] !log granted 'research' read access to user rollup/filter dbs on dbs 42,1047 [18:07:15] Logged the message, Master [18:09:55] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.035 seconds [18:10:04] RECOVERY - mysqld processes on es8 is OK: PROCS OK: 1 process with command name mysqld [18:13:54] apergos1: so we still need a rewrite.py change to finish timeline [18:14:07] RECOVERY - mysqld processes on es9 is OK: PROCS OK: 1 process with command name mysqld [18:14:08] AaronSchulz: hi? :) [18:17:25] RECOVERY - mysqld processes on es1008 is OK: PROCS OK: 1 process with command name mysqld [18:17:43] RECOVERY - mysqld processes on es10 is OK: PROCS OK: 1 process with command name mysqld [18:19:13] RECOVERY - mysqld processes on es1009 is OK: PROCS OK: 1 process with command name mysqld [18:19:22] RECOVERY - mysqld processes on es1010 is OK: PROCS OK: 1 process with command name mysqld [18:30:52] paravoid: can we get https://gerrit.wikimedia.org/r/#/c/23309/ merged? [18:39:10] PROBLEM - Puppet freshness on magnesium is CRITICAL: Puppet has not run in the last 10 hours [18:39:10] PROBLEM - Puppet freshness on zinc is CRITICAL: Puppet has not run in the last 10 hours [18:40:13] PROBLEM - Puppet freshness on mw22 is CRITICAL: Puppet has not run in the last 10 hours [18:43:31] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:54:23] New patchset: Pyoungmeister; "setting srv190 and mw55-59 to use apache modules/precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24028 [18:55:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24028 [18:55:30] !log removing srv190 and mw55-59 from apaches pool for upgrade to precise [18:55:31] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 1.308 seconds [18:55:40] Logged the message, notpeter [19:05:52] PROBLEM - SSH on srv190 is CRITICAL: Connection refused [19:06:01] PROBLEM - Apache HTTP on srv190 is CRITICAL: Connection refused [19:06:46] PROBLEM - Memcached on srv190 is CRITICAL: Connection refused [19:11:52] RECOVERY - SSH on srv190 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [19:21:39] TomDaley: gitweb is soooooooo slow :( [19:21:55] how's that plugin for gitblit coming along? heh [19:24:10] Soon :) [19:25:13] PROBLEM - NTP on srv190 is CRITICAL: NTP CRITICAL: No response from NTP server [19:30:01] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:30:22] https://bugzilla.wikimedia.org/40306 "May be HTML5 related?" [19:31:50] Ops probably won't care much about html5 bugs [19:32:50] so this is a thing that the users can fix? [19:32:56] uh yeah... wrong channel [19:42:01] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.704 seconds [19:43:13] PROBLEM - Puppet freshness on manganese is CRITICAL: Puppet has not run in the last 10 hours [19:49:46] New review: OliverKeyes; "We need to hold off for now, I think :(. Need to work through some community angle stuff." [operations/mediawiki-config] (master) C: -1; - https://gerrit.wikimedia.org/r/23632 [19:53:25] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 244 seconds [19:53:25] PROBLEM - MySQL Slave Delay on storage3 is CRITICAL: CRIT replication delay 245 seconds [19:56:07] PROBLEM - Puppet freshness on ms-be6 is CRITICAL: Puppet has not run in the last 10 hours [19:56:25] RECOVERY - MySQL Slave Delay on storage3 is OK: OK replication delay 3 seconds [19:57:55] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 5 seconds [19:59:01] New review: Catrope; "Oliver tells me they really don't want this deployed just yet. Also, whatever happens, this shouldn'..." [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/23632 [19:59:19] New review: Catrope; "(-2ed so this isn't merged prematurely)" [operations/mediawiki-config] (master); V: 0 C: -2; - https://gerrit.wikimedia.org/r/23632 [20:16:16] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [20:31:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.032 seconds [20:52:49] !log stopped puppet on aluminium for config testing [20:52:59] Logged the message, Master [21:02:46] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:06:49] PROBLEM - Host sq37 is DOWN: PING CRITICAL - Packet loss = 100% [21:08:12] New patchset: Jgreen; "adding packages used for fundraising analytics" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24079 [21:09:07] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/24079 [21:09:49] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24079 [21:19:07] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 0.039 seconds [21:28:36] PROBLEM - Apache HTTP on mw55 is CRITICAL: Connection refused [21:28:36] PROBLEM - Apache HTTP on mw56 is CRITICAL: Connection refused [21:28:36] PROBLEM - Apache HTTP on mw57 is CRITICAL: Connection refused [21:28:36] PROBLEM - Apache HTTP on mw59 is CRITICAL: Connection refused [21:28:36] PROBLEM - Apache HTTP on mw58 is CRITICAL: Connection refused [21:30:13] ερ? [21:34:32] apergos1...i think notpeter is working on those [21:45:05] !log (from 5 hours ago) last of the swift proxy services restarted, it had reached 8.6gb of swap use. good for a few more days now [21:45:08] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/24028 [21:45:16] Logged the message, Master [21:45:26] stupid connection [21:47:08] apergos: is swift facing some mem leak? [21:47:12] yes [21:47:23] proxy server [21:47:32] apergos: if you have some spare time, you could depool and upgrade one or two of the frontend boxes to precise [21:47:35] figured I'd be preemptive [21:47:39] should be straightforward [21:47:50] and might help with the memory leaks [21:47:51] we'll see what tomorrow lookslike [21:47:58] my connection is so awful I dunno [21:48:04] maybe you should get a nagios warning whenever swap is too high ;-) [21:48:16] I'm getting dropped several times an hour [21:48:18] like swift-proxy01 : reboot me please! I got swap usage to high. [21:48:25] binasher found a eventlet changelog entry that mentioned memory leaks, seems to be fixed in precise (always according to the changelog) [21:48:33] right [21:48:46] I'm reimaging mw boxes, btw [21:48:55] expect nagios alerts soon [21:49:07] I already saw whines about mw55 through something [21:49:19] kk [21:49:22] bit not your log entry (guess I was disconnected when it went through) [21:49:27] really aggravating [21:49:39] apergos: was a while ago, sorry :( [21:49:45] that's why I rementioned it [21:49:46] wasn't your fault [21:49:53] what's wrong with your connection? [21:49:54] PROBLEM - Host mw57 is DOWN: PING CRITICAL - Packet loss = 100% [21:50:03] there we go [21:50:03] I get dropped several tims an hour [21:50:19] it's either the router or the cables in the building or the isp [21:50:23] the router's theirs [21:50:24] anyway, we have more than enough capacity on the rest of the frontends, so depooling one for several hours shouldn't be a problem [21:50:30] PROBLEM - Host mw58 is DOWN: PING CRITICAL - Packet loss = 100% [21:50:32] you can take your time :-) [21:50:57] PROBLEM - Host mw59 is DOWN: PING CRITICAL - Packet loss = 100% [21:51:06] PROBLEM - Puppetmaster HTTPS on stafford is CRITICAL: CRITICAL - Socket timeout after 10 seconds [21:51:55] it's a relatively new development [21:52:27] PROBLEM - SSH on mw55 is CRITICAL: Connection refused [21:53:12] PROBLEM - Memcached on mw56 is CRITICAL: Connection refused [21:53:30] PROBLEM - Memcached on mw55 is CRITICAL: Connection refused [21:53:48] PROBLEM - SSH on mw56 is CRITICAL: Connection refused [21:55:27] RECOVERY - Host mw57 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms [21:56:03] RECOVERY - Host mw58 is UP: PING OK - Packet loss = 0%, RTA = 0.55 ms [21:56:05] so I would be reinstalling the same version of swift? [21:56:18] cause I could upgrade it for the proxy server on a box and see how it is [21:56:30] RECOVERY - Host mw59 is UP: PING OK - Packet loss = 0%, RTA = 0.59 ms [21:56:38] woosters: no decrease in bits cache hitrate [21:56:58] hmmm, interesting [21:57:32] (paravoid) [21:57:49] I have moved some favicons to bits last week [21:58:00] apergos: er, what? [21:58:08] (12:56:05 πμ) apergos: so I would be reinstalling the same version of swift? [21:58:08] (12:56:18 πμ) apergos: cause I could upgrade it for the proxy server on a box and see how it is [21:58:11] apergos: yes, reinstall with precise but otherwise exactly the same (swift 1.5, same configs) [21:58:16] ok [21:58:21] and that's just for the proxies for now [21:58:26] of course [21:59:20] since a) they're the ones having the memory leak b) we wouldn't want to reboot backends, they might never come back :/ [21:59:29] no, I wasn't planning to touch backends [21:59:30] PROBLEM - Memcached on mw57 is CRITICAL: Connection refused [21:59:40] just thinking that it might be interesting to upgrade the front end swift version [21:59:48] PROBLEM - SSH on mw59 is CRITICAL: Connection refused [21:59:48] PROBLEM - SSH on mw57 is CRITICAL: Connection refused [22:00:06] PROBLEM - SSH on mw58 is CRITICAL: Connection refused [22:00:33] PROBLEM - Memcached on mw58 is CRITICAL: Connection refused [22:00:37] they've changed the ring format in the new versions, so we shouldn't run a mixed versioned cluster for long [22:00:42] PROBLEM - Memcached on mw59 is CRITICAL: Connection refused [22:01:18] RECOVERY - SSH on mw56 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:01:27] RECOVERY - SSH on mw55 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:01:32] I thought the new swift can read the old ring versions; we would just want to to rebalancing etc against a copy on a box running the old version [22:02:00] so is anyone merging those rewrite changes? [22:02:06] * AaronSchulz works on another set of changes [22:02:48] weren't you two talking about them before? [22:02:48] RECOVERY - SSH on mw57 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:02:57] RECOVERY - SSH on mw58 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:03:15] RECOVERY - Puppetmaster HTTPS on stafford is OK: HTTP OK HTTP/1.1 400 Bad Request - 336 bytes in 5.607 seconds [22:04:00] RECOVERY - Apache HTTP on srv190 is OK: HTTP OK HTTP/1.1 200 OK - 453 bytes in 0.004 seconds [22:04:27] RECOVERY - SSH on mw59 is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [22:08:48] I was not planning to merge anything at this hour and deploy it, if that was being asked of me [22:09:00] it's 1 am here and I don't plan to be around for much longer [22:09:19] someone in the sf timezone should probably do it [22:09:20] apergos: can you look at it maybe tomorrow? [22:09:34] if you have no takers today [22:09:36] * AaronSchulz misses having an sf timezone person for this :( [22:09:40] what's the link? [22:09:58] apergos: https://gerrit.wikimedia.org/r/#/c/23309/2 [22:10:15] apergos: https://gerrit.wikimedia.org/r/#/c/23392/ while at it (though not important) [22:10:18] RECOVERY - Apache HTTP on mw55 is OK: HTTP OK HTTP/1.1 200 OK - 454 bytes in 0.011 seconds [22:11:21] PROBLEM - NTP on mw56 is CRITICAL: NTP CRITICAL: No response from NTP server [22:11:57] AaronSchulz: you need to become a european timezone person :-P [22:12:16] I used to be an all-timezone person back in the day [22:12:56] heh [22:13:00] I'm trying to cut down on that [22:13:06] people wondered were I lived ;) [22:13:20] people used to think I didn't sleep. ever. [22:13:26] I actually like my sleep. [22:13:57] woosters: fyi, the increased traffic on the few bits apaches is due to tim rotating them into memcached service on saturday [22:14:05] totally unrelated to bits traffic [22:14:38]