[01:06:40] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [01:10:43] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [01:13:48] New patchset: Jeremyb; "redirect wikidata.org to [[m:wikidata]]" [operations/apache-config] (master) - https://gerrit.wikimedia.org/r/9874 [01:15:49] PROBLEM - Puppet freshness on search14 is CRITICAL: Puppet has not run in the last 10 hours [01:16:52] PROBLEM - Puppet freshness on search15 is CRITICAL: Puppet has not run in the last 10 hours [01:17:55] PROBLEM - Puppet freshness on search16 is CRITICAL: Puppet has not run in the last 10 hours [01:18:49] PROBLEM - Puppet freshness on search18 is CRITICAL: Puppet has not run in the last 10 hours [01:18:49] PROBLEM - Puppet freshness on search17 is CRITICAL: Puppet has not run in the last 10 hours [01:19:52] PROBLEM - Puppet freshness on search20 is CRITICAL: Puppet has not run in the last 10 hours [01:19:52] PROBLEM - Puppet freshness on search19 is CRITICAL: Puppet has not run in the last 10 hours [01:20:28] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.110 seconds [01:25:07] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.123 seconds [01:41:46] PROBLEM - MySQL Slave Delay on db1025 is CRITICAL: CRIT replication delay 262 seconds [01:45:58] RECOVERY - MySQL Slave Delay on db1025 is OK: OK replication delay 0 seconds [02:38:28] RECOVERY - Puppet freshness on search15 is OK: puppet ran at Sun Jun 3 02:38:05 UTC 2012 [02:42:58] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [02:49:25] RECOVERY - Puppet freshness on search14 is OK: puppet ran at Sun Jun 3 02:49:07 UTC 2012 [02:51:58] RECOVERY - Puppet freshness on search16 is OK: puppet ran at Sun Jun 3 02:51:41 UTC 2012 [02:56:55] RECOVERY - Puppet freshness on search17 is OK: puppet ran at Sun Jun 3 02:56:40 UTC 2012 [02:57:49] RECOVERY - Puppet freshness on search20 
is OK: puppet ran at Sun Jun 3 02:57:29 UTC 2012 [02:59:01] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [03:00:04] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [03:02:28] RECOVERY - Puppet freshness on search18 is OK: puppet ran at Sun Jun 3 03:02:21 UTC 2012 [03:05:37] RECOVERY - Puppet freshness on search19 is OK: puppet ran at Sun Jun 3 03:05:25 UTC 2012 [03:15:22] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.178 seconds [03:18:25] New review: Jeremyb; "can we send these msgs to stderr instead?" [operations/software] (master) - https://gerrit.wikimedia.org/r/9761 [03:18:31] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27406 bytes in 0.108 seconds [03:26:01] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [03:26:01] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [03:26:01] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [03:35:43] New patchset: Jeremyb; "followup change 9759 (snaprotate params for db26)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9876 [03:36:05] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9876 [03:42:05] New review: Jeremyb; "(no comment)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9753 [04:22:43] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [04:28:34] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [04:30:58] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [04:48:04] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.126 seconds [04:50:46] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.109 seconds [05:21:49] PROBLEM - Router interfaces on cr1-sdtpa is CRITICAL: CRITICAL: host 208.80.152.196, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/1/0: down - Core: cr2-eqiad:xe-5/2/1 (FPL/Level3, CV71028) [10Gbps wave] [05:34:16] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [05:35:46] RECOVERY - Router interfaces on cr1-sdtpa is OK: OK: host 208.80.152.196, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 [05:38:10] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [05:49:25] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.121 seconds [05:52:07] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.108 seconds [06:28:52] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [06:32:32] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [06:50:50] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.109 seconds [06:52:20] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.111 seconds [07:00:44] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [07:27:44] PROBLEM - Puppet freshness on 
ocg3 is CRITICAL: Puppet has not run in the last 10 hours [08:49:56] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [08:50:05] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [08:52:56] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.209 seconds [09:02:23] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27409 bytes in 0.107 seconds [09:07:42] New patchset: Pyoungmeister; "correct mac for search35 and search34" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9879 [09:08:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9879 [09:09:26] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [09:16:41] !log pushing new zone files. only minor changes [09:16:46] Logged the message, notpeter [09:19:29] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [09:21:42] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9879 [09:21:44] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9879 [09:22:29] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.180 seconds [09:26:05] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27406 bytes in 0.107 seconds [09:53:41] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [10:03:08] PROBLEM - swift-container-auditor on ms-be1 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:08:59] RECOVERY - swift-container-auditor on ms-be1 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:09:53] New patchset: Hashar; "wgHTCPMulticast* is only used on pmtpa cluster" 
[operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9384 [10:10:00] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9384 [10:10:30] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9384 [10:10:33] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9384 [10:11:23] New patchset: Hashar; "wgLoadScript is only used on production cluster" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9383 [10:11:28] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9383 [10:11:38] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9383 [10:11:41] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9383 [10:11:56] going to sync that on the live cluster [10:13:21] OHNOES [10:13:49] Reedy: do you know what is the wfItalianWikipediaDisableScripts hook for ? [10:13:57] wmf-config/killscripts.php (not in git) [10:13:58] Yeah [10:14:09] You remember when the italians had that blackout? 
[10:14:14] PROBLEM - swift-container-auditor on ms-be3 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:14:18] That was Tims fix to allow people to still visit it [10:14:18] Reedy: yup [10:14:24] ROFL [10:16:02] Reedy: $ git diff --name-only master origin/master [10:16:08] that gives the list of files that changed [10:16:19] between the current (master) wmf-config and the remote one (origin/master) [10:16:25] you need to "git remote update" first though [10:17:28] mw64: rsync: write failed on "/apache/common-local/wmf-config/CommonSettings.php": No space left on device (28) [10:17:30] yeahhh [10:17:45] lol [10:17:51] !root [10:17:53] !ping ops [10:17:53] pong [10:18:01] wm-bot: oh glad to meet you [10:18:17] !log mw64: rsync: write failed on "/apache/common-local/wmf-config/CommonSettings.php": No space left on device (28) [10:18:21] Logged the message, Master [10:19:17] yet another server with only 7.4G in / ;-D [10:25:24] New patchset: Mark Bergsma; "Use a custom flatten function for realserver_ips" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9885 [10:25:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9885 [10:27:14] New review: Hashar; "(no comment)" [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9874 [10:31:58] New patchset: Faidon; "Merge if-up & if-down to a single script to avoid duplication." 
[operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/9886 [10:31:59] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9876 [10:32:01] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9876 [10:34:56] New review: Mark Bergsma; "(no comment)" [operations/debs/wikimedia-lvs-realserver] (master); V: 0 C: 1; - https://gerrit.wikimedia.org/r/9886 [10:35:02] New review: Faidon; "(no comment)" [operations/debs/wikimedia-lvs-realserver] (master); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9886 [10:35:13] New review: Faidon; "(no comment)" [operations/debs/wikimedia-lvs-realserver] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9886 [10:35:15] Change merged: Faidon; [operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/9852 [10:35:16] Change merged: Faidon; [operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/9886 [10:35:59] PROBLEM - Varnish HTTP upload-backend on cp1022 is CRITICAL: HTTP CRITICAL - No data received from host [10:37:00] New patchset: Asher; "month of binlogs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9887 [10:37:21] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9887 [10:37:39] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9887 [10:37:41] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9887 [10:40:11] RECOVERY - Varnish HTTP upload-backend on cp1022 is OK: HTTP OK HTTP/1.1 200 OK - 632 bytes in 0.054 seconds [10:41:05] RECOVERY - swift-container-auditor on ms-be3 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [10:44:23] PROBLEM - Varnish HTTP upload-backend on cp1022 is CRITICAL: Connection refused [10:54:17] PROBLEM - Varnish HTTP upload-backend on cp1021 is CRITICAL: Connection refused [10:55:13] so, in ruby 0 and '' both are true in a bool context. not what i expected. good thing i tested! [11:03:35] binasher: do you know why mysql.pp uses so much "true" / "false" instead of real bools? [11:03:47] does puppet really need that? [11:04:03] (i've only spent a few mins trying to figure out what convention is [11:04:06] ) [11:06:31] i was adding the false values because erb needs variables to be explicitly defined unlike puppet templates, but they could probably be real bools. [11:07:54] s/templates/manifests/ ? [11:08:29] what would you name a var which just holds a '#' or '' depending on whether binlogs are enabled? [11:08:44] or should you just do a ternary on every line? [11:09:01] (more dedupe ;) ) [11:12:40] uh [11:13:06] you want to proceed lines with a variable containing a # or "" ? [11:13:16] precede* [11:13:19] but yes [11:13:27] prepend really is the right word [11:13:35] or begin [11:14:16] don't like. [11:14:50] you prefer having 5 lines repeated twice verbatim except that one set has an extra character on the front? 
(a #) [11:15:56] so it could either be a ternary evaluated once per line or evaluated right above the block of lines and # or '' is then stored in a var and then that's used on each line. [11:16:21] just delete the entire commented out portion [11:17:01] really? don't even leave a note with a hint of what to change in puppet? [11:17:13] jesus [11:17:34] this is the block next to the change in 9887 [11:17:42] christ? ;) [11:24:54] binasher: so? remove entirely then? [11:36:46] if it bothers you, sure [11:37:23] i just don't see the point in repeating it if it's easy to have one copy for both variants [11:37:37] there's no real point to the commented out portion [11:37:39] we can easily produce exactly the same output [11:37:42] ok [11:54:04] New patchset: Jeremyb; "prod.my.cnf.erb: rm commented binlog section" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9955 [11:54:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9955 [11:56:41] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9955 [11:56:43] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9955 [12:20:20] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/8043 [12:20:23] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/8043 [12:34:32] New patchset: Petrb; "inserted 'accountcreator' to wg[Add|Remove]Group for crats on frwiki per !b 37271, requested by local crats" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9958 [12:34:38] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9958 [12:35:41] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours [12:39:51] Change abandoned: Reedy; 
"(no reason)" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9617 [12:40:51] New patchset: Petrb; "inserted 'accountcreator' for crats on frwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9958 [12:40:57] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9958 [12:41:40] binasher: es4 dead very much? [12:42:51] New patchset: Petrb; "inserted 'accountcreator' for crats on frwiki" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9958 [12:42:59] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9958 [12:43:47] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [12:49:11] PROBLEM - Host db1047 is DOWN: PING CRITICAL - Packet loss = 100% [12:49:13] New review: Jeremyb; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9958 [12:52:11] RECOVERY - Host es4 is UP: PING OK - Packet loss = 0%, RTA = 0.43 ms [12:52:44] domas: ^ [12:52:46] New review: Hydriz; "(no comment)" [operations/mediawiki-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9958 [12:56:32] PROBLEM - MySQL Slave Running on es4 is CRITICAL: CRIT replication Slave_IO_Running: Yes Slave_SQL_Running: No Last_Error: Error Duplicate entry 209147182 for key PRIMARY on query. 
Defau [12:58:45] domas: ^ [12:58:52] and binasher [12:59:23] RECOVERY - MySQL Slave Running on es4 is OK: OK replication Slave_IO_Running: Yes Slave_SQL_Running: Yes Last_Error: [13:01:29] PROBLEM - Auth DNS on ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [13:02:05] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 5111090 seconds [13:03:43] New patchset: Hashar; "(bug 37271) crats on frwiki to +- 'accountcreator' group" [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9958 [13:03:49] New review: jenkins-bot; "Build Successful " [operations/mediawiki-config] (master); V: 1 C: 0; - https://gerrit.wikimedia.org/r/9958 [13:04:03] New review: Hashar; "(no comment)" [operations/mediawiki-config] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9958 [13:04:04] now ns1? [13:04:05] Change merged: Hashar; [operations/mediawiki-config] (master) - https://gerrit.wikimedia.org/r/9958 [13:04:19] petan|wk: ^ [13:05:24] hashar: Oooooh, we have the universal linter up and running now? [13:06:15] RoanKattouw: jenkins? [13:06:33] Yeah [13:08:23] what's the "universal linter"? [13:09:53] PROBLEM - Apache HTTP on mw64 is CRITICAL: Connection refused [13:10:12] New review: Jeroen De Dauw; "(no comment)" [operations/apache-config] (master) C: 1; - https://gerrit.wikimedia.org/r/9874 [13:12:30] New patchset: Ryan Lane; "Revoking Ben's key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9961 [13:12:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9961 [13:12:53] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [13:12:59] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9961 [13:13:02] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9961 [13:13:46] jeremyb: what? 
[13:13:49] jeremyb: I'm working on es4 [13:13:57] it is quite a clusterfuck [13:13:59] was lagged for 5M secs [13:14:13] i see that now [13:14:37] RoanKattouw: the universal linter only lints PHP files that changed, and only for mediawiki/core [13:14:48] domas: only 5 milliseconds? what's the problem? [13:14:53] RoanKattouw: we could most probably extends it so other jobs could call it and pass the repo / path / list of files to it [13:15:01] Ryan_Lane: M for Mega ? [13:15:05] Ryan_Lane: 59 days [13:16:20] domas: microseconds? [13:20:59] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 0.029 second response time [13:23:20] =) [13:23:23] feMto [13:23:24] for fuck sake [13:26:41] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [13:26:41] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [13:26:41] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [13:26:59] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.169 seconds [13:27:53] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [13:30:36] heh [13:32:05] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.107 seconds [13:36:35] PROBLEM - Apache HTTP on mw64 is CRITICAL: Connection refused [13:46:07] New patchset: Mark Bergsma; "Create definition for binding a (v4 mapped) IPv6 address to an interface" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9969 [13:46:24] New patchset: Faidon; "Fix postinst to work with the merged script." [operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/9970 [13:46:24] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." 
[operations/puppet] (production); V: -1 - https://gerrit.wikimedia.org/r/9969 [13:46:44] New review: Faidon; "(no comment)" [operations/debs/wikimedia-lvs-realserver] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9970 [13:46:46] Change merged: Faidon; [operations/debs/wikimedia-lvs-realserver] (master) - https://gerrit.wikimedia.org/r/9970 [13:49:07] New patchset: Mark Bergsma; "Create definition for binding a (v4 mapped) IPv6 address to an interface" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9969 [13:49:30] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9969 [13:51:35] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay NULL seconds [13:53:32] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [13:55:47] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 2405044 seconds [13:56:11] New patchset: Mark Bergsma; "Create definition for binding a (v4 mapped) IPv6 address to an interface" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9969 [13:56:33] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9969 [13:57:08] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay NULL seconds [13:57:44] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.138 seconds [13:58:25] New patchset: Mark Bergsma; "Create definition for binding a (v4 mapped) IPv6 address to an interface" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9969 [13:58:47] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9969 [14:01:20] PROBLEM - MySQL Slave Delay on es4 is CRITICAL: CRIT replication delay 104475 seconds [14:02:49] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9969 [14:04:43] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9969 [14:05:18] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9885 [14:05:21] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9969 [14:05:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9885 [14:06:53] RECOVERY - MySQL Slave Delay on es4 is OK: OK replication delay 0 seconds [14:08:42] New patchset: Mark Bergsma; "Add IPv6 addresses to sq67-70" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9972 [14:08:50] RECOVERY - Apache HTTP on mw64 is OK: HTTP OK - HTTP/1.1 301 Moved Permanently - 1.310 second response time [14:09:03] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9972 [14:09:20] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9972 [14:09:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9972 [14:11:19] New patchset: Mark Bergsma; "Fix inline_template" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9973 [14:11:40] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9973 [14:11:49] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9973 [14:11:51] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9973 [14:12:56] New patchset: RobH; "added in row c eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9974 [14:13:16] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9974 [14:14:19] New patchset: Mark Bergsma; "Fix template" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9975 [14:14:40] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9975 [14:14:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9975 [14:14:54] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9975 [14:20:06] New patchset: Faidon; "Facter ensure => latest" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9976 [14:20:27] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9976 [14:20:27] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9976 [14:20:30] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9976 [14:21:26] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [14:23:32] RECOVERY - Auth DNS on ns1.wikimedia.org is OK: DNS OK: 0.034 seconds response time. 
www.wikipedia.org returns 208.80.154.225 [14:28:29] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.172 seconds [14:31:47] PROBLEM - Puppet freshness on db29 is CRITICAL: Puppet has not run in the last 10 hours [14:39:20] New patchset: RobH; "added in row c eqiad" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9974 [14:39:42] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9974 [14:46:38] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [14:57:53] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.185 seconds [15:03:10] New review: RobH; "tab removal, adding row c, trivial (hopefully)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9974 [15:03:12] Change merged: RobH; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9974 [15:03:51] New patchset: Mark Bergsma; "The existing test is broken, just try to add every time" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9982 [15:04:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9982 [15:10:21] !log torrus failed to refresh via puppet (failed refresh takes too long) so manually running the refresh/rebuild command as puppet copied the updates to the system [15:10:25] Logged the message, RobH [15:17:03] New review: Dzahn; "true, it is called 30.. in prod now." 
[operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/6593 [15:17:06] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6593 [15:17:32] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [15:19:39] New patchset: Faidon; "PXE: switch lvs1 to precise" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9983 [15:20:00] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9983 [15:20:01] New review: Faidon; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/9983 [15:20:06] Change merged: Faidon; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9983 [15:28:56] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.112 seconds [15:31:53] !log reinstalling lvs1 with precise [15:31:57] Logged the message, Master [15:35:41] PROBLEM - BGP status on cr2-pmtpa is CRITICAL: CRITICAL: host 208.80.152.197, sessions up: 6, down: 1, shutdown: 0; Peering with AS64600 not established [15:38:23] PROBLEM - SSH on lvs1 is CRITICAL: Connection refused [15:40:06] now you've done it paravoid [15:40:10] it's broken! [15:40:14] what is? 
[15:40:34] nothing should be broken, lvs1 was an inactive LVS [15:40:36] [15:45:12] !log aborting lvs1 install, partition map is not ready; putting it back to production as-is [15:45:17] Logged the message, Master [15:45:17] RECOVERY - SSH on lvs1 is OK: SSH OK - OpenSSH_5.3p1 Debian-3ubuntu7 (protocol 2.0) [15:45:35] RECOVERY - BGP status on cr2-pmtpa is OK: OK: host 208.80.152.197, sessions up: 7, down: 0, shutdown: 0 [16:10:38] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [16:35:50] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.106 seconds [17:01:47] PROBLEM - Puppet freshness on storage3 is CRITICAL: Puppet has not run in the last 10 hours [17:26:32] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [17:28:47] PROBLEM - Puppet freshness on ocg3 is CRITICAL: Puppet has not run in the last 10 hours [17:36:26] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.107 seconds [18:09:08] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [18:11:23] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [18:32:59] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.112 seconds [18:38:05] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.108 seconds [18:43:21] New patchset: Asher; "actually try myisam table recovery after a crash on the external store (this has been off since the es server migration..)" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9987 [18:43:27] notpeter: can you review ^^ [18:43:42] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9987 [18:45:44] binasher: heh [18:46:49] heh [18:47:35] ahem [18:48:14] New review: Pyoungmeister; "I hear good things about recovery" [operations/puppet] (production); V: 0 C: 1; - https://gerrit.wikimedia.org/r/9987 [18:49:37] lol [18:50:51] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9987 [18:50:54] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9987 [19:04:38] PROBLEM - MySQL Slave Delay on es1004 is CRITICAL: CRIT replication delay 256 seconds [19:07:20] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [19:08:50] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27407 bytes in 0.106 seconds [19:15:52] New patchset: Asher; "missing else" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9988 [19:16:14] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/9988 [19:16:36] New review: Asher; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/9988 [19:16:39] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/9988 [19:21:26] RECOVERY - MySQL Slave Delay on es1004 is OK: OK replication delay 0 seconds [19:54:44] PROBLEM - Puppet freshness on analytics1001 is CRITICAL: Puppet has not run in the last 10 hours [20:46:38] PROBLEM - Backend Squid HTTP on cp1001 is CRITICAL: Connection refused [21:04:47] RECOVERY - Backend Squid HTTP on cp1001 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.185 seconds [21:45:26] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused [22:12:08] RECOVERY - Backend Squid HTTP on cp1002 is OK: HTTP OK HTTP/1.0 200 OK - 27399 bytes in 0.163 seconds [22:36:44] PROBLEM - Puppet freshness on spence is CRITICAL: Puppet has not run in the last 10 hours 
[22:44:41] PROBLEM - Puppet freshness on bellin is CRITICAL: Puppet has not run in the last 10 hours [23:01:05] New review: Jeremyb; "for reference, followed up in Ib724dab91a8fd50" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/4149 [23:01:05] New review: Jeremyb; "for reference, this follows up on Ibf47b64d56deb12" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/6593 [23:27:44] PROBLEM - Puppet freshness on es1003 is CRITICAL: Puppet has not run in the last 10 hours [23:27:44] PROBLEM - Puppet freshness on maerlant is CRITICAL: Puppet has not run in the last 10 hours [23:27:44] PROBLEM - Puppet freshness on professor is CRITICAL: Puppet has not run in the last 10 hours [23:44:41] PROBLEM - Puppet freshness on sq69 is CRITICAL: Puppet has not run in the last 10 hours [23:58:56] PROBLEM - Backend Squid HTTP on cp1002 is CRITICAL: Connection refused