[00:08:17] (PS8) BryanDavis: iegreview: Create module and role for deployment [puppet] - https://gerrit.wikimedia.org/r/165231 (https://bugzilla.wikimedia.org/71597)
[00:11:32] (CR) BryanDavis: "Addressed most of Ori's concerns in patch set 8" (6 comments) [puppet] - https://gerrit.wikimedia.org/r/165231 (https://bugzilla.wikimedia.org/71597) (owner: BryanDavis)
[00:12:27] bd808: qunit passed
[00:12:56] w00t
[00:13:06] sunspots then?
[00:51:16] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 315 seconds
[00:51:45] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 335 seconds
[00:52:47] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 2 seconds
[00:53:27] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds
[01:18:27] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:16:12] !log LocalisationUpdate completed (1.25wmf2) at 2014-10-12 02:16:12+00:00
[02:16:23] Logged the message, Master
[02:28:03] !log LocalisationUpdate completed (1.25wmf3) at 2014-10-12 02:28:03+00:00
[02:28:09] Logged the message, Master
[02:32:57] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54210 bytes in 0.435 second response time
[03:30:18] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Oct 12 03:30:17 UTC 2014 (duration 30m 16s)
[03:30:28] Logged the message, Master
[03:45:14] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:54:55] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54210 bytes in 2.474 second response time
[05:24:38] PROBLEM - puppet last run on ms-be2006 is CRITICAL: CRITICAL: puppet fail
[05:44:57] RECOVERY - puppet last run on ms-be2006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[06:15:27] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:18] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: puppet fail
[06:28:27] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: puppet fail
[06:28:37] RECOVERY - Disk space on ms-be1013 is OK: DISK OK
[06:29:07] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:18] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:28] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:57] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:28] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:47] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:33:58] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: puppet fail
[06:45:18] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[06:45:38] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:45:50] RECOVERY - puppet last run on mw1118 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[06:47:48] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:47:58] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:48:58] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:52:37] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[07:04:57] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:03:08] PROBLEM - Disk space on ms-be1013 is CRITICAL: DISK CRITICAL - free space: / 1345 MB (2% inode=83%):
[08:38:35] PROBLEM - puppet last run on mw1137 is CRITICAL: CRITICAL: puppet fail
[08:57:40] RECOVERY - puppet last run on mw1137 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[09:56:03] (PS1) TTO: Remove alias codes from langlist [mediawiki-config] - https://gerrit.wikimedia.org/r/166281 (https://bugzilla.wikimedia.org/43697)
[11:10:35] (PS1) 20after4: redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283
[11:11:18] (CR) jenkins-bot: [V: -1] redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[11:15:17] (PS2) 20after4: redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283
[11:15:57] (CR) jenkins-bot: [V: -1] redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[11:16:55] (PS3) 20after4: redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283
[11:17:37] (CR) jenkins-bot: [V: -1] redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283 (owner: 20after4)
[11:20:34] (PS4) 20after4: redirector for bugzilla -> phabricator [puppet] - https://gerrit.wikimedia.org/r/166283
[11:37:36] Someone needs to poke antimony with a stick
[11:38:28] Its ganglia graphs look scary, but it didn't totally go away it seems
[11:38:34] Nemo_bis: ^
[11:38:41] Would be happy to help more, but I can't
[11:39:01] I can only tell that antimony is not responding and varnish is not to blame, but that's not worth much
[11:45:35] https://bugzilla.wikimedia.org/71974
[11:45:58] Yep, that's why I was poking
[11:46:19] everything except the actual web service looks fine according to icinga
[11:50:57] It regularly needs an apache and/or gitblit restart, according to logs. Can wait for the first op seeing it, but I hope it's actually visible somewhere they look ;)
[11:51:31] Nemo_bis: Not sure they get paged about it... but I doubt it's worth manual paging
[11:51:46] If I had shell on it, I could probably do it :S
[11:52:40] A bug report was filed, relax :)
[11:53:32] legoktm: Around?
[11:53:50] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54211 bytes in 0.383 second response time
[11:55:17] It's indeed up again :)
[11:59:36] for the 10th time ;)
[12:01:15] echo (Rant::newForLanguage( 'Java' ))->rant();
[12:01:16] :P
[12:15:48] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [500.0]
[12:20:38] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[12:35:39] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:39:48] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54232 bytes in 6.651 second response time
[12:40:19] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[12:43:58] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:46:49] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54232 bytes in 2.280 second response time
[12:54:09] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:12:18] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54211 bytes in 2.201 second response time
[13:15:28] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:16:21] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54211 bytes in 4.834 second response time
[13:19:48] !log reedy Synchronized php-1.25wmf3/includes/templates/: Unbreak user signup (duration: 00m 15s)
[13:19:56] Logged the message, Master
[13:21:38] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[13:21:55] !log reedy Synchronized php-1.25wmf3/extensions/CentralAuth/: (no message) (duration: 00m 18s)
[13:22:00] Logged the message, Master
[13:36:48] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54211 bytes in 0.663 second response time
[13:56:29] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[16:34:29] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:35:19] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54211 bytes in 0.546 second response time
[16:50:09] PROBLEM - puppet last run on cp4005 is CRITICAL: CRITICAL: puppet fail
[17:09:30] RECOVERY - puppet last run on cp4005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[17:46:39] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: puppet fail
[18:07:00] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[18:43:50] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:00:40] So has anybody from ops taken a look at Git's issues this weekend already? - See icinga-wm above and https://bugzilla.wikimedia.org/show_bug.cgi?id=71974
[19:45:39] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 63, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/3/0: down - Transit: ! Telia [10Gbps DF]
[19:47:57] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[19:59:02] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 63, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/3/0: down - Transit: ! Telia [10Gbps DF]
[20:06:44] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54237 bytes in 0.397 second response time
[20:12:03] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 63, down: 1, dormant: 0, excluded: 0, unused: 0 xe-1/3/0: down - Transit: ! Telia [10Gbps DF]
[20:13:34] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[21:17:19] (PS1) coren: Labs: allow for growing volumes [puppet] - https://gerrit.wikimedia.org/r/166351
[21:21:16] (PS2) coren: Labs: allow for growing volumes [puppet] - https://gerrit.wikimedia.org/r/166351
[21:24:44] (PS3) coren: Labs: allow for growing volumes [puppet] - https://gerrit.wikimedia.org/r/166351
[21:38:43] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: puppet fail
[21:58:02] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[22:13:47] hoo: hey
[22:14:25] legoktm: Saw my CA patch?
[22:14:37] It's flooding the hhvm log, but not as bad as I initially thought
[22:17:14] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:35:42] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[22:59:26] !paste
[23:26:02] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:27:02] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54237 bytes in 9.789 second response time
[23:32:03] PROBLEM - git.wikimedia.org on antimony is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:39:26] (CR) John Erling Blad: "The most important domain is set correct, so as far as I'm concerned this can be merged. Still I would like if someone could verify that w" (6 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/166176 (https://bugzilla.wikimedia.org/71195) (owner: Glaisher)
[23:43:05] can someone unbreak git.wikimedia.org?
[23:51:14] RECOVERY - git.wikimedia.org on antimony is OK: HTTP OK: HTTP/1.1 200 OK - 54237 bytes in 0.653 second response time
[23:56:27] (PS1) Hoo man: Make sure openstack::openstack-manager creates /a [puppet] - https://gerrit.wikimedia.org/r/166360
[23:57:03] This is the whole output of me trying to create a wikitech role... I gave up as the result was way too big for code review (in any reasonable time) :(