[00:39:13] PROBLEM - Kafka Broker Messages In Per Second on tungsten is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 47 below the confidence bounds
[00:39:16] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 47 below the confidence bounds
[00:52:36] It seems the French quickly surrendered.
[00:53:45] ?
[00:55:34] From the server admin log.
[00:56:18] RECOVERY - Kafka Broker Messages In Per Second on graphite1001 is OK: OK: No anomaly detected
[00:56:28] RECOVERY - Kafka Broker Messages In Per Second on tungsten is OK: OK: No anomaly detected
[00:58:13] jerkins
[01:07:35] PROBLEM - RAID on analytics1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:16:48] RECOVERY - RAID on analytics1004 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[02:04:53] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [500.0]
[02:05:51] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [500.0]
[02:11:02] !log l10nupdate Synchronized php-1.25wmf10/cache/l10n: (no message) (duration: 00m 01s)
[02:11:07] !log LocalisationUpdate completed (1.25wmf10) at 2014-12-07 02:11:07+00:00
[02:11:13] Logged the message, Master
[02:11:17] Logged the message, Master
[02:16:26] !log l10nupdate Synchronized php-1.25wmf11/cache/l10n: (no message) (duration: 00m 02s)
[02:16:30] Logged the message, Master
[02:16:30] !log LocalisationUpdate completed (1.25wmf11) at 2014-12-07 02:16:30+00:00
[02:16:33] Logged the message, Master
[02:17:49] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:19:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[02:32:03] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [500.0]
[02:32:48] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [500.0]
[02:44:45] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[02:46:58] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[03:41:53] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun Dec 7 03:41:52 UTC 2014 (duration 41m 51s)
[03:41:57] Logged the message, Master
[04:41:54] hi. i need help. i speak spanish.
[04:42:36] !help
[04:43:07] icinga-wm
[07:33:21] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[07:35:12] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[07:43:48] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[07:44:46] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[09:30:22] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 659
[09:35:24] RECOVERY - check_mysql on db1008 is OK: Uptime: 4565306 Threads: 49 Questions: 101066743 Slow queries: 29797 Opens: 87166 Flush tables: 2 Open tables: 64 Queries per second avg: 22.137 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[09:57:21] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [500.0]
[09:59:22] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [500.0]
[10:01:56] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[10:03:37] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 12 data above and 1 below the confidence bounds
[10:20:33] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:22:36] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[10:44:59] just a quick note - since nobody I've spoken to is entirely certain this is being dealt with. "DEAR WIKIPEDIA READERS: You're probably busy, so we'll get right to it. This week we ask our readers to help us. This week we ask our readers to protect our site:en.wikipedia.org" is horrific
[10:45:24] when searching Google. It's pot luck whether you get a correct article excerpt or the fundraising text.
[10:46:06] Erik might be working on it, but surely the fundraising banner should be disabled until it can be confirmed it's working correctly and not wrecking Google results
[10:53:39] JollyOldStNick: https://phabricator.wikimedia.org/T76743
[10:53:49] Please add your comment there.
[10:54:05] perfect, will do
[11:39:48] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[11:41:24] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[18:40:38] Anyone with "staff" rights on Wikimedia projects? I am trying to track down a possible bug with CheckUser and AbuseFilter...
[18:42:53] can someone run some commands on Labs please ?
[18:42:59] https://wikitech.wikimedia.org/wiki/User:Yuvipanda/Restarting_magnus_wdq
[18:43:14] WDQ is down
[18:44:27] GerardM-: #wikimedia-labs would be a better place to ask.
[18:45:48] yeah right as if I had not done that already
[18:46:21] service during European times is piss poor
[18:46:28] Well, -labs is the channel for this stuff. -operations is a different realm here.
[18:47:09] there are people here who can
[18:47:20] and they are also in -labs.
[18:47:26] right
[18:48:31] the only thing I want is to get this done
[18:48:41] and at some stage I do not care about niceties
[19:15:45] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[19:17:45] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[19:41:05] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[19:41:40] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[19:59:28] PROBLEM - Disk space on mw1017 is CRITICAL: DISK CRITICAL - free space: /run 46 MB (3% inode=99%):
[20:05:28] RECOVERY - Disk space on mw1017 is OK: DISK OK
[22:10:21] (PS1) Yuvipanda: Tools: Add libraries to toollabs exec_environ [puppet] - https://gerrit.wikimedia.org/r/178130
[22:10:26] matanya: ^ :)
[22:10:39] matanya: do file phab requests as well in the future, but if you want you can make the patch after too :)
[22:11:11] noted, i will, thanks. I remembered something like that, but not 100%
[22:11:15] matanya: :)
[22:11:29] matanya: actually, I fucked up. need to keep them in alphabetical order
[22:12:04] (PS2) Yuvipanda: Tools: Add libraries to toollabs exec_environ [puppet] - https://gerrit.wikimedia.org/r/178130
[22:12:18] that's better
[22:12:29] now let's wait for jenkins-bot
[22:13:24] (CR) Yuvipanda: [C: 2] Tools: Add libraries to toollabs exec_environ [puppet] - https://gerrit.wikimedia.org/r/178130 (owner: Yuvipanda)
[22:18:06] (PS1) Yuvipanda: base: Do include ssh::server on all hosts [puppet] - https://gerrit.wikimedia.org/r/178133
[22:18:44] (CR) Yuvipanda: "This accidentally dropped ssh::server include, which made puppet fail on all labs hosts :)" [puppet] - https://gerrit.wikimedia.org/r/177864 (owner: Alexandros Kosiaris)
[22:19:24] (CR) Yuvipanda: [C: 2] base: Do include ssh::server on all hosts [puppet] - https://gerrit.wikimedia.org/r/178133 (owner: Yuvipanda)
[22:20:59] matanya: all fixed, and forcing run on tools-login :) that should have your packages now, and all other machines in the next 30-40mins
[22:37:49] thanks YuviPanda !
[22:38:00] hi cajoel is the copy done?
[22:50:32] (PS1) Matanya: Tools: Add perl library to toollabs exec_environ [puppet] - https://gerrit.wikimedia.org/r/178134
[22:50:44] YuviPanda: ^ too please :)
[22:51:51] (CR) Yuvipanda: [C: 2] Tools: Add perl library to toollabs exec_environ [puppet] - https://gerrit.wikimedia.org/r/178134 (owner: Matanya)
[22:53:37] matanya: ^ puppet merged
[22:53:43] :)
[22:54:23] now need to wait for a run
[22:55:11] matanya: I can force one on -dev or somewhere if you’d like :)
[22:55:19] I really need to get a salt grain in for ‘project'
[22:55:23] hmm, maybe it already is there?
[22:55:35] no rush, i should go to sleep anyway
[22:58:23] matanya: ok! I’ll let it be then
[22:58:32] it is there now
[22:58:48] matanya: ah, cool :)
[22:59:06] now i need to fox a code not touched since 2010 ... :)
[22:59:46] fix
[23:04:13] matanya: heh, sounds fun
[23:04:25] i proved it works
[23:04:35] new i need to bring it to 2014
[23:04:39] *now
[23:10:23] YuviPanda: any policy against installing from cpan ?
[23:10:38] matanya: globally? yes. everything has to be a deb package
[23:10:51] and in my /home ?
[23:10:57] matanya: nope! feel free to do that
[23:11:14] i should just package the thing. /me lazes
[23:11:21] heh
[23:11:30] well, if you can get away with doing it in your home, I’d reccomend that
[23:11:35] same way for python I’d reccomend virtualenv
[23:11:55] software rots badly
[23:12:28] (PS1) Yuvipanda: dynamicproxy: Make no user agent 403 more verbose [puppet] - https://gerrit.wikimedia.org/r/178135
[23:59:52] PROBLEM - HHVM processes on mw1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm