[02:12:19] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3790 MB (3% inode=99%):
[02:13:53] !log LocalisationUpdate completed (1.24wmf3) at 2014-05-11 02:12:49+00:00
[02:14:04] Logged the message, Master
[02:21:19] PROBLEM - Disk space on virt0 is CRITICAL: DISK CRITICAL - free space: /a 3433 MB (3% inode=99%):
[02:23:06] !log LocalisationUpdate completed (1.24wmf4) at 2014-05-11 02:22:02+00:00
[02:23:12] Logged the message, Master
[02:41:09] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[03:00:19] RECOVERY - Disk space on virt0 is OK: DISK OK
[03:10:09] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 14.29% of data exceeded the critical threshold [500.0]
[03:11:43] !log LocalisationUpdate ResourceLoader cache refresh completed at Sun May 11 03:10:37 UTC 2014 (duration 10m 36s)
[03:11:50] Logged the message, Master
[03:18:19] Got a page...seems not meaningful, should tweak anomaly stuff to be only if paging isn't actionable right now
[03:19:05] Email only I meant
[03:24:09] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[04:02:09] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[07:54:00] (CR) Hoo man: [C: -1] "Looks good in general" (5 comments) [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/130274 (https://bugzilla.wikimedia.org/64255) (owner: Gerrit Patch Uploader)
[08:13:33] csteipp: the better gpg keyfob: https://www.assembla.com/spaces/cryptostick/wiki and the other one i spoke about: www.ftsafe.com/product/epass/epass2003
[08:31:51] ori, sumanah would love to have you in room 2 to talk about performance guidelines if you're around
[09:26:25] (PS4) Hoo man: Run rebuildEntityPerPage.php on Wikidata (once per week) [operations/puppet] - https://gerrit.wikimedia.org/r/120535
[09:29:50] (PS5) Hoo man: Run rebuildEntityPerPage.php on Wikidata (once per week) [operations/puppet] - https://gerrit.wikimedia.org/r/120535
[11:16:08] (PS1) TheDJ: Remove old WikiEditor settings [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/132793
[11:17:04] (PS2) Ori.livneh: Move rcstream server implementation to external repo [operations/puppet] - https://gerrit.wikimedia.org/r/132429
[11:18:45] (PS2) Ori.livneh: Move diamond::generic to manifests/ and lint [operations/puppet] - https://gerrit.wikimedia.org/r/132218
[11:19:17] (CR) Ori.livneh: "Ok. I'll amend it to make the changes to the parameter types to the file in its current location." [operations/puppet] - https://gerrit.wikimedia.org/r/132218 (owner: Ori.livneh)
[11:19:31] (PS4) Ori.livneh: Tidy ::applicationserver & ::applicationserver::pybal_check [operations/puppet] - https://gerrit.wikimedia.org/r/132217
[11:19:39] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection timed out
[11:20:06] (PS1) Hoo man: Remove long absent Wikidata crons from puppet [operations/puppet] - https://gerrit.wikimedia.org/r/132794
[11:20:08] (PS1) Hoo man: Run Wikidata maint. scripts as apache instead of mwdeploy [operations/puppet] - https://gerrit.wikimedia.org/r/132795
[11:20:09] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection timed out
[11:20:09] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection timed out
[11:20:09] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection timed out
[11:20:09] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection timed out
[11:20:09] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection timed out
[11:20:09] PROBLEM - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is CRITICAL: Connection timed out
[11:20:25] ottomata: ^
[11:21:59] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.053 second response time
[11:21:59] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.078 second response time
[11:21:59] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.071 second response time
[11:21:59] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.080 second response time
[11:21:59] RECOVERY - LVS HTTP IPv4 on rendering.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 68115 bytes in 0.269 second response time
[11:22:09] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 21.43% of data exceeded the critical threshold [500.0]
[11:22:29] (CR) Ottomata: [C: 2 V: 2] Remove long absent Wikidata crons from puppet [operations/puppet] - https://gerrit.wikimedia.org/r/132794 (owner: Hoo man)
[11:22:29] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.059 second response time
[11:22:58] So what in the heck was that about
[11:22:59] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 443 bytes in 0.066 second response time
[11:24:31] https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Swift%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1399807327&g=network_report&z=large
[11:24:44] (CR) Ottomata: [C: 2 V: 2] Run Wikidata maint. scripts as apache instead of mwdeploy [operations/puppet] - https://gerrit.wikimedia.org/r/132795 (owner: Hoo man)
[11:32:48] Wow, thanks. Serious spike
[11:33:09] swift's network got saturated
[11:33:26] there seems to be a more or less equivalent network spike on imagescalers
[11:34:22] so this suggests the imagescaler cluster requested one or more large images from swift at about the same time
[11:34:37] with multiple requests, since all of the imagescalers show this spike and all of the swift proxies as well
[11:35:19] PoolCounter :)
[11:36:15] nypl again
[11:36:24] GET /v1/AUTH_mw/wikipedia-commons-local-public.29/2/29/Bronx,_V._12,_Double_Page_Plate_No._273_%28Map_bounded_by_Whiting_Ave.,_Ewen_Ave.,_Warren_Ave.,_Hudson_River%29_NYPL2001533.tiff:
[11:36:28] ffs
[11:37:00] Multiple requests for the same file to scale?
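[Editor's note on the "PoolCounter :)" remark at 11:35:19: the usual fix for this pattern is to cap how many workers may render the same original at once, so a burst of thumbnail requests for one large TIFF does not have every imagescaler pull the full-size file from Swift simultaneously. Below is a minimal sketch of that idea, assuming an in-process semaphore and a hypothetical render_thumbnail() helper; the real PoolCounter is a separate network service shared by all scalers, not the per-process state shown here.]

```python
import threading
from collections import defaultdict

def render_thumbnail(original, width):
    # Placeholder for the real scaler invocation; only here so the sketch
    # is self-contained and runnable.
    return f"{original} @ {width}px"

class PoolCounterLike:
    """Cap concurrent renders per original file, PoolCounter-style."""

    def __init__(self, limit=2):
        self._guard = threading.Lock()
        # One bounded semaphore per key (original file name). `limit` is a
        # made-up value, not whatever production uses.
        self._sems = defaultdict(lambda: threading.BoundedSemaphore(limit))

    def run(self, key, work):
        with self._guard:
            sem = self._sems[key]
        with sem:  # blocks while `limit` renders of this key are in flight
            return work()

pool = PoolCounterLike(limit=2)

def scale(original, width):
    # Concurrent requests for the same original queue up here instead of
    # each independently fetching the multi-hundred-MB source from Swift.
    return pool.run(original, lambda: render_thumbnail(original, width))

print(scale("NYPL2001533.tiff", 220))
```

[In production the counter has to be shared across all imagescaler hosts, which is why PoolCounter is a standalone service rather than in-process state like this.]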
[11:37:09] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[11:38:28] 11/May/2014/11/36/06 PUT /v1/AUTH_mw/wikipedia-commons-local-public.bc/b/bc/Manhattan%252C_V._11%252C_Plate_No._2_%2528Map_bounded_by_12th_Ave.%252C_W._133rd_St.%252C_Broadway%252C_W._130th_St.%2529_NYPL1995957.tiff
[11:38:36] that looks like an upload doesn't it
[11:38:55] must be with the put
[11:39:08] https://commons.wikimedia.org/wiki/Special:Contributions/F%C3%A6
[11:41:00] the good news is, it has been going on for hours
[11:41:43] oh wait, these were metadata updates
[11:42:22] https://commons.wikimedia.org/wiki/Special:ListFiles/F%C3%A6
[11:43:16] 50 in the last couple of ideas
[11:43:20] ideas? hours
[11:52:19] # grep -hr filename * | grep NYPL | sort -u |wc -l
[11:52:20] 7396
[11:52:22] yeah okay
[12:05:53] chasemp: placed a workaround and sent an update to ops@
[12:06:02] going back to the hackathon stuff now
[12:07:13] Awesome thank you for keeping me in the loop, even though I'm not too useful on it
[12:33:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:29:07 2014
[12:35:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:29:07 2014
[12:37:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:29:07 2014
[12:39:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:29:07 2014
[12:40:34] we get it
[12:41:51] :-)
[12:41:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:29:07 2014
[12:42:09] * andrewbogott is looking but only has five mins
[12:43:08] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Sun May 11 12:43:02 UTC 2014
[12:43:46] ooh, it looks like it's my fault too :(
[12:43:51] hah!
[12:44:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:43:02 2014
[12:46:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:43:02 2014
[12:47:59] The fact that it's telling us every five minutes though… I don't think that part is my fault :)
[12:48:48] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Sun May 11 12:48:41 UTC 2014
[12:50:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:48:41 2014
[12:52:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:48:41 2014
[12:54:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:48:41 2014
[12:55:08] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 21.43% of data exceeded the critical threshold [500.0]
[12:56:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:48:41 2014
[12:58:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 12:48:41 2014
[13:00:08] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Sun May 11 13:00:03 UTC 2014
[13:01:58] PROBLEM - Puppet freshness on hooft is CRITICAL: Last successful Puppet run was Sun May 11 13:00:03 2014
[13:08:04] RECOVERY - Puppet freshness on hooft is OK: puppet ran at Sun May 11 13:07:59 UTC 2014
[13:11:08] (PS1) Yuvipanda: toollabs: Add gdal-bin to the exec environment [operations/puppet] - https://gerrit.wikimedia.org/r/132803
[13:11:12] andrewbogott: Coren ^ minor patch?
[13:12:20] YuviPanda: is there a bug for that? Might be good to keep a trail of what/why
[13:12:34] andrewbogott: yeah, moment
[13:12:51] (PS2) Yuvipanda: toollabs: Add gdal-bin to the exec environment [operations/puppet] - https://gerrit.wikimedia.org/r/132803 (https://bugzilla.wikimedia.org/65123)
[13:12:52] andrewbogott: done
[13:13:05] that was the only missing package I think
[13:13:34] YuviPanda: I think we're going to give the wikiatlas people their own project, so this is moot ftm
[13:13:59] andrewbogott: no I am sitting next to them right now and they don't need their own project.
[13:14:03] this was the only thing missing
[13:14:19] andrewbogott: plus once I explained that they will have to spend time administering it themselves and it is much easier with tools they were happy with tools
[13:14:24] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% data above the threshold [250.0]
[13:14:26] let me comment on the bug as well
[13:14:27] ok then :)
[13:15:26] (CR) Andrew Bogott: [C: 2] toollabs: Add gdal-bin to the exec environment [operations/puppet] - https://gerrit.wikimedia.org/r/132803 (https://bugzilla.wikimedia.org/65123) (owner: Yuvipanda)
[13:16:05] andrewbogott: ty
[14:06:31] akosiaris: the vm is named trusty-test-puppetmaster
[14:21:56] !log power cycling asw-d5-eqiad
[14:22:03] Logged the message, Master
[14:26:24] RECOVERY - Host mw1208 is UP: PING OK - Packet loss = 0%, RTA = 0.49 ms
[14:26:24] RECOVERY - Host mw1201 is UP: PING OK - Packet loss = 0%, RTA = 1.73 ms
[14:26:24] RECOVERY - Host mw1209 is UP: PING OK - Packet loss = 0%, RTA = 0.74 ms
[14:26:24] RECOVERY - Host mw1203 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms
[14:26:34] RECOVERY - Host mw1210 is UP: PING OK - Packet loss = 0%, RTA = 1.12 ms
[14:26:44] RECOVERY - check configured eth on lvs1002 is OK: NRPE: Unable to read output
[14:26:44] RECOVERY - Host mw1202 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms
[14:26:44] RECOVERY - check configured eth on lvs1003 is OK: NRPE: Unable to read output
[14:27:04] RECOVERY - check configured eth on lvs1001 is OK: NRPE: Unable to read output
[14:33:14] PROBLEM - Puppet freshness on mw1203 is CRITICAL: Last successful Puppet run was Sat May 10 08:23:53 2014
[14:45:24] godog: here?
[14:54:14] RECOVERY - Puppet freshness on mw1203 is OK: puppet ran at Sun May 11 14:54:07 UTC 2014
[14:55:17] (CR) Andrew Bogott: [C: 2] Include the labs_initial_content role in labs_vagrant. [operations/puppet] - https://gerrit.wikimedia.org/r/132721 (owner: Andrew Bogott)
[15:07:29] YuviPanda: https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help#Connecting_to_OSM_via_the_official_CLI_PostgreSQL
[15:14:08] akosiaris: did you have time to think of role::firewall ?
[15:14:28] paravoid: What does 'brief' mean?
[15:14:52] (in your e-mail to multimedia@lists.wm.org)
[15:16:17] Oh. Log says two minutes.
[15:17:29] twkozlowski: nice duration to report on the tech news for once ;)
[15:19:19] (PS1) Alexandros Kosiaris: Create a wikimaps_atlas postgis database [operations/puppet] - https://gerrit.wikimedia.org/r/132813 (https://bugzilla.wikimedia.org/63382)
[15:21:49] Nemo_bis: My thought exactly, but that'll have to wait till next week
[15:22:37] 'For two minutes on May 11, there were problems with image scaling due to a high server load.'
[15:23:34] !log reedy synchronized php-1.24wmf4/thumb.php
[15:23:40] Logged the message, Master
[15:24:37] twkozlowski: I think you can rewrite it to be more positive :P
[15:24:38] (CR) Alexandros Kosiaris: [C: 2] Create a wikimaps_atlas postgis database [operations/puppet] - https://gerrit.wikimedia.org/r/132813 (https://bugzilla.wikimedia.org/63382) (owner: Alexandros Kosiaris)
[15:25:57] 'For just two minutes on May 11, there were unnoticeable problems with scaling of a small number of files due to an excessively high server load.'
[15:26:12] !log reedy synchronized php-1.24wmf3/thumb.php
[15:26:20] Logged the message, Master
[15:30:59] (PS1) Alexandros Kosiaris: Fix a typo (planembad=>planemad) [operations/puppet] - https://gerrit.wikimedia.org/r/132814
[15:32:17] (CR) Alexandros Kosiaris: [C: 2 V: 2] Fix a typo (planembad=>planemad) [operations/puppet] - https://gerrit.wikimedia.org/r/132814 (owner: Alexandros Kosiaris)
[15:43:23] (CR) Ori.livneh: [C: 2] Tidy ::applicationserver & ::applicationserver::pybal_check [operations/puppet] - https://gerrit.wikimedia.org/r/132217 (owner: Ori.livneh)
[15:50:49] (CR) JanZerebecki: "There should be no problem with the possible slight increase in CPU load as the affected clusters aren't AFAIK utilized to that extend:" [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[15:57:44] PROBLEM - NTP on mw1208 is CRITICAL: NTP CRITICAL: Offset unknown
[16:03:44] RECOVERY - NTP on mw1208 is OK: NTP OK: Offset 0.005566000938 secs
[16:12:27] (CR) JanZerebecki: "In case the HSTS causes undesirable effects (more HTTPS users than could be expected) it can be reversed with max-age=0." [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[16:20:57] (CR) JanZerebecki: "This needs a files/ssl/dhparam.pem in the private repository generated by using: openssl dhparam 2048" [operations/puppet] - https://gerrit.wikimedia.org/r/132393 (https://bugzilla.wikimedia.org/53259) (owner: JanZerebecki)
[16:47:14] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sun May 11 13:46:58 2014
[16:47:34] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sun May 11 16:47:31 UTC 2014
[18:19:34] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[20:00:44] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[22:50:44] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
[23:00:45] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 9 below the confidence bounds
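[Editor's note on the HSTS review comments at 16:12 and 16:20 on https://gerrit.wikimedia.org/r/132393: Strict-Transport-Security is just a response header sent over HTTPS, and the rollback described works because re-serving the header with max-age=0 tells browsers to forget the stored policy. The actual change is TLS/web-server configuration in operations/puppet (plus the dhparam.pem generated with the quoted `openssl dhparam 2048` command); the sketch below only illustrates the header mechanics as a small WSGI middleware, with an illustrative max-age rather than whatever the patch sets.]

```python
# Illustrative only: the max-age value and app names are made up, not taken
# from the Gerrit change.
HSTS_MAX_AGE = 31536000  # one year; serve 0 later to make browsers drop the policy

def add_hsts(app, max_age=HSTS_MAX_AGE):
    """Wrap a WSGI app so HTTPS responses carry Strict-Transport-Security."""
    def wrapped(environ, start_response):
        def sr(status, headers, exc_info=None):
            # The header is only meaningful when delivered over HTTPS.
            if environ.get("wsgi.url_scheme") == "https":
                headers = list(headers) + [
                    ("Strict-Transport-Security", "max-age=%d" % max_age)
                ]
            return start_response(status, headers, exc_info)
        return app(environ, sr)
    return wrapped

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]

application = add_hsts(demo_app)
# add_hsts(demo_app, max_age=0) is the "reversed with max-age=0" rollback path.
```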