[00:03:50] (CR) Ori.livneh: [C: 1] "Woot! Thank you so much!" [operations/puppet] - https://gerrit.wikimedia.org/r/152960 (owner: BBlack)
[00:13:57] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:19:57] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Sat Aug 9 00:19:44 UTC 2014
[00:30:37] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Epic puppet fail
[00:32:18] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 15 MB (3% inode=99%):
[00:42:18] PROBLEM - Disk space on lanthanum is CRITICAL: DISK CRITICAL - free space: /var/lib/jenkins-slave/tmpfs 8 MB (1% inode=99%):
[00:44:18] RECOVERY - Disk space on lanthanum is OK: DISK OK
[00:50:37] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[01:47:57] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: /mnt/tmpfs 5.0GB (= 5.0GB critical): /srv/deployment/ocg/output 1449964954B: /srv/deployment/ocg/postmortem 948296B: ocg_job_status 6060 msg: ocg_render_job_queue 0 msg
[02:15:17] PROBLEM - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: /mnt/tmpfs 0 MB (0% inode=99%):
[02:20:06] !log LocalisationUpdate completed (1.24wmf15) at 2014-08-09 02:19:03+00:00
[02:20:14] Logged the message, Master
[02:34:58] (PS4) Ori.livneh: apache::def: port to env-{enabled,disabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152868
[02:35:20] (PS1) Ori.livneh: HHVM: ensure runtime dirs are created before the service [operations/puppet] - https://gerrit.wikimedia.org/r/153010
[02:36:13] (PS5) Ori.livneh: apache::def: port to env-{enabled,disabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152868
[02:37:56] !log LocalisationUpdate completed (1.24wmf16) at 2014-08-09 02:36:52+00:00
[02:38:04] Logged the message, Master
[02:43:07] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: /mnt/tmpfs 33.3GB (= 5.0GB critical): /srv/deployment/ocg/output 1345580119B: /srv/deployment/ocg/postmortem 948296B: ocg_job_status 6060 msg: ocg_render_job_queue 0 msg
[02:45:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3885 MB (10% inode=94%): /srv 509149 MB (35% inode=99%):
[02:50:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3799 MB (10% inode=94%): /srv 507313 MB (34% inode=99%):
[02:55:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3698 MB (10% inode=94%): /srv 505561 MB (34% inode=99%):
[03:00:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3611 MB (10% inode=94%): /srv 503773 MB (34% inode=99%):
[03:05:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3524 MB (9% inode=94%): /srv 501945 MB (34% inode=99%):
[03:10:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3460 MB (9% inode=94%): /srv 500125 MB (34% inode=99%):
[03:13:47] PROBLEM - Puppet freshness on db1006 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 01:13:01 UTC
[03:15:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3381 MB (9% inode=94%): /srv 498409 MB (34% inode=99%):
[03:20:08] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3325 MB (9% inode=94%): /srv 496585 MB (34% inode=99%):
[03:21:13] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Aug 9 03:20:07 UTC 2014 (duration 20m 6s)
[03:21:19] Logged the message, Master
[03:25:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3248 MB (9% inode=94%): /srv 494817 MB (34% inode=99%):
[03:30:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3167 MB (8% inode=94%): /srv 492989 MB (33% inode=99%):
[03:35:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3094 MB (8% inode=94%): /srv 491233 MB (33% inode=99%):
[03:40:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 3005 MB (8% inode=94%): /srv 489397 MB (33% inode=99%):
[03:45:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2768 MB (7% inode=94%): /srv 487517 MB (33% inode=99%):
[03:50:08] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2605 MB (7% inode=94%): /srv 485745 MB (33% inode=99%):
[03:53:07] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Sat Aug 9 03:52:57 UTC 2014
[03:55:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2291 MB (6% inode=94%): /srv 484129 MB (33% inode=99%):
[04:00:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2249 MB (6% inode=94%): /srv 483205 MB (33% inode=99%):
[04:05:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2249 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:10:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2249 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:11:10] (CR) Ori.livneh: "* injected_stack_trace is gone (see 2b860aa71f34bf426d252faa1f3987eb2aadd9b7)" [operations/puppet] - https://gerrit.wikimedia.org/r/148544 (owner: MaxSem)
[04:15:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2248 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:15:50] (PS2) Ori.livneh: HHVM: simplify logging options [operations/puppet] - https://gerrit.wikimedia.org/r/148544 (owner: MaxSem)
[04:16:37] (CR) Ori.livneh: [C: 2 V: 2] HHVM: simplify logging options [operations/puppet] - https://gerrit.wikimedia.org/r/148544 (owner: MaxSem)
[04:20:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2248 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:25:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2248 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:30:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2248 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:35:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2247 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:40:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2247 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:45:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2247 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:50:08] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2247 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:50:17] PROBLEM - puppet last run on ms-be3001 is CRITICAL: CRITICAL: Epic puppet fail
[04:53:47] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 02:53:27 UTC
[04:55:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2247 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[04:57:20] Hi, we have a rogue icinga alert in frack, I'm hoping someone can acknowledge for us?
[04:57:55] We nearly filled the disk on our staging box, but it's stable now, and non - critical. I just need to turn the alert off...
[05:00:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2246 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:00:55] That one. ^^ needs acknowledge :)
[05:05:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2246 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:10:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2246 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:10:17] RECOVERY - puppet last run on ms-be3001 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[05:15:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2246 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:20:17] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2245 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:25:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2245 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:30:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2245 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:34:07] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Sat Aug 9 05:33:57 UTC 2014
[05:35:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2245 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:40:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2244 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:45:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2244 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:47:47] PROBLEM - puppet last run on cp4004 is CRITICAL: CRITICAL: Epic puppet fail
[05:50:08] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2244 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[05:55:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2244 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[06:00:07] PROBLEM - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2243 MB (6% inode=94%): /srv 483252 MB (33% inode=99%):
[06:01:13] <_joe_> I will ack this
[06:02:58] ACKNOWLEDGEMENT - check_disk on lutetium is CRITICAL: DISK CRITICAL - free space: / 2243 MB (6% inode=94%): /srv 483252 MB (33% inode=99%): Giuseppe Lavagetto awight We nearly filled the disk on our staging box, but its stable now, and non - critical. I just need to turn the alert off...
[06:08:47] RECOVERY - puppet last run on cp4004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[06:28:27] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:37] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:08] PROBLEM - puppet last run on db1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:17] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:17] PROBLEM - puppet last run on lvs1005 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:17] PROBLEM - puppet last run on elastic1008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:37] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:29:37] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:47] PROBLEM - puppet last run on analytics1030 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:57] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:58] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:58] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:07] PROBLEM - puppet last run on cp1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:08] RECOVERY - check_disk on lutetium is OK: DISK OK - free space: / 12513 MB (35% inode=94%): /srv 483257 MB (33% inode=99%):
[06:30:28] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:38] PROBLEM - puppet last run on searchidx1001 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:40:58] (PS3) Giuseppe Lavagetto: switch rcstream to source hash LVS scheduling [operations/puppet] - https://gerrit.wikimedia.org/r/152960 (owner: BBlack)
[06:41:17] (CR) Giuseppe Lavagetto: [C: 2] switch rcstream to source hash LVS scheduling [operations/puppet] - https://gerrit.wikimedia.org/r/152960 (owner: BBlack)
[06:42:07] <_joe_> ori: whenever you're around, I'm merging ^^, we can test rcstream to work correctly with two backends and this LB config
[06:44:37] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[06:44:58] RECOVERY - puppet last run on mw1009 is OK: OK: Puppet is currently enabled, last run 1 seconds ago with 0 failures
[06:45:17] RECOVERY - puppet last run on lvs1005 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:45:17] RECOVERY - puppet last run on elastic1008 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[06:45:27] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:45:57] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[06:45:58] RECOVERY - puppet last run on mw1173 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:46:07] RECOVERY - puppet last run on cp1061 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:46:07] RECOVERY - puppet last run on db1002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:46:17] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[06:46:17] RECOVERY - Disk space on ocg1001 is OK: DISK OK
[06:46:27] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:46:48] RECOVERY - puppet last run on analytics1030 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:47:17] RECOVERY - OCG health on ocg1001 is OK: OK: /mnt/tmpfs 0B: /srv/deployment/ocg/output 1345580119B: /srv/deployment/ocg/postmortem 1058122B: ocg_job_status 6066 msg: ocg_render_job_queue 0 msg
[06:48:38] (PS1) Springle: replicate information_schema_p from sanitarium to labsdb [operations/puppet] - https://gerrit.wikimedia.org/r/153015
[06:49:37] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 1 failures
[06:49:42] (CR) Springle: [C: 2] replicate information_schema_p from sanitarium to labsdb [operations/puppet] - https://gerrit.wikimedia.org/r/153015 (owner: Springle)
[06:50:38] RECOVERY - puppet last run on searchidx1001 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:54:47] (PS1) Springle: MariaDB events for sanitarium slaves [operations/software] - https://gerrit.wikimedia.org/r/153017
[06:56:01] (CR) Springle: [C: 2] MariaDB events for sanitarium slaves [operations/software] - https://gerrit.wikimedia.org/r/153017 (owner: Springle)
[07:00:47] PROBLEM - Puppet freshness on db1010 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 05:00:24 UTC
[07:02:37] (PS1) Springle: No-auto-cleanup mode for osc_host.sh when wrapping pt-online-schema-change, to allow manual intervention and sanity checks. Useful for tables that are more likely to cause metadata locking issues, eg, revision, page, anything-enwiki. [operations/software] - https://gerrit.wikimedia.org/r/153019
[07:03:29] (CR) Springle: [C: 2] No-auto-cleanup mode for osc_host.sh when wrapping pt-online-schema-change, to allow manual intervention and sanity checks. Useful for table [operations/software] - https://gerrit.wikimedia.org/r/153019 (owner: Springle)
[07:06:37] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[07:18:41] <_joe_> hey sean
[07:19:30] <_joe_> woking on the weekend?
[07:35:02] _joe_: yeah, rainy day :)
[07:35:11] you too huh
[07:36:11] <_joe_> oh I'm just daughter-less and my wife was still sleeping
[07:36:25] <_joe_> so, what can a nerd do?
[07:36:29] :D
[07:38:12] "Code-Review-3" that does not look right : https://gerrit.wikimedia.org/r/#/c/148619/
[07:40:38] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Sat Aug 9 07:40:36 UTC 2014
[07:52:41] (PS1) Springle: MariaDB events for analytics eventlogging [operations/software] - https://gerrit.wikimedia.org/r/153021
[07:54:20] (CR) Springle: [C: 2] MariaDB events for analytics eventlogging [operations/software] - https://gerrit.wikimedia.org/r/153021 (owner: Springle)
[07:57:06] (PS1) Springle: loop bounds bug [operations/software] - https://gerrit.wikimedia.org/r/153022
[07:57:31] (CR) Springle: [C: 2] loop bounds bug [operations/software] - https://gerrit.wikimedia.org/r/153022 (owner: Springle)
[08:00:36] (PS1) Springle: MariaDB events for labsdb100[123] [operations/software] - https://gerrit.wikimedia.org/r/153023
[08:01:18] (CR) Springle: [C: 2] MariaDB events for labsdb100[123] [operations/software] - https://gerrit.wikimedia.org/r/153023 (owner: Springle)
[08:13:26] (CR) Gage: [C: 2 V: 2] Deb for logstash-gelf.jar: liblogstash-gelf-java [operations/debs/logstash-gelf] - https://gerrit.wikimedia.org/r/151615 (owner: Gage)
[08:15:11] (PS1) Springle: external script replaced by events [operations/software] - https://gerrit.wikimedia.org/r/153024
[08:15:46] (CR) Springle: [C: 2] external script replaced by events [operations/software] - https://gerrit.wikimedia.org/r/153024 (owner: Springle)
[08:52:37] (Restored) Yuvipanda: Tools: Add some i386 compat packages to exec nodes [operations/puppet] - https://gerrit.wikimedia.org/r/125241 (owner: Yuvipanda)
[08:52:56] (PS3) Yuvipanda: Tools: Add some i386 compat packages to exec nodes [operations/puppet] - https://gerrit.wikimedia.org/r/125241
[09:37:47] PROBLEM - Puppet freshness on db1009 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 07:37:33 UTC
[09:47:57] PROBLEM - puppet last run on amssq46 is CRITICAL: CRITICAL: Epic puppet fail
[09:56:26] andrewbogott_afk: re https://gerrit.wikimedia.org/r/#/c/140881/ yes, its more important though that it still works for you to auth at https://ishmael.wikimedia.org/
[10:08:57] RECOVERY - puppet last run on amssq46 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[10:12:51] (CR) Filippo Giunchedi: "a comparison of the two approaches is outlined here:" [operations/puppet] - https://gerrit.wikimedia.org/r/151605 (owner: Giuseppe Lavagetto)
[10:14:09] (CR) Filippo Giunchedi: [C: 1] apache::def: port to env-{enabled,disabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152868 (owner: Ori.livneh)
[10:23:00] (CR) Filippo Giunchedi: "LGTM, but will need some time next week to think more about it" [operations/puppet] - https://gerrit.wikimedia.org/r/151869 (owner: Giuseppe Lavagetto)
[10:25:37] (PS2) Dzahn: Revert "apache::monitoring: add diamond support; ensure mod_status is enabled" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis)
[10:26:17] (CR) Dzahn: [C: 1] "the reverted change breaks beta" [operations/puppet] - https://gerrit.wikimedia.org/r/152943 (owner: BryanDavis)
[10:26:48] (PS3) Hoo man: Allow "hoo" to sudo into datasets [operations/puppet] - https://gerrit.wikimedia.org/r/152724
[10:26:50] (PS1) Hoo man: Remove brion from the "datasets" group [operations/puppet] - https://gerrit.wikimedia.org/r/153034
[10:27:32] (PS2) Hoo man: Remove brion from the "dataset-admin" group [operations/puppet] - https://gerrit.wikimedia.org/r/153034
[10:31:24] (CR) Dzahn: [C: 2] "Brion confirmed at Wikimania" [operations/puppet] - https://gerrit.wikimedia.org/r/153034 (owner: Hoo man)
[10:40:48] (CR) Brion VIBBER: "Note that we've moved mobile non-store & beta release files to the new 'releases.wikimedia.org' so I no longer need access on the old data" [operations/puppet] - https://gerrit.wikimedia.org/r/153034 (owner: Hoo man)
[10:43:01] (CR) Dzahn: "vumi was a project by Patrick and Jeremy from South Africa, we gotta figure out who, if any, supports this nowadays and if it's still even" [operations/puppet] - https://gerrit.wikimedia.org/r/117673 (owner: Matanya)
[10:53:08] mutante: https://gerrit.wikimedia.org/r/#/c/125241/
[10:53:47] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 08:52:42 UTC
[10:58:56] Coren_WM2014: ping?
[10:59:26] (CR) Dzahn: graphite: add access for ldap groups ops and nda (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki)
[11:09:39] (CR) Dzahn: [C: 1] Tools: Use apt::repository instead of file resources [operations/puppet] - https://gerrit.wikimedia.org/r/123882 (owner: Tim Landscheidt)
[11:13:37] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Sat Aug 9 11:13:33 UTC 2014
[11:15:16] (CR) Dzahn: "deprecated by upcoming phabricator or still worth it?" [operations/puppet] - https://gerrit.wikimedia.org/r/80577 (owner: Faidon Liambotis)
[11:19:29] (CR) Dzahn: graphite: add access for ldap groups ops and nda (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki)
[11:23:55] _joe_: here now!
[11:25:08] (PS4) Dzahn: Allow "hoo" to sudo into datasets [operations/puppet] - https://gerrit.wikimedia.org/r/152724 (owner: Hoo man)
[11:25:36] (CR) Ori.livneh: [C: 2] apache::def: port to env-{enabled,disabled} [operations/puppet] - https://gerrit.wikimedia.org/r/152868 (owner: Ori.livneh)
[11:28:11] ori: where are you?
[11:28:35] yuvipanda: barbican hall
[11:28:52] upper level
[11:29:20] Ori oh OK. I'll find you in a bit
[11:31:24] cool
[11:32:36] !log added Ryan Lange to NDA LDAP group
[11:32:41] Lane,, duh
[11:32:42] Logged the message, Master
[11:37:17] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Sat Aug 9 11:37:13 UTC 2014
[12:06:37] (PS1) Dzahn: introduce labmon-roots admin group [operations/puppet] - https://gerrit.wikimedia.org/r/153053
[12:07:18] (CR) jenkins-bot: [V: -1] introduce labmon-roots admin group [operations/puppet] - https://gerrit.wikimedia.org/r/153053 (owner: Dzahn)
[12:12:47] (PS2) Dzahn: introduce labmon-roots admin group [operations/puppet] - https://gerrit.wikimedia.org/r/153053
[12:15:02] (PS1) Dzahn: add yuvipanda to labmon-roots admin group [operations/puppet] - https://gerrit.wikimedia.org/r/153054
[12:18:25] yuvipanda: ^
[12:18:36] that is before the servers are even deployed though :p
[12:30:46] legoktm: 208.80.154.0/24
[12:31:11] (PS1) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[12:31:45] legoktm: 198.35.26.0/23
[12:32:17] legoktm: 185.15.56.0/22 , 2a02:ec80::/32
[12:32:36] (PS2) Ori.livneh: HHVM: ensure runtime dirs are created before the service [operations/puppet] - https://gerrit.wikimedia.org/r/153010
[12:32:42] (Abandoned) Ori.livneh: HHVM: ensure runtime dirs are created before the service [operations/puppet] - https://gerrit.wikimedia.org/r/153010 (owner: Ori.livneh)
[12:33:10] legoktm: 91.198.174.0/24
[12:33:42] 2620:0:860::/46
[12:34:36] (PS2) Ori.livneh: beta: Fix IP mapping for stream.wmflabs.org [operations/puppet] - https://gerrit.wikimedia.org/r/152791 (owner: Tim Landscheidt)
[12:34:51] (CR) Ori.livneh: [C: 2 V: 2] beta: Fix IP mapping for stream.wmflabs.org [operations/puppet] - https://gerrit.wikimedia.org/r/152791 (owner: Tim Landscheidt)
[12:36:15] mutante: https://noc.wikimedia.org/conf/highlight.php?file=squid.php
[12:36:51] Reedy: hah, that vs. manifests/network.pp external networks
[12:41:12] (PS1) Dzahn: add eqiad network to external networks [operations/puppet] - https://gerrit.wikimedia.org/r/153062
[12:43:18] mutante: https://www.mediawiki.org/wiki/Git/Reviewers
[12:46:22] legoktm: so cool, adding myself
[12:46:28] PROBLEM - MySQL Processlist on db1056 is CRITICAL: Timeout while attempting connection
[12:47:22] (CR) JanZerebecki: graphite: add access for ldap groups ops and nda (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki)
[12:48:07] PROBLEM - Disk space on db1056 is CRITICAL: Timeout while attempting connection
[12:48:17] PROBLEM - MySQL Idle Transactions on db1056 is CRITICAL: Timeout while attempting connection
[12:48:38] PROBLEM - RAID on db1056 is CRITICAL: Timeout while attempting connection
[12:49:07] PROBLEM - MySQL disk space on db1056 is CRITICAL: Timeout while attempting connection
[12:49:07] RECOVERY - MySQL Idle Transactions on db1056 is OK: OK longest blocking idle transaction sleeps for seconds
[12:49:37] RECOVERY - RAID on db1056 is OK: OK: optimal, 1 logical, 2 physical
[12:49:41] springle: ^
[12:49:43] that is commons
[12:49:51] api server
[12:49:53] we have reports coming in
[12:49:57] RECOVERY - MySQL disk space on db1056 is OK: DISK OK
[12:50:07] commons looks to be fully down
[12:50:07] RECOVERY - Disk space on db1056 is OK: DISK OK
[12:50:17] it's just one slave though
[12:50:27] RECOVERY - MySQL Processlist on db1056 is OK: OK 8 unauthenticated, 0 locked, 1 copy to table, 2 statistics
[12:50:31] it's up
[12:51:37] wfm again
[12:52:17] Slow Q (-1min): 774
[12:53:05] jzerebecki: https://noc.wikimedia.org/conf/highlight.php?file=db-eqiad.php
[12:53:24] recovering afaict , back to Slow Q (-1min): 6
[12:54:28] jzerebecki: also http://noc.wikimedia.org/dbtree/
[12:54:48] (CR) Filippo Giunchedi: add (and use) shell_exports() and apache::vars (3 comments) [operations/puppet] - https://gerrit.wikimedia.org/r/153058 (owner: Ori.livneh)
[12:58:19] (CR) Dzahn: [C: -1] "oops, apergos says these aren't the right servers to achieve what you wanted to achieve" [operations/puppet] - https://gerrit.wikimedia.org/r/152724 (owner: Hoo man)
[13:00:59] <_joe_> ori: still here?
[13:01:15] he's talking to robla atm
[13:03:56] <_joe_> Reedy: be my spycam :P
[13:12:36] bd808|BUFFER: going to merge https://gerrit.wikimedia.org/r/#/c/151099/ if it won't overwrite anything on deployment-salt
[13:18:01] db1056 was hit by a wave of identical SELECT /* LocalRepo::findFiles 127.0.0.1 */ from wikiadmin
[13:18:06] odd
[13:18:18] <^d> Wonder if that's Aaron's new LocalRepo script?
[13:18:49] <_joe_> aren't we supposed not to release anything during WM?
[13:19:01] <^d> We've been breaking that rule all week :p
[13:19:02] <_joe_> :P
[13:19:09] <_joe_> oh nice
[13:19:18] <_joe_> next time, hhvm in prod
[13:21:14] springle: which tables btw?
[13:21:27] <^d> springle: https://gerrit.wikimedia.org/r/#/c/152066/
[13:21:32] <^d> That's the new script.
[13:21:37] <^d> (Seems a likely culprit)
[13:21:51] godog: image
[13:22:02] yeah what ^d said
[13:23:03] hmm
[13:23:24] would that maintenance script be sending traffic via mwNNNN nodes?
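The external network prefixes pasted to legoktm between 12:30 and 12:33 can be tested against a given address with Python's standard `ipaddress` module. This is a minimal sketch, assuming those six prefixes are the whole list under discussion; the Puppet-side list mentioned at 12:36 is `manifests/network.pp`, which is not reproduced here. `EXTERNAL_NETWORKS` and `is_external` are hypothetical names.

```python
import ipaddress

# Prefixes as pasted in channel at 12:30-12:33. This is an illustrative
# snapshot, not an authoritative copy of manifests/network.pp.
EXTERNAL_NETWORKS = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "208.80.154.0/24",
        "198.35.26.0/23",
        "185.15.56.0/22",
        "2a02:ec80::/32",
        "91.198.174.0/24",
        "2620:0:860::/46",
    )
]


def is_external(addr: str) -> bool:
    """Return True if addr falls inside any of the listed prefixes.

    An IPv4 address never matches an IPv6 network (or vice versa);
    the ipaddress containment test simply returns False for those pairs.
    """
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in EXTERNAL_NETWORKS)


if __name__ == "__main__":
    for candidate in ("208.80.154.224", "10.64.0.1", "2620:0:861:1::1"):
        print(candidate, is_external(candidate))
```

`ip_network` rejects prefixes with host bits set (its default is `strict=True`), so a mistyped prefix fails loudly when the list is built rather than silently matching the wrong range.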
[13:23:47] all manner of mw10NN ips
[13:24:07] oh ok, no afaik they shouldn't be running maint scripts
[13:24:59] as though some cache flushed and everything tried to refresh at once
[13:25:02] (CR) Yuvipanda: [C: 1] introduce labmon-roots admin group [operations/puppet] - https://gerrit.wikimedia.org/r/153053 (owner: Dzahn)
[13:25:18] (CR) Yuvipanda: [C: 1] "Yes please (full root)" [operations/puppet] - https://gerrit.wikimedia.org/r/153054 (owner: Dzahn)
[13:28:06] (PS1) Springle: s4 move api traffic back to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153067
[13:28:46] (CR) Springle: [C: 2] s4 move api traffic back to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153067 (owner: Springle)
[13:28:50] (Merged) jenkins-bot: s4 move api traffic back to db1042 [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153067 (owner: Springle)
[13:29:49] !log springle Synchronized wmf-config/db-eqiad.php: move s4 api traffic back to db1042 (duration: 00m 06s)
[13:29:55] Logged the message, Master
[13:36:38] sync-file failed for mw1010
[13:36:48] err 23.. too many open files
[13:40:21] godog: Yeah merge away!
[13:43:51] !log springle Synchronized wmf-config/db-eqiad.php: move s4 api traffic back to db1042 (duration: 00m 07s)
[13:43:56] Logged the message, Master
[13:44:38] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 0 below the confidence bounds
[13:44:47] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 30 data above and 0 below the confidence bounds
[13:50:13] _joe_: yo
[13:51:34] <_joe_> ori: so if you're up for testing, I'll enable the second host on rcstream
[13:52:26] yeah :)
[13:52:32] <_joe_> mmmh no sorry
[13:52:40] <_joe_> this change requires to restart pybal
[13:52:59] heh ok, no problem. i'm listening to a talk anyway, should probably focus ;)
[13:53:05] <_joe_> I'd prefer to do that on monday
[13:53:10] no problem at all
[13:53:11] <_joe_> yes that's better
[13:53:28] <_joe_> if you have a test client for rcstream, I may use it to validate
[13:53:42] <_joe_> not needing to harass you while you're jetlagged
[13:53:48] <_joe_> ;)
[13:53:57] (PS2) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[13:58:26] _joe_: https://wikitech.wikimedia.org/wiki/RCStream#Client
[13:58:30] the python example works
[13:58:36] just pip install socketIO_client
[13:58:53] <_joe_> I guess I can figure it out :)
[13:59:16] <_joe_> shame on me for not searching on wikitech
[13:59:23] _joe_: change self.emit('subscribe', 'commons.wikimedia.org') to self.emit('subscribe', '*')
[13:59:29] easier to verify that way
[13:59:32] <_joe_> ok
[14:00:38] ori: http://paste.debian.net/114522/ err 23 too many open files. do we do any ulimits for scap?
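The rcstream change merged at 06:41 (owner: BBlack) switched the service to LVS source-hash scheduling before testing it with two backends. The idea is that the balancer chooses a real server from a hash of the client's source address, so a given client keeps reaching the same backend; presumably that is what matters for a long-lived socket.io service like RCStream. The following is a minimal Python sketch of the idea, not the kernel's scheduler code: the 256-bucket table, the SHA-1 hash, and the backend names are all assumptions for illustration.

```python
import hashlib
import ipaddress


def build_table(backends, buckets=256):
    """Fill a fixed-size bucket table round-robin with backend names.

    The bucket count is an assumption for illustration only.
    """
    if not backends:
        raise ValueError("need at least one backend")
    return [backends[i % len(backends)] for i in range(buckets)]


def pick_backend(table, client_ip):
    """Map a client address to a backend via a stable hash of the address.

    The same source IP always hashes to the same bucket, so it keeps
    reaching the same backend for as long as the table is unchanged.
    """
    packed = ipaddress.ip_address(client_ip).packed
    digest = hashlib.sha1(packed).digest()
    bucket = int.from_bytes(digest[:4], "big") % len(table)
    return table[bucket]


if __name__ == "__main__":
    table = build_table(["backend-a", "backend-b"])  # hypothetical names
    for ip in ("198.51.100.7", "198.51.100.7", "203.0.113.42"):
        print(ip, "->", pick_backend(table, ip))
```

The trade-off in this sketch is that adding or removing a backend rebuilds the table and remaps many clients at once. That is one reason to do such a change in a controlled window, as with the pybal restart being put off until Monday.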
[14:01:19] I suspect it's fenari
[14:01:22] (PS5) Filippo Giunchedi: beta: puppet rebase script [operations/puppet] - https://gerrit.wikimedia.org/r/151099 (https://bugzilla.wikimedia.org/66683) (owner: BryanDavis)
[14:01:28] i don't think so, but Reedy ought to prune 14 out of the 19 branches we have in prod
[14:01:28] (CR) Filippo Giunchedi: [C: 2 V: 2] beta: puppet rebase script [operations/puppet] - https://gerrit.wikimedia.org/r/151099 (https://bugzilla.wikimedia.org/66683) (owner: BryanDavis)
[14:01:48] ah
[14:02:01] Need someone to work out how many we can delete ;)
[14:02:05] greg-g: ^^ ;D
[14:02:06] error: Ref refs/remotes/origin/production is at ac0e6bef3414b8120f9ea4355652b5c55803e3ca but expected 60c01d2958034deb789a209e2fe80318e3dad400
[14:02:10] sadface
[14:02:40] (second puppet-merge worked)
[14:04:55] (PS1) Reedy: Remove php-1.23wmf2[12] [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153070
[14:05:44] mw1017: rm: cannot remove ‘/usr/local/apache/common-local/php-1.23wmf21’: Permission denied
[14:05:57] * Reedy will check on the server in a few
[14:07:02] the blog seems to be down
[14:07:05] http://pastebin.com/raw.php?i=E1gsGQS2
[14:07:41] we don't host it :)
[14:07:50] (CR) Reedy: [C: 2] Remove php-1.23wmf2[12] [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153070 (owner: Reedy)
[14:07:54] (Merged) jenkins-bot: Remove php-1.23wmf2[12] [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153070 (owner: Reedy)
[14:08:48] Reedy: where should I report it then?
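Working out "how many we can delete" depends on ordering branch names such as `php-1.23wmf21` and `php-1.24wmf16` correctly, and a plain string sort puts `php-1.24wmf10` ahead of `php-1.24wmf9`. This is a minimal sketch that compares the numeric parts instead. It assumes branch directories follow the `php-<major>.<minor>wmf<N>` pattern seen in this log; `branch_key` and `prunable` are hypothetical helpers, not part of scap.

```python
import re

# Assumed naming pattern, based on directory names that appear in this log.
BRANCH_RE = re.compile(r"^php-(\d+)\.(\d+)wmf(\d+)$")


def branch_key(name):
    """Turn 'php-1.24wmf10' into (1, 24, 10) so branches sort numerically."""
    match = BRANCH_RE.match(name)
    if match is None:
        raise ValueError(f"not a wmf branch name: {name!r}")
    return tuple(int(part) for part in match.groups())


def prunable(branches, cutoff):
    """Return the branches at or below cutoff (inclusive), oldest first."""
    limit = branch_key(cutoff)
    return sorted((b for b in branches if branch_key(b) <= limit), key=branch_key)


if __name__ == "__main__":
    # Hypothetical list of deployed branches.
    deployed = ["php-1.24wmf9", "php-1.24wmf10", "php-1.23wmf22", "php-1.24wmf16"]
    print(sorted(deployed))  # string order: wmf10 and wmf16 land before wmf9
    print(prunable(deployed, "php-1.24wmf10"))
```

Where exactly the cutoff should sit remains a judgment call for whoever does the pruning. The helper only makes the comparison unambiguous.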
[14:08:58] seems to only affect readers in Europe, btw
[14:09:43] It's fine from a vm at home
[14:10:17] we're behind some proxy here
[14:11:08] reedy@mw1017:~$ touch /usr/local/apache/common-local/test
[14:11:08] touch: cannot touch ‘/usr/local/apache/common-local/test’: Permission denied
[14:11:10] springle: ori ^^
[14:11:41] (same when sudo -u mwdeploy
[14:11:42] )
[14:12:03] (PS3) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[14:13:20] Reedy: blergh, i'll look
[14:13:28] seems ok at /tmp
[14:14:21] but different fs etc
[14:14:57] Reedy: try now?
[14:15:50] yup, that looks to fix it :)
[14:21:08] (PS1) Reedy: Remove 1.24wmf5 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153072
[14:22:39] (CR) Reedy: [C: 2] Remove 1.24wmf5 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153072 (owner: Reedy)
[14:22:43] (Merged) jenkins-bot: Remove 1.24wmf5 symlinks [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153072 (owner: Reedy)
[14:25:26] Reedy: which versions are on tin?
[14:25:38] !log Removed <= MediaWiki 1.24wmf5
[14:25:43] Logged the message, Master
[14:26:04] Reedy: we can safely delete <=1.24wmf10, right?
[14:26:21] I sorta presumed <= 8
[14:26:44] stop it with the ascii penises already
[14:26:57] <==============================8
[14:27:14] don't involve figlet pls
[14:27:17] * ori claps.
[14:27:54] <_joe_> apt-get install cowsay
[14:28:19] E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
[14:28:19] E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
[14:28:35] sudo !!
[14:50:08] MaxSem: we got a mail from yandex.ru saying they have some issues with indexing us.. can i forward?
[14:50:20] it's Russian, just used automatic translation
[14:51:40] mutante, https://gerrit.wikimedia.org/r/#/c/151221/
[14:51:50] gogogogmergeit
[14:52:17] MaxSem: aah, that seems to be what they are talking about :) yep
[14:57:22] ori: where are you again?
[14:58:10] garden room
[14:59:04] ori: ah ok
[15:09:11] (PS1) Ori.livneh: Don't require native CDB support to load {interwiki,trustedxff}.cdb [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153081 (https://bugzilla.wikimedia.org/69332)
[15:09:35] Reedy: can i has sanity check? ^
[15:14:11] <^d> ori: We shouldn't require it?
[15:14:19] <^d> I already ported the pure-php version to mw-config
[15:14:41] <^d> Ah, should actually read the patch lol.
[15:14:43] <^d> :)
[15:15:25] (CR) Chad: [C: 1] "Should be fine as we fall back on the pure-PHP implementation when dba_* functions DNE." [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153081 (https://bugzilla.wikimedia.org/69332) (owner: Ori.livneh)
[15:17:38] (CR) Dzahn: [C: 2] Allow ldap "nda" users to access ishmael [operations/puppet] - https://gerrit.wikimedia.org/r/140881 (owner: Hoo man)
[15:20:34] ^d: thanks :)
[15:20:45] <^d> yw
[15:21:01] (CR) Ori.livneh: [C: 2] Don't require native CDB support to load {interwiki,trustedxff}.cdb [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153081 (https://bugzilla.wikimedia.org/69332) (owner: Ori.livneh)
[15:21:07] (Merged) jenkins-bot: Don't require native CDB support to load {interwiki,trustedxff}.cdb [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153081 (https://bugzilla.wikimedia.org/69332) (owner: Ori.livneh)
[15:22:47] !log ori Synchronized wmf-config/CommonSettings.php: I313b09ffc: Don't require native CDB support to load {interwiki,trustedxff}.cdb (duration: 00m 05s)
[15:22:52] Logged the message, Master
[15:37:10] (CR) Dzahn: [C: 2] graphite: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/152928 (owner: JanZerebecki)
[15:41:48] (PS1) Yuvipanda: Quarry: Add sqlalchemy as an explicit dependency [operations/puppet] - https://gerrit.wikimedia.org/r/153111
[15:41:55] mutante: can you merge ^? should totally be uncontroversial :)
[16:06:38] PROBLEM - Number of mediawiki jobs running on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 0 below the confidence bounds
[16:06:47] PROBLEM - Number of mediawiki jobs queued on tungsten is CRITICAL: CRITICAL: Anomaly detected: 29 data above and 0 below the confidence bounds
[16:15:01] (PS5) Hoo man: Allow "hoo" to sudo into datasets [operations/puppet] - https://gerrit.wikimedia.org/r/152724
[16:15:31] godog: wanna merge trivial https://gerrit.wikimedia.org/r/#/c/153111/ while you're at it? :)
[16:17:04] (PS1) JanZerebecki: tendril: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153116
[16:17:49] (PS3) Hoo man: Remove brion from the "dataset-admin" group [operations/puppet] - https://gerrit.wikimedia.org/r/153034
[16:18:18] (CR) Ori.livneh: "the package is also declared in modules/toollabs/manifests/exec_environ.pp, modules/eventlogging/manifests/package.pp, and modules/puppetm" [operations/puppet] - https://gerrit.wikimedia.org/r/153111 (owner: Yuvipanda)
[16:19:11] (PS4) Hoo man: Remove brion from the "dataset-admin" group [operations/puppet] - https://gerrit.wikimedia.org/r/153034
[16:19:27] (CR) Filippo Giunchedi: [C: 1] add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058 (owner: Ori.livneh)
[16:19:31] (CR) Hoo man: "Rebased" [operations/puppet] - https://gerrit.wikimedia.org/r/153034 (owner: Hoo man)
[16:20:33] (CR) Hoo man: "Changed so that we now have two groups (one for the snapshot and one for the dataset hosts). Per Daniel" [operations/puppet] - https://gerrit.wikimedia.org/r/152724 (owner: Hoo man)
[16:21:02] (PS1) Dzahn: remove blog.wikmedia.org related things [operations/puppet] - https://gerrit.wikimedia.org/r/153117
[16:21:24] (PS2) Nemo bis: Change upload settings on fa.wikipedia [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/152042 (https://bugzilla.wikimedia.org/69171) (owner: Calak)
[16:21:39] (CR) Yuvipanda: "@Ori: First two shouldn't happen, and third shouldn't either since no nagios on labs from puppet. Am testing it now anyway" [operations/puppet] - https://gerrit.wikimedia.org/r/153111 (owner: Yuvipanda)
[16:22:51] yuvipanda: sure
[16:23:12] (CR) Filippo Giunchedi: [C: 2 V: 2] Quarry: Add sqlalchemy as an explicit dependency [operations/puppet] - https://gerrit.wikimedia.org/r/153111 (owner: Yuvipanda)
[16:23:24] godog: ty
[16:23:32] ori: I also tested it, and didn't break :)
[16:23:38] (PS4) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[16:23:40] (CR) Dzahn: [C: 1] "removes stuff called " This is a terrible hack" :)" [operations/puppet] - https://gerrit.wikimedia.org/r/153117 (owner: Dzahn)
[16:24:21] (CR) jenkins-bot: [V: -1] add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058 (owner: Ori.livneh)
[16:26:18] (PS5) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[16:32:14] (PS6) Ori.livneh: add (and use) shell_exports() and apache::vars [operations/puppet] - https://gerrit.wikimedia.org/r/153058
[16:32:27] (CR) Ori.livneh: [C: 2 V: 2] "tested in labs" [operations/puppet] - https://gerrit.wikimedia.org/r/153058 (owner: Ori.livneh)
[16:34:47] PROBLEM - Puppet freshness on db1011 is CRITICAL: Last successful Puppet run was Sat 09 Aug 2014 14:34:09 UTC
[16:36:09] (PS1) JanZerebecki: kibana: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153118
[16:47:34] (CR) Tim Landscheidt: [C: -1] "I think we should use hashar's androidsdk::dependencies for this." [operations/puppet] - https://gerrit.wikimedia.org/r/125241 (owner: Yuvipanda)
[16:50:07] (CR) BearND: "FWIW, here are the shared libraries needed by aapt (as available on tools-login):" [operations/puppet] - https://gerrit.wikimedia.org/r/125241 (owner: Yuvipanda)
[16:51:12] (CR) JanZerebecki: "http://puppet-compiler.wmflabs.org/202/change/153118/html/logstash1001.eqiad.wmnet.html" [operations/puppet] - https://gerrit.wikimedia.org/r/153118 (owner: JanZerebecki)
[16:53:48] (PS1) Hoo man: Set "siteGroup" for testwikidata and wikidata [operations/mediawiki-config] - https://gerrit.wikimedia.org/r/153121
[16:56:30] (PS1) Yuvipanda: diamond: Enable stats collection for quarry [operations/puppet] - https://gerrit.wikimedia.org/r/153123
[16:56:37] godog: ^ can you merge? :)
[16:56:49] godog: the whitelist should go away in a week or two, when graphite for labs moves to labmon
[16:56:58] (PS1) Dzahn: allow LDAP groups 'ops' and 'nda' login on icinga [operations/puppet] - https://gerrit.wikimedia.org/r/153124
[16:57:17] PROBLEM - puppet last run on elastic1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[16:57:44] (PS2) Dzahn: allow LDAP groups 'ops' and 'nda' login on icinga [operations/puppet] - https://gerrit.wikimedia.org/r/153124
[16:58:21] (CR) Dzahn: [C: 2] allow LDAP groups 'ops' and 'nda' login on icinga [operations/puppet] - https://gerrit.wikimedia.org/r/153124 (owner: Dzahn)
[17:02:05] (PS2) Dzahn: kibana: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153118 (owner: JanZerebecki)
[17:02:51] (CR) Dzahn: [C: 2] "this is logstash" [operations/puppet] - https://gerrit.wikimedia.org/r/153118 (owner: JanZerebecki)
[17:03:04] (CR) Hoo man: [C: 1] tendril: add access for ldap groups ops and nda [operations/puppet] - https://gerrit.wikimedia.org/r/153116 (owner: JanZerebecki)
[17:04:40] (CR) Filippo Giunchedi: [C: 2 V: 2] diamond: Enable stats collection for quarry [operations/puppet] - https://gerrit.wikimedia.org/r/153123 (owner: Yuvipanda)
[17:04:45] godog: ty!
[17:04:46] yuvipanda: yup, done!
[17:04:49] hehe
[17:13:07] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Sat Aug 9 17:13:01 UTC 2014
[17:15:17] RECOVERY - puppet last run on elastic1003 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[17:24:38] PROBLEM - puppet last run on amslvs2 is CRITICAL: CRITICAL: Epic puppet fail
[17:30:36] (PS1) Ori.livneh: apache: set a default 5-second GracefulShutdownTimeout [operations/puppet] - https://gerrit.wikimedia.org/r/153128
[17:43:38] RECOVERY - puppet last run on amslvs2 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures
[17:52:47] PROBLEM - puppet last run on amslvs3 is CRITICAL: CRITICAL: Epic puppet fail
[18:12:47] RECOVERY - puppet last run on amslvs3 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[18:38:56] (CR) Giuseppe Lavagetto: [C: 1] apache: set a default 5-second GracefulShutdownTimeout [operations/puppet] - https://gerrit.wikimedia.org/r/153128 (owner: Ori.livneh)
[18:39:34] (PS1) Hashar: zuul: typo in role::zuul::merger description [operations/puppet] - https://gerrit.wikimedia.org/r/153208
[18:40:02] (Abandoned) Giuseppe Lavagetto: test.wp.org: move to mw1018 [operations/puppet] - https://gerrit.wikimedia.org/r/152858 (owner: Giuseppe Lavagetto)
[18:48:24] (PS1) Hashar: contint: analytics packages are not on Trusty yet [operations/puppet] - https://gerrit.wikimedia.org/r/153209
[19:04:11] (CR) Hashar: [C: 1 V: 1] "Cherry picked on integration puppetmaster" [operations/puppet] - https://gerrit.wikimedia.org/r/153209 (owner: Hashar)
[19:10:28] PROBLEM - puppet last run on lvs4003 is CRITICAL: CRITICAL: Epic puppet fail
[19:16:08] oh
[19:16:09] epic
[19:29:28] RECOVERY - puppet last run on lvs4003 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[19:32:03] (PS1) Dzahn: OTRS - use apache::site instead of file resource [operations/puppet] - https://gerrit.wikimedia.org/r/153213
[19:32:34] (PS2) Dzahn: OTRS - use apache::site instead of file resource [operations/puppet] - https://gerrit.wikimedia.org/r/153213
[19:36:27] (PS1) Dzahn: noc.wm - use apache::site and move to templates [operations/puppet] - https://gerrit.wikimedia.org/r/153214
[19:46:29] (PS1) Dzahn: planet - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153215
[19:48:47] (PS1) Dzahn: gerrit - retab apache template [operations/puppet] - https://gerrit.wikimedia.org/r/153216
[19:51:18] (PS1) Dzahn: gerrit apache template - qualify vars [operations/puppet] - https://gerrit.wikimedia.org/r/153217
[20:00:32] (PS1) Dzahn: varnish/zero.inc.vcl.erb - retab [operations/puppet] - https://gerrit.wikimedia.org/r/153219
[20:01:42] (CR) JanZerebecki: [C: -1] planet - use ssl_ciphersuite (1 comment) [operations/puppet] - https://gerrit.wikimedia.org/r/153215 (owner: Dzahn)
[20:05:35] (PS2) Dzahn: planet - use ssl_ciphersuite [operations/puppet] - https://gerrit.wikimedia.org/r/153215
[20:08:14] (PS1) Dzahn: sudo.pp - minor linting fixes [operations/puppet] - https://gerrit.wikimedia.org/r/153223
[20:09:53] (PS2) Dzahn: sudo.pp - minor linting fixes [operations/puppet] - https://gerrit.wikimedia.org/r/153223
[20:21:59] (PS1) Dzahn: role/parsoid - minor lint fixes [operations/puppet] - https://gerrit.wikimedia.org/r/153225
[20:24:32] (PS1) Dzahn: rm files/misc/boost-backports.key [operations/puppet] - https://gerrit.wikimedia.org/r/153226
[20:28:10] (PS1) Dzahn: rm wmfsmssend [operations/puppet] - https://gerrit.wikimedia.org/r/153227
[20:41:53] (PS1) Dzahn: delete blog SSL certificates [operations/puppet] - https://gerrit.wikimedia.org/r/153228
[23:36:38] PROBLEM - puppet last run on cp4009 is CRITICAL: CRITICAL: Epic puppet fail
[23:43:24] "epic puppet fail", huh?
[23:45:47] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Epic puppet fail
[23:55:37] RECOVERY - puppet last run on cp4009 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures