[00:02:10] RECOVERY - Cassandra database on xenon is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[00:02:35] !log remounted /mnt/data on xenon
[00:02:41] Logged the message, Master
[00:03:32] RECOVERY - Cassandra database on cerium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[00:05:41] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:10:53] !log updated fstab data array name from md2 to md127 on cerium, xenon and praseodymium; naming changed after reboot; should probably use uuid instead
[00:10:59] Logged the message, Master
[00:22:51] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[00:25:02] (PS17) 20after4: VarnishStatusCollector for diamond. [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705)
[00:32:56] operations: setup/deploy ganeti2001-2006 - https://phabricator.wikimedia.org/T94042#1155773 (RobH)
[00:36:34] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1155781 (Patrick87) This is finally fixed upstream! librsvg 2.40.9 (just released) includes the fix. When / how can w...
[00:51:22] operations, Phabricator, Wikimedia-Bugzilla: Sanitise a Bugzilla database dump - https://phabricator.wikimedia.org/T85141#1155787 (Dzahn) and this is the way John meant, basically. and it let's us verify this neatly: ``` [zirconium:/srv/org/wikimedia/static-bugzilla] $ grep "You are not authorized to...
[01:05:54] operations: setup/deploy ganeti2001-2006 - https://phabricator.wikimedia.org/T94042#1155800 (RobH) At this point, everything is done other than writing a new partman recipe for this setup.
I just didn't quite get to it today with other items. (Though I actually really would like to work on the partman stuff...
[01:17:01] PROBLEM - puppet last run on db2028 is CRITICAL: CRITICAL: puppet fail
[01:21:40] (PS1) Dzahn: strongswan: lint fixes [puppet] - https://gerrit.wikimedia.org/r/200088
[01:22:04] operations: setup/deploy ganeti2001-2006 - https://phabricator.wikimedia.org/T94042#1155803 (RobH) actually... lvm.cfg seems to be good for this... 300mb boot, 40gb /, rest in lvm space, not used... Did the unused lvm space need to be in its own container? If so then we need to modify lvm.cfg into somethin...
[01:29:19] (CR) Jforrester: "> Lets hold off on enabling more wikis until we have reduced the load generated by template updates." [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[01:33:12] RECOVERY - puppet last run on db2028 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[01:38:36] (CR) BBlack: [C: 1] Clean up bastionhost domain_search [puppet] - https://gerrit.wikimedia.org/r/196964 (owner: Hoo man)
[01:41:02] (PS18) 20after4: VarnishStatusCollector for diamond. [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705)
[01:41:42] (PS1) Dzahn: cassandra: firewall hole for port 9042 [puppet] - https://gerrit.wikimedia.org/r/200093
[01:42:19] (CR) Dzahn: "http://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used" [puppet] - https://gerrit.wikimedia.org/r/200093 (owner: Dzahn)
[02:17:05] (CR) GWicke: [C: 1] "This is now pretty much resolved, see https://phabricator.wikimedia.org/T93751."
[puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[02:24:12] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 05m 03s)
[02:24:24] Logged the message, Master
[02:27:40] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-27 02:26:36+00:00
[02:27:45] Logged the message, Master
[02:36:18] (PS1) Jforrester: [WIP] Make VisualEditor access RESTbase directly [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374)
[02:39:37] spagewmf: Busy watching Netflix obviously
[02:43:51] !log l10nupdate Synchronized php-1.25wmf23/cache/l10n: (no message) (duration: 03m 08s)
[02:44:00] Logged the message, Master
[02:44:51] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds
[02:46:25] !log LocalisationUpdate completed (1.25wmf23) at 2015-03-27 02:45:22+00:00
[02:46:31] Logged the message, Master
[02:49:01] (CR) GWicke: "Should we start on group0 first for half a day or so?" [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[02:49:42] (CR) Jforrester: "> Should we start on group0 first for half a day or so?" [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[02:54:26] (CR) GWicke: "> We should start in Labs."
[mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[03:09:59] (CR) Eevans: cassandra: firewall hole for port 9042 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/200093 (owner: Dzahn)
[03:15:17] operations, RESTBase-Cassandra: cassandra - enable Inter-node encryption - https://phabricator.wikimedia.org/T94132#1155942 (Dzahn) NEW
[03:16:16] operations, RESTBase-Cassandra: cassandra - enable Inter-node encryption - https://phabricator.wikimedia.org/T94132#1155951 (Dzahn)
[03:18:47] (CR) Dzahn: cassandra: firewall hole for port 9042 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/200093 (owner: Dzahn)
[03:20:03] operations, RESTBase-Cassandra: cassandra - enable Inter-node encryption - https://phabricator.wikimedia.org/T94132#1155953 (Dzahn)
[03:37:11] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 2 below the confidence bounds
[03:42:50] PROBLEM - puppet last run on db2041 is CRITICAL: CRITICAL: puppet fail
[03:47:34] (PS2) Papaul: add mgmt asset tag info for wtp200(1-20) [dns] - https://gerrit.wikimedia.org/r/199275
[03:52:01] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[03:59:01] RECOVERY - puppet last run on db2041 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[04:42:38] (CR) Dzahn: "in racktables wmf6162 is called wfp2001, wmf6163 is wfp2002 etc. here they are wtp's. what's wtp vs. wfp and which is right?"
[dns] - https://gerrit.wikimedia.org/r/199275 (owner: Papaul)
[04:55:52] (CR) Catrope: [C: -1] [WIP] Make VisualEditor access RESTbase directly (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[04:57:05] (PS1) Dzahn: wikimetrics: lint [puppet] - https://gerrit.wikimedia.org/r/200103
[05:08:20] (PS2) Jforrester: Make VisualEditor access RESTbase directly in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374)
[05:08:22] (PS1) Jforrester: Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374)
[05:08:24] (PS1) Jforrester: Make VisualEditor access RESTbase directly on Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/200106 (https://phabricator.wikimedia.org/T90374)
[05:08:26] (PS1) Jforrester: Make VisualEditor access RESTbase directly on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/200107 (https://phabricator.wikimedia.org/T90374)
[05:09:50] (CR) Jforrester: Make VisualEditor access RESTbase directly in group0 (3 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[05:13:36] (PS1) Dzahn: various role classes - indentation fixes [puppet] - https://gerrit.wikimedia.org/r/200110
[05:24:30] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1156044 (Dzahn) http://apt.wikimedia.org/wikimedia/pool/main/libr/librsvg/
[05:29:43] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1156045 (Dzahn)
stable 2.36.1-2 testing 2.40.5-1 unstable 2.40.5-1 exp 2.40.8-1 upstream: http://ftp.gnome.org/pu...
[05:35:18] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1156046 (Dzahn) adding _joe_ since he built librsvg (2.40.2-1+wm1) per debian/changelog
[06:29:22] PROBLEM - puppet last run on amssq47 is CRITICAL: CRITICAL: puppet fail
[06:30:21] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:35:12] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:21] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:01] PROBLEM - puppet last run on mw2206 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:11] PROBLEM - puppet last run on mw2093 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:37:01] PROBLEM - puppet last run on mw2134 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:01] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:41] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:50] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:31] RECOVERY - puppet last run on mw2206 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[06:47:40] RECOVERY - puppet last run on mw2093 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[06:48:30] RECOVERY - puppet last run on mw2134 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:52] RECOVERY - puppet last run on amssq47 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:55:11] (CR) Gerardduenas: [C: 1] [sshd] Disable agent forwarding [puppet] -
https://gerrit.wikimedia.org/r/199936 (owner: Chad)
[07:15:44] !log LocalisationUpdate ResourceLoader cache refresh completed at Fri Mar 27 07:14:37 UTC 2015 (duration 14m 36s)
[07:15:52] Logged the message, Master
[09:04:33] (CR) Hashar: "+ Filippo who helped me packaging Zuul." [puppet] - https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: Gilles)
[09:20:03] (CR) Faidon Liambotis: [C: -1] "This is nice work!" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/194471 (owner: Alexandros Kosiaris)
[09:38:25] (PS2) Filippo Giunchedi: zuul: provide sane defaults in init scripts [puppet] - https://gerrit.wikimedia.org/r/199861 (owner: Hashar)
[09:38:32] (CR) Filippo Giunchedi: [C: 2 V: 2] zuul: provide sane defaults in init scripts [puppet] - https://gerrit.wikimedia.org/r/199861 (owner: Hashar)
[09:38:37] \o/
[09:38:55] godog: will run puppet on gallium (which hosts zuul and zuul-merger daemon) and make sure it works 100%
[09:39:29] paravoid: I am very happy with akosaris pbuilder change. Much better than the crazy puppet thing we had previously :)
[09:39:30] hasharCall: sure
[09:39:42] paravoid: will let us easily add Jenkins jobs to build package for several distros
[09:52:29] and I have a generic debian packaging question. When a package has multiple binary packages, do we have to build each of them?
[09:52:50] or is that one build, then split the result to bundle them as individual .deb?
[09:53:25] usually the latter, build the source and fan out to different binary packages
[10:07:02] (CR) Filippo Giunchedi: "couple of comments, LGTM" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 20after4)
[10:10:01] (CR) Filippo Giunchedi: [C: 1] [sshd] Disable agent forwarding [puppet] - https://gerrit.wikimedia.org/r/199936 (owner: Chad)
[10:11:18] (CR) Alexandros Kosiaris: [C: 2] Clean up bastionhost domain_search [puppet] - https://gerrit.wikimedia.org/r/196964 (owner: Hoo man)
[10:15:17] (CR) Alexandros Kosiaris: [C: 1] "Seems fine to me. Any objections?" [puppet] - https://gerrit.wikimedia.org/r/199936 (owner: Chad)
[10:19:27] twentyafterfour: ping me when you are online re: https://gerrit.wikimedia.org/r/199302
[10:41:21] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: puppet fail
[10:41:24] (PS1) Giuseppe Lavagetto: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130
[10:41:43] (CR) jenkins-bot: [V: -1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[10:44:20] (CR) Giuseppe Lavagetto: "I sometimes use agent forwarding to allow me to use the bastions to copy debs from a "build" host like the one I use in labs for HHVM to c" [puppet] - https://gerrit.wikimedia.org/r/199936 (owner: Chad)
[10:53:18] (CR) Mobrovac: [C: 1] trim list of Cassandra metrics [puppet/cassandra] - https://gerrit.wikimedia.org/r/199998 (https://phabricator.wikimedia.org/T78514) (owner: Eevans)
[10:53:28] (PS1) Filippo Giunchedi: install-server: partman for dm-cache [puppet] - https://gerrit.wikimedia.org/r/200134 (https://phabricator.wikimedia.org/T88994)
[10:54:32] (CR) Gilles: "I've checked on Jessie and the dependency versions are all over the place compared to what Sentry needs.
Some packages are too new, others" [puppet] - https://gerrit.wikimedia.org/r/199598 (https://phabricator.wikimedia.org/T84956) (owner: Gilles)
[10:57:42] operations, Graphite, Patch-For-Review: use graphite1002 to test dm-cache - https://phabricator.wikimedia.org/T88994#1156289 (fgiunchedi) once provisioned, there will be a large partition on ssd left unused and needs to be provisioned as a cache pool for LVM: ``` pvcreate /dev/sdb2 vgextend vg-data /de...
[10:59:21] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:02:47] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1156298 (mark) >>! In T92514#1123673, @BBlack wrote: > re: BIOS, Faidon already set up the DRAC networking and passwords there. Today I audited the rest of the settings and changed any that needed changing...
[11:03:33] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1156299 (mark)
[11:03:38] godog: I'm online, what's up
[11:05:21] (CR) 20after4: VarnishStatusCollector for diamond. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 20after4)
[11:07:40] twentyafterfour: I think we're good to go with https://gerrit.wikimedia.org/r/199302 modulo the last two minor comments, can merge that
[11:08:36] Is there a reason to prefer not initializing the dictionary with values?
[11:09:37] It doesn't bother me either way but the values get used in multiple places so pre-initializing them made more sense to me.
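The exchange at 11:08 and 11:09 is about pre-initializing the VarnishStatusCollector's dictionary because its values "get used in multiple places". The following is a minimal Python sketch of that pattern only; it is not the code from change 199302 and does not use Diamond's collector API. The counter names and the simple "name value" input format are assumptions. Filling every expected key with 0 up front lets each later use index the dict directly, even when a counter is missing from the input.

```python
# Hypothetical counter names; the real collector's metric list may differ.
COUNTERS = ("client_req", "cache_hit", "cache_miss")


def parse_counters(text):
    """Parse 'name value' lines into a dict pre-initialized with every counter."""
    # Pre-initialize so every expected counter exists even if absent from input.
    stats = dict.fromkeys(COUNTERS, 0)
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in stats:
            stats[parts[0]] = int(parts[1])
    return stats


def hit_ratio(stats):
    """Cache hit ratio; direct indexing is safe because all keys are present."""
    lookups = stats["cache_hit"] + stats["cache_miss"]
    return stats["cache_hit"] / lookups if lookups else 0.0


stats = parse_counters("client_req 100\ncache_hit 75\ncache_miss 25\n")
print(stats, hit_ratio(stats))
# A missing counter still yields a complete dict instead of a later KeyError.
print(parse_counters("client_req 10\n"))
```

Leaving the dict empty instead would mean writing `stats.get(name, 0)` at every use site, which is the repetition the pre-initialization avoids.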
[11:11:54] twentyafterfour: agreed, I'm replying to your comments on gerrit
[11:12:27] twentyafterfour: either way it is more academic at this point, nothing that prevents merging :)
[11:12:36] :)
[11:13:05] it works really well: https://graphite.wmflabs.org/dashboard/#availability
[11:15:15] nice one
[11:16:53] (PS1) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:17:10] (CR) jenkins-bot: [V: -1] proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137 (owner: Giuseppe Lavagetto)
[11:19:21] (CR) Filippo Giunchedi: [C: 2 V: 2] VarnishStatusCollector for diamond. (2 comments) [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 20after4)
[11:19:25] (PS19) Filippo Giunchedi: VarnishStatusCollector for diamond. [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 20after4)
[11:19:56] (PS2) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:19:59] (CR) Filippo Giunchedi: [V: 2] VarnishStatusCollector for diamond. [puppet] - https://gerrit.wikimedia.org/r/199302 (https://phabricator.wikimedia.org/T88705) (owner: 20after4)
[11:20:12] (CR) jenkins-bot: [V: -1] proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137 (owner: Giuseppe Lavagetto)
[11:20:23] twentyafterfour: merged
[11:20:55] godog: awesome, thank you!
[11:21:20] I'll remove my cherry pick on the beta puppetmaster
[11:22:32] operations, MediaWiki-extensions-GWToolset, Multimedia, Performance: Can Commons support a mass upload of 14 million files (1.5 TB)? - https://phabricator.wikimedia.org/T88758#1156367 (fgiunchedi) what's the progress? FWIW we should be fully done with the swift expansion in ~10d but no harm to try o...
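Giuseppe's scap change "proxies: order the list by IP distance" is only named in the log, so the distance metric it uses is not known. Here is a minimal Python sketch under one stated assumption: "distance" is the absolute difference between two IPv4 addresses read as integers, so hosts in the same subnet sort first. The function name and the sample addresses are hypothetical.

```python
import ipaddress


def order_by_ip_distance(local_ip, proxies):
    """Sort IPv4 proxy addresses by numeric distance from local_ip (hypothetical)."""
    local = int(ipaddress.ip_address(local_ip))
    # Stable sort keyed on |proxy - local| with both addresses as 32-bit integers.
    return sorted(proxies, key=lambda ip: abs(int(ipaddress.ip_address(ip)) - local))


# Made-up private addresses, for illustration only.
proxies = ["10.192.16.5", "10.64.0.40", "10.64.32.7"]
print(order_by_ip_distance("10.64.0.12", proxies))
```

Longest common prefix would be a reasonable alternative definition of "distance"; the integer difference is used here only because it is the simplest to show.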
[11:23:25] Puppet, Continuous-Integration, Release-Engineering: Suggestion: disable autoloader_layout checks in our jenkins puppet-lint - https://phabricator.wikimedia.org/T1289#1156369 (Krinkle)
[11:24:25] twentyafterfour: yw
[11:24:50] (PS3) Giuseppe Lavagetto: proxies: order the list by IP distance [tools/scap] - https://gerrit.wikimedia.org/r/200137
[11:25:49] <_joe_> lol our flake8 on scap requires line to be 79 characters long, not 80
[11:27:18] operations, ops-esams: Remove all Toolserver equipment - https://phabricator.wikimedia.org/T92518#1156372 (mark) a:mark>Cmjohnson @Cmjohnson: could you perhaps take a stab at removing -everything- (except PDUs) from Racktables that's currently in racks OE10 and OE16? All that stuff is now gone. AFAI...
[11:36:42] PROBLEM - very high load average likely xfs on ms-be1009 is CRITICAL: CRITICAL - load average: 293.32, 189.53, 91.10
[11:37:51] (PS2) Giuseppe Lavagetto: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130
[11:38:06] (CR) jenkins-bot: [V: -1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[11:38:19] * godog sighs at ms-be1009
[11:38:28] <_joe_> godog: what's happening there?
[11:38:45] what the alarm says, xfs crapping itself
[11:38:49] rebooting..
[11:40:00] !log reboot ms-be1009, xfs stuck
[11:40:08] Logged the message, Master
[11:44:30] Puppet, operations, Swift: puppet failure "invalid byte sequence in utf-8" while copying swift ring builder files - https://phabricator.wikimedia.org/T93614#1156425 (fgiunchedi) hah we've seen this before! T91453 though we don't control the files here since they are autogenerated by swift when manipulaa...
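The 79-character limit _joe_ hits at 11:25 is the default maximum line length for flake8's E501 check, which flake8 takes from pycodestyle. The sketch below is a simplified stand-in for that check, not flake8 itself: it ignores special cases such as `# noqa` comments. A project that wants 80 columns can set `max-line-length` in its flake8 configuration instead.

```python
MAX_LINE_LENGTH = 79  # default limit for pycodestyle/flake8 E501


def long_lines(source, limit=MAX_LINE_LENGTH):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# Line 2 is exactly 80 characters: accepted at 80 columns, rejected at 79.
code = "x = 1\n" + "y = '" + "a" * 74 + "'\n"
print(long_lines(code))
```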
[11:50:13] (CR) Giuseppe Lavagetto: icinga: check its own configuration (1 comment) [puppet] - https://gerrit.wikimedia.org/r/199841 (owner: Giuseppe Lavagetto)
[11:51:25] (PS3) Giuseppe Lavagetto: icinga: check its own configuration [puppet] - https://gerrit.wikimedia.org/r/199841
[11:53:05] operations, ops-esams, HTTPS, HTTPS-by-default: esams power capacity issues - https://phabricator.wikimedia.org/T90000#1156463 (mark)
[11:53:06] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1156462 (mark)
[11:53:20] RECOVERY - very high load average likely xfs on ms-be1009 is OK: OK - load average: 18.40, 5.48, 1.90
[11:54:18] operations, ops-esams, HTTPS, HTTPS-by-default: esams power capacity issues - https://phabricator.wikimedia.org/T90000#1051108 (mark) Faidon previously confirmed use of approximately 350-400W (judging from the PDU measurements) when he ran stress on one box. I just did the same and got 322W using t...
[12:00:51] Puppet, operations, Swift: puppet failure "invalid byte sequence in utf-8" while copying swift ring builder files - https://phabricator.wikimedia.org/T93614#1156489 (fgiunchedi) also since this bug we've changed the builder files (e.g. by adding devices or changing weights) so for example now only `acco...
[12:01:07] operations, MediaWiki-extensions-GWToolset, Multimedia, Performance: Can Commons support a mass upload of 14 million files (1.5 TB)? - https://phabricator.wikimedia.org/T88758#1156490 (Harej) Haven't started yet. There's a lot I need to work out on my end first.
[12:01:18] (PS4) Giuseppe Lavagetto: icinga: check its own configuration [puppet] - https://gerrit.wikimedia.org/r/199841
[12:01:34] <_joe_> godog: care to take another look?
^^ I corrected the typo
[12:02:20] (CR) Filippo Giunchedi: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/199841 (owner: Giuseppe Lavagetto)
[12:02:23] _joe_: sure
[12:02:39] this is driving me nuts at how silly it is https://phabricator.wikimedia.org/T93614
[12:19:02] godog: hoo we had that on beta a few weeks ago
[12:19:15] seems to be related to the ruby version that does not recognize unicode in puppet manifests
[12:19:39] though in a file, that is quite crazy
[12:22:13] Puppet, operations, Swift: puppet failure "invalid byte sequence in utf-8" while copying swift ring builder files - https://phabricator.wikimedia.org/T93614#1156506 (hashar) Could it be related to ruby having troubles with unicode? We had to fix a bunch of them in puppet manifests {T91453} ex: https://...
[12:36:20] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[12:42:42] Puppet, operations, Swift: puppet failure "invalid byte sequence in utf-8" while copying swift ring builder files - https://phabricator.wikimedia.org/T93614#1156553 (fgiunchedi) definitely, running with tracing enabled seems to point to `show_diff`, so my guess is that puppet fails to detect the file as...
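fgiunchedi's 12:42 comment traces the "invalid byte sequence in utf-8" failure to `show_diff` on the swift builder files. Puppet's diffing is Ruby code and is not reproduced here. This Python sketch only illustrates the underlying idea: bytes that are not valid UTF-8 cannot be rendered as a text diff, so a safe differ falls back to a one-line summary. The helper names are hypothetical, and the sample binary content (bytes starting with a pickle-style `\x80\x02` header) is an assumption, not a real ring builder file.

```python
import difflib


def is_utf8(data):
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def safe_diff(old, new):
    """Unified diff of two byte strings, or a summary line for non-UTF-8 content."""
    if not (is_utf8(old) and is_utf8(new)):
        return "binary files differ" if old != new else ""
    return "".join(
        difflib.unified_diff(
            old.decode("utf-8").splitlines(keepends=True),
            new.decode("utf-8").splitlines(keepends=True),
            fromfile="old",
            tofile="new",
        )
    )


print(safe_diff(b"a\n", b"b\n"))
# 0x80 cannot start a UTF-8 sequence, so this is reported as binary.
print(safe_diff(b"\x80\x02}q\x00", b"\x80\x02}q\x01"))
```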
[12:42:48] (PS1) Filippo Giunchedi: swift: don't show diff for builder files [puppet] - https://gerrit.wikimedia.org/r/200146 (https://phabricator.wikimedia.org/T93614)
[12:43:46] (PS2) Filippo Giunchedi: swift: don't show diff for builder files [puppet] - https://gerrit.wikimedia.org/r/200146 (https://phabricator.wikimedia.org/T93614)
[13:01:36] (PS1) Filippo Giunchedi: geoip: disable show_diff [puppet] - https://gerrit.wikimedia.org/r/200147 (https://phabricator.wikimedia.org/T93614)
[13:18:21] PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: puppet fail
[13:19:44] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156644 (Qgil)
[13:20:37] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156650 (Qgil) Instead of "Blocks" should be "Blocked by", right? But we don't need all these notifications, so let's just...
[13:33:05] (CR) Alexandros Kosiaris: "Yeah, the idea is to create a viable module to replace the misc while cleaning up the role classes as well." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/194471 (owner: Alexandros Kosiaris)
[13:36:11] RECOVERY - puppet last run on mw2087 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[13:43:04] Ops-Access-Requests, operations: Access to francium - https://phabricator.wikimedia.org/T94093#1156696 (mark) Please work with @ArielGlenn (and others) to set this up.
[13:49:53] operations, Wikimedia-Git-or-Gerrit: Git.wikimedia.org keeps going down - https://phabricator.wikimedia.org/T73974#1156722 (demon) >>! In T73974#1155506, @Spage wrote: > Would {T51371} help? Not much.
Getting rid of the blasted tool would help more (T752)
[13:51:27] (PS16) Alexandros Kosiaris: Package builder module [puppet] - https://gerrit.wikimedia.org/r/194471
[13:52:26] (CR) Alexandros Kosiaris: "@faidon, does this look better ?" [puppet] - https://gerrit.wikimedia.org/r/194471 (owner: Alexandros Kosiaris)
[13:58:59] akosiaris: yes
[13:59:06] akosiaris: although under https://gerrit.wikimedia.org/r/#/c/194471/16/modules/package_builder/manifests/pbuilder_hook.pp
[13:59:10] I'd use require => :)
[14:04:11] that one actually looks ugly with either approach...
[14:04:28] but that is mostly due to the large string value
[14:05:41] RECOVERY - salt-minion processes on labstore2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:05:42] I am still ambivalent over the 2 approaches to be honest. I've always used the require metaparameter. The result is that it doesn't even register in my eyes these days when I am reviewing
[14:13:00] WIKIMEDIA=yes ahah
[14:13:01] I love it
[14:13:17] akosiaris: I generally prefer require => tbh :)
[14:16:56] (PS2) Ottomata: wikimetrics: lint [puppet] - https://gerrit.wikimedia.org/r/200103 (owner: Dzahn)
[14:17:30] akosiaris: ping
[14:17:55] akosiaris: the zotero translators repo (&& deploy) should be updated from upstream if possible
[14:18:28] * mobrovac hides
[14:21:12] mobrovac: https://gerrit.wikimedia.org/r/mediawiki/services/zotero/translators. Simple git pull, git review/push, trebuchet deploy.
[14:21:21] (CR) Hashar: [C: -1] "I like the inclusion of the arch in the basepath, will make it trivial to reuse all that logic for Jenkins (the scripts I use assume arch " (1 comment) [puppet] - https://gerrit.wikimedia.org/r/194471 (owner: Alexandros Kosiaris)
[14:21:41] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[14:21:42] akosiaris: perfect, thnx
[14:21:47] mobrovac: I think you got access to do that so I am not really needed. If something goes wrong I am around
[14:21:50] now..
[14:22:01] ey ey captain
[14:22:06] https://gerrit.wikimedia.org/r/mediawiki/services/zotero/translation-server
[14:22:13] this one is kind of more difficult
[14:22:35] I think I have added a good README.md on how to do it
[14:23:11] akosiaris: indeed it is a good readme, luckily that repo hasn't been updated in months, so no fear :)
[14:23:47] translators have mediawiki/services rights so i should be able to pull it off, will cry for help if not
[14:24:09] mobrovac: exactly. ping me if there is any problem
[14:26:13] akosiaris: can't follow up today on your pbuilder patch.
But I like it a lot :) I have commented about WIKIMEDIA=yes / hookdir being skipped
[14:28:05] (CR) Faidon Liambotis: [C: 2] swift: don't show diff for builder files [puppet] - https://gerrit.wikimedia.org/r/200146 (https://phabricator.wikimedia.org/T93614) (owner: Filippo Giunchedi)
[14:28:21] RECOVERY - HHVM rendering on mw1034 is OK: HTTP OK: HTTP/1.1 200 OK - 69026 bytes in 0.186 second response time
[14:28:23] <_joe_> !log restarted mw1034, stuck in HPHP::StatCache::refresh
[14:28:25] (PS2) Faidon Liambotis: geoip: disable show_diff [puppet] - https://gerrit.wikimedia.org/r/200147 (https://phabricator.wikimedia.org/T93614) (owner: Filippo Giunchedi)
[14:28:29] Logged the message, Master
[14:28:31] (CR) Faidon Liambotis: [C: 2 V: 2] geoip: disable show_diff [puppet] - https://gerrit.wikimedia.org/r/200147 (https://phabricator.wikimedia.org/T93614) (owner: Filippo Giunchedi)
[14:28:36] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1156811 (mark) Servers cp3030 - cp3039 have now been relocated to rack OE13, in identical rack unit positions (18 and up). They are no longer cabled in any way.
[14:29:13] operations, ops-esams: Rack, cable, prepare cp3030-3049 - https://phabricator.wikimedia.org/T92514#1156813 (mark)
[14:29:31] RECOVERY - Apache HTTP on mw1034 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.066 second response time
[14:30:00] hashar: one quick question. Would you be OK with DIST=jessie-wikimedia being a shortcut for WIKIMEDIA=yes DIST=jessie ?
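At 14:30 akosiaris asks whether `DIST=jessie-wikimedia` could be shorthand for `WIKIMEDIA=yes DIST=jessie`. The package builder module's actual logic is not shown in the log. This minimal Python sketch only models the proposed mapping. The function name is hypothetical, and treating any `WIKIMEDIA` value other than `yes` as "off" is an assumption.

```python
SUFFIX = "-wikimedia"


def parse_dist(dist, wikimedia=""):
    """Split DIST into (base distribution, use_wikimedia_repo).

    DIST=jessie-wikimedia is treated as shorthand for WIKIMEDIA=yes DIST=jessie.
    """
    if dist.endswith(SUFFIX):
        return dist[: -len(SUFFIX)], True
    # Without the suffix, fall back to the explicit WIKIMEDIA variable.
    return dist, wikimedia == "yes"


print(parse_dist("jessie-wikimedia"))
print(parse_dist("jessie", wikimedia="yes"))
print(parse_dist("trusty"))
```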
[14:30:30] (CR) Faidon Liambotis: [C: 1] install-server: partman for dm-cache [puppet] - https://gerrit.wikimedia.org/r/200134 (https://phabricator.wikimedia.org/T88994) (owner: Filippo Giunchedi)
[14:31:10] RECOVERY - puppet last run on ms-be1018 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[14:32:32] akosiaris: my idea was to have the apt.wm.o hook recognize *-wikimedia and magically insert apt.wm.o
[14:32:51] akosiaris: though that is not explicit and a hidden convention one will have to know, that makes it a breeze :)
[14:33:02] operations, Release-Engineering, WMF-Legal, Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1156816 (Dzahn) "Copyright Platform" seems not optimal, i would wonder what that actually is. Maybe "Wikimedia Platform Engineering Team"...
[14:33:40] akosiaris: the jenkins script can live with DIST=jessie , I will just have to tweak the cowbuilder path being used since my script assumes base-$DIST-$arch.cow
[14:34:10] akosiaris: so I would have to inject an env var that overrides the path to base-$DIST.cow or if wikimedia is set base-$DIST-wikimedia.cow
[14:34:14] not the end of the world
[14:34:44] ok, I think I got it. I'll try and refactor that part to make it easy for you
[14:35:16] but to be honest, I want DIST to be distinct from wikimedia
[14:35:28] aka, no base-jessie-wikimedia.cow
[14:35:49] so the same .cow for both jessie and jessie-wikimedia right ?
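Between 14:33 and 14:35 the two sides converge: hashar's Jenkins scripts expect the `base-$DIST-$arch.cow` naming, and akosiaris wants stock and Wikimedia builds to share one base image (no `base-jessie-wikimedia.cow`). Below is a minimal Python sketch of a path helper that satisfies both. It assumes the flavor is expressed as a `-wikimedia` suffix on DIST; the function name and the `/var/cache/pbuilder` root are hypothetical placeholders.

```python
from pathlib import PurePosixPath

FLAVOR_SUFFIX = "-wikimedia"


def cow_basepath(dist, arch, root="/var/cache/pbuilder"):
    """Path of the shared cowbuilder base image for DIST/ARCH (hypothetical).

    The wikimedia flavor is dropped so that 'jessie' and 'jessie-wikimedia'
    resolve to the same stock image.
    """
    if dist.endswith(FLAVOR_SUFFIX):
        dist = dist[: -len(FLAVOR_SUFFIX)]
    return str(PurePosixPath(root) / f"base-{dist}-{arch}.cow")


print(cow_basepath("jessie", "amd64"))
print(cow_basepath("jessie-wikimedia", "amd64"))
```

Because the flavor is stripped before the path is built, the decision to use the Wikimedia repository has to happen elsewhere, for example in the apt hook. That keeps the shared image itself stock.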
[14:35:50] base-jessie.cow is sufficient if the apt hook is run and the build dependencies can be satisfied
[14:36:00] <_joe_> ^d: I am working a bit on your ES patch, if you don't mind
[14:36:03] is ran*
[14:36:07] hashar: exactly
[14:36:11] RECOVERY - HHVM busy threads on mw1034 is OK: OK: Less than 30.00% above the threshold [57.6]
[14:36:21] RECOVERY - HHVM queue size on mw1034 is OK: OK: Less than 30.00% above the threshold [10.0]
[14:36:38] akosiaris: you will want to make sure the cow image is always built with the stock version so
[14:36:56] I think you might end up with the cow image being built with the wikimedia flavor
[14:37:12] and thus potentially have our packages in the stock image
[14:37:20] thus the non wikimedia flavor would end up using our packages
[14:37:33] (theoretically / assume I don't fully understand everything)
[14:37:40] that is something I am hoping to avoid
[14:37:50] yeah
[14:37:59] I have been lazy and just created a cow per distrib flavor
[14:38:00] :=)
[14:38:03] which is why I have decoupled the wikimedia apt hook from the distribution itself
[14:38:08] (PS8) Nuria: Adding a Last-Access cookie to text and mobile requests [puppet] - https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813)
[14:38:29] yeah I understand it now
[14:38:32] and that makes sense
[14:39:05] <^d> _joe_: Go right ahead, I'm just making my coffee now
[14:39:10] * ^d is about if you need anything
[14:39:25] my use case for jessie and jessie-wikimedia cow images was to make sure I will never do a newbie mistake.
I am not using hook, I just hacked in the apt preference in my -wikimedia image [14:40:35] <_joe_> ^d: not really, just moving a few things around - I was going to fix a small issue with your patch and decided for a do-over of a couple of things [14:41:11] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [14:45:07] (03CR) 10Hashar: "lifting my -1 following a discussion with Alexandros. His idea is to have a single cow used for either stock or wikimedia flavors. That " [puppet] - 10https://gerrit.wikimedia.org/r/194471 (owner: 10Alexandros Kosiaris) [14:45:26] off for grocery shopping + kids. be back later in the evening [14:46:10] (03PS1) 10Dzahn: dsh: use FQDN for mediawiki-installation hosts [puppet] - 10https://gerrit.wikimedia.org/r/200158 (https://phabricator.wikimedia.org/T93983) [14:48:05] (03PS9) 10Nuria: Adding a Last-Access cookie to text and mobile requests [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) [14:49:46] <_joe_> ^d: I see in role/elasticsearch.pp a variable called $filter_cache_size, that is then not referenced anywhere [14:50:18] (03CR) 10Alexandros Kosiaris: "Actually, hashar has a very good comment about the way the hook is being conditionally included. I 'll amend" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/194471 (owner: 10Alexandros Kosiaris) [14:50:28] (03CR) 10Nuria: "Updated cookie expiration to happen in the next 30 days." [puppet] - 10https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: 10Nuria) [14:51:18] <_joe_> oh ok got it [14:51:25] <_joe_> another small error in the PS [14:52:09] <_joe_> actually, no, it was already there... 
[14:53:36] operations: packages not upgraded post-install - https://phabricator.wikimedia.org/T94177#1156873 (fgiunchedi) NEW
[14:54:22] (Abandoned) Filippo Giunchedi: install-server: upgrade kernel on swift HP machines [puppet] - https://gerrit.wikimedia.org/r/198227 (https://phabricator.wikimedia.org/T90922) (owner: Filippo Giunchedi)
[14:54:54] operations, ops-eqiad, Patch-For-Review: Rack and set up ms-be1016-1018 - https://phabricator.wikimedia.org/T90922#1156882 (fgiunchedi) Open>Resolved machines in service, resolving
[14:55:42] <_joe_> ^d: we do have indices.cache.filter.size: 10% in production, while I suspect you guys wanted it to be 20%
[14:55:48] <_joe_> am I right?
[14:56:22] <^d> Lemme double check
[14:57:44] <^d> _joe_: Yeah, looks like it
[14:57:53] <^d> Should be 20% according to current puppet
[14:58:12] <_joe_> yeah, well, if the parameter is not passed to the class instance, it is never inherited :)
[14:58:33] <^d> whoops :p
[14:59:32] <_joe_> yeah I'm doing a little revision of the whole thing
[14:59:43] <_joe_> I guess we'll go live next week though
[15:00:31] <^d> mmk. We could pull into staging and/or beta today if we want some more testing of your refactor
[15:00:51] <_joe_> well... first order of business is me finishing it :P
[15:02:47] (PS3) Ottomata: wikimetrics: lint [puppet] - https://gerrit.wikimedia.org/r/200103 (owner: Dzahn)
[15:11:51] PROBLEM - puppet last run on mw2152 is CRITICAL: CRITICAL: puppet fail
[15:13:12] (CR) Ottomata: [C: 2] wikimetrics: lint [puppet] - https://gerrit.wikimedia.org/r/200103 (owner: Dzahn)
[15:17:17] (PS10) Nuria: Adding logster to count requests to wikimetrics UI [puppet] - https://gerrit.wikimedia.org/r/197411
[15:18:07] ottomata: thanks:)
[15:18:10] (CR) jenkins-bot: [V: -1] Adding logster to count requests to wikimetrics UI [puppet] - https://gerrit.wikimedia.org/r/197411 (owner: Nuria)
[15:22:24] (CR) Milimetric: Adding a Last-Access cookie to text and mobile requests (4 comments) [puppet] - https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: Nuria)
[15:26:31] (CR) Nuria: Adding a Last-Access cookie to text and mobile requests (2 comments) [puppet] - https://gerrit.wikimedia.org/r/196009 (https://phabricator.wikimedia.org/T88813) (owner: Nuria)
[15:30:11] RECOVERY - puppet last run on mw2152 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[15:31:08] operations, MediaWiki-Core-Team, Multimedia, Parsoid-Team, and 3 others: Prepare Platform/Ops April 2015 quarterly review presentation - https://phabricator.wikimedia.org/T91803#1156972 (bd808) >>! In T91803#1156650, @Qgil wrote: > Instead of "Blocks" should be "Blocked by", right? But we don't need...
[15:37:33] operations, ops-codfw: correct wtp2001-2016 entries in racktables - https://phabricator.wikimedia.org/T94183#1157004 (RobH) NEW a:Papaul
[15:37:56] (CR) Dzahn: "https://phabricator.wikimedia.org/T94183#1157004" [dns] - https://gerrit.wikimedia.org/r/199275 (owner: Papaul)
[15:39:00] (CR) Glaisher: "per werdna and hoo. It doesn't look like there is going to be a performance issue." [mediawiki-config] - https://gerrit.wikimedia.org/r/195938 (https://phabricator.wikimedia.org/T87431) (owner: Glaisher)
[15:39:27] (CR) RobH: [C: 2] add mgmt asset tag info for wtp200(1-20) (1 comment) [dns] - https://gerrit.wikimedia.org/r/199275 (owner: Papaul)
[15:41:32] operations: create mgmt dns entries for asset tags of wtp2001-2020 - https://phabricator.wikimedia.org/T90274#1157023 (RobH) Open>Resolved Papaul forgot the reference to task in the commit, and I didn't notice until after merge. https://gerrit.wikimedia.org/r/#/c/199275/
[15:48:41] operations, ops-codfw: correct wtp2001-2016 entries in racktables - https://phabricator.wikimedia.org/T94183#1157071 (Papaul) Open>Resolved complete
[15:50:19] (CR) Greg Grossmeier: [C: -1] "See comment please before Monday at 15:00 UTC" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/198691 (https://phabricator.wikimedia.org/T93210) (owner: Nemo bis)
[15:52:59] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1157079 (faidon) Notifications for boron have been disabled. Whoever fixes this **must** re-enable them or we will lose future alerts.
[15:57:15] (PS6) Giuseppe Lavagetto: Hiera-ize the Elasticsearch config [puppet] - https://gerrit.wikimedia.org/r/197533 (owner: Chad)
[15:58:09] <_joe_> ^d: ^^
[15:58:54] <_joe_> so basically: I removed any class parameter from the role, as you were just copying them to the elasticsearch class anyway
[15:59:13] <_joe_> but please do a thorough review, I'll try it with the puppet compiler in the meanwhile
[15:59:48] <^d> I'll have a look in a minute :)
[16:00:23] (CR) Filippo Giunchedi: [C: 2 V: 2] trim list of Cassandra metrics [puppet/cassandra] - https://gerrit.wikimedia.org/r/199998 (https://phabricator.wikimedia.org/T78514) (owner: Eevans)
[16:04:11] (PS7) Giuseppe Lavagetto: Hiera-ize the Elasticsearch config [puppet] - https://gerrit.wikimedia.org/r/197533 (owner: Chad)
[16:04:31] (PS1) Filippo Giunchedi: update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/200167
[16:05:10] <^d> Ahh, hieradata/regex.yaml!
[16:05:15] <^d> That was the missing magic I needed
[16:05:25] <_joe_> yeah, to use sparingly
[16:05:45] (PS8) Giuseppe Lavagetto: Hiera-ize the Elasticsearch config [puppet] - https://gerrit.wikimedia.org/r/197533 (owner: Chad)
[16:07:11] (CR) Filippo Giunchedi: [C: 2 V: 2] update cassandra submodule [puppet] - https://gerrit.wikimedia.org/r/200167 (owner: Filippo Giunchedi)
[16:09:03] <^d> _joe_: Passing puppet compiler yet?
[16:13:22] (CR) Alex Monk: "Restbase is not available on special wikis yet, is it?" [mediawiki-config] - https://gerrit.wikimedia.org/r/200107 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[16:13:29] (PS1) Papaul: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168
[16:13:39] (CR) jenkins-bot: [V: -1] add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[16:14:26] (PS1) Faidon Liambotis: strongswan: fix typo in disable-sysvinit Exec [puppet] - https://gerrit.wikimedia.org/r/200169
[16:14:45] (CR) Faidon Liambotis: [C: 2 V: 2] strongswan: fix typo in disable-sysvinit Exec [puppet] - https://gerrit.wikimedia.org/r/200169 (owner: Faidon Liambotis)
[16:18:53] (PS17) Alexandros Kosiaris: Package builder module [puppet] - https://gerrit.wikimedia.org/r/194471
[16:19:10] operations, ops-codfw: set asset tag mgmt dns entries - https://phabricator.wikimedia.org/T94041#1157136 (Papaul) @RobH Asset tag mgmt entries set https://gerrit.wikimedia.org/r/#/c/200168/
[16:19:42] papaul: something is wrong with that changeset
[16:19:46] it has two files that shouldnt exist
[16:20:07] (CR) Alex Monk: "I'm not sure what your point is GWicke... This needs to start in labs, then get turned on at group 0 production wikis, then other wikis." [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[16:20:37] (CR) RobH: [C: -1] "Get rid of the filenames ending in ~, they are some kind of mistake." [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[16:20:49] (PS1) coren: Labs: puppetize labstore1005's mysql setup [puppet] - https://gerrit.wikimedia.org/r/200170 (https://phabricator.wikimedia.org/T88234)
[16:21:02] <_joe_> ^d: it does, but I need to do a couple of things to make it work correctly, namely rebase it
[16:21:03] robh: sorry i don't know staying trying to figure out this git stuff
[16:21:10] no problem
[16:21:15] but you must have git added those at some point
[16:21:21] you'll want to git rm them on your local
[16:21:25] and git commit --amend -a
[16:21:25] (PS9) Giuseppe Lavagetto: Hiera-ize the Elasticsearch config [puppet] - https://gerrit.wikimedia.org/r/197533 (owner: Chad)
[16:21:44] robh:ok
[16:21:45] papaul: you dont have to get it right the first time, that is the joy of code review ;D
[16:22:02] robh: that makes me feel better
[16:22:32] robH: it is tuff
[16:26:28] <^d> _joe_: We want to exclude admin from labs. Otherwise I get duplicate Group[ops] definitions.
[16:26:44] <_joe_> ^d: yeah, right
[16:26:58] <_joe_> I should really finish my work and include admin in base
[16:27:03] <_joe_> for prod
[16:27:07] <_joe_> and be done with it
[16:31:27] (CR) Jforrester: [C: -2] "Not until wmf24 hits group0." [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[16:37:56] operations, ops-codfw: codw pfw* serial connections problem - https://phabricator.wikimedia.org/T84737#1157199 (Papaul) Open>Resolved I am boxing the two old routing engines to send back to Juniper. I have confirmation on IRC with Faidon that everything is working. I also send an email to the Junipe...
[16:51:23] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157250 (Dzahn) NEW
[16:51:56] ACKNOWLEDGEMENT - Cassandra database on praseodymium is CRITICAL: PROCS CRITICAL: 0 processes with UID = 109 (cassandra), command name java, args CassandraDaemon daniel_zahn T94195
[16:51:57] ACKNOWLEDGEMENT - puppet last run on praseodymium is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn T94195
[16:56:20] operations, Deployment-Systems, Release-Engineering, Patch-For-Review: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1157273 (greg) I think the only patch left here is https://gerrit.wikimedia.org/r/#/c/199857/ After that is this don...
[16:57:54] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157284 (Dzahn) @praseodymium:/var/lib# file cassandra cassandra: broken symbolic link to `/mnt/data/cassandra/'
[16:59:01] help? gtt/level3 packet loss out of 200 paul
[16:59:09] !log mount /mnt/data on praseodymium to fix cassandra
[16:59:15] Logged the message, Master
[17:00:51] RECOVERY - Cassandra database on praseodymium is OK: PROCS OK: 1 process with UID = 109 (cassandra), command name java, args CassandraDaemon
[17:01:21] RECOVERY - puppet last run on praseodymium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:02:11] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157291 (Dzahn) i did: mount /mnt/data and ran puppet again, which fixed the immediate issue: /Service[cassandra]/ensure: ensure changed 'stopped' to 'running' < icinga-wm> R...
[17:03:13] cajoel: to?
[17:03:16] traceroute?
[17:03:26] I changed our localprefs
[17:03:31] traceroute from 200p to 4.2.2.2
[17:03:43] I was seeing loss between gtt and level3 peering
[17:03:52] level3-test.ip4.gtt.net
[17:03:53] wth
[17:03:59] I cut the office over to sue monkeybrains
[17:04:04] use
[17:04:10] starting breakfast now
[17:04:18] can you test/confirm from a box inside 200p?
[17:04:28] confirmed, fixing
[17:04:29] thanks
[17:04:32] <3
[17:05:00] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157295 (Dzahn) this was because the device name changed after a recent reboot. /etc/fstab had been adjusted but this just needed to be mounted. i'm resolving, though, the mou...
[17:05:27] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157296 (Dzahn)
[17:05:35] operations, RESTBase-Cassandra: puppet fail on prasedymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157299 (Dzahn) Open>Resolved a:Dzahn
[17:06:39] cajoel: fixed
[17:07:09] did you bypass gtt?
[17:07:11] yes
[17:07:19] worth hassling them about?
[17:07:45] maybe...
[17:09:57] (PS1) Nuria: Add counter for absolute number of lines on log [debs/logster] - https://gerrit.wikimedia.org/r/200182 (https://phabricator.wikimedia.org/T94193)
[17:11:12] flipped the office back over
[17:11:14] hurrah for telia
[17:11:33] thanks for the quick assistance.
[17:11:34] (CR) BryanDavis: [C: 1] scap: improve deploy2graphite [puppet] - https://gerrit.wikimedia.org/r/199857 (https://phabricator.wikimedia.org/T1387) (owner: Filippo Giunchedi)
[17:11:54] (CR) GWicke: [C: -1] "We decided to wait until we have thinned out old revision renders to make space for new data. Changing my vote to a -1 until then." [puppet] - https://gerrit.wikimedia.org/r/198433 (https://phabricator.wikimedia.org/T93452) (owner: GWicke)
[17:12:22] (PS11) Nuria: Adding logster to count requests to wikimetrics UI [puppet] - https://gerrit.wikimedia.org/r/197411 (https://phabricator.wikimedia.org/T94193)
[17:13:18] (CR) jenkins-bot: [V: -1] Adding logster to count requests to wikimetrics UI [puppet] - https://gerrit.wikimedia.org/r/197411 (https://phabricator.wikimedia.org/T94193) (owner: Nuria)
[17:14:38] (CR) GWicke: "@Alex, the first version of the patch started with the big wikis, not group0." [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:18:12] PROBLEM - puppet last run on mw2204 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:22:22] (Abandoned) Papaul: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[17:30:38] cajoel: no worries; I mailed their NOC btw
[17:33:00] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[17:44:33] (CR) Catrope: [C: -1] Make VisualEditor access RESTbase directly in group0 (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:45:34] (PS1) Ottomata: Install ipython-notebook on analytics client nodes [puppet] - https://gerrit.wikimedia.org/r/200193
[17:50:25] (CR) Ottomata: [C: 2] Install ipython-notebook on analytics client nodes [puppet] - https://gerrit.wikimedia.org/r/200193 (owner: Ottomata)
[17:53:13] (PS3) Catrope: Make VisualEditor access RESTbase directly in group0 [mediawiki-config] - https://gerrit.wikimedia.org/r/200098 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:53:15] (PS2) Catrope: Make VisualEditor access RESTbase directly on Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/200106 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:53:19] (PS2) Catrope: Make VisualEditor access RESTbase directly on all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/200107 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:53:21] (PS2) Catrope: Make VisualEditor access RESTbase directly on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/200105 (https://phabricator.wikimedia.org/T90374) (owner: Jforrester)
[17:53:23] (PS1) Catrope: Make VisualEditor access RESTbase directly in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/200196
[17:53:28] (CR) Alexandros Kosiaris: "I think I 've answered all question (of the first pass) plus fixed a couple of bugs on PS4" (9 comments) [puppet] - https://gerrit.wikimedia.org/r/198794 (https://phabricator.wikimedia.org/T87258) (owner: Alexandros Kosiaris)
[18:00:08] (PS4) Alexandros Kosiaris: Ganeti module/role introduced [puppet] - https://gerrit.wikimedia.org/r/198794 (https://phabricator.wikimedia.org/T87258)
[18:11:30] operations, RESTBase-Cassandra: puppet fail on praseodymium / cassandra db not running - https://phabricator.wikimedia.org/T94195#1157512 (Dzahn)
[18:12:05] _joe_: can you restart nutcracker on mw1147?
[18:13:03] (Restored) Dzahn: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[18:14:18] (PS2) Dzahn: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[18:17:29] (CR) Faidon Liambotis: Ganeti module/role introduced (3 comments) [puppet] - https://gerrit.wikimedia.org/r/198794 (https://phabricator.wikimedia.org/T87258) (owner: Alexandros Kosiaris)
[18:18:16] (PS1) Gage: cp3001 & cp3002 no longer exist [puppet] - https://gerrit.wikimedia.org/r/200204
[18:18:34] (CR) Dzahn: "Papaul: wanted to show you how we can amend to the existing change and work on it with multiple people. here's what i did:" [dns] - https://gerrit.wikimedia.org/r/200168 (owner: Papaul)
[18:19:50] (CR) Faidon Liambotis: [C: -2] "Relaying our IRC conversation:" [tools/scap] - https://gerrit.wikimedia.org/r/200137 (owner: Giuseppe Lavagetto)
[18:22:33] (PS1) Rush: address mail loops on some exim servers [puppet] - https://gerrit.wikimedia.org/r/200205
[18:23:41] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[18:24:55] (CR) Jforrester: [C: 1] Make VisualEditor access RESTbase directly in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/200196 (owner: Catrope)
[18:25:11] greg-g: Can RoanKattouw push a Labs-only config patch?
[18:25:54] (CR) Faidon Liambotis: [C: -1] "root@iodine.wikimedia.org from iodine will still loop... (it sucks, I know :)" [puppet] - https://gerrit.wikimedia.org/r/200205 (owner: Rush)
[18:26:26] (PS1) Alex Monk: Try to unbreak VE on http://ee-prototype.wikipedia.beta.wmflabs.org/ [puppet] - https://gerrit.wikimedia.org/r/200206
[18:27:39] (PS2) Alex Monk: Try to unbreak VE on http://ee-prototype.wikipedia.beta.wmflabs.org/ [puppet] - https://gerrit.wikimedia.org/r/200206
[18:27:51] James_F: yessir
[18:28:22] greg-g: Thanks.
[18:28:29] (CR) Catrope: [C: 2] Make VisualEditor access RESTbase directly in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/200196 (owner: Catrope)
[18:29:24] (CR) Dzahn: [C: 1] "yea, both are unreachable and i hear they have been unplugged to make room for new cache nodes" [puppet] - https://gerrit.wikimedia.org/r/200204 (owner: Gage)
[18:30:04] (CR) Gage: [C: 2] cp3001 & cp3002 no longer exist [puppet] - https://gerrit.wikimedia.org/r/200204 (owner: Gage)
[18:30:42] (PS2) Rush: address mail loops on some exim servers [puppet] - https://gerrit.wikimedia.org/r/200205
[18:30:46] (CR) Rush: "on it, let's do both? seems sensible" [puppet] - https://gerrit.wikimedia.org/r/200205 (owner: Rush)
[18:32:42] (CR) GWicke: [C: 1] Try to unbreak VE on http://ee-prototype.wikipedia.beta.wmflabs.org/ [puppet] - https://gerrit.wikimedia.org/r/200206 (owner: Alex Monk)
[18:40:17] (PS3) Dzahn: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (https://phabricator.wikimedia.org/T94042) (owner: Papaul)
[18:41:11] (PS4) Dzahn: add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (https://phabricator.wikimedia.org/T94042) (owner: Papaul)
[18:42:11] robh: wanna confirm this? https://gerrit.wikimedia.org/r/#/c/200168/4/templates/10.in-addr.arpa
[18:43:45] (CR) RobH: [C: 2] add mgmt asset tag info for ganeti2(1-6) [dns] - https://gerrit.wikimedia.org/r/200168 (https://phabricator.wikimedia.org/T94042) (owner: Papaul)
[18:44:09] done
[18:44:16] (not sure why i needed to as well but thats cool)
[18:44:24] its live
[18:44:30] cool,thx
[18:45:55] (Merged) jenkins-bot: Make VisualEditor access RESTbase directly in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/200196 (owner: Catrope)
[18:49:07] (PS2) Dzahn: strongswan: lint fixes [puppet] - https://gerrit.wikimedia.org/r/200088
[18:50:10] (CR) Dzahn: "fixed path conflict" [puppet] - https://gerrit.wikimedia.org/r/200088 (owner: Dzahn)
[18:50:23] (CR) Gage: [C: 2] strongswan: lint fixes [puppet] - https://gerrit.wikimedia.org/r/200088 (owner: Dzahn)
[18:50:50] :)
[18:51:23] (PS1) Cenarium: Add abusefilter group for testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/200216 (https://phabricator.wikimedia.org/T94214)
[18:51:33] (PS2) Dzahn: dsh: use FQDN for mediawiki-installation hosts [puppet] - https://gerrit.wikimedia.org/r/200158 (https://phabricator.wikimedia.org/T93983)
[18:51:54] (PS3) Dzahn: dsh: use FQDN for mediawiki-installation hosts [puppet] - https://gerrit.wikimedia.org/r/200158 (https://phabricator.wikimedia.org/T93983)
[18:52:00] mutante: -labs
[18:52:01] bd808: ^
[18:52:33] Negative24: just joined?
[18:52:41] What?
[18:53:07] did you mean "look at the labs channel"?
[18:53:11] yeah
[18:53:21] i wasn't on it, i just joined it
[18:53:56] oh. well I have a question for you on #wikimedia-labs
[18:54:09] (CR) BryanDavis: [C: 1] "Awesome" [puppet] - https://gerrit.wikimedia.org/r/200158 (https://phabricator.wikimedia.org/T93983) (owner: Dzahn)
[18:54:12] ok, i'm there now
[18:58:33] operations, ops-esams: decommission cp3001 & cp3002 - https://phabricator.wikimedia.org/T94215#1157710 (Gage) NEW
[19:01:19] (PS18) Alexandros Kosiaris: Package builder module [puppet] - https://gerrit.wikimedia.org/r/194471
[19:01:42] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[19:02:29] (CR) Faidon Liambotis: [C: 1] "Looks ok, as long as I'm not the one to babysit it :)" [puppet] - https://gerrit.wikimedia.org/r/200205 (owner: Rush)
[19:03:45] RoanKattouw: Don't forget to sync your change, even though it's beta only
[19:04:09] (PS3) Rush: address mail loops on some exim servers [puppet] - https://gerrit.wikimedia.org/r/200205
[19:06:10] hoo: Will do
[19:06:29] (CR) Alexandros Kosiaris: "@hashar. I think I 've just catered to jenkins needs as well. doing jessie-wikimedia will automagically have pbuilder (well cowbuilder) se" [puppet] - https://gerrit.wikimedia.org/r/194471 (owner: Alexandros Kosiaris)
[19:06:40] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[19:07:19] operations, Release-Engineering, WMF-Legal, Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157745 (Mattflaschen) >>! In T94000#1156816, @Dzahn wrote: > "Copyright Platform" seems not optimal, i would wonder what that actually i...
[19:07:53] operations, MediaWiki-Core-Team, Release-Engineering, WMF-Legal, Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157746 (Mattflaschen)
[19:09:18] (CR) Rush: [C: 2] address mail loops on some exim servers [puppet] - https://gerrit.wikimedia.org/r/200205 (owner: Rush)
[19:10:51] operations, MediaWiki-Core-Team, Release-Engineering, WMF-Legal, Documentation: Sphinx generated documentation should state license in footer - https://phabricator.wikimedia.org/T94000#1157755 (greg) FYI: Just putting: ``` (C) Wikimedia Foundation Licensed under $whatever_license, see LICENSE for...
[19:13:18] andre____: the "Send Message" button does not work for normal users - not sure if this is known yet
[19:14:23] Steinsplitter: it's related to conpherence app which is not enabled
[19:14:31] there is an issue somewhere to discuss
[19:14:48] Steinsplitter, is that about Phabricator? Steps to reproduce are welcome, in general. :)
[19:14:53] but yeah, what Chase wrote
[19:14:59] * andre__ was about to test on https://meta.wikimedia.org/wiki/Special:EmailUser/Steinsplitter
[19:15:55] andre___ - sorry. phab :-)
[19:21:24] (PS1) Dzahn: decom cp3001,cp3002. keep mgmt [dns] - https://gerrit.wikimedia.org/r/200222 (https://phabricator.wikimedia.org/T94215)
[19:24:31] operations, Deployment-Systems: Use FQDNs for mediawiki-installation - https://phabricator.wikimedia.org/T93983#1157794 (Dzahn) >>! In T93983#1151838, @bd808 wrote: > The fix will be to update `mediawiki-installation` which is currently maintained manually to use FQDNs. A big comment should be added to th...
[19:25:05] (CR) 20after4: "looks good" [puppet] - https://gerrit.wikimedia.org/r/199857 (https://phabricator.wikimedia.org/T1387) (owner: Filippo Giunchedi)
[19:25:13] (CR) 20after4: [C: 1] scap: improve deploy2graphite [puppet] - https://gerrit.wikimedia.org/r/199857 (https://phabricator.wikimedia.org/T1387) (owner: Filippo Giunchedi)
[19:25:19] operations, Deployment-Systems: Use FQDNs for mediawiki-installation - https://phabricator.wikimedia.org/T93983#1157796 (Dzahn) >>! In T93983#1157794, @Dzahn wrote: >>>! In T93983#1151838, @bd808 wrote: >> The fix will be to update `mediawiki-installation` which is currently maintained manually to use FQD...
[19:26:34] (CR) 20after4: [C: 1] Add skins to wgLocalisationUpdateRepositories [mediawiki-config] - https://gerrit.wikimedia.org/r/169716 (https://bugzilla.wikimedia.org/67154) (owner: Reedy)
[19:26:53] andre__: schould i file a bugreport? can't find an existing one with phab searchfunction
[19:29:26] operations, network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1157806 (Ciencia_Al_Poder) Has it been resolved really? I currently see slowness at this time (from Spain): ``` $:/tmp> wget http://tools.wmflabs.org/snapshots/builds/mediawiki-core/mediawiki-snapshot-...
[19:31:27] (CR) Dzahn: [C: 1] scap: improve deploy2graphite [puppet] - https://gerrit.wikimedia.org/r/199857 (https://phabricator.wikimedia.org/T1387) (owner: Filippo Giunchedi)
[19:32:32] operations, network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1157822 (Florian) @Ciencia_Al_Poder: I have no problems from Germany, but you're using another ISP than me.
[19:33:33] operations, ops-codfw, Patch-For-Review: rack/onsite setup of ganeti2001-2006 - https://phabricator.wikimedia.org/T91977#1157834 (Dzahn)
[19:33:34] operations, ops-codfw: set asset tag mgmt dns entries - https://phabricator.wikimedia.org/T94041#1157832 (Dzahn) Open>Resolved @Papaul I amended to your change and left comments on gerrit how i did it, then Robh merged it. So this is done now. See comments on https://gerrit.wikimedia.org/r/#/c/20...
[19:34:22] (CR) 20after4: [C: 1] "So if we delete it from the git repo, it'll get recreated how?" [mediawiki-config] - https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: Reedy)
[19:35:55] operations: Force https for archiva.wikimedia.org - https://phabricator.wikimedia.org/T88139#1157837 (RobH) The certificate order is tracked on https://rt.wikimedia.org/Ticket/Display.html?id=9286
[19:40:59] (CR) 20after4: [C: 1] "looks good, I don't have a scap test environment set up to run, however, from a visual inspection of the code it looks like this should wo" [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[19:41:28] (PS2) Rush: Update uninstalled applications IDs [puppet] - https://gerrit.wikimedia.org/r/199966 (owner: Negative24)
[19:41:36] (CR) Rush: [C: 2 V: 2] Update uninstalled applications IDs [puppet] - https://gerrit.wikimedia.org/r/199966 (owner: Negative24)
[19:43:03] ori: hiya
[19:43:10] trying to play with rcstream
[19:43:18] getting 404
[19:43:18] In [10]: socketIO = socketIO_client.SocketIO('stream.wikimedia.org', 80)
[19:43:18] WARNING:root:stream.wikimedia.org:80/socket.io [waiting for connection] unexpected status code (404)
[19:43:25] in server logs, i see:
[19:43:33] "GET /socket.io/?EIO=3&transport=polling&t=1427485344028-0 HTTP/1.1" 404 96 0.000101
[19:46:04] (CR) Dzahn: [C: -1] "what jenkins says: undefined name 'cfg'. that's in utils.py, see inline comment. should be "conf" instead?" (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[19:48:11] Hm, does anyone use rcstream? is it brooken?
[19:50:01] Someone does, there are examples in the docs IIRC
[19:50:10] ja, i got examples, but it isn't working
[19:50:13] getting 404
[19:50:15] using the examples
[19:50:42] ottomata: are you using python to try that?
[19:50:51] yes
[19:50:55] from labs
[19:50:57] RCStream is not accessible from python client ...
[19:51:02] https://phabricator.wikimedia.org/T91393
[19:51:03] ?
[19:51:11] oh.
[19:51:11] hm
[19:51:18] due to using socket-io 1.0 while only socket-io 0.9 is offered .. shrug. just found it
[19:51:32] HMMM
[19:51:40] ok
[19:51:41] thanks mutante
[19:51:46] that must be my problem
[19:51:48] much appreciated
[19:52:30] you're welcome
[19:58:48] operations, network: Network congestion between DTAG & eqiad - https://phabricator.wikimedia.org/T92548#1157844 (JanZerebecki) I think you issue is not related to this one. Perhaps contact your ISP telefonica their support? Your reverse: ``` $ mtr -c 1 --report-wide --report 80.58.75.241 HOST: bastion1...
[20:01:24] (PS3) 20after4: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[20:01:38] (CR) jenkins-bot: [V: -1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[20:01:40] ottomata: pywikibot solved by downgrading
[20:01:57] aye Nemo_bis, thanks, i downgraded to and it is doing better
[20:02:01] still not working, gettin g
[20:02:04] WARNING:socketIO_client:stream.wikimedia.org:80/socket.io/1: [packet error] unhandled namespace path ()
[20:02:12] but i can see that it is at least receiving something
[20:02:18] i'm just doing something wrong
[20:02:22] mayb ethe example needs updating, dunno
[20:10:48] !log thinning out old renders in restbase, keeping only the latest per revision; starting with group0, followed by wikipedia once done
[20:10:57] Logged the message, Master
[20:14:48] (PS4) 20after4: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[20:16:04] (CR) 20after4: "re-factored slightly to be more efficient, only do the for loop once (this also avoids the long line that flake8 was complaining about)" [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[20:23:20] (PS1) Catrope: Set RESTbase URL in SetupAfterCache hook [mediawiki-config] - https://gerrit.wikimedia.org/r/200228
[20:23:25] James_F: ----^^
[20:24:35] legoktm: I'm forced to do https://gerrit.wikimedia.org/r/200228 but I hate it, do you have a better idea?
[20:24:57] (CR) Jforrester: [C: 1] "Pending Legoktm's OK." [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 (owner: Catrope)
[20:26:41] RoanKattouw: ew :/, can you use $wgExtensionFunctions at least?
[20:27:06] legoktm: D'oh, of course [20:27:37] (PS2) Catrope: Set RESTbase URL in SetupAfterCache hook [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 [20:29:02] (CR) Legoktm: [C: +1] "Ew :/, we should figure out a better long-term solution." [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 (owner: Catrope) [20:29:53] RoanKattouw: oh, need to update the commit message :P [20:31:29] (CR) Chad: "They get recreated by the dumpInterwiki script. My point has just been that we only need to create & sync them, not check the blobs into g" [mediawiki-config] - https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: Reedy) [20:34:32] (CR) Nikerabbit: [C: +1] "Can this go in a SWAT?" [mediawiki-config] - https://gerrit.wikimedia.org/r/169716 (https://bugzilla.wikimedia.org/67154) (owner: Reedy) [20:35:31] (PS1) Andrew Bogott: Update the password for labs rabbitmq [puppet] - https://gerrit.wikimedia.org/r/200231 [20:36:49] (CR) Jforrester: [C: +2] Set RESTbase URL in SetupAfterCache hook [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 (owner: Catrope) [20:37:42] (CR) Andrew Bogott: [C: +2] Update the password for labs rabbitmq [puppet] - https://gerrit.wikimedia.org/r/200231 (owner: Andrew Bogott) [20:41:37] (PS1) Andrew Bogott: Fix epic c/p error in the last rabbit password patch [puppet] - https://gerrit.wikimedia.org/r/200232 [20:43:33] (CR) Andrew Bogott: [C: +2] Fix epic c/p error in the last rabbit password patch [puppet] - https://gerrit.wikimedia.org/r/200232 (owner: Andrew Bogott) [20:46:29] (PS3) Catrope: Set RESTbase URL in an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 [20:46:41] (CR) Catrope: [C: +2] "Per James" [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 (owner: Catrope) [20:47:28] (PS1) Dzahn: mailman i/o monitoring: raise timeout to 23 [puppet] - 
https://gerrit.wikimedia.org/r/200233 [20:48:13] (Merged) jenkins-bot: Set RESTbase URL in an extension function [mediawiki-config] - https://gerrit.wikimedia.org/r/200228 (owner: Catrope) [20:54:13] (CR) Greg Grossmeier: [C: +1] "plz do asap, beta is broke without this" [mediawiki-config] - https://gerrit.wikimedia.org/r/198662 (owner: BryanDavis) [20:54:22] (PS2) Dzahn: mailman i/o monitoring: raise timeout [puppet] - https://gerrit.wikimedia.org/r/200233 [20:55:41] (PS3) Dzahn: mailman i/o monitoring: raise timeout [puppet] - https://gerrit.wikimedia.org/r/200233 (https://phabricator.wikimedia.org/T93783) [20:57:06] (PS4) Dzahn: mailman i/o monitoring: raise timeout [puppet] - https://gerrit.wikimedia.org/r/200233 (https://phabricator.wikimedia.org/T93783) [20:57:46] (CR) Dzahn: [C: +2] mailman i/o monitoring: raise timeout [puppet] - https://gerrit.wikimedia.org/r/200233 (https://phabricator.wikimedia.org/T93783) (owner: Dzahn) [20:58:26] (CR) Chad: [C: +2] Monolog: simplify beta configurations [mediawiki-config] - https://gerrit.wikimedia.org/r/198662 (owner: BryanDavis) [21:02:38] !log redeploy security patches to wmf23 [21:02:43] Logged the message, Master [21:03:30] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [21:06:09] icinga-wm: it's ok, it's chris [21:07:07] (Merged) jenkins-bot: Monolog: simplify beta configurations [mediawiki-config] - https://gerrit.wikimedia.org/r/198662 (owner: BryanDavis) [21:07:32] <^d> No, that's not chris. [21:07:33] <^d> Ugh [21:07:37] <^d> My fault [21:07:41] oh, heh [21:07:46] <^d> RoanKattouw: I'm going to no-op that beta logging config live ^ [21:07:51] go for it [21:09:00] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[21:09:25] !log demon Synchronized wmf-config/logging-labs.php: shut up icinga, you're drunk (duration: 00m 07s) [21:09:31] Logged the message, Master [21:09:50] It's Friday. Of course I'm drunk. [21:10:28] !log redeploy security patches to wmf22 [21:10:33] Logged the message, Master [21:11:23] I know no one sees them, but I favorite the tweets of !logs that I like [21:11:30] https://twitter.com/wikimediatech/status/581563723652943872 [21:12:15] <^d> greg-g: Do you have a link to the feed? [21:12:17] <^d> Of your favs? [21:12:18] <^d> :) [21:12:42] heh [21:12:56] I don't think twitter is that sophisticated [21:12:57] operations, Phabricator, Project-Creators: create procurement project - https://phabricator.wikimedia.org/T93796#1158229 (Dzahn) since the basic requirement is "We don't want folks joining this project unless they are in WMF-NDA" and assuming that's true, it seems to me like we don't actually much of a... [21:15:30] (PS1) Catrope: Follow-up 9dd85fe3d1e7: add missing global [mediawiki-config] - https://gerrit.wikimedia.org/r/200237 [21:15:39] legoktm: I'm an idiot :( https://gerrit.wikimedia.org/r/200237 [21:17:27] (CR) Chad: [C: +2] Follow-up 9dd85fe3d1e7: add missing global [mediawiki-config] - https://gerrit.wikimedia.org/r/200237 (owner: Catrope) [21:17:33] (Merged) jenkins-bot: Follow-up 9dd85fe3d1e7: add missing global [mediawiki-config] - https://gerrit.wikimedia.org/r/200237 (owner: Catrope) [21:17:39] <^d> The other option is passing both globals in a use() but I guess it's all the same [21:18:14] !log demon Synchronized wmf-config/CommonSettings-labs.php: for completeness (duration: 00m 09s) [21:18:16] <^d> RoanKattouw: All done [21:18:21] Logged the message, Master [21:18:24] ty ^d [21:19:28] <^d> greg-g: I never was good at sports, but I can deploy faster than just about anyone :) [21:19:59] :) :) [21:20:13] <^d> Only falls down when I get too clever with my own git and trip over myself.
[21:20:20] <^d> Or Jenkins decided to take 30 minutes [21:22:58] operations, Mail, Monitoring, Patch-For-Review: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1158275 (Dzahn) I/O monitoring on sodium now running: Current Status: OK (for 0d 0h 12m 10s) Status Information: OK - I/O stats: Transfers/Sec=28.80 Read Requests/Sec=0.10 Write Requ... [21:25:22] operations, Mail, Monitoring, Patch-For-Review: Mailing lists alerts - https://phabricator.wikimedia.org/T93783#1158295 (Dzahn) please feel free to help adjusting the threshholds: it's ``` nrpe_command => '/usr/local/lib/nagios/plugins/check_iostat -i -w 150,10,200,100,5000 -c 300,20,400,200,10000... [21:30:14] hey guys, could somebody please restart zotero @ sca100[1-2].eqiad ? [21:31:00] robh: ^^ since you on duty yo [21:31:19] cheers greg-g :) [21:31:55] oh btw greg-g still have to reply to your mail, but TL;DR i'm in for the 'future-of-deployment' team [21:33:02] mobrovac: sweet! :) [21:54:40] (PS1) Dzahn: cassandra: add monitoring for CQL interface [puppet] - https://gerrit.wikimedia.org/r/200242 (https://phabricator.wikimedia.org/T93886) [22:00:52] (CR) Dzahn: [C: +2] cassandra: add monitoring for CQL interface [puppet] - https://gerrit.wikimedia.org/r/200242 (https://phabricator.wikimedia.org/T93886) (owner: Dzahn) [22:12:20] operations, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1158523 (Tgr) [22:13:19] urandom: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=cql [22:13:42] operations, RESTBase, RESTBase-Cassandra, Patch-For-Review: Cassandra/CQL query interface monitoring - https://phabricator.wikimedia.org/T93886#1158538 (Dzahn) added TCP port check to Icinga: https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=cql [22:14:21] greg-g: did someone take care of that for you?
[22:14:25] mutante: sweet [22:14:31] mutante: did you just do that? [22:14:31] i was afk =] [22:14:39] urandom: yea [22:14:39] mobrovac: ^ [22:14:45] mutante: very nice [22:14:48] :) [22:15:04] mobrovac: did the zotero restart happen or still need a root? [22:15:04] robh: it's me i needed help :) [22:15:23] robh: did not happen, still need it [22:15:42] urandom: it's not paging or mailing specific people just yet, it will show up on IRC though [22:15:45] operations, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1158539 (Tgr) [22:15:55] robh: haven't got enough privilege for that [22:16:10] why are maintenance reports on wikitech disabled? for example https://wikitech.wikimedia.org/wiki/Special:UncategorizedPages and https://wikitech.wikimedia.org/wiki/Special:BrokenRedirects [22:16:18] operations, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1130243 (Tgr) [22:16:25] mobrovac: doign now [22:16:31] cool thnx [22:17:02] !log manually restarted zotero service on sca100[1-2] [22:17:09] Logged the message, Master [22:17:21] mobrovac: im no longer afk so if that doesnt do it, lemme know [22:17:41] robh: ok, cheers [22:18:31] mutante: ok [22:18:42] mutante: what did you use to write the check? [22:19:08] urandom: https://gerrit.wikimedia.org/r/#/c/200242/1/manifests/role/cassandra.pp [22:19:57] monitoring::service for things from external, nrpe::monitor_service when it needs to be executed on the monitored host [22:20:19] Blocked-on-Operations, operations, Scrum-of-Scrums, Continuous-Integration-Isolation: Review Jenkins isolation architecture with Antoine - https://phabricator.wikimedia.org/T92324#1158547 (hashar) Thanks Chase, I guess the project seems bigger than it really is for ops.
I am looking to get: - one o... [22:20:57] operations, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1158550 (Tgr) [22:21:29] mutante: oh I see, it's a tcp port check [22:22:14] urandom: yes, it's check_tcp from nagios-plugins package [22:23:04] operations, MediaWiki-extensions-Sentry, Multimedia, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1158554 (Tgr) [22:23:28] per "a TCP port check of 9042 would be better" but not the "custom check that ran a simple CQL query" [22:34:22] (PS1) BryanDavis: beta: Stop running git clone as mwdeploy [puppet] - https://gerrit.wikimedia.org/r/200248 [22:34:47] (PS2) BryanDavis: beta: Stop running git clone as mwdeploy [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) [22:36:05] (PS2) Thcipriani: Trebuchet group wikidev; mw-staging owner mwdeploy [puppet] - https://gerrit.wikimedia.org/r/199988 [22:36:12] operations, Datasets-General-or-Unknown, HHVM: Convert snapshot hosts to use HHVM and trusty - https://phabricator.wikimedia.org/T94277#1158594 (hoo) NEW [22:39:14] operations, MediaWiki-extensions-Sentry, Multimedia, hardware-requests, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1158603 (Tgr) [22:40:03] (CR) BryanDavis: "cherry-picked to beta for testing" [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) (owner: BryanDavis) [22:40:18] (CR) BryanDavis: "cherry-picked to beta for testing" [puppet] - https://gerrit.wikimedia.org/r/199988 (owner: Thcipriani) [22:42:47] robh: could you please restart zotero once more?
it seems the deployed changes hasn't been pulled in (even though I can see the files on sca100? are updated) [22:42:50] which is usper-strange [22:43:10] sure [22:43:44] done [22:43:50] grazie [22:44:09] welcoe [22:44:11] welcome even [22:47:52] i see role::sca, but what does sca stand for? [22:48:38] service cluster a [22:48:42] https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions [22:49:01] heh, pretty meaningless [22:49:03] thanks [22:52:38] Has anyone been having trouble connecting to upload.? [22:58:02] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158686 (Krenair) a:Nemo_bis>None [22:58:11] (Abandoned) Mjbmr: Add alias namespace for previous project namespace fawikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/195017 (owner: Mjbmr) [23:04:45] (PS4) Alex Monk: cleanup: upload has been disabled on outreachwiki, no longer needed [mediawiki-config] - https://gerrit.wikimedia.org/r/196885 (owner: Steinsplitter) [23:06:18] (PS1) Mattflaschen: Enable VisualEditor for Flow posts on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/200252 [23:08:45] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158772 (Dzahn) >>! In T89983#1050852, @Jgreen wrote: > probably relevant -- we recently upgraded boron from precise to trusty, and someone mentioned that nsca may be broken for trusty? how recent was that? i l...
[23:09:28] (CR) EBernhardson: [C: +1] Enable VisualEditor for Flow posts on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/200252 (owner: Mattflaschen) [23:10:20] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158791 (SlayerFanatic1999) Hello @Krenair, it's this that's not loading fine: upload.wikimedia.org/wikipedia/commons/a/a9/John_Forbes_Nash%2C_Jr._by_Peter_Badge.jpg [23:11:43] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158800 (Krenair) Nope, that's fine. [23:15:34] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158804 (Dzahn) regarding firewalling: neon has a hole for: ACCEPT tcp -- 10.64.40.0/24 anywhere tcp dpt:nsca and boron.frack.eqiad.wmnet has address 10.64.40.66 So i ran tcpdump to... [23:17:22] operations, Commons, Multimedia, HHVM, and 4 others: Convert Imagescalers to HHVM, Trusty - https://phabricator.wikimedia.org/T84842#1158824 (Liuxinyu970226) [23:18:00] (CR) EBernhardson: [C: +2] Enable VisualEditor for Flow posts on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/200252 (owner: Mattflaschen) [23:18:05] (Merged) jenkins-bot: Enable VisualEditor for Flow posts on Beta Labs [mediawiki-config] - https://gerrit.wikimedia.org/r/200252 (owner: Mattflaschen) [23:18:50] (CR) Mattflaschen: "/tmp/hudson4427506681602253614.sh: line 2: /usr/local/bin/wmf-beta-mwconfig-update: No such file or directory" [mediawiki-config] - https://gerrit.wikimedia.org/r/200252 (owner: Mattflaschen) [23:23:38] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158870 (Dzahn) nevermind, i found the sudo password for boron. so i can also confirm outgoing firewall rule seems to be there, destination port 5667 on neon's IP. but indeed, no packets seem to be going out wh...
[23:27:03] (PS3) BryanDavis: beta: Fix ::beta::autoupdater to work again [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) [23:29:42] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158894 (Florian) Works for me, too. @SlayerFanatic1999: Can you try another browser? Do you connect to the internet through a proxy server? Can you make a [[ https://en.wikipedia.org/wiki/Ping_(networking_utili... [23:30:27] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158895 (Krenair) Open>Invalid a:Krenair [23:30:35] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1158426 (Krenair) a:Krenair>None [23:33:42] (PS4) BryanDavis: beta: Fix ::beta::autoupdater to work again [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) [23:34:36] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/). [23:36:45] People Y U no sync your beta changes :P [23:38:45] hoo: mattflaschen@deployment-bastion:~$ ls -l /usr/local/bin/wmf-beta-mwconfig-update.wonky.wonk.wonk [23:39:01] it was turned off :P [23:40:55] why did you not sync it to production? [23:41:12] Krenair: -labs only changes [23:41:14] Yeah, the production sync is more of an issue [23:41:16] Still [23:42:08] do we have written anywhere that you should sync those? seems incorrect to me [23:42:38] -labs only changes must be synced along with every other file [23:42:45] if you change a file in that repo, you sync it [23:52:20] (PS5) BryanDavis: beta: Fix ::beta::autoupdater to work again [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) [23:52:21] Sorry, I forgot about that. Will do in the future.