[00:00:04] RoanKattouw, ^d, Krenair: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T0000).
[00:02:31] ^d: CA will need a scap or I can disable the special page for now and wait until tomorrow's scap
[00:02:51] <^d> I'll just scap everything
[00:03:15] ok :D
[00:09:39] !log running concurrent test dumps of enwiki and dewiki through xenon
[00:09:46] Logged the message, Master
[00:12:47] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 15 data above and 45 below the confidence bounds
[00:14:04] !log demon Started scap: evening swat: centralauth, VE, user.php fix
[00:14:09] Logged the message, Master
[00:19:27] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [500.0]
[00:31:17] PROBLEM - HHVM rendering on mw1184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:31:37] RECOVERY - Kafka Broker Messages In Per Second on graphite1001 is OK: OK: No anomaly detected
[00:31:47] PROBLEM - Apache HTTP on mw1184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[00:33:38] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:36:57] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 2 failures
[00:44:24] !log demon Finished scap: evening swat: centralauth, VE, user.php fix (duration: 30m 19s)
[00:44:32] (CR) Krinkle: "This leaves a 404/empty directory linked from https://noc.wikimedia.org/ "Core DB Replication Layout and Lag" link." [mediawiki-config] - https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: Dzahn)
[00:44:33] Logged the message, Master
[00:45:28] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1085177 (Krinkle) Resolved>Open >>! In T90837#1080595, @Dzahn wrote: > Patches have been merged, the mw-config patch has been deployed in SWAT. confirmed on terbium. >...
[00:45:34] (CR) Krinkle: "(See T90837.)" [mediawiki-config] - https://gerrit.wikimedia.org/r/193143 (https://phabricator.wikimedia.org/T90837) (owner: Dzahn)
[00:46:23] <^d> Krenair, legoktm: Scap done, plz verify
[00:46:47] ^d: https://meta.wikimedia.org/wiki/Special:UsersWhoWillBeRenamed looks good :D
[00:46:54] thanks!
[00:46:59] <^d> yw
[00:47:40] ^d: seems non-broken, thanks
[00:52:07] PROBLEM - HHVM queue size on mw1184 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [80.0]
[00:52:37] (PS1) GWicke: Enable phase0 and *.wikipedia.org wikis in restbase [puppet] - https://gerrit.wikimedia.org/r/194244
[00:54:36] (PS2) GWicke: Enable phase0 and *.wikipedia.org wikis in restbase [puppet] - https://gerrit.wikimedia.org/r/194244
[00:55:27] gwicke, testwikidata, zerowiki?
[00:55:44] ah, zerowiki is private
[00:55:54] is there VE use on testwikidata?
[00:56:47] yes
[00:57:07] ok
[00:57:41] do you know the dbname for testwikidata?
[00:57:46] http://parsoid-lb.eqiad.wikimedia.org/testwikidata/Main_Page
[00:57:48] testwikidatawiki?
[00:57:50] not found
[00:57:52] that
[00:57:53] yes
[00:58:12] http://parsoid-lb.eqiad.wikimedia.org/testwikidatawiki/Main_Page doesn't work either
[00:58:55] interesting
[00:59:01] https://test.wikidata.org/wiki/Wikidata:Main_Page?veaction=edit works for me
[00:59:03] http://parsoid-lb.eqiad.wikimedia.org/testwikidatawiki/Wikidata%3AMain_Page?oldid=9400
[00:59:10] forgot the Wikidata:
[00:59:25] so yeah, let me add that
[00:59:37] oh right, yeah I don't think you can redirect that with wikibase
[01:02:24] (PS3) GWicke: Enable test/phase0 and *.wikipedia.org wikis in restbase [puppet] - https://gerrit.wikimedia.org/r/194244
[01:05:36] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[01:06:38] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1085254 (Dzahn) That looks like the last mw-deploy/scap emptied out the directory (again) which shouldn't happen because that was supposed to be gone from the mw config reposi...
[01:11:50] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1085257 (Chad) @dzahn rsync deleting files that aren't in the deployment version?
[01:13:28] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1085260 (Dzahn) https://gerrit.wikimedia.org/r/#/c/193143/ ^ this should have deleted that. why does a scap run still touch that directory and remove the contents that puppet...
[01:15:28] operations, Patch-For-Review: dbtree - duplicated code in 2 locations - puppetize config - https://phabricator.wikimedia.org/T90837#1085261 (Dzahn) >>! In T90837#1085257, @Chad wrote: > @dzahn rsync deleting files that aren't in the deployment version? in that case, all the related changes here are a prob...
[01:27:11] (PS1) Dzahn: dbtree: add to misc varnish config [puppet] - https://gerrit.wikimedia.org/r/194246 (https://phabricator.wikimedia.org/T90837)
[01:30:20] (PS1) Dzahn: add dbtree and point to misc-web [dns] - https://gerrit.wikimedia.org/r/194247 (https://phabricator.wikimedia.org/T90837)
[01:36:39] (PS1) Dzahn: dbtree: add Apache config, move to own docroot [puppet] - https://gerrit.wikimedia.org/r/194248 (https://phabricator.wikimedia.org/T90837)
[01:39:52] (PS1) Dzahn: noc apache,adjust old path to this file in comment [puppet] - https://gerrit.wikimedia.org/r/194249
[01:42:18] mutante: unbreaking dbtree seems to had quite a flow-on effect. sorry about that :)
[01:58:38] springle: arr, yea, i didn't think about rsync running with --delete , so mw deploy would still delete it even when gone from mw-config repo
[01:59:35] springle: so now it's "make it dbtree.wm.org like all the other tools linked from noc" i'll merge tomorrow though
[02:15:06] !log l10nupdate Synchronized php-1.25wmf18/cache/l10n: (no message) (duration: 00m 01s)
[02:15:14] Logged the message, Master
[02:16:13] !log LocalisationUpdate completed (1.25wmf18) at 2015-03-04 02:15:10+00:00
[02:16:18] Logged the message, Master
[02:28:11] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:28:20] Logged the message, Master
[02:29:19] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-04 02:28:15+00:00
[02:29:24] Logged the message, Master
[03:37:46] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[03:48:30] (CR) Santhosh: [C: 1] CX: Publish translations to the Main namespace by default [mediawiki-config] - https://gerrit.wikimedia.org/r/193835 (owner: KartikMistry)
[03:49:02] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1085388 (Krinkle) Similar: ``` Mar 4 03:35:20 integration-slave1010 puppet-agent[27419]: Could not retrieve catalog from remote server: Error 40...
[03:56:38] operations, Continuous-Integration, Patch-For-Review: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1085401 (Krinkle) >>! In T72068#1056111, @scfc wrote: > Could the `operations-apache-config-lint` job be completely removed until it is...
[03:56:46] operations, Continuous-Integration: Jenkins: Re-enable lint checks for Apache config in operations-puppet - https://phabricator.wikimedia.org/T72068#1085402 (Krinkle)
[04:10:17] PROBLEM - puppet last run on xenon is CRITICAL: Timeout while attempting connection
[04:34:25] (PS1) KartikMistry: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265
[04:34:58] (CR) jenkins-bot: [V: -1] Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265 (owner: KartikMistry)
[04:40:03] (PS2) KartikMistry: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265
[04:40:09] (CR) jenkins-bot: [V: -1] Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265 (owner: KartikMistry)
[04:43:05] (PS1) Ori.livneh: Revert a1fff17 [software/statsdlb] - https://gerrit.wikimedia.org/r/194266
[04:43:17] (CR) Ori.livneh: [C: 2 V: 2] Revert a1fff17 [software/statsdlb] - https://gerrit.wikimedia.org/r/194266 (owner: Ori.livneh)
[04:43:48] (PS3) KartikMistry: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265
[04:58:25] (PS1) Ori.livneh: Don't try to handle SIGTERM / SIGINT [software/statsdlb] - https://gerrit.wikimedia.org/r/194267
[05:05:30] (CR) Ori.livneh: [C: 2 V: 2] Don't try to handle SIGTERM / SIGINT [software/statsdlb] - https://gerrit.wikimedia.org/r/194267 (owner: Ori.livneh)
[05:15:44] (PS1) Ori.livneh: Don't bother cleaning up on exit [software/statsdlb] - https://gerrit.wikimedia.org/r/194270
[05:16:15] (CR) Ori.livneh: [C: 2 V: 2] Don't bother cleaning up on exit [software/statsdlb] - https://gerrit.wikimedia.org/r/194270 (owner: Ori.livneh)
[05:32:47] PROBLEM - puppet last run on xenon is CRITICAL: Timeout while attempting connection
[06:20:42] (CR) Tim Landscheidt: "Friday is fine with me." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/119428 (https://phabricator.wikimedia.org/T62925) (owner: Tim Landscheidt)
[06:27:16] <_joe_> morning
[06:28:38] PROBLEM - puppet last run on logstash1002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:17] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:38] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:57] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:06] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:45:47] RECOVERY - puppet last run on mw1123 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures
[06:46:16] RECOVERY - puppet last run on logstash1002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[06:46:28] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:46:37] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:47:17] RECOVERY - puppet last run on mw1042 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[06:55:13] operations, Wikimedia-Labs-Other, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1085467 (Springle)
[07:01:39] operations: configure pt-kill for wikiuser on coredbs - https://phabricator.wikimedia.org/T82802#1085472 (Springle) Open>Resolved The solution in place is [1]. It still isn't perfect but good enough for now and doesn't require an external service that can be blocked by max_connections. [1] https://git....
[07:02:48] operations: add ES cluster to noc.wikimedia.org/dbtree reporting - https://phabricator.wikimedia.org/T81251#1085475 (Springle) Open>Resolved Done.
[07:03:45] Nikerabbit: apparently sodium has been in swapdeath occasionally, there was a peak of 25k messages in queue https://ganglia.wikimedia.org/latest/?r=week&cs=&ce=&c=Miscellaneous+eqiad&h=sodium.wikimedia.org&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=ALLGROUPS
[07:06:30] Looks rather frequent from November but not such peaks https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20eqiad&h=sodium.wikimedia.org&r=year&z=default&jr=&js=&st=1425452676&v=2.7&m=cpu_wio&vl=%25&ti=CPU%20wio&z=large
[07:24:51] !log LocalisationUpdate ResourceLoader cache refresh completed at Wed Mar 4 07:23:48 UTC 2015 (duration 23m 47s)
[07:25:00] Logged the message, Master
[07:41:03] operations, wikis-in-codfw: install/deploy codfw appservers - https://phabricator.wikimedia.org/T85227#1085485 (Joe) p:Normal>High
[07:53:18] (CR) Yuvipanda: [C: 1] Conntrack collector for diamond (1 comment) [puppet] - https://gerrit.wikimedia.org/r/192335 (owner: coren)
[07:53:32] (CR) Yuvipanda: Conntrack collector for diamond [puppet] - https://gerrit.wikimedia.org/r/192335 (owner: coren)
[08:11:28] (PS1) Giuseppe Lavagetto: codfw: assign IPs to mediawiki appservers in row a [dns] - https://gerrit.wikimedia.org/r/194286
[08:13:22] (CR) Giuseppe Lavagetto: "If we want this to be in the class (and it may make sense), we should allow a parameter to allow/not allow this. We never used redis as a " [puppet] - https://gerrit.wikimedia.org/r/194095 (owner: Yuvipanda)
[08:16:37] (CR) Alexandros Kosiaris: "I ran this through puppet compiler throughout the fleet and seems like it wont break anything. Merging and shepherding into production" [puppet] - https://gerrit.wikimedia.org/r/123903 (owner: Tim Landscheidt)
[08:16:43] (CR) Alexandros Kosiaris: [C: 2] Use apt::repository instead of file resources [puppet] - https://gerrit.wikimedia.org/r/123903 (owner: Tim Landscheidt)
[08:18:40] <^d> Opsen!
[08:18:46] <^d> I fucked up in Phab
[08:22:37] (PS3) Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - https://gerrit.wikimedia.org/r/194095
[08:26:49] ^d: mails :D
[08:30:00] (CR) Alexandros Kosiaris: "Thank you! I did meet this while testing puppet on ruby1.9 and had managed to sidestep it but this is way better :-)" [puppet] - https://gerrit.wikimedia.org/r/194214 (https://phabricator.wikimedia.org/T91453) (owner: Dzahn)
[08:30:30] (PS1) KartikMistry: Beta: Enable Armenian (hy) in target wiki [puppet] - https://gerrit.wikimedia.org/r/194287
[08:30:56] (PS4) Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - https://gerrit.wikimedia.org/r/194095
[08:34:07] (CR) Alexandros Kosiaris: [C: 2] Beta: Enable Armenian (hy) in target wiki [puppet] - https://gerrit.wikimedia.org/r/194287 (owner: KartikMistry)
[08:34:46] (CR) Alexandros Kosiaris: "armenian is hy ? weird" [puppet] - https://gerrit.wikimedia.org/r/194287 (owner: KartikMistry)
[08:39:56] (CR) Alexandros Kosiaris: [C: 2] codfw: assign IPs to mediawiki appservers in row a [dns] - https://gerrit.wikimedia.org/r/194286 (owner: Giuseppe Lavagetto)
[08:41:24] (CR) Mobrovac: [C: -1] "Yeah :) Let's just put the default storage group as well to prevent forgetting it at a later stage." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/194244 (owner: GWicke)
[08:42:37] (CR) Giuseppe Lavagetto: [C: -1] "One typo. Apart from that, this would enable vm overcommit on rdb* databases, where it is not enabled right now. I'd need to ponder what e" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/194095 (owner: Yuvipanda)
[08:43:58] (PS5) Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - https://gerrit.wikimedia.org/r/194095
[08:48:18] (Abandoned) Giuseppe Lavagetto: mediawiki: use mpm_worker everywhere [puppet] - https://gerrit.wikimedia.org/r/183828 (owner: Giuseppe Lavagetto)
[08:49:33] akosiaris: ah. I was updating patch :/
[08:50:06] kart_: no harm done, submit a new one
[08:50:58] yep
[08:51:25] (PS1) KartikMistry: Beta: Enable few language for March 2015 user testing [puppet] - https://gerrit.wikimedia.org/r/194290 (https://phabricator.wikimedia.org/T91371)
[08:52:40] akosiaris: ^^
[08:52:43] (Abandoned) Giuseppe Lavagetto: fixed clossing brace [puppet] - https://gerrit.wikimedia.org/r/193867 (owner: Papaul)
[08:54:34] (CR) Alexandros Kosiaris: [C: 2] Beta: Enable few language for March 2015 user testing [puppet] - https://gerrit.wikimedia.org/r/194290 (https://phabricator.wikimedia.org/T91371) (owner: KartikMistry)
[08:58:08] (PS2) Giuseppe Lavagetto: added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul)
[09:09:01] (CR) Alexandros Kosiaris: [C: 2] Add apache config for m.{project}.org (-wikipedia) [puppet] - https://gerrit.wikimedia.org/r/185461 (https://phabricator.wikimedia.org/T78421) (owner: Glaisher)
[09:09:18] (CR) Alexandros Kosiaris: [V: 2] Add apache config for m.{project}.org (-wikipedia) [puppet] - https://gerrit.wikimedia.org/r/185461 (https://phabricator.wikimedia.org/T78421) (owner: Glaisher)
[09:42:13] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087623 (hashar) Upstream has migrated their bugtracker, their ticket is now https://tickets.puppetlabs.com/browse/PUP-1031
[09:43:23] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087624 (hashar) @Dzahn any idea why it does not seem to happen on the production puppetmaster? hashar@integration-puppetmaster:~$ puppet -...
[10:15:07] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[10:15:16] (PS1) KartikMistry: Beta: Fix config for CX registry in Yandex [puppet] - https://gerrit.wikimedia.org/r/194299
[10:15:44] akosiaris: one more :) ^^
[10:18:27] PROBLEM - puppet last run on cp4011 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago
[10:20:01] (PS1) Giuseppe Lavagetto: codfw: assign IPs to mediawiki appservers in row b [dns] - https://gerrit.wikimedia.org/r/194301
[10:20:03] (PS1) Giuseppe Lavagetto: codfw: assign IPs to mediawiki appservers in row c [dns] - https://gerrit.wikimedia.org/r/194302
[10:20:38] operations, Phabricator, Project-Creators, Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1087737 (Aklapper) >>! In T706#1083957, @chasemp wrote: > I think the docs are [[ http://www.mediawiki.org/wiki/Phabricator/Creating_and_ren...
[10:42:12] (CR) Giuseppe Lavagetto: [C: 1] added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul)
[10:54:14] operations, HTTPS, HTTPS-by-default: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1087816 (Nemo_bis)
[10:54:25] operations, HTTPS, HTTPS-by-default: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1061295 (Nemo_bis)
[10:55:58] (CR) Alexandros Kosiaris: [C: 2] Beta: Fix config for CX registry in Yandex [puppet] - https://gerrit.wikimedia.org/r/194299 (owner: KartikMistry)
[10:56:47] operations, HTTPS, HTTPS-by-default: Point rel=canonical to HTTPS for all Russian Wikimedia projects - https://phabricator.wikimedia.org/T90527#1087821 (Nemo_bis) Ok, as there doesn't seem to be a ticket about HSTS I'm using this one instead.
[10:59:45] (CR) Alexandros Kosiaris: [C: 2] codfw: assign IPs to mediawiki appservers in row b [dns] - https://gerrit.wikimedia.org/r/194301 (owner: Giuseppe Lavagetto)
[11:00:55] <_joe_> thanks alex!
[11:01:02] (CR) Alexandros Kosiaris: [C: 2] codfw: assign IPs to mediawiki appservers in row c [dns] - https://gerrit.wikimedia.org/r/194302 (owner: Giuseppe Lavagetto)
[11:02:24] operations, Continuous-Integration, Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1087823 (akosiaris) @hashar It has nothing to do with puppet version but ruby version $ ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-...
[11:21:52] (Abandoned) Yuvipanda: salt: Use fqdn as client id for labs as well [puppet] - https://gerrit.wikimedia.org/r/179592 (https://phabricator.wikimedia.org/T1154) (owner: Yuvipanda)
[11:23:56] (Abandoned) Yuvipanda: androidsdk: Add class to set up wikipedia app build [puppet] - https://gerrit.wikimedia.org/r/167198 (owner: Yuvipanda)
[11:39:52] (PS1) Giuseppe Lavagetto: mediawiki: add jobrunner definitions for codfw [puppet] - https://gerrit.wikimedia.org/r/194308
[11:48:55] operations: Enable memory overcommit for all redis hosts with persistance - https://phabricator.wikimedia.org/T91498#1088055 (yuvipanda) NEW
[11:49:23] (PS6) Yuvipanda: redis: Have redis machines overcommit if persistance is enabled [puppet] - https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498)
[12:11:34] akosiaris: around?
[12:11:52] akosiaris: config.js, should be at, /srv/deployment/cxserver/deploy/src ? (I can't find it)
[12:12:00] deployment-cxserver03
[12:12:11] (CR) Giuseppe Lavagetto: [C: 2] mediawiki: add jobrunner definitions for codfw [puppet] - https://gerrit.wikimedia.org/r/194308 (owner: Giuseppe Lavagetto)
[12:16:27] kart_: cat /srv/deployment/cxserver/deploy/src/config.js
[12:16:27] /*
[12:16:28] * THIS FILE IS MANAGED BY PUPPET
[12:16:29] etc etc etc
[12:17:52] akosiaris: blah.
[12:18:31] akosiaris: it wasn't there :)
[12:18:55] had to restart service.
[12:19:26] I sure did not create it ..
[12:19:43] you sure it wasn't there ? maybe something happened ?
[12:20:02] like jenkins/deploy destroying it or something
[12:20:12] not that this should happen
[12:20:26] I'm just guessing
[12:22:23] PROBLEM - RAID on mw2001 is CRITICAL: Connection refused by host
[12:22:44] PROBLEM - configured eth on mw2001 is CRITICAL: Connection refused by host
[12:22:55] PROBLEM - dhclient process on mw2001 is CRITICAL: Connection refused by host
[12:23:04] PROBLEM - mediawiki-installation DSH group on mw2001 is CRITICAL: Host mw2001 is not in mediawiki-installation dsh group
[12:23:14] PROBLEM - nutcracker port on mw2001 is CRITICAL: Connection refused by host
[12:23:33] PROBLEM - nutcracker process on mw2001 is CRITICAL: Connection refused by host
[12:23:45] PROBLEM - puppet last run on mw2001 is CRITICAL: Connection refused by host
[12:23:54] PROBLEM - salt-minion processes on mw2001 is CRITICAL: Connection refused by host
[12:24:14] PROBLEM - DPKG on mw2001 is CRITICAL: Connection refused by host
[12:24:24] PROBLEM - Disk space on mw2001 is CRITICAL: Connection refused by host
[12:29:54] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0]
[12:31:12] akosiaris: This is first time I've noticed that, will check while doing commit to /deploy
[12:33:14] RECOVERY - Disk space on mw2001 is OK: DISK OK
[12:33:23] RECOVERY - RAID on mw2001 is OK: OK: no RAID installed
[12:33:43] RECOVERY - configured eth on mw2001 is OK: NRPE: Unable to read output
[12:33:44] RECOVERY - salt-minion processes on mw2001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[12:33:54] RECOVERY - dhclient process on mw2001 is OK: PROCS OK: 0 processes with command name dhclient
[12:34:04] RECOVERY - DPKG on mw2001 is OK: All packages OK
[12:43:05] operations, Continuous-Integration: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1088194 (hashar) One would need to write a test suite that is able to test the hiera files are valid and eventually add some integration test on the resulting configuration....
[12:44:24] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[13:05:00] (PS2) BBlack: kill exec bit on systemd unit files [puppet] - https://gerrit.wikimedia.org/r/194237
[13:05:06] (CR) BBlack: [C: 2 V: 2] kill exec bit on systemd unit files [puppet] - https://gerrit.wikimedia.org/r/194237 (owner: BBlack)
[13:06:15] (PS1) BBlack: depool amssq31 for reinstall [puppet] - https://gerrit.wikimedia.org/r/194318
[13:06:26] <_joe_> bblack: yeah, makes sense
[13:06:30] (CR) BBlack: [C: 2 V: 2] depool amssq31 for reinstall [puppet] - https://gerrit.wikimedia.org/r/194318 (owner: BBlack)
[13:07:07] I was going to just leave it alone, but then the latest jessie systemd now actually logs about it :p
[13:07:25] <_joe_> oh, well, how annoying
[13:09:23] Mar 3 19:31:20 cp1060 systemd[1]: Configuration file /etc/systemd/system/varnish-frontend.service is marked executable. Please remove executable permission bits. Proceeding anyway.
[13:09:38] ^ if you're just going to proceed anyways, why complain? :P
[13:11:04] !log depooled amssq31 in esams for reinstall
[13:11:11] Logged the message, Master
[13:18:23] (PS1) BBlack: set amssq caches to new jessie disk layout [puppet] - https://gerrit.wikimedia.org/r/194319
[13:19:02] (CR) BBlack: [C: 2 V: 2] set amssq caches to new jessie disk layout [puppet] - https://gerrit.wikimedia.org/r/194319 (owner: BBlack)
[13:25:14] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[13:29:51] (PS1) BBlack: amssq31 -> jessie [puppet] - https://gerrit.wikimedia.org/r/194320
[13:30:05] (CR) BBlack: [C: 2 V: 2] amssq31 -> jessie [puppet] - https://gerrit.wikimedia.org/r/194320 (owner: BBlack)
[13:42:53] operations, ops-eqiad: Rack Setup new diskshelf for labstore1001 - https://phabricator.wikimedia.org/T88802#1088232 (Cmjohnson) stalled>Resolved This was completed last week.
[13:55:39] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1088255 (JanZerebecki)
[14:00:04] (PS1) BBlack: repool amssq31 [puppet] - https://gerrit.wikimedia.org/r/194323
[14:01:32] (CR) BBlack: [C: 2] repool amssq31 [puppet] - https://gerrit.wikimedia.org/r/194323 (owner: BBlack)
[14:01:47] PROBLEM - DPKG on palladium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[14:03:57] RECOVERY - DPKG on palladium is OK: All packages OK
[14:07:08] PROBLEM - DPKG on palladium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[14:08:18] RECOVERY - DPKG on palladium is OK: All packages OK
[14:08:20] YuviPanda: labnet1001 says WARNING: Puppet is currently disabled, last run 11 days ago with 0 failures
[14:09:45] (CR) Anomie: Beta: CX: Add wgContentTranslationCampaigns (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/194265 (owner: KartikMistry)
[14:11:17] paravoid: uh. That ain't me. I'll take a look in a few min
[14:15:40] anomie: thanks.
[14:19:32] (PS4) KartikMistry: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - https://gerrit.wikimedia.org/r/194265
[14:27:56] marktraceur, around?
[14:28:44] * yurik wonders if anyone could do a very minor update now, as I won't be around during swat
[14:35:39] operations, Patch-For-Review: Enable memory overcommit for all redis hosts with persistance - https://phabricator.wikimedia.org/T91498#1088356 (coren) p:Triage>Normal Reviewing patch now.
[14:37:31] operations, Phabricator, Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1088363 (Aklapper) Afterwards, need to update https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects#Type_of_project to list //acl* with Pol...
[14:40:16] (CR) coren: [C: 1] "This does what is intended, but is this the best thing? overcommit set to 1 will never fail a malloc() call, possibly overcommitting unbo" [puppet] - https://gerrit.wikimedia.org/r/194095 (https://phabricator.wikimedia.org/T91498) (owner: Yuvipanda)
[14:40:38] anomie, could you do a quick depl before swat?
[14:40:50] i'm worried that I won't be around during swat
[14:41:38] yurik: Maybe, although you really should use the provided windows. How quick?
[14:41:53] anomie, all patches are in
[14:42:20] anomie, https://gerrit.wikimedia.org/r/#q,194327,n,z
[14:42:33] described at https://wikitech.wikimedia.org/wiki/Deployments#Wednesday.2C.C2.A0March.C2.A004
[14:43:41] YuviPanda: +1'ed the overcommit patch for reddis, but with a suggestion in the comment
[14:43:58] twentyafterfour: Are you doing any deployments at the moment?
[14:44:27] anomie: not yet
[14:44:54] 4 hours until deployment train
[14:45:01] twentyafterfour: Ok, thanks
[14:46:47] !log anomie Synchronized php-1.25wmf19/extensions/Graph: early SWAT: Update Graph extension to fix IE bug [[gerrit:194326]] (duration: 00m 06s)
[14:46:49] yurik: ^ Test please
[14:46:54] Logged the message, Master
[14:47:01] anomie, thanks! testing
[14:47:41] rebooting to test IE
[14:47:45] good so far
[14:49:46] anomie, all works, thanks!
[14:54:36] operations, Datasets-General-or-Unknown, Wikidata: Wikidata dumps contain old-style serialization. - https://phabricator.wikimedia.org/T74348#1088416 (ArielGlenn) Is anyone looking at the redirects serialization?
[14:56:12] paravoid: strange
[14:56:14] > The last Puppet run was at Fri Feb 6 14:48:22 UTC 2015 (13 minutes ago).
[14:56:27] > yuvipanda@labmon1001:~$ date
[14:56:27] Wed Mar 4 14:56:01 UTC 2015
[14:56:46] Very early SWAT.
[14:57:10] paravoid: oh, labnet.
[14:57:14] let me look
[14:58:21] yup, it’s disabled, without reason, and I’m not sure who did it
[14:59:52] Coren: andrewbogott_afk akosiaris do you have any idea why puppet might have been disabled on feb 20th on labnet1001?
[15:00:05] chasemp: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T1500). Please do the needful.
[15:09:56] operations: overhaul fundraising cluster monitoring - https://phabricator.wikimedia.org/T91508#1088471 (Jgreen) NEW a:Jgreen
[15:11:32] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1088489 (Jgreen)
[15:11:32] operations: overhaul fundraising cluster monitoring - https://phabricator.wikimedia.org/T91508#1088488 (Jgreen)
[15:13:40] operations: overhaul fundraising cluster monitoring - https://phabricator.wikimedia.org/T91508#1088506 (Jgreen)
[15:18:06] operations, Labs: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#1088529 (Tobi_WMDE_SW) @Chad @yuvipanda @coren so, nothing seemed to has happened here for almost a month now. can you tell us what people might be relevant for getting this done....
[15:24:17] operations, Incident-20150205-SiteOutage, MediaWiki-Core-Team, Wikimedia-Logstash, Patch-For-Review: Decouple logging infrastructure failures from MediaWiki logging - https://phabricator.wikimedia.org/T88732#1088573 (Anomie)
[15:28:52] operations, Wikimedia-Apache-configuration, Wikimedia-DNS, Wikimedia-General-or-Unknown, Patch-For-Review: m.{project}.org portal/redirect consistency and i18n issues - https://phabricator.wikimedia.org/T78421#1088601 (Glaisher) Yay, thanks! So all m.project.orgs are now an alias for www.projec...
[15:29:22] operations, Wikimedia-Apache-configuration, Patch-For-Review: wikibooks.org redirects to en.wikibooks.org - https://phabricator.wikimedia.org/T87039#1088603 (Glaisher)
[15:29:24] operations, Wikimedia-Apache-configuration, Wikimedia-DNS, Wikimedia-General-or-Unknown, Patch-For-Review: m.{project}.org portal/redirect consistency and i18n issues - https://phabricator.wikimedia.org/T78421#1088602 (Glaisher)
[15:30:05] operations, Wikimedia-Apache-configuration, Wikimedia-DNS, Wikimedia-General-or-Unknown: m.{project}.org portal/redirect consistency - https://phabricator.wikimedia.org/T78421#1088606 (Glaisher)
[15:41:54] (PS3) Giuseppe Lavagetto: added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul)
[15:42:03] (CR) Giuseppe Lavagetto: [C: 2 V: 2] added mc2135 [puppet] - https://gerrit.wikimedia.org/r/193864 (owner: Papaul)
[15:47:33] (PS1) Giuseppe Lavagetto: role::deployment: add codfw deployment master (still tin) [puppet] - https://gerrit.wikimedia.org/r/194341
[15:50:27] I suppose I'll SWAT today. kart_, still here? SWAT in 10 minutes.
[15:52:33] (PS2) Giuseppe Lavagetto: role::deployment: add codfw deployment master (still tin) [puppet] - https://gerrit.wikimedia.org/r/194341
[15:52:55] operations, Phabricator, Project-Creators: Create policy projects and convert people projects to open - https://phabricator.wikimedia.org/T90491#1088760 (chasemp) >>! In T90491#1088363, @Aklapper wrote: > Afterwards, need to update https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_project...
[15:53:29] (03CR) 10Giuseppe Lavagetto: [C: 032] role::deployment: add codfw deployment master (still tin) [puppet] - 10https://gerrit.wikimedia.org/r/194341 (owner: 10Giuseppe Lavagetto) [15:58:25] kart_: Ping for SWAT [15:59:38] RECOVERY - puppet last run on graphite2001 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [16:00:04] manybubbles, anomie, ^d, marktraceur: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T1600). [16:00:13] oh [16:00:14] * anomie starts SWAT, and waits for kart_ [16:00:23] (03PS4) 10GWicke: Enable test/phase0 and *.wikipedia.org wikis in restbase [puppet] - 10https://gerrit.wikimedia.org/r/194244 [16:00:53] its a beta sync too [16:01:23] anomie: around now. [16:01:27] paravoid: I’m just about to go to breakfast, but I’d enjoy a response to https://phabricator.wikimedia.org/T84772 — wondering if I can get http/s between the labs vlan and misc-web. [16:01:47] (03PS5) 10Anomie: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194265 (owner: 10KartikMistry) [16:02:03] (03CR) 10Anomie: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194265 (owner: 10KartikMistry) [16:02:07] (03Merged) 10jenkins-bot: Beta: CX: Add wgContentTranslationCampaigns [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194265 (owner: 10KartikMistry) [16:02:19] (03CR) 10Mobrovac: [C: 031] Enable test/phase0 and *.wikipedia.org wikis in restbase [puppet] - 10https://gerrit.wikimedia.org/r/194244 (owner: 10GWicke) [16:02:36] kart_: Test it on Beta? [16:02:58] anomie: sure. [16:03:07] manybubbles: poke [16:03:08] :) [16:03:12] godog: good morning [16:03:28] sorry sorry coming [16:03:54] godog: what's the process for adding urandom to the wmf group so that he can see our logs? 
[16:05:01] gwicke: he is still in vacation IIRC [16:05:15] <_joe_> gwicke: godog is not here today [16:05:33] gwicke: which wmf group are you referring to exactly ? [16:05:45] web access ? [16:06:35] YuviPanda: no idea at all about puppet being disabled on labnet1001 [16:06:52] akosiaris: kibana basically [16:06:57] and graphite [16:07:43] gwicke: ok, just open a phab ticket in operations. He is a wmf employee right ? the new hire right ? [16:08:00] yup [16:08:18] ticket on the way.. [16:09:09] !log anomie Synchronized wmf-config: SWAT: Beta-only change: CX: Add wgContentTranslationCampaigns [[gerrit:194265]] (duration: 00m 07s) [16:09:16] Logged the message, Master [16:09:17] kart_: ^ Also test nothing broke in prod (not that it should have from that) [16:09:26] 6operations: Kibana / Graphite access for Eric Evans (eevans) - https://phabricator.wikimedia.org/T91513#1088819 (10GWicke) 3NEW [16:13:11] anomie: sure! [16:13:37] 6operations: Kibana / Graphite access for Eric Evans (eevans) - https://phabricator.wikimedia.org/T91513#1088847 (10akosiaris) I assume eevans is the labs username ? [16:15:13] 7Blocked-on-Operations, 6operations, 10RESTBase, 10hardware-requests, 7RESTBase-architecture: RESTBase production hardware - 5 of 6 ready - https://phabricator.wikimedia.org/T76986#1088862 (10mobrovac) [16:15:14] 6operations, 10ops-eqiad, 10RESTBase, 6Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1088863 (10mobrovac) [16:17:56] 6operations: Kibana / Graphite access for Eric Evans (eevans) - https://phabricator.wikimedia.org/T91513#1088881 (10Eevans) @akosiaris it is eevans, yes! [16:18:27] anomie: tested. Thanks! 
[16:18:28] 6operations, 10ops-eqiad, 10RESTBase, 6Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1088884 (10mobrovac) [16:18:34] * anomie is done with SWAT [16:18:37] PROBLEM - Host restbase1006 is DOWN: PING CRITICAL - Packet loss = 100% [16:18:49] * kart_ had to create new a/c in Beta and tried hard :D [16:19:56] YuviPanda: That timeline fits between the labstore1005 fail and around the first 1012 (network) fail. Perhaps that was andrewbogott_afk trying to fix networking? [16:24:48] 6operations, 10ops-eqiad, 10RESTBase, 6Services: restbase1006 faulty disk controller - https://phabricator.wikimedia.org/T89639#1088927 (10Cmjohnson) Part is being shipped by HP [16:25:18] PROBLEM - puppet last run on amssq53 is CRITICAL: CRITICAL: puppet fail [16:29:23] 6operations: Kibana / Graphite access for Eric Evans (eevans) - https://phabricator.wikimedia.org/T91513#1088941 (10akosiaris) 5Open>3Resolved a:3akosiaris user eevans has been added to the LDAP wmf group [16:30:32] akosiaris: thank you! [16:30:44] you're welcome [16:37:25] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1088964 (10Awjrichards) >>! In T706#1083957, @chasemp wrote: > I added you. I think the docs are [[ http://www.mediawiki.org/wiki/Phabricator... [16:38:22] 6operations, 6Phabricator, 6Project-Creators, 6Triagers: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1088966 (10Awjrichards) >>! In T706#1087737, @Aklapper wrote: > They are, and though people are pointed to the guidelines in this very task, *... [16:41:23] YuviPanda: labnet1001 is almost certainly my fault — I think you should go ahead and reenable. 
[16:45:09] RECOVERY - puppet last run on amssq53 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:53:59] mark: have time for a quick networking question? I’m wondering if I can get ports 80 and 443 open between misc-web and a server on the labs vlan… or if my asking that is fundamentally misunderstanding how vlans work :) [16:54:46] I would guess you probably can't do that [16:54:53] But doesn't labs have its own misc-web-like proxy? [16:55:24] RoanKattouw: Ah, I’m not talking about a labs instance, but rather a physical server [16:55:31] Oh... [16:55:41] I see [16:55:58] Now I'm very curious as to what you're trying to do [16:56:26] <_joe_> me too [16:56:38] Horizon [16:56:58] It needs to talk to labs services via rest, but I’d also like it to be publicly accessible via misc-web. [16:57:47] Probably all this would be trivial if the horizon host were outside of the labs vlan. [16:58:12] Hm… actually, I don’t know if silver is in the labs vlan or not. It certainly has no trouble talking to labs services [16:58:49] 6operations, 10Continuous-Integration, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089059 (10hashar) ``` integration-puppetmaster:~$ ruby -v ruby 1.9.3p484 (2013-11-22 revision 43786) [x86_64-linux] ``` The previous instance was... [16:59:27] _joe_, RoanKattouw, am I making sense? [16:59:36] <_joe_> yes [17:00:29] <_joe_> and well, you don't need to speak with labs instances, just with physical servers in the labs vlan, right? [17:00:35] <_joe_> I think that's allowed [17:00:36] right [17:01:09] Yeah, I think so too, just don’t know how to implement. It’s certainly the case that right now there’s no contact between misc-web and californium. And californium isn’t running a firewall atm... 
[17:01:19] Hence my appeal to a higher network authority :) [17:02:12] andrewbogott: Well silver has a public IP and it can apparently talk to labs things [17:02:24] So that would seem to indicate that what you want to do is possible in some way [17:02:51] yeah. [17:02:56] Although depending on how special silver is, maybe the proxy to Horizon needs to live on silver instead of misc-web [17:04:44] 7Blocked-on-Operations, 6operations, 10Analytics, 6Mobile-Apps, and 4 others: Avoid cache fragmenting URLs for Share a Fact shares - https://phabricator.wikimedia.org/T90606#1089065 (10dr0ptp4kt) @BBlack, okay if we model after https://gerrit.wikimedia.org/r/#/c/120617/ ? [17:05:54] (03PS1) 10Dzahn: statistics: remove 2 UTF-8 characters [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) [17:07:16] (03PS2) 10Dzahn: statistics: remove 2 UTF-8 characters [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) [17:07:35] (03PS1) 10Phuedx: [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 [17:07:45] (03CR) 10Dzahn: [C: 032] statistics: remove 2 UTF-8 characters [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [17:08:01] (03CR) 10jenkins-bot: [V: 04-1] statistics: remove 2 UTF-8 characters [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [17:08:57] hashar: ^ that -1 from jenkins seems new and unrelated [17:09:08] 17:07:43 raise ImportError("Entry point %r not found" % ((group,name),)) [17:09:36] Could not record history. Previous build's commit, 372117d773c0cc786066a8159c1ec8ae30bba40e, does not exist in the current repository. 
[17:10:16] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [17:13:56] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - tfinc - https://phabricator.wikimedia.org/T90927#1089075 (10chasemp) >>! In T90927#1083951, @Jalexander wrote: >>>! In T90927#1083939, @chasemp wrote: >> who is the boss type person who should approve this? > > Damon I... [17:15:04] 6operations, 6Security: Define in Puppet or remove rogue user accounts not currently defined in admin/data.yaml - https://phabricator.wikimedia.org/T90923#1089081 (10chasemp) [17:15:28] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [17:16:37] 10Ops-Access-Requests, 6operations, 6Security: define in Puppet or remove user account - milimetric - https://phabricator.wikimedia.org/T90956#1089085 (10chasemp) @tnegrin ping :) [17:20:28] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [17:25:01] (03CR) 10Dzahn: [C: 032 V: 032] "overruling jenkins - the reason for the -1 is a missing package on the new integration instance on trusty" [puppet] - 10https://gerrit.wikimedia.org/r/194353 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [17:33:45] 6operations, 10Continuous-Integration, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089194 (10Dzahn) >>! In T91453#1085388, @Krinkle wrote: > Similar: > ``` > invalid byte sequence in US-ASCII at /etc/puppet/manifests/role/statist... 
[17:40:19] 6operations, 10Citoid, 10VisualEditor, 5§ VisualEditor Q3 Blockers: Improve citoid production service - https://phabricator.wikimedia.org/T90281#1089221 (10akosiaris) [17:40:20] 7Blocked-on-Operations, 6operations, 10Citoid, 6Scrum-of-Scrums, 6Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1089218 (10akosiaris) [17:40:23] 6operations, 10Citoid: Backport and using zotero-standalone for the zotero service - https://phabricator.wikimedia.org/T89866#1089215 (10akosiaris) 5Open>3declined a:3akosiaris The zotero standalone package is not actually needed. xulrunner-24.0 and xulrunner-dev are the ones actually needed [17:40:42] 6operations, 10Continuous-Integration: Provide lint for yaml files in operations repository - https://phabricator.wikimedia.org/T91496#1089230 (10coren) p:5Triage>3Normal [17:41:34] 6operations, 10Staging: Package trebuchet-trigger for trusty - https://phabricator.wikimedia.org/T91463#1089237 (10coren) p:5Triage>3Normal [17:49:59] hashar, hi [17:53:04] 7Blocked-on-Operations, 6operations, 10Citoid, 6Scrum-of-Scrums, 6Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1089362 (10akosiaris) @mobrovac yes I have. So zotero seems to run OK under xulrunner (firefox will not do), some LD_LIBRARY_PATH, redefinition of the GR... [17:53:30] Jeff_Green, another restricted project? https://phabricator.wikimedia.org/T91508 [17:53:32] What is this one for? 
[17:57:43] 7Blocked-on-Operations, 6operations, 10Citoid, 6Scrum-of-Scrums, 6Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#1089369 (10akosiaris) [17:58:31] 7Blocked-on-Operations, 6operations, 10Citoid, 6Scrum-of-Scrums, 6Services: Zotero not running in production - https://phabricator.wikimedia.org/T76308#795651 (10akosiaris) [18:00:15] Krenair: I'm not sure what that means [18:00:34] Jeff_Green, there is "operations" and "Restricted Project" [18:01:27] ah, the other one is ops-fundraising [18:01:45] i don't know why phabricator prevents even seeing the project name [18:01:52] Where was that project requested and why is it restricted visibility? [18:02:24] Seems to have been requested here: https://phabricator.wikimedia.org/T89160 [18:02:50] I restricted visibility because a lot of the tasks under that project are about payments-related stuff [18:02:54] ... [18:02:58] That's not how Phabricator works. [18:03:16] Jeff_Green: at least it's not operations-only :p [18:03:28] Changing the visibility of a project does not affect tasks or any other objects. [18:04:14] sigh [18:04:30] there's probably not a good way to do what I need [18:04:54] What on earth do you need that requires restricted project visibility? [18:05:13] nothing, it didn't occur to me that that was even a feature [18:05:57] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [18:07:06] so you could have a project that is visible, with all the contained tasks invisible? [18:07:30] what I'd like is for that to be the default until I have a chance to triage the task [18:07:32] yes harej [18:07:52] harej, or vice versa [18:08:08] Jeff_Green, ... you want all tasks created that have a certain project, to inherit policy from that project? [18:08:47] as a default yes, but I'd want to be able to triage them to public viewable once I determine they don't contain sensitive information [18:08:53] That's very much Resolved, declined. 
"You can have multiple projects on a task - which should I inherit policy from?" etc. [18:08:59] twentyafterfour, hi [18:10:41] the most restrictive [18:10:49] That's undefined. [18:11:01] Policies can be very complex [18:11:17] You need to set the visibility policy on the new private tasks to allow include members of the ops-fundraising project, and just restrict editing/joining that project [18:11:40] Krenair: ok [18:13:16] Jeff_Green, You could have two projects to inherit policy from, for example - "Allow users x, y, z, and members of project q, while the moon is full" vs. "Allow members of projects f and g but deny users x and y" [18:13:46] There is no sane way to inherit policy from more than one object. [18:15:08] ok. dealing task by task will work ok, at least until it doesn't :-) [18:18:33] Jeff_Green, I hope you didn't make any private tasks which you were expecting to inherit the invisibility policy [18:19:16] nope, I didn't [18:19:41] mforns: yo [18:19:46] hey [18:19:52] Jeff_Green, thank you for restoring the project [18:20:05] sure [18:20:52] mforns: what can I do for ya [18:21:20] twentyafterfour, one q, I'd like to get added to labs 'deployment' project to be able to ssh into deployment-eventlogging02.eqiad.wmflabs. They told me that the deployment project lead is hashar, but he is not there [18:21:38] I suppose he is not in his working hours any more (europe?) [18:22:06] I saw you are also in that project, can you also help me? or is it just the project lead that can do this? [18:23:47] greg-g: ^ [18:24:08] mforns: I don't know if I'm an admin on that project ... [18:24:21] deployment-prep project... lemme see [18:24:23] I am, but don't know the rules for adding people. [18:24:47] mforns: I don't think I know you, who are you? 
:) [18:25:00] I'm Marcel from Analytics [18:25:16] Probably Marcel Ruiz Forns, software engineer in analytics [18:25:18] also, for things regarind Beta Cluster (aka "deployment-prep" aka https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep) the #wikimedia-releng channel is good [18:25:23] regarding* [18:25:25] that said, not cloaked on IRC [18:25:36] :) [18:25:48] what is your wikitech username? [18:25:53] mforns [18:26:02] easy enough [18:26:12] :] [18:26:53] "Successfully added mforns to deployment-prep. " [18:27:19] oh, many thanks greg-g, Krenair and twentyafterfour! [18:27:33] I forget how long it takes to propagate (<=20 minutes?) [18:27:44] now I can ssh, awesome :] [18:27:46] sweet [18:27:55] thx [18:28:05] greg-g, I'm wondering if there is a bug in OpenStackManager: https://wikitech.wikimedia.org/w/index.php?title=Nova_Resource:Deployment-prep&diff=next&oldid=144649 [18:28:26] did you deliberately add that person to admin? was it today? [18:28:41] not just now, no [18:28:47] okay [18:28:50] but I did add tyler at some point [18:28:50] this has been happening for ages [18:28:55] oh, greg-g, another question, how can I get root on deployment-eventlogging02.eqiad.wmflabs? I'd like to tail some EventLogging logs [18:29:49] mforn: sudo [18:29:56] mforns ^ [18:30:16] just added you to the sudo (under_NDA) policy [18:30:24] twentyafterfour, sure :], but do all members have root access? [18:30:27] probably https://phabricator.wikimedia.org/T73164 [18:30:29] via https://wikitech.wikimedia.org/wiki/Special:NovaSudoer [18:30:48] ok, perfect [18:30:53] thanks again guys! 
[18:31:40] Krenair: thanks, just commented [18:34:32] sigh, the OSM extension needs quite a bit more work, looking at the list of tasks against it [18:34:39] andrewbogott, ^ [18:35:19] 7Puppet, 6Multimedia, 7Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1089461 (10MarkTraceur) [18:35:24] Krenair: our plan is to deprecate OSM and adopt Horizon instead. [18:35:31] ok [18:35:43] is that long term, or quarterly goal? :) [18:35:48] And, actually, most of those issues are SMW problems more than OSM [18:35:57] PROBLEM - HHVM busy threads on mw1184 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [115.2] [18:36:01] Um… getting Horizon working is a quarterly goal, but it won’t be feature-complete [18:36:21] okay, so we'll still rely on OSM for some stuff [18:37:19] for the near term, yeah. [18:48:02] (03PS1) 10Dzahn: hhvm: convert 'ASCII art' to ASCII [puppet] - 10https://gerrit.wikimedia.org/r/194365 (https://phabricator.wikimedia.org/T91453) [18:53:57] (03PS1) 10Dzahn: keyholder/eventlogging: replace UTF-8 chars [puppet] - 10https://gerrit.wikimedia.org/r/194369 (https://phabricator.wikimedia.org/T91453) [18:54:24] (03CR) 10Bmansurov: [C: 031] [WikiGrok] Add new suggestions to the actor campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [18:56:45] (03CR) 10Dzahn: [C: 032] hhvm: convert 'ASCII art' to ASCII [puppet] - 10https://gerrit.wikimedia.org/r/194365 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [18:56:58] RECOVERY - HHVM busy threads on mw1184 is OK: OK: Less than 30.00% above the threshold [76.8] [18:57:38] (03CR) 10Dzahn: [C: 032] keyholder/eventlogging: replace UTF-8 chars [puppet] - 10https://gerrit.wikimedia.org/r/194369 (https://phabricator.wikimedia.org/T91453) (owner: 10Dzahn) [18:57:47] 7Puppet, 6Multimedia, 6Scrum-of-Scrums, 7Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1089526 
(10dduvall) [18:59:19] 6operations, 10Continuous-Integration, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089529 (10Dzahn) now all UTF-8 chars should be gone from all .pp files (but still in .erb files but that should not break things) grep -l --color... [19:00:00] (03CR) 10Phuedx: [C: 04-1] "Don't merge this until we've tackled https://trello.com/c/dbelLkvn/233-5-run-new-campaigns-at-100-in-stable" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194354 (owner: 10Phuedx) [19:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T1900). Please do the needful. [19:00:27] PROBLEM - HHVM busy threads on mw1184 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [115.2] [19:03:08] !log Creating new deployment branch 1.25wmf20 [19:03:14] Logged the message, Master [19:04:03] <^d> mutante: https://phabricator.wikimedia.org/diffusion/SVN/ :D [19:04:58] ^d: !! wooohoo [19:05:05] ^d: and the last commit:) very nice [19:06:03] https://phabricator.wikimedia.org/rSVN115794#53fe8d0c [19:06:51] <^d> Ah crud [19:08:13] <^d> Phab doesn't support svn over http(s) yet...still have to keep svn.wm.o available for checking out for now :\ [19:08:17] 6operations: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1089545 (10Dzahn) https://phabricator.wikimedia.org/diffusion/SVN/ [19:08:55] 6operations, 10Wikimedia-General-or-Unknown, 7Regression: svn.wikimedia.org security certificate expired - https://phabricator.wikimedia.org/T88731#1089548 (10Dzahn) let's close this as rejected, T86655 is happening instead [19:09:27] I'm running the checkout now? [19:09:30] does it not complete? [19:09:36] ^d: .. 
and i thought if we didn't import we would want to keep only the web part but disable the check out part [19:09:47] <^d> chasemp: Checking out from Phab? [19:09:49] so that way it could be behind misc-web and cert issue gone as well [19:09:55] ^d: yessir [19:10:02] try it? [19:10:14] svn checkout http://svn.wikimedia.org/svnroot/mediawiki [19:10:20] <^d> That's not Phab [19:10:23] <^d> look at the url [19:10:26] ah christ [19:10:37] it's like my brain set me up for that intentionally [19:10:51] ^d: so is svn being decom'd? [19:10:52] two windows, two end points, one fail, I look at the wrong one [19:11:09] why do we need to keep the check out [19:11:14] if we imported it [19:11:19] <^d> The repo still matters. [19:11:32] I think phab has to come out from behidn misc-web [19:11:34] <^d> People should be able to get the old code [19:11:41] notify server, git, ssh, svn [19:11:57] no way that all goes through cp* I think, seems it failed w/ gerrit [19:13:45] ^d: I have seen this handled by doing a conversion rather than caring about the svn protocol forever. roughly http://john.albin.net/git/convert-subversion-to-git [19:13:56] idk why svn particular would remain important? [19:14:20] <^d> I've been converting history out of SVN for 2-3 years now. [19:14:39] <^d> Doing it for sub-paths of the repo is fine, but a catch-all of the whole repo is massively gigantic and not very useful. [19:14:44] grrr.. make-wmf-branch seems to be broken, again [19:14:47] <^d> Although, could be interesting [19:15:00] <^d> I know more about repacking since I tried last :p [19:15:04] did the git version change on tin? I don't understand why it breaks every week [19:15:06] if we can, import it all [19:15:41] * ^d dodges rabbit holeeeeee [19:16:13] ^d :D well I'm not going to volunteer at this point so happy to accept that conclusion [19:18:02] fatal: git checkout: updating paths is incompatible with switching branches. 
[19:18:05] Did you intend to checkout 'origin/wmf/1.25wmf20' which can not be resolved as commit? [19:18:07] Unable to checkout submodule 'extensions/MoodBar' [19:18:21] it's failing on every extension submodule [19:18:56] crickets [19:22:00] ^d: any idea what happened to make-wmf-branch? [19:23:09] twentyafterfour: if I had any clue I would chime in :) [19:23:53] twentyafterfour: did anything at all change since last week? [19:24:16] well I didn't pull from the release repo so it should be identical code [19:24:32] hm [19:25:13] <^d> ...sounds like remote branches don't exist or the repo wasn't fetched [19:25:36] right but isn't this the code that creates the remote branches? [19:25:50] i think it does afaik [19:27:23] <^d> Yes it should, before it tries to clone them into the local repo [19:28:10] (03PS1) 10Bmansurov: [WikiGrok] Create 'film director' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194373 [19:36:02] (03PS1) 10Bmansurov: [WikiGrok] Create 'screenwriter' campaign [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194378 [19:55:21] (03PS1) 1020after4: Add package 'tig' to tin (via deployment role) to assist with deployment [puppet] - 10https://gerrit.wikimedia.org/r/194382 [19:56:28] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 45 below the confidence bounds [19:56:52] hmm [19:58:48] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 45 below the confidence bounds [19:59:04] what's happeninnnn [19:59:10] kafka looks fine [19:59:17] hm [20:06:38] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 46 below the confidence bounds [20:12:08] PROBLEM - Kafka Broker Messages In Per Second on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 0 data above and 46 below the confidence bounds [20:13:40] uhhhh [20:13:50] no data in 
graphite for recent stuff, i think kafka is fine [20:26:50] (03CR) 10John F. Lewis: "@dzahn re evaluate this please" [puppet] - 10https://gerrit.wikimedia.org/r/172434 (owner: 10John F. Lewis) [20:28:18] mutante: Hm.. running into this one: [20:28:19] Mar 4 18:59:10 integration-puppetmaster puppet-master[1325]: Puppet::Parser::AST::Resource failed with error ArgumentError: Could not find declared class ::hhvm at /etc/puppet/modules/contint/manifests/hhvm.pp:28 on node i-000005cd.eqiad.wmflabs [20:28:35] Mar 4 18:56:31 integration-puppetmaster puppet-master[1325]: message repeated 2 times: [ Could not find class role::ci::website::labs for i-00000474.eqiad.wmflabs on node i-00000474.eqiad.wmflabs] [20:28:56] Mar 4 18:16:29 integration-puppetmaster puppet-master[1325]: Variable access via 'projectgroup' is deprecated. Use '@projectgroup' instead. template[/etc/puppet/modules/ldap/templates/access.conf.erb]:6 [20:28:57] Mar 4 18:16:29 integration-puppetmaster puppet-master[1325]: (at /etc/puppet/modules/ldap/templates/access.conf.erb:6:in `block in result') [20:28:57] Mar 4 18:16:30 integration-puppetmaster puppet-master[1325]: Variable access via 'ssldir' is deprecated. Use '@ssldir' instead. 
template[/etc/puppet/modules/puppet/templates/puppet.conf.d/10-self.conf.erb]:6 [20:29:03] Later runs don't have those errors though [20:29:22] But I've quite often seen that errors only happen at the first run of a module and then just think it's installed [20:29:54] !log Manually completed the global rename Gabriel2517 -> WikiGuy2517 (was stuck on WD.o) [20:30:00] Logged the message, Master [20:30:18] mutante: As of 18:00 GMT, the UTF-8 error is gone [20:34:05] <_joe_> Krinkle: thanks a lot to you CI guys for weeding out the bugs in the ruby 1.9 puppetmaster for us :) [20:34:22] 6operations, 10Continuous-Integration, 5Patch-For-Review: invalid byte sequence in US-ASCII - puppet issues with UTF-8 - https://phabricator.wikimedia.org/T91453#1089996 (10Krinkle) @dzahn The last occurrence: ``` Mar 4 18:12:35 integration-puppetmaster puppet-master[1325]: Could not parse for environment... [20:34:26] Summarised at https://phabricator.wikimedia.org/T91453 [20:34:29] _joe_: yw :) [20:49:48] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090047 (10hashar) I managed to get a rough package locally using a random upstream commit without any of our hack. Good progress so far. 
[20:50:08] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090049 (10hashar) a:3hashar [20:53:29] 6operations, 10Analytics: investigate txstatsd error logs - https://phabricator.wikimedia.org/T91464#1090061 (10BBlack) [20:55:45] 6operations, 10Analytics: investigate txstatsd error logs - https://phabricator.wikimedia.org/T91464#1090071 (10Ottomata) a:5fgiunchedi>3Ottomata [20:55:58] 6operations: investigate txstatsd error logs - https://phabricator.wikimedia.org/T91464#1084838 (10Ottomata) p:5Triage>3Normal [20:56:54] 6operations: investigate txstatsd error logs - https://phabricator.wikimedia.org/T91464#1084838 (10Ottomata) This looks to be a problem with logster. It is trying to send non-numeric types to statsd. The linked to paste comes from logster parsing this and then flattenign the json keys: ``` "analytics101... [21:00:04] gwicke, cscott, arlolra, subbu: Respected human, time to deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150304T2100). Please do the needful. [21:04:57] (03PS3) 10BBlack: base: move instance-upstarts to manifest [puppet] - 10https://gerrit.wikimedia.org/r/188612 (owner: 10John F. Lewis) [21:06:56] (03CR) 10BBlack: [C: 032 V: 032] base: move instance-upstarts to manifest [puppet] - 10https://gerrit.wikimedia.org/r/188612 (owner: 10John F. Lewis) [21:07:32] bblack: woo [21:09:45] !log deployed parsoid version 06c8cf33 [21:09:53] Logged the message, Master [21:12:32] JohnFLewis: it was just a refactoring to move it to the correct file, right? 
didn't look like a functional diff to me [21:12:51] so i figured I could just fast-track it through [21:12:57] bblack: yeah [21:18:33] 7Puppet, 6Multimedia, 6Scrum-of-Scrums, 7Blocked-on-RelEng: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1090183 (10Tgr) See also the //Guidance on creating Debian packages for puppet// ops thread from January. The consensus there seemed to be that apt packages are an over... [21:20:14] bblack: just added you to another patch in my backlog :p [21:24:49] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 7Upstream: [upstream] Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1090218 (10hashar) My super lame commands: Create a cow image for debian sid: cowbuilder --create --debug \ --basepath /p... [21:25:33] JohnFLewis: sorry I know nothing about ganglia-vs-ganglia_new and firewall holes, and I don't have the tuits to dig into it today [21:26:33] bblack: it should be good to merge but if youre not confident I'll leave it go mutante :) [21:26:43] *to [21:27:04] (03PS1) 1020after4: Add 1.25wmf20 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194390 [21:27:06] (03PS1) 1020after4: Wikipedias to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194391 [21:27:08] (03PS1) 1020after4: Group0 to 1.25wmf20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194392 [21:27:56] (03CR) 1020after4: [C: 032] Add 1.25wmf20 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194390 (owner: 1020after4) [21:28:01] (03Merged) 10jenkins-bot: Add 1.25wmf20 symlinks [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194390 (owner: 1020after4) [21:28:32] (03CR) 1020after4: [C: 032] Wikipedias to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194391 (owner: 1020after4) [21:28:37] (03Merged) 10jenkins-bot: Wikipedias to 1.25wmf19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194391 (owner: 1020after4) 
[21:31:08] RECOVERY - Kafka Broker Messages In Per Second on graphite1001 is OK: OK: No anomaly detected [21:33:45] !log twentyafterfour Started scap: Wikipedias to 1.25wmf19, testwiki to 1.25wmf20 and rebuild l10n cache [21:33:53] Logged the message, Master [21:35:03] is etherpad slow for anyone else? [21:36:14] greg-g: seems ok to me [21:36:30] ok here too [21:37:12] :/ [21:40:17] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [21:45:18] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [21:47:28] apergos: Coren is labstore1003 not puppetized? [21:47:32] I see only standard is included [21:49:10] YuviPanda: That's because there is literally nothing running on that box except nfs-kernel-server (which, indeed, should be included) and the exports file. You're right that this should be added; lemme do that now it'll take 2 minutes. [21:49:22] Coren: ok! [21:49:45] I have no idea YuviPanda [21:50:16] apergos: coren seems to have ideas! [21:50:21] so all good [21:50:37] ottomata: I need your help, sir. I can't access EL data any more. [21:50:52] ottomata: I guess it's the password change but I have no idea how to get the new password. [21:51:07] 6operations, 6Labs: Puppetize labstore1003 - https://phabricator.wikimedia.org/T91573#1090306 (10yuvipanda) 3NEW [21:51:13] !log twentyafterfour Finished scap: Wikipedias to 1.25wmf19, testwiki to 1.25wmf20 and rebuild l10n cache (duration: 17m 26s) [21:51:13] YuviPanda: Hm. Think it's worthwhile to create a role class for - literally - two stanzas of config? 
[21:51:18] deskana@stat1002:~$ less /etc/mysql/conf.d/research-client.cnf
[21:51:18] /etc/mysql/conf.d/research-client.cnf: Permission denied
[21:51:19] Logged the message, Master
[21:51:42] Coren: yup
[21:51:51] sweet cause it's bedtime for bonzo over here
[21:51:53] Coren: everytime someone puts code directly in site.pp _joe_ kills a kitten
[21:52:08] Wouldn't want to have that!
[21:52:12] yup
[21:53:46] Or anyone else that can help me with that?
[21:53:49] Deskana: stat1003 /etc/mysql/conf.d/research-client.cnf
[21:53:57] Oh, I'm on the wrong server.
[21:54:29] ottomata: Thank you. :)
[21:54:55] yup :)
[21:57:34] (PS1) coren: Labs: Puppetize labstore1003 [puppet] - https://gerrit.wikimedia.org/r/194395
[21:58:08] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 2 failures
[21:59:08] YuviPanda: ^^ but hang on, I forgot to put my usual managed-by-puppet notice on the file
[21:59:40] woah. full scap of new branch in <18m? Awesome
[22:00:13] (PS2) coren: Labs: Puppetize labstore1003 [puppet] - https://gerrit.wikimedia.org/r/194395
[22:00:38] PROBLEM - puppet last run on amssq50 is CRITICAL: CRITICAL: Puppet has 1 failures
[22:08:00] (CR) 20after4: [C: 2] Group0 to 1.25wmf20 [mediawiki-config] - https://gerrit.wikimedia.org/r/194392 (owner: 20after4)
[22:08:05] (Merged) jenkins-bot: Group0 to 1.25wmf20 [mediawiki-config] - https://gerrit.wikimedia.org/r/194392 (owner: 20after4)
[22:09:41] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group0 to 1.25wmf20
[22:09:47] Logged the message, Master
[22:10:13] bd808: yeah it was fast this time
[22:15:49] Coren: sorry, was in a bus.
[22:17:01] (CR) Yuvipanda: [C: -1] "nit" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/194395 (owner: coren)
[22:17:17] (PS3) Yuvipanda: Labs: Puppetize labstore1003 [puppet] - https://gerrit.wikimedia.org/r/194395 (https://phabricator.wikimedia.org/T91573) (owner: coren)
[22:17:46] YuviPanda: I continue to be mystified by your work schedule.
[22:17:58] RECOVERY - puppet last run on cp4011 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[22:18:08] RECOVERY - puppet last run on amssq50 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[22:18:43] Coren: :D me too
[22:19:12] (PS1) BBlack: switch all cpNNNN to jessie storage config for installs [puppet] - https://gerrit.wikimedia.org/r/194401
[22:19:14] (PS1) BBlack: switch default PXE installer to jessie [puppet] - https://gerrit.wikimedia.org/r/194402
[22:19:38] YuviPanda: you're on a bus at 02:30 local time?
[22:19:50] (CR) BBlack: [C: 2 V: 2] switch all cpNNNN to jessie storage config for installs [puppet] - https://gerrit.wikimedia.org/r/194401 (owner: BBlack)
[22:19:53] bd808: 3:45 AM local time, yeah
[22:20:14] bd808: just dropped Alice back at the airport, and got back to friend’s place I am crashing at
[22:20:30] ah. hope the beach time was fun
[22:21:16] YuviPanda: nit indeed. You *know* when puppet-lint isn't whining about it that it's a tiny thing. :-P
[22:21:34] YuviPanda: But also, did you actually *add* trailing whitespace in the commit message?
[22:22:29] (PS4) Yuvipanda: Labs: Puppetize labstore1003 [puppet] - https://gerrit.wikimedia.org/r/194395 (https://phabricator.wikimedia.org/T91573) (owner: coren)
[22:22:33] Coren: bah, yeah, accident
[22:22:41] bd808: yup.
I hope to be in the US by end of this month
[22:22:46] (PS5) coren: Labs: Puppetize labstore1003 [puppet] - https://gerrit.wikimedia.org/r/194395
[22:25:58] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[22:29:31] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1090417 (Perhelion)
[22:31:11] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#1090434 (Perhelion) duplicate>Open
[22:31:26] operations, Wikimedia-SVG-rendering, Upstream: Filter effect Gaussian blur filter not rendered correctly for small to medium thumbnail sizes - https://phabricator.wikimedia.org/T44090#461916 (Perhelion)
[22:34:10] (CR) Yuvipanda: "(I do not know enough about NFS to +1 this...)" [puppet] - https://gerrit.wikimedia.org/r/194395 (owner: coren)
[22:48:56] (PS5) Rush: Enable test/phase0 and *.wikipedia.org wikis in restbase [puppet] - https://gerrit.wikimedia.org/r/194244 (owner: GWicke)
[22:50:39] (CR) Rush: [C: 2 V: 2] Enable test/phase0 and *.wikipedia.org wikis in restbase [puppet] - https://gerrit.wikimedia.org/r/194244 (owner: GWicke)
[22:52:19] Coren: ^^
[22:53:03] can some smart person tell me what is up with my attempt to save https://office.wikimedia.org/wiki/Job_descriptions/Software_Engineer_%28Search%29?veaction=edit ?
[22:53:20] it looks like I'm performing a couple of requests to api.php per second.
[22:53:20] (CR) Rush: "merging, gwicke is going to babysit. This has been tested at scale over the past weeks." [puppet] - https://gerrit.wikimedia.org/r/194244 (owner: GWicke)
[22:55:26] ah badtoken
[22:55:27] well then
[22:55:40] now how do I stop this?
[22:56:39] oh sad day
[22:56:42] its gone
[22:56:46] by too big edit is gone
[22:56:57] ugh
[22:57:11] manybubbles: known, cause unknown
[22:57:16] ctrl-a ctrl-c doesn't seem to pick up the whole page for copy and paste in ve....
[22:57:19] manybubbles: if you still have the page open, let's save it
[22:57:31] MatmaRex: its gone I think
[22:57:31] open browser console/debugger and do:
[22:57:41] I reloaded before you came
[22:57:50] aw :(
[22:57:55] I hid the popup and copy and pasted the page
[22:58:00] but that didn't actually pick it up
[22:58:10] so reloading the page and then pasting it just emptied the page
[22:58:38] :(
[22:59:10] for future reference, when you have the "Save" dialog open, you can do `ve.init.target.docToSave.body.outerHTML` in the console
[23:00:02] (if you don't have it open when something breaks, the document can usually be recovered too, but i don't have the code for it handy)
[23:00:09] this is per https://www.mediawiki.org/wiki/Parsoid/Debugging#Dumping_HTML_DOM_before_save_in_VE
[23:00:38] once you have the HTML document, you can run it through Parsoid manually to get wikitext, or possibly cheat someone to load in it VE directly.
[23:00:55] uh, cheat *somehow*. :)
[23:01:16] ah
[23:03:16] manybubbles: it's probably this bug: https://phabricator.wikimedia.org/T91158
[23:04:19] thanks
[23:08:51] https://en.wikipedia.org/wiki/Talk:Main_Page#Histmerging_the_Main_Page
[23:10:18] !log Enable test/phase0 and *.wikipedia.org wikis in restbase https://gerrit.wikimedia.org/r/#/c/194244/
[23:10:24] Logged the message, Master
[23:10:34] (CR) Tim Landscheidt: Labs: Puppetize labstore1003 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/194395 (owner: coren)
[23:11:42] (PS2) Rush: switch default PXE installer to jessie [puppet] - https://gerrit.wikimedia.org/r/194402 (owner: BBlack)
[23:12:14] (CR) Rush: [C: 1] "seems good but probably worth an email to ops list? I'm not sure if everybody knows this is ready."
[puppet] - https://gerrit.wikimedia.org/r/194402 (owner: BBlack)
[23:12:42] (PS2) Rush: Add package 'tig' to tin (via deployment role) to assist with deployment [puppet] - https://gerrit.wikimedia.org/r/194382 (owner: 20after4)
[23:13:11] (CR) Rush: "I have seen controversy here on new packages. Trying to investigate how to make sure this is cool. But from me personally seems fine." [puppet] - https://gerrit.wikimedia.org/r/194382 (owner: 20after4)
[23:15:38] (CR) 20after4: "I asked in #wikimedia-operations and didn't get any push-back about it. It's really low-risk since it's a tiny package with just one binar" [puppet] - https://gerrit.wikimedia.org/r/194382 (owner: 20after4)
[23:16:18] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 3 below the confidence bounds
[23:16:31] (CR) Yuvipanda: [C: 1] Add package 'tig' to tin (via deployment role) to assist with deployment [puppet] - https://gerrit.wikimedia.org/r/194382 (owner: 20after4)
[23:17:17] Hi, the parsoid team just completed a deploy but metrics being sent to statsd/graphite aren't showing up. what could be the issue? the statsd server?
[23:17:28] (CR) Rush: [C: 2] "seems good" [puppet] - https://gerrit.wikimedia.org/r/194382 (owner: 20after4)
[23:22:26] clarification: all the metrics are showing up in beta labs, but only one set of metrics is showing up in production. also, y'day Subbu (from the Parsoid team) successfully tested the same metrics using a local install of graphite/statsd.
[23:22:54] any idea what could be the issue?
[23:24:24] (PS1) BBlack: reduce static ssl buffer size to fit 1 packet always [puppet] - https://gerrit.wikimedia.org/r/194412 (https://phabricator.wikimedia.org/T86666)
[23:26:02] (CR) BBlack: [C: 2] reduce static ssl buffer size to fit 1 packet always [puppet] - https://gerrit.wikimedia.org/r/194412 (https://phabricator.wikimedia.org/T86666) (owner: BBlack)
[23:26:27] (PS1) Thcipriani: Add classes via hiera for labs [puppet] - https://gerrit.wikimedia.org/r/194413 (https://phabricator.wikimedia.org/T90592)
[23:28:27] (CR) Yuvipanda: [C: -1] "Should definitely be split into two patches (one for the include classes via hiera, and one for the parameterization), but otherwise me li" [puppet] - https://gerrit.wikimedia.org/r/194413 (https://phabricator.wikimedia.org/T90592) (owner: Thcipriani)
[23:33:24] (PS3) Jforrester: Disable 'beta' label in tab for the VE opt-in wiki (enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/112590 (https://phabricator.wikimedia.org/T60583)
[23:34:58] (PS4) Jforrester: Disable 'beta' label in tab for the VE opt-in wiki (enwiki) [mediawiki-config] - https://gerrit.wikimedia.org/r/112590 (https://phabricator.wikimedia.org/T60583)
[23:36:58] jouncebot, next
[23:36:58] In 0 hour(s) and 23 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150305T0000)
[23:46:36] (CR) Jforrester: [C: 1] "Let's do this early next week; the performance loss is irritating." [mediawiki-config] - https://gerrit.wikimedia.org/r/112590 (https://phabricator.wikimedia.org/T60583) (owner: Jforrester)
[23:47:44] James_F: thanks
[23:47:45] :)
[23:47:53] :-)
[23:54:02] James_F, you think this needs communication?
[23:55:10] Hmm. I guess it would, sadly.
[23:55:14] MaxSem, hey
[23:55:24] pong
[23:55:26] I just noticed you have a patch up for swat, which I'm concerned about
[23:56:07] your change to wmgEnableGeoData disables it on special wikis (that aren't commons)
[23:56:16] This could break a lot of pages
[23:56:18] yep
[23:56:23] example?
[23:56:31] anything relying on #coordinates
[23:56:32] where is it used?
[23:56:37] like?
[23:58:45] Krenair: Suddenly changing the interface for > 30k editors on a wiki? Yeah.
[23:58:49] hmm, I guess I need to flip it back on on test and test2
[23:59:22] !log stopping cassandra cluster for cleanup
[23:59:28] Logged the message, Master
[23:59:34] test and test2 are the obvious ones
[23:59:37] there may be more
[23:59:41] make sure you check them