[00:01:20] matt_flaschen: does everything look ok with https://gerrit.wikimedia.org/r/#/c/313930/ on mw1099? (if you had a chance to check) [00:01:41] thcipriani, yeah, I'm still testing. Unrelated complication. [00:01:55] np :) [00:04:34] !log Started run of exportRestrictions script on terbium (T135278); this is running in screen as user gwicke. It is not expected to generate noticeable load. [00:04:35] T135278: Import page restrictions to Cassandra restriction table - https://phabricator.wikimedia.org/T135278 [00:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:05:18] thcipriani, looks good, sorry for the delay. [00:05:36] matt_flaschen: np, pushing live everywhere [00:08:13] !log thcipriani@tin Synchronized php-1.28.0-wmf.20/extensions/Flow/includes/BoardMover.php: SWAT: [[gerrit:313930|BoardMover: do not try to save a null edit (T138310)]] (duration: 00m 49s) [00:08:15] T138310: Flow as a Beta feature: enable, disable and reenable doesn't seem to work - https://phabricator.wikimedia.org/T138310 [00:08:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:08:21] ^ matt_flaschen live everywhere [00:15:36] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:17:13] arseny92: oh! I just now refreshed the deployment page. We ran over time a bit for evening SWAT and I was getting to go afk. Can you reschedule for a different SWAT window (or the same one tomorrow?). [00:17:19] sorry I missed you :( [00:18:27] Well it's trivial enough, I can deploy it. [00:18:55] (03PS6) 10Smalyshev: Add config for units on Wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/311206 (https://phabricator.wikimedia.org/T117032) [00:19:16] thcipriani|afk, are you sure it's enabled everywhere? 
[00:19:19] (PS2) Dereckson: Remove dead pybal link [mediawiki-config] - https://gerrit.wikimedia.org/r/313162 (owner: MaxSem) [00:19:23] Dereckson: if you and arseny92 are both available that'd be awesome :) [00:19:37] The test case failed unfortunately, but it's the same issue as before, so not sure there's a reason to roll back. [00:19:38] (CR) Dereckson: [C: 2] "SWAT (late addition)" [mediawiki-config] - https://gerrit.wikimedia.org/r/313162 (owner: MaxSem) [00:20:07] (Merged) jenkins-bot: Remove dead pybal link [mediawiki-config] - https://gerrit.wikimedia.org/r/313162 (owner: MaxSem) [00:20:19] thanks lol [00:21:20] Couldn't find a rebase link on Gerrit though for the attribution. Where is it? [00:21:20] arseny92: live on mw1099 (if noc. is served by application servers) [00:21:32] matt_flaschen: eep...just double-checked, seems to be everywhere. [00:21:57] thcipriani|afk, okay. I don't think we should roll back, but we will have to follow up. [00:22:05] arseny92: https://s3.amazonaws.com/upload.screenshot.co/8449b71c43 [00:22:20] We as in our team [00:24:44] so noc. isn't available on mwxxx servers [00:26:23] !log dereckson@tin Synchronized docroot/noc/index.html: Remove dead pybal link on noc. 
([[Gerrit:313162]]) (duration: 00m 48s) [00:26:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:26:53] looks good to me [00:27:05] Dereckson http://i.imgur.com/Kza8eFf.png [00:27:27] arseny92: yeah because the change is merged [00:27:46] the screenshot was from the page before the merge [00:28:17] the goal of Gerrit is to contain "changes to merge", comment on them, improve them (that's what review implies) [00:28:41] once merged, the review is more or less closed, you can't offer a new patchset, as the idea is the change merged is the last patchset [00:29:00] (and if merged to master, rebasing it would be a no-op, as the change is already there) [00:29:46] arseny92: https://noc.wikimedia.org/index.html -> you can check if all looks good to you. A refresh with ctrl + shift + r could be needed if the page is cached by your browser [00:33:36] arseny92: ping? [00:34:25] Was it actually pushed to production? [00:34:50] Commit data says its deployed to beta [00:35:34] Dereckson, noc.wm.o routing? [00:35:49] `git grep "noc\.wikimedia\.org"` in the puppet repo [00:35:58] important result is the modules/role/manifests/cache/misc.pp file [00:36:17] that domain as well as dbtree.wikimedia.org is routed to mw1152.eqiad.wmnet [00:36:28] I'm not sure why that random mw host in particular, but there it is [00:41:15] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [00:44:42] Tried to see the page from several browsers and also from a device where I had never visited that page, so it couldn't be cached. Change still not on production, or what's the labs url for noc to check if it's seen on the beta cluster? [00:46:53] I don't think noc even works in beta [00:47:11] not sure it was ever set up [00:47:56] arseny92, what about when you `curl -I "https://noc.wikimedia.org/index.html" | grep Last-Modified` ? 
[00:49:47] you should get Last-Modified: Tue, 04 Oct 2016 00:20:33 GMT [00:50:56] Jenkins says it deployed to beta though but browsers all show the old page for some reason. Same datetime on curl [00:50:59] though [00:51:04] Weird [00:51:33] beta probably has the files but no means to serve them [00:52:42] if you do the curl without the -I and the grep, does it give you the right version of the file? [01:00:41] It does [01:03:10] And also does so if I use a text browser. Normal browsers display the old page for some reason, no matter how often the cache is cleared. Tried both IE and Firefox. Ugh... [01:06:57] http://i.imgur.com/b4mk7zA.png Weird. And yes the gui page is refreshed [01:07:55] Figured. https://noc.wikimedia.org/index.html shows the new page. https://noc.wikimedia.org/ without fullurl shows the old one [01:08:02] oh [01:08:16] could be a Varnish issue? [01:09:04] Fixed [01:09:13] !log echo 'https://noc.wikimedia.org/' | mwscript purgeList.php [01:09:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:09:34] arseny92: https://noc.wikimedia.org/ now correctly shows the new page [01:09:56] well [01:09:59] not really [01:10:26] When I purged, it showed the new one, but now it shows the old one again. [01:10:34] curl -I "https://noc.wikimedia.org/" | grep Last-Modified → Last-Modified: Thu, 01 Sep 2016 10:22:52 GMT [01:10:58] immediately after a purge: Last-Modified: Tue, 04 Oct 2016 00:20:33 GMT [01:11:08] Now shows new page for me also [01:11:17] yes, but it's because I purged it again [01:11:39] let's wait a few minutes to see if it's definitely okay now [01:12:13] What could be the cause for it to not work on the first try (and also only on fullurl)? [01:12:19] some race condition [01:12:36] A cache update at the same time as the purge. [01:12:50] seems stable now [01:13:18] confirmed on all my devices [01:13:40] Thanks for testing. [01:14:21] Okay now the 00:34:25 < arseny92> Was it actually pushed to production? 
00:34:49 < arseny92> Commit data says its deployed to beta [01:14:44] Any change merged in this operations/mediawiki-config repository: [01:15:05] - must be manually pushed to production servers (that's what we do during the SWAT windows) [01:15:31] - automatically triggers a Jenkins job to update the beta cluster configuration [01:15:44] What you've seen on Gerrit is the result of the second. [01:16:13] The notification of the first was the 00:26:23 < logmsgbot> !log dereckson@tin Synchronized docroot/noc/index.html: Remove dead pybal link on noc. ([[Gerrit:313162]]) (duration: 00m 48s) [01:16:49] it's a command to update https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:17:18] I'm not that dumb ;) [01:18:46] Of course not. That one was indeed self documented. [01:19:20] What I mean is: Gerrit gets notification about what happens in Jenkins jobs (including the beta update), our server admin log is the reference to know what happens in prod [01:27:51] [03:27] arseny92: yeah because the change is merged --> don't see rebase option on any commit though [01:32:38] only having cherrypick and followup [01:38:11] Dereckson then i don't understand the point of these functions if they do basically the same thing ( if i understood what's described in patchset 36 at https://gerrit.wikimedia.org/r/#/c/306259/ correctly ) [02:00:25] 06Operations, 10Phabricator, 10Traffic: Phabricator needs to expose notification daemon (websocket) - https://phabricator.wikimedia.org/T112765#2686629 (10mmodell) >>! In T112765#2509512, @BBlack wrote: > There's a little bit of refactoring work (already in-progress) to do on the Varnish side to support it "... [02:16:40] 06Operations, 10Phabricator, 10Traffic: Phabricator needs to expose notification daemon (websocket) - https://phabricator.wikimedia.org/T112765#2686660 (10BBlack) >>! In T112765#2686629, @mmodell wrote: >>>! In T112765#2509512, @BBlack wrote: >> ... but even if that weren't ready in time we can use DNS hack... 
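Editor's note on the noc.wikimedia.org cache check above: the diagnosis compared the `Last-Modified` header returned for `https://noc.wikimedia.org/` with the one returned for `https://noc.wikimedia.org/index.html`. A minimal Python sketch of that comparison follows. The function name `is_stale` is hypothetical and not part of any Wikimedia tooling; the two header values are the ones quoted in the log, hard-coded so the sketch runs without network access.

```python
from email.utils import parsedate_to_datetime


def is_stale(cached_last_modified: str, origin_last_modified: str) -> bool:
    """Return True when the cached copy is older than the freshly deployed one.

    Both arguments are HTTP-date strings as found in a Last-Modified header.
    (Hypothetical helper, for illustration only.)
    """
    cached = parsedate_to_datetime(cached_last_modified)
    origin = parsedate_to_datetime(origin_last_modified)
    return cached < origin


# Header values quoted in the log: "/" was still serving the September copy,
# while "/index.html" already served the copy synced on 4 October.
root_header = "Thu, 01 Sep 2016 10:22:52 GMT"
index_header = "Tue, 04 Oct 2016 00:20:33 GMT"

print(is_stale(root_header, index_header))
```

In practice the two strings would come from the `curl -I ... | grep Last-Modified` calls shown in the conversation; a mismatch like this one suggests a cached object that needs a purge rather than a failed deployment.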
[02:36:43] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.20) (duration: 18m 19s) [02:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:41:39] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Oct 4 02:41:38 UTC 2016 (duration 4m 55s) [02:41:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:47:17] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [02:54:57] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [03:26:47] (PS1) 20after4: Configuration for Aphlict [puppet] - https://gerrit.wikimedia.org/r/313937 (https://phabricator.wikimedia.org/T112765) [03:42:41] hola alvaro molina [03:56:36] hola alvaro molina [04:14:18] PROBLEM - puppet last run on dbproxy1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:16:00] (03PS2) 1020after4: Configuration for Aphlict [puppet] - 10https://gerrit.wikimedia.org/r/313937 (https://phabricator.wikimedia.org/T112765) [04:18:05] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687167 (10HenryLi) >>! In T10217#2683465, @Liuxinyu970226 wrote: >>>! In... [04:39:29] RECOVERY - puppet last run on dbproxy1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [05:00:49] 06Operations, 10Phabricator, 10Traffic, 13Patch-For-Review: Phabricator needs to expose notification daemon (websocket) - https://phabricator.wikimedia.org/T112765#2687202 (10mmodell) @bblack: https://gerrit.wikimedia.org/r/#/c/313937/ is a first-attempt at puppetizing the aphlict notification service [05:04:59] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [05:09:22] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [05:22:48] AaronSchulz: looks like https://phabricator.wikimedia.org/rMW09ca28d01a170d4859e68d4eba1c861ffb576f43 breaks one of the beta cluster integration jobs [05:23:18] log output: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/11812/console [05:25:32] or rather, I guess that patch was supposed to fix it but didn't? 
[05:36:47] twentyafterfour: maybe https://gerrit.wikimedia.org/r/313943 will help the traces [05:37:15] I don't have that problem locally with LBFactoryMulti [05:37:17] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687230 (10Liuxinyu970226) >>! In T10217#2687167, @HenryLi wrote: >>>! In... [05:58:10] (03PS2) 10Ori.livneh: [WIP] Module for Recommendation API [puppet] - 10https://gerrit.wikimedia.org/r/312045 [05:59:35] (03CR) 10jenkins-bot: [V: 04-1] [WIP] Module for Recommendation API [puppet] - 10https://gerrit.wikimedia.org/r/312045 (owner: 10Ori.livneh) [06:13:36] hmm [06:14:16] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687244 (10HenryLi) >>! In T10217#2687230, @Liuxinyu970226 wrote: >>>! In... [06:21:07] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687245 (10Liuxinyu970226) >>! In T10217#2687244, @HenryLi wrote: >>>! In... [06:33:14] 06Operations, 10media-storage: Two recently uploaded files have disappeared (404) - https://phabricator.wikimedia.org/T147040#2687261 (10greg) From @fgiunchedi on the ops list: > The only correlation I can spot from a quick look is both were uploaded via > flickr2commons, maybe related? [06:33:56] 06Operations, 10media-storage: Two recently uploaded files have disappeared (404) - https://phabricator.wikimedia.org/T147040#2687263 (10greg) (As this is still UBN!) Any new reports of this? 
[06:35:46] 06Operations, 10media-storage: Two recently uploaded files have disappeared (404) - https://phabricator.wikimedia.org/T147040#2687264 (10greg) >>! In T147040#2687263, @greg wrote: > Any new reports of this? Nothing on the linked Commons page. [06:41:21] PROBLEM - puppet last run on ms-be2018 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [06:42:25] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687272 (10HenryLi) >>! In T10217#2687245, @Liuxinyu970226 wrote: >>>! In... [06:57:53] (03PS2) 10Muehlenhoff: udp2log: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/312525 [07:02:43] (03CR) 10Muehlenhoff: [C: 032] Revert "Update Debian patches for 1.0.2i" [debs/openssl] - 10https://gerrit.wikimedia.org/r/313418 (owner: 10Muehlenhoff) [07:04:29] !log executed salt -C 'G@cluster:jobrunner and G@site:eqiad' cmd.run 'find /var/log/hhvm/ -type f -user root -exec chown www-data:www-data {} \;' (also in codfw) to reduce cronspam [07:07:08] RECOVERY - puppet last run on ms-be2018 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [07:08:50] (03PS1) 10Muehlenhoff: Revert "Bump changelog for 1.0.2i update" [debs/openssl] - 10https://gerrit.wikimedia.org/r/313951 [07:09:36] !log rebooting eventlog1001 for kernel upgrades [07:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:20:06] (03PS3) 10Alexandros Kosiaris: Apertium: Support jessie [puppet] - 10https://gerrit.wikimedia.org/r/308679 (https://phabricator.wikimedia.org/T144588) (owner: 10KartikMistry) [07:20:58] PROBLEM - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, regex args 
kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [07:21:48] mmmmmm [07:24:15] (03CR) 10Muehlenhoff: [C: 032] Revert "Bump changelog for 1.0.2i update" [debs/openssl] - 10https://gerrit.wikimedia.org/r/313951 (owner: 10Muehlenhoff) [07:24:17] (03PS4) 10Alexandros Kosiaris: Apertium: Support jessie [puppet] - 10https://gerrit.wikimedia.org/r/308679 (https://phabricator.wikimedia.org/T144588) (owner: 10KartikMistry) [07:24:48] (03PS6) 10Giuseppe Lavagetto: scap::source: also define the corresponding dsh group [puppet] - 10https://gerrit.wikimedia.org/r/306431 [07:24:50] (03PS2) 10Giuseppe Lavagetto: role::deployment::server: fix scap3/trebuchet declarations [puppet] - 10https://gerrit.wikimedia.org/r/306440 (https://phabricator.wikimedia.org/T143692) [07:24:52] (03PS1) 10Giuseppe Lavagetto: scap::source: simple wrapper around scap_source [puppet] - 10https://gerrit.wikimedia.org/r/313953 [07:24:54] (03PS1) 10Giuseppe Lavagetto: role::deployment::server: remove useless scap::server class [puppet] - 10https://gerrit.wikimedia.org/r/313954 [07:26:08] RECOVERY - Kafka MirrorMaker main-eqiad_to_analytics on kafka1012 is OK: PROCS OK: 1 process with command name java, regex args kafka.tools.MirrorMaker.+/etc/kafka/mirror/main-eqiad_to_analytics/producer\.properties [07:27:33] I just re-started it, but sadly I am not super expert about mirror maker (replicating kafka topics across clusters) [07:27:40] (03CR) 10jenkins-bot: [V: 04-1] scap::source: simple wrapper around scap_source [puppet] - 10https://gerrit.wikimedia.org/r/313953 (owner: 10Giuseppe Lavagetto) [07:28:55] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "PCC is happy at https://puppet-compiler.wmflabs.org/4198/, this should allow to support jessie seamlessly during the migration, merging" [puppet] - 10https://gerrit.wikimedia.org/r/308679 (https://phabricator.wikimedia.org/T144588) (owner: 10KartikMistry) [07:29:00] (03PS5) 10Alexandros Kosiaris: Apertium: Support jessie 
[puppet] - 10https://gerrit.wikimedia.org/r/308679 (https://phabricator.wikimedia.org/T144588) (owner: 10KartikMistry) [07:29:02] (03CR) 10Alexandros Kosiaris: [V: 032] Apertium: Support jessie [puppet] - 10https://gerrit.wikimedia.org/r/308679 (https://phabricator.wikimedia.org/T144588) (owner: 10KartikMistry) [07:31:47] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:48:59] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2687345 (10Verdy_p) >>! In T10217#2687230, @Liuxinyu970226 wrote: > Huh,... [07:55:00] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [08:03:49] (03CR) 10Giuseppe Lavagetto: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/313953 (owner: 10Giuseppe Lavagetto) [08:05:53] (03PS2) 10Giuseppe Lavagetto: scap::source: simple wrapper around scap_source [puppet] - 10https://gerrit.wikimedia.org/r/313953 [08:06:05] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] scap::source: simple wrapper around scap_source [puppet] - 10https://gerrit.wikimedia.org/r/313953 (owner: 10Giuseppe Lavagetto) [08:10:26] (03PS2) 10Giuseppe Lavagetto: role::deployment::server: remove useless scap::server class [puppet] - 10https://gerrit.wikimedia.org/r/313954 [08:14:20] (03CR) 10Giuseppe Lavagetto: [C: 032] role::deployment::server: remove useless scap::server class [puppet] - 10https://gerrit.wikimedia.org/r/313954 (owner: 10Giuseppe Lavagetto) [08:18:54] (03PS3) 10Giuseppe Lavagetto: role::deployment::server: fix scap3/trebuchet declarations [puppet] - 10https://gerrit.wikimedia.org/r/306440 (https://phabricator.wikimedia.org/T143692) [08:29:47] (03PS7) 10Giuseppe Lavagetto: scap::source: also define the corresponding dsh group 
[puppet] - 10https://gerrit.wikimedia.org/r/306431 [08:29:49] (03PS4) 10Giuseppe Lavagetto: role::deployment::server: fix scap3/trebuchet declarations [puppet] - 10https://gerrit.wikimedia.org/r/306440 (https://phabricator.wikimedia.org/T143692) [08:35:25] (03PS1) 10Muehlenhoff: Add explicit dependency on ghostscript [puppet] - 10https://gerrit.wikimedia.org/r/313963 [08:36:22] 07Puppet, 10Beta-Cluster-Infrastructure: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2687554 (10hashar) deployment-apertium01 is a Trusty instance. Maybe we can move apertium to the deployment-sca* instances which are jessie? that is {T14... [08:43:58] !log reimaging mw119[89] to jessie [08:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:48:02] (03CR) 10Hashar: "That would probably clash with mediawiki::packages::multimedia which defines ghostscript as well." [puppet] - 10https://gerrit.wikimedia.org/r/313963 (owner: 10Muehlenhoff) [08:48:33] (03PS8) 10Giuseppe Lavagetto: scap::source: also define the corresponding dsh group [puppet] - 10https://gerrit.wikimedia.org/r/306431 [08:48:35] (03PS5) 10Giuseppe Lavagetto: role::deployment::server: fix scap3/trebuchet declarations [puppet] - 10https://gerrit.wikimedia.org/r/306440 (https://phabricator.wikimedia.org/T143692) [08:51:14] (03CR) 10Giuseppe Lavagetto: [C: 032] role::deployment::server: fix scap3/trebuchet declarations [puppet] - 10https://gerrit.wikimedia.org/r/306440 (https://phabricator.wikimedia.org/T143692) (owner: 10Giuseppe Lavagetto) [08:52:38] (03PS1) 10Muehlenhoff: Move mediawiki-converters.profile mediawiki-firejail-ghostscript to all mediawiki servers [puppet] - 10https://gerrit.wikimedia.org/r/313964 [08:57:25] (03PS3) 10Hashar: rpc: trick mw into generating a raw exception report [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312077 [08:58:57] (03CR) 10Hashar: "Added to European SWAT." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/312077 (owner: 10Hashar) [09:01:03] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 0.1.22 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/313781 (owner: 10Gilles) [09:01:37] (03PS2) 10Muehlenhoff: Add explicit dependency on ghostscript [puppet] - 10https://gerrit.wikimedia.org/r/313963 [09:03:01] !log Regenerating configuration of all Jenkins job due to https://gerrit.wikimedia.org/r/#/c/313306/ [09:03:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:03:54] moritzm: I have no idea where PdfHandler is running [09:03:59] maybe it is just on the jobrunner [09:04:44] 07Puppet, 10Beta-Cluster-Infrastructure, 07Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2687581 (10akosiaris) [09:04:47] 07Puppet, 10Beta-Cluster-Infrastructure: deployment-apertium01 puppet failing due to missing packages on trusty - https://phabricator.wikimedia.org/T147210#2687578 (10akosiaris) 05Open>03Resolved a:03akosiaris https://gerrit.wikimedia.org/r/#/c/308679/ fixes this. I 'll close as resolved [09:05:14] moritzm: might want to run it via the puppet compiler [09:06:25] hashar: mediawiki::packages already declares packages used by that extension (e.g. 
pdfinfo as provided by poppler-utils), so I suppose it's running on the app servers [09:06:33] I guess [09:06:53] but I am not sure whether mediawiki::packages is always included :/ [09:10:05] 06Operations, 10ContentTranslation-CXserver, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation: Migrate apertium to SCB - https://phabricator.wikimedia.org/T147288#2687592 (10akosiaris) [09:12:18] (03PS2) 10Alexandros Kosiaris: apertium: Enable it on SCB [puppet] - 10https://gerrit.wikimedia.org/r/310311 (https://phabricator.wikimedia.org/T147288) [09:12:20] (03PS1) 10Alexandros Kosiaris: conftool: Add apertium to scb cluster [puppet] - 10https://gerrit.wikimedia.org/r/313965 (https://phabricator.wikimedia.org/T147288) [09:12:22] (03PS1) 10Alexandros Kosiaris: conftool: Set the apertium services on all scb nodes [puppet] - 10https://gerrit.wikimedia.org/r/313966 (https://phabricator.wikimedia.org/T147288) [09:12:24] (03PS1) 10Alexandros Kosiaris: lvs: Migrate apertium to scb [puppet] - 10https://gerrit.wikimedia.org/r/313967 (https://phabricator.wikimedia.org/T147288) [09:12:26] (03PS1) 10Alexandros Kosiaris: conftool: Remove apertium from sca machines [puppet] - 10https://gerrit.wikimedia.org/r/313968 (https://phabricator.wikimedia.org/T147288) [09:12:28] (03PS1) 10Alexandros Kosiaris: conftool: Remove the apertium service from sca [puppet] - 10https://gerrit.wikimedia.org/r/313969 (https://phabricator.wikimedia.org/T147288) [09:12:30] (03PS1) 10Alexandros Kosiaris: Remove apertium from sca [puppet] - 10https://gerrit.wikimedia.org/r/313970 (https://phabricator.wikimedia.org/T147288) [09:17:06] (03PS2) 10Filippo Giunchedi: deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) [09:20:01] 06Operations, 10OCG-General, 13Patch-For-Review: Tons of OCG jobs caused a massive increase in queue length - 
https://phabricator.wikimedia.org/T147211#2687623 (10elukey) @cscott Thanks a lot for all the work! [09:23:40] (03CR) 10Filippo Giunchedi: [C: 032] deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [09:24:05] (03Merged) 10jenkins-bot: deployment-prep: Move poolcounter to deployment-poolcounter04 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313789 (https://phabricator.wikimedia.org/T123734) (owner: 10Filippo Giunchedi) [09:26:30] 06Operations, 06Services, 15User-mobrovac: Migrate SCA cluster to SCB (Jessie and Node 4.2) - https://phabricator.wikimedia.org/T96017#2687639 (10akosiaris) [09:26:32] (03PS1) 10Muehlenhoff: Revert "Imported Upstream version 1.0.2i" [debs/openssl] - 10https://gerrit.wikimedia.org/r/313972 [09:26:33] 06Operations, 10Citoid, 06Services, 10VisualEditor: Package and test Zotero for Jessie - https://phabricator.wikimedia.org/T107302#2687636 (10akosiaris) 05Open>03declined I 'll decline this. XULrunner is not present in Jessie and will never be. The project no longer receives any updates in general and... 
[09:29:57] (03PS3) 10Filippo Giunchedi: Automatic async cleanup of thumbor temp files [puppet] - 10https://gerrit.wikimedia.org/r/313782 (owner: 10Gilles) [09:33:55] (03CR) 10Filippo Giunchedi: [C: 032] Automatic async cleanup of thumbor temp files [puppet] - 10https://gerrit.wikimedia.org/r/313782 (owner: 10Gilles) [09:37:07] (03CR) 10Giuseppe Lavagetto: [C: 031] Upgrade memcached on mc2009 to 1.4.28 [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) (owner: 10Elukey) [09:40:38] (03PS3) 10Elukey: Upgrade memcached on mc2009 to 1.4.28 [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) [09:43:51] (03PS3) 10Filippo Giunchedi: Enable manhole on thumbor [puppet] - 10https://gerrit.wikimedia.org/r/313783 (owner: 10Gilles) [09:45:57] (03CR) 10Filippo Giunchedi: [C: 032] Enable manhole on thumbor [puppet] - 10https://gerrit.wikimedia.org/r/313783 (owner: 10Gilles) [09:50:00] (03CR) 10jenkins-bot: [V: 04-1] Remove apertium from sca [puppet] - 10https://gerrit.wikimedia.org/r/313970 (https://phabricator.wikimedia.org/T147288) (owner: 10Alexandros Kosiaris) [09:56:40] !log adding mw119[89] to the live api server pool (volans provides magic) [09:56:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:58:48] elukey: rotfl! [10:07:23] 06Operations, 06Developer-Relations (Jul-Sep-2016): Operations Team Offsite - https://phabricator.wikimedia.org/T141940#2687801 (10Qgil) 05Open>03Resolved As far as I am aware this is now done. 
[10:07:33] !log reimaging mw120[23] to Jessie [10:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:10:25] (03PS4) 10Elukey: Upgrade memcached on mc2009 to 1.4.28 [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) [10:12:57] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0 [10:13:00] 06Operations, 10DBA, 10MediaWiki-General-or-Unknown: img_metadata queries for PDF files saturates s4 slaves - https://phabricator.wikimedia.org/T147296#2687819 (10Volans) [10:18:47] (03CR) 10Elukey: [C: 032 V: 032] "Extra puppet compiler run https://puppet-compiler.wmflabs.org/4200/" [puppet] - 10https://gerrit.wikimedia.org/r/313803 (https://phabricator.wikimedia.org/T129963) (owner: 10Elukey) [10:20:20] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3183303 keys - replication_delay is 13 [10:22:57] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave]BR [10:23:16] !log installed memcached 1.4.28-1.1+wmf1 on mc2009 as part of a performance test - T129963 [10:23:17] T129963: Update memcached package and configuration options - https://phabricator.wikimedia.org/T129963 [10:23:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:23:26] (03CR) 10Muehlenhoff: "PCC shows it's a nop for existing video/image scalers and that it gets added to the remaining mediawiki servers:" [puppet] - 10https://gerrit.wikimedia.org/r/313964 (owner: 10Muehlenhoff) [10:23:51] (03PS1) 10Urbanecm: [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 
(https://phabricator.wikimedia.org/T147293) [10:24:05] (03CR) 10jenkins-bot: [V: 04-1] [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 (https://phabricator.wikimedia.org/T147293) (owner: 10Urbanecm) [10:27:11] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:30:51] (03PS2) 10Urbanecm: [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 (https://phabricator.wikimedia.org/T147293) [10:39:56] (03CR) 10Alexandros Kosiaris: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/313970 (https://phabricator.wikimedia.org/T147288) (owner: 10Alexandros Kosiaris) [10:41:25] 06Operations, 06Performance-Team, 10Thumbor: Figure out a way to live-debug running production thumbor processes - https://phabricator.wikimedia.org/T146143#2687922 (10fgiunchedi) looks like this is working in production now: ```lines=4 root@thumbor1001:/tmp/systemd-private-df157af6e95c486c8c2f94c895a96346-... 
[10:42:44] (03PS2) 10Muehlenhoff: Move mediawiki-converters.profile mediawiki-firejail-ghostscript to all mediawiki servers [puppet] - 10https://gerrit.wikimedia.org/r/313964 [10:50:39] (03CR) 10Muehlenhoff: [C: 032] Revert "Imported Upstream version 1.0.2i" [debs/openssl] - 10https://gerrit.wikimedia.org/r/313972 (owner: 10Muehlenhoff) [10:52:13] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:12] PROBLEM - check_mysql on frdb1001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1272 [11:03:25] (03PS1) 10Muehlenhoff: Update to 4.4.22 [debs/linux44] - 10https://gerrit.wikimedia.org/r/313989 [11:05:05] (03PS1) 10Marostegui: db-eqiad.php: db1019 is going to be decommissioned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313990 (https://phabricator.wikimedia.org/T146265) [11:05:12] RECOVERY - check_mysql on frdb1001 is OK: Uptime: 339066 Threads: 1 Questions: 76744901 Slow queries: 2655 Opens: 3502 Flush tables: 1 Open tables: 586 Queries per second avg: 226.342 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [11:10:39] (03CR) 10Volans: [C: 031] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313990 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [11:11:22] (03CR) 10Volans: "we can discuss separately about the hostsByName array entries" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313990 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [11:11:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: db1019 is going to be decommissioned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313990 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [11:12:04] (03Merged) 10jenkins-bot: db-eqiad.php: db1019 is going to be decommissioned [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313990 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [11:14:14] !log adding 
mw120[23] back to the live api servers pool [11:14:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:14:31] no magic this time? :-P [11:14:48] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Removing db1019 entry as it is going to be decommissioned - T146265 (duration: 00m 51s) [11:14:50] T146265: db1019: Decommission - https://phabricator.wikimedia.org/T146265 [11:14:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:16:25] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 631 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 3185314 keys - replication_delay is 631 [11:17:03] volans: yes of course! :D [11:17:18] 06Operations, 10Citoid, 06Services, 10VisualEditor: Package and test Zotero for Jessie - https://phabricator.wikimedia.org/T107302#1491929 (10Mvolz) Relevant discussion here: https://phabricator.wikimedia.org/T93579 I believe it was decided not to rewrite Zotero core functions in Node because Zotero is sw... [11:18:17] (03CR) 10Nschaaf: "I think it would be better to install the repo as a package instead of serving directly from the source. 
I outlined a few reasons here: ht" [puppet] - 10https://gerrit.wikimedia.org/r/312045 (owner: 10Ori.livneh) [11:20:16] (03PS2) 10Muehlenhoff: Update to 4.4.22 [debs/linux44] - 10https://gerrit.wikimedia.org/r/313989 [11:35:40] (03PS1) 10Marostegui: db1019 is going to be decommisioned [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) [11:43:47] (03CR) 10Volans: "Puppet compiler run on https://puppet-compiler.wmflabs.org/4203/db1019.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [12:01:11] (03CR) 10Volans: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [12:04:12] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:06:06] (03CR) 10jenkins-bot: [V: 04-1] db1019 is going to be decommisioned [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [12:09:42] away for lunch (actually: cat sitting. 
last day) [12:12:48] (03CR) 10Dzahn: [C: 031] "approved by lang commitee on https://meta.wikimedia.org/wiki/Requests_for_new_languages/Wikipedia_Livvi-Karelian just needs the " Tell in " [dns] - 10https://gerrit.wikimedia.org/r/312805 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [12:14:12] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 1201318 msg (=800000 warning): ocg_render_job_queue 3188 msg (=3000 critical) [12:14:25] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 1201341 msg (=800000 warning): ocg_render_job_queue 3152 msg (=3000 critical) [12:16:16] <_joe_> that's a consequence of the high number of evictions/intended failures we're causing [12:17:23] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: prepare storage layer for olo.wikipedia - https://phabricator.wikimedia.org/T147302#2688126 (10Dzahn) [12:19:23] PROBLEM - BGP status on cr1-ulsfo is CRITICAL: BGP CRITICAL - AS1299/IPv6: Active, AS1299/IPv4: Connect [12:23:42] PROBLEM - IPv4 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 48 probes of 421 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [12:24:22] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [12:29:55] RECOVERY - BGP status on cr1-ulsfo is OK: BGP OK - up: 16, down: 0, shutdown: 0 [12:30:15] RECOVERY - IPv4 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 2 probes of 421 (alerts on 19) - https://atlas.ripe.net/measurements/1791307/#!map [12:38:43] moritzm: elukey: would you mind landing a patch for beta cluster deployment servers. Should be the final one now https://gerrit.wikimedia.org/r/#/c/312654/ :D [12:38:52] merely bring us back to deployment-tin as the primary [12:39:01] Hi. Where should I chat about IP sockpuppetry that I've encountered? 
[12:39:02] people were confused with deployment-mira being the primary [12:40:39] (03CR) 10Mobrovac: [C: 031] "LGTM" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [12:41:06] 06Operations, 10DBA: Drop database table "email_capture" from Wikimedia wikis - https://phabricator.wikimedia.org/T57676#2688222 (10Marostegui) This table has been renamed to `TO_DROP_email_capture` across eqiad in the following wikis. S1: enwiki S3: testwiki [12:41:54] Mardus: this channel is only for the infrastructure management . We don't deal with users sockpuppets . Might want to try #wikimedia-checkusers ? [12:42:16] hashar: Thanks, I'll look into it. [12:42:25] But there's overflow. [12:42:37] Mardus: there is also #wikimedia-gs for global sysops [12:42:38] ah [12:42:50] yeah #wikimedia-checkusers redirects to #wikimedia-stewards apparently [12:45:15] 06Operations, 10Citoid, 10ContentTranslation-CXserver, 10MediaWiki-extensions-ContentTranslation, and 5 others: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001#2688238 (10BBlack) [12:47:26] 06Operations, 10MediaWiki-Email, 10Traffic, 07Easy, 07HTTPS: Links in MediaWiki emails should respect the user's https preference - https://phabricator.wikimedia.org/T41676#2688250 (10BBlack) [12:48:07] (03CR) 10Mobrovac: [C: 04-1] "Looking much better, some minor comments in-lined." (033 comments) [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [12:48:44] 06Operations, 10MediaWiki-extensions-ZeroPortal, 10Traffic, 06Zero: Move proxy IP lists to META for Varnish XFF decoding - https://phabricator.wikimedia.org/T89838#2688259 (10BBlack) [12:49:17] (03CR) 10Mobrovac: [C: 04-1] "Hm, ok, just setting the repo up in hiera is not enough. 
We need scap::target resources on the nodes, so likely we need to place them in t" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [12:50:39] (03CR) 10Rush: Add python version of maintain-replicas script (031 comment) [software] - 10https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: 10Alex Monk) [12:51:07] hashar: As nobody else scheduled his/her patches for next SWAT, is it possible to start it a bit sooner? [12:51:15] jouncebot: next [12:51:15] In 0 hour(s) and 8 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1300) [12:51:21] Urbanecm: yeas [12:51:29] Maybe ^ can be interesting for zeljkof too :) [12:51:37] Okay, so I'm ready and prepared for deploys :) [12:52:05] 06Operations, 10MediaWiki-Email, 10Traffic, 07Easy, 07HTTPS: Links in MediaWiki emails should respect the user's https preference - https://phabricator.wikimedia.org/T41676#476434 (10Dzahn) The "Phabricator_maintenance" user re-added the project HTTPS. Then herald re-added Traffic and Operations. This is... [12:52:06] (03PS3) 10Hashar: [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 (https://phabricator.wikimedia.org/T147293) (owner: 10Urbanecm) [12:52:45] 06Operations, 10RESTBase, 06Services, 10Traffic, 07Service-Architecture: Proxying new services through RESTBase - https://phabricator.wikimedia.org/T96688#2688264 (10BBlack) Clearly at least some new services are being deployed as RB-based services, and some legacy services have been converted (but a few... [12:52:53] hashar: what's the plan for swat today? you? me? 
[12:53:11] I can handle it [12:53:13] :D [12:53:41] (03CR) 10Hashar: [C: 032] [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 (https://phabricator.wikimedia.org/T147293) (owner: 10Urbanecm) [12:53:43] PROBLEM - puppet last run on labstore1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:53:43] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0 [12:54:05] (03PS2) 10Hashar: Enable subpages for main namespace in arbcom_nlwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) (owner: 10Urbanecm) [12:54:11] (03Merged) 10jenkins-bot: [throttle] Increase account creation limits for an event in Perpignan on 2016-10-15 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/313986 (https://phabricator.wikimedia.org/T147293) (owner: 10Urbanecm) [12:54:22] hashar: great, please do then :) [12:54:42] Urbanecm: it is on mw1099 testing :) [12:55:01] frwiki looks fine [12:55:03] PROBLEM - puppet last run on wasat is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:55:32] hashar: Is it needed to test throttle patch? [12:55:42] just being careful :] [12:56:13] !log hashar@tin Synchronized wmf-config/throttle.php: [throttle] Increase account creation limits for an event in Perpignan on 201 T147293 (duration: 00m 50s) [12:56:14] T147293: Increase account creation limits for an event in Perpignan on 2016-10-15 - https://phabricator.wikimedia.org/T147293 [12:56:16] hashar: Okay, but yesterday you told to zeljkof that it could be deployed without testing. [12:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:56:29] But both wikidata and cawiki seems to be ok. 
[12:56:30] Urbanecm: yeah because I did review it [12:56:47] hashar: Okay. [12:56:49] and in this case I wanted the extra confidence :] [12:56:53] Is it deployed hashar ? [12:56:58] Okay, understood. [12:57:00] yes [12:57:07] Okay, going to close the task. [12:57:18] https://gerrit.wikimedia.org/r/#/c/313797/2/wmf-config/InitialiseSettings.php [12:57:22] hashar: hello! Can't rebase the patchset 4 due to merge conflicts, can you check? [12:57:24] it is missing the leading + [12:57:27] so I guess that will override [12:57:37] elukey: ahhhhhh will do after swat :( [12:57:53] hashar: Why did you send a link to the diff? [12:57:57] Is something wrong with it? [12:57:59] sure! The patch looks good to me, and I guess that you have cherry picked in deployment-prep right? [12:58:15] (CR) Hashar: Enable subpages for main namespace in arbcom_nlwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) (owner: Urbanecm) [12:58:49] Urbanecm: yeah so that variable has a set of defaults, and if you use the key "arbcom_nlwiki" that will override the default [12:59:01] with a leading + such as "+arbcom_nlwiki" that will merge the settings [12:59:11] Understood, going to fix it then. [12:59:20] so the end result is that only the namespace 0 will have subpages [12:59:33] we should get tests for those :( [12:59:40] This isn't what I wanted :) [12:59:50] so just add a + [12:59:51] :] [13:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1300). Please do the needful. [13:00:05] Urbanecm and hashar: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
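The override-versus-merge behaviour hashar describes for keys in InitialiseSettings.php (a plain "arbcom_nlwiki" key replaces the default, a leading "+" as in "+arbcom_nlwiki" merges with it) can be illustrated with a small Python model. This is only a sketch: `resolve_setting`, the dictionary layout and the sample default values are assumptions made for illustration, not MediaWiki's actual configuration code, which is written in PHP and handles more cases than shown here.

```python
def resolve_setting(per_wiki, wiki):
    """Return the effective value of one setting for `wiki`.

    Simplified, hypothetical model: a plain key replaces the default,
    while a key prefixed with "+" is merged on top of the default.
    """
    default = per_wiki.get("default", {})
    if wiki in per_wiki:
        # Plain key: the default is discarded entirely.
        return dict(per_wiki[wiki])
    if "+" + wiki in per_wiki:
        # "+" key: start from the default, then apply the wiki's entries.
        merged = dict(default)
        merged.update(per_wiki["+" + wiki])
        return merged
    # No entry for this wiki: the default applies unchanged.
    return dict(default)


# Illustrative values only: namespace ID -> subpages enabled.
# The real defaults in InitialiseSettings.php may differ.
DEFAULT_SUBPAGES = {1: True, 2: True, 3: True}

override = {"default": DEFAULT_SUBPAGES, "arbcom_nlwiki": {0: True}}
merge = {"default": DEFAULT_SUBPAGES, "+arbcom_nlwiki": {0: True}}

print(resolve_setting(override, "arbcom_nlwiki"))
print(resolve_setting(merge, "arbcom_nlwiki"))
```

In this model the first call yields only namespace 0, which matches the "only the namespace 0 will have subpages" outcome mentioned in the discussion, while the second keeps the assumed defaults and adds namespace 0 to them.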
[13:00:09] (PS3) Urbanecm: Enable subpages for main namespace in arbcom_nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) [13:00:13] We know jouncebot :) [13:00:19] * hashar is dealing with the SWAT [13:00:31] hashar: PS3 fixed it. [13:01:31] (CR) Hashar: [C: 2] Enable subpages for main namespace in arbcom_nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) (owner: Urbanecm) [13:01:58] (Merged) jenkins-bot: Enable subpages for main namespace in arbcom_nlwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313797 (https://phabricator.wikimedia.org/T147186) (owner: Urbanecm) [13:03:59] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable subpages for main namespace in arbcom_nlwiki T147186 (duration: 00m 49s) [13:04:00] T147186: arbcom-nl: enable subpages for main namespace - https://phabricator.wikimedia.org/T147186 [13:04:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:04:13] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr1-codfw:xe-5/2/1 (Telia, IC-307235, 34ms) {#2648} [10Gbps wave] [13:04:30] !log Purged namespace 0 pages for arbcom_nlwiki (T147186) via: mwscript purgeList.php --wiki=arbcom_nlwiki --namespace=0 --verbose [13:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:04:53] hashar: Note, I can't test it because I have no access to the wiki. I'll ask the arbs to check. [13:04:55] Is it ok?
[13:05:03] Urbanecm: I guess that wiki is private so I am not going to publicly list the page that got purged [13:05:13] 06Operations, 10Analytics-Cluster, 10Traffic: Respect X-Forwarded-For only from trustworthy sources - https://phabricator.wikimedia.org/T56783#2688311 (10BBlack) So I've linked above that we have a separate task already about moving Zero's trusted proxy lists to metawiki, for more-transparent / community man... [13:05:18] Yes, wiki is private with no access for me. [13:05:27] Urbanecm: it should be all fine now [13:05:33] hashar: Thanks a lot! [13:05:59] Urbanecm: if you notice there is an issue, do ping me / #wikimedia-releng or add to the next swat :] [13:06:05] Urbanecm: thank you very much for the patches [13:06:28] Okay. You're welcome hashar ;). [13:07:16] (03PS4) 10Hashar: rpc: trick mw into generating a raw exception report [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312077 [13:07:26] hashar: The proposer confirmed it works so all should be fine :) [13:07:33] Urbanecm: that was fast! 
[13:08:07] 06Operations, 07LDAP: update ldap-[codfw|eqiad].wikimedia.org certificates (expire on 2016-09-20) - https://phabricator.wikimedia.org/T145201#2688322 (10BBlack) [13:08:15] Yeah hashar [13:08:17] lets tweak the rpc/RunJobs.php entry point now [13:08:44] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312077 (owner: 10Hashar) [13:08:45] !log reimage mw120[45] to Jessie [13:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:09:07] 06Operations, 10Domains, 10Traffic, 10Wikimedia-Site-requests, 13Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2688325 (10BBlack) [13:09:12] (03Merged) 10jenkins-bot: rpc: trick mw into generating a raw exception report [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312077 (owner: 10Hashar) [13:16:54] !log hashar@tin Synchronized rpc/RunJobs.php: trick mw into generating a raw exception report (duration: 00m 47s) [13:16:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:18:13] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM generally, comments inline" (039 comments) [puppet] - 10https://gerrit.wikimedia.org/r/310719 (https://phabricator.wikimedia.org/T135427) (owner: 10Thcipriani) [13:19:23] RECOVERY - puppet last run on labstore1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:19:35] PROBLEM - puppet last run on cp3018 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:20:53] RECOVERY - puppet last run on wasat is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:22:05] (03PS5) 10Hashar: beta: update deployment-tin IP and make it master [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) [13:22:40] hashar: will look into 312654 in a bit [13:24:13] (03CR) 10Volans: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [13:24:51] moritzm: elukey was willing to handle it :] I got https://gerrit.wikimedia.org/r/#/c/312654/ rebased all fine [13:25:00] there was a conflict with the scap version being passed to the class. quite trivial [13:25:05] cherry picked on beta again [13:25:28] hashar: ah, ok! [13:25:40] 06Operations, 10DNS, 10Mail, 10Traffic: Set up role accounts and feedback loops (FBL) with all providers - https://phabricator.wikimedia.org/T106664#2688423 (10BBlack) [13:26:25] (03PS3) 10Muehlenhoff: Update to 4.4.22 [debs/linux44] - 10https://gerrit.wikimedia.org/r/313989 [13:27:15] (03CR) 10Muehlenhoff: [C: 032] Update to 4.4.22 [debs/linux44] - 10https://gerrit.wikimedia.org/r/313989 (owner: 10Muehlenhoff) [13:28:26] moritzm: if you want to double check it would be better :) [13:28:51] 06Operations, 10DBA, 06Performance-Team, 07Availability, 07Wikimedia-Multiple-active-datacenters: Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#2688443 (10BBlack) [13:29:08] elukey: sure, will have a look in a bit [13:29:11] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [13:31:41] (03CR) 10Hashar: [C: 04-1] "This patch conflict on the beta cluster puppetmaster. 
The file hieradata/common/scap/server.yaml has been removed by https://gerrit.wikim" [puppet] - 10https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129) (owner: 10Mobrovac) [13:32:53] (03CR) 10Marostegui: [C: 032] db1019 is going to be decommisioned [puppet] - 10https://gerrit.wikimedia.org/r/313995 (https://phabricator.wikimedia.org/T146265) (owner: 10Marostegui) [13:33:13] !log upgrade grafana to 3.1.1 on labmon1001 - T146354 [13:33:14] T146354: upgrade grafana to 3.1.1 - https://phabricator.wikimedia.org/T146354 [13:33:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:33:54] !log Remove db1019 from prometheus also adding it to spare as it is going to be decommissioned [13:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:34:15] 06Operations, 10Traffic: cache_upload should give an informative 404 rather than 403 on req.http.host != upload.wikimedia.org - https://phabricator.wikimedia.org/T118394#2688475 (10BBlack) 05Open>03Resolved a:03BBlack This was fixed some time ago, independently of this ticket I guess. [13:37:12] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: prepare storage layer for olo.wikipedia - https://phabricator.wikimedia.org/T147302#2688485 (10jcrespo) This is not a blocking step at this point- the process can continue but this must be kept open until the production side of filtering is run. [13:40:34] !log Updated Wikidata's property suggester with data from Monday's json dump and applied the T132839 workarounds [13:40:35] T132839: Property suggester suggests human properties for non-human items - https://phabricator.wikimedia.org/T132839 [13:40:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:40:42] sjoerddebruin: FYI ^ [13:41:02] Todays dump? [13:41:11] yesterday's [13:41:13] it's Tuesday [13:41:17] Oh right. 
:P [13:42:16] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) (owner: 10Hashar) [13:42:47] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2688514 (10akosiaris) [13:42:49] 06Operations: Migrate puppetmaster/backends to jessie - https://phabricator.wikimedia.org/T123730#2688511 (10akosiaris) 05Open>03Resolved a:03akosiaris Indeed. resolving [13:43:46] 06Operations, 10MediaWiki-API, 10Monitoring, 06Services, 10Traffic: Set up action API latency / error rate metrics & alerts - https://phabricator.wikimedia.org/T123854#2688517 (10BBlack) Is there stuff left to do here beyond what's present in the current dashboards? I mean, our metrics can always be "be... [13:45:34] RECOVERY - puppet last run on cp3018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:46:47] 06Operations, 10hardware-requests: Decommission db1019 - https://phabricator.wikimedia.org/T147309#2688523 (10Marostegui) [13:47:17] (03PS1) 10Muehlenhoff: Update to 4.4.23 [debs/linux44] - 10https://gerrit.wikimedia.org/r/314002 [13:48:42] 06Operations, 10DBA, 13Patch-For-Review: db1019: Decommission - https://phabricator.wikimedia.org/T146265#2688549 (10Marostegui) I have created the HW request to get it removed: https://phabricator.wikimedia.org/T147309 Also @Volans and myself were unsure about whether it can be safely remove from the array... [13:49:18] (03PS1) 10Elukey: Decommission analytics1015 and analytics 1026 [puppet] - 10https://gerrit.wikimedia.org/r/314003 [13:49:21] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2688551 (10Liuxinyu970226) >>! In T10217#2687272, @HenryLi wrote: TL;DR..... 
[13:52:22] (03CR) 10Elukey: [C: 032] Decommission analytics1015 and analytics 1026 [puppet] - 10https://gerrit.wikimedia.org/r/314003 (owner: 10Elukey) [13:55:52] 06Operations, 10MediaWiki-extensions-ZeroBanner, 06Reading-Web-Backlog, 10Wikimedia-Apache-configuration, and 3 others: m.wikipedia.org incorrectly redirects to en.m.wikipedia.org - https://phabricator.wikimedia.org/T69015#2688575 (10BBlack) [13:57:29] (03PS1) 10Elukey: Remove any reference for analytics1015 and analytics1026 from Hiera [puppet] - 10https://gerrit.wikimedia.org/r/314004 [13:57:50] 06Operations, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 05Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2688576 (10jhsoby) >>! In T10217#2688551, @Liuxinyu970226 wrote: > What's... [13:57:52] elukey: did you have to resolve the conflict with site.pp? I just pushed it and I am sure you got it :p - sorry about that! [13:58:19] (03CR) 10Elukey: [C: 032 V: 032] Remove any reference for analytics1015 and analytics1026 from Hiera [puppet] - 10https://gerrit.wikimedia.org/r/314004 (owner: 10Elukey) [13:58:35] (03CR) 10Muehlenhoff: [C: 032] Move mediawiki-converters.profile mediawiki-firejail-ghostscript to all mediawiki servers [puppet] - 10https://gerrit.wikimedia.org/r/313964 (owner: 10Muehlenhoff) [13:58:39] (03PS3) 10Muehlenhoff: Move mediawiki-converters.profile mediawiki-firejail-ghostscript to all mediawiki servers [puppet] - 10https://gerrit.wikimedia.org/r/313964 [14:02:54] 06Operations, 10ops-eqiad: ms-be1004.eqiad.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T144499#2688597 (10Cmjohnson) Disk was replaced...needs to be added back. 
[14:04:51] 06Operations, 10Analytics-Cluster, 10hardware-requests: Decommission analytics1026 and analytics1015 - https://phabricator.wikimedia.org/T147313#2688601 (10elukey) [14:04:54] 06Operations, 10ops-eqiad: dbstore1001: check drive bays - https://phabricator.wikimedia.org/T145389#2688613 (10Cmjohnson) yes, the SFF disk can be removed and replaced w/LFF disks. [14:05:23] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2688627 (10Cmjohnson) @fgiunchedi please check this when you get a chance. Thanks! [14:07:29] !log db1055 swapped disk 0 [14:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:07:43] cmjohnson1: Thanks!!!! \o/ [14:08:46] 06Operations, 10ops-eqiad, 10DBA: db1055: degraded array - https://phabricator.wikimedia.org/T147172#2683544 (10Cmjohnson) Replaced disk 0 on db1055...rebuilding Adapter #0 Enclosure Device ID: 32 Slot Number: 0 Drive's position: DiskGroup: 0, Span: 0, Arm: 0 Enclosure position: 1 Device Id: 0 WWN: 5000C50... [14:09:53] 06Operations, 10ops-eqiad, 10DBA: db1055: degraded array - https://phabricator.wikimedia.org/T147172#2688636 (10Marostegui) Thanks Chris - will check in a few hours and will close this as resolved once it has finished. 
Manuel [14:11:02] !log ms-be1002 replacing failed disk slot 11 [14:11:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:11:48] RECOVERY - MegaRAID on ms-be1002 is OK: OK: optimal, 13 logical, 13 physical [14:13:06] (03Abandoned) 10Muehlenhoff: Profile firejail containment for ghostscript [puppet] - 10https://gerrit.wikimedia.org/r/313669 (owner: 10Muehlenhoff) [14:13:15] 06Operations, 10ops-eqiad: ms-be1002.eqiad.wmnet: slot=11 dev=sdl failed - https://phabricator.wikimedia.org/T146741#2688642 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson Replaced disk at slot 11 RECOVERY - MegaRAID on ms-be1002 is OK: OK: optimal, 13 logical, 13 physical [14:18:46] 06Operations, 10MediaWiki-API, 10Monitoring, 06Services, 10Traffic: Set up action API latency / error rate metrics & alerts - https://phabricator.wikimedia.org/T123854#2688672 (10GWicke) @bblack, do we have basic latency / error rate alerts set up for the API? [14:26:55] 06Operations, 05Prometheus-metrics-monitoring: Port apache httpd metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T147316#2688697 (10fgiunchedi) [14:28:05] RECOVERY - HTTPS-toolserver on www.toolserver.org is OK: SSL OK - Certificate stable.toolserver.org valid until 2017-01-02 13:27:00 +0000 (expires in 89 days) [14:29:23] 06Operations, 10ops-eqiad: ms-be1004.eqiad.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T144499#2688717 (10Cmjohnson) 05Open>03Resolved Disk added back Adapter 0: Created VD 3 Configured physical device at Encl-32:Slot-3. 1 physical devices are Configured on adapter 0. [14:34:57] 06Operations, 05Prometheus-metrics-monitoring: Port apache httpd metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T147316#2688740 (10elukey) I would love to have more/all worker states available in the scoreboard: ``` Scoreboard Key: "_" Waiting for Connection, "S" Starting up, "R" Rea... 
[14:36:33] (03PS6) 10Elukey: beta: update deployment-tin IP and make it master [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) (owner: 10Hashar) [14:38:14] (03CR) 10Elukey: [C: 032] beta: update deployment-tin IP and make it master [puppet] - 10https://gerrit.wikimedia.org/r/312654 (https://phabricator.wikimedia.org/T144006) (owner: 10Hashar) [14:38:29] hashar: --^ mergig [14:38:32] *merging [14:39:02] (03PS1) 10Andrew Bogott: Move contint::slave_scripts from a class to a role. [puppet] - 10https://gerrit.wikimedia.org/r/314007 (https://phabricator.wikimedia.org/T147233) [14:39:13] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2688751 (10fgiunchedi) a:05Cmjohnson>03fgiunchedi @Cmjohnson I'm not seeing the errors above after reimage, taking this and putting the machine in service [14:39:38] (03CR) 10jenkins-bot: [V: 04-1] Move contint::slave_scripts from a class to a role. [puppet] - 10https://gerrit.wikimedia.org/r/314007 (https://phabricator.wikimedia.org/T147233) (owner: 10Andrew Bogott) [14:39:46] good news godog! one down one to go [14:40:30] cmjohnson1: aye! good news indeed :)) [14:42:11] 06Operations, 10MediaWiki-API, 10Monitoring, 06Services, 10Traffic: Set up action API latency / error rate metrics & alerts - https://phabricator.wikimedia.org/T123854#2688775 (10BBlack) That's fair, that is what's in the title. I think I was thinking one thing and saying another above. I was thinking... [14:45:12] (03CR) 10Eevans: "> Hm, ok, just setting the repo up in hiera is not enough. We need" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [14:45:45] (03PS2) 10Andrew Bogott: Move contint::slave_scripts from a class to a role. 
[puppet] - 10https://gerrit.wikimedia.org/r/314007 (https://phabricator.wikimedia.org/T147233) [14:45:47] (03PS3) 10Andrew Bogott: Add role::beta::autoupdater [puppet] - 10https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233) [14:46:34] 06Operations, 10ops-codfw, 06DC-Ops, 10Traffic: lvs2002 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T126321#2688790 (10BBlack) I guess I dropped this, sorry! @Papaul when's a good time? The work on our end is pretty trivial, we just need to be working together for a few minutes b... [14:46:45] (03PS3) 10Dzahn: DNS configuration for olo.wikipedia.org [dns] - 10https://gerrit.wikimedia.org/r/312805 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [14:47:39] (03CR) 10Dzahn: [C: 032] "storage part not a blocker per comment on T147302, moving forward" [dns] - 10https://gerrit.wikimedia.org/r/312805 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [14:48:49] (03Abandoned) 10Andrew Bogott: Move contint::slave_scripts from a class to a role. [puppet] - 10https://gerrit.wikimedia.org/r/314007 (https://phabricator.wikimedia.org/T147233) (owner: 10Andrew Bogott) [14:50:08] !log eqiad-prod: ms-be1022 to weight 1000 T136631 [14:50:10] T136631: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631 [14:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:50:44] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2341913 (10fgiunchedi) [14:50:46] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) 
sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2688801 (10fgiunchedi) 05Open>03Resolved following up on T136631 [15:00:06] !log created marostegui account into Racktables [15:00:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:00:21] Thank you :) [15:03:55] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: onboarding Manuel Arostegui in ops - https://phabricator.wikimedia.org/T144469#2600739 (10Volans) Created racktables account. [15:07:55] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure: prepare storage layer for olo.wikipedia - https://phabricator.wikimedia.org/T147302#2688832 (10Dzahn) Thank you, i will go ahead with adding it to DNS. [15:09:19] RECOVERY - MegaRAID on ms-be2009 is OK: OK: optimal, 13 logical, 13 physical [15:09:28] 06Operations: Migrate puppetmaster/backends to jessie - https://phabricator.wikimedia.org/T123730#1936586 (10Dzahn) Cool! But needs a subtask to actually shutdown and decom palladium? [15:12:38] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:13:31] 06Operations: Migrate puppetmaster/backends to jessie - https://phabricator.wikimedia.org/T123730#2688845 (10akosiaris) Subtask of this one ? I 'd say not, but of T123525. I 'll file it and connect it [15:15:27] 06Operations, 10ops-codfw: ms-be2009.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T147060#2688852 (10Papaul) a:05Papaul>03fgiunchedi Disk replacement complete. [15:16:14] (03PS1) 1020after4: Pass nameserver to ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/314011 [15:16:32] 06Operations, 05Goal: Decomission palladium - https://phabricator.wikimedia.org/T147320#2688855 (10akosiaris) [15:16:42] 06Operations, 05Goal: Decomission palladium - https://phabricator.wikimedia.org/T147320#2688855 (10akosiaris) Setting to stalled for say.. 2 weeks ? 
[15:18:29] !log adding mw120[45] back to the api live pool after reimage [15:18:30] (03Draft1) 10Paladox: Initial configuration for olo.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314010 (https://phabricator.wikimedia.org/T146612) [15:18:32] (03Draft2) 10Paladox: Initial configuration for olo.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314010 (https://phabricator.wikimedia.org/T146612) [15:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:20:57] (03CR) 10Alex Monk: [C: 031] Pass nameserver to ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/314011 (owner: 1020after4) [15:23:33] (03CR) 10Andrew Bogott: [C: 032] Pass nameserver to ipresolve [puppet] - 10https://gerrit.wikimedia.org/r/314011 (owner: 1020after4) [15:23:40] 06Operations, 10ops-codfw, 10DBA: db2017 failed disk (degraded RAID) - https://phabricator.wikimedia.org/T145844#2688904 (10Papaul) a:03Marostegui Disk replacement complete [15:24:42] (03Draft1) 10Paladox: labs dnsrecursor: add olo.wiki(pedia) [puppet] - 10https://gerrit.wikimedia.org/r/314012 (https://phabricator.wikimedia.org/T146612) [15:24:45] (03Draft2) 10Paladox: labs dnsrecursor: add olo.wiki(pedia) [puppet] - 10https://gerrit.wikimedia.org/r/314012 (https://phabricator.wikimedia.org/T146612) [15:26:52] (03CR) 10Paladox: "Logos have already been done in https://phabricator.wikimedia.org/T146745" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314010 (https://phabricator.wikimedia.org/T146612) (owner: 10Paladox) [15:27:49] 06Operations, 10OCG-General, 13Patch-For-Review: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2688919 (10greg) >>! In T147211#2686306, @cscott wrote: > ok, deployed a patch to blacklist en.wiktionary.org for the time being, rejecting jobs in the front... 
[15:28:10] 06Operations, 10Traffic, 06WMF-Communications, 07HTTPS, 07Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2688920 (10Florian) @BBlack: Even if the case is closed I would use it for reference in OTRS tickets, so this isn't the... [15:31:36] (03PS1) 1020after4: Pass nameserver to ipresolve (missed a spot) [puppet] - 10https://gerrit.wikimedia.org/r/314014 [15:34:08] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0 [15:36:43] 06Operations, 10Traffic, 07HTTPS, 07Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2688956 (10greg) >>! In T93927#2684299, @BBlack wrote: > Arguably, if the link from the incident to this open ticket of broader scope is annoying, it could b... [15:38:45] (03CR) 10Andrew Bogott: [C: 032] Pass nameserver to ipresolve (missed a spot) [puppet] - 10https://gerrit.wikimedia.org/r/314014 (owner: 1020after4) [15:45:39] 06Operations, 10ops-eqiad, 13Patch-For-Review: Decommission labsdb1002 - https://phabricator.wikimedia.org/T146455#2688977 (10RobH) Please note that the process for decommissioning hosts has been clarified/updated from the recent ops offsite meeting. Please review https://wikitech.wikimedia.org/wiki/Server_... [15:51:00] RECOVERY - puppet last run on ms-be2009 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [15:51:49] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:51:58] 06Operations, 10ops-codfw: ms-be2009.codfw.wmnet: slot=3 dev=sdd failed - https://phabricator.wikimedia.org/T147060#2689000 (10fgiunchedi) 05Open>03Resolved disk rebuilding [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. 
Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1600). [16:00:59] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#1846901 (10ema) On my self-hosted puppetmaster using `role::puppet::self` I've ended up having two stanzas for `[agent]`, o... [16:03:26] (03CR) 10Mobrovac: "Yup. We need scap on the nodes, and scap::target provides that and sets up the target on the nodes." [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [16:03:39] 06Operations, 06Labs, 13Patch-For-Review: Phase out the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#2689030 (10AlexMonk-WMF) >>! In T120159#2689020, @ema wrote: > On my self-hosted puppetmaster using `role::puppet::self` I'... [16:05:08] 06Operations, 05Prometheus-metrics-monitoring: Port memcached statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T147326#2689034 (10fgiunchedi) [16:06:53] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[16:07:09] (03PS1) 10Elukey: Set l10nupdate group in its related logrotate config file [puppet] - 10https://gerrit.wikimedia.org/r/314026 (https://phabricator.wikimedia.org/T132324) [16:07:50] (03CR) 10Alex Monk: Add python version of maintain-replicas script (031 comment) [software] - 10https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: 10Alex Monk) [16:09:32] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:10:17] (03CR) 10Andrew Bogott: [C: 031] Set l10nupdate group in its related logrotate config file [puppet] - 10https://gerrit.wikimedia.org/r/314026 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [16:11:08] (03CR) 10jenkins-bot: [V: 04-1] Set l10nupdate group in its related logrotate config file [puppet] - 10https://gerrit.wikimedia.org/r/314026 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [16:13:24] (03PS2) 10Elukey: Set l10nupdate group in its related logrotate config file [puppet] - 10https://gerrit.wikimedia.org/r/314026 (https://phabricator.wikimedia.org/T132324) [16:13:36] I know jenkins, I forgot a space, don't be so angry [16:16:03] (03CR) 10Elukey: [C: 032] Set l10nupdate group in its related logrotate config file [puppet] - 10https://gerrit.wikimedia.org/r/314026 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [16:16:49] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:17:22] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. 
[16:19:53] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [16:22:39] (03PS1) 10Rush: labstore: align tools drbd with current prod [puppet] - 10https://gerrit.wikimedia.org/r/314028 [16:22:53] (03PS1) 10Addshore: Enabled simple-json-datasource on prod Grafana [puppet] - 10https://gerrit.wikimedia.org/r/314029 (https://phabricator.wikimedia.org/T147329) [16:23:03] (03CR) 10jenkins-bot: [V: 04-1] labstore: align tools drbd with current prod [puppet] - 10https://gerrit.wikimedia.org/r/314028 (owner: 10Rush) [16:28:40] 06Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 07HTTPS, 07JavaScript: Use Upgrade Insecure Requests on Wikimedia wikis - https://phabricator.wikimedia.org/T101002#2689152 (10Krinkle) [16:28:59] 06Operations, 10ops-codfw, 06DC-Ops, 10Traffic: lvs2002 Embedded Flash/SD-CARD iLO errors - https://phabricator.wikimedia.org/T126321#2689153 (10Papaul) @BBlack let me know when is best for you. Any time from 9:30am is okay with me. 
[16:29:15] !log authdns-gen-zones -f /srv/authdns/git/templates /etc/gdnsd/zones && gdnsd checkconf && gdnsd reload-zones [16:29:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:30:00] !log authdns commands from T97051#1994679 to add olo.wp for T146612 [16:30:01] T97051: adding new languages to DNS langs.tmpl doesn't work until zone template is edited as well - https://phabricator.wikimedia.org/T97051 [16:30:02] T146612: Create Livvi-Karelian Wikipedia at olo.wikipedia.org - https://phabricator.wikimedia.org/T146612 [16:30:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:30:17] (03PS3) 10Rush: labstore: drbd resource setup sanity [puppet] - 10https://gerrit.wikimedia.org/r/312023 [16:30:24] (03PS2) 10Rush: labstore: align tools drbd with current prod [puppet] - 10https://gerrit.wikimedia.org/r/314028 [16:30:55] !log upgrading labvirt1014 to Linux 4.4 [16:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:32:21] !log new wiki language Livvi-Karelian -> olo.wikipedia.org has been added to DNS (T146612) [16:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:39:03] 06Operations: Migrate planet from singer - https://phabricator.wikimedia.org/T81287#2689176 (10Dzahn) [16:39:14] 06Operations, 10Wikimedia-Planet: Migrate planet from singer - https://phabricator.wikimedia.org/T81287#886028 (10Dzahn) [16:48:41] @seen Wikimedia_Australia [16:48:41] mutante: I have never seen Wikimedia_Australia [16:52:27] (03PS1) 10Krinkle: noc: Clean up index.html and use HiDPI logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314034 [16:53:08] (03CR) 10Krinkle: [C: 032] noc: Clean up index.html and use HiDPI logo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314034 (owner: 10Krinkle) [16:53:36] (03Merged) 10jenkins-bot: noc: Clean up index.html and use HiDPI logo [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/314034 (owner: 10Krinkle) [16:55:44] !log krinkle@tin Synchronized docroot/noc: (no message) (duration: 01m 01s) [16:55:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:56:14] (03Abandoned) 10Dereckson: Initial configuration for olo.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314010 (https://phabricator.wikimedia.org/T146612) (owner: 10Paladox) [16:57:55] (03PS3) 1020after4: Configuration for Aphlict [puppet] - 10https://gerrit.wikimedia.org/r/313937 (https://phabricator.wikimedia.org/T112765) [16:58:44] 06Operations, 10Ops-Access-Requests, 10netops: Access to network devices - https://phabricator.wikimedia.org/T147061#2689222 (10RobH) a:03faidon My understanding is that all access to the switches is currently handed by @faidon. As such, I'm escalating this task to him as part of my clinic duty week, and... [16:59:04] 06Operations, 10Ops-Access-Requests, 10netops: elukey - Access to network devices - https://phabricator.wikimedia.org/T147061#2679684 (10RobH) [17:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1700). [17:04:29] (03CR) 1020after4: "This is mostly working now on beta (deployment-phab01)" [puppet] - 10https://gerrit.wikimedia.org/r/313937 (https://phabricator.wikimedia.org/T112765) (owner: 1020after4) [17:08:05] (03PS1) 10Muehlenhoff: Fix quoting for br_netfilter kmod configuration [puppet] - 10https://gerrit.wikimedia.org/r/314035 [17:10:11] no parsoid deploy today [17:13:19] Polish planet, .info domain, 404 message in Japanese, translates to "Reflect wait of settings, it is the address that does not exist.". well... 
delete [17:19:14] !log deploying kartotherian & tilerator updates [17:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:20:51] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [17:23:21] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [17:30:53] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:34:46] 06Operations, 10Beta-Cluster-Infrastructure, 07HHVM, 13Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2689359 (10hashar) The deployment servers on beta cluster are now fully migrated to Jessie. We ended up keeping the same hostname and have: * deploymen... [17:35:33] (03PS1) 10Dzahn: planet: delete broken Polish feeds [puppet] - 10https://gerrit.wikimedia.org/r/314038 (https://phabricator.wikimedia.org/T134435) [17:37:24] (03CR) 10Ema: [C: 031] cache_upload: jemalloc chunk size: s/1MB/128KB/ [puppet] - 10https://gerrit.wikimedia.org/r/313847 (owner: 10BBlack) [17:38:26] !log deployed kartotherian https://gerrit.wikimedia.org/r/#/c/314018/ -- Possible issue https://phabricator.wikimedia.org/T147334 [17:38:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:39:54] yurik: hrm, looks like you're out of space on a few machines :\ [17:40:15] thcipriani, yes, we ran out of space and rebuilding them (they are not in prod( [17:40:34] i'm still checking if the actual code made it to the servers [17:40:43] (03PS2) 10Dzahn: planet: delete broken Polish feeds [puppet] - 10https://gerrit.wikimedia.org/r/314038 (https://phabricator.wikimedia.org/T134435) [17:41:03] doesn't look like it made it that far :( [17:41:05] (03CR) 10Dzahn: [C: 032] planet: delete broken Polish feeds 
[puppet] - 10https://gerrit.wikimedia.org/r/314038 (https://phabricator.wikimedia.org/T134435) (owner: 10Dzahn) [17:41:07] thcipriani, seems like it didn't restart the service, even though i said "n" to rollback [17:41:26] (03PS1) 10Krinkle: noc: Convert db.php from broken jQuery UI to simple nav sections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314039 [17:41:38] (03CR) 10Krinkle: [C: 032] noc: Convert db.php from broken jQuery UI to simple nav sections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314039 (owner: 10Krinkle) [17:42:01] yurik: is that not what you want? Don't rollback means don't do anything, leave the servers as they are. [17:42:09] (03Merged) 10jenkins-bot: noc: Convert db.php from broken jQuery UI to simple nav sections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314039 (owner: 10Krinkle) [17:42:34] seems like they didn't get to the point of fetching the config down from tin to run a deploy [17:42:50] thcipriani, hmm... good question. In theory, if I try to deploy to a large list of servers, and some of them are down, I might be ok to just skip the broken servers [17:43:10] which means that i would expect it to complete the deployment for all others [17:43:19] we have a task for this somewhere [17:43:21] * thcipriani digs [17:43:40] https://phabricator.wikimedia.org/T145512 [17:44:25] thanks, subscribed [17:44:37] !log krinkle@tin Synchronized docroot/noc/db.php: (no message) (duration: 00m 48s) [17:44:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:45:07] cool, I'll bring it up at the bug triage meeting tomorrow. [17:45:22] PROBLEM - check_disk on db1025 is CRITICAL: DISK CRITICAL - free space: / 3748 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 585532 MB (47% inode=99%): /a/tmp 3932 MB (3% inode=99%) [17:45:36] thcipriani, does targets file allow comments (#) ? 
[17:45:47] yep [17:46:03] lines starting with # and blank lines are ignored [17:46:33] ^^^ db1025 looking.... [17:48:18] thcipriani, i just hit an error on maps cluster during scap, which went away on refresh. Shouldn't it make it so that it doesn't break during scap? [17:48:25] sadly i refreshed without saving the error [17:49:08] you mean the service had an error during deployment? [17:50:25] PROBLEM - check_disk on db1025 is CRITICAL: DISK CRITICAL - free space: / 3748 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 585524 MB (47% inode=99%): /a/tmp 3932 MB (3% inode=99%) [17:50:27] it shouldn't happen. Symlink swap should be instant and as long as you're using deploy groups not all of your servers should be down at the same time. [17:51:32] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:52:41] thcipriani, we have just one group, so maybe we should break it into multiple groups [17:52:58] (03PS1) 10Mattflaschen: Flow opt in: Temporarily disable all, MW.org is redundant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314042 (https://phabricator.wikimedia.org/T147241) [17:53:00] (03PS4) 10Eevans: Extend classpath via Puppet [puppet] - 10https://gerrit.wikimedia.org/r/313619 (https://phabricator.wikimedia.org/T133395) [17:53:52] (03CR) 10Catrope: [C: 031] Flow opt in: Temporarily disable all, MW.org is redundant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314042 (https://phabricator.wikimedia.org/T147241) (owner: 10Mattflaschen) [17:54:11] thcipriani, https://phabricator.wikimedia.org/D401 hasn't gone into the actual repo yet? [17:54:14] yet it was accepted [17:54:30] yurik: ah, yeah, then there could be a window during which all services are restarting. Splitting into 2 groups could be a workaround for this. 
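The targets-file rule stated above (lines starting with # and blank lines are ignored) and the suggested two-group split can be sketched roughly as follows. This is only an illustration, not scap's actual parser or group configuration: the function names, the round-robin split, the tolerance for leading whitespace, and the host names are all assumptions made for the example.

```python
def parse_targets(text):
    """Return host names from targets-file text, skipping comments and blanks."""
    hosts = []
    for raw_line in text.splitlines():
        # Assumption: surrounding whitespace is tolerated before the check.
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "#" comment lines are ignored
        hosts.append(line)
    return hosts


def split_into_groups(hosts, group_count=2):
    """Deal hosts round-robin into group_count lists (hypothetical helper)."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups = [[] for _ in range(group_count)]
    for index, host in enumerate(hosts):
        groups[index % group_count].append(host)
    return groups


# Hypothetical targets file; the host names are placeholders.
SAMPLE_TARGETS = """\
# maps cluster (hypothetical host names)
maps-a.example

maps-b.example
# maps-c.example is being rebuilt, skip it for now
maps-d.example
"""

if __name__ == "__main__":
    hosts = parse_targets(SAMPLE_TARGETS)
    for number, group in enumerate(split_into_groups(hosts), start=1):
        print(f"group {number}: {', '.join(group)}")
```

If each group is deployed and restarted in turn, at least one group should still be serving while the other restarts, which is the window-avoidance idea described in the conversation.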
[17:54:58] Krenair: yeah, differential is a little different, the patch upload has to actually "land" the patch in the repo after it is accepted. [17:55:22] PROBLEM - check_disk on db1025 is CRITICAL: DISK CRITICAL - free space: / 3748 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 585524 MB (47% inode=99%): /a/tmp 3932 MB (3% inode=99%) [17:55:30] thcipriani, the uploader has to? [17:56:30] Krenair: that's the typical workflow, yes [17:56:32] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:56:43] Krenair: the uploader doesn't necessarily *have to* I could land it or you could land it, but acceptance doesn't auto land anything. [17:56:44] pust patch, get approval, land it yourself [17:56:54] ^ but, yeah, that's typical [17:56:54] *post patch [17:57:04] okay, why is it not auto-landed? [17:57:09] we haven't set jenkins up to do that yet? [17:57:15] there was a time when arc land didn't like to land other people's patches, but I think that has been fixed [17:57:19] 06Operations, 10Traffic: OpenSSL 1.1 deployment for cache clusters - https://phabricator.wikimedia.org/T144523#2689458 (10BBlack) The chapoly prehack and +3des stuff are in the first 3 commits here and should rebase fine as they are: https://phabricator.wikimedia.org/diffusion/ODFP/history/wmf-1.1/ . Note the... [17:57:44] !log deployed tilerator (disabled on maps-test*) https://gerrit.wikimedia.org/r/#/c/314030/ [17:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:57:51] yeah, arc land has gotten considerably better since we started using differential for scap things. [17:58:04] arc in general [17:58:31] Um [17:58:43] Arc land attempts to use the 'origin' remote instead of phabricator's remote? 
[17:59:26] sounds right [17:59:33] okay that's broken [17:59:34] so is this [17:59:38] PUSHING Pushing changes to "phab/master". [17:59:38] Exception: You do not have permission to push to this repository. [17:59:38] fatal: Could not read from remote repository. [18:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1800). Please do the needful. [18:00:49] Hello. [18:01:02] I can SWAT this morning. [18:01:05] Krenair: oh good. Could you file a task for the remote piece (this could be how we have that repo configured). Looks like I'll have to land that patch, could be something that twentyafterfour could fix though :) [18:01:09] matt_flaschen: ping? [18:01:17] Apparently I have to add myself to the push policy [18:01:21] Despite being in the edit policy [18:01:45] oh, hrm. [18:01:53] RoanKattouw: ping? [18:01:54] I've managed to do it with the policy fix + 'arc land fix-sync-file --remote phab' [18:01:58] Dereckson, present. [18:02:22] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314042 (https://phabricator.wikimedia.org/T147241) (owner: 10Mattflaschen) [18:02:36] Krenair: cool, thank you for the patch. [18:02:47] (03Merged) 10jenkins-bot: Flow opt in: Temporarily disable all, MW.org is redundant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/314042 (https://phabricator.wikimedia.org/T147241) (owner: 10Mattflaschen) [18:02:51] Wait, no [18:02:53] It's not actually worked [18:02:56] It just said it did [18:03:36] https://phabricator.wikimedia.org/rMSCAbe919816e89639250c4fdce4f965d0da2fad6fbb [18:03:39] Failed to load the commit because the commit has not been parsed yet. [18:04:08] hrm, I got the change when I fetched down master. The diff takes a few to close. [18:04:11] It uses background tasks to parse commits? 
And this takes a non-trivial amount of time? [18:05:00] Okay so I think my commit is in the git repo on iridium but the web interface hasn't been updated to reflect it yet? [18:05:11] RECOVERY - check_disk on db1025 is OK: DISK OK - free space: / 3748 MB (52% inode=72%): /dev 32199 MB (99% inode=99%): /run 6441 MB (99% inode=99%): /run/lock 5 MB (100% inode=99%): /run/shm 32209 MB (100% inode=99%): /a 585640 MB (47% inode=99%): /a/tmp 102316 MB (99% inode=99%) [18:05:15] sounds likely [18:05:31] (03PS1) 10Dzahn: planet: delete poradnikwebmastera.blox.pl from pl [puppet] - 10https://gerrit.wikimedia.org/r/314046 (https://phabricator.wikimedia.org/T134435) [18:05:51] matt_flaschen: live on mw1099 [18:05:56] (03PS2) 10Dzahn: planet: delete poradnikwebmastera.blox.pl from pl [puppet] - 10https://gerrit.wikimedia.org/r/314046 (https://phabricator.wikimedia.org/T134435) [18:05:58] Thanks, checking now. [18:06:01] (03CR) 10Dzahn: [C: 032] planet: delete poradnikwebmastera.blox.pl from pl [puppet] - 10https://gerrit.wikimedia.org/r/314046 (https://phabricator.wikimedia.org/T134435) (owner: 10Dzahn) [18:06:28] It shows me in the push logs [18:07:29] There we go [18:07:34] That took far too long [18:08:09] 6 minutes for a 1 line commit [18:09:21] Commits are hard ;-) [18:10:36] (03PS5) 10Eevans: [WIP]: Cassandra TWCS deploy repository [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) [18:10:41] (03CR) 10Eevans: [WIP]: Cassandra TWCS deploy repository (033 comments) [software/cassandra-twcs] - 10https://gerrit.wikimedia.org/r/313825 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [18:11:43] Dereckson, looks good, thanks. 
[18:13:49] 06Operations, 10ops-eqiad: cannot login to labstore1003 and labstore1004 mgmt - https://phabricator.wikimedia.org/T147340#2689548 (10RobH) [18:14:48] (03CR) 10Rush: Add python version of maintain-replicas script (032 comments) [software] - 10https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450) (owner: 10Alex Monk) [18:14:54] (03CR) 10Paladox: [C: 031] Initial configuration for olo.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312807 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [18:17:22] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:17:48] 06Operations, 10ops-eqiad: cannot login to mgmt on labvirt1003 and labvirt1004 - https://phabricator.wikimedia.org/T147340#2689627 (10RobH) [18:19:48] (03PS1) 10BBlack: cache_misc: add multicast HTCP purging via vhtcpd [puppet] - 10https://gerrit.wikimedia.org/r/314048 [18:22:54] (03CR) 10BBlack: [C: 032] cache_misc: add multicast HTCP purging via vhtcpd [puppet] - 10https://gerrit.wikimedia.org/r/314048 (owner: 10BBlack) [18:23:43] (03CR) 10RobH: [C: 031] "This looks good, and if no objections are noted on the access request task, I'll merge after the three day period (so this Thursday AM)" [puppet] - 10https://gerrit.wikimedia.org/r/313777 (https://phabricator.wikimedia.org/T146924) (owner: 10ArielGlenn) [18:23:59] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2689645 (10RobH) [18:24:24] (03PS2) 10RobH: add nschaaf to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/313777 (https://phabricator.wikimedia.org/T146924) (owner: 10ArielGlenn) [18:24:40] or i can realize the task is older than the patchset, rebase and merge... heh [18:24:54] well, be told that, not realize... 
[18:25:26] !log cutting branch 1.28.0-wmf.21 of mediawiki and extensions [18:25:26] (03CR) 10Dzahn: [C: 031] "has been added to DNS, can be added anytime the services team deploys restbase changes" [puppet] - 10https://gerrit.wikimedia.org/r/312808 (https://phabricator.wikimedia.org/T146612) (owner: 10MarcoAurelio) [18:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:44] (03CR) 10RobH: [C: 032] add nschaaf to analytics-privatedata-users [puppet] - 10https://gerrit.wikimedia.org/r/313777 (https://phabricator.wikimedia.org/T146924) (owner: 10ArielGlenn) [18:26:00] 06Operations, 10Citoid, 06Services, 10VisualEditor: Package and test Zotero for Jessie - https://phabricator.wikimedia.org/T107302#1491929 (10GWicke) >>! In T107302#2688011, @Mvolz wrote: > I believe it was decided not to rewrite Zotero core functions in Node > because Zotero is switching to Electron. But... [18:27:32] Dereckson, how's it going? [18:27:49] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2689668 (10RobH) 05Open>03Resolved a:03RobH @schana: Your access request has been merged live (with Ariel's patchset). 3 days had passed since not only the request, b... 
[18:29:05] jouncebot: next restbase [18:29:05] In 0 hour(s) and 30 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1900) [18:30:10] (03CR) 10Dzahn: [C: 031] "olo.wikipedia has been added to prod DNS" [puppet] - 10https://gerrit.wikimedia.org/r/314012 (https://phabricator.wikimedia.org/T146612) (owner: 10Paladox) [18:32:08] matt_flaschen: logs are good too [18:32:23] (03CR) 10Dzahn: "added Marostegui" [puppet] - 10https://gerrit.wikimedia.org/r/313235 (https://phabricator.wikimedia.org/T146673) (owner: 10Paladox) [18:32:33] (03PS4) 10Eevans: Enable cassandra/twcs deploy repository [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) [18:32:40] syncing to prod [18:32:44] Thanks [18:33:57] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Flow opt in: Temporarily disable all, MW.org is redundant ([[Gerrit:314042]]) (duration: 00m 50s) [18:34:14] (03CR) 10Dzahn: "i don't know. i checked the upstream link but it has not been reviewed yet it seems" [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:34:17] matt_flaschen: here you are ^ [18:34:54] (03CR) 10Paladox: "yeh, but this is a workaround until upstream merges it and we upgrade to the version of gerrit that includes that fix." [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:36:18] Thanks, Dereckson, testing now. [18:37:50] 06Operations, 10ops-eqiad: cannot login to mgmt on labvirt1003 and labvirt1004 - https://phabricator.wikimedia.org/T147340#2689699 (10RobH) 05Open>03declined Not sure whats up but these were setup and I just failed to login. I blame being sick plus jetlag, closing task. [18:40:00] (03PS6) 10MarcoAurelio: Initial configuration for olo.wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/312807 (https://phabricator.wikimedia.org/T146612) [18:40:12] (03CR) 10Eevans: "> Yup. 
We need scap on the nodes, and scap::target provides that and" [puppet] - 10https://gerrit.wikimedia.org/r/313892 (https://phabricator.wikimedia.org/T133395) (owner: 10Eevans) [18:40:27] !log Running extension/Echo/removeInvalidNotification.php on testwiki, test2wiki and mediawikiwiki (T147138) [18:40:28] T147138: "There are no notifications" although the page has a notification counter of "1" - https://phabricator.wikimedia.org/T147138 [18:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Mr. Obvious [18:41:10] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:42:14] 06Operations, 10Citoid, 06Services, 10VisualEditor: Package and test Zotero for Jessie - https://phabricator.wikimedia.org/T107302#2689716 (10Mvolz) >>! In T107302#2689663, @GWicke wrote: >>>! In T107302#2688011, @Mvolz wrote: >> I believe it was decided not to rewrite Zotero core functions in Node >> beca... [18:42:37] (03PS3) 10Paladox: Gerrit: Fix copying text from comments in Internet Explorer [puppet] - 10https://gerrit.wikimedia.org/r/313029 [18:42:39] Dereckson, confirmed in production, thanks. [18:45:03] (03CR) 10Paladox: Gerrit: Fix copying text from comments in Internet Explorer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:47:12] (03PS4) 10Dzahn: Gerrit: Fix copying text from comments in Internet Explorer [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:48:09] (03CR) 10Hashar: [C: 031] "This change is fine. Somehow upstream has -ms-user-select: initial which is not recognized by IE and fall back to not selectable." [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:48:21] mutante: that Gerrit IE [18:48:27] (03CR) 10Dzahn: [C: 032] "ACK, this is only for IE._ms_ user select." [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:48:29] So SWAT is done. 
[18:48:29] mutante: that Gerrit IE css tweak for Gerrit can land without trouble [18:48:43] Thankyou hashar [18:48:45] hashar: yep, we were just talking about it and confirmed [18:48:46] mutante: that user-select thing is not even in the CSS spec from a quick search [18:48:46] thanks [18:48:53] i know, merged [18:48:54] and it is definitely not going to cause any harm :] [18:49:25] not sure why Gerrit default -ms-user-select: initial; seems that is not supported by IE and it is probably just a copy paste [18:49:49] hashar IE is deprecated [18:49:58] Microsoft Edge is the new way now [18:50:35] Microsoft Ripped out the new engine they were going to use and used it in Microsoft Edge instead [18:50:37] calls it an "edge case" [18:50:43] lol [18:51:10] 06Operations, 10RESTBase, 10hardware-requests: Expand RESTBase cluster capacity - https://phabricator.wikimedia.org/T93790#2689735 (10GWicke) [18:51:17] Seems microsoft edge has bugs [18:51:26] sooo surprised :) [18:51:46] LOL, but it was working in the aniversery update i think [18:52:39] (03CR) 10Dzahn: [V: 032] Gerrit: Fix copying text from comments in Internet Explorer [puppet] - 10https://gerrit.wikimedia.org/r/313029 (owner: 10Paladox) [18:53:38] paladox: it's on lead now [18:54:04] paladox: can you confirm it works fine on prod now ? 
[18:54:05] Thanks [18:54:20] Yep [18:54:21] fixed now [18:54:24] thanks mutante [18:54:25] hashar ^^ [18:54:27] hurra thank you mutante :) [18:54:34] :) [18:54:49] paladox: you might want to state on upstream Gerrit issue tracker that the fix is pushed on wikimedia one and works [18:54:55] that can help getting your patch merged [18:55:03] Ok, will do that now [18:55:04] thanks [18:55:05] :) [18:55:14] will +1 as well [18:55:41] Thanks [18:55:43] :) [18:57:36] gerrit-review.googlesource.com is quite slow [18:57:52] Yeh [18:58:06] Thats because they switched to using NoteDB or ReviewDB [18:58:08] I think [18:58:14] I read this somewhere though [18:58:33] hashar https://groups.google.com/forum/#!topic/repo-discuss/mOLmh2ZS7u0 [18:58:33] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 and stat1004 for nschaaf - https://phabricator.wikimedia.org/T146924#2689787 (10Nuria) Please @schana verify you can run oozie commands/access hive. https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Oozie#Checking_logs: [18:58:40] anyway that is done for us now. Next bug paladox ! :) [18:58:53] Thanks and LOL [18:59:01] copy/paste in firefox now? [18:59:04] Oh [18:59:13] Is copy and paste not working in Firefox? [18:59:17] I thought it was working? [19:00:01] oh, yes, it is working [19:00:04] To be determined: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1900). 
[19:00:09] hashar i actually release 2.12.5 includes a new option for us in the diff
[19:00:19] i actually =
[19:00:24] To be determined: ^
[19:00:33] gerrit 2.12.5 includes a new option for diff
[19:00:44] Which fixes our downstream task in phabricator
[19:00:48] I also did the fix :)
[19:01:19] yeah I noticed your patch :]
[19:01:25] good to see they merged it quite fast
[19:02:50] Yep
[19:03:27] hashar i do like the big scroll bars in gerrit 2.13 - http://gerrit-new.wmflabs.org/ though
[19:03:32] much better to scroll
[19:04:11] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:05:43] hashar for https://gerrit-review.googlesource.com/#/c/87310/3 you said "We had this .css tweak applied on the Wikimedia Gerrit installation and Paladox has confirmed it makes the comment text selectable in Microsoft Edge.", it always worked in Microsoft Edge since it doesn't use -ms- prefix any more it dropped ie old engine, It was faking to be chrome, but anyways it fixed IE
[19:05:44] :)
[19:05:59] :)
[19:06:27] :)
[19:07:07] (PS3) Dzahn: gridengine: use present instead of latest in package [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:07:10] bigger scroll bars i like as well
[19:07:15] Yep
[19:07:30] It also has a new plugin for graphite i think
[19:07:36] or graph something
[19:08:24] ah, missed the to be determined ping, mw train is in progress
[19:09:09] heh, oops
[19:09:30] thcipriani: seems the database issue got fixed ?
[19:10:06] hashar: yeah, so it seems, fixed before I checked the blocking task.
[19:10:33] (CR) Jhobs: [C: 1] "You should be able to remove your -1 now. Does InitialiseSettings-labs.php need to change at all?" [mediawiki-config] - https://gerrit.wikimedia.org/r/313898 (https://phabricator.wikimedia.org/T145442) (owner: Jdlrobson)
[19:10:43] the LB one, yeah
[19:10:45] there was some task about a global rename task that is stuck. I could not fix it up but apparently tgr said it is not to be a blocker
[19:10:45] (CR) Dzahn: "amended to only switch 'latest' to 'present' but not introduce the use of require_package" [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:11:03] jouncebot: now
[19:11:03] For the next 1 hour(s) and 48 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T1900)
[19:11:13] oh right, you don't say who when doing now
[19:11:16] jouncebot: refresh
[19:11:19] I refreshed my knowledge about deployments.
[19:19:57] (CR) Rush: [C: 1] "sure, but I don't have time to push this out atm. @dzahn do you?" [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:23:55] (CR) Rush: [C: 1] Fix quoting for br_netfilter kmod configuration [puppet] - https://gerrit.wikimedia.org/r/314035 (owner: Muehlenhoff)
[19:26:11] !log thcipriani@tin Started scap: testwiki to 1.28.0-wmf.21 and rebuild l10n cache
[19:26:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:27:20] Operations, Cassandra, Services, hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2689956 (GWicke) One factor we have ignored a bit in this discussion is the longer term plan to separate current revisions from archival storage (see T1201...
[19:28:22] (PS4) Dzahn: gridengine: use present instead of latest in package [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:28:58] (CR) Dzahn: [C: 2] "sure, thanks. just wanted that +1 from you guys" [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:29:06] Thanks
[19:37:52] PROBLEM - MegaRAID on db1065 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded)
[19:41:42] (CR) Jdlrobson: "Footer code is riding the train so this is still -1ed until Thursday." [mediawiki-config] - https://gerrit.wikimedia.org/r/313898 (https://phabricator.wikimedia.org/T145442) (owner: Jdlrobson)
[19:42:58] (PS4) Andrew Bogott: Add role::beta::deploymentserver [puppet] - https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233)
[19:44:27] (CR) Andrew Bogott: [C: 2] Add role::beta::deploymentserver [puppet] - https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233) (owner: Andrew Bogott)
[19:44:32] (PS5) Andrew Bogott: Add role::beta::deploymentserver [puppet] - https://gerrit.wikimedia.org/r/313904 (https://phabricator.wikimedia.org/T147233)
[19:45:11] (CR) Dzahn: "gridengine::submit_host is included in all the toollabs classes, checker.pp, bastion.pp, services.pp etc, so i picked a random instance, t" [puppet] - https://gerrit.wikimedia.org/r/310710 (https://phabricator.wikimedia.org/T115348) (owner: Paladox)
[19:56:12] PROBLEM - HP RAID on ms-be1023 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[19:56:33] Operations, RESTBase, Services: RESTBase and domain renames - https://phabricator.wikimedia.org/T113307#2690103 (GWicke) p:Normal>Low
[19:58:42] RECOVERY - HP RAID on ms-be1023 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[20:03:55] !log RESTBase deploy 810b6aa563 to staging
[20:04:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:06:13] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:14:42] (PS1) Andrew Bogott: Remove role::deployment::server from deploymentserver.pp [puppet] - https://gerrit.wikimedia.org/r/314068
[20:15:51] (CR) Andrew Bogott: [C: 2] Remove role::deployment::server from deploymentserver.pp [puppet] - https://gerrit.wikimedia.org/r/314068 (owner: Andrew Bogott)
[20:16:54] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[20:19:23] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[20:22:03] !log thcipriani@tin Finished scap: testwiki to 1.28.0-wmf.21 and rebuild l10n cache (duration: 55m 51s)
[20:22:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:25:53] PROBLEM - Apache HTTP on mw1274 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:28:55] Doesn't seem like I'm going to be rolling this version forward to group0, looks like it's completely broken
[20:28:57] https://test.wikipedia.org/
[20:29:36] > Cannot access the database: No working replica DB server: Unknown error
[20:30:32] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[20:31:21] AaronSchulz: ^
[20:31:29] AaronSchulz: re test.wikipedia.org
[20:33:37] * AaronSchulz checks logs
[20:35:47] https://test.wikipedia.org/wiki/Special:Version
[20:36:14] lots of 503
[20:36:53] yes, confirmed
[20:36:55] thcipriani: ^
[20:37:24] see previous 10 lines of scrollback :)
[20:37:50] I will rollback now, there are probably plenty of logs to investigate.
[20:37:55] +1
[20:38:50] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: testwiki back to 1.28.0-wmf.20
[20:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:56:54] bd808: i would like to periodically (say weekly, via cron) query logstash, make a report, and mail it to a list, any suggestions about where to do that from?
[20:58:03] urandom: I think we opened up access to the logstash elasticsearch cluster from some internal hosts. let me look for the ferm rule
[20:59:09] tin at least for the deploy checker
[20:59:12] tin/mira
[20:59:39] urandom: that looks right ^ $DEPLOYMENT_HOSTS
[21:00:02] greg-g, bd808: do you think it would be OK to use one of those for this purpose?
[21:00:22] i assume something would need to be done to make email work
[21:00:41] Not sure why it wouldn't be. and email should work fine outbound I think
[21:00:42] ideally not, I wonder if we can add the work machines (terbium and uh, whatever in codfw, wasat?) to that rule
[21:00:53] * greg-g is protective, apparently
[21:00:58] sure. someboy just needs to put up a patch
[21:01:02] *somebody
[21:01:28] so terbium (once blessed by ferm)?
[21:01:29] if this is at all important the cron should be managed by puppet too
[21:02:02] yeah terbium is the more common host for crons to run on
[21:04:11] bd808, greg-g: cool; thanks!
[21:05:58] thcipriani: https://gerrit.wikimedia.org/r/#/c/314173/1
[21:08:12] AaronSchulz: ok. I can give that a shot if you're around for a few.
[21:08:21] sure
[21:13:06] Operations, Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2690464 (Eevans)
[21:22:33] (PS2) BBlack: cache_upload: jemalloc chunk size: s/1MB/128KB/ [puppet] - https://gerrit.wikimedia.org/r/313847
[21:22:42] (CR) BBlack: [C: 2 V: 2] cache_upload: jemalloc chunk size: s/1MB/128KB/ [puppet] - https://gerrit.wikimedia.org/r/313847 (owner: BBlack)
[21:24:38] Operations, DBA, Labs, Labs-Infrastructure: Prepare storage layer for olo.wikipedia - https://phabricator.wikimedia.org/T147302#2690580 (MarcoAurelio) p:Triage>Normal
[21:25:12] hrmm, how the hell is the topic locked in here
[21:25:19] set topiclock is off
[21:25:25] but its op only setting topic allowed...
[21:25:45] PROBLEM - puppet last run on kafka1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:32:30] AaronSchulz: sorry for the delay, waiting on jenkins/zuul things
[21:32:42] (PS1) Andrew Bogott: Move toollabs node classes to roles. [puppet] - https://gerrit.wikimedia.org/r/314180 (https://phabricator.wikimedia.org/T147233)
[21:34:17] (Abandoned) Andrew Bogott: Add a role wrapper around base::firewall [puppet] - https://gerrit.wikimedia.org/r/313819 (owner: Andrew Bogott)
[21:34:28] (CR) jenkins-bot: [V: -1] Move toollabs node classes to roles. [puppet] - https://gerrit.wikimedia.org/r/314180 (https://phabricator.wikimedia.org/T147233) (owner: Andrew Bogott)
[21:34:46] !log thcipriani@tin Synchronized php-1.28.0-wmf.21/includes/libs/rdbms/loadmonitor/LoadMonitor.php: [[gerrit:314176|Add version to LoadMonitor::getCacheKey() (T147359)]] (duration: 00m 53s)
[21:34:48] T147359: Cannot access the database: No working replica DB server: Unknown error - https://phabricator.wikimedia.org/T147359
[21:34:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:35:07] ^ AaronSchulz sync'd, going to try to roll testwiki back on to 1.28.0-wmf.21
[21:36:14] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: testwiki to 1.28.0-wmf.21
[21:36:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:37:33] !log cache_upload: rolling frontend restarts for https://gerrit.wikimedia.org/r/#/c/313847/ (sequential depooled, ~30s per host)
[21:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:38:28] (PS2) Andrew Bogott: Move toollabs node classes to roles. [puppet] - https://gerrit.wikimedia.org/r/314180 (https://phabricator.wikimedia.org/T147233)
[21:39:24] AaronSchulz: seems to have worked...
[21:42:16] (PS1) Thcipriani: Group0 to 1.28.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/314186
[21:43:03] (CR) Thcipriani: [C: 2] Group0 to 1.28.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/314186 (owner: Thcipriani)
[21:43:30] (Merged) jenkins-bot: Group0 to 1.28.0-wmf.21 [mediawiki-config] - https://gerrit.wikimedia.org/r/314186 (owner: Thcipriani)
[21:44:54] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.21
[21:44:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:50:34] RECOVERY - puppet last run on kafka1014 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[21:52:36] (PS1) Hashar: openstack: skip DNS update for contintcloud [puppet] - https://gerrit.wikimedia.org/r/314188
[21:55:46] RECOVERY - MegaRAID on db2017 is OK: OK: optimal, 1 logical, 6 physical
[22:04:12] @seen coren
[22:04:12] mutante: Last time I saw Coren they were joining the channel, they are still in the channel #wmhack at 9/22/2016 9:39:21 PM (12d24m51s ago)
[22:12:48] @seen mutante
[22:12:48] paladox: mutante is in here, right now
[22:12:58] @seen paladox
[22:12:58] paladox: are you really looking for yourself?
[22:15:47] Warning: Memcached::touch(): touch is only supported with binary protocol in /srv/mediawiki/php-1.28.0-wmf.20/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 253
[22:16:55] Would someone know about this error and if already reported somewhere?
[22:17:42] ah yes https://phabricator.wikimedia.org/T143464
[22:19:37] @seen wm-bot
[22:19:37] bblack: I am right here
[22:23:26] PROBLEM - puppet last run on restbase1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:31:56] (PS1) Dereckson: Set Flow database for wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792)
[22:32:10] matt_flaschen: I'm adding this to SWAT so we can create the tables ^
[22:33:32] (PS2) Dzahn: contint: remove the ganglia jenkins plugin [puppet] - https://gerrit.wikimedia.org/r/313579 (https://phabricator.wikimedia.org/T147065) (owner: Hashar)
[22:35:59] (CR) Mattflaschen: [C: 1] Set Flow database for wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792) (owner: Dereckson)
[22:36:25] Dereckson, +1 looks good. I assigned the task to you to follow up on. I am available to review and help out.
[22:37:15] * Dereckson nods.
[22:37:30] Dereckson, actually, I have a suggestion. Will comment on the Gerrit.
[22:38:06] (CR) Dzahn: [C: 2] contint: remove the ganglia jenkins plugin [puppet] - https://gerrit.wikimedia.org/r/313579 (https://phabricator.wikimedia.org/T147065) (owner: Hashar)
[22:39:34] (CR) Mattflaschen: [C: -1] "Actually, should we just move the normal:" [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792) (owner: Dereckson)
[22:40:28] (CR) Dereckson: "Indeed. The issue will exist in the future too for them." [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792) (owner: Dereckson)
[22:43:10] (PS2) Dzahn: contint: remove Jenkins gmond legacy files [puppet] - https://gerrit.wikimedia.org/r/313581 (https://phabricator.wikimedia.org/T147065) (owner: Hashar)
[22:44:35] (CR) Dzahn: [C: 2] "Notice: /Stage[main]/Role::Ci::Master/File[/usr/lib/ganglia/python_modules/gmond_jenkins.pyc]/ensure: removed" [puppet] - https://gerrit.wikimedia.org/r/313581 (https://phabricator.wikimedia.org/T147065) (owner: Hashar)
[22:46:16] (PS2) Dereckson: Set Flow database for wikitech [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792)
[22:48:16] RECOVERY - puppet last run on restbase1009 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[22:48:51] (CR) Dereckson: "PS2: discard temporary fix per previous comment." [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792) (owner: Dereckson)
[22:55:12] (PS1) Dereckson: Always set wgFlowDefaultWikiDb [mediawiki-config] - https://gerrit.wikimedia.org/r/314194
[22:58:42] (CR) Dereckson: "I1de36a1de671dbdc7e5da76ecb3f81ca5710e661 takes care of the CS part." [mediawiki-config] - https://gerrit.wikimedia.org/r/314192 (https://phabricator.wikimedia.org/T127792) (owner: Dereckson)
[23:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161004T2300).
[23:01:26] (PS3) Paladox: labs dnsrecursor: add olo.wiki(pedia) [puppet] - https://gerrit.wikimedia.org/r/314012 (https://phabricator.wikimedia.org/T146612)
[23:01:38] nothing to deploy? I'll swat my config cleanups
[23:03:41] (PS2) MaxSem: wfLoadExtension( 'GeoData' ) [mediawiki-config] - https://gerrit.wikimedia.org/r/313154
[23:03:43] (CR) Dzahn: [C: 2] labs dnsrecursor: add olo.wiki(pedia) [puppet] - https://gerrit.wikimedia.org/r/314012 (https://phabricator.wikimedia.org/T146612) (owner: Paladox)
[23:03:55] (CR) MaxSem: [C: 2] wfLoadExtension( 'GeoData' ) [mediawiki-config] - https://gerrit.wikimedia.org/r/313154 (owner: MaxSem)
[23:04:23] (Merged) jenkins-bot: wfLoadExtension( 'GeoData' ) [mediawiki-config] - https://gerrit.wikimedia.org/r/313154 (owner: MaxSem)
[23:08:21] matt_flaschen: if https://gerrit.wikimedia.org/r/314194 looks good to you, we can deploy it now too.
[23:13:47] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/313154/2 (duration: 00m 50s)
[23:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:14:53] (PS2) MaxSem: Kill $wmgEnableGeoSearch [mediawiki-config] - https://gerrit.wikimedia.org/r/313155
[23:15:25] (CR) MaxSem: [C: 2] Kill $wmgEnableGeoSearch [mediawiki-config] - https://gerrit.wikimedia.org/r/313155 (owner: MaxSem)
[23:15:54] (Merged) jenkins-bot: Kill $wmgEnableGeoSearch [mediawiki-config] - https://gerrit.wikimedia.org/r/313155 (owner: MaxSem)
[23:20:22] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/313155/2 (duration: 00m 49s)
[23:22:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:22:50] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/313155/2 (duration: 00m 49s)
[23:23:37] (PS2) MaxSem: No reason to ever vary $wgGeoDataDebug by wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313156
[23:23:42] (CR) MaxSem: [C: 2] No reason to ever vary $wgGeoDataDebug by wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313156 (owner: MaxSem)
[23:24:13] (Merged) jenkins-bot: No reason to ever vary $wgGeoDataDebug by wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/313156 (owner: MaxSem)
[23:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:28:05] (PS3) Ori.livneh: [WIP] Module for Recommendation API [puppet] - https://gerrit.wikimedia.org/r/312045
[23:29:54] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/313156/2 (duration: 00m 57s)
[23:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:31:07] when importing a file:
[23:31:09] Importing 20161003_Panel_LA_Case_Study_HD.webm...Fatal error: Call to a member function getNamespace() on a non-object in /srv/mediawiki/php-1.28.0-wmf.20/extensions/Wikidata/extensions/Wikibase/client/includes/Hooks/BeforePageDisplayHandler.php on line 41
[23:31:21] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/313156/2 (duration: 00m 50s)
[23:31:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:32:25] (PS2) MaxSem: GeoData: get rid of wmg, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/313157
[23:32:40] (CR) MaxSem: [C: 2] GeoData: get rid of wmg, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/313157 (owner: MaxSem)
[23:32:49] That reminds me https://phabricator.wikimedia.org/T147127
[23:33:11] (Merged) jenkins-bot: GeoData: get rid of wmg, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/313157 (owner: MaxSem)
[23:33:47] Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#2691198 (Dzahn) a:Dzahn
[23:34:09] Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#2669983 (Dzahn) p:Triage>Normal
[23:37:02] Operations: investigate shared inbox options - https://phabricator.wikimedia.org/T146746#2691234 (Dzahn) @Peachey88 thank you for summary, that is very useful. i'll report back to the team
[23:39:54] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/313157/2 (duration: 01m 38s)
[23:39:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:41:08] (PS2) MaxSem: GeoData: get rid of wmg, part 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/313158
[23:41:19] (CR) MaxSem: [C: 2] GeoData: get rid of wmg, part 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/313158 (owner: MaxSem)
[23:41:55] (Merged) jenkins-bot: GeoData: get rid of wmg, part 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/313158 (owner: MaxSem)
[23:52:55] PROBLEM - puppet last run on lvs3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues