[00:00:53] (03PS3) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [00:02:56] (03PS4) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [00:03:05] (03CR) 10Alex Monk: "Had to re-apply this today to get nginx starting..." [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) (owner: 10Alex Monk) [00:07:09] (03CR) 10Dzahn: "there should be a separate change that converts this whole thing to ferm::service and DNS names and/or gets the list of deployment and mai" [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: 10Dzahn) [00:08:36] (03PS2) 10Dzahn: tcpircbot: allow connections from terbium [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) [00:09:26] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2513777 (10greg) [00:11:57] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [00:12:27] (03PS5) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [00:13:18] (03PS1) 10Dzahn: tcpircbot: add instances on terbium and wasat [puppet] - 10https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) [00:13:37] (03PS6) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [00:13:47] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). 
[00:15:04] (03PS2) 10Dzahn: tcpircbot: allow connections from terbium and wasat [puppet] - 10https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) [00:16:55] puppet fail on tin is something i merged [00:17:09] the icinga check for the root-owned files that chad wrote [00:18:09] ah, dependency for icinga user/group but not running on icinga server, runs on deployment servers, where the user doesn't exist [00:18:32] (03PS7) 10Paladox: Testing [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302371 [00:19:17] (03CR) 10Dzahn: "should also add wasat, deployment server codfw" [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: 10Dzahn) [00:19:27] will fix within the next hour. brb [00:19:43] isn't wasat the codfw maintenance server? [00:19:44] deployment is mira [00:20:28] ACKNOWLEDGEMENT - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 2 failures daniel_zahn dependency issue, icinga user not here [00:21:01] Krenair: yes, it is. 2 separate things going on [00:21:12] one is that puppet run on tin (icinga check for root files) [00:21:20] the other is making dologmsg work [00:21:31] on maintenance servers. was requested for terbium [00:21:41] so i said should also be wasat [00:21:47] yep [00:21:56] gotta move, be back soon [00:23:21] (03Abandoned) 10Paladox: Add gbp.conf file for debian [debs/gerrit] - 10https://gerrit.wikimedia.org/r/301841 (owner: 10Paladox) [00:23:34] (03Restored) 10Paladox: Add gbp.conf file for debian [debs/gerrit] - 10https://gerrit.wikimedia.org/r/301841 (owner: 10Paladox) [00:24:27] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Puppet has 1 failures [00:29:44] bblack, so, here's the thing... 
the acme_tiny script relies on a working https cert if you redirect the verification URL to HTTPS [00:29:56] otherwise this happens: urllib2.URLError: [00:30:27] I wonder what Let's Encrypt says [00:30:57] (03PS1) 10Dereckson: Revert "Revert "Add $wmgEchoMentionStatusNotifications and enable it in beta labs"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302377 (https://phabricator.wikimedia.org/T135717) [00:33:03] RoanKattouw: log clean [00:35:35] RoanKattouw: we retry it? [00:39:37] Think I got it... [00:44:04] Krenair: I had to comment that out before, yeah [00:44:21] Krenair: there's an internal check in acme-tiny that tries the challenge on itself first. comment those out and LE itself will work just fine. [00:44:39] Seems like a fix I should upstream [00:44:52] yeah there should at least be a flag to skip self-verification [00:45:00] (it's possible to make urllib2 ignore invalid certs) [00:45:03] upstream is strange, though, adverse to adding extra lines of code and such :) [00:48:43] :/ [00:48:47] Krenair: the biggest problem is going to be that LE doesn't do wildcards. 
It's the same problem with our non-canonical redirect, though, since we need $lang.wikipedia.com -> $lang.wikipedia.org and such [00:49:24] I was working on that last week, but probably another week or two to go before it's ready (to create thousands of LE certs templated out on the language list and such) [00:49:47] (well thousands of SANs anyways, split up 100/cert) [00:50:42] (03PS4) 10Paladox: Gerrit: Support having phab commits as links [puppet] - 10https://gerrit.wikimedia.org/r/302229 (https://phabricator.wikimedia.org/T76459) [00:51:15] bblack, yeah, I'm just fiddling with upload at the moment [00:51:45] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is CRITICAL: NRPE: Command check__srv_mediawiki-staging_owned not defined [00:53:05] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: NRPE: Command check__srv_mediawiki-staging_owned not defined [00:53:56] I'm guessing dz is still working on ^ [00:56:24] RoanKattouw: ping ? [00:58:14] tada: https://upload.beta.wmflabs.org/wikisource/en/thumb/6/62/Wind_in_the_Willows_%281913%29.djvu/page7-1024px-Wind_in_the_Willows_%281913%29.djvu.jpg [01:00:36] nice! [01:02:20] (03CR) 10Paladox: "maybe leave this part as it is and copy this below it. so it dosrnt affect / urls and fixes /r urls please." [puppet] - 10https://gerrit.wikimedia.org/r/301829 (owner: 10Chad) [01:18:55] (03PS3) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) [01:19:40] (03CR) 10Alex Monk: [C: 04-1] "Obviously this is not ready yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) (owner: 10Alex Monk) [01:19:57] (03CR) 10jenkins-bot: [V: 04-1] beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) (owner: 10Alex Monk) [01:23:13] (03PS4) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) [01:25:48] Someone else already ran into our acme-tiny verification issue: https://github.com/diafygi/acme-tiny/issues/73#issuecomment-222746150 [01:25:57] Gonna submit a pull request and see what they say [01:34:31] (03PS1) 10BBlack: ciphersuites: drop mid-level dhe+aes256 options [puppet] - 10https://gerrit.wikimedia.org/r/302378 [01:35:13] actually I just left a comment at https://github.com/diafygi/acme-tiny/pull/116 which allows disabling the check [01:38:42] (03PS5) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) [01:46:30] (03CR) 10Alex Monk: "Hi, I think someone may have merged this on deployment-puppetmaster... If so, please don't. We use cherry-pick for these." 
[puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) (owner: 10Ladsgroup) [01:47:05] Dereckson: Sorry, I had forgotten about this and gone home [01:47:12] I'll reschedule for tomorrow [01:48:21] (03CR) 10Alex Monk: "(I've cleaned it up now, and the automatic-pulling seems to work again)" [puppet] - 10https://gerrit.wikimedia.org/r/302303 (https://phabricator.wikimedia.org/T141575) (owner: 10Ladsgroup) [01:53:19] RoanKattouw: okay, I prepared https://gerrit.wikimedia.org/r/302377 for that [01:54:32] Thanks [02:15:25] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [02:21:10] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.12) (duration: 08m 25s) [02:21:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:23:36] (03PS6) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T70387) [02:26:32] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 2 02:26:32 UTC 2016 (duration 5m 22s) [02:26:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:46:57] PROBLEM - puppet last run on mw2140 is CRITICAL: CRITICAL: Puppet has 1 failures [02:57:38] (03CR) 10Alex Monk: [C: 04-1] "no ferm rule for wasat?" 
[puppet] - 10https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) (owner: 10Dzahn) [02:59:51] (03CR) 10Alex Monk: [C: 04-1] "should probably be merged with the other commit - and yes, needs wasat" [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: 10Dzahn) [03:05:07] 06Operations, 10Beta-Cluster-Infrastructure, 06Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#2514079 (10Krenair) a:03Krenair [03:06:04] 06Operations, 10Beta-Cluster-Infrastructure, 06Labs, 10Labs-Infrastructure: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#527800 (10Krenair) This is now working for meta.wikimedia.beta.wmflabs.org and deployment.wikimedia.beta.wmflabs.org (and their... [03:07:46] (03PS7) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for most of text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) [03:09:39] (03CR) 10Alex Monk: [C: 04-1] "Still not ready. I also have an extra hack on deployment-puppetmaster stopping the TLS redirection from affecting domains that don't yet h" [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [03:12:31] (03PS1) 10Dzahn: deployment-server: fix icinga plugin owner/group [puppet] - 10https://gerrit.wikimedia.org/r/302381 [03:12:36] RECOVERY - puppet last run on mw2140 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [03:19:19] (03CR) 10Dzahn: [C: 032] deployment-server: fix icinga plugin owner/group [puppet] - 10https://gerrit.wikimedia.org/r/302381 (owner: 10Dzahn) [03:21:07] (03CR) 10Dzahn: "aka. 
fix owner of bad_directory_owner.pp" [puppet] - 10https://gerrit.wikimedia.org/r/302381 (owner: 10Dzahn) [03:22:46] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [03:23:16] RECOVERY - puppet last run on tin is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:27:58] (03PS3) 10Dzahn: tcpircbot: allow connections from terbium, wasat [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) [03:41:13] (03PS3) 10Dzahn: tcpircbot: allow connections from terbium and wasat [puppet] - 10https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) [03:42:27] PROBLEM - puppet last run on db2043 is CRITICAL: CRITICAL: puppet fail [03:45:37] (03CR) 10Dzahn: "works now after https://gerrit.wikimedia.org/r/#/c/302381/" [puppet] - 10https://gerrit.wikimedia.org/r/301842 (owner: 10Chad) [03:45:48] (03PS4) 10Dzahn: tcpircbot: allow connections from terbium and wasat [puppet] - 10https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) [03:46:57] (03CR) 10Dzahn: "works now after https://gerrit.wikimedia.org/r/#/c/302381/" [puppet] - 10https://gerrit.wikimedia.org/r/301327 (owner: 10Chad) [03:48:04] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=tin&service=Improperly+owned+-0%3A0-+files+in+%2Fsrv%2Fmediawiki-staging [03:48:25] that works now. tells us about root-owned files on staging [03:48:36] both mira and tin [03:50:58] hmm, actually.. it may not really work yet, just fixed some things on the way [03:51:15] there is always this, f.e. 
[03:51:18] ./.git/refs/remotes/readonly/master [04:08:35] RECOVERY - puppet last run on db2043 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [04:16:13] (03PS1) 10Dzahn: deployment/icinga: fix check_dir-not-bad-owner [puppet] - 10https://gerrit.wikimedia.org/r/302382 [04:19:19] (03PS2) 10Dzahn: deployment/icinga: fix check_dir-not-bad-owner [puppet] - 10https://gerrit.wikimedia.org/r/302382 [04:26:55] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0] [04:32:56] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [04:34:42] (03CR) 10Dzahn: [C: 032] deployment/icinga: fix check_dir-not-bad-owner [puppet] - 10https://gerrit.wikimedia.org/r/302382 (owner: 10Dzahn) [04:39:47] (03CR) 10Dzahn: "another follow-up https://gerrit.wikimedia.org/r/#/c/302382/ because there is always "/srv/mediawiki-staging/.git/refs/remotes/readonly/m" [puppet] - 10https://gerrit.wikimedia.org/r/301327 (owner: 10Chad) [05:35:15] ACKNOWLEDGEMENT - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (:) files in /srv/mediawiki-staging daniel_zahn new check is still being worked on [05:35:15] ACKNOWLEDGEMENT - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is CRITICAL: Improperly owned (:) files in /srv/mediawiki-staging daniel_zahn new check is still being worked on [05:40:43] 06Operations: potassium - 'puppetmaster.test.eqiad.wmnet' did not match server certificate - https://phabricator.wikimedia.org/T141839#2514135 (10Dzahn) [05:41:20] ACKNOWLEDGEMENT - puppet last run on potassium is CRITICAL: CRITICAL: puppet fail daniel_zahn https://phabricator.wikimedia.org/T141839 [06:05:25] (03PS1) 10Dzahn: deployment/icinga: fix bad directory owner check [puppet] - 10https://gerrit.wikimedia.org/r/302384 [06:11:10] (03PS2) 10Dzahn: deployment/icinga: fix 
bad directory owner check [puppet] - 10https://gerrit.wikimedia.org/r/302384 [06:11:31] (03CR) 10Dzahn: [C: 032] deployment/icinga: fix bad directory owner check [puppet] - 10https://gerrit.wikimedia.org/r/302384 (owner: 10Dzahn) [06:18:15] (03CR) 10Dzahn: "one more fix-up change: https://gerrit.wikimedia.org/r/302384" [puppet] - 10https://gerrit.wikimedia.org/r/301327 (owner: 10Chad) [06:19:10] (03CR) 10Dzahn: "follow-up https://gerrit.wikimedia.org/r/302384" [puppet] - 10https://gerrit.wikimedia.org/r/301842 (owner: 10Chad) [06:30:17] PROBLEM - puppet last run on cp1054 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:45] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:30:46] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:46] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:16] PROBLEM - puppet last run on elastic1042 is CRITICAL: CRITICAL: Puppet has 4 failures [06:32:56] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:39:16] PROBLEM - puppet last run on mw2063 is CRITICAL: CRITICAL: puppet fail [06:45:44] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-dan-nor] - 10https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [06:46:43] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-dan] - 10https://gerrit.wikimedia.org/r/269912 (https://phabricator.wikimedia.org/T124137) (owner: 10KartikMistry) [06:49:45] PROBLEM - puppet last run on analytics1016 is CRITICAL: CRITICAL: Puppet has 1 failures [06:50:05] PROBLEM - puppet last run on analytics1003 is CRITICAL: CRITICAL: Puppet has 1 failures [06:55:23] 06Operations, 10ContentTranslation-CXserver, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, and 5 others: Package and test apertium for Jessie - 
https://phabricator.wikimedia.org/T107306#2514221 (10Arrbee) [06:55:55] RECOVERY - puppet last run on cp1054 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [06:56:15] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:56:16] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [06:57:06] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:35] RECOVERY - puppet last run on elastic1042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:16] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [07:06:55] RECOVERY - puppet last run on mw2063 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:15:16] RECOVERY - puppet last run on analytics1016 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [07:15:36] RECOVERY - puppet last run on analytics1003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [07:27:31] 06Operations, 10Analytics, 10Analytics-Wikistats, 07Regression: [Regression] stats.wikipedia.org redirect no longer works ("Domain not served here") - https://phabricator.wikimedia.org/T126281#2514295 (10Nemo_bis) p:05Triage>03Normal [07:31:49] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis, 13Patch-For-Review: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2514296 (10HJiang-WMF) 05Resolved>03Open Outage of access to bast1001. OS: Fedora 23. Have successfully ssh-ed into other servi... 
[07:44:52] (03PS6) 10MarcoAurelio: Closing wikimania2015wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298772 (https://phabricator.wikimedia.org/T139032) [07:47:18] (03PS8) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for most of text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) [07:51:09] 06Operations, 10Traffic: Age header reset to 0 after 24 hours on varnish frontends - https://phabricator.wikimedia.org/T141373#2514310 (10ema) I've left varnishncsa running for six hours on a cache_upload machine, and the maximum value for Age found there was 601758 (1 week). Same procedure on a cache_text sy... [07:54:28] (03CR) 10Muehlenhoff: "@paladox: I'll review and merge later on" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [07:54:48] !log akosiaris@palladium conftool action : set/pooled=no; selector: wtp1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [07:54:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:54:56] !log akosiaris@palladium conftool action : set/pooled=no; selector: wtp1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [07:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:55:00] !log T135176 depool wtp100[12] [07:55:01] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176 [07:55:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:07:32] (03PS1) 10Ema: cache_upload: use default_ttl instead of ttl_fixed [puppet] - 10https://gerrit.wikimedia.org/r/302388 [08:11:10] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis, 13Patch-For-Review: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471648 (10AlexMonk-WMF) >>! 
In T140659#2514296, @HJiang-WMF wrote: > Standard ssh config file but unable to ssh into bast1001 now. A... [08:16:25] PROBLEM - Host mr1-codfw.oob is DOWN: PING CRITICAL - Packet loss = 100% [08:24:05] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 35 probes of 393 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [08:28:29] (03PS1) 10Alexandros Kosiaris: wtp100[12] to jessie [puppet] - 10https://gerrit.wikimedia.org/r/302389 (https://phabricator.wikimedia.org/T135176) [08:28:56] RECOVERY - Host mr1-codfw.oob is UP: PING OK - Packet loss = 0%, RTA = 38.58 ms [08:29:05] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 212, down: 0, dormant: 0, excluded: 1, unused: 0 [08:29:13] (03PS1) 10Elukey: Move the AQS partman setup to RAID10 (instead of RAID0) [puppet] - 10https://gerrit.wikimedia.org/r/302390 [08:29:56] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 2 probes of 393 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [08:30:34] (03CR) 10Elukey: [C: 032] Move the AQS partman setup to RAID10 (instead of RAID0) [puppet] - 10https://gerrit.wikimedia.org/r/302390 (owner: 10Elukey) [08:30:39] (03CR) 10Alexandros Kosiaris: [C: 032] wtp100[12] to jessie [puppet] - 10https://gerrit.wikimedia.org/r/302389 (https://phabricator.wikimedia.org/T135176) (owner: 10Alexandros Kosiaris) [08:30:53] (03PS2) 10Elukey: Move the AQS partman setup to RAID10 (instead of RAID0) [puppet] - 10https://gerrit.wikimedia.org/r/302390 [08:31:10] ah akosiaris stole my +2 submit spot :P [08:32:27] elukey: ha! beat ya! 
:P [08:32:59] tbh, these race cases make me question of ff-only policy on the puppet repo [08:33:12] that and cleaning up my local branches [08:34:14] the temptation to +2 verified to anticipate the others is also a big risk :D [08:34:38] oh, I am almost always V+2 myself after a rebase these days [08:34:57] unless the change is big, the risk is pretty low [08:35:39] yeah true [08:35:45] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 1 probes of 398 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map [08:36:13] !log reimaging the AQS cluster to use raid10 (aqs100[456], not serving live traffic) [08:43:17] PROBLEM - Host mr1-codfw.oob is DOWN: CRITICAL - Time to live exceeded (216.117.46.36) [08:45:09] (03PS1) 10Gehel: Maps - remove expire files [puppet] - 10https://gerrit.wikimedia.org/r/302392 [08:45:35] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 36 probes of 398 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map [08:48:26] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [08:49:00] elukey: that ^ would be you [08:49:17] RECOVERY - Host mr1-codfw.oob is UP: PING OK - Packet loss = 0%, RTA = 33.75 ms [08:51:26] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 1 probes of 398 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map [08:53:05] PROBLEM - Host cp1070 is DOWN: PING CRITICAL - Packet loss = 100% [08:53:42] akosiaris: checking, I didn't see anything weird in palladium though [08:54:56] RECOVERY - Host cp1070 is UP: PING OK - Packet loss = 0%, RTA = 1.45 ms [08:55:50] akosiaris: just to avoid any mess, a sudo git pull origin in /var/lib/git/operations/puppet should be enough right? 
[08:57:54] elukey: sudo puppet-merge [08:58:10] works fine on the backend puppetmasters as well [08:58:25] and actually the command you pasted would create a mess [08:58:35] that directory should not have root owned files [08:58:55] see I was right to ask first :) [08:58:55] PROBLEM - salt-minion processes on cp1070 is CRITICAL: Connection refused by host [08:58:56] PROBLEM - puppet last run on cp1070 is CRITICAL: Connection refused by host [08:59:08] thanks, it should be solved now [08:59:16] :-) [08:59:16] PROBLEM - DPKG on cp1070 is CRITICAL: Connection refused by host [08:59:25] PROBLEM - Disk space on cp1070 is CRITICAL: Connection refused by host [08:59:29] ema --^ [09:00:07] PROBLEM - configured eth on cp1070 is CRITICAL: Connection refused by host [09:00:16] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [09:00:23] (03PS1) 10Muehlenhoff: Enable base::firewall for palladium [puppet] - 10https://gerrit.wikimedia.org/r/302394 [09:00:25] elukey: yeah, I was looking into that. It was a software-decommed spare, no clue why icinga started complaining now [09:00:26] PROBLEM - dhclient process on cp1070 is CRITICAL: Connection refused by host [09:00:47] ema: super, just wanted to ping you in case you didn't see it :) [09:05:18] (03PS3) 10Urbanecm: Add possibility to disable CompactLink in default state and disable it on enwikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298187 (https://phabricator.wikimedia.org/T139903) [09:14:06] (03PS2) 10Gehel: Maps - initial data import [puppet] - 10https://gerrit.wikimedia.org/r/300572 (https://phabricator.wikimedia.org/T138501) [09:17:16] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2514484 (10fgiunchedi) on ms-be1023 with the firmware upgraded I'm still seeing a couple of kernel messages from hpsa about aborted commands and `check_hpssacli` takes... 
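[editor's note] Root-owned files come up twice in this log: the new icinga check on /srv/mediawiki-staging ("Improperly owned -0:0- files"), and the warning that /var/lib/git/operations/puppet should not contain root-owned files, which is why a plain `sudo git pull` there would create a mess. The sketch below shows roughly what such an ownership check could look like. It is not the plugin discussed in the channel: the function name `find_owned_by` and its `exclude` parameter are hypothetical, and reading "0:0" as uid 0 and gid 0 is an assumption.

```python
import os


def find_owned_by(root, uid=0, gid=0, exclude=()):
    """Return sorted paths (relative to root) owned by uid:gid.

    Hypothetical helper, not the real icinga plugin. Entries whose
    relative path is listed in `exclude` are not reported; excluded
    directories are still descended into.
    """
    excluded = set(exclude)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if rel in excluded:
                continue
            # lstat so that a symlink is judged by its own owner,
            # not by the owner of whatever it points at
            st = os.lstat(full)
            if st.st_uid == uid and st.st_gid == gid:
                hits.append(rel)
    return sorted(hits)
```

Something like `find_owned_by("/srv/mediawiki-staging", exclude=[".git/refs/remotes/readonly/master"])` would then skip the one path that, according to the log, always shows up as root-owned. Whether the real check excludes it in this way is not stated in the channel.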
[09:19:37] (03PS4) 10ArielGlenn: tiny script that retrieves config values from dump config files [dumps] - 10https://gerrit.wikimedia.org/r/301712 (https://phabricator.wikimedia.org/T141563) [09:20:06] (03PS7) 10MarcoAurelio: Closing wikimania2015wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298772 (https://phabricator.wikimedia.org/T139032) [09:20:35] (03CR) 10ArielGlenn: [C: 032] tiny script that retrieves config values from dump config files [dumps] - 10https://gerrit.wikimedia.org/r/301712 (https://phabricator.wikimedia.org/T141563) (owner: 10ArielGlenn) [09:24:52] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch, 07Easy: Improve Elasticsearch icinga alerting - https://phabricator.wikimedia.org/T133844#2514522 (10Gehel) [09:33:03] !log upload scap 3.2.1-1 to carbon T127762 [09:33:27] T127762: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762 [09:33:46] thanks stashbot, logmsgbot I'd like my receipt [09:34:02] sad_trombone.wav [09:34:23] heheh indeed [09:34:25] logmsgbot: help [09:34:55] ah no nevermind that's morebots which of course isn't here [09:35:12] (03PS9) 10Alex Monk: beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for most of text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) [09:36:28] (03CR) 10jenkins-bot: [V: 04-1] beta: Use Let's Encrypt for upload, and new self-signed SSL certificate for most of text [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [09:38:04] morebots: status [09:38:04] I am a logbot running on tools-exec-1217. [09:38:05] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [09:38:05] To log a message, type !log . 
[09:39:08] !log upload scap 3.2.1-1 to carbon T127762 (at 9:33) [09:39:09] T127762: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762 [09:39:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:42:10] !log installing chromium security updates on osmium [09:42:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:44:48] (03CR) 10Paladox: "@Muehlenhoff ok thanks, is there any chance you could do It later today please?" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [09:45:38] PROBLEM - Host wtp1002 is DOWN: PING CRITICAL - Packet loss = 100% [09:48:29] RECOVERY - Host wtp1002 is UP: PING OK - Packet loss = 0%, RTA = 2.02 ms [09:49:30] (03PS10) 10Alex Monk: beta: Use Let's Encrypt cert [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) [09:49:50] !log temporarily stop puppet and disable check_hpssacli on ms-be1023 T136631 [09:49:51] T136631: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631 [09:49:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:50:02] (03PS1) 10Alexandros Kosiaris: base: Remove unused snmp.conf.erb [puppet] - 10https://gerrit.wikimedia.org/r/302397 [09:50:17] moritzm (Muehlenhoff) hi im wondering about ^^ is there any chance of you reviewing it later today please. Since im testing an all puppet installation of gerrit and im stuck until it is merged and uploaded to apt? [09:51:04] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "A remnant of when we were using snmp in icinga puppet run checks. 
Unreferenced by anything, remove" [puppet] - 10https://gerrit.wikimedia.org/r/302397 (owner: 10Alexandros Kosiaris) [09:52:30] PROBLEM - configured eth on wtp1002 is CRITICAL: Connection refused by host [09:52:49] PROBLEM - Check size of conntrack table on wtp1002 is CRITICAL: Connection refused by host [09:52:50] PROBLEM - dhclient process on wtp1002 is CRITICAL: Connection refused by host [09:53:06] (03PS6) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [09:53:08] PROBLEM - DPKG on wtp1002 is CRITICAL: Connection refused by host [09:53:10] PROBLEM - parsoid on wtp1002 is CRITICAL: Connection refused [09:53:20] PROBLEM - Disk space on wtp1002 is CRITICAL: Connection refused by host [09:53:43] PROBLEM - parsoid disk space on wtp1002 is CRITICAL: Connection refused by host [09:53:44] PROBLEM - MegaRAID on wtp1002 is CRITICAL: Connection refused by host [09:53:58] PROBLEM - puppet last run on wtp1002 is CRITICAL: Connection refused by host [09:54:00] arh [09:54:00] PROBLEM - salt-minion processes on wtp1002 is CRITICAL: Connection refused by host [09:54:03] sorry that's me [09:54:13] why is disk space on parsoid a paging check? [09:54:22] I am wondering the same thing [09:57:28] (03PS1) 10Alex Monk: beta: Get rid of old unused upload.beta.wmflabs.org apache config [puppet] - 10https://gerrit.wikimedia.org/r/302398 (https://phabricator.wikimedia.org/T84950) [09:57:34] oh I know why [09:58:13] mark: wrong answer to the problem is the answer. https://wikitech.wikimedia.org/wiki/Incident_documentation/20140211-Parsoid [09:58:45] i see [09:59:18] PROBLEM - puppet last run on mw2091 is CRITICAL: CRITICAL: Puppet has 1 failures [09:59:31] I think I am gonna remove that ... 
makes no sense any more [10:02:21] (03PS11) 10Alex Monk: beta: Use Let's Encrypt cert [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) [10:05:01] (03CR) 10Muehlenhoff: [C: 04-1] "Two comments, otherwise looks good to merge" (032 comments) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [10:05:36] (03PS1) 10Alexandros Kosiaris: parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 [10:07:45] (03Draft2) 10MarcoAurelio: Grant permission 'managechangetags' to 'abusefilter' group on English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) [10:11:08] 06Operations, 10ops-eqiad, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2514726 (10fgiunchedi) Another difference is load average and system cpu % utilization, I've tried running `perf top` on a couple of machines ms-be1023 (gen9, jessie,... [10:12:25] (03PS1) 10Alexandros Kosiaris: Include passwords::misc::scripts when used [puppet] - 10https://gerrit.wikimedia.org/r/302403 [10:12:59] (03PS7) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [10:13:07] if anyone else wants to poke at ^ to see what's up for swift machines for trusty vs jessie, it looks related to the kernel afaict [10:14:13] (03CR) 10Alex Monk: "89 domains on deployment-cache-text04... Adds 4-5 minutes to puppet runs, and that's with verification commented out... 
So maybe it's not " [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [10:15:35] (03CR) 10Alexandros Kosiaris: [C: 032] Include passwords::misc::scripts when used [puppet] - 10https://gerrit.wikimedia.org/r/302403 (owner: 10Alexandros Kosiaris) [10:17:59] (03CR) 10ArielGlenn: "I'd like to see a flag for the maintenance script so that it runs quietly when all goes well, thus cutting down cronspam. After updating " [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [10:23:53] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [10:26:41] 06Operations, 06Commons, 10media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2514788 (10MoritzMuehlenhoff) I don't think we should use the installer, but rather download the fonts and package them in an internal deb package. This would - avoi... [10:29:04] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2514806 (10ema) p:05Triage>03Normal [10:30:11] (03PS1) 10Ladsgroup: Beta: move from ores.wikimedia.org to ores-beta.wmflabs.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302406 (https://phabricator.wikimedia.org/T141825) [10:34:21] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [10:34:43] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). 
[10:34:46] (03CR) 10Mobrovac: [C: 04-1] "LGTM overall, but I think it would be best to split this patch into 2 - one that provides the needed changes in service::node, and the oth" (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) (owner: 10Ppchelko) [10:35:32] PROBLEM - Unmerged changes on repository puppet on rhodium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [10:40:31] (03CR) 10Filippo Giunchedi: [C: 031] parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [10:41:51] (03PS1) 10Alex Monk: deployment-prep: Swap config to HTTPS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302409 (https://phabricator.wikimedia.org/T50501) [10:42:30] (03PS1) 10MarcoAurelio: Move abusefilter permissions to abusefilter.php for azbwiki. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) [10:42:38] (03CR) 10Alex Monk: [C: 032] deployment-prep: Swap config to HTTPS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302409 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [10:43:03] (03Merged) 10jenkins-bot: deployment-prep: Swap config to HTTPS [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302409 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [10:44:02] !log krenair@tin Synchronized wmf-config: https://gerrit.wikimedia.org/r/302409 - labs-only changes (duration: 00m 34s) [10:44:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:55:47] (03CR) 10Alex Monk: [C: 04-1] "10.64.31.12 doesn't seem to exist and the comment was not updated" [puppet] - 10https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: 10Dzahn) [11:14:52] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[11:15:32] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge. [11:16:22] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [11:33:33] PROBLEM - Host cp3022 is DOWN: PING CRITICAL - Packet loss = 100% [11:37:42] RECOVERY - Host cp3022 is UP: PING OK - Packet loss = 0%, RTA = 83.74 ms [11:37:43] (03PS1) 10Alexandros Kosiaris: role::labsdb::manager: Move under the role module [puppet] - 10https://gerrit.wikimedia.org/r/302414 [11:37:55] (cp3022 is me) [11:41:51] PROBLEM - Disk space on cp3022 is CRITICAL: Connection refused by host [11:41:52] (03CR) 10BBlack: beta: Use Let's Encrypt cert (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: 10Alex Monk) [11:42:11] PROBLEM - MD RAID on cp3022 is CRITICAL: Connection refused by host [11:42:23] PROBLEM - MegaRAID on cp3022 is CRITICAL: Connection refused by host [11:42:31] PROBLEM - configured eth on cp3022 is CRITICAL: Connection refused by host [11:42:39] cp3022 is a dead host too I'm pretty sure [11:42:42] PROBLEM - Check size of conntrack table on cp3022 is CRITICAL: Connection refused by host [11:42:42] PROBLEM - dhclient process on cp3022 is CRITICAL: Connection refused by host [11:42:45] they're coming back from the dead in puppet data? [11:42:55] oh it's mark :) [11:43:01] PROBLEM - puppet last run on cp3022 is CRITICAL: Connection refused by host [11:43:02] PROBLEM - salt-minion processes on cp3022 is CRITICAL: Connection refused by host [11:43:18] i suppose it should be purged from puppet [11:43:29] * mark goes to read how to do that these days :P [11:43:31] PROBLEM - DPKG on cp3022 is CRITICAL: Connection refused by host [11:43:48] well I guess once it booted back up, it registered with the puppetmaster by trying to run [11:43:52] maybe [11:44:18] i'm reinstalling it [11:44:42] thanks for your review bblack. 
I wasn't quite expecting it yet, but this is helpful :) [11:45:07] one or two things I spotted in there which I don't think are used anymore, were to support the old self-signed cert [11:45:35] akosiaris: should I purge puppet on palladium still? [11:46:21] mark: yeah, palladium wise nothings changed yet [11:48:29] !log puppet node clean & salt-key -d of cp3022.esams.wmnet [11:48:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:50:27] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2515031 (10faidon) >>! In T140439#2512257, @RobH wrote: > The H200 seems very, very much like the H300/H310. Considering that the H310 isn't good enoug... [11:51:21] (03CR) 10BBlack: [C: 031] "But be sure to use salt+varnishadm to runtime set default_ttl on cache_upload to the new value before the patch hits, just in case. (also," [puppet] - 10https://gerrit.wikimedia.org/r/302388 (owner: 10Ema) [11:53:19] 06Operations, 10Datasets-General-or-Unknown, 10netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2515046 (10Nemo_bis) This is a test of download from upload.wm.o on 217.30.184.184 (lakka.kapsi.fi). Just dumping here ``` federico@l... [11:56:49] 06Operations, 10Datasets-General-or-Unknown, 10netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2515066 (10mark) So you're currently seeing 4+ MB/s, or ~40+ Mbps from eqiad, downloading from busy servers with many many connections... 
[11:57:15] (03PS2) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/301192 [11:57:29] (03PS3) 10Hashar: Jenkins job validation (DO NOT SUBMIT) [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/301192 [12:02:18] (03PS1) 10Mark Bergsma: Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 [12:03:25] (03CR) 10jenkins-bot: [V: 04-1] Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 (owner: 10Mark Bergsma) [12:04:27] (03PS2) 10Mark Bergsma: Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 [12:04:51] (03PS1) 10Alexandros Kosiaris: role::labsdb::manager: contain variables in class [puppet] - 10https://gerrit.wikimedia.org/r/302420 [12:05:05] what, ' ' required these days? [12:05:31] (03CR) 10jenkins-bot: [V: 04-1] Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 (owner: 10Mark Bergsma) [12:05:33] (03PS3) 10Mark Bergsma: Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 [12:07:12] PROBLEM - parsoid on wtp1002 is CRITICAL: Connection refused [12:07:53] PROBLEM - salt-minion processes on wtp1002 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [12:08:03] (03CR) 10Mark Bergsma: [C: 032] Move cp3022 to standard puppet role after reinstall [puppet] - 10https://gerrit.wikimedia.org/r/302419 (owner: 10Mark Bergsma) [12:10:12] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID, decomm snapshot1002,3,4 - https://phabricator.wikimedia.org/T140439#2515095 (10ArielGlenn) @faidon What would you suggest? Performance is not critical for this host, as it's going to be a canary/testbed, but it would be... 
[12:10:23] 06Operations, 10Datasets-General-or-Unknown: reinstall snapshot1001.eqiad.wmnet with RAID - https://phabricator.wikimedia.org/T140439#2515096 (10Peachey88) [12:11:29] (03CR) 10Alexandros Kosiaris: "guess what... PCC says the previous thing never really worked!!!!" [puppet] - 10https://gerrit.wikimedia.org/r/302414 (owner: 10Alexandros Kosiaris) [12:11:32] PROBLEM - puppet last run on db2042 is CRITICAL: CRITICAL: puppet fail [12:12:07] jynus: any idea what I am fixing here ? https://gerrit.wikimedia.org/r/#/c/302414/1 ? [12:12:34] (03PS4) 10Addshore: Remove T107711 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302113 (https://phabricator.wikimedia.org/T107711) [12:12:41] cause I can't shake the feeling this thing is not really used [12:12:46] akosiaris, I have 0 context about that [12:12:59] for me skrillex is that band btw... [12:13:12] role/db.pp is not used, that I know [12:13:17] oh it is [12:13:30] role::labsdb::manager is applied to tin and mira [12:13:33] in labs, maybe [12:13:41] labsdb::manager? [12:13:47] now whether that thing is actually used anywhere... [12:14:00] yeah I have no idea how it ended in there [12:14:15] will you believe me if I tell you it is the first time I have heard about that? 
[12:14:23] yes, mostly cause it's mine too [12:14:37] I heard about skrillex, I just need to remember [12:15:04] yes, role/db.pp references wmf_db, which is deprecated [12:16:03] I am gonna merge those changes just to stop puppetmaster whining on syslog every now and then [12:16:14] but something tells me that thing should be killed [12:16:27] (03CR) 10Alexandros Kosiaris: [C: 032] role::labsdb::manager: Move under the role module [puppet] - 10https://gerrit.wikimedia.org/r/302414 (owner: 10Alexandros Kosiaris) [12:16:32] (03PS2) 10Alexandros Kosiaris: role::labsdb::manager: Move under the role module [puppet] - 10https://gerrit.wikimedia.org/r/302414 [12:16:35] (03CR) 10Alexandros Kosiaris: [V: 032] role::labsdb::manager: Move under the role module [puppet] - 10https://gerrit.wikimedia.org/r/302414 (owner: 10Alexandros Kosiaris) [12:16:51] (03PS2) 10Alexandros Kosiaris: role::labsdb::manager: contain variables in class [puppet] - 10https://gerrit.wikimedia.org/r/302420 [12:16:56] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] role::labsdb::manager: contain variables in class [puppet] - 10https://gerrit.wikimedia.org/r/302420 (owner: 10Alexandros Kosiaris) [12:17:35] and of course the code had horrible lookups that did not work! [12:18:08] that is all to be killed [12:18:13] referencing class before they were included is not a sane way to get variables from then [12:18:18] them* [12:18:25] jynus: yeah I thought so too [12:18:47] the reason I just do not do it is because it requires some small research to not affect potential users outside of production [12:19:01] I will, it is just not a priority [12:19:28] but if you are doing cleanup I will put it higher [12:19:58] to show you I am working on it, see: https://gerrit.wikimedia.org/r/#/c/301076/ [12:20:30] the problems is that all previous dbas instead of fixing puppet, just created a new set of classes on top of the previous one [12:20:34] jynus: I am not really doing a cleanup. 
It's just the puppet 3.8 spews a whole new awesomeness of messages in the syslog [12:20:40] and guess what came up :-) [12:20:54] but I am really happy you are cleaning that already [12:20:57] :-D [12:21:04] I will try to avoid that, and I think now we can delete most of that [12:21:16] for the first time since last week [12:21:43] akosiaris, please tell me the issue, and I will get it done for you [12:21:59] as in "this code has errors" [12:22:41] (Scope(Class[Role::Labsdb::Manager])) Could not look up qualified variable 'passwords::misc::scripts::mysql_labsdb_root_pass'; class passwords::misc::scripts has not been evaluated [12:23:18] but the general idea is that db or mysql roles, or wmf_mysql and coredb classes can be killed [12:23:30] :D :D :D [12:24:08] if that is what the commit is trying to do, just pass it to me and I will delete it safely this afternoon [12:24:14] so, fixes for the issues I 've noticed are in https://gerrit.wikimedia.org/r/#/c/302420/ and https://gerrit.wikimedia.org/r/#/c/302403/ [12:24:19] as you can see, it's labs [12:24:41] I should have prevented you from fixing it [12:24:42] point is, variables being referenced before the class containing them has been evaluated is a bad idea [12:24:47] and just kill it with fire [12:24:57] oh, no worries it was small enough [12:25:10] I now got a small mess with ::site and ::realm... that one is way worse [12:25:21] not sure what yeah is to be done ... :-( [12:25:29] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2515119 (10Aklapper) I appreciate this as I considered this my work so far. :P (I tried more like every other week, plus only pinging / naggin... 
[12:26:01] akosiaris, let me wait, in case it was in use by labs people [12:26:11] but I can take it from here [12:26:41] ok [12:26:57] (03PS5) 10Paladox: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) [12:27:42] (03PS6) 10Paladox: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) [12:28:43] "make skrillex.py executable by wikidev group" asher Jun 19 2013, 1:38 AM [12:28:50] (03CR) 10Subramanya Sastry: [C: 031] parsoid: Remove disk space alert (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [12:33:19] !log akosiaris@palladium conftool action : set/weight=1; selector: wtp1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [12:33:23] !log akosiaris@palladium conftool action : set/weight=1; selector: wtp1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [12:33:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:33:30] !log akosiaris@palladium conftool action : set/pooled=yes; selector: wtp1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [12:33:34] !log akosiaris@palladium conftool action : set/pooled=yes; selector: wtp1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [12:33:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:34:07] !log T135176 repool wtp100[12] with a weight of 1 instead of 15 [12:34:08] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176 [12:34:11] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:36:53] (03PS8) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [12:38:11] RECOVERY - puppet last run on db2042 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:39:11] RECOVERY - salt-minion processes on wtp1002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:40:52] RECOVERY - parsoid on wtp1002 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.021 second response time [12:44:42] (03CR) 10KartikMistry: "Looks fine to me with my limited Puppet knowledge. I would love to use contenttranslation everywhere instead of different naming: xlation," (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [12:45:01] apergos: minor comments, otherwise LGTM. [12:45:12] great [12:45:27] thanks for your comments! [12:45:29] apergos: As noted: with my limited Puppet foo knowledge. [12:45:33] sure [12:45:33] :) [12:46:24] (03CR) 10Nikerabbit: add cron job for Content Translation dumps (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [12:46:43] (03PS1) 10Jcrespo: Add a field to pt-heartbeat to monitor different datacenters [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) [12:47:55] Nikerabbit: thanks for checking. that one routine is not called in fact :-D copy-paste from else where..... [12:48:07] (03CR) 10Jcrespo: [C: 04-2] "Do not deploy; not only this has not been tested, it will break replication monitoring if it was done before a heartbeat schema change." 
[puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) (owner: 10Jcrespo) [12:52:40] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2515157 (10akosiaris) wtp100[12] have been reinstalled as jessie and repooled with a very low weight temporarily which will be increased within the day, assuming pr... [12:56:37] (03PS1) 10Jcrespo: Remove labsdb::manager [puppet] - 10https://gerrit.wikimedia.org/r/302427 [12:58:11] (03CR) 10Samtar: [C: 031] "Requires rebase, code lgtm though" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: 10MarcoAurelio) [12:58:13] akosiaris, see ^. I will talk to chase and tin users in case some are affected [13:02:44] (03PS1) 10Jcrespo: Remove db role [puppet] - 10https://gerrit.wikimedia.org/r/302429 [13:03:11] 06Operations, 13Patch-For-Review: Update firejail to 0.40 - https://phabricator.wikimedia.org/T121756#2515170 (10MoritzMuehlenhoff) [13:03:29] (03CR) 10Jcrespo: [C: 04-1] "This needs more research to delete all dependencies and cruft that is no longer needed." [puppet] - 10https://gerrit.wikimedia.org/r/302429 (owner: 10Jcrespo) [13:20:05] (03CR) 10Alexandros Kosiaris: [C: 04-1] "don't forget to remove skrillex.py as well!" 
[puppet] - 10https://gerrit.wikimedia.org/r/302427 (owner: 10Jcrespo) [13:21:42] (03PS1) 10Alexandros Kosiaris: puppetmaster: Revert all hosts pointed to it [puppet] - 10https://gerrit.wikimedia.org/r/302431 [13:22:19] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] puppetmaster: Revert all hosts pointed to it [puppet] - 10https://gerrit.wikimedia.org/r/302431 (owner: 10Alexandros Kosiaris) [13:24:01] PROBLEM - eventlogging-service-eventbus endpoints health on kafka2002 is CRITICAL: /v1/events (Produce a valid test event) is CRITICAL: Test Produce a valid test event returned the unexpected status 500 (expecting: 201) [13:26:01] RECOVERY - eventlogging-service-eventbus endpoints health on kafka2002 is OK: All endpoints are healthy [13:28:00] so this is a known problem, and I believe my downtime has expired [13:28:14] I am going to silence it in a bit [13:28:34] (and yes we are working on it and we'll fix it before upgrading kafka on eventbus eqiad :) [13:47:26] elukey: is this about the kafka py client problem ottomata is working on? [13:47:42] yessir [13:47:45] kk [13:47:46] thnx [13:48:22] (03PS9) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [13:49:01] (03CR) 10AndyRussG: [C: 032] "Cool! Yes, this config var was removed in Ice057bd33 :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301747 (owner: 10MaxSem) [13:49:30] (03PS2) 10AndyRussG: Labs: remove $wgCentralGeoScriptURL - matches prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301747 (owner: 10MaxSem) [13:54:08] (03CR) 10ArielGlenn: "I've changed the dir (and therefore the url) to the full name 'contenttranslation' as per your preference (Kartik). 
Generally the manifes" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [13:57:19] 06Operations, 10Security-Reviews, 07Surveys: Re-evaluate Limesurvey - https://phabricator.wikimedia.org/T109606#2515352 (10Nemo_bis) [13:59:25] (03CR) 10AndyRussG: [C: 04-2] "Thanks so much for the tidying!!! :D Maybe I'm missing something...?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301748 (owner: 10MaxSem) [14:01:05] (03CR) 10Alex Monk: "I think you are... The line you point to matches the one in this file." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301748 (owner: 10MaxSem) [14:05:11] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-spa-arg] - 10https://gerrit.wikimedia.org/r/295122 (https://phabricator.wikimedia.org/T124370) (owner: 10KartikMistry) [14:06:20] akosiaris: what's next with Apertium packages? I'm looking at jenkins failures, some looks geniune. [14:08:56] kart_: the genuine ones should be fixed of course. but the next step is reviewing the +2ed by jenkins and merging and uploading on apt.w.o [14:09:21] akosiaris: some are dependencies issues. [14:09:30] akosiaris: I'm looking them one-by-one. [14:09:44] yeah. [14:09:47] akosiaris: Please do :) [14:10:37] yeah the dependency issues will be fixed on their own as I upload packages on apt.w.o [14:11:43] /go lindseyanne [14:11:46] nope [14:12:15] but /go is from https://scripts.irssi.org/scripts/go.pl [14:13:23] 06Operations, 10Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2515382 (10MoritzMuehlenhoff) Some summarised comments in mostly random order :-) First of all, I don't think moving to BoringSSL/LibreSSL is doable (or even desirable). BoringSSL is essentially an... 
[14:14:50] !log resizing online labsdb1006 postgres fs to increase available disk space [14:14:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:18:26] (03PS1) 10Alexandros Kosiaris: Revert "puppetmaster: Switch all of codfw to the new server" [puppet] - 10https://gerrit.wikimedia.org/r/302433 [14:19:26] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Revert "puppetmaster: Switch all of codfw to the new server" [puppet] - 10https://gerrit.wikimedia.org/r/302433 (owner: 10Alexandros Kosiaris) [14:22:16] !log akosiaris@palladium conftool action : set/weight=8; selector: wtp1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [14:22:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:22:21] !log akosiaris@palladium conftool action : set/weight=8; selector: wtp1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [14:22:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:24:17] !log upgrading restbase staging systems to firejail 0.9.40.3 [14:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:24:50] !log setting default_ttl=604800 on cache_upload varnish backends [14:24:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:26:22] !log T135176 set weight for wtp100[12] to 8 [14:26:23] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176 [14:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:27:09] (03PS2) 10Ema: cache_upload: use default_ttl instead of ttl_fixed [puppet] - 10https://gerrit.wikimedia.org/r/302388 [14:27:33] (03CR) 10Ema: [C: 032 V: 032] cache_upload: use default_ttl instead of ttl_fixed [puppet] - 10https://gerrit.wikimedia.org/r/302388 (owner: 10Ema) [14:27:40] (03PS3) 10MarcoAurelio: Grant permission 'managechangetags' to 
'abusefilter' group on English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) [14:29:26] (03CR) 10Giuseppe Lavagetto: "I am splitting this up progressively." (033 comments) [debs/pybal] - 10https://gerrit.wikimedia.org/r/272679 (owner: 10Giuseppe Lavagetto) [14:29:38] !log upgrading restbase2003 system to firejail 0.9.40.3 [14:29:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:29:59] (03CR) 10Chad: Minor tweaks to 2.12.2 package (032 comments) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [14:30:08] (03PS7) 10Chad: Minor tweaks to 2.12.2 package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 [14:30:28] (03PS1) 10Giuseppe Lavagetto: Split IPVS Manager into the interface and manager implementation [debs/pybal] - 10https://gerrit.wikimedia.org/r/302434 [14:30:30] (03PS1) 10Giuseppe Lavagetto: Add generic Finite States Machine [debs/pybal] - 10https://gerrit.wikimedia.org/r/302435 [14:34:57] (03CR) 10AndyRussG: [C: 031] "> I think you are... The line you point to matches the one in this file." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301748 (owner: 10MaxSem) [14:35:21] PROBLEM - puppet last run on restbase2003 is CRITICAL: CRITICAL: Puppet has 1 failures [14:37:20] (03CR) 10Muehlenhoff: [C: 032] Minor tweaks to 2.12.2 package [debs/gerrit] - 10https://gerrit.wikimedia.org/r/299164 (owner: 10Chad) [14:38:10] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: Puppet last ran 6 hours ago [14:38:49] this is me, I re-enabled it but it must have not been run in the meantime [14:38:55] AndyRussG, did you get a lot of browser test failures last night? [14:39:08] I ask because if so it was probably me [14:39:42] Krenair: I love you. 
[14:40:02] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [14:40:39] (03PS1) 10Ema: cache_upload: persistent storage backend naming on v4 [puppet] - 10https://gerrit.wikimedia.org/r/302439 (https://phabricator.wikimedia.org/T131502) [14:41:37] (03PS1) 10Mark Bergsma: Add BACKPORTS=yes hook to include $DIST-backports apt source [puppet] - 10https://gerrit.wikimedia.org/r/302440 [14:41:46] akosiaris: ^ something like this? [14:41:59] (03PS2) 10MarcoAurelio: Move abusefilter permissions to abusefilter.php for azbwiki. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) [14:42:10] (03PS2) 10Ottomata: Hieraize eventlogging_kafka_handler to allow selection of different kafka clients [puppet] - 10https://gerrit.wikimedia.org/r/301126 [14:45:44] (03PS2) 10Mark Bergsma: Add BACKPORTS=yes hook to include $DIST-backports apt source [puppet] - 10https://gerrit.wikimedia.org/r/302440 [14:46:36] (03CR) 10Ottomata: [C: 032] Hieraize eventlogging_kafka_handler to allow selection of different kafka clients [puppet] - 10https://gerrit.wikimedia.org/r/301126 (owner: 10Ottomata) [14:46:51] (03PS2) 10Chad: Gerrit: Redirect plain "/r" (no trailing slash) to gerrit as well [puppet] - 10https://gerrit.wikimedia.org/r/301829 [14:46:54] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2513064 (10ema) Hi @Jdlrobson! Could you please briefly explain the reason for this access request? CC: @elukey @Ottomata @Nuria [14:47:08] 06Operations, 10Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2515438 (10BBlack) >>! In T131908#2515382, @MoritzMuehlenhoff wrote: > Some summarised comments in mostly random order :-) Thanks for the feedback! I know it's a complex annoying topic :) > First... 
[14:49:24] (03PS1) 10Andrew Bogott: Change labs_nova_network_host for labtestnet to just a hostname. [puppet] - 10https://gerrit.wikimedia.org/r/302441 [14:50:49] (03CR) 10Andrew Bogott: [C: 032] Change labs_nova_network_host for labtestnet to just a hostname. [puppet] - 10https://gerrit.wikimedia.org/r/302441 (owner: 10Andrew Bogott) [14:55:32] 06Operations, 10Ops-Access-Requests: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2412705 (10ema) Any progress on this? @dpatrick: can we see the scripts? [15:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T1500). [15:00:04] kart_ and Addshore: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:17] * kart_ is here [15:00:21] * addshore is here [15:00:47] addshore: want to run this one? [15:01:09] thcipriani: sure! :) [15:01:25] addshore: awesome. Let me know if you need anything, I'll be around. [15:02:00] thcipriani: does anything special have to be done with updating js files? [15:02:43] addshore: mostly resource loader does the right thing [15:02:48] okay! :) [15:02:57] https://wikitech.wikimedia.org/wiki/How_to_deploy_code#A_note_on_JavaScript_and_CSS [15:03:15] great! [15:03:56] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2515469 (10Ottomata) Ja, access to Hive isn't descriptive enough. What data do you need to work with? [15:04:25] RECOVERY - puppet last run on restbase2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:05:10] (03PS2) 10Jcrespo: Remove labsdb::manager [puppet] - 10https://gerrit.wikimedia.org/r/302427 [15:06:03] kart_: doing yours first :) [15:06:12] sure! 
[15:07:40] * kart_ is finding quick short article to test.. [15:07:48] :) [15:08:15] found. So, I'm ready. [15:08:24] cool! just waiting for jenkins to finish! [15:08:39] (03CR) 10Faidon Liambotis: [C: 031] parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [15:08:50] 06Operations, 06Discovery, 10netops, 03Discovery-Search-Sprint: deploy elasticsearch/plugins to relforge1001-1002 servers - https://phabricator.wikimedia.org/T141085#2515472 (10Gehel) Some notes of our discussion with @chasemp: [[ https://github.com/wikimedia/operations-puppet/blob/production/hieradata/co... [15:10:47] (03PS2) 10Faidon Liambotis: pmacct: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/301621 (owner: 10Muehlenhoff) [15:10:51] (03PS1) 10Andrew Bogott: Fix the puppet source in some file comments. [puppet] - 10https://gerrit.wikimedia.org/r/302446 [15:10:53] (03PS1) 10Andrew Bogott: Nova: Disable instance creation for all non-admins. [puppet] - 10https://gerrit.wikimedia.org/r/302447 [15:10:55] (03PS1) 10Andrew Bogott: Switch Labs from Openstack Kilo to Liberty; [puppet] - 10https://gerrit.wikimedia.org/r/302448 [15:11:06] (03CR) 10Faidon Liambotis: [C: 032] "Yup, definitely." [puppet] - 10https://gerrit.wikimedia.org/r/301621 (owner: 10Muehlenhoff) [15:11:29] andrewbogott: \o/ [15:12:46] (03PS3) 10Gehel: Maps - initial data import [puppet] - 10https://gerrit.wikimedia.org/r/300572 (https://phabricator.wikimedia.org/T138501) [15:12:55] (03PS3) 10Faidon Liambotis: network: use $all_networks in exim4 [puppet] - 10https://gerrit.wikimedia.org/r/296737 [15:13:03] (03CR) 10Faidon Liambotis: [C: 032 V: 032] network: use $all_networks in exim4 [puppet] - 10https://gerrit.wikimedia.org/r/296737 (owner: 10Faidon Liambotis) [15:13:08] kart_: it should be on mw1099 for you to check! [15:13:35] (03CR) 10Andrew Bogott: [C: 032] Fix the puppet source in some file comments. 
[puppet] - 10https://gerrit.wikimedia.org/r/302446 (owner: 10Andrew Bogott) [15:13:56] (03PS3) 10Faidon Liambotis: librenms: remove nets setting [puppet] - 10https://gerrit.wikimedia.org/r/296738 [15:14:11] (03CR) 10Faidon Liambotis: [C: 032 V: 032] librenms: remove nets setting [puppet] - 10https://gerrit.wikimedia.org/r/296738 (owner: 10Faidon Liambotis) [15:14:14] addshore: hmm. It can be tested with article publish, is that fine to test with mw1099? [15:14:30] (03PS2) 10Andrew Bogott: Fix the puppet source in some file comments. [puppet] - 10https://gerrit.wikimedia.org/r/302446 [15:14:32] (03CR) 10Faidon Liambotis: [C: 032] parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [15:14:35] (03PS2) 10Faidon Liambotis: parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [15:14:44] (03CR) 10Faidon Liambotis: [V: 032] parsoid: Remove disk space alert [puppet] - 10https://gerrit.wikimedia.org/r/302401 (owner: 10Alexandros Kosiaris) [15:15:18] kart_: anything you do on a regular server will be fine to test on mw1099! See https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [15:15:33] (03PS4) 10Faidon Liambotis: network: move external_networks to hiera as well [puppet] - 10https://gerrit.wikimedia.org/r/296739 [15:15:47] addshore: yes. Was just not sure about publish. But since CX stays on same tab, let me try. [15:15:55] addshore: give me 3-4 minutes to translate. [15:15:58] okay! 
[15:16:18] (03CR) 10Faidon Liambotis: [C: 032] network: move external_networks to hiera as well [puppet] - 10https://gerrit.wikimedia.org/r/296739 (owner: 10Faidon Liambotis) [15:16:24] PROBLEM - puppet last run on mw2240 is CRITICAL: CRITICAL: puppet fail [15:16:39] 06Operations, 06Commons, 10media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2515495 (10fgiunchedi) @MoritzMuehlenhoff a separate package hosted internally sounds good to me! should be easy to base it off the existing one [15:16:52] (03PS3) 10Andrew Bogott: Fix the puppet source in some file comments. [puppet] - 10https://gerrit.wikimedia.org/r/302446 [15:18:15] (03PS1) 10Gehel: Labs network categorization correction [puppet] - 10https://gerrit.wikimedia.org/r/302450 (https://phabricator.wikimedia.org/T141085) [15:18:21] (03PS1) 10ArielGlenn: don't use -q arg for php, we run a modern version which doesn't need it [dumps] - 10https://gerrit.wikimedia.org/r/302451 [15:20:13] (03PS4) 10Andrew Bogott: Fix the puppet source in some file comments. [puppet] - 10https://gerrit.wikimedia.org/r/302446 [15:20:22] (03CR) 10Faidon Liambotis: [C: 032] Labs network categorization correction [puppet] - 10https://gerrit.wikimedia.org/r/302450 (https://phabricator.wikimedia.org/T141085) (owner: 10Gehel) [15:20:29] (03PS2) 10Faidon Liambotis: Labs network categorization correction [puppet] - 10https://gerrit.wikimedia.org/r/302450 (https://phabricator.wikimedia.org/T141085) (owner: 10Gehel) [15:21:00] paravoid: thanks! You're fast! [15:21:18] the canonical source isn't necessarily DNS, fwiw [15:21:21] kart_: you could also test it on testwiki to avoid having to actually translate something correctly ;) [15:21:27] the network config itself is probably the canonical source [15:21:31] (whiule using mw1099)! [15:21:33] addshore: I did. Checking. 
[15:21:35] well, sometimes :) [15:21:51] (03PS1) 10Ottomata: Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) [15:21:57] addshore: looks good. Go ahead. [15:22:02] (03CR) 10Ottomata: [C: 032] 1.2.5 release [debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/302255 (owner: 10Ottomata) [15:22:03] doing! [15:22:17] paravoid: unless the network config is wrong :) In any case, that's not something I'm going to mess with without discussion with someone who knows what he is doing... [15:22:24] * kart_ will fix other things in article to make sure it is really nice article. [15:22:40] (03PS2) 10Andrew Bogott: Switch Labs from Openstack Kilo to Liberty; [puppet] - 10https://gerrit.wikimedia.org/r/302448 [15:22:42] (03PS2) 10Andrew Bogott: Nova: Disable instance creation for all non-admins. [puppet] - 10https://gerrit.wikimedia.org/r/302447 [15:22:44] (03PS5) 10Andrew Bogott: Fix the puppet source in some file comments. [puppet] - 10https://gerrit.wikimedia.org/r/302446 [15:22:50] yeah :) [15:22:51] !log addshore@tin Synchronized php-1.28.0-wmf.12/extensions/ContentTranslation/modules/tools/ext.cx.tools.link.js: SWAT: [[gerrit:302413|Fix: Target links has source link titles]] (duration: 00m 34s) [15:22:52] (03CR) 10jenkins-bot: [V: 04-1] Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) (owner: 10Ottomata) [15:22:53] kart_: ^^ it's everywhere [15:22:54] if in doubt, ask :) [15:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:23:03] cool. Thanks! 
addshore [15:23:05] Hi i am getting this error network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 [15:23:07] in labs [15:23:13] i didnt get it yesturday [15:23:23] (03CR) 10Addshore: [C: 032] Remove T107711 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302113 (https://phabricator.wikimedia.org/T107711) (owner: 10Addshore) [15:23:31] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 on node gerrit-test3.git.eqiad.wmflabs [15:23:33] oops [15:23:51] (03Merged) 10jenkins-bot: Remove T107711 debug logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302113 (https://phabricator.wikimedia.org/T107711) (owner: 10Addshore) [15:24:05] oh [15:24:05] probably related to https://gerrit.wikimedia.org/r/296739 [15:24:16] Oh yeh [15:24:25] (03PS2) 10Ottomata: Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) [15:24:31] (03CR) 10Rush: [C: 031] "tested in labtest and announced via labs-announce" [puppet] - 10https://gerrit.wikimedia.org/r/302448 (owner: 10Andrew Bogott) [15:25:03] (03PS8) 10Filippo Giunchedi: puppetization for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606) [15:25:05] (03PS3) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [15:25:10] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:302113|Remove T107711 debug logging]] (duration: 00m 30s) [15:25:11] T107711: [Bug] Catchable fatal error: Argument 1 passed to StatementGroupRendererFactory::newLanguageAwareRenderer() must be an instance of 
Language, StubUserLang given - https://phabricator.wikimedia.org/T107711 [15:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:25:37] (03CR) 10jenkins-bot: [V: 04-1] Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) (owner: 10Ottomata) [15:26:11] (03CR) 10Filippo Giunchedi: [C: 032] puppetization for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [15:26:29] (03CR) 10Filippo Giunchedi: [V: 032] puppetization for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [15:26:36] (03PS9) 10Filippo Giunchedi: puppetization for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606) [15:26:39] (03CR) 10Filippo Giunchedi: [V: 032] puppetization for thumbor [puppet] - 10https://gerrit.wikimedia.org/r/300827 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [15:27:49] (03PS3) 10Ottomata: Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) [15:28:10] paravoid hi, could we revert that patch please, or are you working on a fix? [15:29:14] I'm not sure what the fix would be? [15:29:23] (03PS1) 10Muehlenhoff: Fix malformed changelog entry [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302455 [15:29:24] how are the rest of network hiera stuff handled? 
[15:29:50] I'm not sure, I just ran sudo puppet agent -tv and it brought that error up [15:29:53] !log addshore@tin Synchronized php-1.28.0-wmf.12/extensions/RevisionSlider/modules/ext.RevisionSlider.init.js: SWAT: [[gerrit:302421|Track the load times of RevisionSlider]] (duration: 00m 26s) [15:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:30:11] it looks like puppet may be broken across the whole of labs [15:30:13] They are also handled through their own hieradata pages, for example https://wikitech.wikimedia.org/wiki/Hiera:Git [15:30:16] Yeh [15:30:18] thcipriani: SWAT all done :) [15:30:26] (03CR) 10Muehlenhoff: [C: 032] Fix malformed changelog entry [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302455 (owner: 10Muehlenhoff) [15:30:36] Krenair it is https://phabricator.wikimedia.org/rOPUP1f831b25c3187878d0b111ff65cd2c977a45b9b5 [15:30:40] which broke puppet [15:30:58] addshore: awesome! nicely done, sir :) [15:31:04] I know. [15:31:08] ok [15:31:09] (03CR) 10Ema: [C: 032] cache_upload: persistent storage backend naming on v4 [puppet] - 10https://gerrit.wikimedia.org/r/302439 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [15:31:12] thcipriani: why thank you! :) [15:31:20] (03PS2) 10Ema: cache_upload: persistent storage backend naming on v4 [puppet] - 10https://gerrit.wikimedia.org/r/302439 (https://phabricator.wikimedia.org/T131502) [15:31:23] (03CR) 10Ema: [V: 032] cache_upload: persistent storage backend naming on v4 [puppet] - 10https://gerrit.wikimedia.org/r/302439 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [15:31:36] I'm guessing it needs to be reverted then?
[15:32:35] Krenair ^^ paravoid [15:32:47] !log performing schema change on heartbeat.heartbeat on all core databases [15:32:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:33:06] maybe not the whole of labs [15:33:13] the change is correct by itself, it just needs a corresponding Labs change (or something) [15:33:18] Oh [15:33:32] If a corresponding labs change needs to be made it should've been part of the commit [15:33:50] (03PS4) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [15:33:52] (03PS1) 10Filippo Giunchedi: site: add thumbor100[12] [puppet] - 10https://gerrit.wikimedia.org/r/302457 (https://phabricator.wikimedia.org/T139606) [15:34:04] that change wouldn't be on the same repository [15:34:10] so I don't see how I could do that [15:34:23] (03CR) 10Ottomata: [C: 032] Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) (owner: 10Ottomata) [15:34:27] (03PS4) 10Ottomata: Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) [15:34:30] where's the Labs equivalent of hieradata/common/network.yaml? [15:35:04] do we really have an entirely separate distinct Hiera tree for Labs? 
[15:35:06] I don't know but I'm not sure it goes to a different repository [15:36:04] andrewbogott might know [15:38:52] or chasemp [15:40:23] RECOVERY - puppet last run on relforge1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:41:40] (03CR) 10MaxSem: [C: 031] "Or ${expire_dir}/*" [puppet] - 10https://gerrit.wikimedia.org/r/302392 (owner: 10Gehel) [15:41:54] paravoid: the labs subnets are in hieradata/common/network.yaml [15:42:03] PROBLEM - puppet last run on cp4001 is CRITICAL: CRITICAL: Puppet has 1 failures [15:42:12] there isn't really anything more specific [15:42:17] as I understand it ^ there is no separation at this level [15:42:22] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis, 13Patch-For-Review: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2515585 (10HJiang-WMF) ssh-add -L lists exactly the key listed in the previous comment "Bad owner or permissions“ on my path_to_ssh... [15:42:44] (03PS2) 10Filippo Giunchedi: site: add thumbor100[12] [puppet] - 10https://gerrit.wikimedia.org/r/302457 (https://phabricator.wikimedia.org/T139606) [15:42:46] (03PS5) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [15:42:48] (03PS1) 10Filippo Giunchedi: thumbor: reference swift::params::account_keys [puppet] - 10https://gerrit.wikimedia.org/r/302459 [15:42:52] andrewbogott, is that file applied by labs instances though? [15:43:06] 06Operations, 10Datasets-General-or-Unknown, 10netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2515591 (10Nemo_bis) Yes, but of course Helsinki is a very happy place. Also, upload.wm.o is much faster than dumps.wm.o, which AFAICT... 
[15:43:07] This is breaking puppet on deployment-cache-*, deployment-tin, and I think others [15:43:27] Krenair: It might be :/ What's the error message? [15:43:35] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 on node deployment-cache-upload04.deployment-prep.eqiad.wmflabs [15:44:03] RECOVERY - puppet last run on cp4001 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [15:44:07] Im getting that error too [15:44:33] RECOVERY - puppet last run on mw2240 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [15:45:00] it's a product of the roles or config on the VM, not a labs wide thing afaiu [15:45:22] some roles use or have a hiera lookup that may fail if soemthing changes out from underneath it [15:45:24] !log upgraded firejail on restbase* to 0.9.40.3 [15:45:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:45:29] but I don't see tools-bastion-03.tools.eqiad.wmflabs failing for instance [15:46:00] But [15:46:00] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 on node gerrit-test3.git.eqiad.wmflabs [15:46:06] is which i only set up yesturday [15:46:10] Puppet runs just fine on my canary instance. So, indeed, this issue is specific to particular project/instance config [15:46:34] tools-elastic-01 hits this [15:46:43] (03CR) 10Zppix: [C: 031] Nova: Disable instance creation for all non-admins. 
[puppet] - 10https://gerrit.wikimedia.org/r/302447 (owner: 10Andrew Bogott) [15:47:01] !log restbase rolling restart for firejail upgrade to 0.9.40 [15:47:02] and it also has ferm rules [15:47:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:47:12] my guess is this is related to ferm processing off the top of my head [15:47:24] at least in some cases [15:49:12] (03PS3) 10Filippo Giunchedi: site: add thumbor100[12] [puppet] - 10https://gerrit.wikimedia.org/r/302457 (https://phabricator.wikimedia.org/T139606) [15:49:14] (03PS2) 10Filippo Giunchedi: thumbor: reference swift::params::account_keys [puppet] - 10https://gerrit.wikimedia.org/r/302459 [15:49:16] (03PS6) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [15:49:33] andrewbogott, I think it may be related to specific roles/classes [15:49:44] yeah [15:49:46] must be [15:49:50] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] thumbor: reference swift::params::account_keys [puppet] - 10https://gerrit.wikimedia.org/r/302459 (owner: 10Filippo Giunchedi) [15:49:52] probably anything using ferm rules [15:49:56] (03PS3) 10Filippo Giunchedi: thumbor: reference swift::params::account_keys [puppet] - 10https://gerrit.wikimedia.org/r/302459 [15:49:59] (03CR) 10Filippo Giunchedi: [V: 032] thumbor: reference swift::params::account_keys [puppet] - 10https://gerrit.wikimedia.org/r/302459 (owner: 10Filippo Giunchedi) [15:50:04] either way unless we can fix this it needs a labs-wide revert [15:50:50] this may be a product of a typo...idk [15:50:51] yeh [15:50:53] hand on a sec [15:50:55] hang on even [15:51:31] 06Operations, 10Datasets-General-or-Unknown, 10netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2515609 (10BBlack) upload is very different than dumps. 
It's part of our traffic-optimize cache cluster termination, and dumps isn't.... [15:51:58] ok [15:52:21] (03PS1) 10Rush: labs: quote 'common' for labs puppetmaster hiera [puppet] - 10https://gerrit.wikimedia.org/r/302461 [15:52:39] having only thought about this for 2s and looking at the error I'm wondering if ^ [15:53:15] (03CR) 10Paladox: [C: 031] labs: quote 'common' for labs puppetmaster hiera [puppet] - 10https://gerrit.wikimedia.org/r/302461 (owner: 10Rush) [15:53:16] because that part of the tree should be the same but is ref'd in production.hiera.yaml as - "common" [15:53:24] paravoid: about? ^ [15:53:24] (03CR) 10Andrew Bogott: [C: 032] "Harmless, at worst." [puppet] - 10https://gerrit.wikimedia.org/r/302461 (owner: 10Rush) [15:53:24] greg-g: so the labs upgrade window is starting soon, and i also see a services window during the labs window [15:53:45] robh: I coordinated with the services people, it's all good [15:53:48] awesome [15:53:57] sorry to bug you about it! =] [15:54:00] ah yes [15:54:13] i just assumed all things deployment calendar ping greg first =] [15:54:58] (03PS2) 10Jcrespo: Add a field to pt-heartbeat to monitor different datacenters [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) [15:55:04] Krenair: any better? [15:55:11] chasemp: oh my! [15:55:17] (03PS5) 10Ottomata: Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) [15:55:19] (03CR) 10Ottomata: [V: 032] Hieraize kafka handler for eventlogging analytics, select appropriate auto_offset_reset [puppet] - 10https://gerrit.wikimedia.org/r/302453 (https://phabricator.wikimedia.org/T133779) (owner: 10Ottomata) [15:55:25] Nope still happends [15:55:31] gwicke: subbu cscott_away any of you planning to do a deploy during this morning's services window? 
the labs upgrade will be on-going at that time (impacting wikitech as well). probably best to delay if you can. [15:55:50] greg-g, no parsoid deploy plans. [15:56:07] (03CR) 10Jcrespo: "This has been provisionally tested and the schema change applied to all shards." [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) (owner: 10Jcrespo) [15:56:25] andrewbogott nope, still happens [15:56:29] still errors with puppet [15:56:33] same error message [15:56:34] ok [15:57:05] so, paravoid, what do you think? Revert for now? (I'm about to disable CI so… not long to resolve this) [15:57:18] it hasn't hit labcontrol yet [15:57:18] that master is a minute behind [15:57:28] I'm about to run into a meeting [15:57:30] so still being broken isn't surprising but actually this should be part of the client config [15:57:35] if that doesn't fix it, feel free to revert [15:57:40] andrewbogott: give me a minute here [15:57:46] paravoid: ok, thanks [15:57:48] chasemp: makes sense, waiting [15:57:48] I don't know yet if that is or is not the fix [15:58:03] it's not a very important commit, we can always reintroduce it later [16:00:04] andrewbogott: Dear anthropoid, the time has come. Please deploy Labs Openstack Upgrade (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T1600). [16:00:24] (03PS4) 10Filippo Giunchedi: site: add thumbor100[12] [puppet] - 10https://gerrit.wikimedia.org/r/302457 (https://phabricator.wikimedia.org/T139606) [16:00:29] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] site: add thumbor100[12] [puppet] - 10https://gerrit.wikimedia.org/r/302457 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [16:01:11] chasemp: ?
[16:01:28] I'm looking for a host that was broken as is off the main labs master [16:01:53] the state of tools and some on the tools master (which I think is behind) is confusing things [16:02:00] (03PS1) 10Chad: Gerrit: Remove lets_encrypt variable, this is always true now [puppet] - 10https://gerrit.wikimedia.org/r/302462 [16:02:10] (03PS2) 10BBlack: ciphersuites: drop mid-level dhe+aes256 options [puppet] - 10https://gerrit.wikimedia.org/r/302378 [16:02:10] paladox: what's teh host things are broken on for you? [16:02:11] paladox had such an instance I think [16:02:26] gerrit-test3.git.eqiad.wmflabs [16:02:30] Yeh [16:02:39] ok trying [16:02:39] i only setup that yesturday [16:02:42] thanks [16:02:51] nope all be damned [16:02:57] andrewbogott: still broken and I'm not sure why [16:03:00] Oh [16:03:05] so I guess revert and we'll have to circle the wagons [16:03:17] chasemp: yeah. Anyone have a link to the patch in question? [16:03:25] * andrewbogott came in in the middle [16:03:30] https://gerrit.wikimedia.org/r/#/c/296739/ [16:03:34] I believe [16:03:44] https://gerrit.wikimedia.org/r/296739 [16:04:21] it makes no sense that should work to my mind [16:04:26] (03CR) 10BBlack: [C: 032] ciphersuites: drop mid-level dhe+aes256 options [puppet] - 10https://gerrit.wikimedia.org/r/302378 (owner: 10BBlack) [16:04:28] (03PS1) 10Andrew Bogott: Revert "network: move external_networks to hiera as well" [puppet] - 10https://gerrit.wikimedia.org/r/302463 [16:04:32] (03CR) 10Luke081515: [C: 031] "See my comment at the task, but if not, this is a full +1." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: 10MarcoAurelio) [16:04:36] (03PS2) 10Andrew Bogott: Revert "network: move external_networks to hiera as well" [puppet] - 10https://gerrit.wikimedia.org/r/302463 [16:05:27] it seems to work now [16:05:34] andrewbogott chasemp ^^ [16:05:39] that is too qiuck for it to be the revert [16:05:43] Oh yeh [16:05:45] I didn't merge the revert [16:05:46] seems to fail now [16:05:53] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:05:59] bah [16:05:59] :) [16:06:06] Oh wait it applied a change and failed with the same error [16:06:07] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find data item network::external in any Hiera data file and no default supplied at /etc/puppet/modules/network/manifests/constants.pp:5 on node gerrit-test3.git.eqiad.wmflabs [16:06:07] Warning: Not using cache on failed catalog [16:06:07] Error: Could not retrieve catalog; skipping run [16:06:08] (03CR) 10Andrew Bogott: [C: 032] Revert "network: move external_networks to hiera as well" [puppet] - 10https://gerrit.wikimedia.org/r/302463 (owner: 10Andrew Bogott) [16:06:51] I notice modules/network/spec/fixtures/hiera.yaml contains :hierarchy: - common [16:07:06] Krenair: on labcontrol or? [16:07:13] and references modules/network/spec/fixtures/hieradata/common.yaml [16:07:19] hm [16:07:20] chasemp, I don't have access to labcontrol remember? 
[16:07:42] ok, reverted [16:07:44] RECOVERY - puppet last run on thumbor1002 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [16:07:52] I'm looking at deployment-puppetmaster's current copy of the repo [16:07:57] ah [16:08:12] that file is where network::subnets lives [16:08:23] PROBLEM - puppet last run on thumbor1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:23] I still think this is an errant something w/ how the tree is defined and that typo should have been it [16:08:36] but we can run a few lookup tests I guess [16:08:39] paladox, chasemp, better? [16:08:50] which is the existing data that modules/network/manifests/constants.pp pulls from hiera [16:08:52] Yeh [16:08:52] (03CR) 10Andrew Bogott: [C: 032] Nova: Disable instance creation for all non-admins. [puppet] - 10https://gerrit.wikimedia.org/r/302447 (owner: 10Andrew Bogott) [16:09:03] (03PS3) 10Andrew Bogott: Nova: Disable instance creation for all non-admins. [puppet] - 10https://gerrit.wikimedia.org/r/302447 [16:09:05] root@gerrit-test3:/home/paladox# sudo puppet agent -tv [16:09:05] Notice: Run of Puppet configuration client already in progress; skipping (/var/lib/puppet/state/agent_catalog_run.lock exists) [16:09:13] (03PS3) 10Andrew Bogott: Switch Labs from Openstack Kilo to Liberty; [puppet] - 10https://gerrit.wikimedia.org/r/302448 [16:09:15] andrewbogott thanks for fixing the problem and chasemp too [16:09:19] different errors anyway andrewbogott :) [16:09:32] seems like that hit [16:10:25] * andrewbogott waits for CI to verify the patch that will break CI [16:10:51] andrewbogott what does the new update to labs bring?
[16:11:18] paladox: nothing much that you'll notice :) [16:11:26] Ok [16:11:48] (03CR) 10Filippo Giunchedi: [C: 031] Increase permissions validity on RESTBase cluster [puppet] - 10https://gerrit.wikimedia.org/r/301878 (https://phabricator.wikimedia.org/T140869) (owner: 10Eevans) [16:11:53] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: puppet fail [16:11:58] andrewbogott i see it brings the new application catalog [16:12:02] https://www.openstack.org/software/liberty/ [16:12:03] PROBLEM - puppet last run on mc1005 is CRITICAL: CRITICAL: puppet fail [16:12:13] PROBLEM - puppet last run on es1012 is CRITICAL: CRITICAL: puppet fail [16:12:23] RECOVERY - puppet last run on thumbor1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:12:33] PROBLEM - puppet last run on db1019 is CRITICAL: CRITICAL: puppet fail [16:12:36] (03PS3) 10Mark Bergsma: Add BACKPORTS=yes hook to include $DIST-backports apt source [puppet] - 10https://gerrit.wikimedia.org/r/302440 [16:13:15] (03CR) 10Zppix: [C: 04-1] "Read phab comment" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: 10MarcoAurelio) [16:13:23] PROBLEM - puppet last run on aqs1006 is CRITICAL: CRITICAL: puppet fail [16:13:55] bblack: if you spare some time today, lvs for thumbor https://gerrit.wikimedia.org/r/#/c/300244 fairly straightforward but just in case I missed something [16:14:12] (03CR) 10Mark Bergsma: [C: 032] Add BACKPORTS=yes hook to include $DIST-backports apt source [puppet] - 10https://gerrit.wikimedia.org/r/302440 (owner: 10Mark Bergsma) [16:16:58] !log disabling puppet on all lab* hosts for staged upgrade [16:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:17:28] (03PS4) 10Andrew Bogott: Switch Labs from Openstack Kilo to Liberty; [puppet] - 10https://gerrit.wikimedia.org/r/302448 [16:17:37] (03CR) 10Zppix: [C: 031] Grant permission 'managechangetags' to 
'abusefilter' group on English Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: 10MarcoAurelio) [16:17:44] (03CR) 10Andrew Bogott: [C: 032 V: 032] Switch Labs from Openstack Kilo to Liberty; [puppet] - 10https://gerrit.wikimedia.org/r/302448 (owner: 10Andrew Bogott) [16:18:01] godog: should probably add the LVS IPs to the instances, too [16:19:09] as in like this from role::cache::text: [16:19:10] class { 'lvs::realserver': [16:19:10] realserver_ips => $lvs::configuration::service_ips['text'][$::site] [16:19:13] } [16:19:26] but in role::thumbor::mediawiki [16:19:41] andrewbogott with the new labs update it brings application catalog, will that work? [16:19:56] paladox: I'm working on the upgrade, no real time to chat now [16:20:01] ok [16:20:03] sorry [16:20:05] paladox: in the middle of the upgrade is a bad time to talk :) [16:20:16] ok [16:20:37] (03CR) 10BBlack: [C: 031] "+1 -ish, but in this patch or a followup, the actual thumbor100x nodes need something like:" [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [16:21:01] bblack: indeed, thanks I'm adding it [16:22:45] 07Blocked-on-Operations, 06Operations, 10Cassandra: Update Cassandra in Wikimedia APT repository - https://phabricator.wikimedia.org/T140409#2515748 (10Eevans) [16:23:36] !log downtime for lab* prod hosts for libert upgrade for 3 hours [16:24:36] 06Operations, 10Datasets-General-or-Unknown, 10netops: dumps.wikimedia.org seems to have poor throughput towards some destinations - https://phabricator.wikimedia.org/T120425#2515751 (10BBlack) Oh, also if you're in Helsinki, hopefully you're hitting the upload.wm.o terminator in esams, which is close to you... 
[16:27:07] 07Blocked-on-Operations, 06Operations, 10Cassandra: change graphite aggregation function for cassandra 'count' metrics - https://phabricator.wikimedia.org/T121789#2515756 (10Eevans) [16:28:42] (03PS7) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [16:29:20] !log uploaded gerrit 2.12.2-wmf1 to apt.wikimedia.org [16:30:32] (03CR) 10BBlack: [C: 031] lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [16:30:51] moritzm it dosent seem to have logged it for you [16:30:59] neither for chasemp [16:31:00] probaly do to the labs update. [16:31:15] oh [16:31:17] yeah, probably the labs upgrade, I'll repeat later the evening [16:31:23] ok thanks [16:33:23] (03PS1) 10Mark Bergsma: Introduce $upstream_mirror parameter [puppet] - 10https://gerrit.wikimedia.org/r/302464 [16:35:38] (03CR) 10Mark Bergsma: [C: 032 V: 031] Introduce $upstream_mirror parameter [puppet] - 10https://gerrit.wikimedia.org/r/302464 (owner: 10Mark Bergsma) [16:35:44] (03CR) 10Mark Bergsma: [V: 032] Introduce $upstream_mirror parameter [puppet] - 10https://gerrit.wikimedia.org/r/302464 (owner: 10Mark Bergsma) [16:38:40] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [16:39:20] RECOVERY - puppet last run on mc1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:39:39] RECOVERY - puppet last run on aqs1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:40:09] RECOVERY - puppet last run on es1012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:40:50] RECOVERY - puppet last run on db1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:41:37] (03PS4) 10Ppchelko: service::node - support sampled logging [puppet] - 
10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) [16:42:53] (03PS8) 10Filippo Giunchedi: lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) [16:44:08] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] lvs: add thumbor to lvs [puppet] - 10https://gerrit.wikimedia.org/r/300244 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [16:44:27] bblack: thanks! [16:46:40] PROBLEM - Puppet catalogue fetch on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/labs-puppetmaster/eqiad - 185 bytes in 0.205 second response time [16:46:41] 06Operations, 06Release-Engineering-Team, 15User-greg, 07Wikimedia-Incident: Institute a weekly review of all UBN! tasks - https://phabricator.wikimedia.org/T141130#2515799 (10greg) >>! In T141130#2515119, @Aklapper wrote: > I appreciate this as I considered this my work so far. :P (I tried more like every... 
[16:47:23] ^expected [16:48:37] andrewbogott: I added tools-checker for same downtime as it will page and we expect turbulence so we'll have to keep an eye on it [16:48:37] (03CR) 10Ppchelko: "Puppet compiler: https://github.com/wikimedia/operations-puppet/blob/production/modules/restbase/templates/config.yaml.erb#L904-L916" [puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) (owner: 10Ppchelko) [16:49:29] (03PS1) 10Filippo Giunchedi: lvs: add thumbor port [puppet] - 10https://gerrit.wikimedia.org/r/302465 [16:50:07] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] lvs: add thumbor port [puppet] - 10https://gerrit.wikimedia.org/r/302465 (owner: 10Filippo Giunchedi) [16:50:44] bblack: ^ of course I forgot something heh [16:51:06] (03CR) 10Ppchelko: "> Puppet compiler: https://github.com/wikimedia/operations-puppet/blob/production/modules/restbase/templates/config.yaml.erb#L904-L916" [puppet] - 10https://gerrit.wikimedia.org/r/302309 (https://phabricator.wikimedia.org/T139674) (owner: 10Ppchelko) [16:51:08] (03PS1) 10Mark Bergsma: Hardcode $dist-backports components [puppet] - 10https://gerrit.wikimedia.org/r/302466 [16:51:47] (03CR) 10Mark Bergsma: [C: 032 V: 031] Hardcode $dist-backports components [puppet] - 10https://gerrit.wikimedia.org/r/302466 (owner: 10Mark Bergsma) [16:51:49] (03CR) 10Mark Bergsma: [V: 032] Hardcode $dist-backports components [puppet] - 10https://gerrit.wikimedia.org/r/302466 (owner: 10Mark Bergsma) [16:52:19] (03PS2) 10Mark Bergsma: Hardcode $dist-backports components [puppet] - 10https://gerrit.wikimedia.org/r/302466 [16:52:46] (03CR) 10Mark Bergsma: [V: 032] Hardcode $dist-backports components [puppet] - 10https://gerrit.wikimedia.org/r/302466 (owner: 10Mark Bergsma) [16:55:05] (03PS1) 10Filippo Giunchedi: thumbor: include lvs::configuration [puppet] - 10https://gerrit.wikimedia.org/r/302467 [16:55:10] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: puppet fail [16:56:02] (03CR) 
10Filippo Giunchedi: [C: 032 V: 032] thumbor: include lvs::configuration [puppet] - 10https://gerrit.wikimedia.org/r/302467 (owner: 10Filippo Giunchedi) [16:56:34] 06Operations, 06Commons, 10media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2515829 (10kaldari) That sounds good to me if someone's up to building a new package. [16:56:50] PROBLEM - puppet last run on thumbor1001 is CRITICAL: CRITICAL: puppet fail [16:59:09] RECOVERY - puppet last run on thumbor1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:59:20] RECOVERY - puppet last run on thumbor1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:59:53] 06Operations, 10Traffic, 13Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2515835 (10ema) >>! In T131502#2512853, @ema wrote: > 1) User1 requests range 1-10 of a 1GB object. We want to make sure that varnish fetches the > whole thing, and sends those 10... [17:00:04] yurik, gwicke, cscott, arlolra, and subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T1700). [17:00:27] anyone absolutely need to deploy? [17:00:31] if not, let's skip [17:01:12] (03PS1) 10Jcrespo: Add datacenter to lag checks [puppet] - 10https://gerrit.wikimedia.org/r/302469 [17:01:29] !log starting branch-cut for 1.28.0-wmf.13 [17:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:02:20] (03CR) 10Jcrespo: [C: 04-2] "Unfinished, needs icinga changes." 
[puppet] - 10https://gerrit.wikimedia.org/r/302469 (owner: 10Jcrespo) [17:05:54] 06Operations, 10hardware-requests: Eqiad: procure 4 servers for kubernetes - https://phabricator.wikimedia.org/T141624#2515849 (10RobH) We happen to have the spares to do this; except they don't meet the memory requirement. The request is for 4 servers, but the internal IP address requirement doesn't state if... [17:07:49] RECOVERY - Puppet catalogue fetch on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.497 second response time [17:08:55] (03CR) 10ArielGlenn: [C: 032] don't use -q arg for php, we run a modern version which doesn't need it [dumps] - 10https://gerrit.wikimedia.org/r/302451 (owner: 10ArielGlenn) [17:12:03] (03PS1) 10Filippo Giunchedi: introduce thumbor-admins group [puppet] - 10https://gerrit.wikimedia.org/r/302471 (https://phabricator.wikimedia.org/T139606) [17:14:34] 06Operations, 10hardware-requests: eqiad: (4) spare pool servers for kubernetes - https://phabricator.wikimedia.org/T141624#2515909 (10RobH) 05Open>03stalled p:05Triage>03Normal [17:14:35] (03PS3) 10Jcrespo: Add a field to pt-heartbeat to monitor different datacenters [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) [17:19:00] 06Operations, 10RESTBase, 06Services, 13Patch-For-Review, 15User-mobrovac: RESTBase shutting down spontaneously - https://phabricator.wikimedia.org/T136957#2515944 (10MoritzMuehlenhoff) The restbase cluster has been upgraded to firejail 0.9.40.3 which addresses the problem with the incorrect exit code ha... 
[17:19:30] (03PS4) 10Jcrespo: Add a field to pt-heartbeat to monitor different datacenters [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/302426 (https://phabricator.wikimedia.org/T114752) [17:25:50] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [17:26:43] (03PS1) 10Mark Bergsma: Pin $dist-backports at priority 500 [puppet] - 10https://gerrit.wikimedia.org/r/302472 [17:27:21] (03PS2) 10Mark Bergsma: Pin $dist-backports at priority 500 [puppet] - 10https://gerrit.wikimedia.org/r/302472 [17:29:07] 06Operations, 06Release-Engineering-Team, 06Services, 07Wikimedia-Incident: Review new service 'pre-deployment to production' checklist - https://phabricator.wikimedia.org/T141897#2515987 (10greg) [17:29:21] (03CR) 10Mark Bergsma: [C: 032 V: 031] Pin $dist-backports at priority 500 [puppet] - 10https://gerrit.wikimedia.org/r/302472 (owner: 10Mark Bergsma) [17:29:24] (03CR) 10Mark Bergsma: [V: 032] Pin $dist-backports at priority 500 [puppet] - 10https://gerrit.wikimedia.org/r/302472 (owner: 10Mark Bergsma) [17:29:46] (03PS10) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [17:32:00] 06Operations, 10DBA, 13Patch-For-Review: Puppetize pt-heartbeat on MariaDB10 masters and its corresponding checks on the several monitoring backends - https://phabricator.wikimedia.org/T114752#1705156 (10jcrespo) This is progressing by adding a datacenter field. The pending scope of this ticket may be chang... 
[17:33:47] 06Operations, 10ops-eqiad, 10netops: cr2-eqiad temperature alerts ("system warm") - https://phabricator.wikimedia.org/T141898#2516015 (10faidon) [17:34:43] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [17:35:26] (03PS3) 10MaxSem: Labs: remove $wgCentralGeoScriptURL - matches prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301747 [17:35:28] (03PS2) 10MaxSem: Labs: remove wgCentralAuthEnableUserMerge - matches the default [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301745 [17:35:31] (03PS2) 10MaxSem: Labs: remove MobileApp inclusion, duplicates prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301744 [17:35:32] (03PS2) 10MaxSem: Labs: remove experimental $wgGadgetsCaching override [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301749 [17:35:34] (03PS2) 10MaxSem: Labs: remove $wgCentralDBname - matches prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301748 [17:35:36] (03PS2) 10MaxSem: Labs: remove duplicate $wgFlowParsoidURL assignment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301743 [17:35:38] (03PS1) 10MaxSem: Labs: remove wgJobLogFile [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302474 [17:35:48] (03PS1) 10Mark Bergsma: C/p error [puppet] - 10https://gerrit.wikimedia.org/r/302475 [17:36:38] (03CR) 10Mark Bergsma: [C: 032 V: 031] C/p error [puppet] - 10https://gerrit.wikimedia.org/r/302475 (owner: 10Mark Bergsma) [17:36:44] (03CR) 10Mark Bergsma: [V: 032] C/p error [puppet] - 10https://gerrit.wikimedia.org/r/302475 (owner: 10Mark Bergsma) [17:39:11] PROBLEM - LVS HTTP IPv4 on thumbor.svc.codfw.wmnet is CRITICAL: Connection refused [17:39:33] PROBLEM - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is CRITICAL: Connection refused [17:39:33] oops, apologies that's me [17:39:41] (paged too just fyi) [17:39:45] MaxSem, thanks for doing all of these [17:40:29] chasemp: yup thanks, I didn't realize it would eventually pop 
a paging alert [17:40:34] Krenair, thank YOU [17:40:56] not having https was a huge thorn in everybody's sides [17:41:34] (03PS1) 10RobH: Revert "robh on vacation, removing from paging" [puppet] - 10https://gerrit.wikimedia.org/r/302476 [17:42:46] well, i forgot to put myself back into paging so that was a nice reminder ;D [17:44:07] (03PS2) 10RobH: Revert "robh on vacation, removing from paging" [puppet] - 10https://gerrit.wikimedia.org/r/302476 [17:44:37] 06Operations, 06Discovery, 10netops, 03Discovery-Search-Sprint: deploy elasticsearch/plugins to relforge1001-1002 servers - https://phabricator.wikimedia.org/T141085#2516056 (10Gehel) 05Open>03Resolved Correcting network categorization solved the plugin deployment issue. Closing this task. [17:45:09] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis, 13Patch-For-Review: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2516073 (10Dzahn) >>! In T140659#2515585, @HJiang-WMF wrote: > "Bad owner or permissions“ on my path_to_ssh_config file if I don't su... [17:47:43] (03PS2) 10Chad: Gerrit: Set cache.projects.loadOnStartup = true [puppet] - 10https://gerrit.wikimedia.org/r/301898 (https://phabricator.wikimedia.org/T141065) [17:48:23] (03PS1) 10Andrew Bogott: Designate: Rename quota settings [puppet] - 10https://gerrit.wikimedia.org/r/302477 [17:51:23] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [17:51:31] yes, thanks Krenair re: https in beta! [17:53:33] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [17:54:12] ok, how can one disable all the hotkeys for gerrit? [17:54:28] robh: that will be fixed in the next version upgrade. son [17:54:29] soon [17:55:43] paladox: ^ rememeber the ticket? 
[17:56:11] Yep [17:56:29] https://phabricator.wikimedia.org/T141245 [17:56:37] https://phabricator.wikimedia.org/T141297 [17:56:55] * robh is too paranoid to not test a change so wont be paged until the queue is processed. [17:57:56] robh: ^ those links paladox pasted. the last one is the actual bug [17:58:08] re: the hotkeys [17:59:47] \o/ [18:01:59] (03PS2) 10Andrew Bogott: Designate: Rename/update quota settings [puppet] - 10https://gerrit.wikimedia.org/r/302477 [18:03:05] (03CR) 10Andrew Bogott: [C: 032 V: 032] Designate: Rename/update quota settings [puppet] - 10https://gerrit.wikimedia.org/r/302477 (owner: 10Andrew Bogott) [18:19:13] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 657 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4809659 keys - replication_delay is 657 [18:19:56] !log akosiaris@palladium conftool action : set/weight=15; selector: wtp1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [18:20:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:20:00] !log akosiaris@palladium conftool action : set/weight=15; selector: wtp1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=parsoid', 'service=parsoid']) [18:20:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:21:22] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4792057 keys - replication_delay is 0 [18:21:41] !log T135176 set weight for wtp100[12] to 15 [18:21:42] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176 [18:21:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:27:29] (03PS1) 10Paladox: Gerrit: Support footer prefix Task: for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/302482 (https://phabricator.wikimedia.org/T91001) [18:27:56] (03PS2) 10Paladox: Gerrit: Support footer prefix 
Task: for phabricator [puppet] - 10https://gerrit.wikimedia.org/r/302482 (https://phabricator.wikimedia.org/T91001) [18:31:07] Is it just me or is Javascript failing on various projects? [18:31:20] sjoerddebruin: in what way? what are you seeing? [18:31:51] Well, the Javascript elements are not working. [18:32:21] My personal scripts and gadgets aren't loading and I get the Javascript fallbacks for most stuff [18:33:31] Hm, resaving preferences helped. Weird. [18:33:52] doesn't sound like an -operations issue, at least [18:33:54] (03PS5) 10Ottomata: Confluent MirrorMaker puppetization [puppet] - 10https://gerrit.wikimedia.org/r/300879 (https://phabricator.wikimedia.org/T134184) [18:35:01] (03CR) 10Ottomata: "You are right about all of these! I changed to use source_zookeeper_url when I was close to done this patch, I just forgot to change it i" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/300879 (https://phabricator.wikimedia.org/T134184) (owner: 10Ottomata) [18:35:35] Well, it could always be a side-effect of a deploy. [18:36:12] sjoerddebruin: on wikidata/ [18:36:13] ? [18:36:25] Yeah and I've had it on Commons too. [18:36:44] hmmm [18:36:50] my gadgets appear to work [18:37:03] is there any js error in the console? [18:38:05] thcipriani: are you doing the train today? [18:38:13] aude: yup [18:38:31] 06Operations, 10MediaWiki-Configuration: Content namespace definition remains in InitialiseSettings.php after namespace has been deleted - https://phabricator.wikimedia.org/T141906#2516269 (10Neil_P._Quinn_WMF) [18:38:50] thcipriani: ok [18:39:11] i'd like to update the "wikidata" extension and can submit a patch to bump the submodule [18:40:08] aude: kk, I'll start the deploy in 20 mins, I can get your submodule update in pre-deploy. [18:40:25] ok [18:40:32] * aude just waiting for jenkins [18:40:38] (03PS1) 10Andrew Bogott: Revert "Nova: Disable instance creation for all non-admins." 
[puppet] - 10https://gerrit.wikimedia.org/r/302484 [18:40:44] (03PS2) 10Andrew Bogott: Revert "Nova: Disable instance creation for all non-admins." [puppet] - 10https://gerrit.wikimedia.org/r/302484 [18:41:10] oh, right. hopefully ^ means it'll be back shortly. [18:41:26] er, nodepool jobs at least [18:46:27] (03CR) 10Andrew Bogott: [C: 032 V: 032] Revert "Nova: Disable instance creation for all non-admins." [puppet] - 10https://gerrit.wikimedia.org/r/302484 (owner: 10Andrew Bogott) [18:49:37] jenkins is slow and busy.... [18:50:39] yeah, nodepool instance creation was off for the openstack upgrade. Looks like it's coming back now... [18:51:00] It is back [18:51:01] now [18:51:07] according to andrewbogott in -releng [18:51:13] (03PS1) 10Andrew Bogott: no-op test patch [puppet] - 10https://gerrit.wikimedia.org/r/302487 [18:51:27] (03PS1) 10Madhuvishy: [WIP] labstore: Configure drbd for a HA labstore setup [puppet] - 10https://gerrit.wikimedia.org/r/302488 [18:53:15] (03PS2) 10Madhuvishy: [WIP] labstore: Configure drbd for a HA labstore setup [puppet] - 10https://gerrit.wikimedia.org/r/302488 [18:59:05] (03PS1) 10Dzahn: gerrit: ensure symlink /etc/default/gerritcodereview [puppet] - 10https://gerrit.wikimedia.org/r/302491 (https://phabricator.wikimedia.org/T141803) [18:59:55] (03CR) 10Paladox: [C: 031] "yes please." [puppet] - 10https://gerrit.wikimedia.org/r/302491 (https://phabricator.wikimedia.org/T141803) (owner: 10Dzahn) [19:00:04] thcipriani: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T1900). Please do the needful. [19:01:42] hrm [19:02:14] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 6 failures [19:03:22] aude: seems like it may take a while for CI to catch up to the backlog. gate-and-submit will clear up before running through the test queue. Fine to bump wikidata later? Or is that going to cause problems? 
[19:03:44] (03CR) 10Chad: [C: 04-1] "Or we just fix the package to install it to the right location :)" [puppet] - 10https://gerrit.wikimedia.org/r/302491 (https://phabricator.wikimedia.org/T141803) (owner: 10Dzahn) [19:03:52] later is okay, though it will need scap again [19:04:19] it's been 32 minutes, just for the initial jobs (and 2/3 done) [19:07:22] got a guess on when ci wil be caught up? [19:07:27] (03CR) 10Chad: [C: 031] gerrit: ensure symlink /etc/default/gerritcodereview [puppet] - 10https://gerrit.wikimedia.org/r/302491 (https://phabricator.wikimedia.org/T141803) (owner: 10Dzahn) [19:07:42] apergos: not 100% sure yet [19:08:03] wondering whether i shoudl check back in in a half hour, an hour, .... [19:08:33] looks like maybe an hour from what au de says [19:08:38] seems possibly it's stuck [19:08:42] hrm [19:08:51] why do you think it's stuck? [19:09:02] https://integration.wikimedia.org/ci/ shows jobs running on the nodepool instances [19:09:02] https://www.amazon.com/gp/product/B0010DZZVK/ [19:09:05] ah sorry [19:09:09] :) [19:09:09] :) [19:09:09] https://integration.wikimedia.org/zuul/ [19:09:19] it's not moving for a while for wikidata [19:09:42] not sure if you're aware that there was a labs downtime which means (most) CI downtime [19:09:46] it's catching back up now [19:09:52] it was down for ~2 hours [19:11:07] everything seems to be moving accordingly [19:11:15] yeah [19:11:18] okey dokey [19:11:20] can be patient [19:11:41] fine time for icecream, just discovered I have an opened container in the freezer, been there for 9 months at least! 
[19:13:22] (03PS4) 10Gehel: Maps - initial data import [puppet] - 10https://gerrit.wikimedia.org/r/300572 (https://phabricator.wikimedia.org/T138501) [19:15:48] * aude gets something to eat...back soon [19:17:44] PROBLEM - puppet last run on wtp2020 is CRITICAL: CRITICAL: puppet fail [19:18:40] (03PS1) 10Thcipriani: Group0 to 1.28.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302496 [19:25:33] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:14] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [19:28:29] The graph to watch for CI wait time to come back to normal: https://grafana-admin.wikimedia.org/dashboard/snapshot/JaY5XiTConXGaAV0WArJmpbp8lbxgoxT?panelId=18&fullscreen [19:29:05] oh, you'll need to adjust the timespan, that one is fixed and won't update to "now" [19:29:12] greg-g: does it queue and then eventually try to gorge itself on jobs when it can? [19:29:29] chasemp: yep, see https://integration.wikimedia.org/zuul/ [19:30:13] and you can see the actual nodepool instances doing things https://integration.wikimedia.org/ci/ (the ones named "ci-[jessie|trusty]-wikimedia-blah" [19:31:34] !log thcipriani@tin Started scap: testwiki to php-1.28.0-wmf.13 and rebuild l10n cache [19:31:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:37:46] I finally got a +2 out of mine but of course it needs rebasing now :-/ [19:39:12] (03PS1) 10Chad: Gerrit: Do a reindex on a fresh install, less surprises [puppet] - 10https://gerrit.wikimedia.org/r/302497 [19:39:13] shouldn't be a whole lot longer now, looking at zuul [19:40:14] (03CR) 10jenkins-bot: [V: 04-1] [WIP] labstore: Configure drbd for a HA labstore setup [puppet] - 10https://gerrit.wikimedia.org/r/302488 (owner: 10Madhuvishy) [19:41:05] gah [19:41:25] "do a reindeer on a fresh install" <-- really should I be looking at gerrit when I misread stuff 
like that? [19:41:32] (03CR) 10Paladox: [C: 031] "We should deffintly try this." [puppet] - 10https://gerrit.wikimedia.org/r/302497 (owner: 10Chad) [19:41:42] also, fewer* [19:41:48] (not less) [19:42:39] yeah I'm waiting for the last of the old jobs to get done before I resubmit [19:43:54] RECOVERY - puppet last run on wtp2020 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [19:44:11] * aude back [19:44:32] (03PS1) 10Chad: gerrit (2.12.2-wmf.2) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302498 [19:45:41] (03CR) 10Paladox: [C: 031] gerrit (2.12.2-wmf.2) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/302498 (owner: 10Chad) [19:45:43] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok. [19:47:31] (03PS11) 10ArielGlenn: add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) [19:48:03] apergos: zuul looks caught up, fwiw [19:48:11] already rebased [19:48:17] watching it run right now [19:49:32] (03Abandoned) 10Andrew Bogott: no-op test patch [puppet] - 10https://gerrit.wikimedia.org/r/302487 (owner: 10Andrew Bogott) [19:51:53] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [19:51:55] (03PS3) 10RobH: Revert "robh on vacation, removing from paging" [puppet] - 10https://gerrit.wikimedia.org/r/302476 [19:54:34] 06Operations, 10Ops-Access-Requests, 10LDAP-Access-Requests, 06Release-Engineering-Team, and 3 others: Determine a core set or a checklist of permissions for deployment purpose - https://phabricator.wikimedia.org/T140270#2458739 (10dpatrick) Because there are a number of Phabricator tickets in the Security... 
[19:54:53] (03CR) 10ArielGlenn: [C: 032] add cron job for Content Translation dumps [puppet] - 10https://gerrit.wikimedia.org/r/301773 (https://phabricator.wikimedia.org/T127793) (owner: 10ArielGlenn) [19:55:11] (03CR) 10RobH: [C: 032] Revert "robh on vacation, removing from paging" [puppet] - 10https://gerrit.wikimedia.org/r/302476 (owner: 10RobH) [19:58:13] (03Abandoned) 10RobH: Revert "robh on vacation, removing from paging" [puppet] - 10https://gerrit.wikimedia.org/r/302476 (owner: 10RobH) [20:00:57] thcipriani: https://gerrit.wikimedia.org/r/#/c/302500/ [20:01:37] (03PS1) 10RobH: robh back into paging group [puppet] - 10https://gerrit.wikimedia.org/r/302501 [20:02:07] (03CR) 10RobH: [C: 032] robh back into paging group [puppet] - 10https://gerrit.wikimedia.org/r/302501 (owner: 10RobH) [20:02:15] (03PS1) 10ArielGlenn: run the content translation dump job once a week [puppet] - 10https://gerrit.wikimedia.org/r/302503 [20:02:36] welcome back to sweet sweet paging robh [20:02:50] =P [20:03:15] jouncebot: next [20:03:15] In 2 hour(s) and 56 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T2300) [20:03:23] (03CR) 10Dzahn: [C: 032] scap: Allow configuration of the master rsync server in wmflabs [puppet] - 10https://gerrit.wikimedia.org/r/301408 (owner: 10BryanDavis) [20:03:30] (03PS1) 10Chad: Gerrit: Make heapLimit configurable per host as well [puppet] - 10https://gerrit.wikimedia.org/r/302504 [20:03:36] aude: kk, this scap is almost done, I'll merge that and re-run once finished. [20:04:22] ok thanks [20:04:23] (03CR) 10ArielGlenn: [C: 032] run the content translation dump job once a week [puppet] - 10https://gerrit.wikimedia.org/r/302503 (owner: 10ArielGlenn) [20:04:30] (03PS2) 10ArielGlenn: run the content translation dump job once a week [puppet] - 10https://gerrit.wikimedia.org/r/302503 [20:05:38] is the lack of trending messages in logstash a known thing? Is it on my end somehow? 
Couldn't find a task, made this one: https://phabricator.wikimedia.org/T141919 [20:05:45] (03CR) 10Dzahn: "would have merged, but there are dependencies" [puppet] - 10https://gerrit.wikimedia.org/r/301408 (owner: 10BryanDavis) [20:06:02] (03CR) 10ArielGlenn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/302503 (owner: 10ArielGlenn) [20:06:12] 06Operations, 10Ops-Access-Requests, 10LDAP-Access-Requests, 06Release-Engineering-Team, and 3 others: Determine a core set or a checklist of permissions for deployment purpose - https://phabricator.wikimedia.org/T140270#2458739 (10MaxSem) Cluster access means much greater degree of trust, and by running a... [20:06:22] are all "related changes" in gerrit also "parent changes"? [20:06:31] or is there another kind of relation [20:06:42] where do you actually see the parent now [20:07:06] thcipriani: I think I may have jsut fixed it [20:07:48] bd808: amazing timing :) [20:08:13] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [20:08:32] nice, working for me. [20:10:59] why did that ACK expire. was it fixed? [20:11:46] ACKNOWLEDGEMENT - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging daniel_zahn . [20:15:31] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516667 (10RobH) Ok, this just shifted the quantity from 6 to 12 (previously mentioned) along with another 3 in staging. I want to clarify, this is for 6 (o... [20:15:52] gwicke: ^ not sure on how many and where these need to be? 
[20:16:07] i have back pricing for all options and i'm in the process of summarizing and escalating to you guys to review [20:17:14] !log thcipriani@tin Finished scap: testwiki to php-1.28.0-wmf.13 and rebuild l10n cache (duration: 45m 40s) [20:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:19:37] (03CR) 10BryanDavis: "> would have merged, but there are dependencies" [puppet] - 10https://gerrit.wikimedia.org/r/301408 (owner: 10BryanDavis) [20:23:39] (03PS3) 10Dzahn: scap: Allow configuration of the master rsync server in wmflabs [puppet] - 10https://gerrit.wikimedia.org/r/301408 (owner: 10BryanDavis) [20:24:11] (03PS4) 10Dzahn: scap: Allow configuration of the master rsync server in wmflabs [puppet] - 10https://gerrit.wikimedia.org/r/301408 (owner: 10BryanDavis) [20:36:54] !log thcipriani@tin Started scap: testwiki to php-1.28.0-wmf.13 and rebuild l10n cache with Wikidata [20:36:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:37:03] ^ aude fyi [20:38:16] (03PS1) 10ArielGlenn: add some missing directories to the datasets dir manifest [puppet] - 10https://gerrit.wikimedia.org/r/302547 [20:45:44] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok. 
[20:46:48] (03PS2) 10ArielGlenn: add some missing directories to the datasets dir manifest [puppet] - 10https://gerrit.wikimedia.org/r/302547 [20:49:57] (03CR) 10ArielGlenn: [C: 032] add some missing directories to the datasets dir manifest [puppet] - 10https://gerrit.wikimedia.org/r/302547 (owner: 10ArielGlenn) [20:51:15] (03PS1) 10Chad: Logstash: Enable log4j provider [puppet] - 10https://gerrit.wikimedia.org/r/302601 [20:51:59] (03PS2) 10Chad: Logstash: Enable log4j provider [puppet] - 10https://gerrit.wikimedia.org/r/302601 [20:53:45] (03CR) 10Paladox: [C: 031] Gerrit: Make heapLimit configurable per host as well [puppet] - 10https://gerrit.wikimedia.org/r/302504 (owner: 10Chad) [20:57:23] (03PS1) 10ArielGlenn: move management of pagetitles and mediatitles dirs to datasets manifest [puppet] - 10https://gerrit.wikimedia.org/r/302603 [20:59:12] (03CR) 10ArielGlenn: [C: 032] move management of pagetitles and mediatitles dirs to datasets manifest [puppet] - 10https://gerrit.wikimedia.org/r/302603 (owner: 10ArielGlenn) [20:59:41] thcipriani: thanks [21:01:08] !log thcipriani@tin Finished scap: testwiki to php-1.28.0-wmf.13 and rebuild l10n cache with Wikidata (duration: 24m 13s) [21:01:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:01:40] (03CR) 10Thcipriani: [C: 032] Group0 to 1.28.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302496 (owner: 10Thcipriani) [21:02:09] (03CR) 10Alex Monk: [C: 04-1] "Just found an issue - project_main_zone_ids[project] can fail for projects without a $project.wmflabs.org domain" [puppet] - 10https://gerrit.wikimedia.org/r/300331 (https://phabricator.wikimedia.org/T104521) (owner: 10Alex Monk) [21:02:11] (03Merged) 10jenkins-bot: Group0 to 1.28.0-wmf.13 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/302496 (owner: 10Thcipriani) [21:03:06] jouncebot: next [21:03:07] In 1 hour(s) and 56 minute(s): Evening SWAT (Max 8 patches) 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T2300) [21:03:21] merging 3 gerrit config changes now [21:03:32] then applying them at once. so just 1 restart to minimize user impact [21:03:46] (03PS7) 10Dzahn: Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [21:03:55] well I just wrapped up my work for the day I think [21:03:59] midnight so that's about right [21:04:05] I can log in again tomorrow :-P [21:04:06] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.13 [21:04:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:04:20] Oh your two hours a head of me apergos [21:04:21] lol [21:04:40] yep it'smidnight-04 here [21:04:46] (03CR) 10Dzahn: [C: 032] Gerrit: Avoid breaking full phabricator URLs [puppet] - 10https://gerrit.wikimedia.org/r/302129 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [21:04:46] oh [21:04:47] mutante: you should probably wait for thcipriani to finish the train [21:04:50] it is 10pm here [21:05:12] bd808: jouncebot claimed it's about 2 hours until the next deploy [21:05:17] waits [21:05:26] mutante: sorry, running a bit behind. [21:05:46] ok, np, i'll wait [21:06:06] PROBLEM - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is CRITICAL: Improperly owned (0:0) files in /srv/mediawiki-staging [21:06:21] mutante: just finished up. error logs look fine. All clear :) [21:06:24] icinga-wm: i keep telling you about that,, why you forget [21:06:31] thcipriani: oh, that was quick:) thanks [21:06:49] :D [21:07:14] (03PS5) 10Dzahn: Rely on commits name instead of branch [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [21:07:41] (03CR) 10Paladox: "This does not need merging. Since it dosent work yet." 
[puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [21:07:44] paladox: wait..ugh ^ that one says it requires another one [21:07:47] Yep [21:07:49] in phabricator [21:07:51] and it also doesn't work [21:07:55] yep [21:08:03] just in time.. ok [21:08:08] yep [21:08:09] and ok [21:08:29] well then.. i should: [21:08:38] (03CR) 10Dzahn: [C: 031] "14:13 < paladox> and it also doesn't work" [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [21:08:44] (03CR) 10Dzahn: [C: 04-1] Rely on commits name instead of branch [puppet] - 10https://gerrit.wikimedia.org/r/301849 (owner: 10Paladox) [21:08:51] Ok thanks [21:08:58] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516791 (10GWicke) @Robh: The current staging nodes are in eqiad, and so far the assumption has been that we would replace the old existing hardware. [21:09:08] (03PS3) 10Dzahn: Gerrit: Set cache.projects.loadOnStartup = true [puppet] - 10https://gerrit.wikimedia.org/r/301898 (https://phabricator.wikimedia.org/T141065) (owner: 10Chad) [21:09:35] (03CR) 10Dzahn: [C: 032] Gerrit: Set cache.projects.loadOnStartup = true [puppet] - 10https://gerrit.wikimedia.org/r/301898 (https://phabricator.wikimedia.org/T141065) (owner: 10Chad) [21:09:36] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516792 (10RobH) Ok, so 6 or 12 nodes (depending on final pricing) for codfw and eqiad each, plus an additional 3 nodes in eqiad. Thanks! [21:11:21] (03PS5) 10Paladox: Gerrit: Support having phab commits as links [puppet] - 10https://gerrit.wikimedia.org/r/302229 (https://phabricator.wikimedia.org/T76459) [21:12:18] (03CR) 10Alex Monk: "Also can't handle this *.base.wikitextexp.wmflabs.org. 
IN A 208.80.155.182" [puppet] - https://gerrit.wikimedia.org/r/300331 (https://phabricator.wikimedia.org/T104521) (owner: Alex Monk) [21:15:00] Operations, Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2516800 (Jdlrobson) Reading web team has done some performance changes that should reduce bytes shipped on Japanese Wikipedia. I need to run some analysis on page views on Jap... [21:15:22] (PS1) Krinkle: noc: Update outdated symlink to favicon.ico [mediawiki-config] - https://gerrit.wikimedia.org/r/302607 [21:15:29] (PS2) Dzahn: Gerrit: Double size of conflicts cache [puppet] - https://gerrit.wikimedia.org/r/301894 (owner: Chad) [21:15:43] (CR) Dzahn: [C: 2] Gerrit: Double size of conflicts cache [puppet] - https://gerrit.wikimedia.org/r/301894 (owner: Chad) [21:18:28] (PS2) Krinkle: noc: Update outdated symlink to favicon.ico [mediawiki-config] - https://gerrit.wikimedia.org/r/302607 [21:18:59] (PS3) Krinkle: noc: Update outdated symlink to favicon.ico [mediawiki-config] - https://gerrit.wikimedia.org/r/302607 (https://phabricator.wikimedia.org/T107430) [21:19:53] Operations, Commons, Wikimedia-SVG-rendering: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2516828 (Amitie_10g) I ran a bot and got a (huge) list of files SVGs > 10 MB (more than 8000 ones). I'll post the list soon, so, I don't have the time to see eve... [21:20:17] gerrit deploy now...
[21:20:35] !log gerrit is restarting to apply config changes: 301898 (warm cache, faster startup) 301894 (double size of conflicts cache) 302129 (avoid breaking full phabricator urls) [21:20:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:21:39] done [21:21:51] paladox: you can confirm the phab urls [21:21:55] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master). [21:22:11] Operations, Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2516829 (BBlack) [21:22:19] Yay it works [21:22:20] https://gerrit.wikimedia.org/r/#/c/302129/ [21:22:23] mutante ^^ [21:22:26] great [21:23:05] Operations, Cassandra, Services, hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516833 (Eevans) >>! In T139961#2516792, @RobH wrote: > Ok, so 6 or 12 nodes (depending on final pricing) for codfw and eqiad each, plus an additional 3 no... [21:23:45] Operations, Cassandra, Services, hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516834 (RobH) So only half of each to each DC, plus the staging in eqiad. [21:25:06] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Puppet has 1 failures [21:27:10] Operations, Cassandra, Services, hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2516838 (Eevans) Yes, cluster expansions have to be in multiples of 6 (we have to add them to each rack, and there are 3 racks in each DC). So the minimum...
[21:28:34] Operations, Ops-Access-Requests: Requesting access to stat1002/stat1004 for Jdlrobson - https://phabricator.wikimedia.org/T141811#2516851 (Ottomata) If you want webrequest access logs, then you need to be in the `analytics-privatedata-users` group. [21:35:54] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on tin is OK: Files ownership is ok. [21:36:14] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok. [21:36:17] Operations, Ops-Access-Requests: Platonides access to #mediawiki_security - https://phabricator.wikimedia.org/T140288#2516861 (RobH) Open>Resolved a:RobH signed, so granted. [21:36:39] Operations, Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2516864 (BBlack) The history around the Windows updates involved is confusing (it seems they released a bad patch, then stopped offering it as an update, then started offering a replaceme... [21:41:52] !log krinkle@tin Synchronized docroot/noc: Update favicon.ico symlink (duration: 00m 34s) [21:41:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:42:05] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [21:47:35] Operations, Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2516938 (BBlack) I take that back, I've found a way to reason about the latter part. There are supposed to be commas on the end when counting bytes. Counting them like that, the 1024-bo...
[21:50:55] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [21:55:46] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24 [21:57:36] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [22:01:10] Operations, Traffic: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2517011 (BBlack) `KB3163018` is a better search term. That's the big cumulative update to Win10 that included these things (among others). Some links related to it that are relevant: h... [22:12:18] mutante: was grrrit-wm restarted? [22:12:47] Operations, Cassandra, Services, hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2517211 (GWicke) To follow up on the DC question: The staging nodes could also be placed in codfw. The existing staging nodes in either DC won't have enoug... [22:13:01] legoktm i doint think so [22:13:02] ? [22:23:15] (Abandoned) Yuvipanda: WIP replacement of modify-ldap-groups [puppet] - https://gerrit.wikimedia.org/r/301058 (owner: Yuvipanda) [22:23:24] paladox ^ works? [22:23:31] Oh [22:23:56] yuvipanda it seems to only work on your own patches now [22:24:16] yuvipanda i tryed https://gerrit.wikimedia.org/r/#/c/301763/ [22:24:18] shall I revert your change?
[22:24:23] Yes please [22:24:47] actually paladox [22:24:48] > error: prefix=weber.freenode.net, server=weber.freenode.net, command=err_cannotsendtochan, rawCommand=404, commandType=error, args=[grrrit-wm, #wikimedia-releng, Cannot send to channel] [22:24:54] Oh [22:25:05] I wonder why it carnt [22:25:24] (PS2) Alex Monk: beta: Get rid of old unused upload.beta.wmflabs.org apache config [puppet] - https://gerrit.wikimedia.org/r/302398 (https://phabricator.wikimedia.org/T84950) [22:25:43] (CR) Yuvipanda: [C: 2 V: 2] beta: Get rid of old unused upload.beta.wmflabs.org apache config [puppet] - https://gerrit.wikimedia.org/r/302398 (https://phabricator.wikimedia.org/T84950) (owner: Alex Monk) [22:25:48] (PS3) Chad: Logstash: Enable log4j provider [puppet] - https://gerrit.wikimedia.org/r/302601 [22:25:49] paladox ^ [22:25:59] oh [22:26:13] it seems it says chad for rebasing now [22:26:19] even though it was me [22:27:47] (CR) Paladox: "Testing bot" [puppet] - https://gerrit.wikimedia.org/r/302601 (owner: Chad) [22:28:03] yuvipanda i think i may know a fix [22:28:07] for it saying wrong user again [22:28:11] i think i may try message.author.name [22:28:17] which is what CR uses [22:29:11] paladox I can add you to the tool if yo want and you can debug this yourself? [22:29:12] yuvipanda https://gerrit.wikimedia.org/r/#/c/302617/ please [22:29:21] Oh yes please? [22:29:44] paladox sure. gimme a moment. I can then guide you through normal operations and you can then test it.
[22:29:48] Ok [22:29:50] thanks [22:29:55] moment [22:30:07] ok [22:30:29] (PS12) Alex Monk: Puppetise script to manage labs floating IP PTR records [puppet] - https://gerrit.wikimedia.org/r/300331 (https://phabricator.wikimedia.org/T104521) [22:35:00] (PS5) Alex Monk: Delegate 208.80.155.128/25 (labs instances) PTR records to labs-ns* so they can be managed automatically [dns] - https://gerrit.wikimedia.org/r/299513 (https://phabricator.wikimedia.org/T104521) [22:37:12] Krenair: Shall we remove bits.beta? [22:37:24] If you think it can go, let's. [22:37:37] https://github.com/search?q=org:wikimedia+-repo:wikimedia/operations-debs-hhvm+-repo:wikimedia/mediawiki-debian+-repo:wikimedia/wikimedia-fundraising-civicrm-buildkit-vendor+-repo:wikimedia/wikimedia-fundraising-crm-civicrm+%22bits.beta%22&ref=opensearch&type=Code [22:42:14] So the first result can go if the domain is going [22:42:29] Thing is Varnish actually uses that bits_domain variable [22:42:44] And I don't know about that portal result or that eventlogging result [22:43:23] Varnish uses it for... Checking against incoming requests' hosts. We could let it default to prod's one I suppose [22:46:09] Krinkle, any idea about eventlogging/portals? [22:46:18] Krenair: Fixed portals [22:46:19] fixing el now [22:46:23] ok [22:47:32] (CR) Paladox: "recheck" [puppet] - https://gerrit.wikimedia.org/r/302601 (owner: Chad) [22:53:22] (PS7) Chad: Logstash: Enable log4j provider [puppet] - https://gerrit.wikimedia.org/r/302601 [22:53:26] (CR) Alex Monk: "Thanks for the review Brandon!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: Alex Monk) [22:58:04] sorry about grrrit-wm [23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160802T2300).
[23:02:14] (PS4) Dzahn: tcpircbot: allow connections from terbium, wasat [puppet] - https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) [23:02:30] Hi RoanKattouw. There is still https://gerrit.wikimedia.org/r/#/c/302377/ to deploy. [23:02:52] (CR) Dzahn: "fixed IP of terbium. terbium.eqiad.wmnet has address 10.64.32.13" [puppet] - https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:03:51] (CR) Alex Monk: [C: 1] tcpircbot: allow connections from terbium, wasat [puppet] - https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:04:53] (CR) Dzahn: [C: 2] tcpircbot: allow connections from terbium, wasat [puppet] - https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:05:06] (PS5) Dzahn: tcpircbot: allow connections from terbium, wasat [puppet] - https://gerrit.wikimedia.org/r/302366 (https://phabricator.wikimedia.org/T141619) [23:05:34] (CR) Alex Monk: [C: 1] tcpircbot: allow connections from terbium and wasat [puppet] - https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:06:29] (PS1) Dereckson: Reset ar.wikipedia content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/302619 (https://phabricator.wikimedia.org/T141906) [23:08:39] (PS5) Dzahn: tcpircbot: allow connections from terbium and wasat [puppet] - https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) [23:08:58] Operations, Labs, Labs-Infrastructure: python-designateclient package version does not match between labtestweb2001 and silver - https://phabricator.wikimedia.org/T134543#2268964 (AlexMonk-WMF) Update: @Andrew did this during the upgrade of Labs from Kilo to Liberty.
[23:10:25] (CR) Dzahn: [C: 2] "can go now after https://gerrit.wikimedia.org/r/#/c/302375/ double checked IPs" [puppet] - https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:11:56] (CR) Dzahn: "i wonder if we should add "rhodium". it's the new puppetmaster" [puppet] - https://gerrit.wikimedia.org/r/302375 (https://phabricator.wikimedia.org/T141619) (owner: Dzahn) [23:12:26] (CR) Dereckson: [C: 1] Move abusefilter permissions to abusefilter.php for azbwiki. [mediawiki-config] - https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) (owner: MarcoAurelio) [23:15:54] (CR) Dereckson: "GitHub truncates titles at 72 characters and best practices recommend 50 max." [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: MarcoAurelio) [23:18:07] (PS4) MarcoAurelio: Allow abuse filter editors group to edit tags on en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) [23:19:12] (CR) Dereckson: [C: 1] Allow abuse filter editors group to edit tags on en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: MarcoAurelio) [23:20:32] (PS1) Dzahn: tcpircbot: add rhodium to allowed hosts [puppet] - https://gerrit.wikimedia.org/r/302621 [23:21:31] * Dereckson adds 3 config patches to the SWAT (302619, 302410, 302400) [23:23:53] ostriches: around?
[23:25:01] (PS5) Dereckson: Allow abuse filter editors group to edit tags on en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: MarcoAurelio) [23:27:06] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: MarcoAurelio) [23:27:34] (Merged) jenkins-bot: Allow abuse filter editors group to edit tags on en.wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/302400 (https://phabricator.wikimedia.org/T141847) (owner: MarcoAurelio) [23:28:26] 302400 live on mw1099 [23:28:46] Works fine. [23:29:31] !log dereckson@tin Synchronized wmf-config/abusefilter.php: Allow abuse filter editors group to edit tags on en.wikipedia (T141847) (duration: 00m 32s) [23:29:32] T141847: Add permission 'managechangetags' to 'abusefilter' group on English Wikipedia - https://phabricator.wikimedia.org/T141847 [23:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:29:41] Works fine in prod too. [23:30:08] (PS3) Dereckson: Move abusefilter permissions to abusefilter.php for azbwiki. [mediawiki-config] - https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) (owner: MarcoAurelio) [23:30:15] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) (owner: MarcoAurelio) [23:30:41] (Merged) jenkins-bot: Move abusefilter permissions to abusefilter.php for azbwiki. [mediawiki-config] - https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) (owner: MarcoAurelio) [23:31:22] (CR) Dereckson: "Best practices suggest to avoid the final dot in commit message title."
[mediawiki-config] - https://gerrit.wikimedia.org/r/302410 (https://phabricator.wikimedia.org/T141860) (owner: MarcoAurelio) [23:32:01] 302410 live on mw1099 [23:34:41] works fine [23:36:12] (PS1) Dzahn: add IPv6 AAAA and reverse for palladium [dns] - https://gerrit.wikimedia.org/r/302624 [23:36:26] (CR) jenkins-bot: [V: -1] add IPv6 AAAA and reverse for palladium [dns] - https://gerrit.wikimedia.org/r/302624 (owner: Dzahn) [23:38:42] !log dereckson@tin Synchronized wmf-config/abusefilter.php: Move abusefilter permissions to abusefilter.php for azbwiki (T141860, 1/2) (duration: 00m 28s) [23:38:43] T141860: Move azbwiki abusefilter configuration from InitialiseSettings.php to abusefilter.php - https://phabricator.wikimedia.org/T141860 [23:38:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:39:05] (PS2) Dzahn: add IPv6 AAAA and reverse for palladium [dns] - https://gerrit.wikimedia.org/r/302624 [23:39:29] (CR) Dzahn: "cool that jenkins-bot catches it" [dns] - https://gerrit.wikimedia.org/r/302624 (owner: Dzahn) [23:39:31] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Move abusefilter permissions to abusefilter.php for azbwiki (T141860, 2/2) (duration: 00m 27s) [23:39:32] T141860: Move azbwiki abusefilter configuration from InitialiseSettings.php to abusefilter.php - https://phabricator.wikimedia.org/T141860 [23:39:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:40:41] Still working file in prod [23:40:43] fine [23:41:23] (PS2) Dereckson: Reset ar.wikipedia content namespaces [mediawiki-config] - https://gerrit.wikimedia.org/r/302619 (https://phabricator.wikimedia.org/T141906) [23:43:13] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/302619 (https://phabricator.wikimedia.org/T141906) (owner: Dereckson) [23:43:39] (Merged) jenkins-bot: Reset ar.wikipedia content namespaces
[mediawiki-config] - https://gerrit.wikimedia.org/r/302619 (https://phabricator.wikimedia.org/T141906) (owner: Dereckson) [23:43:53] 302619 live on mw1099 [23:45:32] (PS1) Dzahn: add mapped IPv6 on rhodium and strontium [puppet] - https://gerrit.wikimedia.org/r/302626 [23:46:17] jynus: I've on mw1099 a ResourceLoaderModule::saveFileDependencies: failed to update DB: exception 'DBReadOnlyError' with message 'Database is read-only: The database has been automatically locked while the slave database servers catch up to the master.' [23:47:00] Dereckson jynus is not around, offline. You may want to pick him tomarror. [23:48:28] paladox: IRC isn't only a real time communication media, especially for information level notifications purpose. [23:48:38] oh [23:48:46] sorry [23:49:14] that + All communication MUST happen in #wikimedia-operations connect on Freenode (not in separate team or area-specific channels) [23:49:20] -- https://wikitech.wikimedia.org/wiki/SWAT_deploys#Guidelines [23:50:27] ok [23:50:30] sorry [23:51:35] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Reset ar.wikipedia content namespaces (T141906) (duration: 00m 26s) [23:51:36] T141906: Content namespace definition for arwiki is still in InitialiseSettings.php after namespace was deleted - https://phabricator.wikimedia.org/T141906 [23:51:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:53:07] (CR) Dzahn: "related: https://gerrit.wikimedia.org/r/#/c/302624/ , https://gerrit.wikimedia.org/r/#/c/302626/" [puppet] - https://gerrit.wikimedia.org/r/302621 (owner: Dzahn) [23:53:39] search still works on ar. [23:53:47] SWAT done. [23:54:05] wee [23:54:17] I'm going to deploy some labs changes [23:54:18] (PS2) Dzahn: tcpircbot: add rhodium to allowed hosts [puppet] - https://gerrit.wikimedia.org/r/302621 [23:54:58] MaxSem: yes, green light, I've finished the 3 config patches scheduled.
[23:55:21] (PS3) MaxSem: Labs: remove duplicate $wgFlowParsoidURL assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/301743 [23:55:33] (CR) MaxSem: [C: 2] Labs: remove duplicate $wgFlowParsoidURL assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/301743 (owner: MaxSem) [23:55:59] (Merged) jenkins-bot: Labs: remove duplicate $wgFlowParsoidURL assignment [mediawiki-config] - https://gerrit.wikimedia.org/r/301743 (owner: MaxSem) [23:56:01] (PS3) MaxSem: Labs: remove MobileApp inclusion, duplicates prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301744 [23:56:11] (CR) MaxSem: [C: 2] Labs: remove MobileApp inclusion, duplicates prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301744 (owner: MaxSem) [23:56:26] (PS3) MaxSem: Labs: remove wgCentralAuthEnableUserMerge - matches the default [mediawiki-config] - https://gerrit.wikimedia.org/r/301745 [23:56:36] (Merged) jenkins-bot: Labs: remove MobileApp inclusion, duplicates prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301744 (owner: MaxSem) [23:56:39] (CR) MaxSem: [C: 2] Labs: remove wgCentralAuthEnableUserMerge - matches the default [mediawiki-config] - https://gerrit.wikimedia.org/r/301745 (owner: MaxSem) [23:56:50] (PS4) MaxSem: Labs: remove $wgCentralGeoScriptURL - matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301747 [23:56:57] (CR) MaxSem: [C: 2] Labs: remove $wgCentralGeoScriptURL - matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301747 (owner: MaxSem) [23:57:10] (Merged) jenkins-bot: Labs: remove wgCentralAuthEnableUserMerge - matches the default [mediawiki-config] - https://gerrit.wikimedia.org/r/301745 (owner: MaxSem) [23:57:25] (Merged) jenkins-bot: Labs: remove $wgCentralGeoScriptURL - matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301747 (owner: MaxSem) [23:57:46] (PS3) MaxSem: Labs: remove $wgCentralDBname -
matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301748 [23:57:51] (CR) MaxSem: [C: 2] Labs: remove $wgCentralDBname - matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301748 (owner: MaxSem) [23:58:02] (PS3) MaxSem: Labs: remove experimental $wgGadgetsCaching override [mediawiki-config] - https://gerrit.wikimedia.org/r/301749 [23:58:19] (Merged) jenkins-bot: Labs: remove $wgCentralDBname - matches prod [mediawiki-config] - https://gerrit.wikimedia.org/r/301748 (owner: MaxSem) [23:58:22] (CR) MaxSem: [C: 2] Labs: remove experimental $wgGadgetsCaching override [mediawiki-config] - https://gerrit.wikimedia.org/r/301749 (owner: MaxSem) [23:58:32] (PS2) MaxSem: Labs: remove wgJobLogFile [mediawiki-config] - https://gerrit.wikimedia.org/r/302474 [23:58:39] (CR) MaxSem: [C: 2] Labs: remove wgJobLogFile [mediawiki-config] - https://gerrit.wikimedia.org/r/302474 (owner: MaxSem) [23:58:47] (Merged) jenkins-bot: Labs: remove experimental $wgGadgetsCaching override [mediawiki-config] - https://gerrit.wikimedia.org/r/301749 (owner: MaxSem) [23:59:08] (Merged) jenkins-bot: Labs: remove wgJobLogFile [mediawiki-config] - https://gerrit.wikimedia.org/r/302474 (owner: MaxSem)