[00:01:31] (CR) Ottomata: setting dns entries for eventlog1002 (1 comment) [dns] - https://gerrit.wikimedia.org/r/406127 (https://phabricator.wikimedia.org/T185667) (owner: RobH)
[00:01:37] Operations, Analytics: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929813 (Ottomata) @Robh, sorry I missed your ping on this, Trusty please! :)
[00:05:15] Operations, Wikimedia-General-or-Unknown: Beta English Wikipedia: History of the page 'Bird' generates a 500 or 503 error - https://phabricator.wikimedia.org/T185969#3929818 (matmarex)
[00:10:38] (PS7) Paladox: WIP: phabricator: Switch from apache to nginx [puppet] - https://gerrit.wikimedia.org/r/406243 (https://phabricator.wikimedia.org/T185644)
[00:10:55] (CR) jerkins-bot: [V: -1] WIP: phabricator: Switch from apache to nginx [puppet] - https://gerrit.wikimedia.org/r/406243 (https://phabricator.wikimedia.org/T185644) (owner: Paladox)
[00:18:51] (CR) Elukey: "My personal view: only from reading the commit message I don't find a good reason to switch from apache to nginx. As far as I can see (it " [puppet] - https://gerrit.wikimedia.org/r/406243 (https://phabricator.wikimedia.org/T185644) (owner: Paladox)
[00:22:33] (PS8) Paladox: WIP: phabricator: Switch from apache to nginx [puppet] - https://gerrit.wikimedia.org/r/406243 (https://phabricator.wikimedia.org/T185644)
[00:26:17] Operations, Analytics: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929857 (faidon) Trusty has about a year left of upstream support, and likely less for our own purposes. Any reason to not switch to somewhere more recent while we're at it?
[00:29:14] Operations, Ops-Access-Requests: Requesting access to stat1004, stat1005, stat1006 for mneisler - https://phabricator.wikimedia.org/T184838#3929858 (MNeisler) Thanks @RobH! Just confirming that I've been able to connect successfully.
[00:29:33] (Abandoned) Awight: Refactor ORES uWSGI workers to use an absolute count [puppet] - https://gerrit.wikimedia.org/r/396055 (https://phabricator.wikimedia.org/T182249) (owner: Awight)
[00:31:56] !log demon@tin rebuilt and synchronized wikiversions files: not changing versions, testing something
[00:32:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:41] Operations, Phabricator: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3929861 (Paladox)
[00:39:42] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3929874 (Paladox)
[00:39:42] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3929875 (Dzahn) p:Triage>High
[00:39:42] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3929877 (Dzahn)
[00:39:42] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to ops group in admin for bstorm - https://phabricator.wikimedia.org/T185591#3920534 (Dzahn) Open>stalled
[00:39:42] Operations, Scap, Patch-For-Review: scap sudo violation on first puppet run - https://phabricator.wikimedia.org/T185189#3929881 (Dzahn) p:Triage>High
[00:39:42] (CR) Gergő Tisza: [C: 1] Enable TemplateStyles extension on svwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/394831 (https://phabricator.wikimedia.org/T176082) (owner: Jon Harald Søby)
[00:39:42] Operations, Scap, Patch-For-Review: scap sudo violation on first puppet run - https://phabricator.wikimedia.org/T185189#3908788 (Dzahn) p:High>Normal
[00:39:42] Operations, Analytics: setup/install eventlog1002.eqiad.wmnet - https://phabricator.wikimedia.org/T185667#3929884 (Ottomata) We are holding for Kubernetes! :) When it is ready, we will move the many individual processes (which are managed and monitored via a custom upstart based eventloggingctl scripts...
[00:44:34] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3929911 (Dzahn) p:Triage>Low This should automatically happen once phab1001/2001 get replaced in the future. We are planning to use only SSDs unless there is...
[00:48:01] Operations, Phabricator, Patch-For-Review: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3929916 (Dzahn) I recall one time i had to restart it recently. Have there been more restarts by others? Is it really every week?
[00:49:44] Operations, Phabricator, Patch-For-Review: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3929920 (Paladox) @dzahn yep. Apparently it went unnoticed due to us restarting apache every week. And when we didnt restart it, phab would get slow.
[00:50:44] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3929921 (Bstorm) >>! In T185493#3929298, @Dzahn wrote: > @Bstorm feel free to ping me about the Icinga contact part, happy to do it together or show you where to do it self-se...
[01:02:28] (PS1) Ottomata: Add log_segment_bytes param to kafka broker profile [puppet] - https://gerrit.wikimedia.org/r/406781
[01:05:22] (PS2) Ottomata: Add log_segment_bytes param to kafka broker profile [puppet] - https://gerrit.wikimedia.org/r/406781
[01:07:02] (CR) Ottomata: [C: 2] "No-op in prod: https://puppet-compiler.wmflabs.org/compiler02/9816/" [puppet] - https://gerrit.wikimedia.org/r/406781 (owner: Ottomata)
[01:21:12] (CR) Krinkle: Lower Thumbor subprocess timeout to 59 seconds (1 comment) [puppet] - https://gerrit.wikimedia.org/r/405698 (https://phabricator.wikimedia.org/T185479) (owner: Gilles)
[02:12:19] gone again for family dinner (they eat really early around these parts)
[02:23:57] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.17) (duration: 05m 58s)
[02:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:23:07] Operations, Phabricator, Patch-For-Review: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3930039 (Dzahn) I suggest(ed) we add a puppetized cron to restart Apache once a week on Sunday. But we should also try to find out what is actually happening.
[03:28:42] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 891.03 seconds
[03:54:01] PROBLEM - Check Varnish expiry mailbox lag on cp4024 is CRITICAL: CRITICAL: expiry mailbox lag is 2031986
[03:54:11] (PS1) Dzahn: hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225)
[03:55:18] (PS2) Dzahn: hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225)
[03:57:51] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 133.86 seconds
[04:00:06] (CR) Dzahn: "now suggesting https://gerrit.wikimedia.org/r/#/c/406794/ instead" [puppet] - https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[04:00:12] (CR) Dzahn: [C: -1] "now suggesting https://gerrit.wikimedia.org/r/#/c/406794/ instead" [puppet] - https://gerrit.wikimedia.org/r/382931 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[04:06:35] Operations, Phabricator, Patch-For-Review: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3921927 (Joe) So before we go to such a shotgun approach, I'd like to first: # Try to understand why apache is "stalling". Using `strace(1)` on the hanging apache proc...
[04:24:40] (PS3) Giuseppe Lavagetto: Refactor conftool.action, add the edit action [software/conftool] - https://gerrit.wikimedia.org/r/405303
[04:31:29] Operations, cloud-services-team (Kanban): Labstore1006/7 profile for meltdown kernel - https://phabricator.wikimedia.org/T185101#3930100 (bd808)
[04:57:41] PROBLEM - Check Varnish expiry mailbox lag on cp4021 is CRITICAL: CRITICAL: expiry mailbox lag is 2123808
[05:05:51] Operations, cloud-services-team: silver: / partition low on space - https://phabricator.wikimedia.org/T151493#2819106 (bd808) >>! In T151493#3925240, @aborrero wrote: >>>! In T151493#3533580, @herron wrote: >> In addition to the directories outlined in the description /usr/share/texlive appears to be lar...
[06:49:58] giving up for the night, too sleepy. have a good one
[07:13:38] Operations, ORES, Scoring-platform-team, Traffic, and 4 others: 503 spikes and resulting API slowness starting 18:45 October 26 - https://phabricator.wikimedia.org/T179156#3930185 (demon) Is there anything left here, now that everything in the summary is done?
[07:29:10] (PS1) Urbanecm: Update logo for urwikibooks, add hd logo [mediawiki-config] - https://gerrit.wikimedia.org/r/406801 (https://phabricator.wikimedia.org/T185977)
[08:08:51] PROBLEM - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[setup crl dir]
[08:18:43] (PS1) Chad: Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803
[08:20:52] (CR) jerkins-bot: [V: -1] Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803 (owner: Chad)
[08:35:36] !log installing libxml2 security updates
[08:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:38:51] RECOVERY - puppet last run on labpuppetmaster1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:34:30] !log installing wireshark security updates
[09:34:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:46:42] (PS2) Chad: Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803
[09:52:28] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3930285 (Esteban)
[10:01:41] PROBLEM - puppet last run on cp2012 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tshark]
[10:04:42] PROBLEM - puppet last run on mw2182 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tshark]
[10:09:35] (CR) MarcoAurelio: "Some comments." (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/404942 (https://phabricator.wikimedia.org/T184981) (owner: Lokal Profil)
[10:09:41] PROBLEM - puppet last run on mw1332 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tshark]
[10:11:56] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3930304 (Esteban)
[10:29:42] RECOVERY - puppet last run on mw2182 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[10:31:41] RECOVERY - puppet last run on cp2012 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[10:34:41] RECOVERY - puppet last run on mw1332 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[10:38:41] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 35, down: 2, dormant: 0, excluded: 0, unused: 0
[10:41:12] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0
[10:41:21] PROBLEM - Router interfaces on cr1-ulsfo is CRITICAL: CRITICAL: host 198.35.26.192, interfaces up: 66, down: 1, dormant: 0, excluded: 0, unused: 0
[10:44:32] (CR) MarcoAurelio: "Maybe a custom composer/phpcs rule on our mediawiki codesniffer to find and replace automatically this would be easier?" [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: TerraCodes)
[10:46:06] (PS1) Muehlenhoff: Record extended access for CPS Data fundraising contractors [puppet] - https://gerrit.wikimedia.org/r/406807
[10:53:20] (CR) Muehlenhoff: [C: 2] Record extended access for CPS Data fundraising contractors [puppet] - https://gerrit.wikimedia.org/r/406807 (owner: Muehlenhoff)
[10:57:20] !log installing ffmpeg security updates
[10:57:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:31] RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[11:00:31] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 68, down: 0, dormant: 0, excluded: 0, unused: 0
[11:00:52] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[11:24:43] (PS1) Muehlenhoff: Add library hint for ffmpeg [puppet] - https://gerrit.wikimedia.org/r/406812
[11:36:29] (CR) Muehlenhoff: [C: 2] Add library hint for ffmpeg [puppet] - https://gerrit.wikimedia.org/r/406812 (owner: Muehlenhoff)
[11:40:02] Operations, CirrusSearch, Discovery, MediaWiki-JobQueue, and 6 others: Job queue is increasing non-stop - https://phabricator.wikimedia.org/T173710#3930445 (Ladsgroup) a:Ladsgroup I'm working on for a proper solution for refresh links jobs that are triggered from Wikidata, Made lots of progre...
[12:15:37] !log installing libxtst updates
[12:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:39] Operations: Integrate jessie 8.10 point release - https://phabricator.wikimedia.org/T182656#3930535 (MoritzMuehlenhoff) These are fully rolled out: krb5 libx11 libxfixes libxi libxrandr ncurses sudo
[13:15:50] !log installing rsync security updates on trusty
[13:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:25] Operations, PAWS, Pywikibot-Commons, Traffic, and 2 others: Server error (500) while trying to download files from Commons from PAWS - https://phabricator.wikimedia.org/T178567#3930577 (Chicocvenancio) Open>Resolved a:Chicocvenancio Creating a new task to deal with pywikibot's side.
[14:10:16] Operations, Phabricator, Patch-For-Review: Switch phabricator from using apache to nginx - https://phabricator.wikimedia.org/T185644#3930673 (Paladox) For this "Try to understand why apache is "stalling". Using strace(1) on the hanging apache processes should give us some indication of what is going...
[14:24:24] (PS1) Urbanecm: New throttle rule, clean obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002)
[14:26:17] (CR) jerkins-bot: [V: -1] New throttle rule, clean obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002) (owner: Urbanecm)
[14:30:30] (PS2) Urbanecm: New throttle rule, clean obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/406820 (https://phabricator.wikimedia.org/T186002)
[14:32:56] Hello all, I neeed emergency deploy for https://gerrit.wikimedia.org/r/#/c/406820/ - throttle rule for February 01. See T186002.
[14:32:56] T186002: Requesting temporary lift of IP cap - https://phabricator.wikimedia.org/T186002
[14:33:09] (I've posted the same message to -tech and -releng)
[14:33:47] HI
[14:33:50] Sorry for lating
[14:34:10] Zoranzoki21, do you have deploy access?
[14:34:33] no
[14:35:02] thcipriani, Reedy, around?
[14:35:12] Urbanecm hi, could you email greg please?
[14:35:21] Urbanecm thcipriani is at an offsite with releng.
[14:35:25] so may not be around
[14:36:17] paladox, technically. greg-g once said that throttle requests are safe to deploy, but if you need an explicit approval...
[14:36:38] yep.
[14:36:49] yep == you do need it? :)
[14:37:08] Urbanecm in that case, you will need to find someone around who is willing to deploy.
[14:37:15] I'm tryin to :D
[14:37:17] trying
[14:38:07] I sended him email yesterday
[14:38:14] He no replied
[14:38:31] Zoranzoki21, he's manager of releng, isn't he?
[14:38:39] yes
[14:38:45] So he might not be available as well as whole #releng
[14:42:28] !log installing curl security updates on app server canaries along with HHVM restart
[14:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:50:52] <_joe_> Urbanecm: you'll have a hard time finding anyone
[14:51:25] <_joe_> most of the SREs are in flight (I am), and the releng team is at their offsite
[14:51:45] <_joe_> so probably discussing things instead than around irc
[14:51:54] Flight with internet connection, fascinating :D
[14:52:05] I've warned that this might be impossible to completed at the task
[14:52:18] Urbanecm: it exists. it also has 1.5 second pings ;)
[14:52:21] <_joe_> Urbanecm: yeah, good enough to tell you this, not enough to deploy a change :P
[14:52:35] <_joe_> MatmaRex: nah my latency is actually quite decent
[14:52:44] MatmaRex, I know, satelite connection or something like this
[14:53:10] _joe_, IRC has low speed requirements :)
[15:04:07] Operations, ORES, Scoring-platform-team, Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3930810 (Halfak) @awight it seems we'll still want that change when we get to the new cluster, won't we?
[15:07:04] Hello everyone, what are the conditions for joining SWAT?
[15:08:23] (PS1) Jcrespo: mariadb: Disable notifications on es2018 before reimage [puppet] - https://gerrit.wikimedia.org/r/406826
[15:10:20] Operations, ORES, Scoring-platform-team, Performance: Diagnose and fix 4.5k req/min ceiling for ores* requests - https://phabricator.wikimedia.org/T182249#3930844 (awight) @Halfak We rolled the same fix into https://gerrit.wikimedia.org/r/#/c/396064/
[15:10:56] (CR) Jcrespo: [C: 2] mariadb: Disable notifications on es2018 before reimage [puppet] - https://gerrit.wikimedia.org/r/406826 (owner: Jcrespo)
[15:11:09] razesoldier, the child in me wants to list a bunch of things related to marksmanship and hostage negotiation skills but as this is a grown up channel, you can find more about SWAT here https://wikitech.wikimedia.org/wiki/SWAT_deploys#team
[15:16:00] jgleeson: I should go ask greg-g?
[15:16:39] sounds like a solid way forward
[15:30:14] razesoldier: what do you mean "join swat"? Be a SWAT deployer or have a patch you propose be deployed?
[15:30:15] !log stop and reimage es2018
[15:30:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:37] greg-g: Become a SWAT Deployer
[15:33:10] razesoldier: email me with your background/experience and why you want to join. greg @ wikimedia org
[15:33:18] I have to go afk for a bit
[15:33:52] ok
[15:35:45] !log installing libxcursor security updates
[15:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:24] (PS1) Jcrespo: mariadb: Allow temporary manual reimage of es[12]00[89] [puppet] - https://gerrit.wikimedia.org/r/406828
[15:45:26] (PS2) Jcrespo: mariadb: Allow temporary manual reimage of es[12]01[89] [puppet] - https://gerrit.wikimedia.org/r/406828
[15:48:02] (CR) Jcrespo: [C: 2] mariadb: Allow temporary manual reimage of es[12]01[89] [puppet] - https://gerrit.wikimedia.org/r/406828 (owner: Jcrespo)
[15:52:48] !log installing libxml2 security updates
[15:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:58:26] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3930894 (chasemp)
[16:08:59] What happening again with mediawiki-core-doxygen-publish?
[16:13:24] (PS1) Jcrespo: mariadb: Reimage es* servers with stretch by default [puppet] - https://gerrit.wikimedia.org/r/406831
[16:14:45] (CR) Jcrespo: [C: 2] mariadb: Reimage es* servers with stretch by default [puppet] - https://gerrit.wikimedia.org/r/406831 (owner: Jcrespo)
[16:43:13] (CR) Chad: "We've already spent more time on this than it's worth :\" [mediawiki-config] - https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) (owner: TerraCodes)
[16:51:57] jouncebot: next
[16:51:58] In 0 hour(s) and 8 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700)
[16:57:27] jouncebot: next
[16:57:28] In 0 hour(s) and 2 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700)
[16:58:07] jouncebot: next
[16:58:07] In 0 hour(s) and 1 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700)
[16:58:33] jouncebot: next
[16:58:33] In 0 hour(s) and 1 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700)
[16:58:57] jouncebot: next
[16:58:57] In 0 hour(s) and 1 minute(s): Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700)
[16:59:03] Zoranzoki21: please stop
[16:59:17] ok. I will stop
[17:00:04] godog, moritzm, and _joe_: My dear minions, it's time we take the moon! Just kidding. Time for Puppet SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1700).
[17:00:04] No GERRIT patches in the queue for this window AFAICS.
[17:00:10] puppet swat is different to swat.
[17:00:21] puppet swat is when you have a patch in the puppet repo.
[17:00:32] oh my god
[17:03:08] checked. nothing in swat window
[17:04:35] I confused and added few changes but it is not for pupept
[17:06:04] (CR) Rush: [C: 1] "nice" [puppet] - https://gerrit.wikimedia.org/r/406778 (https://phabricator.wikimedia.org/T185967) (owner: Volans)
[17:14:42] (PS1) Jcrespo: Revert "mariadb: Disable notifications on es2018 before reimage" [puppet] - https://gerrit.wikimedia.org/r/406838
[17:15:51] (PS2) Jcrespo: Revert "mariadb: Disable notifications on es2018 before reimage" [puppet] - https://gerrit.wikimedia.org/r/406838
[17:24:46] Operations, Phabricator, Release-Engineering-Team: Consider ssd's for phabricator - https://phabricator.wikimedia.org/T185796#3931129 (faidon)
[17:24:53] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931132 (faidon)
[17:26:25] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931145 (Zoranzoki21) >>! In T185971#3929911, @Dzahn wrote: > This should automatically happen once phab1001/2001 get replaced in the future. We are planning to use o...
[17:27:58] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931170 (Paladox) >>! In T185971#3931145, @Zoranzoki21 wrote: >>>! In T185971#3929911, @Dzahn wrote: >> This should automatically happen once phab1001/2001 get replac...
[17:28:34] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931174 (Dzahn) >>! In T185971#3931145, @Zoranzoki21 wrote: > Why you no change apache with ngnix? Please see T185644 for that topic.
[17:29:21] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931176 (Zoranzoki21) >>! In T185971#3931174, @Dzahn wrote: >>>! In T185971#3931145, @Zoranzoki21 wrote: >> Why you no change apache with ngnix? > > Please see T1856...
[17:34:15] Operations, Analytics-Kanban, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931188 (elukey)
[17:40:43] Operations, netops: reconfigure esams switch port for new bastion - https://phabricator.wikimedia.org/T186021#3931220 (Dzahn) p:Triage>High
[17:40:48] Operations, Analytics-Kanban, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931235 (elukey) Looking to https://wikitech.wikimedia.org/w/index.php?title=Ganeti#Resize_a_VM, it might be less painful to create a new disk, format it and then use it a...
[17:42:14] Operations, netops: reconfigure esams switch port for new bastion - https://phabricator.wikimedia.org/T186021#3931220 (Dzahn)
[17:47:54] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931267 (faidon) Open>declined This has no problem statement, diagnosis, root cause analysis or evidence of I/O starvation -- and yet we're jumping to actiona...
[17:52:11] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3929861 (demon) >>! In T185971#3929911, @Dzahn wrote: > While SSDs would of course not hurt performance, Its not like Phabricator is permanently slow. I think the ga...
[17:52:17] paravoid: Ahhh, missed your decline/comment
[17:52:30] (I was mid-comment and contemplating a decline)
[17:52:40] (PS1) Bstorm: Add bstorm to cloud-wide root [labs/private] - https://gerrit.wikimedia.org/r/406842
[17:54:23] Operations, Analytics-Kanban, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931334 (MoritzMuehlenhoff) Yeah, it's probably easiest to add a new disk and move /var/lib/archiva to it
[17:55:18] Operations, Phabricator, Release-Engineering-Team: Add some ssd's to phab1001 and phab2001 - https://phabricator.wikimedia.org/T185971#3931345 (demon) Oh, and if you want to have a look at IO usage, [[ https://grafana.wikimedia.org/dashboard/db/prometheus-machine-stats?from=now-3h&to=now&orgId=1&var-...
[17:55:37] (CR) Paladox: Add bstorm to cloud-wide root (1 comment) [labs/private] - https://gerrit.wikimedia.org/r/406842 (owner: Bstorm)
[17:58:39] Operations, ops-codfw, DBA, hardware-requests: Decommission db2016, db2017, db2018, db2019, db2023, db2028, db2029 - https://phabricator.wikimedia.org/T184090#3931359 (Papaul)
[18:00:06] cscott, arlolra, subbu, halfak, and Amir1: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180130T1800).
[18:00:06] No GERRIT patches in the queue for this window AFAICS.
[18:00:13] (PS2) Bstorm: Add bstorm to cloud-wide root [labs/private] - https://gerrit.wikimedia.org/r/406842 (https://phabricator.wikimedia.org/T185493)
[18:00:15] Nothing for ORES
[18:07:25] (CR) Jcrespo: [C: 2] Revert "mariadb: Disable notifications on es2018 before reimage" [puppet] - https://gerrit.wikimedia.org/r/406838 (owner: Jcrespo)
[18:07:43] RECOVERY - Check Varnish expiry mailbox lag on cp4021 is OK: OK: expiry mailbox lag is 0
[18:08:09] !log installing PHP security updates
[18:08:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:47] Operations, Domains, Traffic, WMF-Design, and 2 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3931431 (Dzahn) Open>stalled We need some more information what exactly is requested, please. Is the expectation a DNS...
[18:12:22] 13:00 jouncebot: "#bothumor When your hammer is PHP, everything starts looking like a thumb." 13:08 moritzm> !log installing PHP security updates haha
[18:13:23] * moritzm looks at his mashed thumb
[18:16:44] ;)
[18:18:59] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3931468 (Dzahn) p:Triage>Normal
[18:23:20] Operations, Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3931478 (MoritzMuehlenhoff) A revised fix has been released (along with 3.18.8), I'll roll that into our...
[18:24:39] (PS1) Marostegui: s6.hosts: Remove db1030 [software] - https://gerrit.wikimedia.org/r/406843 (https://phabricator.wikimedia.org/T184397)
[18:24:48] Operations, Discovery, Discovery-Analysis, Discovery-Search (Current work): Upload shiny-server .deb to our Jessie apt repository - https://phabricator.wikimedia.org/T168967#3931493 (bd808)
[18:26:49] (CR) Rush: [V: 2 C: 2] Add bstorm to cloud-wide root [labs/private] - https://gerrit.wikimedia.org/r/406842 (https://phabricator.wikimedia.org/T185493) (owner: Bstorm)
[18:32:54] PROBLEM - MegaRAID on db1051 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[18:41:21] (PS3) Madhuvishy: Cumin: add custom backend in WMCS [puppet] - https://gerrit.wikimedia.org/r/406778 (https://phabricator.wikimedia.org/T185967) (owner: Volans)
[18:42:54] RECOVERY - MegaRAID on db1051 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[18:43:18] (CR) Volans: [C: 2] Cumin: add custom backend in WMCS [puppet] - https://gerrit.wikimedia.org/r/406778 (https://phabricator.wikimedia.org/T185967) (owner: Volans)
[18:52:29] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3931621 (Dzahn)
[19:19:35] (CR) Volans: [C: -1] "Looks mostly good, see the comments inline." (2 comments) [software/conftool] - https://gerrit.wikimedia.org/r/405301 (owner: Giuseppe Lavagetto)
[19:20:38] Operations, Ops-Access-Requests: Requesting access to analytics-users / webrequest for Esteban - https://phabricator.wikimedia.org/T185988#3931822 (Dzahn) a:Dzahn Hello @Esteban could you add some more information please? What do you need the access for? Have you worked with anyone at WMF? Is it for...
[19:22:44] (PS1) Jcrespo: mariadb: Emergency depool db1051 (s5- dewiki - dump) because it is lagging [mediawiki-config] - https://gerrit.wikimedia.org/r/406847
[19:28:49] (CR) Volans: "Looks good, a couple of nitpick comments inline." (2 comments) [software/conftool] - https://gerrit.wikimedia.org/r/405302 (https://phabricator.wikimedia.org/T185080) (owner: Giuseppe Lavagetto)
[19:31:22] (CR) Jcrespo: [C: 2] mariadb: Emergency depool db1051 (s5- dewiki - dump) because it is lagging [mediawiki-config] - https://gerrit.wikimedia.org/r/406847 (owner: Jcrespo)
[19:32:54] PROBLEM - MegaRAID on db1051 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[19:35:05] (Merged) jenkins-bot: mariadb: Emergency depool db1051 (s5- dewiki - dump) because it is lagging [mediawiki-config] - https://gerrit.wikimedia.org/r/406847 (owner: Jcrespo)
[19:36:34] jynus: thanks I was looking at it now
[19:37:00] I think someone augmented the time between checks
[19:37:13] (CR) jenkins-bot: mariadb: Emergency depool db1051 (s5- dewiki - dump) because it is lagging [mediawiki-config] - https://gerrit.wikimedia.org/r/406847 (owner: Jcrespo)
[19:37:15] that is not ok for databases
[19:37:41] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1051 (duration: 00m 57s)
[19:37:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:41:06] jynus: it look it's every 10 minutes with 3 retries, so ends in 30 minutes before alarming AFACIS
[19:42:40] (CR) Chad: [C: 2] Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803 (owner: Chad)
[19:43:21] Operations, Analytics-Kanban, User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3931925 (elukey) So meitnerium seems to be on ganeti1005 that has a ton of disk space free, so in theory the only thing needed to create the new disk would be the followin...
[19:43:25] Operations, ops-esams, netops: reconfigure esams switch port for new bastion - https://phabricator.wikimedia.org/T186021#3931926 (Dzahn)
[19:46:00] Operations, ops-esams, Patch-For-Review: install/designate other machine as esams bastion - https://phabricator.wikimedia.org/T184936#3931945 (Dzahn)
[19:46:02] Operations, ops-esams, netops: reconfigure esams switch port for new bastion - https://phabricator.wikimedia.org/T186021#3931941 (Dzahn) Open>stalled We don't know the switch port this was connected to and it's already disabled. So it wasn't possible for ayounsi to do this from remote. Also...
[19:46:08] Operations, ops-esams, netops: reconfigure esams switch port for new bastion - https://phabricator.wikimedia.org/T186021#3931946 (Dzahn) a:Dzahn>None
[19:47:30] (Merged) jenkins-bot: Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803 (owner: Chad)
[19:47:43] Operations, Domains, Traffic, WMF-Design, and 2 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3911827 (Bawolff) Have we considered just having it as a subdirectory of https://doc.wikimedia.org/ ? It seems like a documentat...
[19:47:44] (CR) jenkins-bot: Add open.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/406803 (owner: Chad)
[19:51:21] Operations, Patch-For-Review, cloud-services-team (Kanban): Onboard bstorm to WMF - https://phabricator.wikimedia.org/T185493#3931971 (chasemp)
[19:52:54] Operations, Domains, Traffic, WMF-Design, and 2 others: Create subdomain for Design and Wikimedia User Interface Style Guide - https://phabricator.wikimedia.org/T185282#3931973 (Volker_E) @Bawolff There have been conversations in the Design team around the location. The Style Guide is planned as...
[19:58:34] (PS1) Rush: cloud: overlay whitelist as default [puppet] - https://gerrit.wikimedia.org/r/406851
[20:01:29] (CR) Andrew Bogott: [C: 1] cloud: overlay whitelist as default [puppet] - https://gerrit.wikimedia.org/r/406851 (owner: Rush)
[20:04:30] (PS1) Andrew Bogott: openstack horizon: rough in manifests for source deploy of Horizon 'ocata' [puppet] - https://gerrit.wikimedia.org/r/406853 (https://phabricator.wikimedia.org/T168470)
[20:05:05] (CR) jerkins-bot: [V: -1] openstack horizon: rough in manifests for source deploy of Horizon 'ocata' [puppet] - https://gerrit.wikimedia.org/r/406853 (https://phabricator.wikimedia.org/T168470) (owner: Andrew Bogott)
[20:22:54] RECOVERY - MegaRAID on db1051 is OK: OK: optimal, 1 logical, 2 physical, WriteBack policy
[20:31:56] Operations, ORES, Scoring-platform-team, Patch-For-Review: Update log config for scb* boxes, to deal with ORES verbose logging - https://phabricator.wikimedia.org/T182497#3932192 (Halfak)
[21:00:40] Operations, Traffic, Performance-Team (Radar): load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3932349 (Imarlier)
[21:38:43] !log demon@tin Synchronized dblists/open.dblist: Adding open.dblist (duration: 00m 57s)
[21:38:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:39:54] !log demon@tin Synchronized docroot/noc/conf/open.dblist: (no justification provided) (duration: 00m 57s)
[21:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[22:22:14] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/403574 (https://phabricator.wikimedia.org/T183999) (owner: Thcipriani)
[22:24:02] (PS1) Elukey: profile::analytics::refinery::job::camus: add netflow hourly job [puppet] - https://gerrit.wikimedia.org/r/406951 (https://phabricator.wikimedia.org/T181036)
[22:25:42] (PS2) Elukey: profile::analytics::refinery::job::camus: add netflow hourly job [puppet] - https://gerrit.wikimedia.org/r/406951 (https://phabricator.wikimedia.org/T181036)
[22:34:13] Ok, I set jhs to banned for now and dropped them a PM. their client spamming connection/disconnection every minute was getting old.
[22:34:24] whenever they fix their client, they can be unbanned
[22:37:50] robh: to be honest, its been a daily for like atleast days or even weeks at this point 9.9
[22:38:16] with normal channel spam of bots and commits i just never noticed ;]
[22:38:38] well its not only this channel -stewards is also affected
[22:39:05] I'm pretty sure I lack rights elsewhere
[22:39:11] they banned it too there, i would say perma ban the user until he fixed the problem
[22:39:41] i didnt set expiry so unless someone undoes it, its sticking. if they state they've fixed it im happy to let them back in but they need to address it
[22:40:28] yea, im wondering what the issue is
[22:41:00] cause once it happens it seems to get into "crash" loop if you will...
[22:42:22] (PS3) Elukey: profile::analytics::refinery::job::camus: add netflow hourly job [puppet] - https://gerrit.wikimedia.org/r/406951 (https://phabricator.wikimedia.org/T181036)
[22:42:41] (PS1) Dzahn: openstack: replace apache module with httpd module [puppet] - https://gerrit.wikimedia.org/r/406954
[22:45:24] PROBLEM - HHVM rendering on mw2209 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[22:46:14] RECOVERY - HHVM rendering on mw2209 is OK: HTTP OK: HTTP/1.1 200 OK - 75872 bytes in 0.300 second response time
[22:50:41] robh: https://meta.wikimedia.org/wiki/User_talk:Jon_Harald_S%C3%B8by#Notice <- i posted onwiki so he might notice it sooner
[22:51:50] col thx =]
[22:51:54] cool even
[22:51:56] (CR) Dzahn: [C: -1] "http://puppet-compiler.wmflabs.org/9817/ :(" [puppet] - https://gerrit.wikimedia.org/r/406954 (owner: Dzahn)
[22:52:32] that is indeed better than just my irc pm (easily lost with client issues)
[22:52:50] (CR) Dzahn: [C: -1] "well... i didn't realize californium is on trusty :/" [puppet] - https://gerrit.wikimedia.org/r/406954 (owner: Dzahn)
[22:53:29] If I would were to guess it might be an issue with a really strict reconnect policy
[22:57:30] yeah its not nick based (not being booted for nick) since its showing his cloak when he is connected
[22:57:35] so its identified to services fine
[22:57:40] oh well
[22:57:53] rephrase: i dont think its nick based, could be wrong.
[22:58:47] I think he got K-lined multiple times even
[22:58:55] (PS2) Dzahn: openstack/striker: replace apache module with httpd module [puppet] - https://gerrit.wikimedia.org/r/406954
[22:58:55] for this exact issue
[22:58:57] i didnt realize who it was until you linked their wikiuserpage
[22:59:15] yea I had to figure it out myself too
[23:00:33] https://meta.wikimedia.org/wiki/Stewards/confirm/2007#Jon_Harald_Søby <- its mentioned here
[23:05:13] Operations, monitoring: Icinga: page in case all MediaWiki are throwing 5xx - https://phabricator.wikimedia.org/T186069#3932784 (Volans)
[23:05:56] Also banned in #wikidata btw
[23:07:14] you might want to mention that on his talkpage
[23:07:53] The bans in wikidata and stewards get removed daily. :|
[23:07:57] (by some bot)
[23:08:25] banbot yes
[23:08:41] can't you set an expiry?
[23:08:53] wikibugs just quit other channels for flood
[23:08:58] I probably can, but I'm lazy and mostly people don't have recurring problems as this
[23:23:42] (PS3) Dzahn: openstack: unified role wikitech/horizon/striker, apache -> httpd [puppet] - https://gerrit.wikimedia.org/r/406954
[23:34:27] (CR) Dzahn: "http://puppet-compiler.wmflabs.org/9819/californium.wikimedia.org/change.californium.wikimedia.org.err" [puppet] - https://gerrit.wikimedia.org/r/406954 (owner: Dzahn)
[23:37:37] Operations, ops-eqiad, fundraising-tech-ops: Rack/setup frmon1001 - https://phabricator.wikimedia.org/T186073#3932868 (RobH)
[23:37:45] Operations, ops-eqiad, fundraising-tech-ops: Rack/setup frmon1001 - https://phabricator.wikimedia.org/T186073#3932868 (RobH) p:Lowest>Normal
[23:39:14] (PS4) Dzahn: openstack: unified role wikitech/horizon/striker, apache -> httpd [puppet] - https://gerrit.wikimedia.org/r/406954
[23:52:11] (PS5) Dzahn: openstack: unified role wikitech/horizon/striker,apache -> httpd [puppet] - https://gerrit.wikimedia.org/r/406954