[00:08:05] RECOVERY - puppet last run on mw2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:28:24] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:10:03] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[01:10:52] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[01:14:32] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:15:13] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:30:57] Operations, Commons, media-storage, User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2446267 (kaldari) Seeing complaints on the Commons' village pump: https://commons.wikimedia.org/wiki/Commons:Village_pump#Fo...
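Editorial note: the "5xx reqs/min" alerts above report the share of recent datapoints that exceed a threshold (1000.0 for critical, 250.0 for recovery). The sketch below shows one way such a check could be computed. The function names, the 1% cutoff, and the warning/critical ordering are assumptions inferred from the alert text, not the actual check script.

```python
from typing import Optional, Sequence


def percent_over(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Return the percentage of non-null datapoints strictly above threshold."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    over = sum(1 for v in values if v > threshold)
    return 100.0 * over / len(values)


def classify(
    datapoints: Sequence[Optional[float]],
    warning: float = 250.0,     # taken from the RECOVERY messages
    critical: float = 1000.0,   # taken from the PROBLEM messages
    max_percent: float = 1.0,   # assumed cutoff ("Less than 1.00% above")
) -> str:
    """Hypothetical state logic: check the critical threshold first, then warning."""
    if percent_over(datapoints, critical) >= max_percent:
        return "CRITICAL"
    if percent_over(datapoints, warning) >= max_percent:
        return "WARNING"
    return "OK"
```

Under these assumptions, one spike in a nine-sample window yields the 11.11% figure seen at 01:10:03, and one in ten yields the 10.00% seen at 01:10:52.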
[02:21:14] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.9) (duration: 08m 41s)
[02:21:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:26:55] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jul 11 02:26:55 UTC 2016 (duration 5m 41s)
[02:27:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:56:58] Operations, ops-eqiad, Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2446359 (Andrew)
[04:04:37] PROBLEM - puppet last run on elastic2010 is CRITICAL: CRITICAL: puppet fail
[04:28:57] RECOVERY - puppet last run on elastic2010 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[04:37:48] (PS1) EBernhardson: logstash: Normalize a few more fields [puppet] - https://gerrit.wikimedia.org/r/298242
[04:38:08] (PS2) EBernhardson: logstash: Normalize a few more fields [puppet] - https://gerrit.wikimedia.org/r/298242
[04:45:06] (CR) BryanDavis: [C: 1] "Is this going to turn into a never ending game of whack-a-mole?"
[puppet] - https://gerrit.wikimedia.org/r/298242 (owner: EBernhardson)
[05:37:42] (PS2) Muehlenhoff: ocg: Restrict to DOMAIN_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297840
[05:41:44] (CR) Muehlenhoff: [C: 2 V: 2] ocg: Restrict to DOMAIN_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297840 (owner: Muehlenhoff)
[06:13:33] !log restarted saltmaster on neodymium
[06:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:16:33] (Abandoned) Muehlenhoff: ocg: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297583 (owner: Muehlenhoff)
[06:30:28] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:18] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:27] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Puppet has 3 failures
[06:31:29] PROBLEM - puppet last run on aqs1002 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:31:58] PROBLEM - puppet last run on mw1110 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:28] PROBLEM - puppet last run on mw2250 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:37] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:37] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:57] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:28] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:08] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:24] <_joe_> !log restarted hhvm on mw1168
[06:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:36:08] RECOVERY - HHVM jobrunner on mw1168 is OK: HTTP OK: HTTP/1.1 200 OK - 222 bytes in 0.016 second response time
[06:56:17] RECOVERY - puppet last run on einsteinium is OK: OK:
Puppet is currently enabled, last run 7 seconds ago with 0 failures
[06:56:39] RECOVERY - puppet last run on mw1110 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:57:08] RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:57:18] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:57:19] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:57:38] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:39] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:09] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:27] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:39] RECOVERY - puppet last run on aqs1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:58] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:46] (PS2) Giuseppe Lavagetto: puppetmaster: perform git init in the private repo dir [puppet] - https://gerrit.wikimedia.org/r/297988 (https://phabricator.wikimedia.org/T98173)
[06:59:48] (PS2) Giuseppe Lavagetto: puppetmaster: add rhodium as an inactive backend [puppet] - https://gerrit.wikimedia.org/r/297987 (https://phabricator.wikimedia.org/T98173)
[07:01:39] (CR) jenkins-bot: [V: -1] puppetmaster: perform git init in the private repo dir [puppet] - https://gerrit.wikimedia.org/r/297988 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[07:06:25] Operations,
Wikimedia-Apache-configuration, HHVM, Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2446548 (elukey) After a bit of digging, it seems that the AH01075/AH01068 er...
[07:11:18] (CR) Giuseppe Lavagetto: [C: 2] "https://puppet-compiler.wmflabs.org/3306/palladium.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/297987 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[07:13:13] !log mobileapps deploying 6e409f46
[07:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:15:19] !log citoid deploying 274c0231d
[07:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:16:33] PROBLEM - Restbase root url on restbase1009 is CRITICAL: Connection refused
[07:16:43] PROBLEM - restbase endpoints health on restbase1009 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.48.110, port=7231): Max retries exceeded with url: /en.wikipedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[07:17:25] again?
[07:17:27] looking ^
[07:18:42] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail
[07:18:43] RECOVERY - Restbase root url on restbase1009 is OK: HTTP OK: HTTP/1.1 200 - 15273 bytes in 0.010 second response time
[07:18:54] RECOVERY - restbase endpoints health on restbase1009 is OK: All endpoints are healthy
[07:19:47] !log cxserver deploying fd8eca47e
[07:19:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:24:45] !log mathoid deploying 669cfc0
[07:24:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:26:33] !log graphoid deploying 375d31fd
[07:26:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:29:12] !log change-prop deploying 2b699a6
[07:29:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:37:13] Operations, Services, Patch-For-Review, User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2446582 (mobrovac)
[07:45:02] RECOVERY - Unmerged changes on repository puppet on rhodium is OK: No changes to merge.
[07:45:43] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[07:46:52] (PS3) Giuseppe Lavagetto: puppetmaster: perform git init in the private repo dir [puppet] - https://gerrit.wikimedia.org/r/297988 (https://phabricator.wikimedia.org/T98173)
[07:48:54] (PS1) KartikMistry: apertium-mk-en: Initial Debian packaging [debs/contenttranslation/apertium-mk-en] - https://gerrit.wikimedia.org/r/298250 (https://phabricator.wikimedia.org/T139918)
[07:49:46] mobrovac: thanks for node upgrade.
[07:50:31] mobrovac: what are blockers in https://gerrit.wikimedia.org/r/#/c/292894/ ? It is waiting for your review and deployment.
[07:50:56] (CR) Giuseppe Lavagetto: [C: 2] "I'll apply this carefully."
[puppet] - https://gerrit.wikimedia.org/r/297988 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[07:54:20] (PS1) Giuseppe Lavagetto: puppetmaster: fixup for I1842586a7 [puppet] - https://gerrit.wikimedia.org/r/298251
[07:56:50] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: fixup for I1842586a7 [puppet] - https://gerrit.wikimedia.org/r/298251 (owner: Giuseppe Lavagetto)
[08:03:00] RECOVERY - puppet last run on rhodium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[08:16:19] (PS1) Ori.livneh: PCC: Fix success/failure detection [puppet] - https://gerrit.wikimedia.org/r/298252
[08:23:00] (CR) ArielGlenn: [C: 2] allow list of jobs to run to be passed as argument for dump scripts [dumps] - https://gerrit.wikimedia.org/r/297572 (owner: ArielGlenn)
[08:38:50] Operations, Ops-Access-Requests, Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2446657 (elukey) @Ottomata, I tried to look the history of the changes but I got lost in it. From what I gathered including one of the eventlogging roles also brings t...
[08:51:42] Operations, ops-eqiad, media-storage: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631#2446673 (fgiunchedi) @Cmjohnson I see ms-be1027 is missing from the list? also what's the status of these servers? thanks!
[08:58:28] (PS1) ArielGlenn: add argument for specifying date of dump to use for prefetch [dumps] - https://gerrit.wikimedia.org/r/298254 (https://phabricator.wikimedia.org/T137887)
[09:15:21] PROBLEM - puppet last run on db2048 is CRITICAL: CRITICAL: puppet fail
[09:19:54] Operations, ops-eqiad, Patch-For-Review: Rack and Set up new application servers mw1261-1283 - https://phabricator.wikimedia.org/T133798#2446709 (Joe)
[09:19:56] Operations, ops-eqiad, Patch-For-Review: Rack and Set up new application servers mw1284-1306 - https://phabricator.wikimedia.org/T134309#2446708 (Joe) Open>Resolved
[09:20:10] Operations, ops-eqiad, Patch-For-Review: Rack and Set up new application servers mw1261-1283 - https://phabricator.wikimedia.org/T133798#2244548 (Joe) Open>Resolved
[09:21:29] Operations, Patch-For-Review: install/setup/deploy server rhodium as puppetmaster (scaling out) - https://phabricator.wikimedia.org/T98173#2446713 (Joe) a:akosiaris>Joe
[09:22:04] (PS1) Giuseppe Lavagetto: puppetmaster: correct puppettization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173)
[09:26:08] (CR) jenkins-bot: [V: -1] puppetmaster: correct puppettization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173) (owner: Giuseppe Lavagetto)
[09:29:20] RECOVERY - puppet last run on db2048 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[09:30:39] (PS2) Giuseppe Lavagetto: puppetmaster: correct puppettization of the private repo [puppet] - https://gerrit.wikimedia.org/r/298258 (https://phabricator.wikimedia.org/T98173)
[09:32:57] Operations, Commons, media-storage, User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2446734 (MoritzMuehlenhoff) I've built backports of Ghostscript
9.19 and Pango 1.40.1 for jessie, but to test this on a depo...
[09:40:19] !log upgrading cp1046 to varnish 4.1.3-1wm1
[09:40:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:47:41] !log installing GCC stable updates on trusty systems (also provides some runtime libs in addition to GCC itself)
[09:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:59:02] (CR) ArielGlenn: [C: 2] add argument for specifying date of dump to use for prefetch [dumps] - https://gerrit.wikimedia.org/r/298254 (https://phabricator.wikimedia.org/T137887) (owner: ArielGlenn)
[10:00:22] !log swift codfw-prod: ms-be202[567] weight 2000
[10:00:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:08:48] (Abandoned) Hashar: (DO NOT SUBMIT) contint: pin chromium to 49 on Trusty [puppet] - https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) (owner: Hashar)
[10:17:26] (CR) Filippo Giunchedi: "scheduled for next puppet SWAT, tomorrow" [puppet] - https://gerrit.wikimedia.org/r/204528 (https://phabricator.wikimedia.org/T96230) (owner: Coren)
[10:18:18] hashar: ^ has been running cherry-picked anyway until now no?
[10:21:49] Operations, Ops-Access-Requests, Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2446798 (fgiunchedi) p:Triage>Normal
[10:24:13] godog: yeah on the CI puppet master
[10:24:21] and that only impacts labs instances
[10:24:28] godog: it is a bit of a hack / trick though :(
[10:25:00] hashar: I didn't read the whole history, I'm assuming slow disks ?
[10:25:54] !log CI: upgraded Chromium from v49 to v51 (v50 caused qunit jobs to fail / timeout randomly) T136188
[10:25:55] T136188: qunit jobs have karma stall when chromium disconnect - https://phabricator.wikimedia.org/T136188
[10:25:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:26:05] godog: yeah so mysql data are on a tmpfs
[10:26:38] godog: and there is some kind of race condition between the mysql package installing on /var/mysql , our conf willing to have it on /mnt/tmpfs or something
[10:26:45] and tmpfs having to be available before mysql start
[10:28:45] lunch &
[10:33:06] (PS9) Addshore: WIP DRAFT WMDE_Analytics module [puppet] - https://gerrit.wikimedia.org/r/269467
[10:33:10] hashar: I see
[10:34:07] (CR) jenkins-bot: [V: -1] WIP DRAFT WMDE_Analytics module [puppet] - https://gerrit.wikimedia.org/r/269467 (owner: Addshore)
[10:34:34] Operations, Ops-Access-Requests, Patch-For-Review: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2446811 (fgiunchedi) p:Triage>Normal
[10:35:59] Operations, Ops-Access-Requests: root access on security-tools instances for Darian Patrick - https://phabricator.wikimedia.org/T138873#2446815 (fgiunchedi) p:Triage>Normal
[10:37:14] (CR) Muehlenhoff: [C: 1] "Ack. This was introduced as a side effect of a refactoring done by Ori back in November." [puppet] - https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: Dzahn)
[10:39:44] (CR) Giuseppe Lavagetto: [C: 2] "I agree, this should be removed; it also notably didn't work very well." [puppet] - https://gerrit.wikimedia.org/r/296727 (owner: Faidon Liambotis)
[10:40:59] Operations, Ops-Access-Requests, Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2446825 (elukey) From @MoritzMuehlenhoff in the code review: ``` Ack.
This was introduced as a side effect of a refactoring done by Ori back in November. commit 0846...
[10:41:15] (CR) Nikerabbit: [C: 1] State Compact Language Links isn't beta anymore [mediawiki-config] - https://gerrit.wikimedia.org/r/298134 (https://phabricator.wikimedia.org/T136677) (owner: Dereckson)
[10:42:17] Operations, ops-esams: Move cp3030+ from OE14 to OE13 in racktables - https://phabricator.wikimedia.org/T136403#2446827 (MoritzMuehlenhoff) p:Triage>Normal
[10:44:35] (PS6) Elukey: Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[10:45:30] Operations, Ops-Access-Requests: analytics server access request for three users from CPS Data Consulting - https://phabricator.wikimedia.org/T139764#2446832 (fgiunchedi) p:Triage>Normal If I understand correctly this is also shell access, in that case the steps outlined at https://wikitech.w...
[10:48:13] Operations, Ops-Access-Requests: analytics server access request for three users from CPS Data Consulting - https://phabricator.wikimedia.org/T139764#2441876 (MoritzMuehlenhoff) @Jgreen Does the consulting contract have a determined end? We need some kind of feedback loop so that we disable their accoun...
[10:48:51] Operations, Ops-Access-Requests: Allow *-admin groups to see systemd logs for their units - https://phabricator.wikimedia.org/T137878#2446840 (fgiunchedi) p:Triage>Normal
[10:51:40] Operations, Analytics-Kanban, Traffic, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2446844 (elukey) I checked again how VSL manages memory to establish the effect of the -T timeout (default 120 sec) and -L limit (...
[10:53:04] (CR) Giuseppe Lavagetto: [C: -1] "Augeas was enabled in Ie6ee99dc on trusty, and quickly disabled on trusty because of reported labs failures."
[puppet] - https://gerrit.wikimedia.org/r/296728 (owner: Faidon Liambotis)
[10:53:19] (PS1) Dereckson: Allow to import from zh.wikipedia to beta.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/298266 (https://phabricator.wikimedia.org/T139922)
[10:59:45] (CR) Giuseppe Lavagetto: [C: -1] "Generally very good, a few things to fix." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/296729 (owner: Faidon Liambotis)
[11:16:06] (PS10) Addshore: WIP DRAFT WMDE_Analytics module [puppet] - https://gerrit.wikimedia.org/r/269467
[11:17:12] (CR) jenkins-bot: [V: -1] WIP DRAFT WMDE_Analytics module [puppet] - https://gerrit.wikimedia.org/r/269467 (owner: Addshore)
[11:19:07] !log installing hhvm updates on canary app servers
[11:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:22:47] (PS11) Addshore: WIP DRAFT WMDE_Analytics module [puppet] - https://gerrit.wikimedia.org/r/269467
[11:30:37] !log upgrading eqiad cache_maps to varnish 4.1.3-1wm1
[11:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:34:36] legoktm: around?
[11:35:47] Amir1: not really
[11:36:43] legoktm: okay, I guess it's around 4 AM there. Just wanted to thank you and ask you to write an email to global renamers whenever you can (no rush) :)
[11:39:15] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.107, port=7232): Max retries exceeded with url: /analytics.wikimedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused)))
[11:39:55] PROBLEM - AQS root url on aqs1004 is CRITICAL: Connection refused
[11:43:01] bad deploy, checking---^
[11:44:03] this is not serving live traffic btw, so silencing it
[11:51:52] mobrovac: do you have a minute ?
[12:03:11] !log upgrading codfw cache_maps to varnish 4.1.3-1wm1
[12:03:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:03:24] PROBLEM - Apache HTTP on mw1261 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.004 second response time
[12:03:37] checking --^
[12:05:25] RECOVERY - Apache HTTP on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.037 second response time
[12:12:49] Operations, GlobalRename, MediaWiki-extensions-CentralAuth, Patch-For-Review, and 2 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2447037 (biplabanand) >>! In T137973#2440559, @Pokefan95 wrote: >>>! In T137973#2440540, @biplabanand wrote: >>>>! In T137973#2...
[12:14:12] !log restarted hhvm on mw1261
[12:14:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:16:02] I didn't find any culprit, hhvm seemed not working (forgot to dump debug messages sorry).
I jumped on it since mw1261 is running a patched version of httpd, but it was not related
[12:23:47] (PS1) Yuvipanda: Add python2 webservice support [software/tools-webservice] - https://gerrit.wikimedia.org/r/298270 (https://phabricator.wikimedia.org/T139783)
[12:24:31] !log upgrading ulsfo cache_maps to varnish 4.1.3-1wm1
[12:24:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:24:39] (PS2) Yuvipanda: Add python2 webservice support [software/tools-webservice] - https://gerrit.wikimedia.org/r/298270 (https://phabricator.wikimedia.org/T139783)
[12:27:22] (CR) Merlijn van Deen: [C: 1] "lgtm although 256M might be on the low side" [software/tools-webservice] - https://gerrit.wikimedia.org/r/298270 (https://phabricator.wikimedia.org/T139783) (owner: Yuvipanda)
[12:28:23] (CR) Yuvipanda: [C: 2] Add python2 webservice support [software/tools-webservice] - https://gerrit.wikimedia.org/r/298270 (https://phabricator.wikimedia.org/T139783) (owner: Yuvipanda)
[12:28:45] PROBLEM - Apache HTTP on mw1261 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50404 bytes in 0.025 second response time
[12:28:58] (Merged) jenkins-bot: Add python2 webservice support [software/tools-webservice] - https://gerrit.wikimedia.org/r/298270 (https://phabricator.wikimedia.org/T139783) (owner: Yuvipanda)
[12:29:41] elukey: looks like whatever it was is occurring again
[12:31:15] RECOVERY - Apache HTTP on mw1261 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.045 second response time
[12:33:34] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy
[12:33:44] RECOVERY - AQS root url on aqs1004 is OK: HTTP OK: HTTP/1.1 200 - 727 bytes in 0.007 second response time
[12:33:53] checking mw1261 again
[12:34:41] (PS12) Yuvipanda: Add python2 base + web image [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/297624
[12:35:39] (CR) Yuvipanda: [C: 2]
Add python2 base + web image [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/297624 (owner: Yuvipanda)
[12:35:53] (CR) KartikMistry: "Please see, https://phabricator.wikimedia.org/T138524#2446680" [mediawiki-config] - https://gerrit.wikimedia.org/r/298187 (https://phabricator.wikimedia.org/T139903) (owner: Urbanecm)
[12:36:22] (Merged) jenkins-bot: Add python2 base + web image [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/297624 (owner: Yuvipanda)
[12:41:30] Operations, Ops-Access-Requests: analytics server access request for three users from CPS Data Consulting - https://phabricator.wikimedia.org/T139764#2447081 (Jgreen) >>! In T139764#2446832, @fgiunchedi wrote: > If I understand correctly this is also shell access, in that case the steps outlined at h...
[12:45:32] Operations, Ops-Access-Requests: analytics server access request for three users from CPS Data Consulting - https://phabricator.wikimedia.org/T139764#2447082 (Jgreen) >>! In T139764#2446836, @MoritzMuehlenhoff wrote: > @Jgreen Does the consulting contract have a determined end? We need some kind of feed...
[12:51:06] !log upgrading nodejs to 4.4.6 on maps2.* servers
[12:51:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:01:40] (PS7) Elukey: Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[13:02:16] elukey: what happened with aqs?
[13:02:20] with the deploy i mean
[13:05:23] mobrovac: o/
[13:06:24] the new cluster has different settings (joal has more context) and I wasn't able to check the logs to see what was happening..
I didn't find any error message in logstash->restbase so I disabled logstash logging in the aqs1004 config and checked via journalctl
[13:06:24] (PS1) Hashar: contint: migrate coverage report under doc.wm.o [puppet] - https://gerrit.wikimedia.org/r/298274 (https://phabricator.wikimedia.org/T139620)
[13:07:04] the nodejs processes were churning a lot (fail/respawn) and I think that they were unable to log on logstash
[13:07:13] Operations, GlobalRename, MediaWiki-extensions-CentralAuth, Patch-For-Review, and 2 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2447138 (K6ka) Can we start renaming again? Recent renames seem to be going through, but I'm not getting any "official" confirm...
[13:07:13] but I might be wrong, very ignorant
[13:07:19] (CR) jenkins-bot: [V: -1] contint: migrate coverage report under doc.wm.o [puppet] - https://gerrit.wikimedia.org/r/298274 (https://phabricator.wikimedia.org/T139620) (owner: Hashar)
[13:07:38] !log upgrading esams cache_maps to varnish 4.1.3-1wm1
[13:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:07:45] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: puppet fail
[13:09:34] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2447139 (Gehel)
[13:12:49] (CR) Elukey: "After Ema's review I noticed that varnishncsa does offer a VSL formatter with a very different semantic (https://www.varnish-cache.org/doc" [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652 (owner: Elukey)
[13:13:23] (CR) Hashar: "Has to be synced with a couple other changes:" [puppet] - https://gerrit.wikimedia.org/r/298274 (https://phabricator.wikimedia.org/T139620) (owner: Hashar)
[13:16:32] (PS8) Elukey: Add the -T VSL API timeout parameter plus the related
formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[13:18:32] Operations, DBA, Patch-For-Review, Prometheus-metrics-monitoring: implement performance_schema for mysql monitoring - https://phabricator.wikimedia.org/T99485#2447148 (jcrespo)
[13:19:10] Operations, Traffic, Continuous-Integration-Infrastructure (phase-out-gallium): Move gallium to an internal host? - https://phabricator.wikimedia.org/T133150#2447151 (hashar) integration.wikimedia.org (with Zuul and Jenkins) is going to migrate to scandium.eqiad.wmnet doc.wikimedia.org is looking fo...
[13:19:55] Operations, vm-requests, Patch-For-Review, Prometheus-metrics-monitoring: eqiad/codfw: 4 VM request for prometheus - https://phabricator.wikimedia.org/T136313#2447153 (jcrespo)
[13:20:08] Operations, DBA, Prometheus-metrics-monitoring: Decide storage backend for performance schema monitoring stats - https://phabricator.wikimedia.org/T119619#2447154 (jcrespo)
[13:21:37] (CR) Ema: [C: 1] Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652 (owner: Elukey)
[13:31:47] !log rebooting acamar for kernel update
[13:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:34:14] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[13:37:26] PROBLEM - PyBal backends health check on lvs2006 is CRITICAL: PYBAL CRITICAL - apaches_80 - Could not depool server mw2193.codfw.wmnet because of too many down!: kartotherian_6533 - Could not depool server maps2004.codfw.wmnet because of too many down!: restbase_7231 - Could not depool server restbase2003.codfw.wmnet because of too many down!
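Editorial note: the "Could not depool server ... because of too many down!" alert indicates that the load balancer refused to remove a failing backend because too few servers would remain pooled. A minimal sketch of that guard follows. The function names, the 0.5 threshold, and the host names are hypothetical and are not taken from PyBal's source or configuration.

```python
from typing import Iterable, List, Set, Tuple


def can_depool(total: int, pooled: int, depool_threshold: float) -> bool:
    """True if removing one more server keeps pooled >= threshold * total."""
    return (pooled - 1) >= total * depool_threshold


def apply_health_results(
    servers: Iterable[str],
    down: Set[str],
    depool_threshold: float = 0.5,  # assumed value, for illustration only
) -> Tuple[List[str], List[str]]:
    """Depool failing servers until the threshold would be violated.

    Returns the servers left in the pool and the refusal messages.
    """
    all_servers = list(servers)
    pooled = list(all_servers)
    messages = []
    for name in all_servers:
        if name not in down:
            continue
        if can_depool(len(all_servers), len(pooled), depool_threshold):
            pooled.remove(name)
        else:
            # The server is failing checks but stays pooled to protect capacity.
            messages.append(
                f"Could not depool server {name} because of too many down!"
            )
    return pooled, messages
```

With four hypothetical backends and three of them failing, the first two are depooled and the third is kept in rotation with a refusal message, which matches the shape of the alert above.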
[13:39:04] PROBLEM - PyBal backends health check on lvs2003 is CRITICAL: PYBAL CRITICAL - rendering_80 - Could not depool server mw2087.codfw.wmnet because of too many down!: apaches_80 - Could not depool server mw2187.codfw.wmnet because of too many down!
[13:39:27] <_joe_> moritzm: ^^ you right
[13:39:54] RECOVERY - PyBal backends health check on lvs2006 is OK: PYBAL OK - All pools are healthy
[13:39:56] no, not me?
[13:40:17] I upgraded the canaries only so far
[13:40:30] (in eqiad)
[13:41:25] RECOVERY - PyBal backends health check on lvs2003 is OK: PYBAL OK - All pools are healthy
[13:45:30] (CR) Ottomata: "I think it would be fine to put eventlogging-admins in the webperf role hiera." [puppet] - https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: Dzahn)
[13:46:11] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2447205 (BBlack) The cutoff date is coming up tomorrow! One more list update, from the past 48H: New usernames not seen before: ``` HWY...
[13:49:31] (CR) Ottomata: "eventlogging::package doesn't exist anymore. webperf classes do use eventlogging code, currently deployed via trebuchet.
We are waiting " [puppet] - https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: Dzahn)
[13:54:43] (CR) BBlack: [C: 1] "Actual deployment may be tricky - we may have to stop pybal and clear the relevant ipvs services completely via ipvsadm before restarting " [puppet] - https://gerrit.wikimedia.org/r/297418 (https://phabricator.wikimedia.org/T108827) (owner: Ema)
[13:59:02] (CR) Gehel: [C: 1] logstash: Normalize a few more fields [puppet] - https://gerrit.wikimedia.org/r/298242 (owner: EBernhardson)
[13:59:46] (PS1) Andrew Bogott: Desigate policy: Allow projectadmins to delete domains [puppet] - https://gerrit.wikimedia.org/r/298280
[14:00:14] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is OK: OK - nfs-exports is active
[14:00:29] Operations, Analytics, MediaWiki-extensions-CentralNotice, Traffic: Generate a list of junk CN cookies being sent by clients - https://phabricator.wikimedia.org/T132374#2447244 (BBlack) Yes, we can help wipe these out at the Varnish layer, by unsetting blacklisted cookies we see. We've done that...
[14:01:38] Operations, ops-eqiad, Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2447245 (Cmjohnson) Raid 10 is setup.
[14:01:53] Operations, ops-eqiad, Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2447246 (Cmjohnson)
[14:03:01] (PS1) Giuseppe Lavagetto: mediawiki: remove the inactive appservers from conftool-data, dsh [puppet] - https://gerrit.wikimedia.org/r/298281 (https://phabricator.wikimedia.org/T139353)
[14:03:03] (PS1) Giuseppe Lavagetto: puppet: remove all references to the decommissioned appservers [puppet] - https://gerrit.wikimedia.org/r/298282 (https://phabricator.wikimedia.org/T139353)
[14:04:25] (CR) Ottomata: WIP DRAFT WMDE_Analytics module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/269467 (owner: Addshore)
[14:04:27] (CR) Giuseppe Lavagetto: [C: 2] "servers have been out of rotation for a week, it's time to decom them." [puppet] - https://gerrit.wikimedia.org/r/298281 (https://phabricator.wikimedia.org/T139353) (owner: Giuseppe Lavagetto)
[14:04:34] (CR) Giuseppe Lavagetto: [V: 2] "servers have been out of rotation for a week, it's time to decom them." [puppet] - https://gerrit.wikimedia.org/r/298281 (https://phabricator.wikimedia.org/T139353) (owner: Giuseppe Lavagetto)
[14:05:21] !log rebooting achernar for kernel update
[14:05:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:07:53] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 4 others: Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) - https://phabricator.wikimedia.org/T45250#2447263 (f...
[14:08:46] Operations: Randomly failing puppetmaster sync to strontium - https://phabricator.wikimedia.org/T128895#2447266 (fgiunchedi)
[14:10:42] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447267 (chasemp)
[14:10:49] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447280 (chasemp) p:Triage>High
[14:12:17] Operations, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, Traffic, and 4 others: Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) - https://phabricator.wikimedia.org/T45250#2447284 (B...
[14:13:13] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447290 (chasemp)
[14:15:02] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447292 (hashar)
[14:15:13] (PS12) Addshore: Statistics::wmde puppetization [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989)
[14:15:24] Operations, media-storage, Patch-For-Review: 'swift' user/group IDs should be consistent across the fleet - https://phabricator.wikimedia.org/T123918#2447306 (fgiunchedi) a:fgiunchedi
[14:15:31] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447307 (hashar) Follow up a checkin last week with @thcipriani and @chasemp
[14:15:44] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447310 (hashar)
[14:15:46] Operations, DC-Ops,
Continuous-Integration-Infrastructure (phase-out-gallium): Can scandium.eqiad.wmnet receives a couple 500G hard drive in a RAID 1 array? - https://phabricator.wikimedia.org/T138955#2447309 (hashar)
[14:16:18] Operations, DC-Ops, Continuous-Integration-Infrastructure (phase-out-gallium): Can scandium.eqiad.wmnet receives a couple 500G hard drive in a RAID 1 array? - https://phabricator.wikimedia.org/T138955#2414901 (hashar) Open>stalled I have filled #hardware-requests {T139938} to get us a couple...
[14:16:29] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447292 (hashar)
[14:16:59] (PS13) Addshore: Statistics::wmde puppetization [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989)
[14:17:18] (PS1) Andrew Bogott: Assign IPs to labvirt1012, 1013, 1014 [dns] - https://gerrit.wikimedia.org/r/298284 (https://phabricator.wikimedia.org/T138509)
[14:18:22] (CR) Andrew Bogott: [C: 2] Assign IPs to labvirt1012, 1013, 1014 [dns] - https://gerrit.wikimedia.org/r/298284 (https://phabricator.wikimedia.org/T138509) (owner: Andrew Bogott)
[14:21:00] (CR) Ottomata: Statistics::wmde puppetization (2 comments) [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) (owner: Addshore)
[14:21:45] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447334 (chasemp) scandium as it turns out is a very old server which I'm not sure makes sense to spend time migratin...
[14:23:26] Operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#2447336 (fgiunchedi) ldap backups are now enabled only on serpens, shall we enable it on all machines in codfw and eqiad too?
[14:24:02] (CR) Addshore: Statistics::wmde puppetization (2 comments) [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) (owner: Addshore)
[14:24:04] (PS14) Addshore: Statistics::wmde puppetization [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989)
[14:24:46] (PS2) Giuseppe Lavagetto: puppet: remove all references to the decommissioned appservers [puppet] - https://gerrit.wikimedia.org/r/298282 (https://phabricator.wikimedia.org/T139353)
[14:27:53] (CR) Ottomata: [C: 1] "One tiny nit about param doc, other than that +1" (1 comment) [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652 (owner: Elukey)
[14:28:36] (CR) Giuseppe Lavagetto: [C: 2 V: 2] puppet: remove all references to the decommissioned appservers [puppet] - https://gerrit.wikimedia.org/r/298282 (https://phabricator.wikimedia.org/T139353) (owner: Giuseppe Lavagetto)
[14:28:51] (CR) Elukey: Add the -T VSL API timeout parameter plus the related formatter. (1 comment) [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652 (owner: Elukey)
[14:31:10] (PS9) Elukey: Add the -T VSL API timeout parameter plus the related formatter. [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/295652
[14:31:24] <_joe_> any alarm on the icinga config will be fixed by the next puppet run on neon
[14:32:22] Operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#2447373 (MoritzMuehlenhoff) I think that makes sense. For the OIT mirrors we also ended up having a backup for both syncrepl endpoints since the slapd data is so tiny. I'll make a patch.
[14:34:56] PROBLEM - mediawiki-installation DSH group on mw1151 is CRITICAL: Host mw1151 is not in mediawiki-installation dsh group
[14:37:06] Operations, Patch-For-Review: revisit swift (sys)logging - https://phabricator.wikimedia.org/T137397#2447390 (fgiunchedi) p:Triage>Normal
[14:39:11] <_joe_> !log shutting down mw1090-1113,mw1149-51 for decommissioning
[14:39:12] (CR) Stang: [C: 1] Allow to import from zh.wikipedia to beta.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/298266 (https://phabricator.wikimedia.org/T139922) (owner: Dereckson)
[14:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:40:01] (PS1) Ema: package_builder: install WMF lintian profile file [puppet] - https://gerrit.wikimedia.org/r/298286
[14:41:00] (PS1) Muehlenhoff: Run slapd backup on both labs LDAP servers [puppet] - https://gerrit.wikimedia.org/r/298287
[14:41:14] Operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#2447396 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[14:41:44] Operations, Labs, Labs-Infrastructure: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2447400 (MoritzMuehlenhoff)
[14:41:46] Operations: Add openldap/labs servers to backup - https://phabricator.wikimedia.org/T120919#1864777 (MoritzMuehlenhoff) Open>Resolved The LDAP servers for OIT and labs are part of the backup for a while now, closing.
[14:44:33] Operations: netmon1001 daily logrotate cronspam - https://phabricator.wikimedia.org/T139943#2447415 (fgiunchedi)
[14:46:56] !log upgrading cache_misc to varnish 4.1.3-1wm1
[14:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:01] (PS1) Gehel: Maps - enable tileratorui notification of new expier files [puppet] - https://gerrit.wikimedia.org/r/298288 (https://phabricator.wikimedia.org/T139451)
[14:49:51] (PS15) Addshore: Statistics::wmde puppetization [puppet] - https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989)
[14:51:30] Operations, Services, Patch-For-Review, User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2447442 (Gehel) Maps servers tested and updated to nodejs 4.4.6. @Yurik, @MaxSem let me know if you see anything unusual.
[14:51:45] Operations, Services, Patch-For-Review, User-mobrovac: Updates various services to nodejs 4.4.6 - https://phabricator.wikimedia.org/T138561#2447444 (Gehel)
[14:55:41] (PS1) Filippo Giunchedi: mediawiki: add delaycompress for jobrunner logs [puppet] - https://gerrit.wikimedia.org/r/298289 (https://phabricator.wikimedia.org/T132324)
[14:57:13] (PS1) Andrew Bogott: Include access_new_install on labcontrol1001. [puppet] - https://gerrit.wikimedia.org/r/298290 (https://phabricator.wikimedia.org/T138509)
[15:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160711T1500).
[15:00:04] Urbanecm, Dereckson, and Dereckson: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:46] Hi
[15:01:00] Around
[15:01:36] (CR) Elukey: [C: 1] "Thanks!"
[puppet] - https://gerrit.wikimedia.org/r/298289 (https://phabricator.wikimedia.org/T132324) (owner: Filippo Giunchedi)
[15:01:42] I can SWAT today.
[15:02:32] (PS4) Thcipriani: Change Albanian Wikiquote logo [mediawiki-config] - https://gerrit.wikimedia.org/r/297195 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:02:51] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297195 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:03:29] (Merged) jenkins-bot: Change Albanian Wikiquote logo [mediawiki-config] - https://gerrit.wikimedia.org/r/297195 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:04:29] RECOVERY - MD RAID on labstore2001 is OK: OK: Active: 11, Working: 11, Failed: 0, Spare: 0
[15:05:30] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447554 (Papaul) @chasemp yes i have 3*2TB SAS disks in spare
[15:05:49] RECOVERY - MegaRAID on labstore2001 is OK: OK: optimal, 11 logical, 11 physical
[15:06:26] !log thcipriani@tin Synchronized static/images/project-logos/sqwikiquote.png: SWAT: [[gerrit:297195|Change Albanian Wikiquote logo (T139229)]] (duration: 00m 34s)
[15:06:27] T139229: Change Albanian Wikiquote logo - https://phabricator.wikimedia.org/T139229
[15:06:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:06:55] !log removing mw114[0-8] from service via conftool as first decom step (T139353)
[15:06:56] T139353: Decommission all old mediawiki appservers in eqiad - https://phabricator.wikimedia.org/T139353
[15:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:07:02] ^ Urbanecm logo purged check please
[15:07:05] Operations, hardware-requests, Continuous-Integration-Infrastructure (phase-out-gallium): eqiad: 2 300GB SSD disks for scandium.eqiad.wmnet - https://phabricator.wikimedia.org/T139938#2447562
(hashar) contint1001 was setup in the production network when it would need to be in the labs support host ne...
[15:07:56] It seems working.
[15:09:05] Urbanecm: it looks like you added an 'i' after the comment here, could you fix that? https://gerrit.wikimedia.org/r/#/c/297196/2/wmf-config/InitialiseSettings.php
[15:09:47] e.g. #i Wikiquote vs # Wikiquote
[15:09:50] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2447582 (Danny_B)
[15:10:16] thcipriani: Sure, working on it...
[15:10:22] thanks :)
[15:10:31] (PS2) Thcipriani: Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/297279 (owner: Urbanecm)
[15:10:49] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297279 (owner: Urbanecm)
[15:11:26] (Merged) jenkins-bot: Remove old throttle rules [mediawiki-config] - https://gerrit.wikimedia.org/r/297279 (owner: Urbanecm)
[15:11:39] Operations, Performance-Team, Traffic, Patch-For-Review, perfnotice: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2447594 (BBlack) {F4262192}
[15:12:26] (CR) EBernhardson: "very much it could, yes."
[puppet] - https://gerrit.wikimedia.org/r/298242 (owner: EBernhardson)
[15:12:34] (PS3) Urbanecm: HD version for sqwikiquote's logos [mediawiki-config] - https://gerrit.wikimedia.org/r/297196 (https://phabricator.wikimedia.org/T139229)
[15:12:59] !log thcipriani@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:297279|Remove old throttle rules]] (duration: 00m 30s)
[15:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:13:08] thcipriani: Done
[15:13:16] * thcipriani looks
[15:13:42] (PS2) Filippo Giunchedi: mediawiki: add delaycompress for jobrunner logs [puppet] - https://gerrit.wikimedia.org/r/298289 (https://phabricator.wikimedia.org/T132324)
[15:13:49] (CR) Filippo Giunchedi: [C: 2 V: 2] mediawiki: add delaycompress for jobrunner logs [puppet] - https://gerrit.wikimedia.org/r/298289 (https://phabricator.wikimedia.org/T132324) (owner: Filippo Giunchedi)
[15:14:34] Urbanecm: looks like you deleted the space between the '#' and the name of the wiki now, could you add that back? Sorry to be a pedant...
[15:15:31] (PS3) Thcipriani: Add contentdm.lib.byu.edu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/296882 (https://phabricator.wikimedia.org/T139095) (owner: Dereckson)
[15:15:44] (PS4) Urbanecm: HD version for sqwikiquote's logos [mediawiki-config] - https://gerrit.wikimedia.org/r/297196 (https://phabricator.wikimedia.org/T139229)
[15:16:03] thcipriani: ^
[15:16:09] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/296882 (https://phabricator.wikimedia.org/T139095) (owner: Dereckson)
[15:16:56] (Merged) jenkins-bot: Add contentdm.lib.byu.edu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/296882 (https://phabricator.wikimedia.org/T139095) (owner: Dereckson)
[15:17:13] (PS5) Thcipriani: HD version for sqwikiquote's logos [mediawiki-config] - https://gerrit.wikimedia.org/r/297196 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:17:57] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297196 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:18:24] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447615 (Papaul) a:Papaul>chasemp Disk replacement complete.
[15:18:34] (Merged) jenkins-bot: HD version for sqwikiquote's logos [mediawiki-config] - https://gerrit.wikimedia.org/r/297196 (https://phabricator.wikimedia.org/T139229) (owner: Urbanecm)
[15:19:19] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:296882|Add contentdm.lib.byu.edu to wgCopyUploadsDomains (T139095)]] (duration: 00m 26s)
[15:19:20] T139095: Please add contentdm.lib.byu.edu to $wgCopyUploadsDomains - https://phabricator.wikimedia.org/T139095
[15:19:22] ^ Dereckson check please
[15:19:41] Operations, GlobalRename, MediaWiki-extensions-CentralAuth, Patch-For-Review, and 2 others: GlobalRename gets stuck sometimes - https://phabricator.wikimedia.org/T137973#2447626 (Steinsplitter) Update from IRC: ``` legoktm set the topic: (...) | Status: <10 concurrent renames plz ``` Which mean...
[15:21:17] yuvipanda: meeting :)
[15:23:30] thcipriani: works
[15:23:37] Dereckson: thanks for checking
[15:23:51] Sorry for the day I struggled with URLs with a lot of ? and an ajax helper script to "zoom" on an image.
[15:24:42] I hope they've URL to their own direct files.
[15:24:59] !log thcipriani@tin Synchronized static/images/project-logos: SWAT: [[gerrit:297196|HD version for sqwikiquote logos (T139229)]] PART I (duration: 00m 25s)
[15:25:00] T139229: Change Albanian Wikiquote logo - https://phabricator.wikimedia.org/T139229
[15:25:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:25:36] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:297196|HD version for sqwikiquote logos (T139229)]] PART II (duration: 00m 27s)
[15:25:37] T139229: Change Albanian Wikiquote logo - https://phabricator.wikimedia.org/T139229
[15:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:25:41] ^ Urbanecm check please
[15:27:28] Files are present. I have no retina display so I can't fully check.
[15:27:48] (PS2) Thcipriani: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/298028 (owner: Dereckson)
[15:28:51] Urbanecm: ack. I have a hidpi display. Looking good! Thanks for all the revisions :)
[15:28:53] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2447676 (Aklapper) p:Triage>High Didn't receive it either. Needs someone to run manually to check whether some SQL query barks after DB schema changes or whether it's m...
[15:29:20] You're welcome.
[15:29:26] Thanks for deploying!
[15:29:28] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/298028 (owner: Dereckson)
[15:30:05] (Merged) jenkins-bot: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/298028 (owner: Dereckson)
[15:32:58] !log thcipriani@tin Synchronized wmf-config/interwiki.php: SWAT: [[gerrit:298028|Update interwiki map]] (duration: 00m 28s)
[15:33:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:33:04] ^ Dereckson check please
[15:33:42] Works.
[15:33:59] (PS3) Thcipriani: State Compact Language Links isn't beta anymore [mediawiki-config] - https://gerrit.wikimedia.org/r/298134 (https://phabricator.wikimedia.org/T136677) (owner: Dereckson)
[15:34:26] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/298134 (https://phabricator.wikimedia.org/T136677) (owner: Dereckson)
[15:34:51] This one is no op.
[15:35:16] yarp.
[15:35:24] (Merged) jenkins-bot: State Compact Language Links isn't beta anymore [mediawiki-config] - https://gerrit.wikimedia.org/r/298134 (https://phabricator.wikimedia.org/T136677) (owner: Dereckson)
[15:37:19] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:298134|State Compact Language Links is not beta anymore (T136677)]] (duration: 00m 26s)
[15:37:20] T136677: Deployment of Compact Language Links - https://phabricator.wikimedia.org/T136677
[15:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:37:35] (PS2) Thcipriani: Allow to import from zh.wikipedia to beta.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/298266 (https://phabricator.wikimedia.org/T139922) (owner: Dereckson)
[15:37:57] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/298266 (https://phabricator.wikimedia.org/T139922) (owner: Dereckson)
[15:38:34] (Merged) jenkins-bot: Allow to import from zh.wikipedia to beta.wikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/298266 (https://phabricator.wikimedia.org/T139922) (owner: Dereckson)
[15:40:30] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:298266|Allow to import from zh.wikipedia to beta.wikiversity (T139922)]] (duration: 00m 26s)
[15:40:31] T139922: Add zhwiki to betawikiversity import sources. - https://phabricator.wikimedia.org/T139922
[15:40:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:40:38] Can't test this one, will add a request to original requester to ensuire zh.wikipedia appears in Special:Import.
[15:40:45] Dereckson: ack, thanks.
[15:41:07] (PS4) Thcipriani: Update logo settings for the Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/297134 (https://phabricator.wikimedia.org/T139240) (owner: Odder)
[15:41:09] PROBLEM - HHVM rendering on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:30] PROBLEM - Apache HTTP on mw1170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:41:42] checking --^
[15:42:27] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/297134 (https://phabricator.wikimedia.org/T139240) (owner: Odder)
[15:43:13] (Merged) jenkins-bot: Update logo settings for the Nepali Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/297134 (https://phabricator.wikimedia.org/T139240) (owner: Odder)
[15:43:33] Operations, hardware-requests: Find and rack 2 EX4200s in rack c1-eqiad - https://phabricator.wikimedia.org/T139752#2447740 (Papaul)
[15:43:35] Operations, ops-codfw, hardware-requests: codfw: add all spare network switches to hardware spares tracking - https://phabricator.wikimedia.org/T139776#2447738 (Papaul) Open>Resolved COMPLETE
[15:43:48] RECOVERY - Apache HTTP on mw1170 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.066 second response time
[15:43:50] !log restarted hhvm on mw1170 (Apache errors while reading FCGI headers, HHVM dump debug in /tmp/hhvm.14968.bt.)
[15:43:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:44:26] low level lock in HPHP::Treadmill::getAgeOldestRequest () from /usr/bin/hhvm
[15:45:39] RECOVERY - HHVM rendering on mw1170 is OK: HTTP OK: HTTP/1.1 200 OK - 72011 bytes in 0.224 second response time
[15:46:44] !log thcipriani@tin Synchronized static/images/project-logos: SWAT: [[gerrit:297134|Update logo settings for the Nepali Wikipedia (T139240)]] PART I (duration: 00m 27s)
[15:46:45] T139240: Update Nepali Wikipedia logo - https://phabricator.wikimedia.org/T139240
[15:46:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:46:51] Checking.
[15:47:21] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:297134|Update logo settings for the Nepali Wikipedia (T139240)]] PART II (duration: 00m 26s)
[15:47:22] T139240: Update Nepali Wikipedia logo - https://phabricator.wikimedia.org/T139240
[15:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:48:56] https://ne.wikipedia.org/static/images/project-logos/newiki.png?debug=true is ok
[15:49:11] https://ne.wikipedia.org/static/images/project-logos/newiki.png too
[15:49:14] Perfect.
[15:50:54] Dereckson: cool, thanks for checking. Can confirm hidpi seems to be working on my display.
[15:51:44] Operations, Traffic, fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2447784 (BBlack)
[15:52:07] (PS1) Cmjohnson: Adding dhcpd entries for mc1019-1036 [puppet] - https://gerrit.wikimedia.org/r/298294
[15:52:38] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[15:54:19] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[15:54:50] Operations, Traffic, fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2447794 (BBlack) @Jgreen thanks for working on this! I've re-audited all the Fundraising wikimedia.org hostnames, updated https://wikitech.wikimed...
[15:55:26] thcipriani: okay, nice. Thanks for deploying.
[15:55:29] (PS1) BryanDavis: logstash: Update default mappings for Elasticsearch 2.x [puppet] - https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001)
[15:58:07] (PS2) BryanDavis: logstash: Update default mappings for Elasticsearch 2.x [puppet] - https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001)
[15:58:39] (CR) BryanDavis: logstash: Update default mappings for Elasticsearch 2.x (1 comment) [puppet] - https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) (owner: BryanDavis)
[15:58:59] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[15:59:38] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:00:04] Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447812 (BBlack) I'd like to share keys in the long run, but I think sh for
port 80 is the right move for now. It will also clear up confusion on our TFO success/fail stats in ge...
[16:00:09] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2447582 (greg) The cron was disabled in prep for the DB upgrade. See T138460#2437836
[16:03:02] Operations, Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2447821 (Danny_B) So can it be at least ran manually now, please?
[16:03:08] (PS9) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[16:05:41] James_F: ^^ updated, There hasn't been 1 design review, but it has been reviewed for design throughout. Also there hasn't been a 'performance' review, impact is minimal, and again has just been reviewed throughout. I would guess performance is also covered in the sec review
[16:06:47] Oh, sure, I don't think that the design review is an issue.
[16:07:11] I think the checklist is done then!
[16:07:34] Can you add it to https://www.mediawiki.org/wiki/Beta_Features/Gallery ? :-)
[16:07:50] (PS10) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[16:07:50] yup
[16:08:24] (CR) Jforrester: [C: 1] "Good to ship."
[mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[16:08:36] shame I missed morning swat :D
[16:11:32] addshore: there is always an evening SWAT :D
[16:11:35] Operations, Traffic, Wikimedia-Blog, HTTPS: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2447858 (BBlack) If there's no real cost to do so, it would be ideal to ask them to switch our VIP for blog.wm.o to HTTPS-by-default and LetsEncrypt (as the latter will save us some m...
[16:11:49] Luke081515: indeed, but that means staying up late ;)
[16:12:07] morning swat is 4pm swat for me ;)
[16:13:07] addshore: 5 pm for me ;)
[16:16:05] (CR) Addshore: "@JanZerebecki could you please remove you -2" [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[16:16:18] jzerebecki: ^^
[16:18:00] (CR) Cmjohnson: [C: 2] Adding dhcpd entries for mc1019-1036 [puppet] - https://gerrit.wikimedia.org/r/298294 (owner: Cmjohnson)
[16:19:12] addshore: done
[16:19:15] (CR) JanZerebecki: [C: 1] Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943) (owner: Addshore)
[16:19:19] (CR) Aude: [C: 1] Add Cape Verdean Creole (kea) as extra language for wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: Thiemo Mättig (WMDE))
[16:19:20] Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447888 (BBlack) Another thing just occurred to me though - until we switch port 80 to nginx or patch our varnish, we don't have TFO support on port 80 regardless, as varnish does...
[16:20:10] Operations, Traffic, Patch-For-Review: Switch port 80 to nginx on primary clusters - https://phabricator.wikimedia.org/T107236#2447893 (BBlack)
[16:20:12] Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447892 (BBlack)
[16:22:06] Operations, Traffic, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447901 (ema) >>! In T108827#2447888, @BBlack wrote: > Another thing just occurred to me though - until we switch port 80 to nginx or patch our varnish, we don't have TFO support...
[16:22:39] (CR) Aude: "this is scheduled for swat later today" [mediawiki-config] - https://gerrit.wikimedia.org/r/297964 (https://phabricator.wikimedia.org/T136814) (owner: Thiemo Mättig (WMDE))
[16:22:48] (CR) Aude: "this is scheduled for swat later today" [mediawiki-config] - https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: Thiemo Mättig (WMDE))
[16:22:55] (PS10) EBernhardson: Duplicate logstash output to alternate elasticsearch cluster [puppet] - https://gerrit.wikimedia.org/r/295442
[16:23:22] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2447906 (Whatamidoing-WMF) I've contacted the newest three. I'm going to post general messages to all the WP:BOTN pages and a few VPTs a...
[16:23:31] (PS1) Alex Monk: deployment-prep: Point upload cache at swift, fix rewrite.py to use beta.wmflabs.org domains [puppet] - https://gerrit.wikimedia.org/r/298297 (https://phabricator.wikimedia.org/T64835)
[16:27:44] (PS2) Alex Monk: deployment-prep: Point upload cache at swift, fix rewrite.py to use beta.wmflabs.org domains [puppet] - https://gerrit.wikimedia.org/r/298297 (https://phabricator.wikimedia.org/T64835)
[16:31:25] Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2447957 (Gilles) It might be a good idea to experiment with this locally using our real content, to see what kind of gains we'd be looking at. SDCH+gzip might be worth looking into as well....
[16:33:13] (CR) BBlack: [C: 1] debug_proxy: Limit to production networks [puppet] - https://gerrit.wikimedia.org/r/297982 (owner: Muehlenhoff)
[16:36:35] Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2447965 (BBlack) I agree that SDCH has better upsides (for supporting clients), it just also seems like a much larger effort to turn it on and get it tuned, and I have no idea how we'd integr...
[16:37:24] Operations, Labs: Failed drive in labstore2001 array - https://phabricator.wikimedia.org/T139937#2447968 (chasemp) this array is resyncing now, I used my notes from https://phabricator.wikimedia.org/T127076#2067539. a few pointers here. Find present adapters: megacli -CfgDsply -a0 | grep Adapter this...
[16:40:24] (PS1) Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835)
[16:41:10] (CR) BryanDavis: "Cherry-picked to deployment-puppetmaster and used to update the mapping loaded in the cluster.
We will be able to see after GMT midnight i" [puppet] - https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) (owner: BryanDavis)
[16:41:11] Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2448019 (Gilles) Actually it's probably Linkedin, not Facebook that this guy works for. I pieced it together from his HN history, he oftens comments on Apache Traffic Server, which Linkedin i...
[16:42:45] (CR) BryanDavis: "Alternate approach in I638d88e1d874fdb8be211bd74a1e36998d42dc09 that I'm going to test on the beta cluster. That change is intended to for" [puppet] - https://gerrit.wikimedia.org/r/298242 (owner: EBernhardson)
[16:43:01] Operations, Traffic: Evaluate Apache Traffic Server - https://phabricator.wikimedia.org/T96853#2448043 (BBlack) http://www.slideshare.net/thenickberry/reflecting-a-year-after-migrating-to-apache-traffic-server
[16:43:16] Operations, Performance-Team, Traffic: Support brotli compression - https://phabricator.wikimedia.org/T137979#2448048 (BBlack) Nice ATS link! Added to T96853
[16:43:38] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[16:44:09] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:46:11] Operations, Commons, media-storage, User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2448093 (kaldari) @MoritzMuehlenhoff: The bug definitely shows up for me at https://upload.wikimedia.org/wikipedia/commons/t...
[16:48:18] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:48:48] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:50:36] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 6x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2448119 (10Eevans) [16:50:46] (03PS1) 10Andrew Bogott: Replace labvirt1011 to the nova scheduling pool. [puppet] - 10https://gerrit.wikimedia.org/r/298300 [16:51:15] (03CR) 10Yuvipanda: [C: 031] Replace labvirt1011 to the nova scheduling pool. [puppet] - 10https://gerrit.wikimedia.org/r/298300 (owner: 10Andrew Bogott) [16:51:36] (03PS2) 10Andrew Bogott: Replace labvirt1011 to the nova scheduling pool. [puppet] - 10https://gerrit.wikimedia.org/r/298300 [16:52:57] !log unclog the phabricator task queue (phd) by cherry-picking upstream fix 12c6f87ca to wmf/stable (+restarted phd on iridium) [16:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:53:39] (03CR) 10Andrew Bogott: [C: 032] Replace labvirt1011 to the nova scheduling pool. [puppet] - 10https://gerrit.wikimedia.org/r/298300 (owner: 10Andrew Bogott) [16:54:32] 06Operations, 10Phabricator: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2448166 (10mmodell) It was disabled because work running on the database slave would interfere with the upgrade. Running the report from a master database might be too much for it... 
[16:55:09] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2448175 (10MoritzMuehlenhoff) This was approved in the ops meeting [16:56:40] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2241419 (10Yurik) [16:58:06] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2448191 (10Urbanecm) [16:58:18] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2241419 (10Urbanecm) [16:58:32] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448199 (10BBlack) @Whatamidoing-WMF Thanks! I'm still getting caught up a bit from being on vacation.... The original plan (and still th... [16:59:53] 06Operations: (www.)wmfusercontent.org should respond to HTTP - https://phabricator.wikimedia.org/T104735#2448220 (10Mike_Peel) Thanks @Bawolff! wmfusercontent.org now works as expected. However, accessing https://phab.wmfusercontent.org/ gives the error message: Unhandled Exception ("Exception") This Phabricat... [17:00:05] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160711T1700). [17:01:08] thcipriani: fyi, I'm doing the second scap3 deployment on WDQS. Please cross fingers with me :P [17:01:18] *crosses fingers* [17:01:38] * thcipriani crosses fingers [17:01:52] (03PS1) 10Andrew Bogott: TEMPORARY HACK: Add access_new_install to iron [puppet] - 10https://gerrit.wikimedia.org/r/298301 (https://phabricator.wikimedia.org/T138509) [17:02:26] * gehel is testing wdqs on beta first... 
[17:02:27] (03Abandoned) 10Andrew Bogott: Include access_new_install on labcontrol1001. [puppet] - 10https://gerrit.wikimedia.org/r/298290 (https://phabricator.wikimedia.org/T138509) (owner: 10Andrew Bogott) [17:02:55] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2343854 (10AryanSogd) What should I do to make my bot (AryanBot) is not broken? >>! In T136674#2447906, @Whatamidoing-WMF wrote: > I'v... [17:03:04] 06Operations, 10Traffic, 10fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2448253 (10Jgreen) >>! In T137161#2447794, @BBlack wrote: > @Jgreen thanks for working on this! I've re-audited all the Fundraising wikimedia.org ho... [17:04:23] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2448257 (10Urbanecm) [17:04:33] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2241419 (10Urbanecm) [17:05:12] new WDQS on beta looks good, let's see the real thing [17:06:40] (03PS1) 10Yuvipanda: Factor out container specification [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298302 [17:07:10] (03CR) 10jenkins-bot: [V: 04-1] Factor out container specification [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298302 (owner: 10Yuvipanda) [17:07:12] !log starting deployment of latest WDQS (second time deploying with scap3) [17:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:08:31] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448299 (10AryanSogd) What should I do to make my bot (AryanBot) is not broken? 
[17:09:55] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448315 (10Elitre) Quoting a linked list message: - The simple solution is to simply include the "rawcontinue" parameter with your requ... [17:12:57] 07Blocked-on-Operations, 06Operations, 06Discovery, 10Kartographer, and 4 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2448391 (10Urbanecm) [17:14:08] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Site-requests, and 2 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2448415 (10Urbanecm) [17:14:57] 06Operations, 06Labs: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2448421 (10Andrew) [17:15:23] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Site-requests, and 2 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2447489 (10Urbanecm) p:05Triage>03Low [17:15:52] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Site-requests, and 2 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2447489 (10Urbanecm) 05Open>03stalled Till T133744 will be finished this is stalled I think. [17:17:13] (03CR) 10RobH: [C: 031] TEMPORARY HACK: Add access_new_install to iron [puppet] - 10https://gerrit.wikimedia.org/r/298301 (https://phabricator.wikimedia.org/T138509) (owner: 10Andrew Bogott) [17:17:20] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Site-requests, and 2 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2448455 (10Urbanecm) @Yurik Okay. Thanks. 
When T133744 will be finished should I upload a patch to enabl... [17:17:52] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Extension-setup, and 3 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2448470 (10Urbanecm) [17:18:52] SMalyshev, thcipriani: this WDQS deployment was smooth! [17:18:57] * gehel loves canaries... [17:19:20] gehel: :D [17:19:20] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448487 (10AryanSogd) Thank you, Elitre. [17:19:40] 06Operations, 06Labs: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448488 (10Andrew) [17:20:11] 06Operations, 06Labs: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2448501 (10Andrew) [17:20:13] 06Operations, 06Labs: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448500 (10Andrew) [17:20:30] (03CR) 10Andrew Bogott: [C: 032] TEMPORARY HACK: Add access_new_install to iron [puppet] - 10https://gerrit.wikimedia.org/r/298301 (https://phabricator.wikimedia.org/T138509) (owner: 10Andrew Bogott) [17:22:35] 07Blocked-on-Operations, 06Operations, 10Kartographer, 10Wikimedia-Extension-setup, and 3 others: Enable Interactive Maps (Kartographer) on Macedonian Wikipedia - https://phabricator.wikimedia.org/T139946#2448504 (10Yurik) @urbanecm, I will be happy to enable it first on the smaller wikis with the tech-sav... [17:23:44] (03CR) 10EBernhardson: "since deployment-logstash3 isn't even the main beta cluster logstash, can probably drop the index and let it recreate today's to see how i" [puppet] - 10https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) (owner: 10BryanDavis) [17:28:59] (03PS1) 10Andrew Bogott: Make labvirt1012-1014 nova compute nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/298307 (https://phabricator.wikimedia.org/T138509) [17:29:32] (03PS16) 10Addshore: Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) [17:29:32] 06Operations: setup YubiHSM and laptop at office - https://phabricator.wikimedia.org/T123818#2448519 (10RobH) [17:29:57] 06Operations, 10EventBus, 10MediaWiki-Cache, 06Performance-Team, and 2 others: Setup a 2 server Kafka instance in both eqiad and codfw for reliable purge streams - https://phabricator.wikimedia.org/T114191#2448532 (10RobH) [17:31:04] 07Blocked-on-Operations, 06Operations, 10RESTBase, 10hardware-requests: Expand SSD space in Cassandra cluster - https://phabricator.wikimedia.org/T121575#2448556 (10RobH) [17:31:19] 06Operations, 10hardware-requests, 13Patch-For-Review: Upgrade restbase100[7-9] to match restbase100[1-6] hardware - https://phabricator.wikimedia.org/T119935#2448560 (10RobH) [17:31:24] (03PS17) 10Addshore: Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) [17:31:28] 06Operations, 10ops-codfw: ms-be2007 - System halted!Error: Integrated RAID - https://phabricator.wikimedia.org/T122844#2448576 (10RobH) [17:31:56] 06Operations, 06Discovery, 10hardware-requests: Refresh elastic10{01..16}.eqiad.wmnet servers - https://phabricator.wikimedia.org/T128000#2448584 (10RobH) [17:32:29] 06Operations, 10ops-codfw, 13Patch-For-Review: es2011-es2019 racking and onsite setup tasks - https://phabricator.wikimedia.org/T126006#2448592 (10RobH) [17:33:14] 06Operations, 10hardware-requests: new labstore hardware for eqiad - https://phabricator.wikimedia.org/T126089#2448597 (10RobH) [17:34:49] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [17:35:03] (03CR) 10Andrew Bogott: [C: 032] Make labvirt1012-1014 nova compute nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/298307 (https://phabricator.wikimedia.org/T138509) (owner: 10Andrew Bogott) [17:36:19] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [17:36:43] (03PS18) 10Addshore: Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) [17:40:39] (03PS19) 10Ottomata: Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) (owner: 10Addshore) [17:40:49] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:41:29] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:42:20] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: puppet fail [17:42:36] (03PS2) 10Dzahn: admin: create shell user for bawolff [puppet] - 10https://gerrit.wikimedia.org/r/297456 (https://phabricator.wikimedia.org/T138635) [17:44:23] (03PS3) 10Dzahn: admin: create shell user for bawolff [puppet] - 10https://gerrit.wikimedia.org/r/297456 (https://phabricator.wikimedia.org/T138635) [17:44:40] (03CR) 10Dzahn: [C: 032] admin: create shell user for bawolff [puppet] - 10https://gerrit.wikimedia.org/r/297456 (https://phabricator.wikimedia.org/T138635) (owner: 10Dzahn) [17:46:52] (03PS20) 10Ottomata: Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) (owner: 10Addshore) [17:47:08] (03CR) 10Ottomata: [C: 032 V: 032] Statistics::wmde puppetization [puppet] - 10https://gerrit.wikimedia.org/r/269467 (https://phabricator.wikimedia.org/T125989) (owner: 10Addshore) [17:47:38] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: puppet fail [17:47:48] (03PS2) 10Dzahn: admin: add bawolff to deployers group [puppet] - 10https://gerrit.wikimedia.org/r/297457 
(https://phabricator.wikimedia.org/T138635) [17:48:08] (03PS3) 10Dzahn: admin: add bawolff to deployers group [puppet] - 10https://gerrit.wikimedia.org/r/297457 (https://phabricator.wikimedia.org/T138635) [17:48:26] (03CR) 10Dzahn: [C: 032] admin: add bawolff to deployers group [puppet] - 10https://gerrit.wikimedia.org/r/297457 (https://phabricator.wikimedia.org/T138635) (owner: 10Dzahn) [17:54:19] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: puppet fail [17:55:42] (03PS1) 10Addshore: Fix path and bad User requirement in stats_wmde [puppet] - 10https://gerrit.wikimedia.org/r/298311 (https://phabricator.wikimedia.org/T125989) [17:56:47] (03PS2) 10Ottomata: Fix path and bad User requirement in stats_wmde [puppet] - 10https://gerrit.wikimedia.org/r/298311 (https://phabricator.wikimedia.org/T125989) (owner: 10Addshore) [17:58:56] (03CR) 10Ottomata: [C: 032] Fix path and bad User requirement in stats_wmde [puppet] - 10https://gerrit.wikimedia.org/r/298311 (https://phabricator.wikimedia.org/T125989) (owner: 10Addshore) [17:59:16] (03PS3) 10BryanDavis: logstash: Update default mappings for Elasticsearch 2.x [puppet] - 10https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) [18:02:21] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [18:02:33] PROBLEM - NTP on labvirt1012 is CRITICAL: NTP CRITICAL: Offset unknown [18:03:24] (03CR) 10Ori.livneh: [C: 031] debug_proxy: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/297982 (owner: 10Muehlenhoff) [18:04:52] PROBLEM - DPKG on labvirt1012 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:05:08] (03PS2) 10Ori.livneh: PCC: Fix success/failure detection [puppet] - 10https://gerrit.wikimedia.org/r/298252 [18:05:23] (03CR) 10Ori.livneh: [C: 032 V: 032] PCC: Fix success/failure detection [puppet] - 10https://gerrit.wikimedia.org/r/298252 (owner: 10Ori.livneh) [18:06:21] RECOVERY - NTP on 
labvirt1012 is OK: NTP OK: Offset -7.057189941e-05 secs [18:06:41] (03PS1) 10Ottomata: Use $statistics::working_path variable in statistics::wmde [puppet] - 10https://gerrit.wikimedia.org/r/298314 [18:06:53] (03PS2) 10Ottomata: Use $statistics::working_path variable in statistics::wmde [puppet] - 10https://gerrit.wikimedia.org/r/298314 [18:07:08] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2448782 (10Dzahn) 05Open>03Resolved on bast1001: Notice: /Stage[main]/Admin/Admin::Hashuser[bawolff]/Admin::User[bawolff]/User[bawolff]... [18:07:11] RECOVERY - DPKG on labvirt1012 is OK: All packages OK [18:08:01] 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2448791 (10Dzahn) [18:08:29] !log welcome new mediawiki deployer Brian Wolff (T138635) [18:08:30] T138635: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635 [18:08:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:09:02] (03CR) 10Ottomata: [C: 032] Use $statistics::working_path variable in statistics::wmde [puppet] - 10https://gerrit.wikimedia.org/r/298314 (owner: 10Ottomata) [18:09:41] PROBLEM - puppet last run on labvirt1014 is CRITICAL: CRITICAL: puppet fail [18:12:21] (03PS4) 10BryanDavis: logstash: Update default mappings for Elasticsearch 2.x [puppet] - 10https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) [18:12:52] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [18:14:52] (03CR) 10Ori.livneh: [C: 031] "Thanks" [software/statsdlb] - 10https://gerrit.wikimedia.org/r/297281 (owner: 10Filippo Giunchedi) [18:15:27] (03PS1) 10Andrew Bogott: Henceforth, recommend linux-image-generic-lts-xenial 
for compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/298316 [18:18:29] 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2448890 (10Dzahn) P.S.Yes, and the maintenance script hosts are included too. i checked your user exists there now. in eqiad, terbium.eqiad.wmnet in codfw, wasa... [18:22:17] (03PS5) 10BryanDavis: logstash: Update default mappings for Elasticsearch 2.x [puppet] - 10https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) [18:24:02] (03PS1) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 [18:24:32] (03CR) 10jenkins-bot: [V: 04-1] Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (owner: 10Yuvipanda) [18:25:18] (03PS2) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 [18:25:51] (03CR) 10jenkins-bot: [V: 04-1] Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (owner: 10Yuvipanda) [18:26:42] PROBLEM - salt-minion processes on labvirt1012 is CRITICAL: Connection refused by host [18:27:12] PROBLEM - SSH on labvirt1012 is CRITICAL: Connection refused [18:27:22] PROBLEM - DPKG on labvirt1012 is CRITICAL: Connection refused by host [18:27:41] PROBLEM - Disk space on labvirt1012 is CRITICAL: Connection refused by host [18:27:41] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:27:52] PROBLEM - configured eth on labvirt1012 is CRITICAL: Connection refused by host [18:28:11] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [18:28:12] PROBLEM - dhclient process on labvirt1012 is CRITICAL: Connection refused by host [18:28:22] PROBLEM - HP RAID on 
labvirt1012 is CRITICAL: Connection refused by host [18:28:41] PROBLEM - puppet last run on labvirt1012 is CRITICAL: Connection refused by host [18:32:20] PROBLEM - salt-minion processes on labvirt1013 is CRITICAL: Connection refused by host [18:32:29] PROBLEM - Disk space on labvirt1013 is CRITICAL: Connection refused by host [18:32:29] PROBLEM - dhclient process on labvirt1014 is CRITICAL: Connection refused by host [18:32:39] PROBLEM - HP RAID on labvirt1014 is CRITICAL: Connection refused by host [18:32:48] PROBLEM - Disk space on labvirt1014 is CRITICAL: Connection refused by host [18:32:50] (03PS3) 10Dzahn: admin: add eventlogging-admins in webperf role [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) [18:33:00] PROBLEM - NTP on labvirt1014 is CRITICAL: NTP CRITICAL: Offset unknown [18:33:18] PROBLEM - Host labvirt1012 is DOWN: PING CRITICAL - Packet loss = 100% [18:33:18] PROBLEM - configured eth on labvirt1014 is CRITICAL: Connection refused by host [18:33:30] PROBLEM - DPKG on labvirt1013 is CRITICAL: Connection refused by host [18:33:49] PROBLEM - HP RAID on labvirt1013 is CRITICAL: Connection refused by host [18:33:49] PROBLEM - puppet last run on labvirt1014 is CRITICAL: Connection refused by host [18:34:08] PROBLEM - NTP on labvirt1013 is CRITICAL: NTP CRITICAL: Offset unknown [18:34:08] PROBLEM - salt-minion processes on labvirt1014 is CRITICAL: Connection refused by host [18:34:19] RECOVERY - dhclient process on labvirt1012 is OK: PROCS OK: 0 processes with command name dhclient [18:34:19] PROBLEM - SSH on labvirt1013 is CRITICAL: Connection refused [18:34:20] (03CR) 10Dzahn: [C: 031] "Thanks Moritz and Otto! 
Yes, role webperf makes sense" [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: 10Dzahn) [18:34:29] RECOVERY - Host labvirt1012 is UP: PING OK - Packet loss = 0%, RTA = 1.43 ms [18:34:29] (03PS4) 10Dzahn: admin: add eventlogging-admins in webperf role [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) [18:34:39] PROBLEM - configured eth on labvirt1013 is CRITICAL: Connection refused by host [18:34:58] PROBLEM - dhclient process on labvirt1013 is CRITICAL: Connection refused by host [18:34:58] RECOVERY - Disk space on labvirt1012 is OK: DISK OK [18:35:09] RECOVERY - SSH on labvirt1012 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [18:35:19] RECOVERY - HP RAID on labvirt1012 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6, Controller, Battery/Capacitor [18:35:25] (03CR) 10Dzahn: [C: 032] "approved in meeting, re-adding access (via role that is on hafnium)" [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: 10Dzahn) [18:35:28] RECOVERY - DPKG on labvirt1012 is OK: All packages OK [18:35:29] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 16 minutes ago with 0 failures [18:35:49] PROBLEM - puppet last run on labvirt1013 is CRITICAL: Connection refused by host [18:36:08] PROBLEM - DPKG on labvirt1014 is CRITICAL: Connection refused by host [18:36:08] RECOVERY - configured eth on labvirt1012 is OK: OK - interfaces up [18:36:18] RECOVERY - salt-minion processes on labvirt1012 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:36:18] RECOVERY - NTP on labvirt1013 is OK: NTP OK: Offset 0.002319812775 secs [18:36:30] (03PS1) 10Ottomata: Fix mode of wmde scripts [puppet] - 10https://gerrit.wikimedia.org/r/298321 [18:36:45] (03CR) 10Ottomata: [C: 032 V: 032] Fix mode of wmde scripts [puppet] - 
10https://gerrit.wikimedia.org/r/298321 (owner: 10Ottomata) [18:36:58] PROBLEM - SSH on labvirt1014 is CRITICAL: Connection refused [18:36:58] RECOVERY - NTP on labvirt1014 is OK: NTP OK: Offset 0.0001848936081 secs [18:37:39] (03PS5) 10Dzahn: admin: add eventlogging-admins in webperf role [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) [18:37:40] PROBLEM - kvm ssl cert on labvirt1014 is CRITICAL: Connection refused by host [18:37:42] !log new hd for failed array in labstore2001 [18:37:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:38:05] andrewbogott: I downtimed labvirt101[34] for 48 hours fyi [18:38:17] lot of alert spam here [18:38:25] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/298120 (https://phabricator.wikimedia.org/T139202) (owner: 10Dzahn) [18:38:28] oh, thanks. More reboots than I was planning on :( [18:39:57] if you like you can use a shell script on neon to schedule downtimes, no need to click in the web ui [18:42:03] (03PS1) 10Ottomata: Fix paths to scripts in wmde minutely and daily.sh [puppet] - 10https://gerrit.wikimedia.org/r/298323 [18:42:09] RECOVERY - kvm ssl cert on labvirt1014 is OK: Cert /etc/ssl/localcerts/labvirt-star.eqiad.wmnet.crt will not expire for at least 90 days [18:42:19] RECOVERY - DPKG on labvirt1013 is OK: All packages OK [18:42:28] (03CR) 10Ottomata: [C: 032 V: 032] Fix paths to scripts in wmde minutely and daily.sh [puppet] - 10https://gerrit.wikimedia.org/r/298323 (owner: 10Ottomata) [18:42:28] RECOVERY - salt-minion processes on labvirt1013 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:42:30] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures [18:42:39] RECOVERY - puppet last run on labvirt1014 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures [18:42:48] RECOVERY - HP 
RAID on labvirt1013 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6, Controller, Battery/Capacitor [18:42:49] RECOVERY - DPKG on labvirt1014 is OK: All packages OK [18:42:49] RECOVERY - dhclient process on labvirt1014 is OK: PROCS OK: 0 processes with command name dhclient [18:42:49] RECOVERY - Disk space on labvirt1013 is OK: DISK OK [18:42:59] RECOVERY - HP RAID on labvirt1014 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:1:5, 2I:1:6, Controller, Battery/Capacitor [18:42:59] RECOVERY - salt-minion processes on labvirt1014 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [18:43:10] RECOVERY - Disk space on labvirt1014 is OK: DISK OK [18:43:18] RECOVERY - SSH on labvirt1013 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [18:43:38] RECOVERY - configured eth on labvirt1013 is OK: OK - interfaces up [18:43:39] RECOVERY - SSH on labvirt1014 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.7 (protocol 2.0) [18:43:50] RECOVERY - dhclient process on labvirt1013 is OK: PROCS OK: 0 processes with command name dhclient [18:44:00] RECOVERY - configured eth on labvirt1014 is OK: OK - interfaces up [18:45:09] (03CR) 10Brian Wolff: "Whee!" [puppet] - 10https://gerrit.wikimedia.org/r/297457 (https://phabricator.wikimedia.org/T138635) (owner: 10Dzahn) [18:46:22] (03PS2) 10Andrew Bogott: Henceforth, recommend linux-image-generic-lts-xenial for compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/298316 [18:46:24] (03PS1) 10Andrew Bogott: Set up device labels for labvirt1012, 1013, 1014 [puppet] - 10https://gerrit.wikimedia.org/r/298325 [18:47:06] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2449175 (10Dzahn) After some discussion on Gerrit and approval in ops meeting, the patch got amended to re-add this group to the node, but via the "role webperf" in hier... 
[18:47:32] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2449176 (10Dzahn) 05Open>03Resolved [18:49:00] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422507 (10Dzahn) [hafnium:~] $ id nuria uid=4193(nuria) gid=500(wikidev) groups=500(wikidev),733(eventlogging-admins),739(eventlogging-roots) [hafnium:~] $ id legoktm u... [18:49:33] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2449190 (10Dzahn) [18:49:53] (03PS2) 10Andrew Bogott: Set up volumes for labvirt1012, 1013, 1014 [puppet] - 10https://gerrit.wikimedia.org/r/298325 [18:51:21] (03CR) 10Andrew Bogott: [C: 032] Set up volumes for labvirt1012, 1013, 1014 [puppet] - 10https://gerrit.wikimedia.org/r/298325 (owner: 10Andrew Bogott) [18:56:31] (03PS2) 10Dzahn: typos file: add 'mariabd' and 'eqad' [puppet] - 10https://gerrit.wikimedia.org/r/298033 [19:00:03] 06Operations: (www.)wmfusercontent.org should respond to HTTP - https://phabricator.wikimedia.org/T104735#2449250 (10Bawolff) >>! In T104735#2448220, @Mike_Peel wrote: > Thanks @Bawolff! wmfusercontent.org now works as expected. However, accessing https://phab.wmfusercontent.org/ gives the error message: To be... [19:00:05] gehel: Respected human, time to deploy logstash / kibana / elasticsearch upgrade CANCELED will be rescheduled. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160711T1900). Please do the needful. [19:00:05] ebernhardson and bd808: A patch you scheduled for logstash / kibana / elasticsearch upgrade CANCELED will be rescheduled. is about to be deployed. Please be available during the process. 
[19:00:06] (03PS3) 10Dzahn: typos file: add 'mariabd' and 'eqad' [puppet] - 10https://gerrit.wikimedia.org/r/298033 [19:00:58] (03PS3) 10Dzahn: backup: Use PRODUCTION_NETWORKS [puppet] - 10https://gerrit.wikimedia.org/r/295786 (owner: 10Muehlenhoff) [19:02:57] 06Operations: (www.)wmfusercontent.org should respond to HTTP - https://phabricator.wikimedia.org/T104735#2449279 (10Mike_Peel) >>! In T104735#2449250, @Bawolff wrote: >>>! In T104735#2448220, @Mike_Peel wrote: >> Thanks @Bawolff! wmfusercontent.org now works as expected. However, accessing https://phab.wmfuserc... [19:03:05] 06Operations, 10Ops-Access-Requests: analytics server access request for three users from CPS Data Consulting - https://phabricator.wikimedia.org/T139764#2449280 (10spatton) @Jgreen, @MoritzMuehlenhoff: Current contract end date is December 7th, 2016, but we plan to extend thru the end of December (and the end... [19:04:18] (03PS2) 10Ottomata: Upgrading cdh module [puppet] - 10https://gerrit.wikimedia.org/r/297873 (owner: 10Nuria) [19:04:59] (03CR) 10Ottomata: [C: 032 V: 032] Upgrading cdh module [puppet] - 10https://gerrit.wikimedia.org/r/297873 (owner: 10Nuria) [19:05:06] (03PS2) 10Dzahn: admin: add yubikey ssh key for ladsgroup [puppet] - 10https://gerrit.wikimedia.org/r/298130 (owner: 10Ladsgroup) [19:06:02] (03CR) 10Dzahn: "@Ladsgroup could you paste it on office wiki on your user page and/or sign it with the GPG key created at Wikimania" [puppet] - 10https://gerrit.wikimedia.org/r/298130 (owner: 10Ladsgroup) [19:06:26] Logstash deployment postponed to next week... 
[19:08:03] (03PS1) 10Ottomata: Update cdh module to proper sha [puppet] - 10https://gerrit.wikimedia.org/r/298330 [19:08:45] (03CR) 10Ottomata: [C: 032 V: 032] Update cdh module to proper sha [puppet] - 10https://gerrit.wikimedia.org/r/298330 (owner: 10Ottomata) [19:21:32] (03PS3) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 [19:22:37] (03PS4) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 [19:25:17] (03CR) 10jenkins-bot: [V: 04-1] Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (owner: 10Yuvipanda) [19:27:00] (03PS1) 10Nuria: Using default value for log retention policy in yarn [puppet/cdh] - 10https://gerrit.wikimedia.org/r/298332 (https://phabricator.wikimedia.org/T139178) [19:27:17] (03PS2) 10Nuria: Using default value for log retention policy in yarn [puppet/cdh] - 10https://gerrit.wikimedia.org/r/298332 (https://phabricator.wikimedia.org/T139178) [19:30:24] (03CR) 10Ottomata: [C: 032] Using default value for log retention policy in yarn [puppet/cdh] - 10https://gerrit.wikimedia.org/r/298332 (https://phabricator.wikimedia.org/T139178) (owner: 10Nuria) [19:31:16] (03PS1) 10Andrew Bogott: Allow projectadmins (on the commandline) to specify labvirt hosts. 
[puppet] - 10https://gerrit.wikimedia.org/r/298333 [19:33:30] (03PS4) 10Reedy: Load RestBaseUpdateJobs via wfLoadExtension() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298095 (https://phabricator.wikimedia.org/T139800) [19:37:50] (03PS1) 10Nuria: Upgrade of cdh module [puppet] - 10https://gerrit.wikimedia.org/r/298335 [19:38:03] (03PS5) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [19:38:58] (03CR) 10Ottomata: [C: 032 V: 032] Upgrade of cdh module [puppet] - 10https://gerrit.wikimedia.org/r/298335 (owner: 10Nuria) [19:39:27] (03PS1) 10BBlack: Insecure POST: 20% fail for labs, 100% for external [puppet] - 10https://gerrit.wikimedia.org/r/298336 (https://phabricator.wikimedia.org/T136674) [19:39:38] (03PS6) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [19:40:17] (03Abandoned) 10Yuvipanda: Factor out container specification [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298302 (owner: 10Yuvipanda) [19:43:00] (03CR) 10BryanDavis: [C: 031] Insecure POST: 20% fail for labs, 100% for external [puppet] - 10https://gerrit.wikimedia.org/r/298336 (https://phabricator.wikimedia.org/T136674) (owner: 10BBlack) [19:43:10] Are you aware of the current login errors at enWP? 
[19:43:16] [V4PWKApAAEQAAIrxM9QAAACM] 2016-07-11 17:23:53: Fatal exception of type "Exception" [19:43:47] anomie: tgr|away ^ [19:44:19] (03CR) 10Dzahn: [C: 031] Insecure POST: 20% fail for labs, 100% for external [puppet] - 10https://gerrit.wikimedia.org/r/298336 (https://phabricator.wikimedia.org/T136674) (owner: 10BBlack) [19:44:32] twentyafterfour experienced that error yesterday [19:44:57] 06Operations, 06Labs, 10Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2449556 (10Ottomata) Ok, except @Andrew just told me to tack these on... :) ``` otto@deployment-kafka03:~$ host 10.68.16.138 138.16.68.10.in-addr.arpa domain name... [19:44:59] 06Operations, 10Traffic: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2449554 (10BBlack) >>! In T133821#2352086, @ori wrote: >>>! In T133821#2245711, @BBlack wrote: >> However, we reverted this because it seemed to make the race issues worse at the time. > > How did you know? Bec... [19:49:05] (03PS7) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [19:50:37] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016), 13Patch-For-Review: Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2449600 (10BBlack) The patch link above is pretty self-descriptive, and I'm planning to deploy that tomorrow. Will u... [19:52:01] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2449618 (10Andrew) [19:52:14] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2402452 (10Andrew) 05Open>03Resolved Thank you Chris! 
[19:52:48] (03PS3) 10Dzahn: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 [19:54:06] (03PS4) 10Dzahn: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 [19:54:12] (03CR) 10Paladox: [C: 031] "Looks ok to me :)" [puppet] - 10https://gerrit.wikimedia.org/r/298041 (owner: 10Dzahn) [19:55:07] 06Operations, 10Ops-Access-Requests: Requesting access to deployment hosts (tin/terbium) for Brian Wolff - https://phabricator.wikimedia.org/T138635#2449667 (10Bawolff) Awesome. Thank you :) [19:56:31] (03PS8) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [19:58:51] (03PS1) 10Nuria: Correcting typo on yarn-site.xml [puppet/cdh] - 10https://gerrit.wikimedia.org/r/298338 (https://phabricator.wikimedia.org/T139178) [20:00:04] gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160711T2000). [20:00:55] (03CR) 10Paladox: [C: 031] gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (owner: 10Dzahn) [20:01:50] (03CR) 10Ottomata: [C: 032] Correcting typo on yarn-site.xml [puppet/cdh] - 10https://gerrit.wikimedia.org/r/298338 (https://phabricator.wikimedia.org/T139178) (owner: 10Nuria) [20:03:26] !log starting parsoid deploy [20:03:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:04:16] 06Operations, 06Labs, 10Labs-Infrastructure, 07IPv6: Enable ipv6 on labs - https://phabricator.wikimedia.org/T37947#399081 (10FastLizard4) Bumping this task. We could use IPv6 connectivity for the account-creation-assistance project. Since IPv6 addresses are starting to show up on Wikipedia now, it becom... 
[20:04:53] (03PS9) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [20:04:55] (03PS1) 10Nuria: Upgrading cdh module [puppet] - 10https://gerrit.wikimedia.org/r/298341 [20:05:44] !log synced new parsoid code; restarted parsoid on wtp1001 as a canary [20:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:08:59] (03CR) 10Ottomata: [C: 032] Upgrading cdh module [puppet] - 10https://gerrit.wikimedia.org/r/298341 (owner: 10Nuria) [20:09:29] !log finished deploying parsoid sha e738c415 [20:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:09:34] time to verify [20:10:12] (03CR) 10BryanDavis: Add a webservice shell command (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) (owner: 10Yuvipanda) [20:15:22] (03PS10) 10Yuvipanda: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) [20:23:02] (03CR) 10BryanDavis: [C: 032] "Let's give it a shot!" 
[software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) (owner: 10Yuvipanda) [20:23:33] (03Merged) 10jenkins-bot: Add a webservice shell command [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) (owner: 10Yuvipanda) [20:25:25] !log starting mobileapps deployment [20:25:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:27:02] (03PS2) 10Gehel: Maps - enable tileratorui notification of new expier files [puppet] - 10https://gerrit.wikimedia.org/r/298288 (https://phabricator.wikimedia.org/T139451) [20:29:40] !log mobileapps deployed df16702 [20:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:31:39] !log rolling restart of hadoop-yarn-nodemanager to apply log aggregation retention seconds [20:31:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:32:27] PROBLEM - puppet last run on db2056 is CRITICAL: CRITICAL: puppet fail [20:32:28] (03PS1) 10Jforrester: Enable ShortUrl on Urdu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298344 (https://phabricator.wikimedia.org/T138507) [20:34:35] (03CR) 10Merlijn van Deen: "It seems I'm too late :(" (036 comments) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) (owner: 10Yuvipanda) [20:37:37] PROBLEM - Hadoop NodeManager on analytics1032 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [20:38:21] that's me, on it [20:39:20] that's me in the corner [20:39:31] (03CR) 10Gehel: [C: 032] Maps - enable tileratorui notification of new expier files [puppet] - 10https://gerrit.wikimedia.org/r/298288 (https://phabricator.wikimedia.org/T139451) (owner: 10Gehel) [20:39:48] RECOVERY - Hadoop NodeManager on analytics1032 is OK: PROCS OK: 1 process 
with command name java, args org.apache.hadoop.yarn.server.nodemanager.NodeManager [20:43:06] (03PS3) 10Reedy: Alphasort extension-list-labs, use extension.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298236 [20:43:13] (03CR) 10Reedy: [C: 032] Alphasort extension-list-labs, use extension.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298236 (owner: 10Reedy) [20:43:57] (03Merged) 10jenkins-bot: Alphasort extension-list-labs, use extension.json [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298236 (owner: 10Reedy) [20:44:46] !log reedy@tin Synchronized wmf-config/extension-list-labs: nooop for prod (duration: 00m 32s) [20:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:47:07] (03PS1) 10Yuvipanda: Normalize exit codes [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298347 [20:47:53] (03CR) 10Yuvipanda: Add a webservice shell command (031 comment) [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298318 (https://phabricator.wikimedia.org/T139952) (owner: 10Yuvipanda) [20:48:24] (03PS1) 10Yuvipanda: Slightly better error message [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298348 [20:48:33] 06Operations, 06Labs, 10Labs-Infrastructure: Some labs instances IP have multiple PTR entries in DNS - https://phabricator.wikimedia.org/T115194#2449987 (10hashar) @Ottomata Fair call sorry :-) Nodepool spawns instances with an incremental ID to give some indication about the progress: | Time (UTC) | ID |-... [20:50:47] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:52:57] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [20:53:37] (03PS1) 10BBlack: Add nginx.org ubsan shift patches [software/nginx] (wmf-1.11.2) - 10https://gerrit.wikimedia.org/r/298350 [20:53:39] (03PS1) 10BBlack: Add Cloudflare TLS dynamic record sizing [software/nginx] (wmf-1.11.2) - 10https://gerrit.wikimedia.org/r/298351 [20:53:41] (03PS1) 10BBlack: nginx (1.11.2-1+wmf1) jessie; urgency=medium [software/nginx] (wmf-1.11.2) - 10https://gerrit.wikimedia.org/r/298352 [20:56:48] RECOVERY - puppet last run on db2056 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [21:01:48] (03CR) 10BryanDavis: "Tested on beta cluster. I'm going to apply the mapping generates manually on the production cluster now." [puppet] - 10https://gerrit.wikimedia.org/r/298295 (https://phabricator.wikimedia.org/T136001) (owner: 10BryanDavis) [21:03:10] !log Updated default mapping for logstash-* index creation using json generated by https://gerrit.wikimedia.org/r/#/c/298295/. Should take effect starting with the logstash-2016.07.12 index. [21:03:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:04:10] 06Operations, 10Gerrit: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2450053 (10Paladox) p:05Low>03High Changing to high priority due to the fact we are near to updating gerrit sometime this week. [21:06:48] paladox: that kind of priority changing doesn't really help anything. It's an obvious thing that needs to be done after the upgrade happens (hence the relationship). [21:07:24] Sorry [21:09:41] 06Operations, 10Gerrit: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2450119 (10hashar) Good catch! I guess @chad has a pending patch that replaces all occurences of the host key. Maybe the key will be migrated to the new server an... 
[21:11:57] 06Operations, 10Gerrit: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2450142 (10Dzahn) [21:11:59] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: reinstall/upgrade gerrit server (ytterbium) from precise to jessie - https://phabricator.wikimedia.org/T125018#2450143 (10Dzahn) [21:14:45] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2450154 (10Dzahn) [21:16:43] (03PS1) 10Dzahn: contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) [21:17:51] (03PS2) 10Paladox: contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [21:18:07] (03PS3) 10Paladox: contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [21:19:34] (03PS5) 10Dzahn: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) [21:20:53] (03PS6) 10Paladox: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) (owner: 10Dzahn) [21:21:52] (03PS7) 10Dzahn: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) [21:22:38] (03PS1) 10BryanDavis: logstash: Remove normalize_fields fitler [puppet] - 10https://gerrit.wikimedia.org/r/298381 [21:22:40] (03PS1) 10BryanDavis: logstash: Remove all _* fields from gelf records [puppet] - 10https://gerrit.wikimedia.org/r/298382 [21:23:48] (03CR) 10BryanDavis: [C: 04-1] "Handled 
more generically by I638d88e1d874fdb8be211bd74a1e36998d42dc09" [puppet] - 10https://gerrit.wikimedia.org/r/298242 (owner: 10EBernhardson) [21:24:22] (03Abandoned) 10EBernhardson: logstash: Normalize a few more fields [puppet] - 10https://gerrit.wikimedia.org/r/298242 (owner: 10EBernhardson) [21:25:25] (03CR) 10Dzahn: [C: 031] "we can just merge this. on the old server it's duplicate and on the new server apache isnt running yet" [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [21:26:08] (03CR) 10Paladox: [C: 031] gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [21:26:58] (03CR) 10BryanDavis: "The graphoid application seems to at least occasionally trigger the error mentioned in the commit message by setting fields like `"_id": "" [puppet] - 10https://gerrit.wikimedia.org/r/298382 (owner: 10BryanDavis) [21:27:06] (03PS3) 10Andrew Bogott: Henceforth, recommend linux-image-generic-lts-xenial for compute nodes. [puppet] - 10https://gerrit.wikimedia.org/r/298316 [21:27:22] (03PS2) 10Andrew Bogott: Allow projectadmins (on the commandline) to specify labvirt hosts. [puppet] - 10https://gerrit.wikimedia.org/r/298333 [21:28:10] (03CR) 10Paladox: [C: 031] contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [21:28:47] (03PS2) 10Dzahn: gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) [21:28:57] (03PS3) 10Dzahn: gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) [21:28:59] (03CR) 10Andrew Bogott: [C: 032] Henceforth, recommend linux-image-generic-lts-xenial for compute nodes. 
[puppet] - 10https://gerrit.wikimedia.org/r/298316 (owner: 10Andrew Bogott) [21:29:14] (03CR) 10Andrew Bogott: [C: 032] Allow projectadmins (on the commandline) to specify labvirt hosts. [puppet] - 10https://gerrit.wikimedia.org/r/298333 (owner: 10Andrew Bogott) [21:29:29] (03PS2) 10Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) [21:30:05] (03CR) 10BryanDavis: [C: 032] Normalize exit codes [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298347 (owner: 10Yuvipanda) [21:30:27] (03CR) 10BryanDavis: [C: 032] Slightly better error message [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298348 (owner: 10Yuvipanda) [21:30:36] (03Merged) 10jenkins-bot: Normalize exit codes [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298347 (owner: 10Yuvipanda) [21:31:12] (03Merged) 10jenkins-bot: Slightly better error message [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/298348 (owner: 10Yuvipanda) [21:31:13] 06Operations, 06Discovery, 06Maps, 10Maps-data: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2450225 (10Gehel) [21:31:19] (03PS4) 10Dzahn: gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) [21:31:23] thanks bd808 (IRC) [21:32:21] yw yuvipanda (Matrix) ;) [21:32:48] haha :D I need to move to my own bridge... 
[21:34:00] * bd808 took the red pill [21:35:20] (03PS3) 10Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) [21:35:44] (03CR) 10Dzahn: [C: 032] gerrit: remove NameVirtualHost *:80 from Apache template [puppet] - 10https://gerrit.wikimedia.org/r/297723 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [21:36:18] PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Puppet has 1 failures [21:37:27] (03PS4) 10Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) [21:37:34] 06Operations, 06Discovery, 06Maps, 10Maps-data: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2450235 (10Gehel) There are multiple [[ http://data.wmflabs.org/wiki/Regional_maps | examples of Geoshapes ]], showing that the service works and behaves correctly. Limit on max... [21:37:45] !log ytterbium - graceful'ed Apache, warning about duplicate NameVirtual host is gone [21:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:40:01] (03CR) 10Dzahn: "@Alex After thinking about it again i'd like to go back to the original simple change that just removes them. 
On Apache 2.2 it is duplicat" [puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [21:40:38] 06Operations, 06Discovery, 06Maps, 10Maps-data: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2450255 (10Gehel) [21:40:40] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2241419 (10Gehel) [21:42:38] (03CR) 10Alex Monk: [C: 032] "Only touches labs files" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [21:45:50] (03PS1) 10Andrew Bogott: Revert "TEMPORARY HACK: Add access_new_install to iron" [puppet] - 10https://gerrit.wikimedia.org/r/298385 (https://phabricator.wikimedia.org/T139967) [21:46:38] (03CR) 10Chad: [C: 031] contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [21:46:45] (03CR) 10Chad: [C: 031] gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) (owner: 10Dzahn) [21:47:38] (03PS2) 10Andrew Bogott: Revert "TEMPORARY HACK: Add access_new_install to iron" [puppet] - 10https://gerrit.wikimedia.org/r/298385 (https://phabricator.wikimedia.org/T139967) [21:47:52] (03PS5) 10Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) [21:47:59] (03CR) 10Alex Monk: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [21:48:10] (03CR) 10Alex Monk: [C: 032] "rebased" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [21:49:01] 
(03Merged) 10jenkins-bot: [labs/deployment-prep] Switch file backends to swift [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298299 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [21:49:23] (03CR) 10Andrew Bogott: [C: 032] Revert "TEMPORARY HACK: Add access_new_install to iron" [puppet] - 10https://gerrit.wikimedia.org/r/298385 (https://phabricator.wikimedia.org/T139967) (owner: 10Andrew Bogott) [21:49:34] 06Operations, 10Gerrit, 13Patch-For-Review: Update gerrit sshkey in role::ci::slave::labs when upgrade to Jessie happens - https://phabricator.wikimedia.org/T131903#2450308 (10demon) >>! In T131903#2450119, @hashar wrote: > Good catch! I guess @chad has a pending patch that replaces all occurences of the ho... [21:49:45] I still forget to use scap... [21:49:47] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:50:13] !log krenair@tin Synchronized wmf-config: sync labs-only change, should be a noop here: https://gerrit.wikimedia.org/r/#/c/298299/ (duration: 00m 39s) [21:50:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:51:42] 06Operations, 10Traffic, 10Continuous-Integration-Infrastructure (phase-out-gallium): Move gallium to an internal host? - https://phabricator.wikimedia.org/T133150#2450319 (10Dzahn) >>! In T133150#2447151, @hashar wrote: > `doc.wikimedia.org` is looking for a new home. Ganeti VM ? [21:52:05] 06Operations, 06Labs, 13Patch-For-Review: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2450322 (10Andrew) [21:52:07] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [21:52:07] 06Operations, 06Labs, 13Patch-For-Review: Don't forget to clean the new_install key off of iron - https://phabricator.wikimedia.org/T139967#2450320 (10Andrew) 05Open>03Resolved Patch reverted, keys shredded and removed, script removed. 
[21:59:42] (03PS8) 10Dzahn: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) [22:01:30] (03PS9) 10Paladox: gerrit: make Apache config compatible with 2.4 [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) (owner: 10Dzahn) [22:01:44] (03CR) 10Paladox: [C: 031] "Looks all ok now :)" [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) (owner: 10Dzahn) [22:03:25] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:03:48] (03CR) 10Dzahn: [C: 032] "old server < 2.4 and new server doesnt have it installed yet" [puppet] - 10https://gerrit.wikimedia.org/r/298041 (https://phabricator.wikimedia.org/T70271) (owner: 10Dzahn) [22:06:25] (03PS4) 10Dzahn: contint/gerrit: allow ssh for git on new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) [22:08:19] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2450377 (10Southparkfan) @Andrew labvirt1012 lacks hyperthreading. Can you enable that? [22:11:20] 06Operations, 06Labs, 13Patch-For-Review: access_new_install role vs. Labs vs. the future - https://phabricator.wikimedia.org/T139971#2448488 (10RobH) All of the below is my understanding of things, and it could be wrong! Our historical install process is we used to install a root password via the installer... 
[22:12:45] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures [22:15:19] (03CR) 10Dzahn: [C: 032] "it's possible we might want to replace this with ferm::service or use DNS names but for now i'm keeping it consistent between old and new " [puppet] - 10https://gerrit.wikimedia.org/r/298377 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [22:15:51] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2450414 (10RobH) >>! In T138509#2450377, @Southparkfan wrote: > @Andrew labvirt1012 lacks hyperthreading. Can you enable that? I can confirm via /proc/cpuinfo that labvirt1012... [22:17:54] 06Operations, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: replace gerrit server (ytterbium) with jessie server (lead) - https://phabricator.wikimedia.org/T125018#2450423 (10Dzahn) on gallium (CI server) there are now rules to allow connections from lead [22:25:33] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup/install/deploy labvirt nodes - https://phabricator.wikimedia.org/T138509#2450470 (10RobH) 05Resolved>03Open [22:28:41] (03CR) 10Ori.livneh: [C: 031] prometheus: add node_exporter support [puppet] - 10https://gerrit.wikimedia.org/r/276243 (https://phabricator.wikimedia.org/T92813) (owner: 10Filippo Giunchedi) [22:33:33] (03PS2) 10Dzahn: redirects.dat - split non-canonical to separate section [puppet] - 10https://gerrit.wikimedia.org/r/292785 (https://phabricator.wikimedia.org/T133548) (owner: 10BBlack) [22:34:55] (03CR) 10Dzahn: "PS2: manual rebase, create redirects.conf from script and add to fix AssertionError" [puppet] - 10https://gerrit.wikimedia.org/r/292785 (https://phabricator.wikimedia.org/T133548) (owner: 10BBlack) [22:35:34] (03CR) 10Dzahn: "it's a good sign that the resulting .conf is +/- 199" [puppet] - 10https://gerrit.wikimedia.org/r/292785 (https://phabricator.wikimedia.org/T133548) (owner: 10BBlack) [22:36:43] addshore, 
https://gerrit.wikimedia.org/r/#/c/296753/ falls under "No new features/extensions" from https://wikitech.wikimedia.org/wiki/SWAT_deploys [22:37:47] MaxSem: so it does! I was actually unaware of that rule! [22:37:53] 06Operations, 06Discovery, 06Maps, 10Maps-data: Maps - enable Geoshapes on production - https://phabricator.wikimedia.org/T138525#2450524 (10Yurik) Agreed. [22:38:07] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [22:40:59] (03PS1) 10Alex Monk: [labs/deployment-prep] Point RedisLockManager to actual redis servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298393 (https://phabricator.wikimedia.org/T64835) [22:41:48] (03CR) 10Alex Monk: [C: 032] [labs/deployment-prep] Point RedisLockManager to actual redis servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298393 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [22:41:56] RECOVERY - Last backup of the maps filesystem on labstore1001 is OK: OK - Last run for unit replicate-maps was successful [22:42:16] RECOVERY - Last backup of the others filesystem on labstore1001 is OK: OK - Last run for unit replicate-others was successful [22:42:26] (03Merged) 10jenkins-bot: [labs/deployment-prep] Point RedisLockManager to actual redis servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298393 (https://phabricator.wikimedia.org/T64835) (owner: 10Alex Monk) [22:45:15] RECOVERY - Last backup of the tools filesystem on labstore1001 is OK: OK - Last run for unit replicate-tools was successful [22:46:22] !log krenair@tin Synchronized wmf-config: more labs-only changes: https://gerrit.wikimedia.org/r/298393 (duration: 00m 36s) [22:46:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:47:52] (03CR) 10Dzahn: "get list of hosts to use for testing before/after with apache-fast-test:" [puppet] - 10https://gerrit.wikimedia.org/r/292785 
(https://phabricator.wikimedia.org/T133548) (owner: 10BBlack) [22:57:13] greg-g: what deploy slot should I be looking for for the RevisionSlider to go to testwiki then? as SWAT won't do as it is a new feature / extension! Should it go out with the train or in its own magical slot? [22:57:23] magical slot! [22:57:42] addshore: its own slot [22:58:54] (03PS2) 10Dzahn: debug_proxy: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/297982 (owner: 10Muehlenhoff) [22:59:02] (03CR) 10Dzahn: [C: 032] debug_proxy: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/297982 (owner: 10Muehlenhoff) [23:00:05] RoanKattouw, ostriches, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160711T2300). Please do the needful. [23:00:05] MaxSem and aude: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:18] I'll do it [23:00:18] * aude waves [23:00:21] ok [23:00:23] greg-g: legoktm awesome, how do I go about arranging one of those? :) [23:00:48] addshore: https://wikitech.wikimedia.org/wiki/Deployments [23:00:55] addshore: coordinate with greg-g and edit the wiki :) [23:00:59] that :) [23:01:02] addshore: are you a deployer yet? [23:01:06] aude: nope [23:01:10] MaxSem: I made it to the list BTW https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=736753&oldid=736745 :-) [23:01:30] you should be :) but suppose someone can help with this [23:02:22] So, I guess I find a deployer that is willing to do it for / with me, ask greg-g and add it to the wikipage? :) [23:02:28] (03PS2) 10Dzahn: Run slapd backup on both labs LDAP servers [puppet] - 10https://gerrit.wikimedia.org/r/298287 (owner: 10Muehlenhoff) [23:02:28] did you request it addshore ? [23:02:46] Krenair: where does one request it? [23:02:53] oh wait, requested deployer? no [23:02:53] phabricator? 
[23:02:57] (03PS1) 10Alex Monk: Merge filebackend-labs.php and filebackend-production.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298397 [23:03:11] (03CR) 10Dzahn: [C: 032] Run slapd backup on both labs LDAP servers [puppet] - 10https://gerrit.wikimedia.org/r/298287 (owner: 10Muehlenhoff) [23:03:22] " To schedule a deploy window, or if you see a potential conflict with your upcoming deployment, please e-mail Greg Grossmeier." [23:03:27] it's like we don't read wiki pages anymore :) [23:03:48] I was in the process of writing an email but thought I would poke you on IRC first to check ;) [23:04:27] it's nice to do email, interrupt-driven decisions aren't the best [23:05:16] (03PS2) 10Alex Monk: Merge filebackend-labs.php and filebackend-production.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298397 [23:05:50] /msg memoserv help send [23:05:52] /me hides [23:06:05] awesome greg-g, will do! [23:07:04] mutante: IRC is not my todo list :P [23:07:29] !log maxsem@tin Synchronized php-1.28.0-wmf.9/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/297556/ (duration: 00m 29s) [23:07:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:08:05] (03PS2) 10MaxSem: Add Cape Verdean Creole (kea) as extra language for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: 10Thiemo Mättig (WMDE)) [23:08:16] (03CR) 10MaxSem: [C: 032] Add Cape Verdean Creole (kea) as extra language for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: 10Thiemo Mättig (WMDE)) [23:08:53] (03Merged) 10jenkins-bot: Add Cape Verdean Creole (kea) as extra language for wikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297556 (https://phabricator.wikimedia.org/T127435) (owner: 10Thiemo Mättig (WMDE)) [23:09:09] jouncebot: next puppet swat [23:09:09] In 15 hour(s) and 50 minute(s): 
Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160712T1500) [23:11:12] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/297556/ (duration: 00m 28s) [23:11:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:11:23] aude, ^ [23:11:45] (03PS2) 10Dzahn: role::horizon: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/297983 (owner: 10Muehlenhoff) [23:12:09] checking [23:12:32] looks good [23:13:23] (03PS2) 10MaxSem: Disable PDF export in the Wikidata Item namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297964 (https://phabricator.wikimedia.org/T136814) (owner: 10Thiemo Mättig (WMDE)) [23:13:29] (03CR) 10MaxSem: [C: 032] Disable PDF export in the Wikidata Item namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297964 (https://phabricator.wikimedia.org/T136814) (owner: 10Thiemo Mättig (WMDE)) [23:14:13] (03Merged) 10jenkins-bot: Disable PDF export in the Wikidata Item namespace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/297964 (https://phabricator.wikimedia.org/T136814) (owner: 10Thiemo Mättig (WMDE)) [23:15:41] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/297964/ (duration: 00m 32s) [23:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:15:48] aude, ^ [23:16:08] (03CR) 10Dzahn: [C: 032] role::horizon: Limit to production networks [puppet] - 10https://gerrit.wikimedia.org/r/297983 (owner: 10Muehlenhoff) [23:16:13] looks good [23:19:06] 06Operations, 10Flow, 10MediaWiki-Redirects, 03Collab-Team-2016-Apr-Jun-Q4, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#2450759 (10Etonkovidova) Testing **betalabs** for the scenarios described in the ticket: |** Notification** |**Mobile pa... 
[23:20:55] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:22:31] Operations, Flow, MediaWiki-Redirects, Collab-Team-2016-Apr-Jun-Q4, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#2450763 (Jdlrobson) @Etonkovidova in case it's useful information it's worth noting that you should run these tests on...
[23:23:06] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[23:23:44] (PS1) Awight: Whitelist a bunch of RSS feeds for Fundraising Tech to play with [mediawiki-config] - https://gerrit.wikimedia.org/r/298399
[23:27:05] Operations, Flow, MediaWiki-Redirects, Collab-Team-2016-Apr-Jun-Q4, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#2450778 (Etonkovidova) thx @Jdlrobson - it's certainly requires more exhaustive testing. The table in my comment is from...
[23:29:56] thus it hung itself, old poor scap
[23:30:14] !log maxsem@tin Synchronized php-1.28.0-wmf.9/extensions/Wikidata: https://gerrit.wikimedia.org/r/#/c/298386/ (duration: 02m 00s)
[23:30:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:30:27] aude, ^
[23:30:32] checking
[23:31:01] looks good
[23:31:02] thanks :)
[23:32:04] (PS6) Dzahn: Gerrit: Simplify SSL and hostname management [puppet] - https://gerrit.wikimedia.org/r/298117 (owner: Chad)
[23:32:12] !log maxsem@tin Synchronized php-1.28.0-wmf.9/extensions/Citoid/: https://gerrit.wikimedia.org/r/#/c/298327/ (duration: 00m 27s)
[23:32:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:32:31] (CR) Dzahn: [C: 2] Gerrit: Simplify SSL and hostname management [puppet] - https://gerrit.wikimedia.org/r/298117 (owner: Chad)
[23:33:38] James_F, ^
[23:33:47] (PS7) Dzahn: Gerrit: Simplify SSL and hostname management [puppet] - https://gerrit.wikimedia.org/r/298117 (https://phabricator.wikimedia.org/T125018) (owner: Chad)
[23:36:10] (PS7) Dzahn: Gerrit: Add lead's remaining hiera overrides we need [puppet] - https://gerrit.wikimedia.org/r/298102 (https://phabricator.wikimedia.org/T125018) (owner: Chad)
[23:36:19] (PS8) Dzahn: Gerrit: Add lead's remaining hiera overrides we need [puppet] - https://gerrit.wikimedia.org/r/298102 (https://phabricator.wikimedia.org/T125018) (owner: Chad)
[23:36:33] !log maxsem@tin Synchronized php-1.28.0-wmf.9/extensions/Echo/: https://gerrit.wikimedia.org/r/#/c/298400/ (duration: 00m 33s)
[23:36:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:36:46] Ta.
[23:37:29] MaxSem: Works well.
[23:37:46] (PS9) Dzahn: Gerrit: Add lead's remaining hiera overrides we need [puppet] - https://gerrit.wikimedia.org/r/298102 (https://phabricator.wikimedia.org/T125018) (owner: Chad)
[23:38:07] can i schedule a last-minute swat thing?
[23:38:25] (CR) Dzahn: [C: 2] Gerrit: Add lead's remaining hiera overrides we need [puppet] - https://gerrit.wikimedia.org/r/298102 (https://phabricator.wikimedia.org/T125018) (owner: Chad)
[23:38:30] sure
[23:38:32] cherry-pick of https://gerrit.wikimedia.org/r/#/c/298171/ to wmf.9
[23:39:04] Fun.
[23:39:05] w00t, that's a long list of bugs...
[23:39:09] MaxSem: thanks, i'll add it to Deployments while you deploy :)
[23:39:12] MaxSem: that's all one bug, really
[23:39:24] but someone filed a dozen copies for different instances
[23:39:37] maybe remove that list when cherry-picking
[23:39:46] https://gerrit.wikimedia.org/r/#/c/298402/
[23:39:46] because grrrit-wm died when i merged it
[23:39:52] Tut.
[23:42:12] (PS2) Awight: Whitelist a bunch of RSS feeds for Fundraising Tech to play with [mediawiki-config] - https://gerrit.wikimedia.org/r/298399
[23:44:04] (CR) Chad: Whitelist a bunch of RSS feeds for Fundraising Tech to play with (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/298399 (owner: Awight)
[23:44:14] awight: Left an inline comment mostly for myself
[23:45:15] (CR) Awight: Whitelist a bunch of RSS feeds for Fundraising Tech to play with (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/298399 (owner: Awight)
[23:49:17] !log maxsem@tin Synchronized php-1.28.0-wmf.9/resources/: https://gerrit.wikimedia.org/r/#/c/298402/ (duration: 00m 28s)
[23:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:49:24] (PS3) Dzahn: contint: tidy Nodepool slaves config history [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[23:49:26] MatmaRex, ^
[23:50:06] MaxSem: thanks, italics are back in all the right places :)
[23:50:18] \m/
[23:50:27] awight: I can't find where that git.wm.o one is used at all on mw.org
[23:51:24] Operations, Beta-Cluster-Infrastructure: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2450877 (AlexMonk-WMF) a:AlexMonk-WMF
[23:52:02] (CR) Dzahn: contint: tidy Nodepool slaves config history (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[23:52:07] (PS4) Dzahn: contint: tidy Nodepool slaves config history [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[23:52:34] ostriches: Kind of scary though, cos I'm not sure that the URLs will appear in search-indexed content
[23:53:16] I used Special:LinkSearch
[23:53:21] I could regex the search index
[23:54:07] Aha, it is used
[23:54:40] https://www.mediawiki.org/wiki/Extension:Translate#Recent_changes - clearly already broken
[23:55:24] (CR) Dzahn: [C: 2] contint: tidy Nodepool slaves config history [puppet] - https://gerrit.wikimedia.org/r/295641 (https://phabricator.wikimedia.org/T126552) (owner: Hashar)
[23:56:38] (PS1) Alex Monk: Remove old pre-Swift directory variables referencing upload7 [mediawiki-config] - https://gerrit.wikimedia.org/r/298404 (https://phabricator.wikimedia.org/T64835)
[23:59:49] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2450916 (AlexMonk-WMF) stalled>Open