[00:00:10] ACKNOWLEDGEMENT - puppet last run on nihal is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn new PuppetDB server - Unit puppetdb.service entered failed state.
[00:04:15] (CR) Dzahn: "Let's just make it change the contact groups based on the datacenter. If the contactgroup "sms" is in it that means paging, otherwise just" [puppet] - https://gerrit.wikimedia.org/r/305149 (owner: 20after4)
[00:05:47] (CR) Dzahn: "also monitoring::service and maybe others have "critical => true/false" and true means paging, without directly changing the contact group" [puppet] - https://gerrit.wikimedia.org/r/305149 (owner: 20after4)
[00:13:44] (CR) Dzahn: [C: -1] "pending naming discussion" [puppet] - https://gerrit.wikimedia.org/r/305095 (https://phabricator.wikimedia.org/T143138) (owner: Dzahn)
[00:14:17] (CR) Dzahn: [C: -1] "pending naming discussion" [dns] - https://gerrit.wikimedia.org/r/305120 (https://phabricator.wikimedia.org/T143138) (owner: Dzahn)
[00:16:57] (CR) Dzahn: [C: 2] DNS: Add production DNS for wezen (new syslog server) Bug:T143146 [dns] - https://gerrit.wikimedia.org/r/305144 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[00:19:03] Operations, ops-codfw: rack/setup/deploy wezen (codfw syslog) - https://phabricator.wikimedia.org/T143146#2558409 (Dzahn) wezen.codfw.wmnet has address 10.192.48.64 64.48.192.10.in-addr.arpa domain name pointer wezen.codfw.wmnet. @fgiunchedi Want IPv6?
[00:25:35] (CR) Dzahn: [C: 1] "does nothing. https://web.archive.org/web/20090927061219*/http://strategyapps.wikimedia.org/wiki/Main_Page" [dns] - https://gerrit.wikimedia.org/r/302870 (https://phabricator.wikimedia.org/T31675) (owner: Dzahn)
[00:28:03] (CR) Dzahn: [C: 1] "not even archived in archive.org" [dns] - https://gerrit.wikimedia.org/r/302873 (owner: Dzahn)
[00:29:09] (Abandoned) Dzahn: pmacct: move role to module, rename to ::netflow [puppet] - https://gerrit.wikimedia.org/r/298911 (owner: Dzahn)
[00:39:02] Operations, Discovery, Elasticsearch, Discovery-Search-Sprint: Reclaim nobelium - https://phabricator.wikimedia.org/T142581#2540089 (Dzahn) @Gehel Could we already "shutdown -h now" the machine at this point or do you still need something on it?
[00:49:31] Operations, Patch-For-Review: Split carbon's install/mirror roles, provision install1001 - https://phabricator.wikimedia.org/T132757#2559632 (Dzahn)
[01:00:48] (PS1) Dzahn: installserver: split DHCP part out into role, add on install1001 [puppet] - https://gerrit.wikimedia.org/r/305163 (https://phabricator.wikimedia.org/T132757)
[01:02:44] (PS2) Dzahn: installserver: split DHCP part out into role, add on install1001 [puppet] - https://gerrit.wikimedia.org/r/305163 (https://phabricator.wikimedia.org/T132757)
[01:18:25] (PS3) Dzahn: installserver: split DHCP part out into role, add on install1001 [puppet] - https://gerrit.wikimedia.org/r/305163 (https://phabricator.wikimedia.org/T132757)
[01:20:37] PROBLEM - MariaDB Slave Lag: m3 on db1043 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1233.61 seconds
[01:24:07] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:24:19] (CR) BBlack: "In the long run, we'll probably use (or worst case, patch up, or write from scratch) an MMDB vmod to avoid keeping so much inline C around" [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
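The 00:19 update on T143146 lists wezen's A record (10.192.48.64) next to its PTR owner name, 64.48.192.10.in-addr.arpa. That name follows mechanically from the address: reverse the four octets and append `in-addr.arpa`. A minimal sketch using Python's standard `ipaddress` module (its `reverse_pointer` attribute, available since Python 3.5); the helper names are hypothetical and this is not how the operations/dns repository generates its zone files.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Reverse-DNS owner name for an IPv4 or IPv6 address, via the standard library."""
    return ipaddress.ip_address(address).reverse_pointer


def ptr_name_ipv4(address: str) -> str:
    """The same rule spelled out for IPv4: reverse the octets, append in-addr.arpa."""
    octets = str(ipaddress.IPv4Address(address)).split(".")  # also validates the input
    return ".".join(reversed(octets)) + ".in-addr.arpa"


print(ptr_name("10.192.48.64"))       # 64.48.192.10.in-addr.arpa
print(ptr_name_ipv4("10.192.48.64"))  # 64.48.192.10.in-addr.arpa
# IPv6 (the "Want IPv6?" question) uses one label per hex nibble under ip6.arpa instead.
print(ptr_name("2001:db8::1"))
```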
[01:32:07] (PS1) Dzahn: installserver: split 'mirror'-server into a separate role (WIP) [puppet] - https://gerrit.wikimedia.org/r/305165
[01:32:32] (CR) Dzahn: [C: -2] installserver: split 'mirror'-server into a separate role (WIP) [puppet] - https://gerrit.wikimedia.org/r/305165 (owner: Dzahn)
[01:40:13] (PS2) Dzahn: wmnet: repeat host names on each line, fix indentation, misc cleanup [dns] - https://gerrit.wikimedia.org/r/304171
[01:40:19] (CR) jenkins-bot: [V: -1] wmnet: repeat host names on each line, fix indentation, misc cleanup [dns] - https://gerrit.wikimedia.org/r/304171 (owner: Dzahn)
[01:42:14] (PS2) Dzahn: wikimedia.org: repeat hostname on each line for multi records [dns] - https://gerrit.wikimedia.org/r/304155
[01:42:21] (CR) jenkins-bot: [V: -1] wikimedia.org: repeat hostname on each line for multi records [dns] - https://gerrit.wikimedia.org/r/304155 (owner: Dzahn)
[01:44:16] RECOVERY - MariaDB Slave Lag: m3 on db1043 is OK: OK slave_sql_lag Replication lag: 0.82 seconds
[01:44:55] (CR) Dzahn: "@Hashar after your -1 Moritz added another comment that it should be fine. What's up, should we do this soon now?" [puppet] - https://gerrit.wikimedia.org/r/295880 (https://phabricator.wikimedia.org/T138506) (owner: Hashar)
[01:45:41] (CR) Dzahn: "arg, yes, this is going to be rebase hell" [dns] - https://gerrit.wikimedia.org/r/304171 (owner: Dzahn)
[01:49:19] (PS8) Dzahn: installserver: move role to module [puppet] - https://gerrit.wikimedia.org/r/298907
[01:49:45] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[01:50:20] (CR) Dzahn: "instead i am splitting out parts of this into separate roles, so we are flexible about moving the installserver part to install1001 and th" [puppet] - https://gerrit.wikimedia.org/r/298907 (owner: Dzahn)
[01:50:25] (Abandoned) Dzahn: installserver: move role to module [puppet] - https://gerrit.wikimedia.org/r/298907 (owner: Dzahn)
[01:53:14] (PS3) Dzahn: wmnet: repeat host names on each line, fix indentation, misc cleanup [dns] - https://gerrit.wikimedia.org/r/304171
[01:55:26] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Puppet has 1 failures
[01:55:57] (PS3) Dzahn: wikimedia.org: repeat hostname on each line for multi records [dns] - https://gerrit.wikimedia.org/r/304155
[02:00:47] (CR) Dzahn: [C: 1] Add bd808 (Bryan Davis) to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/305152 (https://phabricator.wikimedia.org/T143174) (owner: BryanDavis)
[02:15:29] (PS1) Papaul: DHCP: Add Dhcp entries for wezen (new syslog server) Bug: T143146 [puppet] - https://gerrit.wikimedia.org/r/305167 (https://phabricator.wikimedia.org/T143146)
[02:20:56] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[02:25:20] (PS1) Papaul: adding install params for wezen (new syslog server) Bug:T143146 [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146)
[02:27:34] Operations, ops-codfw: rack/setup/deploy wezen (codfw syslog) - https://phabricator.wikimedia.org/T143146#2559767 (Papaul)
[02:32:44] Operations, Domains, Traffic, Wikimedia-Site-requests, Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2558051 (MZMcBride) What about ? There's already a private wiki here. Will that work? If...
!log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.14) (duration: 11m 31s)
[02:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:07:43] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.15) (duration: 17m 59s)
[03:07:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:15:05] !log l10nupdate@tin ResourceLoader cache refresh completed at Wed Aug 17 03:15:05 UTC 2016 (duration 7m 22s)
[03:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:15:41] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2559787 (Tgr) @GWicke that seems like a good way of handling it, but care should...
[03:25:41] (PS3) Legoktm: zuul-test-repo: Allow testing multiple repositories at once [puppet] - https://gerrit.wikimedia.org/r/269328
[04:37:04] (hrms) I guess the attempt at finding someone who knew how to purge djvu thumnails fell flat. :/
[05:52:37] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:56:46] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[06:00:57] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[06:18:15] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:29:47] PROBLEM - puppet last run on db1046 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:25] !log installing openjdk security updates on the stat* hosts
[06:34:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:40:48] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-swe-nor] - https://gerrit.wikimedia.org/r/294245 (https://phabricator.wikimedia.org/T137767) (owner: KartikMistry)
[06:45:25] PROBLEM - puppet last run on db1020 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:57:16] RECOVERY - puppet last run on db1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:06:10] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-nob] - https://gerrit.wikimedia.org/r/269914 (https://phabricator.wikimedia.org/T124317) (owner: KartikMistry)
[07:07:06] PROBLEM - DPKG on titanium is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[07:08:57] RECOVERY - DPKG on titanium is OK: All packages OK
[07:10:57] RECOVERY - puppet last run on db1020 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:16:29] akosiaris: https://gerrit.wikimedia.org/r/269914 - found solution to fix similar issues. Messed branches.
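The 05:56 and 06:00 alerts on "Text HTTP 5xx reqs/min" report the share of recent Graphite datapoints above a warning threshold (250) and a critical threshold (1000) rather than a single latest value. The sketch below shows that percent-over-threshold idea in plain Python. It is a hypothetical illustration, not the production Icinga/Graphite check: the function names, the warning wording, and the 1% trigger (read off the "Less than 1.00%" OK text) are all assumptions.

```python
from typing import Optional, Sequence, Tuple


def percent_over(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above `threshold`."""
    known = [v for v in datapoints if v is not None]  # Graphite returns None for gaps
    if not known:
        return 0.0
    return 100.0 * sum(1 for v in known if v > threshold) / len(known)


def check_status(
    datapoints: Sequence[Optional[float]],
    warning: float = 250.0,
    critical: float = 1000.0,
    trigger_percent: float = 1.0,  # assumption, read off the "Less than 1.00%" OK text
) -> Tuple[str, str]:
    """Hypothetical percent-over-threshold check, not the production plugin."""
    crit_pct = percent_over(datapoints, critical)
    if crit_pct >= trigger_percent:
        return "CRITICAL", f"{crit_pct:.2f}% of data above the critical threshold [{critical}]"
    warn_pct = percent_over(datapoints, warning)
    if warn_pct >= trigger_percent:
        return "WARNING", f"{warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return "OK", f"Less than {trigger_percent:.2f}% above the threshold [{warning}]"


# Ten samples of 5xx reqs/min, two of which exceed 1000.
print(check_status([120.0] * 8 + [1500.0, 1200.0]))
# ('CRITICAL', '20.00% of data above the critical threshold [1000.0]')
print(check_status([120.0] * 10))
# ('OK', 'Less than 1.00% above the threshold [250.0]')
```

Counting the fraction of a window over the threshold, instead of alerting on one sample, keeps a single spike from paging on its own while a sustained burst still trips the check.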
[07:18:12] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-dan] - https://gerrit.wikimedia.org/r/269912 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[07:26:00] (PS2) Giuseppe Lavagetto: dynamicproxy: puppetize appendfilename setting [puppet] - https://gerrit.wikimedia.org/r/304994
[07:35:14] Operations, Wikimedia-Logstash, Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2559976 (dcausse) I have a copy of the shard on disk. But yes having many fields with a single value can explain what happens during a force...
[07:36:22] !log cleanup and shutdown of nobelium before reclaim
[07:36:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:38:00] why is the kibana query dropdown so fcked atm?
[07:44:06] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[07:45:47] killing the dropdown seemed to work
[07:45:47] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[07:49:49] (PS1) Gehel: Reclaim nobelium [dns] - https://gerrit.wikimedia.org/r/305192 (https://phabricator.wikimedia.org/T142581)
[07:54:09] Operations, ops-eqiad, Discovery, Elasticsearch, and 2 others: Reclaim nobelium - https://phabricator.wikimedia.org/T142581#2559981 (Gehel) a:Gehel>RobH Steps completed for decommissioning (following [[ https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Reclaim_or_Decommission | Server...
[08:00:31] !log installing openjdk security updates on the elastic* clusters
[08:00:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:13:55] (PS5) Gehel: WDQS - fix icinga graphite check, metric has been renamed [puppet] - https://gerrit.wikimedia.org/r/305020 (https://phabricator.wikimedia.org/T138546)
[08:14:23] (PS1) Muehlenhoff: Rename canary host for depdeploy (got replaced by newer hardware) [puppet] - https://gerrit.wikimedia.org/r/305194
[08:15:22] (CR) Gehel: [C: 2] WDQS - fix icinga graphite check, metric has been renamed [puppet] - https://gerrit.wikimedia.org/r/305020 (https://phabricator.wikimedia.org/T138546) (owner: Gehel)
[08:17:11] (PS6) Gehel: mwgrep: fails gracefully when an invalid regex is provided [puppet] - https://gerrit.wikimedia.org/r/302892 (https://phabricator.wikimedia.org/T141996) (owner: DCausse)
[08:17:26] (CR) Muehlenhoff: [C: 2] Rename canary host for depdeploy (got replaced by newer hardware) [puppet] - https://gerrit.wikimedia.org/r/305194 (owner: Muehlenhoff)
[08:17:30] (PS2) Muehlenhoff: Rename canary host for depdeploy (got replaced by newer hardware) [puppet] - https://gerrit.wikimedia.org/r/305194
[08:17:33] (CR) Muehlenhoff: [V: 2] Rename canary host for depdeploy (got replaced by newer hardware) [puppet] - https://gerrit.wikimedia.org/r/305194 (owner: Muehlenhoff)
[08:19:47] (PS7) Gehel: mwgrep: fails gracefully when an invalid regex is provided [puppet] - https://gerrit.wikimedia.org/r/302892 (https://phabricator.wikimedia.org/T141996) (owner: DCausse)
[08:23:49] (CR) Gehel: [C: 2] mwgrep: fails gracefully when an invalid regex is provided [puppet] - https://gerrit.wikimedia.org/r/302892 (https://phabricator.wikimedia.org/T141996) (owner: DCausse)
[08:34:54] (CR) Giuseppe Lavagetto: [C: -1] "I think the general structure is sound but the parser function definitely needs some fixes; I would also like to see tests for the functio" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/304928 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[08:35:50] ACKNOWLEDGEMENT - MegaRAID on graphite1002 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: T141795
[08:36:12] godog: that was me ;) ^^^
[08:39:28] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: puppet fail
[08:42:11] volans: ack thanks!
[08:42:22] (ah ah)
[08:42:43] lol
[08:42:54] robh: Any news on getting those thumbnails fixed?
[08:43:34] Oh, wait (looks at clock) Sorry. :/
[08:46:36] Operations, Wikimedia-Logstash, Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2560082 (Gehel) I sent the NDA to elasticsearch for signing. Since we have a likely explanation of the issue, it is not as crucial to send th...
[08:50:18] Operations, ops-codfw, Discovery: rack/setup/deploy wqds200[12] - https://phabricator.wikimedia.org/T142864#2560086 (Gehel)
[08:54:56] (PS2) Gehel: Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202
[08:58:53] (PS3) Alexandros Kosiaris: Revert "Point eqiad url-downloader to codfw" [dns] - https://gerrit.wikimedia.org/r/304212 (https://phabricator.wikimedia.org/T134496)
[09:07:36] (PS3) Gehel: Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202
[09:07:49] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[09:11:31] (PS1) Ladsgroup: ores: increase ores workers to 40 per node [puppet] - https://gerrit.wikimedia.org/r/305201 (https://phabricator.wikimedia.org/T143105)
[09:13:28] (PS6) Muehlenhoff: Use Yubico OTPs as a second authentication factor for members of the yubiauth group on iron [puppet] - https://gerrit.wikimedia.org/r/281630
[09:17:29] (CR) Muehlenhoff: [C: 2] Use Yubico OTPs as a second authentication factor for members of the yubiauth group on iron [puppet] - https://gerrit.wikimedia.org/r/281630 (owner: Muehlenhoff)
[09:20:51] (PS6) Volans: Monitoring: add event handler for RAID checks [puppet] - https://gerrit.wikimedia.org/r/304026 (https://phabricator.wikimedia.org/T142085)
[09:23:37] (PS4) Gehel: Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202
[09:28:59] (PS2) Alexandros Kosiaris: ores: increase ores workers to 40 per node [puppet] - https://gerrit.wikimedia.org/r/305201 (https://phabricator.wikimedia.org/T143105) (owner: Ladsgroup)
[09:29:04] (CR) Alexandros Kosiaris: [C: 2 V: 2] ores: increase ores workers to 40 per node [puppet] - https://gerrit.wikimedia.org/r/305201 (https://phabricator.wikimedia.org/T143105) (owner: Ladsgroup)
[09:30:43] (CR) Alexandros Kosiaris: [C: 2] Revert "Point eqiad url-downloader to codfw" [dns] - https://gerrit.wikimedia.org/r/304212 (https://phabricator.wikimedia.org/T134496) (owner: Alexandros Kosiaris)
[09:31:30] !log uploaded hhvm 3.12.7+dfsg+wmf1~trusty1 for trusty-wikimedia to carbon (also includes a fix for T137642)
[09:31:31] T137642: IcuCollation sort keys depend on PHP/HHVM version - https://phabricator.wikimedia.org/T137642
[09:31:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:37:20] !log upgraded mw1017 to HHVM 3.12.7 (plus patches)
[09:37:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:45:28] !log upload scap3 3.2.3-1 to carbon T127762
[09:45:29] T127762: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762
[09:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:51:37] (CR) Gehel: "Puppet compiler: https://puppet-compiler.wmflabs.org/3725/" [puppet] - https://gerrit.wikimedia.org/r/289202 (owner: Gehel)
[10:00:13] (CR) Volans: "Puppet compiler: https://puppet-compiler.wmflabs.org/3724/" [puppet] - https://gerrit.wikimedia.org/r/304026 (https://phabricator.wikimedia.org/T142085) (owner: Volans)
[10:02:41] Operations, Discovery, Discovery-Search-Backlog, Elasticsearch, Epic: EPIC: Cultivating the Elasticsearch garden (operational lessons from 1.7.1 upgrade) - https://phabricator.wikimedia.org/T109089#2560196 (Gehel)
[10:07:14] !log ladsgroup@scb[12]00[12]:~$ sudo service celery-ores-worker restart (T143105)
[10:07:15] T143105: Increase celery workers to 40 per scb node - https://phabricator.wikimedia.org/T143105
[10:07:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:09:00] restarts are done now
[10:10:58] Operations, Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#2560210 (MoritzMuehlenhoff) Updated list, now really small: db1009.eqiad.wmnet db1020.eqiad.wmnet db2011.codfw.wmnet db1010.eqiad.wmnet (apparently no longer in use: T129395) And finally...
[10:17:52] Operations, DBA, Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#2560238 (jcrespo)
[10:18:47] Operations, vm-requests, Patch-For-Review: EQIAD: (1) VM request for url-downloader - https://phabricator.wikimedia.org/T134496#2560240 (akosiaris) Open>Resolved
[10:26:00] (PS1) Volans: Add dummy Phabricator token for ops-monitoring-bot [labs/private] - https://gerrit.wikimedia.org/r/305214 (https://phabricator.wikimedia.org/T142085)
[10:27:41] (CR) Volans: [C: 2 V: 2] Add dummy Phabricator token for ops-monitoring-bot [labs/private] - https://gerrit.wikimedia.org/r/305214 (https://phabricator.wikimedia.org/T142085) (owner: Volans)
[10:29:30] Operations, DBA, Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#2560291 (jcrespo) We should be able to do `db1009` at any time, I do not have a good candidate for failover on eqiad; but most services should be able to handle some seconds of un...
[10:32:39] (Abandoned) Giuseppe Lavagetto: role::kafka::main::mirror: allow fetching configs from hiera [puppet] - https://gerrit.wikimedia.org/r/305041 (owner: Giuseppe Lavagetto)
[11:04:19] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:05:16] Operations, MediaWiki-extensions-UniversalLanguageSelector, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2560336 (Aklapper) >>! In T138136#2425343, @MoritzMuehlenhoff wrote: > I contacted the Debian maintainer of font-sil-lateef whether he's fine with uploa...
[11:06:34] (CR) Volans: "Updated puppet compiler with dummy API token in labs/private:" [puppet] - https://gerrit.wikimedia.org/r/304026 (https://phabricator.wikimedia.org/T142085) (owner: Volans)
[11:07:41] Operations, Patch-For-Review: Remove secure.wikimedia.org - https://phabricator.wikimedia.org/T120790#2560346 (Aklapper) @BBlack: Any idea who could (be in a position to) make a decision (kill vs. decline)?
[11:08:29] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:08:59] PROBLEM - puppet last run on db1061 is CRITICAL: CRITICAL: Puppet has 1 failures
[11:10:42] Operations, MediaWiki-extensions-UniversalLanguageSelector, I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2560363 (MoritzMuehlenhoff) @Aklapper : Yes, I had uploaded a version to jessie-backports, that is also rolled out on the servers. (https://packages.qa....
[11:15:22] godog: there is a request to add ~1000 metrics from maps to graphite. Do we need to do some capacitiy check ?
[11:15:28] Operations, MediaWiki-General-or-Unknown, Release-Engineering-Team, Traffic, and 5 others: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2560365 (Aklapper)
[11:20:49] (CR) DCausse: [C: 1] Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202 (owner: Gehel)
[11:29:19] RECOVERY - puppet last run on db1061 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[11:39:51] gehel: 1k for each machine that is? do those come via statsd or graphite ?
[11:40:33] godog: 1k total, with a frequency of 1 datapoint per hour per metric
[11:41:19] (CR) Alexandros Kosiaris: [C: 2] apertium-dan: New upstream release [debs/contenttranslation/apertium-dan] - https://gerrit.wikimedia.org/r/269912 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[11:41:52] gehel: yeah that's fine! what's the task btw?
[11:41:58] godog: we don't need the aggregation provided by statsd. Implementation is not yet done
[11:42:09] godog: T143048 & T143046
[11:42:09] T143046: Add count of pages containing a graph to Grafana - https://phabricator.wikimedia.org/T143046
[11:42:09] T143048: Add map pagecount tracking to Grafana - https://phabricator.wikimedia.org/T143048
[11:42:12] (CR) Alexandros Kosiaris: [C: 2] apertium-nob: New upstream release [debs/contenttranslation/apertium-nob] - https://gerrit.wikimedia.org/r/269914 (https://phabricator.wikimedia.org/T124317) (owner: KartikMistry)
[11:43:05] gehel: yup that seems fine, thanks for checking!
[11:43:13] godog: thanks!
[11:46:52] (PS5) KartikMistry: apertium-hin: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-hin] - https://gerrit.wikimedia.org/r/296228 (https://phabricator.wikimedia.org/T107306)
[11:48:12] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-dan-nor] - https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[11:48:57] akosiaris: thanks! https://gerrit.wikimedia.org/r/#/c/296228/ is also good to go.
[11:50:36] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-dan-nor] - https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[11:51:53] (PS2) Phedenskog: Enable PerformanceInspector extension for labs [mediawiki-config] - https://gerrit.wikimedia.org/r/304992
[11:54:55] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-dan_0.5.0~r67099-2+wmf1
[11:54:55] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-nob_0.9.0~r69513-1+wmf1
[11:54:56] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[11:54:56] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[11:55:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:56:21] godog: those metrics will be collected hourly. We only have storage schemas for daily or per minute metrics at the moment
[11:57:00] godog: does it make sense to also create a "hourly" hierarchy to save a bit of space / IO ?
[11:58:49] akosiaris: some packages are jenkins failed for piuparts test. Should we ignore it?
[11:59:18] (CR) Alexandros Kosiaris: [C: 2] apertium-hin: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-hin] - https://gerrit.wikimedia.org/r/296228 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[11:59:47] kart_: piuparts is ignored anyway IIRC, there must be some other fail as well
[11:59:51] perhaps lintian ?
[12:00:21] akosiaris: wish there is an easy way to get separate lintian output :)
[12:00:29] akosiaris: looking at them again.
[12:00:59] (CR) Alexandros Kosiaris: [C: 2] Add bd808 (Bryan Davis) to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/305152 (https://phabricator.wikimedia.org/T143174) (owner: BryanDavis)
[12:01:08] (PS2) Alexandros Kosiaris: Add bd808 (Bryan Davis) to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/305152 (https://phabricator.wikimedia.org/T143174) (owner: BryanDavis)
[12:01:11] (CR) Alexandros Kosiaris: [V: 2] Add bd808 (Bryan Davis) to deploy-service group [puppet] - https://gerrit.wikimedia.org/r/305152 (https://phabricator.wikimedia.org/T143174) (owner: BryanDavis)
[12:02:00] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-dan-nor] - https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[12:02:01] kart_: yeah, it probably is doable tough
[12:02:29] remember that essentially this jenkins job has ~ 30 days now that it is active
[12:02:35] and it's already really helpful
[12:02:54] we definitely still have things to improve on
[12:03:47] !log repooling mw1298 with config change to allow scaling of huge SVGs (to testdrive further before enabling this in general)
[12:03:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:04:30] gehel: it could but it is relatively few metrics, each file is ~300k on disk now, probably not worth it
[12:06:12] godog: I did some quick estimate, having also an hourly schema would save ~150k per time serie. But it would probably also save some IO and ensure we don't have holes in the data if we send updates hourly
[12:06:56] akosiaris: agree.
[12:07:03] akosiaris: https://gerrit.wikimedia.org/r/#/c/269916/ also good to go.
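Between 12:04 and 12:06 godog and gehel weigh an hourly schema against Whisper file size (about 300k per file now, about 150k saved per series). Whisper preallocates every archive, so a file's size follows directly from its retention definition: a 16-byte metadata header, a 12-byte descriptor per archive, and 12 bytes per datapoint (4-byte timestamp plus 8-byte double). A minimal sketch of that arithmetic follows. The two retention strings are hypothetical stand-ins, not the schemas actually configured in the puppet repository, so the totals will not reproduce the estimates quoted in the log.

```python
from typing import List, Tuple

# Whisper unit suffixes; "y" is 365 days and "w" is 7 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}
HEADER_BYTES = 16        # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12  # offset, seconds per point, number of points
POINT_BYTES = 12         # 4-byte timestamp + 8-byte double


def parse_duration(text: str) -> int:
    """'5m' -> 300 seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """'1m:7d,1h:1y' -> [(60, 10080), (3600, 8760)] as (seconds_per_point, points)."""
    archives = []
    for part in spec.split(","):
        precision, retention = part.split(":")
        step = parse_duration(precision)
        archives.append((step, parse_duration(retention) // step))
    return archives


def whisper_file_size(spec: str) -> int:
    """Bytes on disk for a Whisper file with the given retention definition."""
    archives = parse_retentions(spec)
    points = sum(n for _, n in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points


# Hypothetical schemas, for illustration only.
MINUTELY = "1m:7d,5m:14d,15m:30d,1h:1y,1d:5y"
HOURLY = "1h:1y,1d:5y"

for name, spec in (("minutely", MINUTELY), ("hourly", HOURLY)):
    print(f"{name:8} {whisper_file_size(spec):>7} bytes")
# minutely  331000 bytes
# hourly    127060 bytes
```

Almost all of the space goes to the high-resolution archives, which is why dropping the sub-hour archives for metrics that only change hourly shrinks each file so much.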
[12:07:23] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-swe-nor] - https://gerrit.wikimedia.org/r/294245 (https://phabricator.wikimedia.org/T137767) (owner: KartikMistry)
[12:07:34] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-swe-dan] - https://gerrit.wikimedia.org/r/294248 (https://phabricator.wikimedia.org/T137767) (owner: KartikMistry)
[12:10:12] akosiaris: other 2 is also good to go^^
[12:10:53] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-hin_0.1.0~r59158-1+wmf1
[12:10:56] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[12:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:11:05] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[12:11:49] kart_: ok, I see you are rechecking all of them, lemme known when you are done and I 'll look into everything +2ed by jenkins
[12:12:08] akosiaris: Sure.
[12:12:39] akosiaris: I'll be afk for a while, so go ahead with dan-nor, swe-nor and swe-dan.
[12:16:42] gehel: yeah good point on the update holes, if you'd like to add hourly too I can code review/assist
[12:17:38] godog: it does not cost much to add a rule in storage-schemas, even if we end up not using it... I'll create the patch and let you review...
[12:17:47] Operations, Commons, Wikimedia-SVG-rendering, User-notice: SVG files larger than 10 MB cannot be thumbnailed - https://phabricator.wikimedia.org/T111815#2560451 (MoritzMuehlenhoff) I enanbled the --unlimited option locally on scaler mw1298. I tested ten huge SVGs, which scaled fine: https://uplo...
[12:18:42] !log restart pybal on low-traffic for thumbor - T139606
[12:18:43] T139606: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606
[12:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:18:58] gehel: sounds good! thanks
[12:20:35] RECOVERY - LVS HTTP IPv4 on thumbor.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 241 bytes in 0.023 second response time
[12:20:58] godog: \o/!!!
[12:21:00] nice
[12:22:06] <_joe_> does it also serve thumbnails?
[12:23:27] not yet no, when it does the results will be stored into swift but not served to users yet
[12:24:08] akosiaris: indeed! the alarm is downtimed for a bit more still given ^
[12:27:55] (PS1) Gehel: Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048)
[12:29:20] (CR) Yurik: [C: -1] Introduce a hourly storage schema in graphite (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048) (owner: Gehel)
[12:29:44] (CR) Alexandros Kosiaris: "I see this cherry-picked in beta, I assume it's working fine, merging" [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:29:49] (PS7) Alexandros Kosiaris: Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:30:55] (CR) Alexandros Kosiaris: [C: 2] Beta: Fix cxserver restbase_url [puppet] - https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: KartikMistry)
[12:34:19] (PS1) Filippo Giunchedi: lvs: don't monitor thumbor/codfw, not set-up yet [puppet] - https://gerrit.wikimedia.org/r/305241
[12:36:03] (CR) Filippo Giunchedi: [C: 2] lvs: don't monitor thumbor/codfw, not set-up yet [puppet] - https://gerrit.wikimedia.org/r/305241 (owner: Filippo Giunchedi)
[12:37:13] (PS2) Gehel: Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048)
[12:37:39] (CR) Gehel: Introduce a hourly storage schema in graphite (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048) (owner: Gehel)
[12:38:36] (CR) jenkins-bot: [V: -1] Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048) (owner: Gehel)
[12:39:00] (CR) Alexandros Kosiaris: [C: 2] apertium-swe-dan: Initial Debian packaging [debs/contenttranslation/apertium-swe-dan] - https://gerrit.wikimedia.org/r/294248 (https://phabricator.wikimedia.org/T137767) (owner: KartikMistry)
[12:39:28] (PS3) Gehel: Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048)
[12:39:51] (CR) Alexandros Kosiaris: [C: 2] apertium-swe-nor: Initial Debian packaging [debs/contenttranslation/apertium-swe-nor] - https://gerrit.wikimedia.org/r/294245 (https://phabricator.wikimedia.org/T137767) (owner: KartikMistry)
[12:42:44] (CR) Alexandros Kosiaris: [C: 2] apertium-dan-nor: New upstream release [debs/contenttranslation/apertium-dan-nor] - https://gerrit.wikimedia.org/r/269916 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
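Gehel's change 305238 ("Introduce a hourly storage schema in graphite") also addresses the "holes" he mentioned at 12:06. When Whisper writes a point, it propagates to the next lower-precision archive by aggregating the block of higher-precision slots that covers it. The aggregate is stored only if the fraction of known slots is at least the file's xFilesFactor. A metric written once an hour into a 1-minute archive fills 1 slot in 5 at the first step, so every coarser archive stays empty. A minimal simulation of that rule, assuming Whisper's default xFilesFactor of 0.5, "average" aggregation, and a hypothetical 1m, 5m, 15m, 1h archive chain; `downsample` is an illustrative helper, not a Whisper API.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a slot that was never written


def downsample(series: Series, ratio: int, x_files_factor: float = 0.5) -> Series:
    """Average each block of `ratio` higher-precision slots into one lower-precision slot.

    Mirrors the rule Whisper applies when propagating to the next archive: the
    aggregate is written only if the fraction of known values in the block is
    >= x_files_factor; otherwise the lower-precision slot stays empty.
    """
    out: Series = []
    for start in range(0, len(series), ratio):
        block = series[start:start + ratio]
        known = [v for v in block if v is not None]
        if known and len(known) / len(block) >= x_files_factor:
            out.append(sum(known) / len(known))
        else:
            out.append(None)
    return out


# One day of a 1-minute archive that only receives one datapoint per hour.
minutely: Series = [None] * 1440
for hour in range(24):
    minutely[hour * 60] = float(hour)

five_min = downsample(minutely, 5)      # 1 of 5 slots known -> 0.2 < 0.5, dropped
fifteen_min = downsample(five_min, 3)   # nothing left to aggregate
hourly = downsample(fifteen_min, 4)
print(sum(v is not None for v in hourly))  # 0: every hourly slot is a hole

# The same data written to a schema whose finest archive is 1h has no gaps.
hourly_direct: Series = [float(h) for h in range(24)]
print(downsample(hourly_direct, 24))    # [11.5]
```

Lowering xFilesFactor for these metrics would be another way to avoid the gaps. Matching the finest archive to the real write interval avoids them without changing aggregation semantics for everything else.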
[12:48:11] (CR) Alexandros Kosiaris: [C: -1] "lintian says" [debs/contenttranslation/apertium-hbs] - https://gerrit.wikimedia.org/r/294675 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[12:48:41] (CR) Alexandros Kosiaris: [C: -1] "lintian says:" [debs/contenttranslation/apertium-eus] - https://gerrit.wikimedia.org/r/294673 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[12:49:32] (CR) Alexandros Kosiaris: [C: -1] "needs apertium-kaz which is not yet ready" [debs/contenttranslation/apertium-kaz-tat] - https://gerrit.wikimedia.org/r/296369 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[12:49:59] (CR) Alexandros Kosiaris: [C: -1] "needs apertium-isl which is not yet ready" [debs/contenttranslation/apertium-isl-eng] - https://gerrit.wikimedia.org/r/296157 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[12:50:28] (CR) Alexandros Kosiaris: [C: -1] "needs giella-sme which is not yet ready" [debs/contenttranslation/apertium-sme-nob] - https://gerrit.wikimedia.org/r/295185 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[12:52:17] (CR) Alexandros Kosiaris: "recheck" [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[12:53:10] Operations, Ops-Access-Requests, Striker, Patch-For-Review: deploy-service access for bd808 - https://phabricator.wikimedia.org/T143174#2560541 (akosiaris) Open>Resolved a:akosiaris
[12:56:29] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0]
[12:57:20] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-dan-nor_1.3.0~r67099-2+wmf1
[12:57:20] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-swe-dan_0.7.0~r66063-1+wmf1
[12:57:20] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-swe-nor_0.2.0~r69544-1+wmf1
[12:57:21] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[12:57:21] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[12:57:22] T107306: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306
[12:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:57:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:00:05] hoo: Dear anthropoid, the time has come. Please deploy Wikidata (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160817T1300).
[13:00:59] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[13:06:49] godog: when you have a minute... https://gerrit.wikimedia.org/r/#/c/305238/
[13:07:26] (PS5) Gehel: Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202
[13:08:59] (CR) Gehel: [C: 2] Elasticsearch - use unicast for discovery by default [puppet] - https://gerrit.wikimedia.org/r/289202 (owner: Gehel)
[13:10:18] Operations, SEO: secure.wikimedia.org entries still showing up in Google search results - https://phabricator.wikimedia.org/T93531#2560554 (BBlack)
[13:10:21] Operations, Patch-For-Review: Remove secure.wikimedia.org - https://phabricator.wikimedia.org/T120790#2560552 (BBlack) Open>declined I'm normally in favor of removing cruft when we can, but this was a semi-canonical way to access our domains for a long time, and it still has functional redirects...
[13:10:21] !log rolling restart of relforge100* for JVM upgrade. Short downtime expected.
[13:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:16:21] (CR) Ottomata: "Ah thanks for review! Hadn't been able to actually test this yet, just wanted to push the stuff I wrote at end of day up." [puppet] - https://gerrit.wikimedia.org/r/304928 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[13:23:46] (PS1) Hoo man: Enable allowDataAccessInUserLanguage on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/305247 (https://phabricator.wikimedia.org/T122672)
[13:23:51] (CR) Gehel: [C: 1] Upgrade elastic plugins to 2.3.4 [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/302707 (owner: DCausse)
[13:24:52] (CR) Hoo man: [C: 2] Enable allowDataAccessInUserLanguage on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/305247 (https://phabricator.wikimedia.org/T122672) (owner: Hoo man)
[13:25:19] (Merged) jenkins-bot: Enable allowDataAccessInUserLanguage on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/305247 (https://phabricator.wikimedia.org/T122672) (owner: Hoo man)
[13:27:45] (CR) Volans: [C: 1] "The module part LGTM (I'm not familiar with conftool specific syntax though)." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295202 (owner: Giuseppe Lavagetto)
[13:28:41] !log hoo@tin Synchronized wmf-config/InitialiseSettings.php: Enable allowDataAccessInUserLanguage on meta (T122672) (duration: 00m 56s)
[13:28:42] T122672: [Task] Enable allowDataAccessInUserLanguage on meta - https://phabricator.wikimedia.org/T122672
[13:28:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:28:53] Tested, works :)
[13:29:18] (CR) Gehel: [C: 2] "Looks good, merging and starting elasticsearch upgrade" [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/302707 (owner: DCausse)
[13:29:46] (CR) Gehel: [V: 2] Upgrade elastic plugins to 2.3.4 [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/302707 (owner: DCausse)
[13:30:43] * hoo is done
[13:35:21] * mafk wantes what is that, wants to test
[13:35:46] mw.wikibase.label etc. now use the user's language rather than just English
[13:36:49] there will also be an announcement, hopefully today
[13:37:18] oh hoo and talking about wikibase, is there any plan to update the tests to allow unblocking that mediawiki/core patch of mine which changes language fallback for lzh to zh_hant?
[13:38:15] mafk: ah, good point
[13:38:21] will put it on my to do list
[13:38:29] should be able to get to it soonish
[13:38:47] thank you so much
[13:39:05] jenkins-bot is always complaining
[13:39:21] and even tried a false "depends-on" as siebrand suggested
[13:39:26] but no
[13:40:43] (CR) Volans: salt: add conftool module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/295202 (owner: Giuseppe Lavagetto)
[13:44:05] mafk: I know… it will be a bit of work on our side, but it's doable
[13:57:07] Operations, ArticlePlaceholder, Traffic, Wikidata: Performance and caching considerations for article placeholders accesses - https://phabricator.wikimedia.org/T142944#2560681 (hoo) >>! In T142944#2557015, @BBlack wrote: > 30 minutes isn't really reasonable, and neither is spamming more purge tra...
[14:10:50] Operations, ArticlePlaceholder, Traffic, Wikidata: Performance and caching considerations for article placeholders accesses - https://phabricator.wikimedia.org/T142944#2560699 (BBlack) I think I'm lacking a lot of context here about these special pages and placeholders. But my bottom line though...
[14:11:15] akosiaris: hmm. no. https://gerrit.wikimedia.org/r/#/c/298947/ need to be revert.
[14:11:36] akosiaris: can you do that? It is blocked by, https://phabricator.wikimedia.org/T138088
[14:12:03] kart_: what do you mean blocked by ?
[14:12:49] also it was cherry-picked in beta for weeks
[14:13:03] akosiaris: yes, but it is not really working.
[14:13:30] ah, then it should not have been that long cherry-picked in beta
[14:13:40] akosiaris: right. My mistake :/
[14:13:45] I 'll revert
[14:13:52] akosiaris: thanks!
[14:14:59] (PS1) Alexandros Kosiaris: Revert "Beta: Fix cxserver restbase_url" [puppet] - https://gerrit.wikimedia.org/r/305252
[14:16:11] (CR) KartikMistry: "This is pretty heavy package, with build time around 2.5 hours on my local machine with 8 GBs RAM." [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[14:18:06] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-en-ca] - https://gerrit.wikimedia.org/r/294264 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[14:19:09] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-en-es] - https://gerrit.wikimedia.org/r/294314 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[14:20:38] Operations: Handling of customised systemd units via puppet in base::service_unit - https://phabricator.wikimedia.org/T143210#2560716 (MoritzMuehlenhoff)
[14:22:40] (CR) Alexandros Kosiaris: [C: 2] Revert "Beta: Fix cxserver restbase_url" [puppet] - https://gerrit.wikimedia.org/r/305252 (owner: Alexandros Kosiaris)
[14:26:42] !log upgrading elasticsearch to 2.3.4 on relforge cluster
[14:26:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:29:53] !log upgrading elasticsearch plugins to 2.3.4 on elasticsearch, relforge and logstash clusters. Rolling restart coming next.
[14:29:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:34:01] (PS1) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[14:34:53] (PS4) Filippo Giunchedi: Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048) (owner: Gehel)
[14:34:59] (CR) Filippo Giunchedi: [C: 2 V: 2] Introduce a hourly storage schema in graphite [puppet] - https://gerrit.wikimedia.org/r/305238 (https://phabricator.wikimedia.org/T143048) (owner: Gehel)
[14:35:31] gehel: LGTM! merged, thanks!
[14:36:01] godog: thanks to you!
[14:36:49] gehel: of course metrics now will need the hourly. prefix, but I suppose that's ok
[14:37:12] godog: that should not be an issue
[14:39:31] <_joe_> thcipriani|afk: ping for when you're around
[14:42:09] !log rolling restart of elasticsearch relforge cluster for elasticsearch upgrade to 2.3.4
[14:42:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:42:40] (PS1) Alexandros Kosiaris: postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260
[14:42:43] (PS1) Alexandros Kosiaris: maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261
[14:43:50] (CR) jenkins-bot: [V: -1] postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260 (owner: Alexandros Kosiaris)
[14:44:18] (CR) jenkins-bot: [V: -1] maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261 (owner: Alexandros Kosiaris)
[14:45:46] !log bounce carbon on graphite machines
[14:45:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:46:22] (PS2) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[14:47:40] (PS2) Mobrovac: PDF Render Service: Add to SCB [puppet] - https://gerrit.wikimedia.org/r/305259 (https://phabricator.wikimedia.org/T143129)
[14:47:46] !log rolling restart of elasticsearch logstash cluster for elasticsearch upgrade to 2.3.4
[14:47:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:48:18] (CR) Alexandros Kosiaris: "@hashar, do you think we can increase from the 30 mins timeout. Kartik reports 2.5 hours btw ^" [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[14:50:08] !log deploying schema change on s6 hosts T139090
[14:50:10] T139090: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090
[14:50:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:51:16] (PS2) Alexandros Kosiaris: maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261
[14:51:19] _joe_: what's up?
[14:51:54] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0]
[14:52:46] <_joe_> thcipriani: so, I see that most repos deployed via scap3 use dsh targets lists created by the repo owners themselves
[14:52:51] <_joe_> which is a very bad practice
[14:53:07] <_joe_> as say I have to decommission scb1001, like 10 repositories need to be changed
[14:53:40] <_joe_> I would like, instead, to move the list of all targets in production under ops control, and derive it from conftool data
[14:54:16] <_joe_> to that end, if I read the code correctly, it would be enough to change scap.cfg to point to an absolute path for dsh_targets
[14:54:18] <_joe_> am I wrong?
[14:54:44] you are not wrong. Scap3 can do that.
[14:54:49] <_joe_> ok cool
[14:54:56] conftool derived data would would be great.
[14:55:08] <_joe_> I have patches up for mediawiki and parsoid
[14:55:20] scap3 is looks in /etc/dsh/groups for the file name if it's not found in the local repo, FWIW
[14:55:27] <_joe_> https://gerrit.wikimedia.org/r/#/c/283201/
[14:55:33] * thcipriani looks
[14:55:37] <_joe_> thcipriani: yes, that I saw too
[14:55:45] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[14:57:12] <_joe_> thcipriani: ok cool, next question and I'm done pestering you, I promise :P
[14:57:26] nice! is there anyway to derive any info about canary machines?
[14:57:33] (CR) jenkins-bot: [V: -1] maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261 (owner: Alexandros Kosiaris)
[14:57:35] Operations: Handling of customised systemd units via puppet in base::service_unit - https://phabricator.wikimedia.org/T143210#2560815 (akosiaris) > In comparison, the benefit of shipping systemd units in /lib/systemd (being able to use mask/unmask) is relatively small. If anyone wants to debug such a unit, t...
[14:57:38] <_joe_> sadly no
[14:57:57] <_joe_> say I have a medium-sized binary to deploy and then link to /usr/local/bin/name-of-binary, can I make scap do the linking for me?
[14:58:03] <_joe_> or should I do that in puppet?
[14:59:35] you could do it via scap via a command at the end of the promote phase. Might be easiest to do the symlinking in puppet since scap runs as a non-privileged user.
[15:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160817T1500).
[15:00:05] James_F, CFisch_WMDE, and kart_: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[15:00:20] Heya.
[15:00:25] here
[15:00:48] jepp
[15:01:42] I can SWAT today. Lots of non-config things, should be a fun one :)
[15:01:43] <_joe_> thcipriani: ach, you're right
[15:01:56] (PS2) Alexandros Kosiaris: postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260
[15:01:58] (PS3) Alexandros Kosiaris: maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261
[15:03:30] (CR) jenkins-bot: [V: -1] maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261 (owner: Alexandros Kosiaris)
[15:03:35] (PS3) Alexandros Kosiaris: postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260
[15:03:37] thcipriani: and its my first swat baby-sitting ^^ - but addshore prepared me well I hope ^^
[15:03:40] (CR) Alexandros Kosiaris: [C: 2] postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260 (owner: Alexandros Kosiaris)
[15:03:41] James_F: will do https://gerrit.wikimedia.org/r/#/c/305148/ at the end since it has i18n/l10n changes
[15:03:44] (CR) Alexandros Kosiaris: [V: 2] postgres: Provision a replication lag check script [puppet] - https://gerrit.wikimedia.org/r/305260 (owner: Alexandros Kosiaris)
[15:04:15] thcipriani: The i18n changes don't need to be the in deploy (it's just killing messages no longer used), so no need to scap.
[15:04:44] Operations, ArticlePlaceholder, Traffic, Wikidata: Performance and caching considerations for article placeholders accesses - https://phabricator.wikimedia.org/T142944#2560827 (hoo) >>! In T142944#2560699, @BBlack wrote: > I think I'm lacking a lot of context here about these special pages and pl...
[15:04:58] (PS4) Alexandros Kosiaris: maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261
[15:05:22] James_F: ah, awesome, saw i18n/en.json and mentally said, "not yet" :)
[15:05:56] :-)
[15:06:15] PROBLEM - logstash process on logstash1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash
[15:06:17] thcipriani: Anyway, aren't we meant to avoid SWAT changes that need scaps? ;-)
[15:06:38] heh, in theory :)
[15:06:39] ^ logstash is probably me, checking...
[15:06:56] (PS5) Alexandros Kosiaris: maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261
[15:07:01] (CR) Alexandros Kosiaris: [C: 2 V: 2] maps: Set up postgres replication lag monitoring [puppet] - https://gerrit.wikimedia.org/r/305261 (owner: Alexandros Kosiaris)
[15:07:44] James_F: oh, this was merged by twentyafterfour
[15:07:53] (betafeatures change)
[15:09:29] and deployed: 23:39 logmsgbot: twentyafterfour@tin Synchronized php-1.28.0-wmf.15/extensions/BetaFeatures: deploy https://gerrit.wikimedia.org/r/#/c/305148/ (duration: 00m 49s)
[15:09:35] cool, easier.
[15:09:56] thcipriani: Ah, OK. That's easier, yes. :-)
[15:10:26] (PS4) Thcipriani: Enable VisualEditor by default for logged-out users on Arabic-script Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/303587 (https://phabricator.wikimedia.org/T142587) (owner: Jforrester)
[15:10:38] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/303587 (https://phabricator.wikimedia.org/T142587) (owner: Jforrester)
[15:10:49] (PS1) Alexandros Kosiaris: postgresql: Fix silly typo introduced in a19cbb7 [puppet] - https://gerrit.wikimedia.org/r/305267
[15:11:04] (Merged) jenkins-bot: Enable VisualEditor by default for logged-out users on Arabic-script Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/303587 (https://phabricator.wikimedia.org/T142587) (owner: Jforrester)
[15:11:47] ^ James_F live on mw1099 check please
[15:12:09] (CR) Alexandros Kosiaris: [C: 2 V: 2] postgresql: Fix silly typo introduced in a19cbb7 [puppet] - https://gerrit.wikimedia.org/r/305267 (owner: Alexandros Kosiaris)
[15:12:43] Operations, Ops-Access-Requests, Labs: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2560855 (RobH) a:madhuvishy>MoritzMuehlenhoff Mortiz mentioned (in IRC a day or two ago) he would take care of the pwstore stuff, so I'm assigning this task to him.
[15:12:54] PROBLEM - puppet last run on maps-test2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:13:01] Looking.
[15:13:53] thcipriani: Yup, LGTM.
[15:14:02] ack, going out everywhere
[15:14:15] RECOVERY - logstash process on logstash1002 is OK: PROCS OK: 1 process with UID = 998 (logstash), command name java, args logstash
[15:14:55] RECOVERY - puppet last run on maps-test2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:15:55] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:303587|Enable VisualEditor by default for logged-out users on Arabic-script Wikipedias (T142587)]] (duration: 00m 50s)
[15:15:56] T142587: Enable VisualEditor by default for all users of all Arabic script Wikipedias - https://phabricator.wikimedia.org/T142587
[15:16:00] ^ James_F live everywhere
[15:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:16:15] Awesome, thanks.
[15:17:21] James_F: https://gerrit.wikimedia.org/r/#/c/305162/ live on mw1099, check please
[15:20:31]
[15:20:31] Postgres Replication Lag
[15:20:31] OK 2016-08-17 15:18:40 0d 0h 2m 56s 1/3 OK - Rep Delay is: 0.0 Seconds
[15:20:32] :-)
[15:20:43] _joe_: I am pushing this to nihal and nitrogen
[15:20:45] PROBLEM - ElasticSearch health check for shards on logstash1004 is CRITICAL: CRITICAL - elasticsearch inactive shards 11 threshold =0.1% breach: status: yellow, number_of_nodes: 6, unassigned_shards: 7, number_of_pending_tasks: 13, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 36, task_max_waiting_in_queue_millis: 45222, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards_percent_as_nu
[15:20:56] akosiaris: kool! Thanks for that one!
[15:20:58] <_joe_> akosiaris: nice
[15:21:04] <_joe_> let's see if it's replicating :P
[15:21:09] thcipriani: Yeah, think that's right. Upload tasks are a pain to test.
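The Icinga output pasted at 15:20:31 comes from the postgres replication lag check provisioned in changes 305260 and 305261. The script itself is not shown in this log. The Python sketch below assumes one common approach: measure replay delay on the standby, then map it to a Nagios-style exit code and a status line shaped like the one pasted. The 60 s and 300 s thresholds and the function name are hypothetical.

```python
from __future__ import annotations

# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# One common way to measure replay delay on a PostgreSQL hot standby.
# pg_last_xact_replay_timestamp() returns NULL on a server started without
# recovery, which this sketch receives as None.
LAG_QUERY = ("SELECT EXTRACT(EPOCH FROM "
             "(now() - pg_last_xact_replay_timestamp()))")


def classify_lag(lag_seconds: float | None,
                 warn: float = 60.0, crit: float = 300.0) -> tuple[int, str]:
    """Map a replication delay in seconds to (exit code, status line).

    The warn/crit thresholds are hypothetical defaults.
    """
    if lag_seconds is None:
        return UNKNOWN, "UNKNOWN - no replay timestamp (is this a standby?)"
    if lag_seconds >= crit:
        state, code = "CRITICAL", CRITICAL
    elif lag_seconds >= warn:
        state, code = "WARNING", WARNING
    else:
        state, code = "OK", OK
    return code, f"{state} - Rep Delay is: {lag_seconds} Seconds"


print(classify_lag(0.0))  # (0, 'OK - Rep Delay is: 0.0 Seconds')
```

This timestamp-based measure overstates lag when the primary is idle, because no new transactions arrive to be replayed. A real check would need to tolerate that case or compare WAL positions instead.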
[15:22:45] RECOVERY - ElasticSearch health check for shards on logstash1004 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 6, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 36, task_max_waiting_in_queue_millis: 0, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards_percent_as_number: 99.0740740741, acti
[15:22:46] ack, I'll get it live everywhere. uploadbase first then apiupload, sound right?
[15:23:29] Operations, ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2560927 (faidon) Any news? Is there an ETA?
[15:25:44] James_F: fine to sync uploadbase first then apiupload?
[15:26:10] thcipriani: Yes.
[15:26:12] thcipriani: James_F: you can test that by uploading any 5+ MB file on testwiki with uploadwizard
[15:26:29] MatmaRex: I know, I did. It looked like it worked but, well, uploads.
[15:26:29] Operations, hardware-requests: EQIAD: (2) hardware access request for PUPPET - https://phabricator.wikimedia.org/T142218#2560950 (RobH)
[15:26:30] (PS1) Alexandros Kosiaris: puppetdb: Set up postgres replication lag check on slave [puppet] - https://gerrit.wikimedia.org/r/305270
[15:26:37] aight.
[15:26:50] Operations, hardware-requests: CODFW: (2) hardware access request for PUPPET - https://phabricator.wikimedia.org/T142219#2560952 (RobH)
[15:27:29] !log thcipriani@tin Synchronized php-1.28.0-wmf.15/includes/upload/UploadBase.php: SWAT: [[gerrit:305162|Do not call the "UploadStashFile" hook for partially uploaded files (T143161)]] PART I (duration: 00m 53s)
[15:27:30] T143161: Catchable fatal error: Argument 3 passed to AbuseFilterHooks::onUploadStashFile() must be of the type array, null given in /var/www/html/w/extensions/AbuseFilter/AbuseFilter.hooks.php on line 730 - https://phabricator.wikimedia.org/T143161
[15:27:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:27:54] Operations, hardware-requests: EQIAD: (2) hardware access request for PUPPET - https://phabricator.wikimedia.org/T142218#2527576 (RobH)
[15:27:56] Operations, hardware-requests: CODFW: (2) hardware access request for PUPPET - https://phabricator.wikimedia.org/T142219#2527588 (RobH)
[15:29:08] !log thcipriani@tin Synchronized php-1.28.0-wmf.15/includes/api/ApiUpload.php: SWAT: [[gerrit:305162|Do not call the "UploadStashFile" hook for partially uploaded files (T143161)]] PART II (duration: 00m 50s)
[15:29:09] T143161: Catchable fatal error: Argument 3 passed to AbuseFilterHooks::onUploadStashFile() must be of the type array, null given in /var/www/html/w/extensions/AbuseFilter/AbuseFilter.hooks.php on line 730 - https://phabricator.wikimedia.org/T143161
[15:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:29:37] (PS4) Thcipriani: wmgEchoMentionStatusNotifications true for test/test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/302898 (https://phabricator.wikimedia.org/T141995) (owner: Addshore)
[15:30:52] CFisch_WMDE: any maintenance scripts need to be run for this change? Or just the change to InitialiseSettings.php?
[15:31:24] thcipriani: should be just the settings
[15:31:29] ack, thanks
[15:31:40] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/302898 (https://phabricator.wikimedia.org/T141995) (owner: Addshore)
[15:32:10] (Merged) jenkins-bot: wmgEchoMentionStatusNotifications true for test/test2wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/302898 (https://phabricator.wikimedia.org/T141995) (owner: Addshore)
[15:32:43] (CR) Alexandros Kosiaris: [C: 2 V: 2] puppetdb: Set up postgres replication lag check on slave [puppet] - https://gerrit.wikimedia.org/r/305270 (owner: Alexandros Kosiaris)
[15:32:56] CFisch_WMDE: change is live on mw1099, check please
[15:33:48] checking
[15:36:35] (PS1) Filippo Giunchedi: hieradata: add thumbor swift account [puppet] - https://gerrit.wikimedia.org/r/305275 (https://phabricator.wikimedia.org/T139606)
[15:37:55] thcipriani: still waiting for a notification on test. .... but generally looks good might be something unrelated... the setting made it ^^
[15:38:08] on test2. its working
[15:38:46] but it was definitely enabled on both, the user settings are there
[15:38:55] CFisch_WMDE: ah, I was just waiting on you to give the ok on mw1099, sounds like we're good. I'll roll out everywhere.
[15:39:09] kk
[15:40:52] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:302898|wmgEchoMentionStatusNotifications true for test/test2wiki (T141995)]] (duration: 00m 50s)
[15:40:53] T141995: Deploy mention notifications on the test cluster - https://phabricator.wikimedia.org/T141995
[15:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:40:57] ^ CFisch_WMDE live
[15:41:16] Operations, ops-eqiad, procurement: rack/setup/deploy (2) new puppetmaster systems - https://phabricator.wikimedia.org/T143219#2561022 (RobH)
[15:41:25] Operations, ops-eqiad: rack/setup/deploy (2) new puppetmaster systems - https://phabricator.wikimedia.org/T143219#2561039 (RobH)
[15:41:50] Operations, hardware-requests: EQIAD: (2) hardware access request for PUPPET - https://phabricator.wikimedia.org/T142218#2561041 (RobH)
[15:41:52] Operations, ops-eqiad: rack/setup/deploy (2) new puppetmaster systems - https://phabricator.wikimedia.org/T143219#2561022 (RobH)
[15:42:16] thcipriani: thanks
[15:42:44] Operations, ops-eqiad: rack/setup/deploy puppetmaster100[12] - https://phabricator.wikimedia.org/T143219#2561022 (RobH)
[15:43:42] RECOVERY - puppet last run on nihal is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[15:44:12] Operations, ops-codfw, DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2561047 (jcrespo)
[15:45:40] (PS3) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[15:46:20] (PS1) Dzahn: phabricator: allow ssh between instances for cluster support [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928)
[15:47:00] Operations, ops-codfw, DBA: es2004 has a dead disk, but it is not under warranty - https://phabricator.wikimedia.org/T143220#2561079 (jcrespo) p:Triage>Low
[15:47:16] (PS2) Dzahn: phabricator: allow ssh between servers for cluster support [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928)
[15:47:30] (PS1) Giuseppe Lavagetto: scap: add conftool class [puppet] - https://gerrit.wikimedia.org/r/305278
[15:47:35] <_joe_> mobrovac: ^^
[15:49:00] (CR) jenkins-bot: [V: -1] phabricator: allow ssh between servers for cluster support [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[15:52:17] (CR) Mobrovac: "1) Wouldn't it be enough to have:" [puppet] - https://gerrit.wikimedia.org/r/305278 (owner: Giuseppe Lavagetto)
[15:52:31] (CR) Jcrespo: "Would it be wise to create a CNAME with phab1001 in case we failover iridium?" [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[15:52:32] kart_: https://gerrit.wikimedia.org/r/#/c/305190/1 is live on mw1099 for group0 wikis can you test there?
[15:52:58] still waiting on wmf.14 jenkins stuffs
[15:53:13] thcipriani: bit tricky. It need to be watch for any deadlock errors.
[15:53:32] thcipriani: so I'll suggest to go ahead with both branches.
[15:53:58] kart_: ack, will roll live when jenkins completes.
[15:54:38] <_joe_> mobrovac: gah I committed an empty file by error
[15:54:41] <_joe_> sigh
[15:54:46] <_joe_> ok I'll amend
[15:55:06] <_joe_> mobrovac: my idea is that role::parsoid can include scap::conftool
[15:55:24] <_joe_> or better, it can be included by service::node when the deployment method is scap
[15:57:12] thcipriani:
[15:57:33] oops. Ignore it thcipriani :)
[15:58:22] _joe_: yeah, that was my initial idea - for service::node to include that
[15:59:40] !log thcipriani@tin Synchronized php-1.28.0-wmf.15/extensions/ContentTranslation: SWAT: [[gerrit:305190|Avoid deadlock patterns in cx_corpora updates (T134245)]] (duration: 00m 52s)
[15:59:40] T134245: Internal database error while saving translations - https://phabricator.wikimedia.org/T134245
[15:59:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:59:47] ^ kart_ group0 wikis live
[15:59:57] Thanks!
[16:00:28] Operations, Domains, Traffic, Wikimedia-Site-requests, Patch-For-Review: Private wiki for Project Grants Committee - https://phabricator.wikimedia.org/T143138#2561115 (Mjohnson_WMF) MZMcBride, I've asked Katy Love about https://grants.wikimedia.org/. I'm not familiar with that wiki. I'll le...
[16:00:30] kart_: possible to check there, or do you need group1?
[16:00:36] group1/all
[16:00:59] thcipriani: not possible until we've error :) So, go ahead.
[16:01:05] ack
[16:01:12] thcipriani: I'm keeping watch on logstash for next few hours.
[16:02:46] !log thcipriani@tin Synchronized php-1.28.0-wmf.14/extensions/ContentTranslation: SWAT: [[gerrit:305188|Avoid deadlock patterns in cx_corpora updates (T134245)]] (duration: 00m 50s)
[16:02:47] T134245: Internal database error while saving translations - https://phabricator.wikimedia.org/T134245
[16:02:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:02:51] ^ kart_ live everywhere
[16:03:05] thcipriani: thanks!
[16:08:56] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2561133 (ccattuto) I've created an account on Wikitech: https://wikitech.wikimedia.org/wiki/User:Ccattuto
[16:09:31] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[16:11:21] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[16:19:27] Operations, ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2561185 (Cmjohnson) Replaced the broken cable, during post I am still getting the same message that the VD is not handled by bios
[16:23:15] (PS1) Ema: varnishmedia: remove dead code paths [puppet] - https://gerrit.wikimedia.org/r/305287
[16:31:12] (PS2) Giuseppe Lavagetto: scap: add conftool class [puppet] - https://gerrit.wikimedia.org/r/305278
[16:31:14] (PS1) Giuseppe Lavagetto: service::node: add scap::conftool when relevant [puppet] - https://gerrit.wikimedia.org/r/305290
[16:33:59] (CR) Giuseppe Lavagetto: "I would honestly wait until we need those parameters before parametrizing this. I think whenever we need something different from the stan" [puppet] - https://gerrit.wikimedia.org/r/305278 (owner: Giuseppe Lavagetto)
[16:34:20] (CR) BBlack: [C: 1] varnishmedia: remove dead code paths [puppet] - https://gerrit.wikimedia.org/r/305287 (owner: Ema)
[16:37:18] Operations: Handling of customised systemd units via puppet in base::service_unit - https://phabricator.wikimedia.org/T143210#2561277 (BBlack) My only realm qualm here is that we didn't fix the mask/unmask bug arbitrarily, I don't think. I seem to recall there was a reason we needed to fix it, probably some...
[16:38:44] Operations, ops-eqiad, Patch-For-Review: Broken memory on mw1217 - https://phabricator.wikimedia.org/T138925#2561280 (Joe) If the server is broken and out of warranty, we should decom it for sure, but we probably want to plan for some replacements anyways later this year. We can afford to lose some...
[16:40:36] (PS5) Ottomata: [WIP] Mirror main-eqiad into main-codfw [puppet] - https://gerrit.wikimedia.org/r/304928 (https://phabricator.wikimedia.org/T134184)
[16:41:22] (CR) Giuseppe Lavagetto: "@akosiaris please see my comment." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/304456 (owner: Giuseppe Lavagetto)
[16:43:25] (CR) jenkins-bot: [V: -1] [WIP] Mirror main-eqiad into main-codfw [puppet] - https://gerrit.wikimedia.org/r/304928 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[16:44:32] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2561286 (demon) p:Normal>Low
[16:51:16] (PS2) Filippo Giunchedi: hieradata: add thumbor swift account [puppet] - https://gerrit.wikimedia.org/r/305275 (https://phabricator.wikimedia.org/T139606)
[16:51:18] (PS1) Filippo Giunchedi: swift: allow disabling account stats [puppet] - https://gerrit.wikimedia.org/r/305294
[17:06:25] (PS3) Rush: dynamicproxy: puppetize appendfilename setting [puppet] - https://gerrit.wikimedia.org/r/304994 (owner: Giuseppe Lavagetto)
[17:10:23] (PS4) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[17:14:18] (PS1) Ottomata: Mirror main-eqiad into main-codfw [puppet] - https://gerrit.wikimedia.org/r/305301 (https://phabricator.wikimedia.org/T134184)
[17:16:25] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2561339 (RobH) a:RobH
[17:17:25] (CR) Ottomata: "Looks good here https://puppet-compiler.wmflabs.org/3733/kafka2001.codfw.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/305301 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:17:53] (PS5) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[17:19:11] (CR) Ottomata: "Am having a lot of trouble getting this to work with tests. Considering some quarterly time constraints, and KISS, I'm putting this more " (4 comments) [puppet] - https://gerrit.wikimedia.org/r/304928 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:22:14] (PS2) Ottomata: Mirror main-eqiad into main-codfw [puppet] - https://gerrit.wikimedia.org/r/305301 (https://phabricator.wikimedia.org/T134184)
[17:25:10] (CR) Ottomata: "And with kafka_cluster_name fix looks good in https://puppet-compiler.wmflabs.org/3734/kafka2001.codfw.wmnet/ too" [puppet] - https://gerrit.wikimedia.org/r/305301 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:26:10] (PS3) Filippo Giunchedi: hieradata: add thumbor swift account [puppet] - https://gerrit.wikimedia.org/r/305275 (https://phabricator.wikimedia.org/T139606)
[17:27:12] (PS1) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634)
[17:28:03] (CR) RobH: [C: 2] Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:28:51] meh, i have a build failure, lame.
[17:28:57] (CR) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:29:14] (CR) Mobrovac: scap: add conftool class (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305278 (owner: Giuseppe Lavagetto)
[17:30:58] (CR) jenkins-bot: [V: -1] Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:31:07] !log deploying Kafka main-eqiad -> main-codfw 'eqiad.*' topic mirroing
[17:31:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:31:14] (CR) Ottomata: [C: 2] Mirror main-eqiad into main-codfw [puppet] - https://gerrit.wikimedia.org/r/305301 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:32:11] 17:27:39 'Users assigned that do not exist: %r' % non_existent_users
[17:32:11] 17:27:39 AssertionError: Users assigned that do not exist: ['mtizzoni', 'ciro', 'paolotti', 'panisson']
[17:32:19] i know they dont exist im makign them in the same file/patchset =P
[17:32:35] anyone else run into this condition before?
[17:33:01] I've created a user and added to groups in the same patchset in the past...
[17:34:32] robh: yes, i have
[17:34:44] bleh, maybe its my indentation, i am missing spaces before the stanzas for each new user
[17:34:46] (PS2) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634)
[17:34:51] (PS1) Ottomata: confluent::kafka::mirror::alerts should require instance define properly [puppet] - https://gerrit.wikimedia.org/r/305307 (https://phabricator.wikimedia.org/T134184)
[17:34:54] maybe thats it, dunno, it'll rerun now
[17:34:56] robh: i create new empty groups and then add them in a second step
[17:35:19] well, these are existing groups, and new users being added to the data file as users and members in groups in the same patch
[17:35:40] but maybe my formatting fucked it up, seems odd that some spaces would but meh.
[17:35:50] oh, ok
[17:35:56] (CR) Ottomata: [C: 2 V: 2] confluent::kafka::mirror::alerts should require instance define properly [puppet] - https://gerrit.wikimedia.org/r/305307 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:36:07] ha, it was
[17:36:12] its already passed the failed teest
[17:36:26] all cuz i left out 2 spaces from in front of the new user entries.
[17:36:31] =P
[17:36:58] (or an odd race condition that i hit and didnt hit a second time, who knows but it works)
[17:37:17] (CR) RobH: [C: 2] Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:37:19] ah, yea, in yaml a space can make a big difference
[17:37:27] (PS3) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634)
[17:37:56] these are the reasons i never skip auto-verification.
[17:38:25] right
[17:38:54] (PS1) Ottomata: Fix dependency cycle in confluent::kafka::mirror::alerts [puppet] - https://gerrit.wikimedia.org/r/305309 (https://phabricator.wikimedia.org/T134184)
[17:39:36] (CR) Ottomata: [C: 2 V: 2] Fix dependency cycle in confluent::kafka::mirror::alerts [puppet] - https://gerrit.wikimedia.org/r/305309 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:40:39] meh testing is slowwww
[17:41:37] (PS4) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634)
[17:41:46] (CR) RobH: [V: 2] Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305306 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:44:31] ahh shit, some of them didnt sign the server doc... now i have to roll back my change. annoying mistake by me.
[17:45:21] (CR) Mobrovac: service::node: add scap::conftool when relevant (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305290 (owner: Giuseppe Lavagetto)
[17:45:26] (PS1) Ottomata: Removing kafka mirror from main codfw hosts. [puppet] - https://gerrit.wikimedia.org/r/305311 (https://phabricator.wikimedia.org/T134184)
[17:45:51] (PS2) Ottomata: Removing kafka mirror from main codfw hosts. [puppet] - https://gerrit.wikimedia.org/r/305311 (https://phabricator.wikimedia.org/T134184)
[17:46:00] (CR) Ottomata: [C: 2 V: 2] Removing kafka mirror from main codfw hosts. [puppet] - https://gerrit.wikimedia.org/r/305311 (https://phabricator.wikimedia.org/T134184) (owner: Ottomata)
[17:47:22] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2561448 (RobH) So I pushed the access live, and now realize not everyone has signed the L3 document, which is required...
[17:50:05] (CR) Filippo Giunchedi: [C: 1] varnishmedia: remove dead code paths [puppet] - https://gerrit.wikimedia.org/r/305287 (owner: Ema)
[17:50:20] (PS1) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305313 (https://phabricator.wikimedia.org/T141634)
[17:50:47] (PS2) RobH: Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305313 (https://phabricator.wikimedia.org/T141634)
[17:52:23] bleh, i messed up the process i helped write
[17:52:26] * robh feels shame
[17:52:58] (CR) RobH: [C: 2] Analytics cluster access request for ISI Foundation team [puppet] - https://gerrit.wikimedia.org/r/305313 (https://phabricator.wikimedia.org/T141634) (owner: RobH)
[17:53:24] (CR) Alexandros Kosiaris: [C: 1] "I 'll merge tomorrow European morning if nobody beats me to it" [puppet] - https://gerrit.wikimedia.org/r/302601 (https://phabricator.wikimedia.org/T141324) (owner: Chad)
[17:54:49] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2561492 (RobH) When everyone signs the L3 document, patchset https://gerrit.wikimedia.org/r/305313 can be reverted to r...
[17:58:10] Operations: Handling of customised systemd units via puppet in base::service_unit - https://phabricator.wikimedia.org/T143210#2561498 (akosiaris) >>! In T143210#2561277, @BBlack wrote: > My only realm qualm here is that we didn't fix the mask/unmask bug arbitrarily, I don't think. I seem to recall there was...
[17:59:25] Operations, ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2561501 (Cmjohnson) updated firmware for both bios and controller. Not able to see RAID in BIOS.
[18:00:37] Operations: post build failures for operations/puppet on operations-puppet-doc - https://phabricator.wikimedia.org/T143233#2561502 (RobH)
[18:03:42] (CR) Mobrovac: [C: 1] "Heh, didn't see your comment before commenting myself. Ok, I'm ok with waiting on parametrisation, but let's put a TODO comment to that ef" [puppet] - https://gerrit.wikimedia.org/r/305278 (owner: Giuseppe Lavagetto)
[18:03:53] ori, godog, gilles, _joe_ , mobrovac : sorry about being 10 min late. I'm not sure what happened. My smartphone alarm was right next to me on a shelf, where ironically it used to be across the room.
[18:04:46] :)
[18:05:08] (CR) Filippo Giunchedi: [C: -1] "mostly re: hardcoding the names, also what Jaime said" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[18:05:51] AaronSchulz: hehe no worries at all, it happens
[18:08:08] the volume is set right too, odd
[18:08:28] (CR) Jcrespo: "The reasoning is that it wouldn't be the first time something like iridium gets decommed, but not deleted from DNS; then ip gets reused; a" [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[18:08:48] godog: hopefully the pad was useful, I updated it a few days prior
[18:08:55] (PS3) BBlack: varnish: switch from libGeoIP to libmaxminddb [WIP!] [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[18:09:40] we idled waiting for the leader we needed :-)
[18:10:33] (PS4) BBlack: varnish: switch from libGeoIP to libmaxminddb [WIP!] [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[18:12:36] (PS5) BBlack: varnish: switch from libGeoIP to libmaxminddb [WIP!] [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[18:17:03] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase1008-a.eqiad.wmnet
[18:17:04] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226
[18:17:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:20:02] PROBLEM - puppet last run on bast1001 is CRITICAL: CRITICAL: Puppet has 2 failures
[18:22:48] Operations: Handling of customised systemd units via puppet in base::service_unit - https://phabricator.wikimedia.org/T143210#2561622 (MoritzMuehlenhoff) True, I didn't think of the use case of service owners with sudo rules to grant them mask/umask. I'll dig into systemd unit overrides, maybe that's an option,
[18:31:08] robh: there is a puppet issue with a user panisson on bastion
[18:31:19] robh: is that one of the new ones?
[18:31:32] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase2002-a.codfw.wmnet
[18:31:32] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226
[18:31:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:31:50] yes, lemme check
[18:32:01] mutante: thats the only one that should work
[18:32:13] i pushed a patch for 4 users, then had to revoke 3 of the 4 when i realized they hadn't signed the L3 doc yet.
[18:32:20] it's something about the parsoid-admin group
[18:32:32] i didnt dd them to parsoid admin though....
[18:32:34] ah
[18:32:57] they are bastionsonly, then some stat and analytics groups
[18:33:08] mutante: refire it a second time?
[18:33:17] maybe it was the race condition that occassionally happens for users
[18:33:18] ok
[18:33:27] where it tries to assgin group before it creates the user
[18:33:42] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 2 failures
[18:33:44] typically i see it go away on a second run
[18:33:58] same on 4001 it semes, checking
[18:34:01] hmm. that bast4001 is probably the same
[18:34:03] ack
[18:34:13] i could repeat the problem on 1001
[18:34:40] oh, it doesnt go away? damn.
[18:34:44] Error: /User[panisson]: Could not evaluate: Puppet::Util::Log requires a message
[18:34:47] ? eh..
[18:34:55] that is odd
[18:34:57] Util::Log ?
[18:35:04] i've never seen that
[18:35:06] i dunno
[18:35:17] same issue on 4001
[18:35:38] i should ahve run with more verbosity.
[18:36:18] so.. the user exists
[18:36:22] and it should exist
[18:36:27] right
[18:36:37] correct
[18:36:41] the user should exist.
[18:36:51] (its the only one that should for the 4 on that task so far)
[18:36:53] it's somehow confused due to the revert
[18:37:01] other than my screwup due to revert
[18:37:01] PROBLEM - puppet last run on rutherfordium is CRITICAL: CRITICAL: Puppet has 2 failures
[18:37:12] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase2003-a.codfw.wmnet
[18:37:13] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226
[18:37:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:37:31] we could set them all to absent and see if it clears the issue.
[18:37:46] well, all being just panisson now
[18:37:52] robh: i was about to say that, revert that user too and then re-add them all at once when they signed
[18:38:08] to see if it clears it yea
[18:38:18] well, not revert, but absent you mean right?
[18:38:25] since the user was created, just reverting leaves orphaned shit
[18:38:34] imu
[18:39:17] i thought revert to before it was added. not sure
[18:39:34] looking at history
[18:39:46] well, if the admin module adds a user to a host
[18:39:46] removing the user from the module entirely doesnt make the user go away
[18:39:55] it seems bad to leave an orphaned user outisde of puppet mgmt
[18:40:10] at least, removal in past didnt, i recall chase having to clean them up
[18:40:12] i dont know, i have not seen the changes yet
[18:40:33] !log T143226: Perform major compaction on local_group_wikipedia_T_parsoid_html.data, restbase2005-a.codfw.wmnet
[18:40:34] T143226: Cluster-wide major compactions: parsoid-html - https://phabricator.wikimedia.org/T143226
[18:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[18:41:24] mutante: uhh, i have this messed up now =P
[18:41:35] so the file has panison but no ssh key, i have no fucking clue what i did wrong but i messed up
[18:41:42] PROBLEM - puppet last run on bast2001 is CRITICAL: CRITICAL: Puppet has 2 failures
[18:41:56] that could be part of the issue, a user with no passowrd and no ssh key but present.
[18:42:03] oh, yea
[18:42:42] do you want to add the key for it first?
[18:43:21] or do you want to set them all to absent
[18:43:38] i am not sure about the ones that are "absent" with the blank ssh keys
[18:44:29] (PS1) RobH: fixing user panisson's ssh key [puppet] - https://gerrit.wikimedia.org/r/305316
[18:44:29] so fixing up the absented with blank ssh keys to match other absented with no ssh keys
[18:44:35] and adding in panissons ssh key
[18:45:04] I wanted to follow and help clarify but I'm confused now too :)
[18:45:09] seems like they should all have keys
[18:45:16] (CR) RobH: [C: 2] fixing user panisson's ssh key [puppet] - https://gerrit.wikimedia.org/r/305316 (owner: RobH)
[18:45:33] Ok, starting from the beginning. I made a patch to add 4 new users and merged it live.
[18:45:56] Then I realized I fucked up, as 3 of them had not signed the L3 document. A followup patch to absent 3 of the 4 was merged (it passed testing).
[18:46:04] ah
[18:46:10] my followup patch blanked the ssh key of the active one by mistake
[18:46:15] as well as the other 3 absented ones.
[18:46:39] i think the blank ssh key may be why its fucked up puppet runs, but uncertain.
[18:46:47] i have https://gerrit.wikimedia.org/r/#/c/305316/ to fix that ssh key not being there
[18:47:00] but its that, or try to absent out the user entirely to see if it fixes the puppet break on bastions
[18:47:13] i think adding that key there makes sense. that user is the one that shows up in the errors
[18:47:19] i think so too
[18:47:20] and also the one user that is supposed to be active
[18:47:31] merging now, we shall see shortly
[18:47:35] (PS6) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[18:48:01] rerunning puppet on bast4001
[18:48:19] mutante: thanks for spotting the error btw =]
[18:48:41] ah yes an empty array is probably allowed but I'm unsure if that first syntax translates to empty array
[18:48:48] i dont think it did
[18:48:57] i think it was puking for all the keys on a seperate line of -
[18:49:01] rather than [] but meh
[18:49:06] its failing still so who knows (not me)
[18:49:28] ok, example of box it fails on?
[18:49:28] shit.
[18:49:35] any bastion host =]
[18:49:45] im on 4001
[18:49:48] daniel is on 1001
[18:49:50] try out 2001 =]
[18:50:12] (or feel free to hop on 4001, doesnt matter to me)
[18:50:41] heh, i see ya on 4001, i wont rerun puppet ;]
[18:52:40] robh: it's choking on encoding
[18:52:42] for realname: André Panisson
[18:52:48] ha
[18:52:49] Error: Could not convert change 'comment' to string: incompatible character encodings: UTF-8 and ASCII-8BIT
[18:52:52] well, shit, lemme fix that
[18:53:26] chasemp: thats now amusing to me.
[18:54:07] I guess we should lint on that
[18:54:29] indeed, that is annoyinnnnnng
[18:54:29] (PS1) RobH: Andre's name formatting caused encoding errors [puppet] - https://gerrit.wikimedia.org/r/305317
[18:54:36] i can make a phab task to add it to the lint check
[18:55:08] chasemp: did it tell you specifically which line it could not encode or you just looked at the file and knew that character was issue?
[18:55:24] rephrase: how did you figure that out?
[18:55:42] (CR) RobH: [C: 2] "once more with feeling!" [puppet] - https://gerrit.wikimedia.org/r/305317 (owner: RobH)
[18:56:04] reasoning was: dependency issue related to user panisson, saw error related to 'Error: Could not convert change 'comment' to string: incompatible character encodings: UTF-8 and ASCII-8BIT', looked up comment is indeed the real name and looked and saw the special char for that user
[18:58:20] (PS7) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[18:58:33] where did you look up the comment, syslog?
[18:58:43] just wondering wher eit pushed out that line that had the issue is all
[19:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160817T1900).
[19:00:21] i'm in a moving car, be back soon, sry
[19:00:34] mutante: no worries
[19:02:01] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[19:02:12] and now puppet works on adding the user, huzzah
[19:02:13] Notice: /Stage[main]/Admin/Admin::Hashuser[panisson]/Admin::User[panisson]/User[panisson]/comment: comment changed 'André Panisson' to 'Andre Panisson'
[19:02:19] and all issues resolve.
[19:02:23] RECOVERY - puppet last run on bast1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:02:36] other than the post build op failing for puppet docs =[
[19:02:42] PROBLEM - MegaRAID on ms-be1005 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline)
[19:03:11] RECOVERY - puppet last run on rutherfordium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:04:59] (PS2) RobH: DHCP: Add Dhcp entries for wezen (new syslog server) Bug: T143146 [puppet] - https://gerrit.wikimedia.org/r/305167 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[19:05:52] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[19:06:27] (CR) RobH: [C: 2] DHCP: Add Dhcp entries for wezen (new syslog server) Bug: T143146 [puppet] - https://gerrit.wikimedia.org/r/305167 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[19:06:44] (PS2) RobH: adding install params for wezen (new syslog server) Bug:T143146 [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[19:07:51] RECOVERY - puppet last run on bast2001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[19:08:31] jouncebot: I'm on it
[19:08:34] (CR) RobH: [C: -1] "please see in line comment for netboot.cfg edit, thanks!" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[19:08:40] (PS1) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:09:08] Operations, ops-eqiad, fundraising-tech-ops: Rack and setup Fundraising DB - https://phabricator.wikimedia.org/T136200#2561825 (Jgreen)
[19:09:25] ok, in line editor to fix tiny shit is kinda nice.
[19:09:36] i didnt commit since its papaul's patchset to fix, but still =]
[19:10:18] (CR) RobH: [C: 1] admin: add dpatrick to sectools-roots, put group in role [puppet] - https://gerrit.wikimedia.org/r/296651 (https://phabricator.wikimedia.org/T138873) (owner: Dzahn)
[19:13:06] (CR) jenkins-bot: [V: -1] Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232) (owner: Ottomata)
[19:15:06] (PS2) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:15:56] (CR) Krinkle: [C: -1] Update gallery image bounding box on svwiki to 150x150 (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/304991 (https://phabricator.wikimedia.org/T113877) (owner: Gilles)
[19:16:51] PROBLEM - HP RAID on ms-be1026 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[19:19:33] (PS3) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:21:06] (PS1) 20after4: group1 wikis to 1.28.0-wmf.15 refs T140971 [mediawiki-config] - https://gerrit.wikimedia.org/r/305323
[19:23:11] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group1 wikis to 1.28.0-wmf.15 refs T140971
[19:23:12] T140971: MW-1.28.0-wmf.15 deployment blockers - https://phabricator.wikimedia.org/T140971
[19:23:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[19:26:39] (CR) Thcipriani: [C: 1] "Good to merge whenever. Just a bugfix release. Working now on beta." [puppet] - https://gerrit.wikimedia.org/r/305078 (owner: Thcipriani)
[19:27:32] (PS4) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:27:44] (PS6) BBlack: varnish: switch from libGeoIP to libmaxminddb [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[19:28:31] RECOVERY - HP RAID on ms-be1026 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[19:29:21] bblack: <3
[19:29:24] Operations, Traffic: Convert text cluster to Varnish 4 - https://phabricator.wikimedia.org/T131503#2561905 (BBlack)
[19:29:28] Operations, Fundraising-Backlog, Traffic, Patch-For-Review: Switch Varnish's GeoIP code to libmaxminddb/GeoIP2 - https://phabricator.wikimedia.org/T99226#2561904 (BBlack)
[19:31:06] hey bblack, when you get a minute, can you weigh in on this task: https://phabricator.wikimedia.org/T142399#2553744 ? Thanks!
[19:35:01] Call to a member function getEntireText() on a non-object (boolean) in (1.28.0-wmf.15) /includes/content/WikitextContentHandler.php
[19:35:10] (PS5) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:36:52] Operations, netops: configure port for frdb1001 - https://phabricator.wikimedia.org/T143248#2561911 (Jgreen)
[19:37:24] jhobs: done
[19:38:02] paravoid: I think it's basically in deployable shape, except going back through it for detailed review to find stupid bugs. It "works" on cp1008 and shouldn't leak or do awful things.
[19:38:23] bblack: appreciated!
[19:40:23] (CR) Ottomata: [C: 1] "Tested in labs in analytics and deployment-prep projects. Noop on affected nodes in prod: https://puppet-compiler.wmflabs.org/3740/" [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232) (owner: Ottomata)
[19:40:34] (PS7) BBlack: varnish: switch from libGeoIP to libmaxminddb [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[19:40:46] killed a compiler warning nit in PS7 :)
[19:41:52] bblack: thanks for the reply
[19:43:54] (PS6) Ottomata: Refactor zookeeper cluster config so it is available in all hiera scopes [puppet] - https://gerrit.wikimedia.org/r/305321 (https://phabricator.wikimedia.org/T143232)
[19:47:00] robh: back and i saw the recoveries. cool
[19:49:27] (CR) BBlack: [C: 1] "This is ready for a fuller nitpicky review if anyone wants to trawl through it. I've cleaned up a lot of the code in the process, so it's" [puppet] - https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: Faidon Liambotis)
[20:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160817T2000).
[20:01:48] !log starting parsoid deploy
[20:01:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:04:02] !log synced new parsoid code; restarted parsoid on wtp1001 as a canary
[20:04:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:05:02] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations, Research-management: Request access to data for WDQS research - https://phabricator.wikimedia.org/T142780#2546133 (RobH) Please note bastiononly access group will need to be included. At this point we have the signo...
[20:07:00] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2562050 (Milimetric) Thanks, @RobH, some of these folks are on vacation so we'll have to wait a little bit.
[20:07:24] (PS1) Dzahn: add phab1001.eqiad as CNAME for iridium.eqiad [dns] - https://gerrit.wikimedia.org/r/305335
[20:08:38] (PS2) Dzahn: add phab1001.eqiad as CNAME for iridium.eqiad [dns] - https://gerrit.wikimedia.org/r/305335
[20:09:30] !log finished deploying parsoid sha 3cf877bb
[20:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:10:22] (CR) Dzahn: "@jcrespo yes, sounds good. i made https://gerrit.wikimedia.org/r/#/c/305335/ to add the CNAME as suggsted" [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[20:19:10] ahghgharofjawe;sdljkfasd
[20:19:16] i just typed up a huge task and got an error screen
[20:19:27] it was so detailed and had 30 differnt checkboxes ;_;
[20:22:53] Operations, procurement: rack/setup/deploy puppetmaster200[12] - https://phabricator.wikimedia.org/T143255#2562105 (RobH)
[20:25:14] if you backpage int
[20:25:20] in the browser it's gone eh?
[20:25:23] !log starting mobileapps deploy
[20:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:25:43] (PS3) Dzahn: phabricator: allow ssh between servers for cluster support [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928)
[20:26:08] !log Restarting Cassandra, aqs1004-a.eqiad.wmnet
[20:26:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:28:46] (CR) Dzahn: "@filippo all done. using hiera (have to add the names in the first place, needs that extra CNAME), resolving the names, no if guards" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[20:31:47] (PS4) Dzahn: phabricator: allow ssh between servers for cluster support [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928)
[20:37:17] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/3741/" [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[20:38:08] (PS1) Eevans: Instance-aware Cassandra restarts for aqs-admins [puppet] - https://gerrit.wikimedia.org/r/305367
[20:39:34] (PS2) Eevans: Instance-aware Cassandra restarts for aqs-admins [puppet] - https://gerrit.wikimedia.org/r/305367 (https://phabricator.wikimedia.org/T143259)
[20:39:49] jenkins sez, "This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository."
[20:39:57] anyone know what "cross-repo dependencies" are?
[20:40:00] (CR) Dzahn: "but _after_ https://gerrit.wikimedia.org/r/#/c/305335/" [puppet] - https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[20:40:05] (CR) Nuria: "We have two clusters now, so we probably need both commands, correct?" [puppet] - https://gerrit.wikimedia.org/r/305367 (https://phabricator.wikimedia.org/T143259) (owner: Eevans)
[20:40:20] (CR) Dzahn: "needed by https://gerrit.wikimedia.org/r/#/c/305277/" [dns] - https://gerrit.wikimedia.org/r/305335 (owner: Dzahn)
[20:40:31] because the patch itself (https://gerrit.wikimedia.org/r/304043) is against current master / HEAD
[20:42:34] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2562235 (Michele.tizzoni) Hi, I've signed the L3 document.
[20:43:10] anyone know off-hand, is it 'wikipedia' or 'wiki' to deploy to all Wikipedias in InitialiseSettings.php? Documentation says check all.dblist, which has neither, but it looks like others are using 'wikipedia' in the file
[20:43:38] !log deployed mobileapps 81bd74f
[20:43:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:47:16] (PS1) Jhobs: Deploy lazy loaded images to all mobile web wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/305387 (https://phabricator.wikimedia.org/T142399)
[20:48:22] jhobs: in dblists a wikipedia is just a "wiki" while all other projects have more specific names
[20:48:44] mutante: right, I'm saying to deploy to all wikipedias
[20:48:45] enwiki enwikiquote enwikibooks ...
[20:48:57] mutante: is the key `wikipedia` because that's the name of the dblist?
[20:49:11] i think it is, yes
[20:49:15] wikipedia.dblist is the file
[20:49:35] mutante: thank you, I was unsure if the dblist filenames also corresponded to keys or not
[20:49:59] since they do not appear in all.dblist
[20:50:13] hmm
[20:51:08] Operations, Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2562261 (Eevans)
[20:54:07] jhobs: well, i can confirm the others use 'wikipedia' in several places
[20:54:32] the filename.. might be :)
[20:54:47] mutante: yeah, that's what I was going off of as well. I just realized the documentation is a bit misleading, but does actually confirm it in the code block https://wikitech.wikimedia.org/wiki/Configuration_files#InitialiseSettings.php
[20:54:50] thanks for your help!
[20:55:11] yep, yw
[21:00:20] Operations, Labs, Patch-For-Review, Tracking: overhaul labstore setup [tracking] - https://phabricator.wikimedia.org/T126083#2562283 (chasemp)
[21:00:22] Operations, Labs: Create an NFS mount manager - https://phabricator.wikimedia.org/T140483#2562282 (chasemp) Open>Resolved
[21:02:38] (PS3) Dzahn: adding install params for wezen (new syslog server) Bug:T143146 [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[21:03:16] (CR) Dzahn: adding install params for wezen (new syslog server) Bug:T143146 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[21:03:55] (CR) Dzahn: [C: 2] adding install params for wezen (new syslog server) Bug:T143146 [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[21:07:23] (CR) Dzahn: [C: 2] zuul-test-repo: Allow testing multiple repositories at once [puppet] - https://gerrit.wikimedia.org/r/269328 (owner: Legoktm)
[21:07:32] (PS4) Dzahn: zuul-test-repo: Allow testing multiple repositories at once [puppet] - https://gerrit.wikimedia.org/r/269328 (owner: Legoktm)
[21:12:01] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/305168 (https://phabricator.wikimedia.org/T143146) (owner: Papaul)
[21:23:05] (CR) Dzahn: "recheck" [puppet] - https://gerrit.wikimedia.org/r/269328 (owner: Legoktm)
[21:23:34] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team -
https://phabricator.wikimedia.org/T141634#2562339 (10ccattuto) I've signed the L3 document. [21:25:12] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2562340 (10Daniela.paolotti) I've signed the L3 document as well [21:27:35] (03PS1) 10RobH: Revert "Analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 [21:27:58] (03CR) 10jenkins-bot: [V: 04-1] Revert "Analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 (owner: 10RobH) [21:28:01] 06Operations, 10Cassandra: Address abnormally wide partitions - https://phabricator.wikimedia.org/T143056#2562350 (10Eevans) Script to delete the partitions >= 10G. {P3845} [21:32:12] (03PS2) 10RobH: Enabling all users for analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 [21:32:40] 06Operations, 10ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2562353 (10Cmjohnson) Created a new work order to have a technician come to the data center and troubleshoot. [21:33:46] (03CR) 10RobH: [C: 032] Enabling all users for analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 (owner: 10RobH) [21:35:56] (03PS5) 10Dzahn: zuul-test-repo: Allow testing multiple repositories at once [puppet] - 10https://gerrit.wikimedia.org/r/269328 (owner: 10Legoktm) [21:36:26] (03CR) 10Dzahn: [V: 032] "already verified" [puppet] - 10https://gerrit.wikimedia.org/r/269328 (owner: 10Legoktm) [21:36:52] damn testing is slow [21:37:00] still queued for rake-jessie... 
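Between 20:43 and 20:55 jhobs and mutante concluded that a dblist filename (`wikipedia`, from `wikipedia.dblist`) works as a key in InitialiseSettings.php, even though it never appears in all.dblist.

The following is a minimal Python model of that lookup, assuming this precedence:

1. the exact wiki dbname,
2. then any dblist "tag" the wiki belongs to,
3. then `default`.

This simplifies MediaWiki's real SiteConfiguration, which also handles suffixes and `+` merge keys. The setting name `wgExampleLazyImages` and the dblist memberships are hypothetical.

```python
"""Toy model of picking a per-wiki value from an InitialiseSettings.php-style map.

Assumed precedence (simplified): exact dbname, then dblist tags, then 'default'.
"""
from typing import Any, Dict, List

# Hypothetical setting; the keys mimic the InitialiseSettings.php style.
SETTINGS: Dict[str, Dict[str, Any]] = {
    "wgExampleLazyImages": {
        "default": False,
        "wikipedia": True,   # tag named after wikipedia.dblist
        "dewiki": False,     # an exact dbname overrides any tag
    },
}

# Hypothetical membership: dbname -> dblist names (file name minus ".dblist").
DBLISTS: Dict[str, List[str]] = {
    "enwiki": ["wikipedia"],
    "dewiki": ["wikipedia"],
    "enwikibooks": ["wikibooks"],
}


def resolve(setting: str, dbname: str) -> Any:
    """Return the value a given wiki would receive for `setting`."""
    values = SETTINGS[setting]
    if dbname in values:                 # 1. exact dbname wins
        return values[dbname]
    for tag in DBLISTS.get(dbname, []):  # 2. then the first matching dblist tag
        if tag in values:
            return values[tag]
    return values.get("default")         # 3. finally the default


for wiki in ("enwiki", "dewiki", "enwikibooks"):
    print(wiki, resolve("wgExampleLazyImages", wiki))
```

In this model, keying on `wikipedia` reaches every wiki listed in wikipedia.dblist, and a per-wiki key can still opt one wiki out.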
[21:37:33] CI is backed up, RelEng knows [21:37:44] thcipriani looking into it AIUI [21:38:15] ori: thx for info =] [21:39:04] looking into it/fretting while watching debug logs [21:40:05] (03PS1) 10Rush: sge collector: set correct env [puppet] - 10https://gerrit.wikimedia.org/r/305401 [21:40:20] it is working, just very slowly. Instance allocation is lower than it has been and it doesn't seem like we're even getting our lower instance allocation at the moment. Thanks for bearing with us on this :\ [21:40:51] thcipriani: seeing about ~4 concurrent nodepool instance? [21:40:53] *s [21:40:54] thanks, don't sweat it [21:41:35] greg-g: we seem to be maintaining a pool of 6 in varying states of readiness [21:41:51] 4 active at any given time is probably pretty accurate [21:41:52] * greg-g nods [21:42:13] I see somewhere around 6-7 if I list instances in the project usually [21:42:28] I have never anecdotelly seen 10 [21:42:54] chasemp: did you see that comment from thcipriani yesterday re tyler changing nodepool's idea of what it's quota is? [21:42:56] looking to see if some are stuck in delete state or something [21:43:09] !log starting OCG deploy [21:43:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:43:19] greg-g: I missed that [21:43:36] chasemp: yeah, I can't figure out why nodepool thinks we have 10 instances. 
long rambling comments here: https://phabricator.wikimedia.org/T143016 [21:43:52] chasemp: start here: https://phabricator.wikimedia.org/T143016#2559382 [21:44:02] fwiw I see 6 in project in active state and none look stuck in delete or anything (seem fine) [21:44:48] huh ok that is interesting [21:44:59] yup, further down the rabbit hole we go [21:45:22] (03PS3) 10RobH: Enabling all users for analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 (https://phabricator.wikimedia.org/T141634) [21:45:36] so this is "right" [21:45:41] openstack quota show contintcloud | grep instances [21:45:41] | instances | 10 [21:45:52] but I'm not going to say I trust that implicitly [21:46:45] !log aaron@tin Synchronized php-1.28.0-wmf.15/extensions/CentralAuth: ef4b5d45f9bb59c978e23f21ed09649aa628c4d1 (duration: 00m 59s) [21:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:47:05] (03CR) 10Merlijn van Deen: "I thought this was achieved using bind mounts. Does this code run on a host that doesn't/can't have those?" [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [21:47:06] thcipriani: could it be since instance count is an implied limit beholden to both ram and cpu that the 6 take up that space already? 
[21:47:20] the 7th instance is a figment w/ no actual resources, I guess I'll try to see [21:47:24] !log updated OCG to version e3e0fd015ad8fdbf9da1838c830fe4b075c59a29 (T133001, T142226) [21:47:26] T142226: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226 [21:47:26] T133001: Decom legacy ex-parsoidcache cxserver, citoid, and restbase service hostnames - https://phabricator.wikimedia.org/T133001 [21:47:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:47:44] (03CR) 10Dzahn: [C: 031] Reclaim nobelium [dns] - 10https://gerrit.wikimedia.org/r/305192 (https://phabricator.wikimedia.org/T142581) (owner: 10Gehel) [21:47:50] chasemp: looked at that, we're spinning up m1.mediums so should be under limits all the way around near as I can tell. [21:47:53] (03CR) 10RobH: [V: 032] Enabling all users for analytics cluster access request for ISI Foundation team" [puppet] - 10https://gerrit.wikimedia.org/r/305398 (https://phabricator.wikimedia.org/T141634) (owner: 10RobH) [21:48:50] (03PS4) 10RobH: Enabling all users for request of ISI Foundation team [puppet] - 10https://gerrit.wikimedia.org/r/305398 [21:49:22] (03CR) 10Thcipriani: [C: 031] "scap3 stuff all lgtm" [puppet] - 10https://gerrit.wikimedia.org/r/301505 (https://phabricator.wikimedia.org/T141014) (owner: 10BryanDavis) [21:49:39] (03PS2) 10Dzahn: Reclaim nobelium [dns] - 10https://gerrit.wikimedia.org/r/305192 (https://phabricator.wikimedia.org/T142581) (owner: 10Gehel) [21:50:30] !log set contintcloud project instance quota to 12 for testing [21:50:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:51:01] thcipriani: if this bring up a stable 8 then we are really down teh looking glass [21:51:58] chasemp: also, fwiw, I have the internal limit set to 6 max servers thinking that would at least stop the 403ing. 
Unless nodepool has been restarted since the last puppet run and/or it reread the config for some reason. [21:52:06] (03PS5) 10RobH: Enabling all users for request of ISI Foundation team [puppet] - 10https://gerrit.wikimedia.org/r/305398 [21:52:14] 06Operations, 10ops-eqiad: ms-be1005 - MegaRAID - CRITICAL: 1 failed LD(s) (Offline) - https://phabricator.wikimedia.org/T143265#2562438 (10Dzahn) [21:52:24] thcipriani: well I do indeed see 8 concurrent now :) [21:53:00] yarp [21:53:03] ACKNOWLEDGEMENT - MegaRAID on ms-be1005 is CRITICAL: CRITICAL: 1 failed LD(s) (Offline) daniel_zahn https://phabricator.wikimedia.org/T143265 [21:53:17] I don't even [21:53:17] seeing: Forbidden: Quota exceeded for instances: Requested 1, but already used 12 of 12 instances (HTTP 403) in debug logs [21:53:40] these two do not see eye to eye on usage [21:53:43] that much is clear [21:54:57] max servers is indeed 10 again so I think puppet snuck in on you [21:55:44] disabling for test [21:56:17] 06Operations, 10ops-eqiad: ms-be1005 - MegaRAID - CRITICAL: 1 failed LD(s) (Offline) - https://phabricator.wikimedia.org/T143265#2562438 (10Dzahn) Error: mkfs -t xfs -L swift-sdd1 -i size=512 /dev/sdd1 returned 1 instead of one of [0] Error: /Stage[main]/Role::Swift::Storage/Swift::Init_device[/dev/sdd]/Exec[m... [21:56:57] ACKNOWLEDGEMENT - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Puppet has 1 failures daniel_zahn https://phabricator.wikimedia.org/T143265 [21:56:59] yeah, config yaml file changed, unclear if nodepool as a process reread that config at anypoint. puppet doesn't seem to do any restart/hup for this [21:57:16] but it would make sense considering the number of servers is over 6 :) [21:57:28] but... 12? [21:57:34] ok I just set it to 12 :) [21:57:53] maybe "# of instances + 2" is what it wants? [21:58:11] so...I'm wondering about interaction of min-ready w/ types [21:58:16] among other things [21:58:23] how aggressive is min-ready count? 
[21:58:55] not very, it's a loose guideline [21:58:59] !log disable puppet on labnodepool for testing of instance threshold [21:59:02] !log openstack quota set --instances 14 contintcloud [21:59:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:59:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:59:26] 06Operations, 10ops-eqiad: ms-be1005 - MegaRAID - CRITICAL: 1 failed LD(s) (Offline) - https://phabricator.wikimedia.org/T143265#2562485 (10Dzahn) ``` 1006 Aug 17 18:59:13 ms-be1005 kernel: [3837686.869494] sd 0:2:3:0: [sdd] 1007 Aug 17 18:59:13 ms-be1005 kernel: [3837686.869499] Result: hostbyte=DID_BAD_T... [22:00:13] I have no explanation but I want to see if the offset holds true [22:00:33] i.e quota 14 and nodepool 10, does that still fail consistently to grab an instance [22:01:14] hrm, seeing 9 servers pretty consistantly [22:01:16] it's doing it's post restart state cleanup for nodes (mass delete atm) [22:01:50] thcipriani: I see 10 openstack side (which is probably canonical considering) [22:01:50] still seeing 403s in the logs :( [22:02:15] (03CR) 10Dzahn: [C: 032] "it's down, not in icinga puppet or salt" [dns] - 10https://gerrit.wikimedia.org/r/305192 (https://phabricator.wikimedia.org/T142581) (owner: 10Gehel) [22:03:49] !log restart nodepool with 10s cycle rate [22:03:54] strange, I do see nodepool list climb to 10 occasionally, but I think it's wishful thinking on nodepool's part, i.e. it thinks it's under its max-server limit. 
[22:03:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:06:13] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Traffic: ULS GeoIP should use the Cookie - https://phabricator.wikimedia.org/T143270#2562516 (10BBlack) [22:06:39] I have some loose thinking that the max servers in progress matching exactly quota isn't going to work out w/ churn allowances (delete/build/setup) [22:06:41] 06Operations, 10Fundraising-Backlog, 10Traffic, 13Patch-For-Review: Switch Varnish's GeoIP code to libmaxminddb/GeoIP2 - https://phabricator.wikimedia.org/T99226#2562532 (10BBlack) [22:06:45] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Traffic: ULS GeoIP should use the Cookie - https://phabricator.wikimedia.org/T143270#2562516 (10BBlack) [22:06:51] i.e. having those two be exact doesn't make sense [22:07:48] chasemp: fwiw the debug log is so far devoid of 403 [22:07:59] yeah, I did a few things I'm waiting out here [22:08:35] I set the quota a bit higher than configured concurrent instances to give allowance for churn times (we are still deleting but nodepool has moved on and wants a new instance) and I upped the rate [22:08:53] as in, gave it more space [22:09:20] trying to reconcile every second seems like overkill not worth debugging [22:09:28] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2562534 (10BBlack) [22:09:58] 06Operations, 10Fundraising-Backlog, 10Traffic, 13Patch-For-Review: Switch Varnish's GeoIP code to libmaxminddb/GeoIP2 - https://phabricator.wikimedia.org/T99226#2562548 (10BBlack) [22:10:02] 06Operations, 10MediaWiki-extensions-CentralNotice, 10Traffic: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2562534 (10BBlack) [22:10:33] I would say this almost makes sense...except I see a normalized 9 
instances atm :) but it's not an exact science obv as it's dependent on nodepool state management [22:10:53] ok now I see 10 more often [22:11:08] thcipriani: how is the CI backlog doing? [22:11:59] chasemp: clearing up. Less slowly than it has been. [22:12:46] only needs 29 instances rather than the 37 it needed a few minutes ago [22:14:33] 06Operations, 10Fundraising-Backlog, 10Traffic, 13Patch-For-Review: Switch Varnish's GeoIP code to libmaxminddb/GeoIP2 - https://phabricator.wikimedia.org/T99226#2562582 (10BBlack) I'm planning to merge up the new version of the Varnish GeoIP code from 371d7cc737d0 in the next couple of days. If anyone ha... [22:15:39] caught one so it's still happening just less frequently [22:15:39] seems to wait 5s from deletion to request for new instance [22:15:40] im still waiting for a check on one of mine [22:15:40] !log openstack quota set --instances 15 contintcloud [22:15:40] has been 22 minutes... [22:15:40] but i see 38min ones as well =[ [22:15:40] wait, that one just rolled off. [22:15:41] it's running tests but I'm not sure how to quantify speed of backlog consumption [22:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:16:01] !log restart nodepool [22:16:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:16:15] want to refresh things here to see when it trips up [22:16:46] thcipriani: where are you looking to see that? 
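At 22:06 and 22:08 chasemp suggests why matching the contintcloud instance quota exactly to nodepool's max-servers fails. A node nodepool has already deleted keeps holding a quota slot until OpenStack releases it. Meanwhile nodepool has moved on and requests a replacement, which is refused with a 403.

The following is a toy discrete-time simulation of that hypothesis. It rests on these assumptions:

- time advances in nodepool cycles;
- a fixed number of nodes finish per cycle;
- a deleted node keeps its slot for `delete_ticks` cycles.

It is not a model of nodepool's actual scheduler, and `simulate` and its parameters are hypothetical.

```python
from typing import List


def simulate(quota: int, max_servers: int, delete_ticks: int,
             ticks: int = 100, finished_per_tick: int = 1) -> int:
    """Count launch attempts refused by the (toy) OpenStack quota."""
    if delete_ticks < 1:
        raise ValueError("delete_ticks must be >= 1")
    live = 0                  # nodes nodepool counts toward max_servers
    deleting: List[int] = []  # cycles left before each deleted node frees its slot
    rejected = 0              # refused launches, standing in for the HTTP 403
    for _ in range(ticks):
        # Advance every pending deletion one cycle; expired ones free their slot.
        deleting = [t - 1 for t in deleting if t > 1]
        # Finished nodes are deleted: nodepool stops counting them,
        # but OpenStack still charges them against the quota for now.
        done = min(finished_per_tick, live)
        live -= done
        deleting.extend([delete_ticks] * done)
        # Nodepool tops up to max_servers; OpenStack counts live + deleting.
        for _ in range(max_servers - live):
            if live + len(deleting) < quota:
                live += 1
            else:
                rejected += 1
    return rejected


for quota in (10, 12):
    print(f"quota={quota}, max_servers=10, delete_ticks=2: "
          f"{simulate(quota, 10, 2)} rejected launches")
```

In this toy model the rejections stop once quota >= max_servers + finished_per_tick * delete_ticks. The log shows the real system was messier: a 403 was still caught at 22:15 after the quota had been raised to 14. So treat the numbers as illustrative only.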
[22:16:56] wtf [22:17:00] tail -f /var/log/nodepool/debug.log [22:17:26] (03PS1) 10Dzahn: en.planet: add 3 new feeds [puppet] - 10https://gerrit.wikimedia.org/r/305410 [22:20:42] (03CR) 10jenkins-bot: [V: 04-1] sge collector: set correct env [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:25:12] thcipriani: so it's a tolerance thing based on job rate possibly and delete is async, so it seems to do [22:25:14] 2016-08-17 22:24:22,070 INFO nodepool.NodePool: Deleted node id: 355548 [22:25:14] 2016-08-17 22:24:27,337 INFO nodepool.NodePool: Need to launch 1 [22:25:32] i.e. needs buffer in the quota because it churns on nodes faster than they can actually be deleted and released I think [22:25:39] not our only issue possibly but I thikn it's one for sure [22:28:09] (03CR) 10jenkins-bot: [V: 04-1] Enabling all users for request of ISI Foundation team [puppet] - 10https://gerrit.wikimedia.org/r/305398 (owner: 10RobH) [22:28:16] (03CR) 10jenkins-bot: [V: 04-1] en.planet: add 3 new feeds [puppet] - 10https://gerrit.wikimedia.org/r/305410 (owner: 10Dzahn) [22:28:25] now we are cooking [22:29:36] (03PS2) 10Rush: sge collector: set correct env [puppet] - 10https://gerrit.wikimedia.org/r/305401 [22:29:52] (03CR) 10Merlijn van Deen: [C: 031] sge collector: set correct env [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:30:41] (03CR) 10jenkins-bot: [V: 04-1] sge collector: set correct env [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:31:02] (03PS3) 10Dzahn: Make phabricator monitoring dependent on $::site [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [22:31:14] (03CR) 10Rush: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:31:54] thcipriani: greg-g do you object to me keeping the puppet freeze w/ new nodepool settings on labnodepool over night? 
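The two debug.log lines pasted at 22:25:14 show nodepool asking for a new node about five seconds after logging a deletion. The following is a small Python scanner that makes that pattern measurable and counts quota refusals. It rests on these assumptions:

- the timestamp format is taken from those two lines only;
- a 403 is detected by the substring quoted at 21:53:17 ("Quota exceeded for instances"), wherever it appears in a line;
- pairing each "Need to launch" with the most recent "Deleted node" is a simplification.

The `scan` and `parse_ts` helpers are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Timestamp prefix as seen in the lines pasted at 22:25:14.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def parse_ts(line: str) -> Optional[datetime]:
    """Parse the leading timestamp, or return None if the line has none."""
    m = TS.match(line)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f") if m else None


def scan(lines: Iterable[str]) -> Tuple[List[float], int]:
    """Return (seconds from each deletion to the next launch request, quota 403 count)."""
    gaps: List[float] = []
    last_delete: Optional[datetime] = None
    quota_errors = 0
    for line in lines:
        if "Quota exceeded for instances" in line:
            quota_errors += 1
        ts = parse_ts(line)
        if ts is None:
            continue
        if "Deleted node id:" in line:
            last_delete = ts
        elif "Need to launch" in line and last_delete is not None:
            gaps.append((ts - last_delete).total_seconds())
            last_delete = None
    return gaps, quota_errors


sample = [
    "2016-08-17 22:24:22,070 INFO nodepool.NodePool: Deleted node id: 355548",
    "2016-08-17 22:24:27,337 INFO nodepool.NodePool: Need to launch 1",
]
print(scan(sample))
```

On the host, the same function could be fed the file tailed at 22:17:00, for example `with open("/var/log/nodepool/debug.log") as f: print(scan(f))`. Short gaps alongside a rising 403 count would be consistent with the churn hypothesis, though not proof of it.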
[22:31:59] (03PS4) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [22:32:03] (03CR) 1020after4: [C: 031] "cool! looks good to me." [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [22:32:54] (03CR) 1020after4: [C: 031] add phab1001.eqiad as CNAME for iridium.eqiad [dns] - 10https://gerrit.wikimedia.org/r/305335 (owner: 10Dzahn) [22:33:50] (03PS5) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [22:34:05] (03CR) 1020after4: [C: 031] "Nice!" [puppet] - 10https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) (owner: 10Dzahn) [22:34:18] does 'recheck' still work? [22:34:59] oh I just missed it as no irc update [22:35:04] took 1m [22:35:31] but the uh test appears broken ' ERROR: pep8: commands failed' [22:35:53] (03CR) 1020after4: [C: 031] Scap3: Go ahead and `scap deploy --init` a freshly provisioned repo [puppet] - 10https://gerrit.wikimedia.org/r/301409 (owner: 10Chad) [22:36:00] (03CR) 10Dzahn: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/305410 (owner: 10Dzahn) [22:36:21] (03CR) 10Rush: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:36:25] (03CR) 10jenkins-bot: [V: 04-1] phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [22:38:11] (03CR) 1020after4: [C: 031] Add the fatalmonitor query to logstash_checker [puppet] - 10https://gerrit.wikimedia.org/r/304327 (https://phabricator.wikimedia.org/T142784) (owner: 10Thcipriani) [22:40:21] (03CR) 10Rush: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/304327 (https://phabricator.wikimedia.org/T142784) (owner: 10Thcipriani) [22:40:25] (03PS1) 10Ppchelko: ChangeProp: Set UV_THREADPOOL_SIZE env variable [puppet] - 10https://gerrit.wikimedia.org/r/305414 [22:48:46] (03CR) 10Legoktm: 
sge collector: set correct env (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:49:35] (03CR) 10Legoktm: sge collector: set correct env (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305401 (owner: 10Rush) [22:50:11] (03PS2) 10Ppchelko: ChangeProp: Set UV_THREADPOOL_SIZE env variable [puppet] - 10https://gerrit.wikimedia.org/r/305414 [22:57:33] (03PS2) 10Dzahn: en.planet: add 3 new feeds [puppet] - 10https://gerrit.wikimedia.org/r/305410 [22:57:40] (03CR) 10Dzahn: [C: 032] en.planet: add 3 new feeds [puppet] - 10https://gerrit.wikimedia.org/r/305410 (owner: 10Dzahn) [22:59:20] (03CR) 10Ppchelko: "PPC: https://puppet-compiler.wmflabs.org/3743/" [puppet] - 10https://gerrit.wikimedia.org/r/305414 (owner: 10Ppchelko) [23:00:05] RoanKattouw, ostriches, MaxSem, and Dereckson: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160817T2300). Please do the needful. [23:02:00] just to teach ebernhardson to type "{{ircnick}}" we'll pretend there's nothing to deploy :P [23:03:01] srsly though, where's ebernhardson ? [23:04:14] ok, waiting for Erik to appear... [23:04:36] (03PS6) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:05:06] MaxSem: :) [23:05:25] (03PS3) 10MaxSem: CirrusSearch - drop references to nobelium and labsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304204 (https://phabricator.wikimedia.org/T142705) (owner: 10Gehel) [23:05:33] (03CR) 10MaxSem: [C: 032] CirrusSearch - drop references to nobelium and labsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304204 (https://phabricator.wikimedia.org/T142705) (owner: 10Gehel) [23:05:44] MaxSem: also i did use ircnick...just above my change it says Erik B (ebernhardson) which is that template... 
[23:06:04] (03Merged) 10jenkins-bot: CirrusSearch - drop references to nobelium and labsearch [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304204 (https://phabricator.wikimedia.org/T142705) (owner: 10Gehel) [23:06:28] nope, it was supposed to telly you you have patches [23:07:00] bot's broken, in the wikitext: {{ircnick|ebernhardson|Erik B}} [23:07:10] cuz I fixed that ;) [23:07:14] oh :P [23:07:32] https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=817633&oldid=817621 [23:07:46] this patch is basically a nop, the server is already shutdown and decomissioned, this is just cleaning up old unused config [23:08:16] ebernhardson, pulled on mw1099 [23:08:24] (03PS3) 10Ppchelko: ChangeProp: Update config for the new driver [puppet] - 10https://gerrit.wikimedia.org/r/305414 [23:08:26] (03CR) 10Dzahn: [C: 04-1] "getting there, but not yet. removes "sms" from both like this. http://puppet-compiler.wmflabs.org/3744/" [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:08:57] MaxSem: nothing in particular to check. If no errors are being spit out it's good [23:11:35] (03CR) 10Ppchelko: "Puppet compiler: https://puppet-compiler.wmflabs.org/3745/" [puppet] - 10https://gerrit.wikimedia.org/r/305414 (owner: 10Ppchelko) [23:12:14] (03CR) 1020after4: phabricator: only send SMS for issues on active server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:12:33] !log maxsem@tin Synchronized wmf-config/CirrusSearch-production.php: https://gerrit.wikimedia.org/r/#/c/304204/3 (duration: 00m 53s) [23:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:12:50] ebernhardson, ^ [23:13:15] MaxSem: sweet! 
looks good [23:13:51] (03PS7) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:14:56] (03PS2) 10MaxSem: Enable Language ID for Russian, Japanese, Portuguese Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304328 (https://phabricator.wikimedia.org/T142413) (owner: 10Tjones) [23:15:02] (03CR) 10MaxSem: [C: 032] Enable Language ID for Russian, Japanese, Portuguese Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304328 (https://phabricator.wikimedia.org/T142413) (owner: 10Tjones) [23:15:34] (03Merged) 10jenkins-bot: Enable Language ID for Russian, Japanese, Portuguese Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304328 (https://phabricator.wikimedia.org/T142413) (owner: 10Tjones) [23:16:42] MaxSem: seeing 1099 now giving appropriate inter-wiki results [23:16:53] ebernhardson, Trey314159: pulled on mw1099 [23:17:36] unrelated to that patch, it seems someone broke WikimediaMessages again ... [23:18:08] example? 
[23:18:12] (03PS8) 10BBlack: varnish: switch from libGeoIP to libmaxminddb [puppet] - 10https://gerrit.wikimedia.org/r/253619 (https://phabricator.wikimedia.org/T99226) (owner: 10Faidon Liambotis) [23:18:14] (03PS1) 10BBlack: www.toolserver.org: remove geoiplookup reference [puppet] - 10https://gerrit.wikimedia.org/r/305418 (https://phabricator.wikimedia.org/T100902) [23:18:16] (03PS1) 10BBlack: GeoIP VCL: re-set old IPv6 no-data cookies [puppet] - 10https://gerrit.wikimedia.org/r/305419 (https://phabricator.wikimedia.org/T99226) [23:18:17] MaxSem: https://ru.wikipedia.org/w/index.php?title=%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:%D0%9F%D0%BE%D0%B8%D1%81%D0%BA&profile=default&fulltext=Search&search=testing+some+english+words+here&searchToken=7sligpxaajwd50wqfv9d25gs6 [23:18:18] (03PS1) 10BBlack: Remove geoiplookup service IPs from LVS [puppet] - 10https://gerrit.wikimedia.org/r/305420 (https://phabricator.wikimedia.org/T100902) [23:18:20] (03PS1) 10BBlack: GeoIP VCL: remove JSON output support [puppet] - 10https://gerrit.wikimedia.org/r/305421 (https://phabricator.wikimedia.org/T100902) [23:18:28] (03PS1) 10BBlack: Remove geoiplookup DNS entries [dns] - 10https://gerrit.wikimedia.org/r/305422 (https://phabricator.wikimedia.org/T100902) [23:18:37] MaxSem: it renders with , i checked with mwrepl on terbium and it's unrelated to this patch [23:18:45] also is happening in other languages [23:19:14] ebernhardson, is a scap needed or it's a more fundamental breakage? [23:19:22] MaxSem: not sure yet, i have to look into it now [23:19:57] the actual repo hasn't had any updates outside of l10nbot in awhile, so it's something probably changed in core or some such... 
[23:20:27] kk, lmk if you need help [23:22:02] (03CR) 10Dzahn: phabricator: only send SMS for issues on active server (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:24:18] (03PS8) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:24:22] MaxSem: nothing in particular, safe enough to ship the main patch out [23:25:16] what main patch? [23:25:21] MaxSem: the configuration [23:25:44] MaxSem: that turns on interwiki search for ru, pt and ja [23:25:48] beh, I forgot I haven't synced it [23:25:51] :) [23:26:33] (03PS9) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:26:44] found the problem, bad config in extension.json [23:27:17] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/304328/ (duration: 00m 55s) [23:27:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:27:36] ebernhardson, ^ [23:27:56] MaxSem: https://gerrit.wikimedia.org/r/305424 [23:29:17] that one will probably require a cherry pick to both deploy branches, then a scap [23:30:57] ok, picking all teh cherries [23:31:09] thanks [23:31:27] (03CR) 10Dzahn: [C: 031] "now it works: no change on iridium, removes "sms" on phab2001 http://puppet-compiler.wmflabs.org/3748/" [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:31:39] why are these prefixed with "wikimedia", anyway? 
[23:31:59] (03CR) 10Dzahn: [C: 032] phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:32:05] (03PS10) 10Dzahn: phabricator: only send SMS for issues on active server [puppet] - 10https://gerrit.wikimedia.org/r/305149 (owner: 1020after4) [23:35:01] !log maxsem@tin Started scap: https://gerrit.wikimedia.org/r/#/c/305424/ [23:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:38:50] MaxSem: i dunno, probably to match the name of the key in wgMessagesDirs [23:38:53] (03PS5) 10Dzahn: phabricator: allow ssh between servers for cluster support [puppet] - 10https://gerrit.wikimedia.org/r/305277 (https://phabricator.wikimedia.org/T137928) [23:42:22] MaxSem: looks like amir did that at some point, moving them from the prior location in the main wikimedia/ message dir [23:43:50] (03PS4) 10Dzahn: installserver: split DHCP part out into role, add on install1001 [puppet] - 10https://gerrit.wikimedia.org/r/305163 (https://phabricator.wikimedia.org/T132757) [23:44:25] (03PS5) 10Dzahn: installserver: split DHCP part out into own role [puppet] - 10https://gerrit.wikimedia.org/r/305163 (https://phabricator.wikimedia.org/T132757) [23:55:30] (03PS4) 10Ppchelko: ChangeProp: Update config for the new driver [puppet] - 10https://gerrit.wikimedia.org/r/305414