[00:01:05] Operations, Ops-Access-Requests, DBA, cloud-services-team (Kanban): Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128#3681728 (bd808)
[00:02:03] Operations, Data-Services, cloud-services-team (Kanban): Switch labstore servers to default SSH configuration - https://phabricator.wikimedia.org/T177914#3681731 (bd808)
[00:04:01] Operations, Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3681739 (Dzahn) @kaythaney Great! Sounds good, thank you. I already removed everything on our side.
[00:09:32] (CR) Dzahn: [C: 1] "stats were replaced with prometheus in https://phabricator.wikimedia.org/T147426" [puppet] - https://gerrit.wikimedia.org/r/382918 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:11:53] (PS2) Dzahn: authdns: remove ganglia support [puppet] - https://gerrit.wikimedia.org/r/382918 (https://phabricator.wikimedia.org/T177225)
[00:18:29] (CR) Dzahn: [C: 2] authdns: remove ganglia support [puppet] - https://gerrit.wikimedia.org/r/382918 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:22:59] !log authdns servers: deleted /etc/ganglia/conf.d/gndsd.pyconf & /usr/lib/ganglia/python_modules_gdnsd.py - removing ganglia stats
[00:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:24:05] (CR) Dzahn: "sudo rm /usr/lib/ganglia/python_modules/gdnsd.py" [puppet] - https://gerrit.wikimedia.org/r/382918 (https://phabricator.wikimedia.org/T177225) (owner: Dzahn)
[00:26:02] Operations, monitoring, Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3681770 (Dzahn) removed ganglia stats for gdnsd (authdns servers). I checked that there was no regression because stats were ported in T147426 , merged the above, then did: [baham...
[00:27:26] Operations, Traffic, Patch-For-Review, Prometheus-metrics-monitoring: Port gdnsd statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T147426#2692643 (Dzahn) removed the ganglia stats for this today
[00:46:08] (CR) Dzahn: [C: 1] "recheck" [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper)
[00:46:35] (CR) jerkins-bot: [V: -1] Phab: Allow aklapper to delete panels on dashboards [puppet] - https://gerrit.wikimedia.org/r/380959 (owner: Aklapper)
[00:50:27] i guess not today then
[01:04:34] PROBLEM - Check health of redis instance on 6481 on rdb2005 is CRITICAL: CRITICAL: replication_delay is 1507856667 600 - REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4129880 keys, up 4 minutes 24 seconds - replication_delay is 1507856667
[01:04:55] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479
[01:05:34] RECOVERY - Check health of redis instance on 6481 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6481 has 1 databases (db0) with 4126929 keys, up 5 minutes 27 seconds - replication_delay is 0
[01:05:55] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 4128325 keys, up 5 minutes 48 seconds - replication_delay is 0
[03:23:44] PROBLEM - nutcracker port on labweb1001 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused
[03:24:04] PROBLEM - nutcracker process on labweb1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker
[03:24:34] PROBLEM - Check systemd state on labweb1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:24:34] PROBLEM - puppet last run on labweb1001 is CRITICAL: CRITICAL: Puppet has 4 failures. Last run 4 minutes ago with 4 failures. Failed resources (up to 3 shown): Package[libssl1.0.0-dbg],Package[libstdc++6-4.8-dbg],Package[libjson-c2-dbg],Package[libboost1.55-dbg]
[03:27:35] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 751.33 seconds
[03:48:46] Operations, Traffic, Patch-For-Review: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#3681880 (Tbayer) For the record: One manifestation of this issue (video and audio files remaining available on upload.wikimedia.org for many hours after being deleted by an admin, T69559)...
[04:04:54] PROBLEM - mediawiki-installation DSH group on labweb1001 is CRITICAL: Host labweb1001 is not in mediawiki-installation dsh group
[04:13:24] PROBLEM - mediawiki-installation DSH group on labweb1002 is CRITICAL: Host labweb1002 is not in mediawiki-installation dsh group
[04:43:43] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3681907 (alex-mashin) >> I therefore suggest a more ambitious goal: make strict typing compulsory in MediaWiki code. This requirement can be enforced...
[04:56:04] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 199.10 seconds
[05:32:32] Operations, DBA: Wikimedia\Rdbms\DBQueryTimeoutError (not repeated) - https://phabricator.wikimedia.org/T178109#3681065 (Marostegui) I am not seeing an abnormal amount of errors on eswiki for the last 24h hours. Does this happen every time you try it or did it happen just one time?
[06:11:14] (PS1) Marostegui: mariadb: Add db2081 to s5 and later s8 [puppet] - https://gerrit.wikimedia.org/r/383971 (https://phabricator.wikimedia.org/T170662)
[06:12:01] (PS1) Marostegui: s5.hosts: Add db2081 to s5 [software] - https://gerrit.wikimedia.org/r/383972 (https://phabricator.wikimedia.org/T170662)
[06:14:32] (CR) Marostegui: [C: 2] s5.hosts: Add db2081 to s5 [software] - https://gerrit.wikimedia.org/r/383972 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:15:20] (Merged) jenkins-bot: s5.hosts: Add db2081 to s5 [software] - https://gerrit.wikimedia.org/r/383972 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:20:15] (CR) Marostegui: [C: 2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8304/" [puppet] - https://gerrit.wikimedia.org/r/383971 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[06:23:14] !log Stop MySQL on db2080 to copy its data to db2081 - T170662
[06:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:22] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[06:26:06] Operations, MediaWiki-Platform-Team, TechCom-RfC, HHVM, NewPHP: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370#3623015 (Joe) >>! In T176370#3658845, @EddieGP wrote: >>>! In T176370#3658764, @Reedy wrote: >> Probably. This task would block that; we can't change...
[06:27:54] PROBLEM - graphite.wikimedia.org on graphite1003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 398 bytes in 0.001 second response time
[06:28:54] RECOVERY - graphite.wikimedia.org on graphite1003 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 0.013 second response time
[06:29:34] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:35:44] (PS3) Hashar: spec: subject type is infered from dir structure [puppet] - https://gerrit.wikimedia.org/r/383870
[06:39:37] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:09:36] Operations, ops-esams, DC-Ops, Traffic: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3682015 (ema) p:Triage>Normal
[07:13:57] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/383834 (https://phabricator.wikimedia.org/T177961) (owner: Ema)
[07:15:38] (PS2) Ema: pybal: use Monitoring::Plugin in check_pybal [puppet] - https://gerrit.wikimedia.org/r/383834 (https://phabricator.wikimedia.org/T177961)
[07:15:47] (CR) Ema: [V: 2 C: 2] pybal: use Monitoring::Plugin in check_pybal [puppet] - https://gerrit.wikimedia.org/r/383834 (https://phabricator.wikimedia.org/T177961) (owner: Ema)
[07:19:27] PROBLEM - PyBal backends health check on lvs3003 is CRITICAL: NRPE: Unable to read output
[07:19:28] PROBLEM - PyBal backends health check on lvs4001 is CRITICAL: NRPE: Unable to read output
[07:20:28] uh, checking
[07:21:27] RECOVERY - PyBal backends health check on lvs3003 is OK: PYBAL OK - All pools are healthy
[07:22:37] RECOVERY - PyBal backends health check on lvs4001 is OK: PYBAL OK - All pools are healthy
[07:24:11] strange, on both hosts running puppet fixed the issue by installing libmonitoring-plugin-perl, but check_pybal hasn't been changed
[07:24:43] it's as if a previous puppet run would have updated check_pybal without installing libmonitoring-plugin-perl?
[07:28:29] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db2081 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383983 (https://phabricator.wikimedia.org/T170662)
[07:32:04] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db2081 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383983 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:33:10] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db2081 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383983 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:33:11] gilles gehel I'm taking a look at the whole T178078 and T150734 thing
[07:33:11] T178078: RESTBase logs disappeared from logstash - https://phabricator.wikimedia.org/T178078
[07:33:11] T150734: Make Thumbor logs available in ELK - https://phabricator.wikimedia.org/T150734
[07:33:24] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db2081 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383983 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:34:17] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db2081 to the config - T170662 (duration: 00m 47s)
[07:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:34:25] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[07:35:10] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db2081 to the config - T170662 (duration: 00m 46s)
[07:35:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:45:49] (PS4) Elukey: LVS for druid-public broker [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[07:49:06] (PS1) Marostegui: mariadb: Add db2082 to s5 and later s8 [puppet] - https://gerrit.wikimedia.org/r/383984 (https://phabricator.wikimedia.org/T170662)
[07:49:50] (PS1) Marostegui: s5.hosts: Add db2082 to s5 [software] - https://gerrit.wikimedia.org/r/383985 (https://phabricator.wikimedia.org/T170662)
[07:51:04] (CR) Marostegui: [C: 2] s5.hosts: Add db2082 to s5 [software] - https://gerrit.wikimedia.org/r/383985 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:51:53] (Merged) jenkins-bot: s5.hosts: Add db2082 to s5 [software] - https://gerrit.wikimedia.org/r/383985 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:52:57] (CR) Marostegui: [C: 2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8305/" [puppet] - https://gerrit.wikimedia.org/r/383984 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[07:54:08] PROBLEM - Host db2081.mgmt is DOWN: PING CRITICAL - Packet loss = 100%
[07:57:33] ^ I am on that
[07:57:51] (creating a task basically)
[07:58:09] Operations, ops-codfw, DBA: db2081 unreachable - https://phabricator.wikimedia.org/T178140#3682064 (Marostegui)
[08:01:44] (CR) Elukey: "added the critical: false option to avoid paging for the initial testing period." [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[08:10:07] Operations, Traffic, Wikimedia-Logstash, Services (watching): RESTBase logs disappeared from logstash - https://phabricator.wikimedia.org/T178078#3682104 (fgiunchedi)
[08:10:46] !log depool/repool logstash1001 (service=logstash-gelf) T178078
[08:10:47] PROBLEM - puppet last run on db2082 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 6 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[prometheus-node-exporter],Package[puppet]
[08:10:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:10:53] T178078: RESTBase logs disappeared from logstash - https://phabricator.wikimedia.org/T178078
[08:11:18] !log ema@neodymium conftool action : set/pooled=no; selector: name=logstash1001.eqiad.wmnet,service=logstash-gelf,dc=eqiad
[08:11:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:58] !log ema@neodymium conftool action : set/pooled=yes; selector: name=logstash1001.eqiad.wmnet,service=logstash-gelf,dc=eqiad
[08:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:40] !log restarting commons wiki recentchanges purge of wb entries T177772
[08:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:15:47] RECOVERY - puppet last run on db2082 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[08:15:47] T177772: Purge 90% of rows from recentchanges (and posibly defragment) from commonswiki and ruwiki (the ones with source:wikidata) - https://phabricator.wikimedia.org/T177772
[08:18:01] !log upgrade pybal on lvs1006 to 1.14.1 T177815
[08:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:18:08] T177815: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815
[08:20:34] (PS1) Muehlenhoff: Add library hint for openldap [puppet] - https://gerrit.wikimedia.org/r/383987
[08:24:39] (CR) Muehlenhoff: [C: 2] Add library hint for openldap [puppet] - https://gerrit.wikimedia.org/r/383987 (owner: Muehlenhoff)
[08:34:09] !log upgrade pybal on lvs1003 to 1.14.1 T177815
[08:34:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:16] T177815: Alerts on LVS services with one single realserver - https://phabricator.wikimedia.org/T177815
[08:36:13] (CR) Ladsgroup: "@Daniel: Yes" [puppet] - https://gerrit.wikimedia.org/r/352574 (https://phabricator.wikimedia.org/T140890) (owner: Ladsgroup)
[08:36:23] gehel: ok to merge https://gerrit.wikimedia.org/r/#/c/380943/ ?
[08:36:51] godog: sure!
[08:37:15] Hey, can these get merged today or do I need to wait until puppet SWAT? https://gerrit.wikimedia.org/r/#/c/352574/2 https://gerrit.wikimedia.org/r/#/c/352797/
[08:37:35] regarding the second one: apergos ^
[08:37:53] godog: have you seen https://phabricator.wikimedia.org/T178078 ? It looks like restbase lost logging when logstash was restarted yesterday, but as far as I can see, no config change was merged for logstash...
[08:38:57] godog: logstash will require a restart once your change is merged. Busy atm, but I can do the restart a bit later if you need me (otherwise, feel free to do it - https://wikitech.wikimedia.org/wiki/Service_restarts#Logstash)
[08:40:15] gehel: yeah turns out it was a pybal problem, not logstash :( ema has upgraded pybal there and we're back, though the root cause isn't 100% clear at least to me, I'll update the task
[08:40:35] gehel: thanks though! I'll merge and let you know if I need help with it
[08:40:41] godog: kool! I was not understanding what was going on!
[08:41:27] (PS3) Filippo Giunchedi: Thumbor: don't rewrite host value in logstash messages [puppet] - https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: Gilles)
[08:41:42] Amir1: I have your change in mind, assuming that table is really gone by then, I will merge the change before the next run on the 20th
[08:42:04] apergos: it already stopped getting updated
[08:42:33] (CR) Filippo Giunchedi: [C: 2] Thumbor: don't rewrite host value in logstash messages [puppet] - https://gerrit.wikimedia.org/r/380943 (https://phabricator.wikimedia.org/T150734) (owner: Gilles)
[08:46:25] 2017-10-11 12:38
[08:47:32] PROBLEM - HHVM rendering on mw2145 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:48:22] RECOVERY - HHVM rendering on mw2145 is OK: HTTP OK: HTTP/1.1 200 OK - 77793 bytes in 0.372 second response time
[08:50:49] Amir1: ok, I have had a good look at the tickets. it will be merged at the end of the current wd run.
[08:51:34] well all runs tbh. but in a few days.
[08:52:52] (PS1) Alexandros Kosiaris: Purge /usr/lib/ganglia/ and /etc/ganglia [puppet] - https://gerrit.wikimedia.org/r/383991 (https://phabricator.wikimedia.org/T177225)
[08:54:18] Operations, monitoring, Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3682216 (akosiaris) >>! In T177225#3681770, @Dzahn wrote: > removed ganglia stats for gdnsd (authdns servers). > > I checked that there was no regression because stats were ported i...
[08:57:30] !log Stop db1103 and db1038 in sync for more checksumming - T164488
[08:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:38] T164488: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488
[08:57:58] (PS5) Elukey: LVS for druid-public broker [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[08:58:26] Operations, Continuous-Integration-Infrastructure (shipyard), User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3682226 (Joe) a:Joe
[08:59:06] (CR) Alexandros Kosiaris: [C: 1] compiler-update-facts: restore optional use of PUPPET_MASTER env [puppet] - https://gerrit.wikimedia.org/r/383857 (https://phabricator.wikimedia.org/T97081) (owner: Andrew Bogott)
[09:01:34] apergos: thank you :)
[09:04:10] !log restart pybal on lvs1003 to test logstash depool - T178078
[09:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:17] T178078: RESTBase logs disappeared from logstash - https://phabricator.wikimedia.org/T178078
[09:04:17] Operations, Continuous-Integration-Infrastructure (shipyard), User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3682232 (Joe) Status update: I extracted the build script from `operations/docker-images/production-images` and it is able to build th...
[09:07:38] (CR) Alexandros Kosiaris: [C: -1] LVS for druid-public broker (4 comments) [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:09:32] checking --^
[09:10:09] thanks akosiaris, fixing!
[09:10:39] I was unsure too about the node level lvs role inclusion but it was used in other places too
[09:11:00] yeah lvs::realserver needs a bit of a rewrite. Needs to become a profile
[09:11:03] I guess that in places like labs we'll not use the druid public worker role but either something else or a profile
[09:11:10] and it was already under a rewrite to a role
[09:11:14] hence the confusion
[09:11:27] now that we have a better coding paradigm, I guess that work can resume
[09:15:07] about - '10.2.2.38' # druid-public.svc.eqiad.wmnet - I think that the original idea behind it was to expose also the overlord daemon's port via a second service, using druid-public.svc.eqiad.wmnet in both broker and overlord (might not be right probably).. We skipped the overlord since it would be a weird configuration: it should be available for hadoop nodes, not domain network (since it allo
[09:15:13] ws us to kick off indexing jobs on hdfs data)
[09:17:14] (PS1) Marostegui: db-eqiad,db-codfw.php: Add db2082 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383992 (https://phabricator.wikimedia.org/T170662)
[09:18:20] akosiaris: so iiuc the convention would be to use one domain/ip (like druid-public-broker.svc.etc..) for each "service" exposed (so druid-public-overlord.svc. would have another domain/ip registered in dns and puppet)
[09:18:32] (PS1) Marostegui: s5.hosts: Add db2083 to the config [software] - https://gerrit.wikimedia.org/r/383993 (https://phabricator.wikimedia.org/T170662)
[09:19:34] (CR) Marostegui: [C: 2] db-eqiad,db-codfw.php: Add db2082 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383992 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:20:46] (Merged) jenkins-bot: db-eqiad,db-codfw.php: Add db2082 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383992 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:20:55] (CR) jenkins-bot: db-eqiad,db-codfw.php: Add db2082 to the config [mediawiki-config] - https://gerrit.wikimedia.org/r/383992 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:21:43] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db2082 to the config - T170662 (duration: 00m 47s)
[09:21:43] (PS1) Marostegui: mariadb: Add db2083 to s5 and later s8 [puppet] - https://gerrit.wikimedia.org/r/383994 (https://phabricator.wikimedia.org/T170662)
[09:21:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:21:50] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662
[09:22:22] !log a small clean up of ores_classification table in wikidatawiki (T159753)
[09:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:29] T159753: Concerns about ores_classification table size on enwiki - https://phabricator.wikimedia.org/T159753
[09:22:54] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db2082 to the config - T170662 (duration: 00m 46s)
[09:23:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:24:34] (CR) Marostegui: [C: 2] s5.hosts: Add db2083 to the config [software] - https://gerrit.wikimedia.org/r/383993 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:24:55] (CR) Marostegui: [C: 2] "Puppet looks good: https://puppet-compiler.wmflabs.org/compiler02/8308/" [puppet] - https://gerrit.wikimedia.org/r/383994 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:25:17] (CR) Elukey: LVS for druid-public broker (1 comment) [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:26:16] Operations, Continuous-Integration-Infrastructure (shipyard), User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3682327 (Joe) With a quick skim at the CI repo, the following things done there are not supported by the current build system: * Auto...
[09:27:04] (Merged) jenkins-bot: s5.hosts: Add db2083 to the config [software] - https://gerrit.wikimedia.org/r/383993 (https://phabricator.wikimedia.org/T170662) (owner: Marostegui)
[09:28:33] (PS6) Elukey: Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:29:03] (CR) jerkins-bot: [V: -1] Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:29:42] (PS1) Jcrespo: mariadb: Depool db1098 temporarelly for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772)
[09:29:55] modules/role/manifests/druid/public/worker.pp:9 wmf-style: role 'role::druid::public::worker' includes lvs::realserver which is neither a role nor a profile :P
[09:30:12] (CR) Marostegui: [C: 1] mariadb: Depool db1098 temporarelly for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772) (owner: Jcrespo)
[09:30:47] (CR) Marostegui: [C: 1] "Can you also do templatelinks and pagelinks on frwiki, ruwiki and jawiki?" [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772) (owner: Jcrespo)
[09:30:54] elukey: I am not sure what overload even is... but yes... every service should be different
[09:30:58] overlord
[09:31:05] the name is great
[09:31:14] :D
[09:31:35] so basically on each druid node there are three daemons connected to each other: the overlord, the middlemanager and the peons
[09:32:09] when new data is ready to be indexed on hdfs, Oozie contacts one of the overlords that in turn asks the middlemanagers to start the work
[09:32:19] (that in turn uses the peons locally for single tasks)
[09:32:21] what ? no grunts ?
[09:32:31] no troll axethrowers ?
[09:32:37] :P
[09:32:51] ahhahaha it would have been awesome and more descriptive I know
[09:33:17] more info in https://phabricator.wikimedia.org/T172681
[09:33:19] nope
[09:33:23] http://druid.io/docs/latest/design/indexing-service.html
[09:34:24] heh, so peons are actually what normal people would call "job"
[09:34:29] (PS2) Jcrespo: mariadb: Depool db1098 temporarelly for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772)
[09:35:05] at least that's what I get from http://druid.io/docs/latest/design/peons.html
[09:35:37] the way that's written makes me think a peon will run a single task and after that task is completed it basically dies
[09:35:56] which is not what a Warcraft II peon does
[09:36:00] :)
[09:36:02] (PS1) Filippo Giunchedi: hieradata: force present and empty runcommand.arguments [puppet] - https://gerrit.wikimedia.org/r/383997 (https://phabricator.wikimedia.org/T178078)
[09:37:06] (PS1) Elukey: Rename druid-public to druid-public-broker [dns] - https://gerrit.wikimedia.org/r/383998 (https://phabricator.wikimedia.org/T176223)
[09:37:20] a there's also a task notion... whose description kind of conflicts with the peon's...
[09:37:31] Tasks are run on middle managers and always operate on a single data source.
[09:38:01] (PS3) Jcrespo: mariadb: Depool db1098 temporarily for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772)
[09:38:06] I am guessing that page basically describes the API entity and then the middle manager spawns a peon to complete it ?
[09:39:16] "Each Peon is capable of running only one task at a time, however, a middle manager may have multiple peons." is indeed a bit confusing
[09:39:40] I still have not understood if a Peon is a long running process or not
[09:39:43] (CR) Jcrespo: [C: 2] mariadb: Depool db1098 temporarily for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772) (owner: Jcrespo)
[09:40:26] I think that they are basically JVMs fired off on demand, there is no druid daemon running called peon on druid100[456]
[09:40:50] (so they terminate after their job is done)
[09:40:53] (CR) Dereckson: [C: -1] "Cirrus variables are also used in tests." [mediawiki-config] - https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: Zoranzoki21)
[09:40:55] yeah, so not very well named...
[09:40:55] (Merged) jenkins-bot: mariadb: Depool db1098 temporarily for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772) (owner: Jcrespo)
[09:41:03] (CR) jenkins-bot: mariadb: Depool db1098 temporarily for maintenace [mediawiki-config] - https://gerrit.wikimedia.org/r/383996 (https://phabricator.wikimedia.org/T177772) (owner: Jcrespo)
[09:41:11] nor well documented
[09:43:49] (CR) Alexandros Kosiaris: [C: -1] Add the LVS config for the druid-public-broker service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:44:04] (CR) Alexandros Kosiaris: [C: 1] Rename druid-public to druid-public-broker [dns] - https://gerrit.wikimedia.org/r/383998 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[09:45:49] (PS7) Elukey: Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:45:59] elukey: I see wmf_style_guide bit ya
[09:46:04] I'd ignore it for now
[09:46:15] (CR) jerkins-bot: [V: -1] Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:46:27] I thought I had fixed it with class { 'etc..' }
[09:46:36] but it is of course not allowed in a role
[09:46:50] yeah, put a lintignore around it
[09:46:59] that entire thing needs some restructuring
[09:48:21] I forgot a ':' though :P
[09:48:26] (PS8) Elukey: Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:48:58] (CR) Elukey: [C: 2] Rename druid-public to druid-public-broker [dns] - https://gerrit.wikimedia.org/r/383998 (https://phabricator.wikimedia.org/T176223) (owner: Elukey)
[09:50:04] ahhh good now jenkins is fine
[09:54:18] https://gerrit.wikimedia.org/r/#/c/383880/8/modules/role/manifests/druid/public/worker.pp ... why is that not caught _joe_
[09:54:19] ?
[09:54:32] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1098 (duration: 00m 46s)
[09:54:36] A class in a role ?
[09:54:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:54:57] (CR) Elukey: Add the LVS config for the druid-public-broker service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:56:01] (PS1) Hoo man: sort wikidataclient.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/383999
[09:56:06] maybe it is not enforced currently since a lot of roles do explicitly instantiate classes?
[09:56:23] should I revert or wait?
[09:56:26] * akosiaris knows not. I haven't had time yet to read the code
[09:57:09] (CR) Alexandros Kosiaris: Add the LVS config for the druid-public-broker service (1 comment) [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:57:49] Operations, Pybal, Traffic: RunCommandMonitoringProtocol throws an exception if runcommand.arguments is not specified - https://phabricator.wikimedia.org/T178149#3682399 (ema)
[09:58:04] Operations, Pybal, Traffic: RunCommandMonitoringProtocol throws an exception if runcommand.arguments is not specified - https://phabricator.wikimedia.org/T178149#3682411 (ema) p:Triage>Normal
[09:58:21] (PS9) Elukey: Add the LVS config for the druid-public-broker service [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[09:58:53] (PS2) Filippo Giunchedi: hieradata: force present and empty runcommand.arguments [puppet] - https://gerrit.wikimedia.org/r/383997 (https://phabricator.wikimedia.org/T178078)
[10:00:05] (PS3) Filippo Giunchedi: hieradata: force present and empty runcommand.arguments [puppet] - https://gerrit.wikimedia.org/r/383997 (https://phabricator.wikimedia.org/T178078)
[10:00:14] (PS1) Ema: runcommand: do not crash on empty runcommand.arguments [debs/pybal] - https://gerrit.wikimedia.org/r/384000 (https://phabricator.wikimedia.org/T178149)
[10:01:05] (CR) Ema: [C: 1] hieradata: force present and empty runcommand.arguments [puppet] - https://gerrit.wikimedia.org/r/383997 (https://phabricator.wikimedia.org/T178078) (owner: Filippo Giunchedi)
[10:05:00] !log stopping slave on db1098 and optimize recentchanges
[10:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:07:32] (PS2) Hoo man: Sort dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/383999
[10:08:50] (CR) jerkins-bot: [V: -1] Sort dblists [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[10:11:24] (CR) Jcrespo: "There was a discussion about this not a long time ago, make sure UTF8 collation is used, and not latin or "C"." [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[10:11:27] (CR) Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler02/8309/lvs1003.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: Ottomata)
[10:11:50] (PS1) Hoo man: Enable description usage tracking on a few test wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/384003 (https://phabricator.wikimedia.org/T177155)
[10:12:52] (CR) Jcrespo: "54b84622e875034444d4b0d1ac1b2a7ac5df1af8 is the related one." [mediawiki-config] - https://gerrit.wikimedia.org/r/383999 (owner: Hoo man)
[10:14:17] (CR) Filippo Giunchedi: [C: 2] hieradata: force present and empty runcommand.arguments [puppet] - https://gerrit.wikimedia.org/r/383997 (https://phabricator.wikimedia.org/T178078) (owner: Filippo Giunchedi)
[10:15:44] Operations, Pybal, Traffic: Add UDP monitor for pybal - https://phabricator.wikimedia.org/T178151#3682485 (ema) p:Triage>Normal
[10:22:48] !log roll-restart pybal to apply https://gerrit.wikimedia.org/r/#/c/383997/ - T178149
[10:22:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:55] T178149: RunCommandMonitoringProtocol throws an exception if runcommand.arguments is not specified - https://phabricator.wikimedia.org/T178149
[10:32:06] godog: you have already started probably, but I was wondering if we could have added https://gerrit.wikimedia.org/r/#/c/383880/9 to the rolling restart
[10:32:21] otherwise I'll do it later on
[10:32:29] (PS1) Jcrespo: Revert "mariadb: Depool db1098 temporarily for maintenace" [mediawiki-config] - https://gerrit.wikimedia.org/r/384004
[10:32:35] elukey: yeah done already :(
[10:32:48] too late :)
[10:34:06] akosiaris: (whenever you have time) - if your concerns are resolved in the druid code review I'd proceed later on with the code change [10:35:13] (03CR) 10Alexandros Kosiaris: [C: 031] Add the LVS config for the druid-public-broker service [puppet] - 10https://gerrit.wikimedia.org/r/383880 (https://phabricator.wikimedia.org/T176223) (owner: 10Ottomata) [10:35:20] (03PS2) 10Ema: runcommand: do not crash on empty runcommand.arguments [debs/pybal] - 10https://gerrit.wikimedia.org/r/384000 (https://phabricator.wikimedia.org/T178149) [10:35:48] thankssss [10:35:49] (03PS2) 10Joal: Add druid options to AQS config [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [10:35:56] elukey: --^ [10:36:02] (03CR) 10Filippo Giunchedi: [C: 04-1] compiler-update-facts: restore optional use of PUPPET_MASTER env (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383857 (https://phabricator.wikimedia.org/T97081) (owner: 10Andrew Bogott) [10:37:58] joal: going to tweak a few things and then I'll resubmit [10:38:07] elukey: Thank you :) [10:38:59] (03CR) 10Filippo Giunchedi: [C: 031] Adapt synchronisation of raid blobs to use thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383822 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [10:39:23] (03CR) 10Filippo Giunchedi: [C: 031] Add a new component thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383821 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [10:40:13] (03CR) 10Filippo Giunchedi: [C: 031] Use new repository layout for stretch onwards [puppet] - 10https://gerrit.wikimedia.org/r/357559 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [10:40:28] (03CR) 10Filippo Giunchedi: [C: 031] Remove stretch-wikimedia/backports [puppet] - 10https://gerrit.wikimedia.org/r/383375 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [10:40:51] (03Abandoned) 10Jcrespo: Revert "mariadb: Depool db2047, hardware issues" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/380314 (owner: 10Jcrespo) [10:41:07] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1098 temporarily for maintenace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384004 (owner: 10Jcrespo) [10:42:22] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1098 temporarily for maintenace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384004 (owner: 10Jcrespo) [10:42:34] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1098 temporarily for maintenace" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384004 (owner: 10Jcrespo) [10:53:07] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1098 (duration: 00m 47s) [10:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:35] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests, 10User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3682548 (10Addshore) @Dzahn has @Pablo-WMDE automatically been added to the nda ldap group as part of this ticket or not? [11:13:13] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests, 10User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3664202 (10MoritzMuehlenhoff) @Addshore : I just checked, he's currently not a member of that group. 
[11:22:43] (03PS1) 10Marostegui: db-eqiad,db-codfw.php: Add db2083 to s5 config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384010 (https://phabricator.wikimedia.org/T170662) [11:24:38] (03CR) 10Marostegui: [C: 032] db-eqiad,db-codfw.php: Add db2083 to s5 config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384010 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:25:48] (03Merged) 10jenkins-bot: db-eqiad,db-codfw.php: Add db2083 to s5 config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384010 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:26:33] (03CR) 10jenkins-bot: db-eqiad,db-codfw.php: Add db2083 to s5 config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384010 (https://phabricator.wikimedia.org/T170662) (owner: 10Marostegui) [11:26:48] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Add db2082 to the config - T170662 (duration: 00m 46s) [11:26:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:55] T170662: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662 [11:27:40] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Add db2082 to the config - T170662 (duration: 00m 46s) [11:27:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:53] (03CR) 10Hashar: [V: 031 C: 031] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/383843 (https://phabricator.wikimedia.org/T178076) (owner: 10Hashar) [11:32:56] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/383870 (owner: 10Hashar) [11:33:57] (03CR) 10jerkins-bot: [V: 04-1] spec: subject type is infered from dir structure [puppet] - 10https://gerrit.wikimedia.org/r/383870 (owner: 10Hashar) [11:37:59] (03PS3) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:38:09] operations-puppet-tests-docker might 
fail [11:38:10] sorry [11:45:09] (03PS4) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:45:47] (03CR) 10jerkins-bot: [V: 04-1] role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:49:46] PROBLEM - MegaRAID on db1050 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) [11:49:48] ACKNOWLEDGEMENT - MegaRAID on db1050 is CRITICAL: CRITICAL: 1 failed LD(s) (Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T178157 [11:49:50] (03PS5) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:49:51] 10Operations, 10ops-eqiad: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682623 (10ops-monitoring-bot) [11:50:26] (03CR) 10jerkins-bot: [V: 04-1] role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:51:29] weird I am trying to execute the tests in local and I don't find the errors [11:52:55] (03PS6) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [11:57:17] (03PS7) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:00:51] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682643 (10Marostegui) p:05Triage>03Normal [12:01:43] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682623 (10Marostegui) a:03Cmjohnson @Cmjohnson can we get this disk replaced if we have spares? 
[12:03:54] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682658 (10Marostegui) We could get disks from db1049 which are the same size and db1049 is ready to be decommissioned (T175264) [12:05:35] (03PS8) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:12:58] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/383870 (owner: 10Hashar) [12:13:08] (03CR) 10Hashar: [V: 031 C: 031] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/383843 (https://phabricator.wikimedia.org/T178076) (owner: 10Hashar) [12:16:45] (03PS9) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:17:16] (03CR) 10jerkins-bot: [V: 04-1] role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:17:26] PROBLEM - Apache HTTP on mw2221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [12:18:16] RECOVERY - Apache HTTP on mw2221 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.113 second response time [12:19:51] (03PS10) 10Elukey: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:26:39] (03CR) 10Elukey: "pcc finally looks good: https://puppet-compiler.wmflabs.org/compiler02/8316/aqs1004.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [12:26:57] 10Operations, 10User-fgiunchedi: Integrate stretch 9.2 point release - https://phabricator.wikimedia.org/T177739#3682836 (10MoritzMuehlenhoff) These are fully rolled out: krb5 openldap xkeyboard-config [12:27:46] (03PS1) 10Mforns: Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 
(https://phabricator.wikimedia.org/T177783) [12:28:08] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [12:29:07] (03PS2) 10Mforns: Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) [12:29:25] (03PS1) 10Gehel: mjolnir: cleanup service declaration [puppet] - 10https://gerrit.wikimedia.org/r/384029 [12:29:30] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [12:29:49] (03CR) 10jerkins-bot: [V: 04-1] mjolnir: cleanup service declaration [puppet] - 10https://gerrit.wikimedia.org/r/384029 (owner: 10Gehel) [12:30:20] (03PS2) 10Gehel: mjolnir: cleanup service declaration [puppet] - 10https://gerrit.wikimedia.org/r/384029 [12:33:08] (03CR) 10DCausse: [C: 031] mjolnir: cleanup service declaration [puppet] - 10https://gerrit.wikimedia.org/r/384029 (owner: 10Gehel) [12:34:57] (03CR) 10Gehel: "Merging this will require cleanup of /etc/systemd/system/mjolnir-kafka-daemon.service on relforge100[12]" [puppet] - 10https://gerrit.wikimedia.org/r/384029 (owner: 10Gehel) [12:38:33] (03PS2) 10Rush: openstack: cleanup ceilometer files and roles [puppet] - 10https://gerrit.wikimedia.org/r/383946 (https://phabricator.wikimedia.org/T171494) [12:39:37] 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): operations-puppet-tests-docker console output lacks color - https://phabricator.wikimedia.org/T175057#3682894 (10hashar) a:03hashar [12:42:13] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682623 (10jcrespo) @Marostegui shouldn't we just schedule it for decom- the data from the original master was 
copied to db1098, and we already run checksum on these hosts. Plus it is depooled already. [12:43:08] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682909 (10Marostegui) >>! In T178157#3682903, @jcrespo wrote: > @Marostegui shouldn't we just schedule it for decom- the data from the original master was copied to db1098, and we already run checksum on th... [12:43:21] 10Operations, 10Continuous-Integration-Config, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): operations-puppet-tests-docker console output lacks color - https://phabricator.wikimedia.org/T175057#3682911 (10hashar) 05Open>03Resolved Deployed and verified. Both tox and rspec have color output... [12:44:41] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682918 (10jcrespo) > I didn't know its data was copied over to db1098! It wasn't, an older, an potentially more accurate/more different old master was; but it doesn't matter, it was fully checked on T160509. 
[12:45:16] 10Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1050 - https://phabricator.wikimedia.org/T178157#3682921 (10Marostegui) 05Open>03declined Host to be decommissioned [12:46:16] 10Operations, 10DBA, 10Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3682937 (10Marostegui) [12:47:19] (03CR) 10Rush: [C: 032] openstack: pdns recursor module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [12:47:21] (03PS2) 10Muehlenhoff: Add a new component thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383821 (https://phabricator.wikimedia.org/T158583) [12:47:25] (03PS8) 10Rush: openstack: pdns recursor module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/383909 (https://phabricator.wikimedia.org/T171494) [12:49:00] (03PS3) 10Muehlenhoff: Add a new component thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383821 (https://phabricator.wikimedia.org/T158583) [12:49:09] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1100 (crashed)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 [12:49:18] (03CR) 10jerkins-bot: [V: 04-1] Revert "mariadb: Depool db1100 (crashed)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 (owner: 10Jcrespo) [12:50:03] (03CR) 10Muehlenhoff: [C: 032] Add a new component thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383821 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [12:50:17] (03PS1) 10Jcrespo: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384036 (https://phabricator.wikimedia.org/T177772) [12:55:51] (03PS2) 10Filippo Giunchedi: scap: upgrade to 3.7.1-1 [puppet] - 10https://gerrit.wikimedia.org/r/383879 (https://phabricator.wikimedia.org/T127762) (owner: 10Thcipriani) [12:56:45] (03CR) 10Filippo Giunchedi: [C: 032] scap: upgrade to 3.7.1-1 [puppet] - 
10https://gerrit.wikimedia.org/r/383879 (https://phabricator.wikimedia.org/T127762) (owner: 10Thcipriani) [12:57:29] !log upload scap 3.7.1-1 - T127762 [12:57:36] (03CR) 10Gehel: [C: 04-1] "Looks good! A few minor comments inline." (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [12:57:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:37] T127762: Update Debian Package for Scap3 - https://phabricator.wikimedia.org/T127762 [12:58:35] (03PS2) 10Jcrespo: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384036 (https://phabricator.wikimedia.org/T177772) [12:58:37] (03PS2) 10Jcrespo: Revert "mariadb: Depool db1100 (crashed)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 [12:59:24] (03PS3) 10Jcrespo: Revert "mariadb: Depool db1100 (crashed)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 [12:59:40] (03CR) 10Gehel: [C: 04-1] "As to your comment on making this more generic, I would wait until there is a real second use case. 
The profiles are simple enough as they" [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [13:01:08] (03CR) 10Jcrespo: [C: 032] mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384036 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [13:02:43] (03PS2) 10Muehlenhoff: Adapt synchronisation of raid blobs to use thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383822 (https://phabricator.wikimedia.org/T158583) [13:02:53] (03Merged) 10jenkins-bot: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384036 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [13:03:07] (03CR) 10jenkins-bot: mariadb: Depool db1053 for maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384036 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [13:05:03] (03CR) 10Muehlenhoff: [C: 032] Adapt synchronisation of raid blobs to use thirdparty/hwraid [puppet] - 10https://gerrit.wikimedia.org/r/383822 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [13:06:36] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:06:50] (03PS1) 10Rush: openstack: net_standby vs net_secondary for cold spare labnet [puppet] - 10https://gerrit.wikimedia.org/r/384038 (https://phabricator.wikimedia.org/T171494) [13:07:19] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1053 (duration: 00m 47s) [13:07:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:03] (03CR) 10Rush: [C: 032] openstack: net_standby vs net_secondary for cold spare labnet [puppet] - 10https://gerrit.wikimedia.org/r/384038 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [13:11:36] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [13:13:01] (03PS3) 10Rush: openstack: cleanup ceilometer files and roles [puppet] - 10https://gerrit.wikimedia.org/r/383946 (https://phabricator.wikimedia.org/T171494) [13:14:22] (03PS1) 10Muehlenhoff: Synchronise jenkins package to thirdparty/ci [puppet] - 10https://gerrit.wikimedia.org/r/384039 (https://phabricator.wikimedia.org/T158583) [13:22:08] (03CR) 10Ottomata: "One nit, looks good though! :)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [13:22:52] 10Operations, 10Commons, 10Thumbor, 10media-storage, 10Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3683019 (10MoritzMuehlenhoff) >>! In T170628#3679730, @Gilles wrote: > The upgrade hasn't fixed the problem. It seems li... [13:24:37] (03CR) 10Rush: [C: 032] openstack: cleanup ceilometer files and roles [puppet] - 10https://gerrit.wikimedia.org/r/383946 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) [13:27:27] PROBLEM - puppet last run on labvirt1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:29:43] (03PS1) 10Ottomata: Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) [13:30:09] (03CR) 10jerkins-bot: [V: 04-1] Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) (owner: 10Ottomata) [13:35:07] (03PS2) 10Ottomata: Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) [13:35:32] (03CR) 10jerkins-bot: [V: 04-1] Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) (owner: 10Ottomata) [13:36:10] (03PS1) 10Jcrespo: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384048 [13:37:26] RECOVERY - puppet last run on labvirt1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:42:52] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3683052 (10fgiunchedi) [13:43:07] (03CR) 10Ottomata: [V: 032 C: 032] "Dunno what's up with Jenkins, tox works fine locally." 
[puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) (owner: 10Ottomata) [13:43:14] (03PS3) 10Ottomata: Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) [13:43:20] (03CR) 10Ottomata: [V: 032 C: 032] Add temp ssh key for halfak to copy data from university server [puppet] - 10https://gerrit.wikimedia.org/r/384045 (https://phabricator.wikimedia.org/T177521) (owner: 10Ottomata) [13:43:59] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi: Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3650139 (10fgiunchedi) [13:53:52] 10Operations, 10Research, 10Patch-For-Review, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3683081 (10Ottomata) @Halfak, your key is added, it should be available on all relevant hosts within 30 mins. Let me... 
[13:57:54] (03CR) 10Jcrespo: [C: 032] Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384048 (owner: 10Jcrespo) [13:59:25] (03Merged) 10jenkins-bot: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384048 (owner: 10Jcrespo) [13:59:35] (03CR) 10jenkins-bot: Revert "mariadb: Depool db1053 for maintenance" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384048 (owner: 10Jcrespo) [14:06:04] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1053 (duration: 00m 46s) [14:06:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:51] (03CR) 10Alexandros Kosiaris: Deployment pipeline profile (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382608 (https://phabricator.wikimedia.org/T173128) (owner: 10Thcipriani) [14:09:49] (03PS3) 10Mforns: Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) [14:10:13] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:10:33] (03PS4) 10Jcrespo: Repool db1100, depool db1056, fix db1098 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 (https://phabricator.wikimedia.org/T177772) [14:11:30] (03CR) 10Mforns: "Thanks for the review!" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:12:33] (03CR) 10Mforns: "Jenkins is rejecting though... 
Will look into that" [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:13:33] (03PS5) 10Dzahn: Phab: Allow aklapper to delete panels on dashboards [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [14:13:54] (03CR) 10jerkins-bot: [V: 04-1] Phab: Allow aklapper to delete panels on dashboards [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [14:15:45] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests, 10User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3683171 (10Dzahn) No, sorry, the nda group was never requested and is used for different things, not for controlling access to repos. [14:16:25] (03CR) 10Dzahn: [C: 031] "CI still broken 14:13:52 ERROR: unknown environment 'testenv'" [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [14:20:01] (03PS4) 10Mforns: Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) [14:20:12] (03PS2) 10Dzahn: Purge /usr/lib/ganglia/ and /etc/ganglia [puppet] - 10https://gerrit.wikimedia.org/r/383991 (https://phabricator.wikimedia.org/T177225) (owner: 10Alexandros Kosiaris) [14:20:26] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:21:52] (03CR) 10Jcrespo: [C: 032] Repool db1100, depool db1056, fix db1098 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [14:23:42] (03Merged) 10jenkins-bot: Repool db1100, depool db1056, fix db1098 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [14:25:05] (03PS5) 10Mforns: Add temporary cron job for eventlogging 
refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) [14:25:30] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:25:59] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1100, depool db1056, fix db1098 weight (duration: 00m 46s) [14:26:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:33] (03CR) 10jenkins-bot: Repool db1100, depool db1056, fix db1098 weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384035 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [14:26:37] 10Operations, 10Cloud-Services, 10Traffic: check_dns needs to be rewritten - https://phabricator.wikimedia.org/T133791#3683244 (10ema) 05Open>03Resolved a:03ema check_dns v1.5 (nagios-plugins 1.5) seems to be doing the right thing currently: ``` 14:25:09 ema@labservices1001.wikimedia.org:~ $ /usr/lib/... [14:27:41] (03CR) 10Mforns: "I fixed the single escape problem, but there are 2 "wmf-style" errors that I don't understand, though." 
[puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:28:26] PROBLEM - HHVM rendering on mw2202 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:29:16] RECOVERY - HHVM rendering on mw2202 is OK: HTTP OK: HTTP/1.1 200 OK - 77698 bytes in 0.314 second response time [14:29:50] (03CR) 10Jcrespo: "It is explained above:" [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:30:28] (03CR) 10Dzahn: [C: 032] Purge /usr/lib/ganglia/ and /etc/ganglia [puppet] - 10https://gerrit.wikimedia.org/r/383991 (https://phabricator.wikimedia.org/T177225) (owner: 10Alexandros Kosiaris) [14:32:23] (03CR) 10Jcrespo: "For context: https://wikitech.wikimedia.org/wiki/Puppet_coding#Organization" [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [14:38:11] !log stopping slave on db1056 and optimize recentchanges [14:38:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:31] (03PS1) 10Eevans: Use instance `ID=default` when no ID is supplied [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) [14:45:20] (03PS1) 10Jcrespo: mariadb: Repool db1056 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384057 (https://phabricator.wikimedia.org/T177772) [14:48:01] (03PS1) 10Halfak: Removes outdated ssh keys for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384058 (https://phabricator.wikimedia.org/T177521) [14:48:19] 10Operations, 10Research, 10Patch-For-Review, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3683281 (10Halfak) I don't need the following keys anymore: * halfak@halfak@tako-umh * halfak@carbon See https://ger... 
[14:48:23] (03CR) 10jerkins-bot: [V: 04-1] Removes outdated ssh keys for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384058 (https://phabricator.wikimedia.org/T177521) (owner: 10Halfak) [14:53:23] (03CR) 10Ppchelko: [C: 031] Use instance `ID=default` when no ID is supplied [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) (owner: 10Eevans) [14:55:43] (03PS11) 10Ottomata: role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [14:55:55] (03CR) 10Ottomata: [V: 032 C: 032] role::aqs: add druid options to AQS config and move aqs to profile [puppet] - 10https://gerrit.wikimedia.org/r/379730 (owner: 10Milimetric) [14:56:06] 10Operations, 10Research, 10Patch-For-Review, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3683290 (10Halfak) I should note that the xfer has started and I expect it to finish by EOD for @ottomata. [14:58:57] (03PS2) 10Ottomata: Removes outdated ssh keys for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384058 (https://phabricator.wikimedia.org/T177521) (owner: 10Halfak) [14:59:00] (03CR) 10Filippo Giunchedi: Synchronise jenkins package to thirdparty/ci (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/384039 (https://phabricator.wikimedia.org/T158583) (owner: 10Muehlenhoff) [14:59:14] (03CR) 10Ottomata: [V: 032 C: 032] Removes outdated ssh keys for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384058 (https://phabricator.wikimedia.org/T177521) (owner: 10Halfak) [14:59:53] !log added cparle to wmf LDAP group [14:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:41] (03CR) 10Ottomata: "Yeha, we'll ignore for this patch. We need to do a bigger refactor to conform with newer puppet coding guidelines." 
[puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [15:03:18] (03PS6) 10Ottomata: Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [15:03:50] (03CR) 10jerkins-bot: [V: 04-1] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [15:04:04] (03CR) 10Filippo Giunchedi: [C: 031] "Using set -u would be good practice/hygene" [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) (owner: 10Eevans) [15:04:28] (03CR) 10Ottomata: [V: 032 C: 032] Add temporary cron job for eventlogging refine test [puppet] - 10https://gerrit.wikimedia.org/r/384028 (https://phabricator.wikimedia.org/T177783) (owner: 10Mforns) [15:06:19] (03PS1) 10Giuseppe Lavagetto: tox: reorganize tests [puppet] - 10https://gerrit.wikimedia.org/r/384063 (https://phabricator.wikimedia.org/T176671) [15:06:31] <_joe_> mutante: ^^ [15:06:50] <_joe_> mutante: rebase your change on top of this and it shouldn't fail [15:06:53] 10Operations, 10Traffic: Renew unified certificates 2017 - https://phabricator.wikimedia.org/T178173#3683312 (10BBlack) [15:07:31] _joe_: ok:) thanks [15:07:55] (03CR) 10Zoranzoki21: "All in wmf-config folder is renamed with SynWrite program" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [15:08:00] (03PS2) 10Zoranzoki21: Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) [15:08:34] (03PS2) 10Giuseppe Lavagetto: tox: reorganize tests [puppet] - 10https://gerrit.wikimedia.org/r/384063 (https://phabricator.wikimedia.org/T176671) [15:10:03] (03CR) 10Giuseppe Lavagetto: [C: 032] tox: 
reorganize tests [puppet] - 10https://gerrit.wikimedia.org/r/384063 (https://phabricator.wikimedia.org/T176671) (owner: 10Giuseppe Lavagetto) [15:10:42] (03PS1) 10Dzahn: admins: add Cormac Parle to LDAP-only (wmf) [puppet] - 10https://gerrit.wikimedia.org/r/384064 [15:14:22] (03CR) 10jerkins-bot: [V: 04-1] Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [15:16:31] (03CR) 10Mobrovac: Use instance `ID=default` when no ID is supplied (031 comment) [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) (owner: 10Eevans) [15:17:26] 10Operations, 10Release-Engineering-Team, 10vm-requests, 10Security-General: New ganeti VM for MW release pipeline work - https://phabricator.wikimedia.org/T163743#3683345 (10demon) [15:17:30] 10Operations, 10RelEng-Archive-FY201718-Q1, 10Patch-For-Review, 10Release-Engineering-Team (Kanban), 10Security-General: setup releases1001.eqiad.wmnet (was: setup mwreleases1001) - https://phabricator.wikimedia.org/T164030#3683343 (10demon) 05Open>03Resolved The machine itself is up and running with... 
[15:20:56] (03PS2) 10Dzahn: admins: add Cormac Parle to LDAP-only (wmf) [puppet] - 10https://gerrit.wikimedia.org/r/384064 [15:21:22] (03CR) 10Dzahn: [C: 032] "no linked ticket because there is no ticket-based workflow for onboardings" [puppet] - 10https://gerrit.wikimedia.org/r/384064 (owner: 10Dzahn) [15:23:28] (03CR) 10Dzahn: [C: 031] "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [15:25:15] (03CR) 10Jcrespo: [C: 032] mariadb: Repool db1056 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384057 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [15:26:36] (03Merged) 10jenkins-bot: mariadb: Repool db1056 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384057 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [15:26:38] (03CR) 10Zoranzoki21: [C: 031] Sort dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [15:26:45] (03CR) 10jenkins-bot: mariadb: Repool db1056 after maintenance [mediawiki-config] - 10https://gerrit.wikimedia.org/r/384057 (https://phabricator.wikimedia.org/T177772) (owner: 10Jcrespo) [15:29:48] (03CR) 10Jcrespo: "BTW, I think a test has to be changed that depended on flow list order." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383999 (owner: 10Hoo man) [15:29:59] (03CR) 10Zoranzoki21: "> Check also multiversion." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [15:30:14] (03CR) 10Zoranzoki21: "You think on folder multiversion?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [15:31:53] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Repool db1056 (duration: 00m 46s) [15:31:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:32:04] (03CR) 10Eevans: [C: 04-1] Use instance `ID=default` when no ID is supplied (031 comment) [debs/cassandra-tools-wmf] - 10https://gerrit.wikimedia.org/r/384055 (https://phabricator.wikimedia.org/T178169) (owner: 10Eevans) [15:33:06] (03CR) 10Dzahn: [C: 032] Phab: Allow aklapper to delete panels on dashboards [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [15:33:15] (03PS6) 10Dzahn: Phab: Allow aklapper to delete panels on dashboards [puppet] - 10https://gerrit.wikimedia.org/r/380959 (owner: 10Aklapper) [15:35:55] (03PS3) 10Zoranzoki21: Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) [15:36:03] (03PS4) 10Zoranzoki21: Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) [15:37:14] (03CR) 10jerkins-bot: [V: 04-1] Rename $wmf* to $wmg* in wmf-config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [16:01:47] OMG it's Friday the 13th. Be careful with your prod systems today. 
;) [16:04:52] (03PS1) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [16:05:18] (03PS2) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [16:09:00] (03PS1) 10Dzahn: authdns: purge ganglia files [puppet] - 10https://gerrit.wikimedia.org/r/384070 (https://phabricator.wikimedia.org/T177225) [16:12:38] (03PS2) 10Andrew Bogott: compiler-update-facts: restore optional use of PUPPET_MASTER env [puppet] - 10https://gerrit.wikimedia.org/r/383857 (https://phabricator.wikimedia.org/T97081) [16:13:12] (03CR) 10Andrew Bogott: [C: 04-2] DO NOT MERGE: no-op patch for testing [puppet] - 10https://gerrit.wikimedia.org/r/383942 (owner: 10Andrew Bogott) [16:15:26] (03CR) 10Dzahn: [C: 032] authdns: purge ganglia files [puppet] - 10https://gerrit.wikimedia.org/r/384070 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [16:18:52] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3683479 (10Dzahn) >>! In T177225#3682216, @akosiaris wrote: > So these hosts now having only the "basic" ganglia stuff. Should we switch has_ganglia to `no` for them (or preferably the... [16:22:29] (03PS3) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [16:23:20] (03PS4) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [16:24:20] Hi [16:24:40] To add this patch: https://gerrit.wikimedia.org/r/#/c/377864/ on deployments table on wikitech wiki, or you can deploy without it? 
[16:25:24] Zoranzoki21: normally no deployments happen on fridays [16:25:38] Ok, I will add on deployments table then [16:26:00] It should be in a swat deploy, but not until monday [16:26:26] OK. I will add in swat [16:29:05] Added in deployments table: https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=1772774&oldid=1772763 [16:32:27] Isarra: are you aware of that patch? ---^ [16:32:50] (03CR) 10Dzahn: [C: 031] "pdns metrics have been migrated to diamond collector in https://phabricator.wikimedia.org/T169600 and the dashboard is at https://grafana" [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [16:33:14] (03PS2) 10Dzahn: dnsrecursor: drop ganglia metrics support [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) [16:33:50] bawolff: Wat? [16:34:11] Actually just assume I'm not aware of anything of late. [16:34:12] 10Operations, 10Continuous-Integration-Infrastructure (shipyard), 10User-Joe: Unify production and CI docker image build process - https://phabricator.wikimedia.org/T177276#3683551 (10thcipriani) >>! In T177276#3682327, @Joe wrote: > * Tagging images with date instead of a semver versioning. I am dibated abo... [16:34:26] Zoranzoki21 added a patch to swat to enable timeless in a bunch of places [16:34:32] Oh. Looks like it needs to be redone. [16:34:45] All those places had consensus. [16:36:08] (03PS3) 10Dzahn: dnsrecursor: drop ganglia metrics support [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) [16:37:14] (03CR) 10Dzahn: [C: 032] dnsrecursor: drop ganglia metrics support [puppet] - 10https://gerrit.wikimedia.org/r/382929 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [16:37:30] bawolff: How do you time and do these things properly? [16:39:45] I'm not sure what you mean by time? 
[16:40:12] Basically you just make a patch, sign up in a swat window and be in this channel during the window [16:40:13] Uh... neither am I. Nevermind. [16:40:35] I guess the whole swat window thing has me a bit confused. [16:40:53] Don't specific people use specific windows? Doesn't it need to go through them? [16:41:57] SWAT is set aside for randoms who need patches deployed but don't have a specific window [16:42:29] But only specific randoms have permissions. [16:42:52] Oh, its set aside for randoms who don't have permission. Someone with permission will do it for you [16:43:01] Ah. [16:43:20] So the issue with zoranzoki2's patch was that it was never added to a window? [16:43:33] Basically, you just add to list, someone puts it on mwdebug server, they ask you to test, if you say it works, then they put it on whole server [16:43:56] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:44:08] Probably, mediawiki config patches usually don't end up getting deployed if they are not signed up for a window or someone whose a deployer really cares about it [16:44:46] PROBLEM - puppet last run on db1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:16] PROBLEM - puppet last run on mc1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:16] PROBLEM - puppet last run on wtp1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:36] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:37] PROBLEM - puppet last run on mw1281 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:46] PROBLEM - puppet last run on ms-be1017 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [16:45:46] PROBLEM - puppet last run on db1071 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:45:46] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:06] bawolff: Okay, I can try to do something about that... later? Maybe? I'm way too ill now. [16:46:06] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:07] PROBLEM - puppet last run on ms-be1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:16] PROBLEM - puppet last run on poolcounter1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:16] PROBLEM - puppet last run on mc1031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:36] PROBLEM - puppet last run on conf1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:36] PROBLEM - puppet last run on analytics1034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:46] PROBLEM - puppet last run on einsteinium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:47] Isarra: In any case, Zoranzoki21 seems to have signed it up under his name, so you probably don't have to do anything about it [16:46:54] I just thought you'd want to know, because timeless [16:47:39] Oh, good. [16:47:52] So it is done? 
[16:48:59] !log otto@tin Started deploy [analytics/refinery@28db253]: (no justification provided) [16:49:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:09] 10Operations, 10monitoring, 10Patch-For-Review: Uninstall ganglia from the fleet - https://phabricator.wikimedia.org/T177225#3683600 (10Dzahn) ^ I amended the one for dnsrecursors to include the "ganglia: false" Hiera setting right away and then merged that too. Metrics were converted to Diamond collector i... [16:50:15] bawolff: So everything is fine, right? [16:50:27] yep [16:50:39] or at least, it will be done on monday [16:51:00] Oh, good. [16:51:17] RECOVERY - puppet last run on mc1031 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [16:51:17] Sorry, I'm a tad slow right now. [16:51:29] bawolff: Thanks. >.< [16:54:49] !log otto@tin Finished deploy [analytics/refinery@28db253]: (no justification provided) (duration: 05m 50s) [16:54:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:08:21] (03CR) 10Volans: compiler-update-facts: restore optional use of PUPPET_MASTER env (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383857 (https://phabricator.wikimedia.org/T97081) (owner: 10Andrew Bogott) [17:10:36] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:10:46] RECOVERY - puppet last run on ms-be1017 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [17:10:47] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [17:11:07] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [17:11:07] RECOVERY - puppet last run on ms-be1034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:11:17] RECOVERY - puppet last run on poolcounter1001 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [17:11:36] RECOVERY - puppet last run on conf1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:11:36] RECOVERY - puppet last run on analytics1034 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [17:11:46] RECOVERY - puppet last run on einsteinium is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:13:56] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:14:46] RECOVERY - puppet last run on db1054 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:15:16] RECOVERY - puppet last run on mc1035 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:15:17] RECOVERY - puppet last run on wtp1032 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [17:15:37] RECOVERY - puppet last run on mw1281 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:15:46] RECOVERY - puppet last run on db1071 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [17:17:42] (03PS5) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [17:30:38] (03CR) 10Jdlrobson: Temporarily prevent users from accessing Special:RenderBook/test (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377929 (https://phabricator.wikimedia.org/T175868) (owner: 10Gergő Tisza) [17:32:40] (03PS6) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [17:33:08] (03CR) 10jerkins-bot: [V: 04-1] openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) (owner: 10Rush) 
[17:35:42] (03PS7) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [17:37:12] (03PS1) 10Giuseppe Lavagetto: [WiP] Port docker builder [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/384081 (https://phabricator.wikimedia.org/T177276) [17:41:08] (03PS8) 10Rush: openstack: horizon to module/profile/role [puppet] - 10https://gerrit.wikimedia.org/r/384069 (https://phabricator.wikimedia.org/T171494) [17:46:10] (03CR) 10EBernhardson: [C: 031] Add additional namespaces to search results for bnwikisource [mediawiki-config] - 10https://gerrit.wikimedia.org/r/383794 (https://phabricator.wikimedia.org/T178041) (owner: 10DCausse) [17:54:02] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#3683792 (10Dzahn) @fgiunchedi I was wondering if there is something here that replaces the PacketLossLogtailer from udp2log (https://ger... [17:54:57] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 35, down: 1, dormant: 0, excluded: 1, unused: 0 [17:56:46] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: PING CRITICAL - Packet loss = 100% [17:58:06] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 37, down: 0, dormant: 0, excluded: 1, unused: 0 [18:01:56] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 44.40 ms [18:04:05] (03Abandoned) 10Dzahn: ocg: remove all ganglia support [puppet] - 10https://gerrit.wikimedia.org/r/382907 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [18:05:04] 10Operations, 10Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3683836 (10thcipriani) 05Open>03Resolved a:03thcipriani scap 3.7.1-1 is now live. 
`php5-cli | hhvm | php-cli` is now part of `Suggests` ```[thcipriani@tin ~]$ apt-cache show scap Pa... [18:13:33] 10Operations, 10Research, 10Patch-For-Review, 10Research-2017-18-Q2: Permissions to upload data to the analytics cluster from a machine at Drexel - https://phabricator.wikimedia.org/T177521#3683871 (10Halfak) xfer finished. Keys deleted from University VM. Will submit a patchset shortly to delete the key... [18:15:11] (03PS1) 10Halfak: Removes temp ssh key for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384084 (https://phabricator.wikimedia.org/T177521) [18:16:43] (03CR) 10Ottomata: [C: 032] Removes temp ssh key for halfak. [puppet] - 10https://gerrit.wikimedia.org/r/384084 (https://phabricator.wikimedia.org/T177521) (owner: 10Halfak) [18:21:38] (03Draft1) 10Paladox: Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily [puppet] - 10https://gerrit.wikimedia.org/r/384085 (https://phabricator.wikimedia.org/T178189) [18:21:43] (03PS2) 10Paladox: Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily [puppet] - 10https://gerrit.wikimedia.org/r/384085 (https://phabricator.wikimedia.org/T178189) [18:26:36] PROBLEM - pdfrender on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 5252: Connection refused [18:26:56] 10Operations, 10Availability (Multiple-active-datacenters), 10MediaWiki-Platform-Team (MWPT-Q1-Jul-Sep-2017), 10Patch-For-Review, and 5 others: Allow integration of data from etcd into the MediaWiki configuration - https://phabricator.wikimedia.org/T156924#3683938 (10CCicalese_WMF) [18:37:45] !log joal@tin Started deploy [analytics/aqs/deploy@0375c34]: Deploy new mediawiki-history endpoints-alpha version [18:37:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:40:34] !log joal@tin (no justification provided) [18:40:37] PROBLEM - AQS root url on aqs1004 is CRITICAL: connect to address 10.64.0.107 and port 7232: Connection refused [18:40:40] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:46:16] (03CR) 10Bmansurov: [C: 031] Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily [puppet] - 10https://gerrit.wikimedia.org/r/384085 (https://phabricator.wikimedia.org/T178189) (owner: 10Paladox) [18:52:47] !log joal@tin Finished deploy [analytics/aqs/deploy@0375c34]: Deploy new mediawiki-history endpoints-alpha version (duration: 15m 02s) [18:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:46] RECOVERY - AQS root url on aqs1004 is OK: HTTP OK: HTTP/1.1 200 - 727 bytes in 0.022 second response time [19:03:20] (03CR) 10Chelsyx: [C: 031] "Looks good! Thanks Mikhail!" [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [19:10:30] (03CR) 10Herron: [C: 032] add mapped IPv6 address for puppetcompiler1001 [puppet] - 10https://gerrit.wikimedia.org/r/383892 (https://phabricator.wikimedia.org/T177843) (owner: 10Dzahn) [19:10:45] (03PS2) 10Herron: add mapped IPv6 address for puppetcompiler1001 [puppet] - 10https://gerrit.wikimedia.org/r/383892 (https://phabricator.wikimedia.org/T177843) (owner: 10Dzahn) [19:13:37] (03CR) 10Dereckson: "Next step: you also have some tests folder in the repository using those variables." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/383944 (https://phabricator.wikimedia.org/T45956) (owner: 10Zoranzoki21) [19:13:41] (03PS1) 10Nuria: Removing appInstallId from whitelist [puppet] - 10https://gerrit.wikimedia.org/r/384093 (https://phabricator.wikimedia.org/T178174) [19:13:58] (03CR) 10jerkins-bot: [V: 04-1] Removing appInstallId from whitelist [puppet] - 10https://gerrit.wikimedia.org/r/384093 (https://phabricator.wikimedia.org/T178174) (owner: 10Nuria) [19:14:16] (03PS2) 10Nuria: Removing appInstallId from whitelist [puppet] - 10https://gerrit.wikimedia.org/r/384093 (https://phabricator.wikimedia.org/T178174) [19:14:40] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3684051 (10Jgreen) p:05Triage>03Normal [19:14:59] 10Operations, 10ops-codfw, 10fundraising-tech-ops, 10netops: connect second ethernet interface for fundraising codfw hosts - https://phabricator.wikimedia.org/T176175#3684053 (10Jgreen) p:05Triage>03Normal [19:17:44] (03CR) 10Herron: [C: 032] add IPv6 records for puppetcompiler1001 [dns] - 10https://gerrit.wikimedia.org/r/383890 (https://phabricator.wikimedia.org/T177843) (owner: 10Dzahn) [19:17:46] (03PS3) 10Herron: add IPv6 records for puppetcompiler1001 [dns] - 10https://gerrit.wikimedia.org/r/383890 (https://phabricator.wikimedia.org/T177843) (owner: 10Dzahn) [19:18:45] (03PS7) 10Thcipriani: Deployment pipeline profile [puppet] - 10https://gerrit.wikimedia.org/r/382608 (https://phabricator.wikimedia.org/T173128) [19:43:01] 10Operations, 10Commons, 10Thumbor, 10media-storage, 10Performance-Team (Radar): Jessie rsvg/cairo can't render specific SVG file on Commons - https://phabricator.wikimedia.org/T170628#3684122 (10Gilles) Probably something in the Cairo libraries? That's where you located the issue last time. 
[20:00:26] (03CR) 10Thcipriani: Deployment pipeline profile (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/382608 (https://phabricator.wikimedia.org/T173128) (owner: 10Thcipriani) [20:02:45] (03CR) 10Gergő Tisza: Temporarily prevent users from accessing Special:RenderBook/test (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/377929 (https://phabricator.wikimedia.org/T175868) (owner: 10Gergő Tisza) [20:15:56] (03CR) 10Legoktm: "Did a quick skim :)" (033 comments) [docker-images/docker-pkg] - 10https://gerrit.wikimedia.org/r/384081 (https://phabricator.wikimedia.org/T177276) (owner: 10Giuseppe Lavagetto) [20:19:22] 10Operations, 10Epic, 10MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), 10Performance-Team (Radar), 10Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support, Q2 goals - https://phabricator.wikimedia.org/T175213#3684220 (10CCicalese_WMF) [20:19:37] 10Operations, 10Epic, 10MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), 10Performance-Team (Radar), 10Services (watching): 2017/18 Annual Plan Program 8: Multi-datacenter support - https://phabricator.wikimedia.org/T175206#3684222 (10CCicalese_WMF) [20:28:48] (03PS1) 10Smalyshev: Temporarily silence noisy warnings for dictionary upgrade [puppet] - 10https://gerrit.wikimedia.org/r/384114 (https://phabricator.wikimedia.org/T175948) [20:36:00] 10Operations, 10netops, 10Patch-For-Review: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3346480 (10Krinkle) It's not often that one of our primary cache PoPs ends up depooled for multiple hours. While obviously unintended, this was an interesting opportunity to measure the differen... 
[20:36:06] !log gerrit2001 - removing hhvm package (which is now possible without also removing scap, thanks thcipriani), apt-get autoremove to remove more unused packages (T178039) [20:36:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:36:14] T178039: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039 [20:36:14] 10Operations, 10Performance-Team, 10netops, 10Patch-For-Review, 10Performance-Team-notice: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3684258 (10Krinkle) [20:36:27] 10Operations, 10netops, 10Patch-For-Review, 10Performance-Team (Radar), 10Performance-Team-notice: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3346480 (10Krinkle) [20:38:13] 10Operations, 10Puppet, 10Performance-Team (Radar), 10User-Joe: Logic problem in puppet.git tests - https://phabricator.wikimedia.org/T176671#3684272 (10Krinkle) 05Open>03Resolved p:05Triage>03Normal a:03Joe [20:42:26] 10Operations, 10Scap: scap should not pull in HHVM on stretch hosts using PHP7 - https://phabricator.wikimedia.org/T178039#3684307 (10Krinkle) [20:58:32] 10Operations, 10LDAP-Access-Requests, 10WMF-NDA-Requests, 10User-Addshore: Request to be added to the ldap/wmde group - https://phabricator.wikimedia.org/T177599#3684322 (10Dzahn) >>! In T177599#3674032, @Addshore wrote: > I don't see the wmde LDAP group in the admins puppet module. You are right! I was g... 
[21:01:21] (03PS2) 10BBlack: Add inpkts_sent metric [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382864 [21:01:23] (03PS2) 10BBlack: Remove multi-head support from strq [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382865 [21:01:25] (03PS2) 10BBlack: Move the strq object completely inside the purger [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382866 [21:01:27] (03PS2) 10BBlack: Move all URL parsing and HTTP req generation to receiver [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382867 [21:01:29] (03PS2) 10BBlack: Chain the purgers together. [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382868 [21:01:31] (03PS1) 10BBlack: reduce CONN_WAIT_MAX to 8s [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/384164 [21:01:33] (03PS1) 10BBlack: use def loop w/ select() [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/384165 [21:01:35] (03PS2) 10BBlack: split per-purger stats [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382869 [21:01:37] (03PS2) 10BBlack: Bump http-parser upstream src to 2.7.1 [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382870 [21:01:39] (03PS2) 10BBlack: http-parser usage updates [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382871 [21:01:41] (03PS2) 10BBlack: bump copyright years [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382872 [21:01:43] (03PS2) 10BBlack: Release 0.0.12 [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/382873 [21:01:45] (03PS1) 10BBlack: Refactor/improve purging code [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/384167 [21:01:47] (03PS1) 10BBlack: purger: optimize write watcher churn [software/varnish/vhtcpd] - 10https://gerrit.wikimedia.org/r/384168 [21:01:53] @seen bmansurov [21:01:53] mutante: Last time I saw bmansurov they were quitting the network with reason: no reason was given N/A at 10/13/2017 8:18:50 PM (43m3s ago) [21:09:02] 10Operations, 
10Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#3684344 (10Dzahn) p:05High>03Normal [22:07:59] (03PS1) 10Dzahn: base/icinga: if on labs, don't page for mysql procs [puppet] - 10https://gerrit.wikimedia.org/r/384183 (https://phabricator.wikimedia.org/T178008) [22:15:26] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3684538 (10Dzahn) The second patch should be a nicer solution, it avoids having a regex... [22:16:09] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3684539 (10Dzahn) There are other existing checks though that would page and don't alre... [22:27:31] (03PS2) 10Dzahn: base/icinga: if on labs, don't page for mysql procs [puppet] - 10https://gerrit.wikimedia.org/r/384183 (https://phabricator.wikimedia.org/T178008) [22:29:24] (03CR) 10Dzahn: [C: 04-1] "https://gerrit.wikimedia.org/r/#/c/384183/ should be the better alternative" [puppet] - 10https://gerrit.wikimedia.org/r/383713 (https://phabricator.wikimedia.org/T178008) (owner: 10Dzahn) [22:30:57] (03PS3) 10Dzahn: base/icinga: if on labs, don't page for mysql procs [puppet] - 10https://gerrit.wikimedia.org/r/384183 (https://phabricator.wikimedia.org/T178008) [22:41:14] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677751 (10Dzahn) [22:55:41] 10Operations, 10Discovery, 10Traffic, 10WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875#3684693 (10Dzahn) 
[23:00:08] (03CR) 10Dzahn: "I see that " osm sync lag from /srv/osmosis/state.txt" is checked as done. But the postgres part is as well?" [puppet] - 10https://gerrit.wikimedia.org/r/382905 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [23:03:32] 10Operations, 10Discovery, 10Traffic, 10WMDE-Analytics-Engineering, and 3 others: Allow access to wdqs.svc.eqiad.wmnet on port 8888 - https://phabricator.wikimedia.org/T176875#3684701 (10Smalyshev) I wonder if it may be more beneficial to use codfw ones for longer tasks, since they are getting less routine... [23:04:16] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 276 bytes in 0.028 second response time [23:04:35] !log scb1001 - systemctl restart pdfrender (Icinga said: pdfrender Connection refused) [23:04:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:57] ACKNOWLEDGEMENT - Host db2081.mgmt is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T178140 [23:06:39] (03CR) 10Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [23:07:41] (03PS3) 10Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) [23:08:07] (03CR) 10jerkins-bot: [V: 04-1] Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [23:10:25] (03PS4) 10Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) [23:10:55] (03CR) 10jerkins-bot: [V: 04-1] Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) 
[23:12:18] (03PS5) 10Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) [23:12:33] (03CR) 10Dzahn: Add profiles/roles for stats/ML on Wikimedia Cloud (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [23:13:40] bearloga: require -> Service['shiny-server'], should probably work [23:13:47] require => [23:14:08] and then there is "onlyif => 'any shell command'," [23:15:19] mutante: I might go for that second one because I want the stuff to work on machines without shiny server but then *if* there's shiny server, then there should be a restart of that service when the R package is uninstalled [23:15:21] mutante: thanks! [23:16:16] welcome, there is "onlyif" and apparently also "unless" [23:16:16] mutante: wait, I want the command to still be executable [23:17:30] if I'm reading the doc right, onlyif would block the exec if there's no shiny server [23:17:47] bearloga: "if Service['shiny-server'] ... exec [23:17:48] ? [23:17:55] so if the resource exists.. then [23:18:09] and then put only the reload inside that [23:18:12] OHHHHHH [23:18:18] and the part you always want outside of it [23:18:27] sorry, it has been A Day [23:18:42] no worries at all, i also didn't think it through first [23:26:38] mutante: so…is this a valid way to do it? https://www.irccloud.com/pastebin/Lz50DD7s/conditional-notification.pp [23:29:28] mutante: thank you again for your help with this! 
I'm still very much a puppet newbie and know just enough to be dangerous [23:36:16] (03CR) 10Bearloga: Add profiles/roles for stats/ML on Wikimedia Cloud (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/383916 (https://phabricator.wikimedia.org/T178096) (owner: 10Bearloga) [23:38:18] (03CR) 10Chad: [C: 04-1] "For the reasons I outlined on Id04865a1227fc787b91aa832ec501f882c61a4ed https://phabricator.wikimedia.org/T178189#3684726" [puppet] - 10https://gerrit.wikimedia.org/r/384085 (https://phabricator.wikimedia.org/T178189) (owner: 10Paladox)
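Note on the 23:13-23:18 exchange between mutante and bearloga: the idea they converge on (keep the command unconditional, and wrap only the service reload in a check for whether the shiny-server service resource exists) could look roughly like the sketch below. This is an illustration under assumptions, not the content of the linked pastebin and not the merged patch. The exec title and the script path are hypothetical, and using `defined()` is one possible reading of "if Service['shiny-server'] ...".

```puppet
# Hypothetical sketch: the exec title and the script path are made up for illustration.

# The uninstall step is declared unconditionally, so it still applies on
# hosts that do not run Shiny Server. A real manifest would likely add an
# `onlyif` or `unless` guard so the command does not run on every agent run.
exec { 'uninstall-example-r-package':
  command => '/usr/local/bin/uninstall-example-r-package',
}

# Only the notification is conditional. defined() is evaluation-order
# dependent: it returns true only if Service['shiny-server'] has already
# been declared by the time this line is evaluated.
if defined(Service['shiny-server']) {
  # The ~> arrow orders the exec first and sends a refresh event to the
  # service, which restarts it by default.
  Exec['uninstall-example-r-package'] ~> Service['shiny-server']
}
```

This differs from putting `onlyif` on the exec itself: as bearloga notes at 23:17:30, `onlyif` would block the command entirely on machines without Shiny Server, whereas here only the restart is skipped.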