[00:00:05] <jouncebot>	 twentyafterfour: Dear deployers, time to do the Phabricator update deploy. Don't look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T0000).
[00:02:27] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work), and 4 others: WDQS Updater ran into issue and stopped working - https://phabricator.wikimedia.org/T207817 (10Smalyshev)
[00:14:39] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10faidon) >>! In T207536#4689900, @GTirloni wrote: > @faidon the complete separation seems like a great goal from a se...
[00:27:48] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): wdqs updater should be better isolated from blazegraph and common workload should be shared between servers - https://phabricator.wikimedia.org/T207837 (10Smalyshev) Huh this is a big one. I've thought about it a bunch l...
[00:30:02] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10faidon) >>! In T207536#4692241, @aborrero wrote: > Please @faidon confirm I'm understanding this right. >  > If I co...
[00:35:39] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10Krenair) >>! In T207536#4693519, @faidon wrote: > I don't think the intention was to put any pressure about doing th...
[00:37:35] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10Krenair) >>! In T207536#4693535, @faidon wrote: >>>! In T207536#4692241, @aborrero wrote: >> Please @faidon confirm...
[01:01:13] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[01:07:54] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[01:33:03] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.224 second response time
[01:34:04] <icinga-wm>	 RECOVERY - Memory correctable errors -EDAC- on thumbor1004 is OK: (C)4 ge (W)2 ge 1 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=thumbor1004&var-datasource=eqiad%2520prometheus%252Fops
[01:36:24] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:43:05] <wikibugs>	 (03CR) 10GTirloni: [C: 031] Move mail_smarthost (and wikimail_smarthost) to hiera [puppet] - 10https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: 10Alex Monk)
[03:30:54] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 966.00 seconds
[03:32:01] <wikibugs>	 10Operations, 10Discovery-Search (Current work): Refactor current code base to support multiple elasticsearch instances/multiple elasticsearch clusters - https://phabricator.wikimedia.org/T207918 (10Mathew.onipe) p:05Triage>03Normal
[03:35:34] <wikibugs>	 10Operations, 10Discovery-Search (Current work): Write cookbooks to support spicerack's elasticsearch multi cluster/instance - https://phabricator.wikimedia.org/T207919 (10Mathew.onipe) p:05Triage>03Normal
[03:38:13] <wikibugs>	 10Operations, 10Discovery-Search (Current work): Test spicerack elasticsearch module on relforge or similar environment - https://phabricator.wikimedia.org/T207920 (10Mathew.onipe) p:05Triage>03Normal
[03:43:42] <wikibugs>	 10Operations, 10Discovery-Search, 10Elasticsearch: Refactor current code base to support multiple elasticsearch instances/multiple elasticsearch clusters - https://phabricator.wikimedia.org/T207918 (10Mathew.onipe)
[03:47:45] <wikibugs>	 (03PS4) 10Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[03:50:10] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe)
[03:50:51] <wikibugs>	 (03CR) 10Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support (033 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe)
[03:53:14] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 152.66 seconds
[03:58:33] <icinga-wm>	 RECOVERY - High lag on wdqs1004 is OK: (C)3600 ge (W)1200 ge 1177 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[04:04:51] <wikibugs>	 10Operations, 10Security-Team, 10Wikimedia-Site-requests, 10Patch-For-Review: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10Bawolff) [Just for context, i did small wikis, but I'll wait until talking to logstash folks before doing big wikis]
[04:07:11] <wikibugs>	 (03PS5) 10Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[04:10:22] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe)
[04:15:43] <icinga-wm>	 RECOVERY - High lag on wdqs1005 is OK: (C)3600 ge (W)1200 ge 1165 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
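The threshold triple in recovery lines such as the one above, `(C)3600 ge (W)1200 ge 1165`, can be read mechanically. The sketch below is only an illustration: it assumes that `(C)` and `(W)` are the critical and warning thresholds, that the last number is the measured value, and that `ge` means the alert fires when the value is greater than or equal to a threshold. `parse_thresholds` and `classify` are hypothetical helper names, not part of Icinga.

```python
import re
from typing import NamedTuple


class ThresholdStatus(NamedTuple):
    critical: float
    warning: float
    value: float


# Matches triples such as "(C)3600 ge (W)1200 ge 1165" anywhere in a line.
_PATTERN = re.compile(
    r"\(C\)(?P<c>[\d.]+) ge \(W\)(?P<w>[\d.]+) ge (?P<v>[\d.]+)"
)


def parse_thresholds(text: str) -> ThresholdStatus:
    """Extract (critical, warning, value) from a check output line."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"no threshold triple in {text!r}")
    return ThresholdStatus(
        float(match["c"]), float(match["w"]), float(match["v"])
    )


def classify(status: ThresholdStatus) -> str:
    """Assumed semantics: a state applies when value >= its threshold."""
    if status.value >= status.critical:
        return "CRITICAL"
    if status.value >= status.warning:
        return "WARNING"
    return "OK"


line = "RECOVERY - High lag on wdqs1005 is OK: (C)3600 ge (W)1200 ge 1165"
status = parse_thresholds(line)
print(status, classify(status))
```

Under these assumptions the wdqs1005 value of 1165 sits just below the warning threshold of 1200, which is consistent with the check reporting OK.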
[04:15:55] <wikibugs>	 (03PS6) 10Mathew.onipe: elasticsearch_cluster: multi-cluster/multi-instance support [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918)
[05:38:03] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 53, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[05:38:54] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[06:06:45] <elukey>	 !log upload druid 0.12.3-1 debs to stretch-wikimedia
[06:06:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:46] <SMalyshev>	 !log depooling wdqs1003 again, it's not catching up like the other hosts
[06:17:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:22:04] <wikibugs>	 (03PS1) 10Elukey: profile::eventlogging::analytics:files: do not delaycompress logs [puppet] - 10https://gerrit.wikimedia.org/r/469556
[06:24:07] <wikibugs>	 (03CR) 10Elukey: [C: 032] profile::eventlogging::analytics:files: do not delaycompress logs [puppet] - 10https://gerrit.wikimedia.org/r/469556 (owner: 10Elukey)
[06:28:44] <icinga-wm>	 PROBLEM - puppet last run on mw1307 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ImageMagick-6/policy.xml]
[06:29:13] <icinga-wm>	 PROBLEM - HHVM rendering on mw1222 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[06:29:33] <icinga-wm>	 PROBLEM - puppet last run on phab1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/diamond/collectors/ApacheStatusSimple/ApacheStatusSimple.py]
[06:30:13] <icinga-wm>	 RECOVERY - HHVM rendering on mw1222 is OK: HTTP OK: HTTP/1.1 200 OK - 74181 bytes in 1.680 second response time
[06:33:18] <icinga-wm>	 PROBLEM - puppet last run on authdns2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/profile.d/bash_autologout.sh]
[06:33:18] <icinga-wm>	 PROBLEM - puppet last run on ms-be1035 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/bin/swift-drive-audit]
[06:58:13] <icinga-wm>	 RECOVERY - puppet last run on authdns2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[06:58:25] <icinga-wm>	 RECOVERY - puppet last run on ms-be1035 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:59:23] <icinga-wm>	 RECOVERY - puppet last run on mw1307 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:00:13] <icinga-wm>	 RECOVERY - puppet last run on phab1002 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:11:17] <moritzm>	 !log installing requests security updates on trusty
[07:11:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:28] <vgutierrez>	 !log Uploaded certcentral 0.3 to apt.wikimedia.org (stretch) - T207737 T207478
[07:16:32] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:16:34] <stashbot>	 T207478: Avoid infinite attempts on issuing a certificate on permanent LE side errors - https://phabricator.wikimedia.org/T207478
[07:16:34] <stashbot>	 T207737: LE rejects issuing two certificates with the same CSR on a short timespan - https://phabricator.wikimedia.org/T207737
[07:29:53] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.118 second response time
[07:33:14] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:38:43] <wikibugs>	 (03PS1) 10Elukey: hive: introduce HIVE_SERVER2_HADOOP_OPTS [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469562 (https://phabricator.wikimedia.org/T184794)
[07:43:42] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990
[07:44:43] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990 (owner: 10Muehlenhoff)
[07:46:07] <wikibugs>	 (03PS3) 10Muehlenhoff: Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990
[07:51:41] <wikibugs>	 (03PS1) 10Elukey: hive: add ensure  => 'directory' to /tmp/hive-parquet-logs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469563
[07:52:07] <wikibugs>	 (03CR) 10Elukey: [V: 032 C: 032] hive: add ensure  => 'directory' to /tmp/hive-parquet-logs [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469563 (owner: 10Elukey)
[07:53:31] <wikibugs>	 (03PS1) 10Elukey: Update cdh submodule [puppet] - 10https://gerrit.wikimedia.org/r/469564
[07:54:52] <wikibugs>	 (03CR) 10Elukey: [C: 032] Update cdh submodule [puppet] - 10https://gerrit.wikimedia.org/r/469564 (owner: 10Elukey)
[07:57:59] <wikibugs>	 (03PS4) 10Muehlenhoff: Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990
[07:58:23] <wikibugs>	 (03PS2) 10Elukey: hive: introduce HIVE_SERVER2_HADOOP_OPTS [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469562 (https://phabricator.wikimedia.org/T184794)
[08:04:58] <wikibugs>	 (03CR) 10Elukey: [C: 032] hive: introduce HIVE_SERVER2_HADOOP_OPTS [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469562 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey)
[08:05:14] <wikibugs>	 10Operations, 10LDAP-Access-Requests: Remove "jk" from "wmde" ldap group - https://phabricator.wikimedia.org/T207792 (10MoritzMuehlenhoff) >>! In T207792#4691594, @jijiki wrote: > @Addshore could you please give us some context on this (e.g. they are not working for WMDE anymore)? thank you!  (As he's still li...
[08:05:53] <wikibugs>	 (03PS1) 10Elukey: Update cdh submodule [puppet] - 10https://gerrit.wikimedia.org/r/469567
[08:07:28] <wikibugs>	 (03CR) 10DCausse: "left a small suggestion" (032 comments) [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe)
[08:11:33] <wikibugs>	 (03CR) 10Elukey: [C: 032] Update cdh submodule [puppet] - 10https://gerrit.wikimedia.org/r/469567 (owner: 10Elukey)
[08:16:34] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:16:44] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:19:24] <icinga-wm>	 PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_text site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:20:33] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[08:20:57] <wikibugs>	 (03CR) 10Gehel: wdqs: increase restart interval of wdqs-updater (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) (owner: 10Gehel)
[08:21:42] <elukey>	 it seemed to be one single spike
[08:21:53] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:22:53] <icinga-wm>	 RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[08:23:03] <elukey>	 ema: --^
[08:26:53] <icinga-wm>	 PROBLEM - Router interfaces on cr2-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 53, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:26:53] <moritzm>	 probably reboots?
[08:26:54] <icinga-wm>	 PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[08:28:43] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[08:29:34] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=ulsfo&var-cache_type=All&var-status_type=5
[08:31:50] <elukey>	 some failed fetches from https://grafana.wikimedia.org/dashboard/db/varnish-failed-fetches?orgId=1&var-datasource=ulsfo%20prometheus%2Fops&var-cache_type=text&var-server=All&var-layer=backend&from=now-3h&to=now
[08:33:42] <wikibugs>	 10Operations, 10monitoring, 10Discovery-Search (Current work), 10Patch-For-Review: Create an Icinga check to alert on packet dropped - https://phabricator.wikimedia.org/T206114 (10Gehel) So this shows that we have less than 0.04% of packet loss on the elasticsearch eqiad cluster? I would expect a loss rate...
[08:36:29] <wikibugs>	 10Operations, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Trizek-WMF)
[08:37:21] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (10MoritzMuehlenhoff) >>! In T207775#4691484, @fgiunchedi wrote: >>>! In T207775#4691005, @fgiunchedi wrote: >> We enabl...
[08:39:08] <wikibugs>	 10Operations, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Mainframe98) Same error as with {T207928}.  Trainblocker?
[08:44:51] <wikibugs>	 (03PS1) 10Elukey: hive: replace HADOOP_OPTS with HIVE_SERVER2_HADOOP_OPTS for hive-server2 [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469584 (https://phabricator.wikimedia.org/T184794)
[08:45:54] <wikibugs>	 (03CR) 10Elukey: [V: 032 C: 032] hive: replace HADOOP_OPTS with HIVE_SERVER2_HADOOP_OPTS for hive-server2 [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469584 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey)
[08:46:39] <wikibugs>	 10Operations, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Mainframe98) Presumably caused by {rETRAb2586aebd94d805b82a018459b3197916a3b1992}. Cc'ing @cscott as author of the patch.
[08:49:55] <wikibugs>	 (03PS3) 10Filippo Giunchedi: [deployment-prep] fix elastic config for deployment-logstash2 [puppet] - 10https://gerrit.wikimedia.org/r/469387 (https://phabricator.wikimedia.org/T205672) (owner: 10DCausse)
[08:50:11] <wikibugs>	 10Operations, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Trizek-WMF) >>! In T207930#4693990, @Mainframe98 wrote: > Same error as with {T207928}. Looks like it.   I don't have that issue on Meta.
[08:50:50] <wikibugs>	 (03PS1) 10Elukey: role::analytics_cluster_coordinator: enable prometheus metrics for hive [puppet] - 10https://gerrit.wikimedia.org/r/469585 (https://phabricator.wikimedia.org/T184794)
[08:50:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] [deployment-prep] fix elastic config for deployment-logstash2 [puppet] - 10https://gerrit.wikimedia.org/r/469387 (https://phabricator.wikimedia.org/T205672) (owner: 10DCausse)
[08:52:22] <wikibugs>	 (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/compiler1002/13197/an-coord1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/469585 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey)
[08:52:33] <wikibugs>	 (03PS2) 10Elukey: role::analytics_cluster_coordinator: enable prometheus metrics for hive [puppet] - 10https://gerrit.wikimedia.org/r/469585 (https://phabricator.wikimedia.org/T184794)
[08:53:37] <wikibugs>	 (03CR) 10Elukey: [C: 032] role::analytics_cluster_coordinator: enable prometheus metrics for hive [puppet] - 10https://gerrit.wikimedia.org/r/469585 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey)
[08:57:22] <ema>	 elukey, moritzm: hey
[08:57:32] <ema>	 nope, I haven't started with the reboots yet this morning
[08:58:46] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10Beta-Cluster-reproducible, 10Discovery-Search (Current work), 10Patch-For-Review: Elasticsearch puppet config changes broke puppet in various instances - https://phabricator.wikimedia.org/T205672 (10fgiunchedi) Patch merged, though ferm fails because of a known...
[09:00:48] <moritzm>	 ah, ok
[09:02:40] <ema>	 I see no specific issues on the codfw backends, so perhaps a ulsfo<->codfw network blip?
[09:06:24] <icinga-wm>	 PROBLEM - Check systemd state on kafkamon1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:06:30] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit)
[09:07:35] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit)
[09:08:33] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Wikimedia-production-error: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit)
[09:08:42] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Wikimedia-production-error: Move a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit)
[09:08:50] <wikibugs>	 (03CR) 10DCausse: elasticsearch_cluster: multi-cluster/multi-instance support (031 comment) [software/spicerack] - 10https://gerrit.wikimedia.org/r/468558 (https://phabricator.wikimedia.org/T207918) (owner: 10Mathew.onipe)
[09:09:59] <ema>	 at any rate, it seems that was just a temporary glitch, safe to resume the reboots
[09:10:09] <ema>	 !log resume cache hosts rolling reboots for kernel/microcode updates T203011
[09:10:11] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:15] <logmsgbot>	 !log elukey@deploy1001 Started deploy [analytics/turnilo/deploy@84bf1ad]: Upgrade to 1.8.1
[09:15:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:25] <logmsgbot>	 !log elukey@deploy1001 Finished deploy [analytics/turnilo/deploy@84bf1ad]: Upgrade to 1.8.1 (duration: 00m 10s)
[09:15:27] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:29:10] <godog>	 dcausse: looks like elasticsearch on deployment-logstash2 is back up!
[09:29:27] <dcausse>	 godog: \o/
[09:29:30] <godog>	 for some reason though new logstash indices are not being created
[09:29:39] <dcausse>	 :/
[09:29:43] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:30:08] <dcausse>	 godog: might be a data.path mixup, I'll take a look
[09:30:09] <godog>	 I have to go shortly, will take a look too later, in case someone wants to look now
[09:30:13] <godog>	 thanks dcausse !
[09:30:15] <dcausse>	 sure
[09:31:51] <godog>	 dcausse: I've manually turned on --debug in the logstash systemd unit and manually fixed the ferm rules due to https://phabricator.wikimedia.org/T205672#4694026, but made no other changes besides what puppet did
[09:32:11] <dcausse>	 ok
[09:32:13] <icinga-wm>	 PROBLEM - Check systemd state on ms-be2042 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[09:34:04] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[09:34:43] <icinga-wm>	 RECOVERY - Router interfaces on cr2-codfw is OK: OK: host 208.80.153.193, interfaces up: 124, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:34:55] <wikibugs>	 (03PS1) 10Vgutierrez: certcentral: Implement slow retries on challenge rejection by ACME dir. [software/certcentral] - 10https://gerrit.wikimedia.org/r/469590 (https://phabricator.wikimedia.org/T207927)
[09:35:34] <icinga-wm>	 RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 55, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[09:36:14] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10aborrero) >>! In T207536#4693554, @Krenair wrote: >>>! In T207536#4693535, @faidon wrote: >>  >> I'm a little confus...
[09:37:37] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Wikimedia-production-error: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10MGChecker)
[09:41:27] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Wikimedia-production-error: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10MGChecker) Reported at [[ https://www.mediawiki.org/wiki/Topi...
[09:43:04] <dcausse>	 godog: I see that it still receives very old events
[09:43:08] <dcausse>	 output received {"event"=>{"severity"=>6, "level"=>"INFO", "timestamp8601"=>"2018-10-22T19:01:31.456878+00:00"
[09:43:20] <wikibugs>	 10Operations, 10LDAP-Access-Requests, 10SRE-Access-Requests, 10Release-Engineering-Team (Kanban): Add Lars Wirzenius to releng LDAP groups - https://phabricator.wikimedia.org/T207833 (10LarsWirzenius) @hashar @jijiki Thanks! I confirm that I can see logstash and grafana now.
[09:44:03] <dcausse>	 I have no clue how it can remember such old events, are they queued now (kafka or something else)?
[09:46:15] <dcausse>	 oh yes I see "closing connection org.apache.kafka.common.network.Selector", it must be catching up on its backlog, new indices should come up at some point I suppose
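How old the replayed event quoted above is can be estimated with a short sketch. It assumes the channel timestamps are UTC and that the log date is 2018-10-25 (taken from the deploycal links in this log). `event_lag` is a hypothetical helper name, not part of logstash or kafka.

```python
from datetime import datetime, timedelta, timezone


def event_lag(event_ts: str, seen_at: datetime) -> timedelta:
    """Return how far behind `seen_at` an ISO 8601 event timestamp is."""
    return seen_at - datetime.fromisoformat(event_ts)


# Timestamp copied from the logstash debug output quoted in the channel.
event_ts = "2018-10-22T19:01:31.456878+00:00"
# Assumption: the output was pasted at 09:43:08 UTC on 2018-10-25.
seen_at = datetime(2018, 10, 25, 9, 43, 8, tzinfo=timezone.utc)

lag = event_lag(event_ts, seen_at)
print(f"event is {lag.days} days and {lag.seconds // 3600} hours old")
```

A lag of more than two days fits the guess that the consumer is working through a queued backlog rather than receiving live events.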
[09:49:57] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove Pybal Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469593 (https://phabricator.wikimedia.org/T183454)
[09:49:59] <wikibugs>	 (03PS1) 10Muehlenhoff: Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454)
[09:51:43] <gehel>	 !log resetting deployment directory on wdqs1003
[09:51:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:11:41] <elukey>	 !log upgrade druid100[1-3] to druid 0.12.3
[10:11:43] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:22:38] <wikibugs>	 (03CR) 10Alex Monk: [C: 032] certcentral: Implement slow retries on challenge rejection by ACME dir. [software/certcentral] - 10https://gerrit.wikimedia.org/r/469590 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez)
[10:25:43] <wikibugs>	 (03Merged) 10jenkins-bot: certcentral: Implement slow retries on challenge rejection by ACME dir. [software/certcentral] - 10https://gerrit.wikimedia.org/r/469590 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez)
[10:27:33] <wikibugs>	 (03CR) 10jenkins-bot: certcentral: Implement slow retries on challenge rejection by ACME dir. [software/certcentral] - 10https://gerrit.wikimedia.org/r/469590 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez)
[10:30:03] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.326 second response time
[10:33:24] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:38:34] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[10:39:08] <elukey>	 mmmm the mw exceptions graph has looked horrible since 7:30 AM
[10:39:10] <elukey>	 is it known?
[10:39:22] <Bsadowski1>	 -/sleep
[10:39:26] <Bsadowski1>	 er :O
[10:39:41] <Bsadowski1>	 Sorry, I have a command to do /away everywhere :P
[10:39:47] <Bsadowski1>	 Hmm, night.
[10:47:19] <aharoni>	 Hallo.
[10:47:33] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[10:47:34] <aharoni>	 I'm here for the SWAT that's happening soon.
[10:47:57] <elukey>	 next
[10:48:07] <elukey>	 logmsgbot: next
[10:48:17] <elukey>	 mmm this morning is difficult
[10:49:05] <elukey>	 jouncebot: next
[10:49:05] <jouncebot>	 In 0 hour(s) and 10 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1100)
[10:49:10] <elukey>	 oh there you go
[10:49:22] <elukey>	 :)
[10:50:24] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.482 second response time
[10:53:53] <icinga-wm>	 PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:56:48] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production servers (mwlog*, mmaint* ?) for kharlan - https://phabricator.wikimedia.org/T207330 (10MoritzMuehlenhoff) 05Resolved>03Open @kostajh : You're using the same key in production as in WMCS: This is a security risk since...
[10:57:56] <volans>	 !log restart pdfrender on scb1003
[10:57:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:13] <icinga-wm>	 RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time
[11:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1100).
[11:00:04] <jouncebot>	 bmansurov, Zoranzoki21, and aharoni: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[11:00:16] <zeljkof>	 I can swat today
[11:00:25] <bmansurov>	 o/ zeljkof 
[11:01:07] <zeljkof>	 bmansurov: if you are a deployer, feel free to deploy your patch
[11:01:13] <zeljkof>	 otherwise, I can do it
[11:01:19] <bmansurov>	 zeljkof: I'm not a deployer ;(
[11:01:40] <zeljkof>	 bmansurov: that's not a closed club, you can always become one ;)
[11:01:55] <wikibugs>	 (03PS3) 10Zfilipin: Stop collecting data CitaitonUsage and CitationUsagePageLoad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/465418 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[11:02:01] <bmansurov>	 zeljkof: thanks, I'll keep it in mind.
[11:02:16] <zeljkof>	 bmansurov: I'll ping you in a few minutes when the patch is at mwdebug1002 and ready for testing
[11:02:23] <bmansurov>	 zeljkof: cool
[11:03:15] <zeljkof>	 bmansurov: just checking, a gerrit comment says "Deploy on 10/29"
[11:03:18] <zeljkof>	 did the timeline change?
[11:03:23] <bmansurov>	 zeljkof: yes
[11:03:37] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/465418 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[11:03:41] <zeljkof>	 ok, merging
[11:03:52] <bmansurov>	 OK
[11:05:13] <wikibugs>	 (03Merged) 10jenkins-bot: Stop collecting data CitaitonUsage and CitationUsagePageLoad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/465418 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[11:07:04] <zeljkof>	 bmansurov: it's at mwdebug1002, please test and let me know if I can deploy it
[11:07:10] <bmansurov>	 ok
[11:07:32] <bmansurov>	 zeljkof: it's working, please go on
[11:07:37] <wikibugs>	 (03CR) 10jenkins-bot: Stop collecting data CitaitonUsage and CitationUsagePageLoad [mediawiki-config] - 10https://gerrit.wikimedia.org/r/465418 (https://phabricator.wikimedia.org/T191086) (owner: 10Bmansurov)
[11:07:47] <zeljkof>	 bmansurov: ok, deploying
[11:08:11] <wikibugs>	 (03PS7) 10GTirloni: ntp: move diamond::collector to where it will only apply to ntp servers [puppet] - 10https://gerrit.wikimedia.org/r/464866 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite)
[11:08:58] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:465418|Stop collecting data CitaitonUsage and CitationUsagePageLoad (T191086 T203253)]] (duration: 00m 57s)
[11:09:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:09:03] <stashbot>	 T191086: Instrument and collect data via CitationUsage schema - https://phabricator.wikimedia.org/T191086
[11:09:03] <stashbot>	 T203253: Run a Second Round of Data Collection - https://phabricator.wikimedia.org/T203253
[11:09:15] <zeljkof>	 bmansurov: it's deployed, please test and thanks for deploying with #releng ;)
[11:09:57] <aharoni>	 hi zeljkof o/
[11:10:12] <bmansurov>	 zeljkof: looks great. Thank you and great customer service you got at #releng :)))
[11:12:07] <zeljkof>	 bmansurov: we are here to server, until software replaces us ;)
[11:12:17] <zeljkof>	 "here to serve"
[11:12:27] <zeljkof>	 hi aharoni!
[11:12:29] <wikibugs>	 (03CR) 10GTirloni: [C: 032] ntp: move diamond::collector to where it will only apply to ntp servers [puppet] - 10https://gerrit.wikimedia.org/r/464866 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite)
[11:12:53] <aharoni>	 I have two patches to deploy. Both are needed for meaningful testing.
[11:12:53] <wikibugs>	 (03PS10) 10GTirloni: hiera: diamond::remove on openstack control role [puppet] - 10https://gerrit.wikimedia.org/r/465456 (https://phabricator.wikimedia.org/T183454) (owner: 10Cwhite)
[11:13:20] <zeljkof>	 aharoni: ok, I'll ping you in a few minutes, just to deploy one simple commit
[11:13:30] <aharoni>	 OK
[11:13:34] <bmansurov>	 ;)
[11:14:33] <wikibugs>	 (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469261 (https://phabricator.wikimedia.org/T207742) (owner: 10Zoranzoki21)
[11:15:35] <wikibugs>	 (03Merged) 10jenkins-bot: New throttle rule for Johannesburg Event on 2018-10-27 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469261 (https://phabricator.wikimedia.org/T207742) (owner: 10Zoranzoki21)
[11:16:49] <Zoranzoki21>	 Hi, I had problems with access
[11:17:00] <Zoranzoki21>	 SWAT hasn't ended?
[11:17:00] <zeljkof>	 Zoranzoki21: just deploying your commit :)
[11:17:26] <zeljkof>	 it looked simple enough, and there's nothing to test anyway...
[11:17:28] <logmsgbot>	 !log zfilipin@deploy1001 Synchronized wmf-config/throttle.php: SWAT: [[gerrit:469261|New throttle rule for Johannesburg Event on 2018-10-27 (T207742)]] (duration: 00m 55s)
[11:17:31] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:17:31] <stashbot>	 T207742: Requesting temporary lift of IP cap for Johannesburg Event on 2018-10-27 - https://phabricator.wikimedia.org/T207742
[11:17:34] <Zoranzoki21>	 zeljkof: Yes
[11:17:37] <zeljkof>	 Zoranzoki21: it's deployed! :)
[11:17:53] <zeljkof>	 aharoni: please stand by, you're next! :)
[11:18:06] <aharoni>	 ack
[11:18:13] <zeljkof>	 I'll ping you as soon as the first commit is at mwdebug1002 for testing
[11:19:18] <zeljkof>	 aharoni: so, um, https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/460895 is for master?
[11:19:38] <aharoni>	 zeljkof: what do you mean exactly?
[11:19:38] <zeljkof>	 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ContentTranslation/+/469507 also?
[11:19:44] <zeljkof>	 ok, so let me explain
[11:20:08] <aharoni>	 It's merged, and I need both deployed to production Wikipedias in all languages.
[11:20:08] <zeljkof>	 if a commit gets merged into master, like the two above are, they will be deployed during the next train deploy
[11:20:13] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10jijiki) p:05Triage>03Normal
[11:20:23] <zeljkof>	 so probably next week
[11:20:43] <zeljkof>	 depending on if they merged before the new deployment branch was cut
[11:20:55] <aharoni>	 That's the problem: one before, one after.
[11:21:06] <zeljkof>	 if a commit needs to be deployed before the next train, we can deploy it during swat
[11:21:07] <aharoni>	 and I need them together, in production today if possible.
[11:21:18] <aharoni>	 now is SWAT, isn't it?
[11:21:22] <zeljkof>	 yes
[11:21:35] <zeljkof>	 but we do not deploy master to production
[11:21:46] <zeljkof>	 we deploy a deployment branch
[11:22:12] <zeljkof>	 so, a commit has to be cherry picked to a branch, merged there and deployed
[11:22:24] <aharoni>	 Oh, I thought this is no longer needed.
[11:22:30] <aharoni>	 Can I do it quickly?
[11:22:40] <aharoni>	 to which branch?
[11:22:42] <zeljkof>	 since deployment situation is not good this week, I think new branch is only on group 0
[11:23:00] <Reedy>	 http://tools.wmflabs.org/versions/
[11:23:03] <zeljkof>	 sure, it should be doable in a swat window, it's mostly waiting for CI
[11:23:15] <zeljkof>	 but 10-20 minutes, depending on jobs that run
[11:23:33] <wikibugs>	 (03CR) 10jenkins-bot: New throttle rule for Johannesburg Event on 2018-10-27 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469261 (https://phabricator.wikimedia.org/T207742) (owner: 10Zoranzoki21)
[11:23:52] <zeljkof>	 current branches are 1.32.0-wmf.26 (old, but around possibly for a few more days, or even until next week)
[11:24:04] <aharoni>	 remind me please, what are groups 0, 1, 2?
[11:24:20] <zeljkof>	 and 1.33.0-wmf.1, new, should be on all wikis by Thursday, but in case of trouble, never .)
[11:24:33] <zeljkof>	 go to https://tools.wmflabs.org/versions/
[11:24:40] <zeljkof>	 there are three boxes
[11:24:51] <zeljkof>	 left one is 0, middle is 1, right is 2
[11:24:55] <aharoni>	 oh I see
[11:25:06] <zeljkof>	 click the triangle and it expands the list of wikis
[11:25:18] <aharoni>	 I need it on all Wikipedias, so groups 1 and 2.
[11:25:19] <zeljkof>	 so group 0 are small wikis and test wikis
[11:25:44] <zeljkof>	 group 1 is some middle ground, group 2 are big wikis, like enwiki
[11:26:25] <zeljkof>	 aharoni: so, 460895 is in the new branch? (I guess no action is needed then)
[11:26:35] <zeljkof>	 but 469507 is not?
[11:26:53] <aharoni>	 I think it's the other way around.
[11:27:23] <zeljkof>	 469507 got merged this morning, so it's unlikely it is in the deployment branch
[11:28:06] <zeljkof>	 460895 merged a couple of days ago (October 23) so maybe in the deployment branch, checking
[11:28:28] <aharoni>	 zeljkof: both group 1 and group 2 are on 1.32.0-wmf.26 now.
[11:28:44] <aharoni>	 so I made https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469603/
[11:29:06] <zeljkof>	 aharoni: yes, that's the old branch, hopefully going away today, but maybe not
[11:29:58] <aharoni>	 Yeah, and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469507/ was indeed merged today, and it's not part of the branch cut on Tuesday.
[11:30:04] <aharoni>	 Do I need to cherry-pick it?
[11:30:41] <zeljkof>	 aharoni: if you want it deployed, I need a commit in one (or both) currently deployed branches
[11:30:51] <zeljkof>	 so tldr: probably yes
[11:30:52] <zeljkof>	 :)
[11:32:29] <aharoni>	 zeljkof: drat. it depends on the other patch, so I'm afraid I cannot cherry-pick it until the first one goes through CI and is merged.
[11:32:45] <aharoni>	 or is there a way to rebase it somehow?
[11:32:53] <zeljkof>	 aharoni: in the same repo? you can chain commits, right?
[11:33:09] <zeljkof>	 uh, I don't think I've chained cherry picks before
[11:34:15] <aharoni>	 zeljkof: another question: if the train runs tonight, will group 1 and group 2 be switched to 1.33.0-wmf.1 ?
[11:35:07] <zeljkof>	 if current problems are resolved, and if there are no problems while promoting the new branch to groups 1 and 2, then yes
[11:35:28] <zeljkof>	 but in practice, hard to tell, since unexpected problems are, well, unexpected :)
[11:35:57] <aharoni>	 zeljkof: OK... so then the cherry-picks I'm doing now probably have to be done for 1.33.0-wmf.1, too?
[11:36:34] <zeljkof>	 aharoni: yes, if the commit(s) are not already in the branch, they have to be cherry picked and deployed in a swat window
[11:36:47] <aharoni>	 OK
[11:36:51] <zeljkof>	 there are two more swat windows today (during US working hours) 
[11:36:51] <Krenair>	 godog, ema: Hi, please can you put me in contact with a Debian FreeNode Group Contact?
[11:36:53] <wikibugs>	 10Operations: ferm fail to start at boot in some cases - https://phabricator.wikimedia.org/T207417 (10jijiki)
[11:36:55] <wikibugs>	 10Operations, 10Patch-For-Review: Firewall sets not being loaded post-reboot due to a @resolve race on jessie - https://phabricator.wikimedia.org/T148986 (10jijiki)
[11:37:07] <zeljkof>	 so probably not a good time for us, but somebody in the US can be around for SWAT
[11:37:24] <zeljkof>	 or if it's urgent, somebody might stay around for a late deploy :)
[11:37:49] <aharoni>	 zeljkof: OK, I think I figured everything out
[11:37:55] <zeljkof>	 aharoni: I guess you're not in Portland, since you're awake?
[11:38:16] <aharoni>	 zeljkof: no. I was invited actually, but I'm too busy with my family ;)
[11:38:35] <aharoni>	 another baby coming up, if all goes well 
[11:38:57] <zeljkof>	 oh, didn't know, congratulations! :)
[11:39:04] <aharoni>	 thanks ;)
[11:39:45] <wikibugs>	 10Operations, 10ops-eqiad, 10cloud-services-team: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T207868 (10jijiki) p:05Triage>03High a:03Cmjohnson
[11:40:06] <aharoni>	 zeljkof: So: I made https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469605/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469603/
[11:40:12] <aharoni>	 and I'm waiting for CI
[11:42:38] <zeljkof>	 aharoni: ok, so, it's unlikely that those commits will be merged and deployed in this swat window
[11:42:47] <aharoni>	 :(
[11:43:19] <wikibugs>	 10Operations, 10ops-eqiad, 10Cloud-Services, 10cloud-services-team: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T207868 (10jijiki)
[11:43:25] <zeljkof>	 looks like CI will need 10-20 minutes, and then another 10-20 when merging, and there are 15 minutes left...
[11:44:05] <zeljkof>	 there are two more windows today, 18:00–19:00 and 23:00–00:00 UTC
[11:44:13] <zeljkof>	 is any of them good for you or somebody from your team?
[11:45:05] <zeljkof>	 if this is causing an outage or a serious problem, I can always extend the swat window, but if it can wait until the next window, that would be better
[11:46:03] <zeljkof>	 or, I can +2 the commits now, before the test pipeline jobs are done, if a job fails, the commits will not get merged anyway
[11:46:23] <zeljkof>	 aharoni: do you need a lot of time to test the commits at mwdebug1002?
[11:46:32] <aharoni>	 no, super-short
[11:46:49] <wikibugs>	 10Operations, 10Patch-For-Review: Upgrade calico in production to version 2.4+ - https://phabricator.wikimedia.org/T207804 (10jijiki) p:05Triage>03Normal
[11:47:11] <aharoni>	 zeljkof: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469603/ is ready
[11:47:29] <zeljkof>	 aharoni: does deploying one commit help, or do we need both? or more?
[11:48:04] <aharoni>	 zeljkof: can you merge it https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469603/  perhaps? If you merge it, it won't be deployed, right?
[11:48:37] <zeljkof>	 aharoni: if I merge it, I should deploy it, but yes, it does not get deployed automatically
[11:48:46] <aharoni>	 yeah, so you can do it.
[11:48:49] <aharoni>	 it will be helpful
[11:49:06] <zeljkof>	 ok, in that case I'll merge and deploy it
[11:49:31] <zeljkof>	 aharoni: can you please update the calendar with the commits that will be deployed today, so we have a clean record?
[11:49:36] <aharoni>	 OK
[11:52:34] <aharoni>	 zeljkof: done
[11:52:43] <zeljkof>	 aharoni: thanks!
[11:53:39] <zeljkof>	 aharoni: it's very unlikely 469605 will get deployed now
[11:53:55] <zeljkof>	 we'll probably have to extend the window just to deploy 469603
[11:54:35] <zeljkof>	 aaaand one job failed :/ https://integration.wikimedia.org/ci/job/quibble-vendor-mysql-hhvm-docker/22203/
[11:55:06] <zeljkof>	 argh
[11:55:11] <zeljkof>	 `npm ERR! shasum check failed for /tmp/npm-2330-b9caa7b6/registry.npmjs.org/core-js/-/core-js-2.5.7.tgz`
[11:55:20] <zeljkof>	 just what we needed, CI trouble
[11:55:46] <zeljkof>	 aharoni: ok, so with 5 minutes left in the window, I suggest that we give up for this window :(
[11:55:59] <aharoni>	 :(
[11:56:09] <zeljkof>	 I don't think we'll be able to deploy anything without extending the window for 30 minutes or more
[11:56:30] <zeljkof>	 that's doable, if this is really urgent, but only if so
[11:56:37] <zeljkof>	 so, how urgent is this? :)
[11:57:35] <aharoni>	 zeljkof: not super-urgent, but what is this failure?!
[11:57:53] <zeljkof>	 looks like CI trouble with caching npm packages :/
[11:58:40] <zeljkof>	 it will probably not happen if I rerun the jobs, but that is slowing everything for another 10-20 minutes :(
[11:58:46] <aharoni>	 sigh
[11:58:58] <zeljkof>	 le sigh
[11:59:17] <zeljkof>	 so, giving up? can you reschedule for later today?
[11:59:24] <aharoni>	 zeljkof: is it possible to rerun it, to at least get it merged?
[11:59:35] <aharoni>	 or does it also have to be deployed if it's merged?
[11:59:39] <zeljkof>	 aharoni: ah, I can't leave stuff merged
[11:59:44] <zeljkof>	 I have to deploy
[11:59:56] <zeljkof>	 yes, I have to deploy merged stuff
[12:00:04] <jouncebot>	 Deploy window Pre MediaWiki train sanity break (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1200)
[12:00:20] <aharoni>	 sigh.
[12:00:23] <aharoni>	 I'll reschedule.
[12:00:27] <aharoni>	 thanks for the help!
[12:01:01] <zeljkof>	 aharoni: ok, sorry for not being able to deploy, we are working on CI, making it faster and more robust, but it takes time...
[12:01:26] <zeljkof>	 aharoni: please update the calendar, I'll remove my +2 from the patch
[12:01:59] <zeljkof>	 !log EU SWAT finished
[12:02:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:43] <aharoni>	 zeljkof: I'll update
[12:02:49] <zeljkof>	 thanks!
[12:04:15] <godog>	 Krenair: hi, not sure exactly what group you're referring to, #debian-ops perhaps ?
[12:04:37] <Krenair>	 godog, the debian group on freenode
[12:04:42] <Krenair>	 with debian/* cloaks
[12:05:06] <Krenair>	 I found a page about an IRC council with an email address that sounds like what I want, will check #debian-ops too
[12:05:27] <godog>	 Krenair: ack, yeah I think #debian-ops might be able to help
[12:05:36] <Krenair>	 cool thanks godog 
[12:05:47] <godog>	 dcausse: looks like the backlog flushed and today's indices are created \o/ thanks for your help
[12:10:10] <dcausse>	 cool!
[12:10:25] <volans>	 [heads up] cumin1001 is about to be rebooted in a few minutes
[12:11:03] <aharoni>	 zeljkof: "Puppet SWAT" is not appropriate for this, right?
[12:11:46] <zeljkof>	 aharoni: no, it's for deploying things in operations/puppet, as far as I know
[12:12:36] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10Beta-Cluster-reproducible, 10Discovery-Search (Current work), 10Patch-For-Review: Elasticsearch puppet config changes broke puppet in various instances - https://phabricator.wikimedia.org/T205672 (10fgiunchedi) Looks like logs in deployment-prep are back now (cc...
[12:14:33] <volans>	 !log rebooting cumin1001 to pick new kernel and clear any potential weird state after OOMs
[12:14:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:18:50] <volans>	 cumin1001 back online and at your service
[12:21:52] <wikibugs>	 10Puppet, 10Beta-Cluster-Infrastructure, 10Beta-Cluster-reproducible, 10Discovery-Search (Current work), 10Patch-For-Review: Elasticsearch puppet config changes broke puppet in various instances - https://phabricator.wikimedia.org/T205672 (10dcausse) a:05dcausse>03Krenair I overlooked other instances...
[12:43:51] <wikibugs>	 10Operations, 10Patch-For-Review: Upgrade calico in production to version 2.4+ - https://phabricator.wikimedia.org/T207804 (10jijiki) a:03akosiaris
[12:46:38] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10faidon) >>! In T207536#4694093, @aborrero wrote: > Ok, I think I understand this better now. >  > But we still have...
[12:48:04] <wikibugs>	 10Operations, 10Security-Team, 10Wikimedia-Site-requests, 10Patch-For-Review: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10fgiunchedi) >>! In T207900#4693299, @Bawolff wrote: >>>! In T207900#4693255, @faidon wrote: >> Cool! Cc'ing @herron and @fgiunchedi here for aw...
[12:48:41] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Language-Team (Language-2018-October-December), and 2 others: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit) p:05Triage>0...
[12:59:14] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s5 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[13:00:04] <jouncebot>	 Deploy window MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1300)
[13:01:41] <wikibugs>	 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Register and identify icinga-wm - https://phabricator.wikimedia.org/T205526 (10fgiunchedi) 05Open>03Resolved Completed!  ``` 14:00 :: Whois for: icinga-wm (~icinga-wm@wikimedia/bot/icinga-wm) ```
[13:01:47] <wikibugs>	 10Operations, 10Patch-For-Review, 10User-fgiunchedi: Register and identify icinga-wm - https://phabricator.wikimedia.org/T205526 (10fgiunchedi)
[13:03:13] <godog>	 volans: related to the rabbithole in ^ T205522 can be resolved, what do you think?
[13:03:13] <stashbot>	 T205522: ircecho / icinga-wm crashlooping - https://phabricator.wikimedia.org/T205522
[13:04:00] <volans>	 godog: I guess so, also we didn't have a full repro and we're moving to icinga1001 that might or might not have the same issue
[13:04:06] <volans>	 jessie vs stretch
[13:04:58] <godog>	 indeed, and the code at least won't swallow exceptions now, I'll resolve it
[13:05:40] <wikibugs>	 10Operations, 10IRCecho, 10Patch-For-Review, 10User-fgiunchedi: ircecho / icinga-wm crashlooping - https://phabricator.wikimedia.org/T205522 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi Resolving as the code will log exceptions now and we haven't seen further crashes.
[13:07:11] <volans>	 thanks for the cleanup
[13:07:15] <wikibugs>	 10Operations, 10Traffic, 10Wikimedia-Incident: Power incident in eqsin - https://phabricator.wikimedia.org/T206861 (10faidon)
[13:07:18] <wikibugs>	 10Operations, 10Traffic, 10Wikimedia-Incident: Add maint-announce@ to Equinix's recipient list for eqsin incidents - https://phabricator.wikimedia.org/T207140 (10faidon) 05Resolved>03Open I see emails for SG3 that (as far as I can tell) haven't made it to maint-announce, e.g. ``` Date: Thu, 25 Oct 2018 1...
[13:09:51] <godog>	 np, I had set a reminder to check icinga-wm's cloack
[13:09:54] <godog>	 cloak even
[13:12:39] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] "LGTM, a further improvement might be to check for dpkg-dist conffiles left in case there are config changes to do" [puppet] - 10https://gerrit.wikimedia.org/r/469439 (owner: 10Ema)
[13:15:34] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 031] "Looks good" [puppet] - 10https://gerrit.wikimedia.org/r/469439 (owner: 10Ema)
[13:18:06] <wikibugs>	 (03PS1) 10Gehel: wdqs: cleanup logback configuration [puppet] - 10https://gerrit.wikimedia.org/r/469611 (https://phabricator.wikimedia.org/T207834)
[13:22:06] <wikibugs>	 (03CR) 10Ottomata: "Ahh sorry missed that, thanks elukey." [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469563 (owner: 10Elukey)
[13:23:13] <wikibugs>	 (03CR) 10Ottomata: "Great!" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/469562 (https://phabricator.wikimedia.org/T184794) (owner: 10Elukey)
[13:25:37] <wikibugs>	 (03PS1) 10Filippo Giunchedi: hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454)
[13:25:39] <wikibugs>	 (03PS1) 10Filippo Giunchedi: prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454)
[13:26:05] <wikibugs>	 (03PS4) 10Ema: wmf-upgrade-and-reboot: non-interactive Debian frontend [puppet] - 10https://gerrit.wikimedia.org/r/469439
[13:26:08] <wikibugs>	 (03PS3) 10Muehlenhoff: Switch srvdumps rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467978
[13:26:15] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:27:08] <wikibugs>	 (03CR) 10Ema: [C: 032] wmf-upgrade-and-reboot: non-interactive Debian frontend [puppet] - 10https://gerrit.wikimedia.org/r/469439 (owner: 10Ema)
[13:27:20] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Language-Team (Language-2018-October-December), and 2 others: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit) Fix has been me...
[13:28:01] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Language-Team (Language-2018-October-December), 10Wikimedia-production-error: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10Nikerabbit)
[13:28:27] <XioNoX>	 !log test add term return-tcp permit on cr2-codfw
[13:28:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:48] <wikibugs>	 (03PS1) 10Arturo Borrero Gonzalez: toolforge: bootstrap service node puppet code [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591)
[13:29:24] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work), and 4 others: WDQS Updater ran into issue and stopped working - https://phabricator.wikimedia.org/T207817 (10Ottomata) Thanks @Smalyshev, I think you are write that changes like this should be announced a bit better.  We...
[13:29:44] <XioNoX>	 !log test successful, rollback add term return-tcp permit on cr2-codfw
[13:29:45] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:29:50] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] toolforge: bootstrap service node puppet code [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez)
[13:30:18] <wikibugs>	 (03CR) 10Ottomata: [C: 031] hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:30:48] <wikibugs>	 (03CR) 10Ottomata: [C: 031] prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:31:32] <wikibugs>	 10Operations, 10Revision-Slider, 10TCB-Team, 10WMDE-Analytics-Engineering, 10Graphite: Fix aggregation of "MediaWiki.RevisionSlider.event.load.sum" from average to sum - https://phabricator.wikimedia.org/T205416 (10fgiunchedi) >>! In T205416#4691176, @Lea_WMDE wrote: > Thanks @fgiunchedi! Just to be sure...
[13:33:19] <wikibugs>	 (03PS2) 10Filippo Giunchedi: hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454)
[13:33:21] <wikibugs>	 (03PS2) 10Filippo Giunchedi: prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454)
[13:33:44] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:33:53] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 032] prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:34:57] <wikibugs>	 (03PS3) 10Filippo Giunchedi: hieradata: change burrow port for kafka logging [puppet] - 10https://gerrit.wikimedia.org/r/469612 (https://phabricator.wikimedia.org/T206454)
[13:35:37] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 031] Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990 (owner: 10Muehlenhoff)
[13:37:03] <wikibugs>	 (03PS3) 10Filippo Giunchedi: prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454)
[13:37:09] <wikibugs>	 (03CR) 10Filippo Giunchedi: [V: 032 C: 032] prometheus: add Burrow metrics for kafka-logging [puppet] - 10https://gerrit.wikimedia.org/r/469613 (https://phabricator.wikimedia.org/T206454) (owner: 10Filippo Giunchedi)
[13:39:33] <icinga-wm>	 RECOVERY - Check systemd state on kafkamon1001 is OK: OK - running: The system is fully operational
[13:41:43] <icinga-wm>	 RECOVERY - BGP status on cr2-eqord is OK: BGP OK - up: 64, down: 19, shutdown: 1 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[13:42:12] <wikibugs>	 (03CR) 10Filippo Giunchedi: [C: 04-1] icinga: on stretch, tell rsyslog to discard logs from check_nrpe (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469337 (https://phabricator.wikimedia.org/T207775) (owner: 10Dzahn)
[13:43:23] <icinga-wm>	 RECOVERY - Check systemd state on ms-be2042 is OK: OK - running: The system is fully operational
[13:44:06] <wikibugs>	 (03PS5) 10Muehlenhoff: Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990
[13:45:07] <wikibugs>	 (03CR) 10ArielGlenn: "It looks like auto_ferm_ipv6 is a top-scoped variable as used in rsync::server::module. And I don't see the ipv6 ferm rules in the catalog" [puppet] - 10https://gerrit.wikimedia.org/r/467978 (owner: 10Muehlenhoff)
[13:46:10] <wikibugs>	 (03PS1) 10Anomie: Set CommentTableSchemaMigrationStage => WRITE_NEW on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469617 (https://phabricator.wikimedia.org/T166733)
[13:46:35] <godog>	 !log reformat ms-be2043 xfs filesystems - T199198
[13:46:37] <wikibugs>	 (03CR) 10Anomie: [C: 032] "Deploying planned config change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469617 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie)
[13:46:38] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:46:39] <stashbot>	 T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198
[13:47:01] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Switch prometheus-ops rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467990 (owner: 10Muehlenhoff)
[13:47:43] <wikibugs>	 (03Merged) 10jenkins-bot: Set CommentTableSchemaMigrationStage => WRITE_NEW on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469617 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie)
[13:48:55] <logmsgbot>	 !log anomie@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Setting comment table migration stage to write-new/read-both on all wikis (T166733) (duration: 00m 55s)
[13:48:58] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:48:59] <stashbot>	 T166733: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733
[13:56:25] <wikibugs>	 (03PS2) 10Muehlenhoff: Disable prometheus rsyncd module for now [puppet] - 10https://gerrit.wikimedia.org/r/467991
[13:57:51] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Disable prometheus rsyncd module for now [puppet] - 10https://gerrit.wikimedia.org/r/467991 (owner: 10Muehlenhoff)
[13:58:59] <wikibugs>	 (03PS2) 10Muehlenhoff: Switch carbon rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467958
[14:01:10] <wikibugs>	 (03CR) 10jenkins-bot: Set CommentTableSchemaMigrationStage => WRITE_NEW on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469617 (https://phabricator.wikimedia.org/T166733) (owner: 10Anomie)
[14:04:41] <wikibugs>	 (03CR) 10Jcrespo: [C: 031] Fix PTR for db2042 [dns] - 10https://gerrit.wikimedia.org/r/467711 (owner: 10Volans)
[14:05:26] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team, 10User-Smalyshev: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (10Gehel) >>! In T206636#4690384, @Smalyshev wrote: > @Andrew A...
[14:06:58] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (10faidon) Good idea! I think upstream [[ https://github.com/NagiosEnterprises/nrpe/commit/fe006d2556c906de84321188630ab...
[14:13:11] <wikibugs>	 (03PS4) 10Jcrespo: Fix PTR for db2042 [dns] - 10https://gerrit.wikimedia.org/r/467711 (owner: 10Volans)
[14:19:23] <wikibugs>	 (03CR) 10Muehlenhoff: [C: 032] Switch carbon rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467958 (owner: 10Muehlenhoff)
[14:20:49] <banyek>	 !log running dns update (gerrit patch: 467711)
[14:20:51] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:14] <wikibugs>	 (03PS2) 10Elukey: Add missing AAAA records for druid eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467701 (owner: 10Volans)
[14:23:48] <wikibugs>	 (03CR) 10Elukey: [C: 032] Add missing AAAA records for druid eqiad hosts [dns] - 10https://gerrit.wikimedia.org/r/467701 (owner: 10Volans)
[14:27:57] <wikibugs>	 (03PS4) 10Muehlenhoff: Switch srvdumps rsync module to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/467978
[14:28:42] <elukey>	 !log upgrade druid on druid100[4-6] to Druid 0.12.3
[14:28:44] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:35] <wikibugs>	 10Operations, 10LDAP-Access-Requests: Remove "jk" from "wmde" ldap group - https://phabricator.wikimedia.org/T207792 (10WMDE-leszek) Thanks for the attention. As the engineering manager at WMDE I confirm that person behind the user name "jk" is no longer doing software development/engineering work at WMF infra...
[14:29:53] <icinga-wm>	 RECOVERY - Filesystem available is greater than filesystem size on ms-be2043 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=ms-be2043&var-datasource=codfw%2520prometheus%252Fops
[14:46:03] <wikibugs>	 10Operations, 10LDAP-Access-Requests: Remove "jk" from "wmde" ldap group - https://phabricator.wikimedia.org/T207792 (10Addshore) >>! In T207792#4693923, @MoritzMuehlenhoff wrote: >>>! In T207792#4691594, @jijiki wrote: >> @Addshore could you please give us some context on this (e.g. they are not working for W...
[14:46:11] <addshore>	 morning James_F 
[14:46:46] <James_F>	 Hey.
[14:46:57] <James_F>	 addshore: Ideas for next step?
[14:49:10] <wikibugs>	 (03CR) 10Muehlenhoff: "The ferm service name is based on name of the rsyncd service, so they won't clash. If multiple services are created for the rsyncd port, t" [puppet] - 10https://gerrit.wikimedia.org/r/467985 (owner: 10Muehlenhoff)
[14:49:12] <wikibugs>	 (03PS1) 10Vgutierrez: certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] - 10https://gerrit.wikimedia.org/r/469624 (https://phabricator.wikimedia.org/T207927)
[14:50:56] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (10Cmjohnson) a:05Cmjohnson>03RobH @robh added label on server, added to switch asw-a-eqiad ge-6/0/18       up    up   weblog1001 and in private1-a
[14:51:32] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10jlinehan)
[14:52:21] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] - 10https://gerrit.wikimedia.org/r/469624 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez)
[14:52:52] <wikibugs>	 (03PS1) 10Addshore: Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625
[14:52:58] <addshore>	 jouncebot: now
[14:52:59] <jouncebot>	 For the next 0 hour(s) and 7 minute(s): MediaWiki train - European version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1300)
[14:53:01] <addshore>	 James_F: ^^
[14:53:09] <addshore>	 let's do that, then turn it on again
[14:53:10] <addshore>	 jouncebot: next
[14:53:11] <jouncebot>	 In 1 hour(s) and 6 minute(s): Puppet SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1600)
[14:53:26] <addshore>	 James_F: in the office yet? :P
[14:53:36] <James_F>	 Just outside.
[14:53:50] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team, 10User-Smalyshev: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (10Andrew) >>! In T206636#4690379, @Smalyshev wrote: >> I've cr...
[14:53:59] <wikibugs>	 (03PS2) 10Addshore: Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625
[14:55:28] <James_F>	 Let’s try it.
[14:56:11] <James_F>	 Will only fix lexeme though?
[14:57:01] <James_F>	 Yesterday you said there were three entity types still enabled?
[14:58:32] <James_F>	 addshore: Unless I mis-remember?
[14:58:38] <addshore>	 the three were all lexeme
[14:58:44] <James_F>	 Ah, OK.
[14:58:45] <addshore>	 i thought the fix was harder than this ^^
[14:58:46] <James_F>	 Sure, let's do it.
[14:58:49] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (10Cmjohnson)
[14:59:05] <wikibugs>	 10Operations, 10ops-eqiad: apply hostname label for weblog1001/WMF4750 - https://phabricator.wikimedia.org/T207764 (10Cmjohnson) 05Open>03Resolved
[14:59:08] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (10Cmjohnson)
[14:59:13] <James_F>	 addshore: You deploying or should I?
[14:59:50] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: setup/install weblog1001/WMF4750 as oxygen replacement - https://phabricator.wikimedia.org/T207760 (10Cmjohnson)
[15:00:05] <wikibugs>	 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: Rack/setup cr2-eqord - https://phabricator.wikimedia.org/T204170 (10Papaul) Router has been unracked and dropped at shipping for ship out. shipping information below {F26792797}
[15:01:02] <addshore>	 James_F: i can do
[15:01:39] <James_F>	 Go for it.
[15:01:54] <wikibugs>	 (03CR) 10Addshore: [C: 032] Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625 (owner: 10Addshore)
[15:02:04] <godog>	 !log test rsyslog 8.38 upgrade on lithium - T136312
[15:02:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:02:09] <stashbot>	 T136312: Encrypt syslog traffic - https://phabricator.wikimedia.org/T136312
[15:02:29] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10Ottomata) @nuria needs to give the sign off from analytics, but from my POV this is all correct!  Yeehaw!  This will be discussed in Monday's...
[15:03:01] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10Nuria) Approved on my end.
[15:03:57] <wikibugs>	 (03PS3) 10Addshore: Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625
[15:04:02] <wikibugs>	 (03CR) 10Addshore: [C: 032] Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625 (owner: 10Addshore)
[15:05:02] <wikibugs>	 (03Merged) 10jenkins-bot: Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625 (owner: 10Addshore)
[15:05:08] <wikibugs>	 (03PS1) 10Muehlenhoff: Convert udp2log::rsyncd to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/469627
[15:06:44] <icinga-wm>	 PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 26552 MB (5% inode=99%)
[15:06:52] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on analytics1029 - https://phabricator.wikimedia.org/T207644 (10Cmjohnson) 05Open>03Resolved a:03Cmjohnson I had one remaining 4TB spare disks on-site.  Replaced the disk, cleared the cache and all disks are back
[15:07:17] <addshore>	 syncing
[15:07:49] <wikibugs>	 (03PS2) 10Addshore: Revert "logging: Disable 'Wikibase.NewItemIdFormatter' channel" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/467345
[15:07:52] <wikibugs>	 (03CR) 10Addshore: [C: 032] Revert "logging: Disable 'Wikibase.NewItemIdFormatter' channel" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/467345 (owner: 10Addshore)
[15:07:59] <wikibugs>	 (03PS2) 10Vgutierrez: certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] - 10https://gerrit.wikimedia.org/r/469624 (https://phabricator.wikimedia.org/T207927)
[15:08:05] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Explicitly set wgLexemeEnableRepo for wikidatas [[gerrit:469625]] (duration: 00m 55s)
[15:08:07] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:29] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/13204/" [puppet] - 10https://gerrit.wikimedia.org/r/469627 (owner: 10Muehlenhoff)
[15:08:57] <wikibugs>	 (03Merged) 10jenkins-bot: Revert "logging: Disable 'Wikibase.NewItemIdFormatter' channel" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/467345 (owner: 10Addshore)
[15:10:03] <addshore>	 James_F: can you make the patch for turning wikibaserepo back on on beta commons?
[15:10:10] <James_F>	 Sure.
[15:10:33] <icinga-wm>	 RECOVERY - MegaRAID on analytics1029 is OK: OK: optimal, 13 logical, 14 physical, WriteBack policy
[15:10:49] <wikibugs>	 (03PS1) 10Jforrester: Revert "Revert "Revert "[Beta Cluster] Re-disable WBMI on Beta Commons for now""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628
[15:11:07] <addshore>	 so many reverts :P
[15:11:11] <wikibugs>	 (03PS2) 10Jforrester: [Beta Cluster] Re-enable WBMI on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628
[15:11:18] <wikibugs>	 (03PS3) 10Addshore: [Beta Cluster] Re-enable WBMI on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628 (owner: 10Jforrester)
[15:11:21] <James_F>	 I'm lazy. :-)
[15:11:23] <wikibugs>	 (03CR) 10Addshore: [C: 032] [Beta Cluster] Re-enable WBMI on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628 (owner: 10Jforrester)
[15:11:35] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert "logging: Disable Wikibase.NewItemIdFormatter channel" (duration: 00m 55s)
[15:11:37] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:04] <wikibugs>	 (03Merged) 10jenkins-bot: [Beta Cluster] Re-enable WBMI on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628 (owner: 10Jforrester)
[15:13:59] <wikibugs>	 (03PS6) 10Jforrester: Enable WikibaseMediaInfo on Beta Cluster Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/466954 (https://phabricator.wikimedia.org/T180981)
[15:14:41] <wikibugs>	 10Operations, 10ops-eqiad: Broken memory on thumbor1004 - https://phabricator.wikimedia.org/T207721 (10Cmjohnson) @MoritzMuehlenhoff I am sure I have one buried in the 300 servers on the floor but the few that are easy to access are only 8GB.
[15:16:05] <wikibugs>	 10Operations, 10ops-eqiad, 10Traffic: cp1076 hardware failure - https://phabricator.wikimedia.org/T206394 (10Cmjohnson) @BBlack the idrac h/w log does not show any failures
[15:16:32] <logmsgbot>	 !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: [Beta Cluster] Re-enable WBMI on Beta Commons (duration: 00m 54s)
[15:16:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:17:11] <addshore>	 James_F: I guess now we wait for it to appear on beta?
[15:17:15] <James_F>	 Yeah.
[15:17:19] <James_F>	 Do-dee-doo.
[15:17:45] <addshore>	 James_F: mind pinging me once it is live again? :D
[15:17:52] * addshore might go grab breakfast / head to the venue
[15:18:07] <James_F>	 Of course. Coming to the Foundation/Wikidata meeting in 12 minutes' time? I assumed it would be cancelled, but…
[15:18:19] <addshore>	 oh
[15:18:21] <addshore>	 forgot about
[15:18:53] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1009 HP Raid alert - https://phabricator.wikimedia.org/T198479 (10Cmjohnson) @Bstorm  is this okay to resolve now?
[15:18:54] <wikibugs>	 (03CR) 10jenkins-bot: Explicitly set wgLexemeEnableRepo for wikidatas [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469625 (owner: 10Addshore)
[15:18:56] <wikibugs>	 (03CR) 10jenkins-bot: Revert "logging: Disable 'Wikibase.NewItemIdFormatter' channel" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/467345 (owner: 10Addshore)
[15:18:58] <wikibugs>	 (03CR) 10jenkins-bot: [Beta Cluster] Re-enable WBMI on Beta Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469628 (owner: 10Jforrester)
[15:19:30] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: analytics1068 doesn't boot - https://phabricator.wikimedia.org/T203244 (10Cmjohnson) 05Open>03Resolved  @elukey  okay
[15:21:04] <wikibugs>	 (03PS1) 10Muehlenhoff: When absenting an rsyncd module, also remove the ferm service [puppet] - 10https://gerrit.wikimedia.org/r/469629
[15:21:30] <James_F>	 addshore: Now cancelled.
[15:21:36] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on dbstore1002 - https://phabricator.wikimedia.org/T206965 (10Cmjohnson) @elukey dbstore1002 is out of warranty and has 1.2T disks. I don't have disks this size but can replace with a 2TB disk..
[15:26:17] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/13205/" [puppet] - 10https://gerrit.wikimedia.org/r/469629 (owner: 10Muehlenhoff)
[15:28:48] <wikibugs>	 (03PS1) 10Muehlenhoff: Disable prometheus rsyncd module for now [puppet] - 10https://gerrit.wikimedia.org/r/469630
[15:30:14] <addshore>	 James_F: hooray
[15:30:55] <James_F>	 addshore: beta-scap-eqiad is running now.
[15:31:29] <addshore>	 James_F: yay
[15:31:35] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Kanban): labvirt1009 HP Raid alert - https://phabricator.wikimedia.org/T198479 (10Bstorm) 05Open>03Resolved Looks great to me!  Sorry this got buried.
[15:31:47] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10cloud-services-team, 10User-Smalyshev: Provide a way to have test servers on real hardware, isolated from production for Wikidata Query Service - https://phabricator.wikimedia.org/T206636 (10Smalyshev) >  I've build a new VM t206636-3 that should have...
[15:31:53] <SMalyshev>	 !log depooling wdqs1003 again, it's not catching up like the other hosts
[15:31:54] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:34:44] <icinga-wm>	 RECOVERY - Disk space on elastic1025 is OK: DISK OK
[15:36:11] <elukey>	 !log shutdown aqs1006 to replace one broken disk - T206915
[15:36:14] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:36:15] <stashbot>	 T206915: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915
[15:36:31] <James_F>	 addshore: And it's live.
[15:36:46] <addshore>	 yup, we still have issues but I'm gonna just investigate through the day rather than turn it off again
[15:36:53] <James_F>	 addshore: "Create an item" isn't listed, but a bunch still are.
[15:36:54] <addshore>	 lexemes are still listed for example...
[15:37:24] <James_F>	 Shouldn't https://commons.wikimedia.beta.wmflabs.org/wiki/Special:ListDatatypes theoretically say "none"?
[15:37:33] <addshore>	 no
[15:37:42] <James_F>	 But… they aren't available?
[15:38:37] <James_F>	 addshore: Shall we do https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/466954 then?
[15:38:59] <addshore>	 let's not do that one quite yet
[15:39:01] <addshore>	 but hopefully today
[15:39:04] <James_F>	 OK.
[15:42:12] <James_F>	 addshore: So… at first glance, I'd say Special:AvailableBadges, Special:EntitiesWithoutDescription, Special:EntitiesWithoutLabel, Special:GoToLinkedPage, Special:ItemByTitle, Special:ItemDisambiguation, Special:ItemsWithoutSitelinks, Special:MergeItems, Special:RedirectEntity, Special:SetDescription, Special:SetLabel, Special:SetSiteLink, Special:SetAliases, Special:SetLabelDescriptionAliases shouldn't be listed or enabled.
[15:42:42] <James_F>	 Special:DispatchStats, Special:ListProperties, Special:EntityData, Special:EntityPage, and Special:MyLanguageFallbackChain are fine, as is Special:ListDatatypes - but should be blank or in some other way show that the data types are known but not allowed?
[15:42:59] <addshore>	 yup, can you list them on that ticket that i made, then we can keep looking through that
[15:43:53] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] Remove all references to labs_metal [puppet] - 10https://gerrit.wikimedia.org/r/469532 (owner: 10Faidon Liambotis)
[15:44:07] <wikibugs>	 (03CR) 10Andrew Bogott: [C: 032] "The compiler likes this, let's give it a try :)" [puppet] - 10https://gerrit.wikimedia.org/r/469532 (owner: 10Faidon Liambotis)
[15:44:15] <wikibugs>	 (03PS2) 10Andrew Bogott: Remove all references to labs_metal [puppet] - 10https://gerrit.wikimedia.org/r/469532 (owner: 10Faidon Liambotis)
[15:45:40] <James_F>	 Sure.
[15:45:51] <wikibugs>	 (03PS1) 10Bstorm: sonofgridengine: refactor roles into wmcs namespace [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557)
[15:46:23] <jynus>	 there was a spike in inserts and updates on enwiki 10 minutes ago
[15:46:28] <paravoid>	 andrewbogott: cool :) lmk if follow-up is needed
[15:46:28] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] sonofgridengine: refactor roles into wmcs namespace [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm)
[15:47:44] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work), and 4 others: WDQS Updater ran into issue and stopped working - https://phabricator.wikimedia.org/T207817 (10Smalyshev) > engineering@? wikitech-l?  I think engineering is good, and probably wikitech too since the data c...
[15:48:12] <jynus>	 invalidateTitles  it seems
[15:48:24] <wikibugs>	 (03CR) 10Arturo Borrero Gonzalez: [C: 031] "thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm)
[15:50:00] <wikibugs>	 (03PS1) 10Muehlenhoff: Use auto_ferm for profile::analytics::database::meta::backup_dest [puppet] - 10https://gerrit.wikimedia.org/r/469635
[15:50:02] <wikibugs>	 (03PS2) 10Bstorm: sonofgridengine: refactor roles into wmcs namespace [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557)
[15:50:21] <addshore>	 *reads up*
[15:53:15] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on aqs1006 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T207958
[15:53:25] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T207958 (10ops-monitoring-bot)
[15:56:01] <wikibugs>	 (03CR) 10Smalyshev: wdqs: increase restart interval of wdqs-updater (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) (owner: 10Gehel)
[15:57:55] <wikibugs>	 (03PS3) 10Bstorm: sonofgridengine: refactor roles into wmcs namespace [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557)
[15:58:21] <wikibugs>	 (03CR) 10Muehlenhoff: "PCC: https://puppet-compiler.wmflabs.org/compiler1002/13207/" [puppet] - 10https://gerrit.wikimedia.org/r/469635 (owner: 10Muehlenhoff)
[16:00:05] <jouncebot>	 godog and _joe_: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Puppet SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1600).
[16:00:05] <jouncebot>	 No GERRIT patches in the queue for this window AFAICS.
[16:00:31] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T196507 (10Cmjohnson) I have not heard back from HP yet, I pinged them again
[16:04:17] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10Andrew)
[16:04:52] <wikibugs>	 10Operations, 10ops-eqiad, 10DC-Ops, 10decommission: decom promethium/WMF3571 - https://phabricator.wikimedia.org/T191362 (10Andrew) a:05Andrew>03RobH
[16:04:55] <wikibugs>	 (03CR) 10Bstorm: [C: 032] sonofgridengine: refactor roles into wmcs namespace [puppet] - 10https://gerrit.wikimedia.org/r/469633 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm)
[16:07:12] <wikibugs>	 10Operations, 10SRE-Access-Requests: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10Jhernandez) Sounds good from my end too 👍
[16:16:21] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] Decrease OSM update Frequency [puppet] - 10https://gerrit.wikimedia.org/r/469329 (https://phabricator.wikimedia.org/T205735) (owner: 10MSantos)
[16:16:28] <wikibugs>	 (03PS5) 10Alexandros Kosiaris: Decrease OSM update Frequency [puppet] - 10https://gerrit.wikimedia.org/r/469329 (https://phabricator.wikimedia.org/T205735) (owner: 10MSantos)
[16:16:38] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Decrease OSM update Frequency [puppet] - 10https://gerrit.wikimedia.org/r/469329 (https://phabricator.wikimedia.org/T205735) (owner: 10MSantos)
[16:18:23] <icinga-wm>	 PROBLEM - Disk space on elastic1025 is CRITICAL: DISK CRITICAL - free space: /srv 28020 MB (5% inode=99%)
[16:19:05] <icinga-wm>	 ACKNOWLEDGEMENT - MD RAID on aqs1006 is CRITICAL: CRITICAL: State: degraded, Active: 11, Working: 11, Failed: 0, Spare: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T207964
[16:19:09] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T207964 (10ops-monitoring-bot)
[16:20:44] <wikibugs>	 10Operations, 10ops-eqiad: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T207964 (10elukey)
[16:20:47] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (10elukey)
[16:22:13] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (10Cmjohnson) I sent HP a diagnostic log showing disk 5 as failed {F26794607}  {F26794615}
[16:24:49] <shdubsh>	 !log installed patched nagios-nrpe-plugin and nagios-nrpe-server on icinga1001 - T207775
[16:24:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:24:54] <stashbot>	 T207775: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775
[16:25:26] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 031] "LGTM, but adding faidon as well who created that stanza" [puppet] - 10https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: 10Alex Monk)
[16:25:42] <wikibugs>	 10Operations, 10ops-eqiad: eqiad: Re-connect cage cameras - https://phabricator.wikimedia.org/T207965 (10Cmjohnson)
[16:26:26] <wikibugs>	 (03CR) 10Smalyshev: [C: 031] wdqs: cleanup logback configuration [puppet] - 10https://gerrit.wikimedia.org/r/469611 (https://phabricator.wikimedia.org/T207834) (owner: 10Gehel)
[16:28:23] <icinga-wm>	 PROBLEM - puppet last run on labsdb1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:29:30] <akosiaris>	 this is me ^
[16:29:55] <akosiaris>	 alongside the maps boxes
[16:30:00] <akosiaris>	 who will alert soon
[16:31:34] <icinga-wm>	 PROBLEM - puppet last run on maps1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:31:52] <volans>	 you can see the future :-P
[16:32:17] <akosiaris>	 4, 8, 15, 16, 23, 42
[16:32:20] <akosiaris>	 you know what to do
[16:32:23] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: osm::planet_sync: Specify correct cron parameter [puppet] - 10https://gerrit.wikimedia.org/r/469644
[16:32:26] <volans>	 lol
[16:33:15] <wikibugs>	 10Operations, 10Wikimedia-Mailing-lists: New list request for 1lib1ref - https://phabricator.wikimedia.org/T207283 (10AVasanth_WMF)
[16:34:19] <gehel>	 !log decreasing relative weight of wdqs1003 in LVS to ease the updater
[16:34:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:23] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/weight=20; selector: dc=eqiad,cluster=wdqs,name=wdqs1004.codfw.wmnet
[16:34:24] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:34:28] <logmsgbot>	 !log gehel@puppetmaster1001 conftool action : set/weight=20; selector: dc=eqiad,cluster=wdqs,name=wdqs1005.codfw.wmnet
[16:34:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:35:21] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 032] osm::planet_sync: Specify correct cron parameter [puppet] - 10https://gerrit.wikimedia.org/r/469644 (owner: 10Alexandros Kosiaris)
[16:35:34] <icinga-wm>	 PROBLEM - puppet last run on maps2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:38:34] <icinga-wm>	 RECOVERY - puppet last run on labsdb1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[16:40:43] <icinga-wm>	 RECOVERY - puppet last run on maps2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[16:40:59] <wikibugs>	 10Operations, 10Icinga, 10fundraising-tech-ops: Why doesn't icinga notify the team-fr-tech contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (10Jgreen)
[16:41:53] <icinga-wm>	 RECOVERY - puppet last run on maps1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[16:43:42] <wikibugs>	 (03PS1) 10Gehel: wdqs: switch wdqs1003 and wdqs1006 from public vs internal clusters [puppet] - 10https://gerrit.wikimedia.org/r/469649 (https://phabricator.wikimedia.org/T207947)
[16:48:07] <wikibugs>	 (03PS2) 10Gehel: wdqs: switch wdqs1003 and wdqs1006 from public vs internal clusters [puppet] - 10https://gerrit.wikimedia.org/r/469649 (https://phabricator.wikimedia.org/T207947)
[16:51:37] <wikibugs>	 (03CR) 10Smalyshev: [C: 031] wdqs: switch wdqs1003 and wdqs1006 from public vs internal clusters [puppet] - 10https://gerrit.wikimedia.org/r/469649 (https://phabricator.wikimedia.org/T207947) (owner: 10Gehel)
[16:55:14] <icinga-wm>	 RECOVERY - Disk space on elastic1025 is OK: DISK OK
[16:56:51] <wikibugs>	 (03PS16) 10Dzahn: Planet: Redesign UI [puppet] - 10https://gerrit.wikimedia.org/r/467100 (https://phabricator.wikimedia.org/T207243) (owner: 10Paladox)
[16:59:02] <addshore>	 James_F: :( it's hard.....
[16:59:07] <addshore>	 anyway, time for sessions...
[16:59:48] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests, 10User-Elukey: eqiad | (14 + 6) hadoop hardware refresh and expansion - https://phabricator.wikimedia.org/T199673 (10Cmjohnson)
[16:59:51] <wikibugs>	 10Operations, 10Analytics, 10hardware-requests, 10User-Elukey: eqiad | (3) Labs Data Lake hardware - https://phabricator.wikimedia.org/T199674 (10Cmjohnson)
[17:00:05] <jouncebot>	 cscott, arlolra, subbu, halfak, and Amir1: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Services – Graphoid / Parsoid / Citoid / ORES . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1700).
[17:00:05] <volans>	 shdubsh: by install did you mean upgrade?
[17:02:06] <shdubsh>	 volans: maybe that is more accurate.  it was the same version of nrpe, but with additional patches
[17:02:11] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "replaces Bootstrap with bulma.io,  removes jQuery completely" [puppet] - 10https://gerrit.wikimedia.org/r/467100 (https://phabricator.wikimedia.org/T207243) (owner: 10Paladox)
[17:02:38] <volans>	 shdubsh: ack, no prob :)
[17:02:46] <wikibugs>	 (03Abandoned) 10Alex Monk: [WIP] dnsrecursor: Rewrite code setting up lua hooks [puppet] - 10https://gerrit.wikimedia.org/r/304146 (https://phabricator.wikimedia.org/T139438) (owner: 10Alex Monk)
[17:07:45] <James_F>	 addshore: :-(
[17:10:51] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10Ottomata) hey hey heyyy, the nodes are in!  https://phabricator.wikimedia.org/T204177#4695147  How can we move this forwa...
[17:17:46] <wikibugs>	 (03PS1) 10Cmjohnson: Adding dns entries for an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469656 (https://phabricator.wikimedia.org/T207192)
[17:17:59] <wikibugs>	 (03PS2) 10Cmjohnson: Adding dns entries for an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469656 (https://phabricator.wikimedia.org/T207192)
[17:19:17] <wikibugs>	 10Operations, 10Elasticsearch, 10Discovery-Search (Current work), 10Patch-For-Review: Refactor current code base to support multiple elasticsearch instances/multiple elasticsearch clusters - https://phabricator.wikimedia.org/T207918 (10EBjune)
[17:20:14] <mutante>	 !log planet - regenerating feeds for 'en' and 'de', others will follow by cron. switching to new theme. replaced bootstrap with bulma. removed jQuery. thanks to paladox
[17:20:16] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:18] <wikibugs>	 (03PS1) 10Cmjohnson: Merge branch 'master' of https://gerrit.wikimedia.org/r/p/operations/dns into mydnschanges [dns] - 10https://gerrit.wikimedia.org/r/469657
[17:20:32] <logmsgbot>	 !log bsitzmann@deploy1001 Started deploy [mobileapps/deploy@95452cf]: Update mobileapps to 58cbdff (T206527)
[17:20:35] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:20:39] <stashbot>	 T206527: [BUG] Citations not being parsed correctly - https://phabricator.wikimedia.org/T206527
[17:21:27] <wikibugs>	 (03Abandoned) 10Cmjohnson: Merge branch 'master' of https://gerrit.wikimedia.org/r/p/operations/dns into mydnschanges [dns] - 10https://gerrit.wikimedia.org/r/469657 (owner: 10Cmjohnson)
[17:22:28] <wikibugs>	 (03Abandoned) 10Cmjohnson: Adding dns entries for an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469656 (https://phabricator.wikimedia.org/T207192) (owner: 10Cmjohnson)
[17:22:36] <wikibugs>	 (03CR) 10Cwhite: "Per T207775, it looks like consensus points to patching check_nrpe making this changeset unnecessary." [puppet] - 10https://gerrit.wikimedia.org/r/469337 (https://phabricator.wikimedia.org/T207775) (owner: 10Dzahn)
[17:24:00] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: mathoid: Add various informational chart values [deployment-charts] - 10https://gerrit.wikimedia.org/r/469658
[17:24:02] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Invert logic for specifying externalIPs [deployment-charts] - 10https://gerrit.wikimedia.org/r/469659
[17:24:04] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: scaffold: Invert the externalIPs inclusion logic [deployment-charts] - 10https://gerrit.wikimedia.org/r/469660
[17:24:06] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: Add chartid to pod labels [deployment-charts] - 10https://gerrit.wikimedia.org/r/469661
[17:24:08] <wikibugs>	 (03PS1) 10Alexandros Kosiaris: WIP: Support canary functionality [deployment-charts] - 10https://gerrit.wikimedia.org/r/469662
[17:24:22] <logmsgbot>	 !log bsitzmann@deploy1001 Finished deploy [mobileapps/deploy@95452cf]: Update mobileapps to 58cbdff (T206527) (duration: 03m 50s)
[17:24:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:24:34] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[17:25:42] <wikibugs>	 (03PS1) 10Bstorm: sonofgridengine: clean up the old roles after moving under wmcs [puppet] - 10https://gerrit.wikimedia.org/r/469663
[17:26:33] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[17:26:53] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[17:27:43] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[17:29:42] <wikibugs>	 (03CR) 10Bstorm: [C: 032] sonofgridengine: clean up the old roles after moving under wmcs [puppet] - 10https://gerrit.wikimedia.org/r/469663 (owner: 10Bstorm)
[17:34:23] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[17:35:24] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=text&var-status_type=5
[17:36:26] <logmsgbot>	 !log aaron@deploy1001 Synchronized php-1.33.0-wmf.1/includes/changetags/ChangeTags.php: 08f8e6a9d7f1dcb281321c5e3a3471169e68348d (duration: 00m 55s)
[17:36:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:42:47] <wikibugs>	 (03PS1) 10Cmjohnson: Adding dns entries an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469664 (https://phabricator.wikimedia.org/T207192)
[17:43:01] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Adding dns entries an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469664 (https://phabricator.wikimedia.org/T207192) (owner: 10Cmjohnson)
[17:46:57] <wikibugs>	 (03PS2) 10Cmjohnson: Adding dns entries an-worker10[78-96] [dns] - 10https://gerrit.wikimedia.org/r/469664 (https://phabricator.wikimedia.org/T207192)
[17:47:57] <wikibugs>	 (03PS2) 10Cmjohnson: Fix records for camera [dns] - 10https://gerrit.wikimedia.org/r/467709 (owner: 10Volans)
[17:48:33] <icinga-wm>	 PROBLEM - Device not healthy -SMART- on aqs1006 is CRITICAL: cluster=aqs device=sde instance=aqs1006:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=aqs1006&var-datasource=eqiad%2520prometheus%252Fops
[17:50:27] <logmsgbot>	 !log aaron@deploy1001 Synchronized php-1.33.0-wmf.1/tests/phpunit/includes/page/WikiPageDbTestBase.php: f3b5a1df116f426c2809f2a266b9d761f15c349f (duration: 00m 55s)
[17:50:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:51:59] <logmsgbot>	 !log smalyshev@deploy1001 Started deploy [wdqs/wdqs@4967dba]: Test deploy new update & scripts
[17:52:01] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:27] <logmsgbot>	 !log smalyshev@deploy1001 Finished deploy [wdqs/wdqs@4967dba]: Test deploy new update & scripts (duration: 00m 28s)
[17:52:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:52:33] <logmsgbot>	 !log aaron@deploy1001 Synchronized php-1.33.0-wmf.1/includes/page/WikiPage.php: f3b5a1df116f426c2809f2a266b9d761f15c349f (duration: 00m 54s)
[17:52:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:53:55] <wikibugs>	 (03PS1) 10Smalyshev: Re-enable kafka on test [puppet] - 10https://gerrit.wikimedia.org/r/469666
[17:56:24] <icinga-wm>	 PROBLEM - Memory correctable errors -EDAC- on wtp2020 is CRITICAL: 6.001 ge 4 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=wtp2020&var-datasource=codfw%2520prometheus%252Fops
[17:58:57] <logmsgbot>	 !log aaron@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/Translate/tag: c5fa239917a870240ec4dcd8a617f0f8033aa9bf (duration: 00m 55s)
[17:58:59] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Time to snap out of that daydream and deploy Morning SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1800).
[18:00:04] <jouncebot>	 stephanebisson and aharoni: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:34] <stephanebisson>	 Hi
[18:00:54] <aharoni>	 Shalom from Jerusalem
[18:02:55] <aharoni>	 who's deploying today?
[18:03:33] <stephanebisson>	 aharoni: Hi! I'll deploy
[18:03:39] <jijiki>	 aharoni: enjoy the lifts on saturday 
[18:03:45] <jijiki>	 :D
[18:04:27] <aharoni>	 (lifts?)
[18:04:39] <jijiki>	 it is said that due to sabbath 
[18:04:48] <jijiki>	 lifts stop on all floors
[18:04:58] <jijiki>	 on saturdays:)
[18:05:30] <aharoni>	 In some buildings, not in mine, thankfully :)
[18:05:35] <aharoni>	 [ https://en.wikipedia.org/wiki/Shabbat_elevator ]
[18:06:27] <jijiki>	 ah yes that one :D
[18:07:02] <wikibugs>	 (03PS3) 10Cmjohnson: Adding dns entries an-worker10[78-95] [dns] - 10https://gerrit.wikimedia.org/r/469664 (https://phabricator.wikimedia.org/T207192)
[18:11:24] <aharoni>	 stephanebisson: so, I have three patches, all of them backports. The one that you've already merged is for wmf/1.33.0-wmf.1 , and it's definitely needed.
[18:11:53] <stephanebisson>	 aharoni: I see one of them is failing jenkins
[18:12:03] <aharoni>	 The other two are perhaps less important, but only if it's certain that the train will run later today and everything will be deployed to all the wikis.
[18:12:16] <aharoni>	 stephanebisson: hmm, it's not supposed to.
[18:12:23] <aharoni>	 zeljkof earlier today said that it's supposed to pass.
[18:12:44] <stephanebisson>	 I've requested a recheck, we'll see
[18:13:00] <stephanebisson>	 The 3rd one is the same as the first one, but for wmf.26, right?
[18:13:07] <aharoni>	 stephanebisson: yes
[18:13:16] <aharoni>	 it depends on the second one
[18:13:26] <stephanebisson>	 wmf.1 is not on group1 yet, the train is delayed
[18:13:34] <wikibugs>	 (03PS1) 10Ottomata: Copy hive-site.xml to HDFS from a normal hive client, not coordinator node [puppet] - 10https://gerrit.wikimedia.org/r/469668
[18:13:37] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10Patch-For-Review, 10User-Elukey: rack/setup/install an-worker10[78-96].eqiad.wmnet - https://phabricator.wikimedia.org/T207192 (10Cmjohnson)
[18:13:58] <aharoni>	 stephanebisson: aha, so it's good to deploy all of them then. let's hope jenkins doesn't give us any more troubles.
[18:14:04] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install ca-worker100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Cmjohnson)
[18:15:05] <wikibugs>	 (03CR) 10Ottomata: [C: 032] "Looks good: https://puppet-compiler.wmflabs.org/compiler1002/13210/" [puppet] - 10https://gerrit.wikimedia.org/r/469668 (owner: 10Ottomata)
[18:15:14] <ottomata>	 joal fyi ^
[18:16:32] <joal>	 Many thanks for that ottomata
[18:17:17] <wikibugs>	 (03CR) 10Smalyshev: [C: 04-1] "pending manual testing" [puppet] - 10https://gerrit.wikimedia.org/r/469666 (owner: 10Smalyshev)
[18:26:09] <aharoni>	 stephanebisson: looks like jenkins is better now
[18:28:44] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[18:30:04] <gehel>	 SMalyshev: ^ that's your test of kafka poller ?
[18:34:39] <stephanebisson>	 aharoni: Your change (469605) is on mwdebug1001, can you test?
[18:35:01] <aharoni>	 stephanebisson: ack
[18:35:49] <aharoni>	 stephanebisson: it's all good, please proceed
[18:36:42] <stephanebisson>	 deploying...
[18:37:26] <logmsgbot>	 !log sbisson@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/ContentTranslation/: SWAT: [[gerrit:469605|Remove the session parameter from AbuseFilter logging]] (duration: 00m 56s)
[18:37:28] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:37:56] <aharoni>	 stephanebisson: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469603/ passed jenkins
[18:38:18] <aharoni>	 stephanebisson: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469608/ will also need +2
[18:38:25] <stephanebisson>	 aharoni: yep, it's next. Actually, should the other 2 be deployed together?
[18:38:34] <aharoni>	 stephanebisson: yes
[18:41:14] <wikibugs>	 (03CR) 10Smalyshev: [C: 031] "Seems to be working ok" [puppet] - 10https://gerrit.wikimedia.org/r/469666 (owner: 10Smalyshev)
[18:42:18] <aharoni>	 stephanebisson: bleh, Jenkins complains about https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469608/
[18:43:10] <stephanebisson>	 aharoni: What's the error, I don't see anything
[18:43:58] <aharoni>	 stephanebisson: https://integration.wikimedia.org/ci/job/mwgate-npm-node-6-docker/53225/console
[18:44:07] <aharoni>	 probably recheck will fix it
[18:54:28] <aharoni>	 stephanebisson: wow, slow Jenkins
[18:54:43] <aharoni>	 https://gerrit.wikimedia.org/r/#/c/469603/ is close to being merged
[18:55:08] <aharoni>	 https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469608/ will probably need a recheck after 469603 is merged
[18:55:53] <wikibugs>	 10Operations, 10Analytics, 10Analytics-Kanban, 10netops: Figure out networking details for new cloud-analytics-eqiad Hadoop/Presto cluster - https://phabricator.wikimedia.org/T207321 (10faidon) So, this is quite the can of worms :) There are several pieces to this, and honestly, I feel like VLANs is kind o...
[18:58:42] <aharoni>	 stephanebisson: does https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/ContentTranslation/+/469608/ need a recheck? One test there is red on https://integration.wikimedia.org/zuul/
[18:59:08] <stephanebisson>	 aharoni: I don't know if we can or should recheck before the job is finished
[19:00:04] <jouncebot>	 twentyafterfour: I, the Bot under the Fountain, allow thee, The Deployer, to do MediaWiki train - Americas version deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T1900).
[19:01:45] <twentyafterfour>	 stephanebisson and aharoni: finish deploying SWAT, I'll wait
[19:02:14] <aharoni>	 stephanebisson: it's done
[19:02:22] <twentyafterfour>	 I don't know if recheck works while it's still running the tests. It should really abort when one test fails IMO
[19:02:27] <wikibugs>	 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: rack/setup/install ca-worker100[1-5].eqiad.wmnet - https://phabricator.wikimedia.org/T207194 (10Ottomata) FYI, networking considerations being worked out in {T207321}
[19:03:20] <stephanebisson>	 twentyafterfour: Thanks, we have a patch to re+2 (I just did). Jenkins is super slow. I don't know how long it's going to be :(
[19:03:49] <twentyafterfour>	 stephanebisson: yeah I've been following along. It's ok.
[19:08:45] <aharoni>	 stephanebisson: https://integration.wikimedia.org/zuul/ red again :(
[19:09:16] <stephanebisson>	 aharoni: Yeah, not good. I think we should abort mission.
[19:09:32] <wikibugs>	 10Operations, 10Wikidata, 10Wikidata-Query-Service, 10Discovery-Search (Current work): wdqs updater should be better isolated from blazegraph and common workload should be shared between servers - https://phabricator.wikimedia.org/T207837 (10Gehel) There are 3 issues here, and maybe they should be addresse...
[19:10:07] <stephanebisson>	 aharoni: Is it common that CX patches need recheck?
[19:10:16] <aharoni>	 stephanebisson: no :/
[19:10:35] <stephanebisson>	 aharoni: how severe is the problem we're trying to fix with those patches?
[19:11:12] <aharoni>	 not something truly urgent. the first one we merged was the really important one.
[19:11:17] <aharoni>	 (merged and deployed)
[19:11:19] <stephanebisson>	 twentyafterfour: what are the odds that group2 will be on wmf.1 today or this week?
[19:11:28] <aharoni>	 is the train running now for all the groups?
[19:11:56] <twentyafterfour>	 stephanebisson: It all depends on whether all the patches really fix the blockers or if an issue still remains.
[19:12:21] <twentyafterfour>	 I'm not sure yet, I am afraid that T207881 might still remain 
[19:12:22] <stashbot>	 T207881: excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1  - https://phabricator.wikimedia.org/T207881
[19:12:50] <stephanebisson>	 aharoni: so the second patch was merged but not deployed. Should we revert it or deploy it now?
[19:12:52] <twentyafterfour>	 aharoni: I will run group1 and if everything looks good then we'll go to group2
[19:13:07] <aharoni>	 stephanebisson: deploy please
[19:13:11] <stephanebisson>	 ok
[19:13:24] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[19:14:41] <stephanebisson>	 aharoni: it's on mwdebug1001, can you test?
[19:15:30] <wikibugs>	 (03PS1) 10Gehel: wdqs-test: switch to kafka poller [puppet] - 10https://gerrit.wikimedia.org/r/469676
[19:16:16] <aharoni>	 stephanebisson: ack, looking, will be very quick
[19:18:14] <wikibugs>	 (03PS2) 10Smalyshev: Re-enable kafka on test [puppet] - 10https://gerrit.wikimedia.org/r/469666
[19:19:35] <wikibugs>	 (03CR) 10Gehel: [C: 032] wdqs-test: switch to kafka poller [puppet] - 10https://gerrit.wikimedia.org/r/469676 (owner: 10Gehel)
[19:19:45] <aharoni>	 stephanebisson: all good
[19:19:54] <stephanebisson>	 deploying...
[19:20:54] <logmsgbot>	 !log sbisson@deploy1001 Synchronized php-1.32.0-wmf.26/extensions/ContentTranslation/: SWAT: [[gerrit:469603|Add detailed logging for AbuseFilter]] (duration: 00m 56s)
[19:20:56] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:20:59] <stephanebisson>	 doe
[19:21:02] <stephanebisson>	 done
[19:21:09] <stephanebisson>	 And that concludes SWAT
[19:21:17] <wikibugs>	 (03PS3) 10Smalyshev: Re-enable kafka on test [puppet] - 10https://gerrit.wikimedia.org/r/469666
[19:21:29] <stephanebisson>	 twentyafterfour: ^ thanks for your patience
[19:21:46] <twentyafterfour>	 stephanebisson: You're welcome. 
[19:21:59] <twentyafterfour>	 Thanks for swatting. 
[19:22:31] <wikibugs>	 (03CR) 10Gehel: [C: 032] Re-enable kafka on test [puppet] - 10https://gerrit.wikimedia.org/r/469666 (owner: 10Smalyshev)
[19:23:19] <twentyafterfour>	 !log beginning mediawiki train. Will start with group1 and then monitor the situation for a few minutes. If everything looks good then we go to group2.
[19:23:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:26:00] <logmsgbot>	 !log twentyafterfour@deploy1001 Started scap: full sync to be sure that 1.33.0-wmf.1 is fully deployed
[19:26:02] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:27:08] <wikibugs>	 (03PS1) 10Gehel: wdqs: switch to kafka updater for wdqs internal and public clusters [puppet] - 10https://gerrit.wikimedia.org/r/469679
[19:27:54] <wikibugs>	 (03CR) 10Gehel: [C: 032] wdqs: switch to kafka updater for wdqs internal and public clusters [puppet] - 10https://gerrit.wikimedia.org/r/469679 (owner: 10Gehel)
[19:29:44] <twentyafterfour>	 So I've never seen this before:  BUG: Bad page map in process hhvm 
[19:31:10] <twentyafterfour>	 looks like they are all from mw1272
[19:31:47] <twentyafterfour>	 mutante: you !logged this error in 2016, any idea what was the cause or the solution?  Should we reboot mw1272? 
[19:33:02] <wikibugs>	 (03PS2) 1020after4: Add .gitreview [software/keyholder] - 10https://gerrit.wikimedia.org/r/460698 (owner: 10Hashar)
[19:33:16] <wikibugs>	 (03CR) 1020after4: [C: 032] Add .gitreview [software/keyholder] - 10https://gerrit.wikimedia.org/r/460698 (owner: 10Hashar)
[19:33:29] <mutante>	 twentyafterfour: no, i don't. yes, we can reboot it
[19:33:54] <wikibugs>	 (03Merged) 10jenkins-bot: Add .gitreview [software/keyholder] - 10https://gerrit.wikimedia.org/r/460698 (owner: 10Hashar)
[19:34:00] <mutante>	 i see you made a ticket already, ack
[19:34:03] <icinga-wm>	 RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 643 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[19:34:54] <twentyafterfour>	 yeah that was before I noticed it was all one server. Probably good to document the problem anyway 
[19:36:11] <wikibugs>	 10Operations, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to production servers (mwlog*, mmaint* ?) for kharlan - https://phabricator.wikimedia.org/T207330 (10kostajh) @MoritzMuehlenhoff I'm sorry about that. Here's the new public key:  ``` ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHZqJOizHso9Yld...
[19:43:25] <icinga-wm>	 PROBLEM - Apache HTTP on mw2142 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:44:24] <icinga-wm>	 RECOVERY - Apache HTTP on mw2142 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.137 second response time
[19:44:40] <twentyafterfour>	 hmm 
[19:45:03] <mutante>	 !log mw1272 - depooled
[19:45:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:45:14] <icinga-wm>	 PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[19:45:42] <twentyafterfour>	 ^ that memcached alert has been showing up occasionally 
[19:46:14] <twentyafterfour>	 it seems like error rate spikes periodically 
[19:47:02] <mutante>	 !log mw1272 - depooled, restarting hhvm (T207983)
[19:47:05] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:47:07] <stashbot>	 T207983: BUG: Bad page map in process hhvm - https://phabricator.wikimedia.org/T207983
[19:47:33] <icinga-wm>	 RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen
[19:50:02] <twentyafterfour>	 uh, error rates just shot up again and I didn't even go to group1 yet
[19:51:10] <mutante>	 !log mw1272 - rebooting (a stop job is running for HHVM PH/Hack runtime) (T207983)
[19:51:13] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:36] <wikibugs>	 (03PS4) 10Sbisson: Enable PageTriage/Copyvio on enwiki betalabs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469436
[19:54:22] <mutante>	 !log mw1272 - repooled (T207983)
[19:54:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:26] <stashbot>	 T207983: BUG: Bad page map in process hhvm - https://phabricator.wikimedia.org/T207983
[19:55:03] <icinga-wm>	 PROBLEM - Apache HTTP on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[19:55:03] <icinga-wm>	 PROBLEM - HHVM rendering on mw1270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[19:56:04] <icinga-wm>	 RECOVERY - Apache HTTP on mw1270 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.038 second response time
[19:56:04] <icinga-wm>	 RECOVERY - HHVM rendering on mw1270 is OK: HTTP OK: HTTP/1.1 200 OK - 74206 bytes in 0.103 second response time
[19:56:14] <wikibugs>	 (03PS1) 10Gehel: wdqs: remove wdqs1006 from internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469685
[19:56:16] <wikibugs>	 (03PS1) 10Gehel: wdqs: add wdqs1006 to public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469686
[19:56:18] <wikibugs>	 (03PS1) 10Gehel: wdqs: remove wdqs1003 from public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469687
[19:56:20] <wikibugs>	 (03PS1) 10Gehel: wdqs: add wdqs1003 to internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469688
[19:56:45] <mutante>	 twentyafterfour: since reboot i have not seen the "BUG" line anymore yet
[19:57:03] <twentyafterfour>	 mutante: yeah it was probably cosmic rays 
[19:57:43] <wikibugs>	 (03CR) 10Sbisson: [C: 032] "Per Roan" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469436 (owner: 10Sbisson)
[19:58:01] <twentyafterfour>	 I'm more worried about the recurring alerts for 503 and the high rate of sql lock wait timeouts 
[19:58:26] <twentyafterfour>	 T207881 
[19:58:27] <stashbot>	 T207881: excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1  - https://phabricator.wikimedia.org/T207881
[19:58:43] <twentyafterfour>	 this is still happening even with wmf.1 at group0 
[19:59:07] <wikibugs>	 (03Merged) 10jenkins-bot: Enable PageTriage/Copyvio on enwiki betalabs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469436 (owner: 10Sbisson)
[20:02:23] <icinga-wm>	 PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:02:58] <logmsgbot>	 !log twentyafterfour@deploy1001 Finished scap: full sync to be sure that 1.33.0-wmf.1 is fully deployed (duration: 36m 57s)
[20:03:00] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:03:43] <icinga-wm>	 PROBLEM - Check systemd state on wdqs1010 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:04:01] <twentyafterfour>	 uhh
[20:04:47] <twentyafterfour>	 !log still haven't deployed wmf.1, yet error rate increased and icinga is alerting about mediawiki exceptions + wdqs1010 degraded
[20:04:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:05:54] <icinga-wm>	 RECOVERY - Check systemd state on wdqs1010 is OK: OK - running: The system is fully operational
[20:06:44] <icinga-wm>	 RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[20:08:41] <Zoranzoki21>	 Hi, I have problems
[20:08:53] <Zoranzoki21>	 zoran@zoran-notebook:~/development/mediawiki$ git fetch
[20:08:53] <Zoranzoki21>	 Terminated
[20:08:53] <Zoranzoki21>	 zoran@zoran-notebook:~/development/mediawiki$ git fetch && git pull
[20:08:53] <Zoranzoki21>	 packet_write_wait: Connection to 208.80.154.85 port 29418: Broken pipe
[20:09:02] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Language-Team (Language-2018-October-December), and 3 others: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10mmodell) Trizek-WMF: Can yo...
[20:10:15] <Zoranzoki21>	 What should I do?
[20:10:20] <Zoranzoki21>	 I again got Broken pipe error
[20:10:24] <wikibugs>	 10Operations, 10docker-pkg, 10Patch-For-Review: Allow selecting which images to build - https://phabricator.wikimedia.org/T186416 (10hashar) 05Open>03Resolved a:03Joe Can now be done by using `--select`
[20:10:34] <wikibugs>	 10Operations, 10Cloud-VPS, 10cloud-services-team (Kanban): Move various support services for Cloud VPS currently in prod into their own instances - https://phabricator.wikimedia.org/T207536 (10ayounsi) Should the next step here to make an exhaustive list of the "support services" indicating: server, applicat...
[20:12:55] <wikibugs>	 10Operations, 10Security-Team, 10Wikimedia-Site-requests, 10Patch-For-Review: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10Tgr) > Based on mw.org, it seems very roughly like a wiki the size of mw.org gets about 30 hits/minute on average, with ocassional spikes to 15...
[20:14:27] <wikibugs>	 (03CR) 10jenkins-bot: Enable PageTriage/Copyvio on enwiki betalabs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469436 (owner: 10Sbisson)
[20:17:36] <wikibugs>	 10Operations, 10Security-Team, 10Wikimedia-Site-requests, 10Patch-For-Review: Enable csp-report-only mode everywhere - https://phabricator.wikimedia.org/T207900 (10Bawolff) > I'd say lets go ahead and do group0 + group1 and see where we're at. Also we have icinga alerts for logstash dropping packets now, i...
[20:18:40] <twentyafterfour>	 Zoranzoki21: I'm not sure 
[20:19:30] <twentyafterfour>	 Zoranzoki21: it works for me
[20:20:11] <Zoranzoki21>	 twentyafterfour: I tried with another computer (different IP and internet provider) with the same settings. The same thing happens
[20:20:23] <twentyafterfour>	 weird 
[20:20:35] <Zoranzoki21>	 twentyafterfour: Let's talk on releng
[20:20:36] <paladox>	 twentyafterfour seems to be an internal error
[20:21:19] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (10Dzahn) The patched source package has been imported into apt.wikimedia.org   ``` [install1002:~] $ sudo -i reprepro l...
[20:25:05] <twentyafterfour>	 Oct 25 20:21:33 cobalt java[18243]: log4j:WARN Detected problem with connection: java.net.SocketException: Broken pipe (Write failed)
[20:25:15] <twentyafterfour>	 that's not very helpful 
[20:25:24] <twentyafterfour>	 but that's the only java log entry I see
[20:25:47] <paladox>	 oh
[20:26:20] <wikibugs>	 (03PS1) 1020after4: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469709
[20:26:23] <wikibugs>	 (03CR) 1020after4: [C: 032] group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469709 (owner: 1020after4)
[20:26:25] <wikibugs>	 (03CR) 10Dzahn: "since this is elukey's suggestion i think he'd also be the best reviewer here" [puppet] - 10https://gerrit.wikimedia.org/r/468865 (https://phabricator.wikimedia.org/T184261) (owner: 10GTirloni)
[20:28:05] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469709 (owner: 1020after4)
[20:30:07] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group1 wikis to 1.33.0-wmf.1  refs T206655
[20:30:12] <wikibugs>	 (03CR) 10jenkins-bot: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469709 (owner: 1020after4)
[20:30:21] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:30:25] <stashbot>	 T206655: 1.33.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T206655
[20:31:02] <logmsgbot>	 !log twentyafterfour@deploy1001 Synchronized php: group1 wikis to 1.33.0-wmf.1  refs T206655 (duration: 00m 54s)
[20:31:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:32:24] <twentyafterfour>	 !log db error rate increased again. rolling back 
[20:32:26] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:33:08] <wikibugs>	 (03Abandoned) 10Dzahn: icinga: on stretch, tell rsyslog to discard logs from check_nrpe [puppet] - 10https://gerrit.wikimedia.org/r/469337 (https://phabricator.wikimedia.org/T207775) (owner: 10Dzahn)
[20:34:42] <wikibugs>	 (03PS1) 1020after4: group1 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469711
[20:34:46] <wikibugs>	 (03CR) 1020after4: [C: 032] group1 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469711 (owner: 1020after4)
[20:35:41] <wikibugs>	 10Operations, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10Mholloway) Tilerator should be resilient to attempting to access a locked DB resource.  I'd rather see us handle thi...
[20:35:52] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469711 (owner: 1020after4)
[20:39:00] <wikibugs>	 (03Abandoned) 10Dzahn: icinga/etcd: /var/run/icinga/ -> /var/run/nagios/ [puppet] - 10https://gerrit.wikimedia.org/r/467017 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[20:48:01] <twentyafterfour>	 !log staying at group1, error rate seems to have stabilized 
[20:48:03] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:48:54] <wikibugs>	 (03PS1) 1020after4: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469769
[20:48:56] <wikibugs>	 (03CR) 1020after4: [C: 032] group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469769 (owner: 1020after4)
[20:50:52] <wikibugs>	 (03Merged) 10jenkins-bot: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469769 (owner: 1020after4)
[20:53:49] <wikibugs>	 (03PS1) 10Ayounsi: DNS: assign public /29 for cloud-instance-transport1-b-codfw [dns] - 10https://gerrit.wikimedia.org/r/469771 (https://phabricator.wikimedia.org/T207663)
[20:54:10] <wikibugs>	 (03CR) 10jenkins-bot: group1 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469711 (owner: 1020after4)
[20:54:12] <wikibugs>	 (03CR) 10jenkins-bot: group1 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469769 (owner: 1020after4)
[20:57:44] <musikanimal>	 anomie: are we in the middle of backfilling comments? I see some old revisions have a comment_id of 0 but rev_comment is non-empty
[20:59:00] <wikibugs>	 (03CR) 10Smalyshev: [C: 031] wdqs: remove wdqs1006 from internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469685 (owner: 10Gehel)
[20:59:06] <wikibugs>	 (03CR) 10Smalyshev: [C: 031] wdqs: add wdqs1006 to public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469686 (owner: 10Gehel)
[20:59:31] <anomie>	 musikanimal: We haven't started backfilling comments yet. That's the next step, which is possibly blocked on T189158.
[20:59:32] <stashbot>	 T189158: Change `image` view to properly expose the new `img_description_id` field - https://phabricator.wikimedia.org/T189158
[20:59:40] <wikibugs>	 10Operations, 10MediaWiki Language Extension Bundle, 10MediaWiki-extensions-Translate, 10Language-Team (Language-2018-October-December), and 3 others: Moving or deleting a translatable page on mediawiki.org triggers an error message - https://phabricator.wikimedia.org/T207930 (10cscott) Verified that {rETR...
[21:01:04] <wikibugs>	 (03PS1) 10Faidon Liambotis: nfs-exportd: switch (back) from socket to ipaddress [puppet] - 10https://gerrit.wikimedia.org/r/469772
[21:01:04] <wikibugs>	 (03PS1) 10Faidon Liambotis: nfs-exportd: remove unused parameters from Project [puppet] - 10https://gerrit.wikimedia.org/r/469773
[21:01:08] <wikibugs>	 (03PS1) 10Faidon Liambotis: nfs-exportd: remove the Project class [puppet] - 10https://gerrit.wikimedia.org/r/469774
[21:01:32] <musikanimal>	 rats. Was kind of hoping it'd write to both rev_comment and comment_text until backfilling is done :/
[21:07:02] <wikibugs>	 (03CR) 10Dzahn: "> the unit is can be found at /run/systemd/generator.late/icinga.service. Not sure where it comes from though" [puppet] - 10https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[21:07:39] <SMalyshev>	 is there any script on prod machines that allows to set downtime easily without digging through icinga in browser?
[21:10:48] <wikibugs>	 10Operations, 10Analytics, 10EventBus, 10Wikidata, and 7 others: WDQS Updater ran into issue and stopped working - https://phabricator.wikimedia.org/T207817 (10mobrovac)
[21:13:24] <wikibugs>	 10Operations, 10Core Platform Team Backlog (Watching / External), 10HHVM, 10Patch-For-Review, and 2 others: Migrate to PHP 7 in WMF production - https://phabricator.wikimedia.org/T176370 (10MGChecker)
[21:16:32] <wikibugs>	 (03PS1) 1020after4: group2 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469779
[21:16:34] <wikibugs>	 (03CR) 1020after4: [C: 032] group2 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469779 (owner: 1020after4)
[21:17:15] <wikibugs>	 10Operations, 10CommRel-Specialists-Support (Oct-Dec-2018), 10User-Johan: Lessons learned: Communicating the server switch 2018 - https://phabricator.wikimedia.org/T206649 (10Johan) Written, sent to Seddon to make sure the CentralNotice part of it makes sense.
[21:19:55] <wikibugs>	 10Operations, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10MSantos) >>! In T204047#4695804, @Mholloway wrote: > Tilerator should be resilient to attempting to access a locked...
[21:20:09] <wikibugs>	 (03Merged) 10jenkins-bot: group2 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469779 (owner: 1020after4)
[21:21:04] <wikibugs>	 (03PS1) 10Cwhite: icinga: install nsca_frack.cfg in objects on stretch [puppet] - 10https://gerrit.wikimedia.org/r/469780 (https://phabricator.wikimedia.org/T202782)
[21:21:23] <wikibugs>	 10Operations, 10Cloud-Services, 10netops, 10Patch-For-Review: Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663 (10ayounsi) Thanks for investigating it!  See https://gerrit.wikimedia.org/r/c/operations/dns/+/469771 for the IPs, I took the same model as the...
[21:25:00] <wikibugs>	 (03CR) 10Bstorm: [C: 04-1] "Overall, doing this causes an unneeded rewrite of two functions as well." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469774 (owner: 10Faidon Liambotis)
[21:25:03] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group2 wikis to 1.33.0-wmf.1  refs T206655
[21:25:06] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:25:07] <stashbot>	 T206655: 1.33.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T206655
[21:25:26] <wikibugs>	 (03CR) 10Ayounsi: [C: 032] DNS: assign public /29 for cloud-instance-transport1-b-codfw [dns] - 10https://gerrit.wikimedia.org/r/469771 (https://phabricator.wikimedia.org/T207663) (owner: 10Ayounsi)
[21:28:19] <wikibugs>	 (03PS1) 1020after4: group2 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469781
[21:28:23] <wikibugs>	 (03CR) 1020after4: [C: 032] group2 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469781 (owner: 1020after4)
[21:29:45] <XioNoX>	 !log configure 208.80.153.185/29 on cr1/2-codfw - T207663
[21:29:48] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:29:49] <stashbot>	 T207663: Renumber cloud-instance-transport1-b-eqiad to public IPs - https://phabricator.wikimedia.org/T207663
[21:30:42] <wikibugs>	 (03CR) 10jenkins-bot: group2 wikis to 1.33.0-wmf.1  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469779 (owner: 1020after4)
[21:30:56] <wikibugs>	 (03Merged) 10jenkins-bot: group2 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469781 (owner: 1020after4)
[21:31:10] <wikibugs>	 (03CR) 10jenkins-bot: group2 wikis to 1.32.0-wmf.26  refs T206655 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469781 (owner: 1020after4)
[21:35:45] <logmsgbot>	 !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: rolling back group1 refs T206655 T208000
[21:35:49] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:35:49] <stashbot>	 T208000: Parser.php: Call to a member function get() on a non-object (null) - https://phabricator.wikimedia.org/T208000
[21:35:50] <stashbot>	 T206655: 1.33.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T206655
[21:38:32] <wikibugs>	 (03PS1) 10Bstorm: sonofgridengine: prepare web exec profiles for grid [puppet] - 10https://gerrit.wikimedia.org/r/469783 (https://phabricator.wikimedia.org/T200557)
[21:44:52] <wikibugs>	 (03CR) 10Thcipriani: [C: 032] Switch to Construct for the SSH agent protocol (032 comments) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 (owner: 10Faidon Liambotis)
[21:45:33] <wikibugs>	 (03CR) 10Thcipriani: Switch to Construct for the SSH agent protocol [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 (owner: 10Faidon Liambotis)
[21:45:35] <wikibugs>	 (03PS4) 10Thcipriani: Switch to Construct for the SSH agent protocol [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 (owner: 10Faidon Liambotis)
[21:45:55] <wikibugs>	 (03CR) 10Thcipriani: [C: 032] "Let's try that again." [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 (owner: 10Faidon Liambotis)
[21:46:51] <wikibugs>	 (03Merged) 10jenkins-bot: Switch to Construct for the SSH agent protocol [software/keyholder] - 10https://gerrit.wikimedia.org/r/458233 (owner: 10Faidon Liambotis)
[21:47:16] <wikibugs>	 (03CR) 10Cwhite: [C: 031] "Looks like metrics are embedded within pybal itself." [puppet] - 10https://gerrit.wikimedia.org/r/469593 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff)
[21:47:39] <wikibugs>	 (03PS3) 10Thcipriani: Split handle_client_request() into multiple methods [software/keyholder] - 10https://gerrit.wikimedia.org/r/458234 (owner: 10Faidon Liambotis)
[21:47:41] <wikibugs>	 (03CR) 10Cwhite: [C: 031] Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff)
[21:48:42] <wikibugs>	 (03CR) 10Bstorm: [C: 031] "Oh very nice!  I missed that when I upgraded it to python3 again." [puppet] - 10https://gerrit.wikimedia.org/r/469772 (owner: 10Faidon Liambotis)
[21:50:13] <wikibugs>	 10Operations, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10Mholloway) If we're comfortable letting tile generation fail during the period populate_admin() runs (I'm not sure w...
[21:51:18] <wikibugs>	 (03CR) 10Thcipriani: [C: 032] Split handle_client_request() into multiple methods [software/keyholder] - 10https://gerrit.wikimedia.org/r/458234 (owner: 10Faidon Liambotis)
[21:51:58] <wikibugs>	 (03Merged) 10jenkins-bot: Split handle_client_request() into multiple methods [software/keyholder] - 10https://gerrit.wikimedia.org/r/458234 (owner: 10Faidon Liambotis)
[21:52:16] <wikibugs>	 (03PS3) 10Thcipriani: Stop referring to the daemon as a "proxy" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458235 (owner: 10Faidon Liambotis)
[21:52:22] <wikibugs>	 (03CR) 10Faidon Liambotis: "I'm not familiar enough to test or validate, so I'd prefer it you +2ed/merged instead :)" [puppet] - 10https://gerrit.wikimedia.org/r/469772 (owner: 10Faidon Liambotis)
[21:53:02] <wikibugs>	 (03CR) 10Bstorm: [C: 031] "> Patch Set 1:" [puppet] - 10https://gerrit.wikimedia.org/r/469772 (owner: 10Faidon Liambotis)
[21:56:44] <wikibugs>	 (03CR) 10Thcipriani: [C: 032] "Patch works. Description should also be updated in setup.py at some point." (031 comment) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458235 (owner: 10Faidon Liambotis)
[21:57:19] <wikibugs>	 (03CR) 10Bstorm: [C: 032] nfs-exportd: switch (back) from socket to ipaddress [puppet] - 10https://gerrit.wikimedia.org/r/469772 (owner: 10Faidon Liambotis)
[21:57:25] <wikibugs>	 (03CR) 10Faidon Liambotis: "Yeah, I noticed that and was actually wondering whether I should tag this as RFC :)" [puppet] - 10https://gerrit.wikimedia.org/r/469774 (owner: 10Faidon Liambotis)
[21:57:32] <wikibugs>	 (03Merged) 10jenkins-bot: Stop referring to the daemon as a "proxy" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458235 (owner: 10Faidon Liambotis)
[21:57:35] <wikibugs>	 (03Abandoned) 10Faidon Liambotis: nfs-exportd: remove the Project class [puppet] - 10https://gerrit.wikimedia.org/r/469774 (owner: 10Faidon Liambotis)
[21:59:18] <wikibugs>	 (03PS3) 10Thcipriani: Implement all the SSH agent bits and stop proxying [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236 (owner: 10Faidon Liambotis)
[21:59:26] <wikibugs>	 10Operations, 10Maps (Tilerator), 10Patch-For-Review, 10Reading-Infrastructure-Team-Backlog (Kanban): investigate tilerator crash on maps eqiad - https://phabricator.wikimedia.org/T204047 (10MSantos) >>! In T204047#4696074, @Mholloway wrote: > If we're comfortable letting tile generation fail during the pe...
[22:03:19] <wikibugs>	 (03PS2) 10Bstorm: sonofgridengine: prepare web exec profiles for grid [puppet] - 10https://gerrit.wikimedia.org/r/469783 (https://phabricator.wikimedia.org/T200557)
[22:05:09] <wikibugs>	 (03CR) 10Bstorm: [C: 032] sonofgridengine: prepare web exec profiles for grid [puppet] - 10https://gerrit.wikimedia.org/r/469783 (https://phabricator.wikimedia.org/T200557) (owner: 10Bstorm)
[22:13:35] <wikibugs>	 (03CR) 10Bstorm: "It looks like something was intended there but never was used... Will look at this some more." [puppet] - 10https://gerrit.wikimedia.org/r/469773 (owner: 10Faidon Liambotis)
[22:25:34] <wikibugs>	 (03PS1) 10Bstorm: sonofgridengine: Add new roles for stretch grid web nodes [puppet] - 10https://gerrit.wikimedia.org/r/469790 (https://phabricator.wikimedia.org/T200557)
[22:27:43] <wikibugs>	 (03PS1) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143)
[22:30:06] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "thanks!  https://puppet-compiler.wmflabs.org/compiler1002/13211/" [puppet] - 10https://gerrit.wikimedia.org/r/469780 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite)
[22:30:49] <wikibugs>	 (03PS2) 10Dzahn: icinga: install nsca_frack.cfg in objects on stretch [puppet] - 10https://gerrit.wikimedia.org/r/469780 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite)
[22:32:24] <wikibugs>	 (03PS2) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143)
[22:34:21] <wikibugs>	 (03CR) 10Mobrovac: "PCC OK - https://puppet-compiler.wmflabs.org/compiler1002/13213/" [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143) (owner: 10Mobrovac)
[22:41:59] <wikibugs>	 (03CR) 10Dzahn: [C: 032] "we have 2852 hosts on both jessie and stretch now :)" [puppet] - 10https://gerrit.wikimedia.org/r/469780 (https://phabricator.wikimedia.org/T202782) (owner: 10Cwhite)
[22:52:57] <wikibugs>	 (03PS4) 10Dzahn: icinga: on stretch, use systemd::service, unit file by systemd-sysv-generator [puppet] - 10https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782)
[22:53:51] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] icinga: on stretch, use systemd::service, unit file by systemd-sysv-generator [puppet] - 10https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[22:54:01] <addshore>	 jouncebot: now
[22:54:01] <jouncebot>	 No deployments scheduled for the next 0 hour(s) and 5 minute(s)
[22:54:04] <addshore>	 jouncebot: Nemo_bis 
[22:54:06] <addshore>	 :/
[22:54:09] <addshore>	 jouncebot: next
[22:54:10] <jouncebot>	 In 0 hour(s) and 5 minute(s): Evening SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T2300)
[22:59:32] <James_F>	 twentyafterfour: What code for CN is actually in prod?
[23:00:04] <jouncebot>	 addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181025T2300).
[23:00:04] <jouncebot>	 AndyRussG and MaxSem: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:29] <wikibugs>	 (03CR) 10Cwhite: [C: 031] "Based on the experience with NSCA, it's likely that systemd is generating the unit on installation.  We should manage it." [puppet] - 10https://gerrit.wikimedia.org/r/462600 (https://phabricator.wikimedia.org/T202782) (owner: 10Dzahn)
[23:00:44] <twentyafterfour>	 James_F: what do you mean? what version of CN? 
[23:00:53] <James_F>	 Yeah.
[23:01:09] <James_F>	 I'm reading the code off deployment.eqiad.wmnet now.
[23:01:25] <James_F>	 It doesn't match master.
[23:01:34] <MaxSem>	 Here. I can deploy
[23:01:50] <MaxSem>	 AndyRussG: yt?
[23:01:55] <AndyRussG>	 James_F: it's the head of the wmf_deploy
[23:01:57] <AndyRussG>	 MaxSem: yep!
[23:02:02] <James_F>	 AndyRussG: Eurgh.
[23:02:11] <AndyRussG>	 James_F: well put ;p
[23:02:14] <James_F>	 AndyRussG: This deprecated code was fixed four months ago in I5205ec0d96cb06087624f2cf8d83b8ae2256df0e.
[23:02:16] <AndyRussG>	 there's a task for that
[23:02:20] <James_F>	 AndyRussG: This is Not Helpful™.
[23:02:21] <AndyRussG>	 yes
[23:02:25] <AndyRussG>	 yes
[23:02:39] <AndyRussG>	 apologies
[23:02:47] <twentyafterfour>	 James_F: uh 
[23:02:52] <MaxSem>	 No commit to deploy
[23:03:00] <AndyRussG>	 MaxSem: hmm?
[23:03:09] <James_F>	 AndyRussG: Should I cherry-pick the fix for the UBN to the wmf-deploy branch?
[23:03:11] <bawolff_>	 CentralNotice running super old version is nothing new
[23:03:21] <AndyRussG>	 James_F: I did
[23:03:25] <MaxSem>	 Never mind
[23:03:45] <AndyRussG>	 https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralNotice/+/469794/
[23:03:48] <James_F>	 Ta.
[23:03:50] <AndyRussG>	 Lemme see what the -1 is
[23:03:55] <AndyRussG>	 I did test it locally
[23:04:17] <James_F>	 bawolff_: For some values of "super".
[23:04:32] <AndyRussG>	 MaxSem: James_F: the -1 on the Gerrit change is just a flapping QUnit test
[23:04:37] <AndyRussG>	 nothing changed there
[23:06:02] <James_F>	 twentyafterfour: Sorry about this. :-(
[23:06:17] <twentyafterfour>	 James_F: no apologies necessary 
[23:06:19] <wikibugs>	 (03CR) 10Thcipriani: [C: 04-1] "Couple of minor problems in the keyholder bash script" (032 comments) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236 (owner: 10Faidon Liambotis)
[23:06:59] <AndyRussG>	 James_F: twentyafterfour all the apologies are mine 8p
[23:07:16] <AndyRussG>	 Thanks for making that task! I hadn't seen it...
[23:07:31] <bawolff_>	 Why was this deprecated and removed so quickly anyways?
[23:08:50] <James_F>	 I screwed up.
[23:08:57] <James_F>	 bawolff_: Over six months?
[23:09:14] <bawolff_>	 According to the task it was soft-deprecated in 1.31
[23:09:40] <James_F>	 Yes.
[23:09:48] <bawolff_>	 usually there's at least one release of soft-deprecation, and one release of hard-deprecated, at least for a super commonly called method
[23:09:51] <James_F>	 Soft in 1.31, hard in 1.32, removed in 1.33.
[23:10:04] <James_F>	 We're in 1.33 now.
[23:10:07] <bawolff_>	 Oh right, we're 1.33
[23:10:12] <bawolff_>	 sorry, forgot
[23:10:29] <bawolff_>	 That's a much more reasonable deprecation path
[23:10:31] <James_F>	 But I should have remembered that codesearch lies about things by assuming master is used. :-)
[23:12:44] <wikibugs>	 (03CR) 10Alexandros Kosiaris: [C: 04-1] service::node: Set config-vars.yaml's mode to 0440 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143) (owner: 10Mobrovac)
[23:13:00] <wikibugs>	 (03PS1) 10Addshore: Define and specify lexeme NS for wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469796
[23:13:07] <wikibugs>	 (03CR) 10Thcipriani: [C: 031] Add permission checks for various commands [software/keyholder] - 10https://gerrit.wikimedia.org/r/458240 (owner: 10Faidon Liambotis)
[23:13:31] <wikibugs>	 (03PS2) 10Addshore: Define and specify lexeme NS for wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469796
[23:13:49] <addshore>	 \o could someone give me a heads up once swat is done? :)
[23:14:03] <addshore>	 James_F: we can still try to get it turned on today :)
[23:14:10] <James_F>	 addshore: OK…
[23:15:35] <bawolff_>	 After you are done that, there is a thingy I want to deploy as well
[23:15:56] <addshore>	 is swat happening?
[23:15:58] <AndyRussG>	 bawolff_: James_F all definitely my fault for not pushing out these accumulated CN changes sooner, and also for not getting to fixing the CN deploy setup
[23:16:01] <James_F>	 bawolff_: You first, we're deploying stuff to Beta Cluster so need to wait.
[23:16:11] <addshore>	 James_F: also, https://phabricator.wikimedia.org/T207683, new property should also not be listed etc right?
[23:16:30] <bawolff_>	 James_F: I'm not ready yet, and I still have to do some things first, you should definitely go first
[23:16:30] <AndyRussG>	 addshore: I think MaxSem is deploying?
[23:16:35] <addshore>	 AndyRussG: ack
[23:16:46] <wikibugs>	 (03CR) 10WMDE-leszek: [C: 031] "looks legit" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469796 (owner: 10Addshore)
[23:16:57] <MaxSem>	 Deploying is such a shiny word for "waiting for Zuul"
[23:17:10] <addshore>	 hehehe
[23:17:15] <addshore>	 *agrees*
[23:17:20] <addshore>	 well, jenkins ;)
[23:17:47] <James_F>	 AndyRussG: Totally understandable. It's just that each team has a special thing it does that makes every other team have difficulties (FR has slow releases for security/stability; Web has webpack pre-built files; Growth has templates(!); Perf has deploy-on-Sunday-night; Lang has support-for-two-years; etc.)
[23:18:12] <James_F>	 Each team makes a local-maximum choice, for good reasons, it just disrupts the rest of us. :-(
[23:19:39] <MaxSem>	 Fire, fire! Kill it all with fire! :P
[23:20:36] <wikibugs>	 (03CR) 10Thcipriani: [C: 04-1] "> 2. Might want to check permissions before handling remove(_all) or" [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236 (owner: 10Faidon Liambotis)
[23:22:30] <AndyRussG>	 James_F: yep... mmm one sec, lemme find that Fab task
[23:23:39] <AndyRussG>	 James_F: here's at least one (maybe there's more): https://phabricator.wikimedia.org/T113428
[23:26:11] <wikibugs>	 (03CR) 10Jforrester: [C: 031] Define and specify lexeme NS for wikidatawiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469796 (owner: 10Addshore)
[23:26:12] <logmsgbot>	 !log maxsem@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/GlobalPreferences/: https://gerrit.wikimedia.org/r/c/469793/ (duration: 00m 58s)
[23:26:17] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:27:19] <James_F>	 addshore: Also, yes, we don't want CreateProperty on Commons right now, but it's OK to have it listed (as long as it's inoperable), as we will want it quite soon.
[23:27:47] <addshore>	 James_F: "as long as it's inoperable", yeh, for that reason I think it is safer to just get rid of it for now :)
[23:27:51] <addshore>	 it's less scary that way
[23:28:01] <James_F>	 addshore: Focussing on the actually-wrong things in T207683
[23:28:03] <stashbot>	 T207683: Wikibase Repo api modules and special pages should be conditionally loaded based on entity types enabled? - https://phabricator.wikimedia.org/T207683
[23:28:57] <MaxSem>	 AndyRussG: pulled on mwdebug1002
[23:29:42] <AndyRussG>	 MaxSem: ok checking
[23:32:02] <AndyRussG>	 MaxSem: internal server error
[23:32:13] <AndyRussG>	 Also the request took a long time
[23:32:31] <MaxSem>	 Are you testing on wmf.26?
[23:32:39] <AndyRussG>	 MaxSem: oh wait now it worked
[23:32:41] <TimStarling>	 UBN bug fix https://gerrit.wikimedia.org/r/c/mediawiki/core/+/469798
[23:32:50] <TimStarling>	 not sure why the site is up with this bug active
[23:33:10] <addshore>	 interesting
[23:33:13] <AndyRussG>	 MaxSem: no, wmf.1, which is what's on Meta, where the bug occurs. https://meta.wikimedia.org/wiki/Special:CentralNoticeBanners
[23:33:19] <AndyRussG>	 MaxSem: it just worked now
[23:33:25] <TimStarling>	 maybe it is one for MaxSem?
[23:33:52] <twentyafterfour>	 TimStarling: because I rolled back 
[23:34:07] <addshore>	 twentyafterfour: it's still on the other sites though right? you didn't roll back all of the groups?
[23:34:27] <AndyRussG>	 MaxSem: I don't have any way to test for wmf.26, but I thought it was sane to synchronize versions
[23:34:40] <twentyafterfour>	 addshore: I did not roll back all of the groups, I rolled back to group1 because the error rate was low at group1 
[23:34:50] <TimStarling>	 maybe the request rate is low?
[23:34:53] <MaxSem>	 AndyRussG: wmf.26 is enwiki
[23:35:25] <twentyafterfour>	 TimStarling: right
[23:35:41] <MaxSem>	 AndyRussG: so... Are we ready to deploy?
[23:35:48] <AndyRussG>	 MaxSem: yeah... it's just, I think, an issue for future deploys prior to the next train to have CN not synced to all versions
[23:36:00] <AndyRussG>	 MaxSem: Did you see what the server error was?
[23:36:07] <AndyRussG>	 maybe mwdebug1002 just timed out because overloaded?
[23:36:08] <wikibugs>	 (03PS1) 10Brian Wolff: Enable CSP-report-only for logged in/session having users on enwikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469800 (https://phabricator.wikimedia.org/T207900)
[23:36:14] <AndyRussG>	 I didn't see the error in logstash
[23:36:32] <AndyRussG>	 dunno if mwdebug servers' logs go there
[23:36:32] <MaxSem>	 Your error is Call to undefined method LanguageEn::truncate() in /srv/mediawiki/php-1.33.0-wmf.1/extensions/CentralNotice/special/SpecialCentralNotice.php on line 1543
[23:36:42] <wikibugs>	 (03PS3) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143)
[23:36:59] <twentyafterfour>	 TimStarling: 10 errors in 15 minutes is pretty low 
[23:37:00] <MaxSem>	 There are a bunch of timeouts, so a particular one is pretty hard to identify
[23:37:02] <AndyRussG>	 MaxSem: on mwdebug1002?
[23:37:10] <AndyRussG>	 Maybe it was just that wmf.1 wasn't synced yet
[23:37:27] <AndyRussG>	 That's the same error that we're fixing for
[23:37:34] <twentyafterfour>	 I did a full scap earlier today so everything should be sync'd up
[23:37:35] <AndyRussG>	 MaxSem: looks fine now, I'd say deploy
[23:37:38] <MaxSem>	 Hmm, try now?
[23:37:44] <wikibugs>	 (03PS4) 10Faidon Liambotis: Implement all the SSH agent bits and stop proxying [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236
[23:37:46] <wikibugs>	 (03PS3) 10Faidon Liambotis: Split SshAgentCommand type to Request/Response [software/keyholder] - 10https://gerrit.wikimedia.org/r/458237
[23:37:48] <wikibugs>	 (03PS3) 10Faidon Liambotis: Make pylint a little happier [software/keyholder] - 10https://gerrit.wikimedia.org/r/458238
[23:37:50] <wikibugs>	 (03PS3) 10Faidon Liambotis: Use mlockall() to avoid any potential swapping [software/keyholder] - 10https://gerrit.wikimedia.org/r/458239
[23:37:53] <wikibugs>	 (03PS3) 10Faidon Liambotis: Add permission checks for various commands [software/keyholder] - 10https://gerrit.wikimedia.org/r/458240
[23:37:54] <AndyRussG>	 MaxSem: yep still all good
[23:37:54] <wikibugs>	 (03PS3) 10Faidon Liambotis: Verify the validity of signature requests [software/keyholder] - 10https://gerrit.wikimedia.org/r/458241
[23:37:57] <wikibugs>	 (03PS3) 10Faidon Liambotis: Implement SSH_AGENTC_LOCK/SSH_AGENTC_UNLOCK [software/keyholder] - 10https://gerrit.wikimedia.org/r/458242
[23:37:59] <wikibugs>	 (03PS3) 10Faidon Liambotis: Parse/build agent request/responses once [software/keyholder] - 10https://gerrit.wikimedia.org/r/458243
[23:38:01] <wikibugs>	 (03PS3) 10Faidon Liambotis: Refactor handle() [software/keyholder] - 10https://gerrit.wikimedia.org/r/458244
[23:38:02] <wikibugs>	 (03PS3) 10Faidon Liambotis: Add compatibility with Construct 2.8.22 and 2.9.45 [software/keyholder] - 10https://gerrit.wikimedia.org/r/458245
[23:38:05] <wikibugs>	 (03PS3) 10Faidon Liambotis: Switch path handling to pathlib.Path [software/keyholder] - 10https://gerrit.wikimedia.org/r/458246
[23:38:06] <wikibugs>	 (03PS3) 10Faidon Liambotis: Unlink the Unix domain socket when exiting [software/keyholder] - 10https://gerrit.wikimedia.org/r/458247
[23:38:08] <wikibugs>	 (03PS3) 10Faidon Liambotis: Abstract the SSH fingerprint generation [software/keyholder] - 10https://gerrit.wikimedia.org/r/458248
[23:38:11] <wikibugs>	 (03PS3) 10Faidon Liambotis: Stop spawning ssh-keygen but generate fps ourselves [software/keyholder] - 10https://gerrit.wikimedia.org/r/458249
[23:38:24] <AndyRussG>	 pls go ahead and deploy anytime
[23:39:23] <logmsgbot>	 !log maxsem@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/CentralNotice/: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CentralNotice/+/469794/ (duration: 00m 57s)
[23:39:25] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:29] <wikibugs>	 (03CR) 10Faidon Liambotis: "1. Really good point. Hadn't thought of that!" (032 comments) [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236 (owner: 10Faidon Liambotis)
[23:39:32] <MaxSem>	 AndyRussG: ^
[23:40:14] <AndyRussG>	 MaxSem: yep all good!
[23:40:18] <AndyRussG>	 Thanks so much!!!! :)
[23:40:32] <MaxSem>	 Whee
[23:40:36] <wikibugs>	 (03CR) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143) (owner: 10Mobrovac)
[23:40:46] <AndyRussG>	 heh indeed
[23:41:02] <wikibugs>	 (03CR) 10jerkins-bot: [V: 04-1] Abstract the SSH fingerprint generation [software/keyholder] - 10https://gerrit.wikimedia.org/r/458248 (owner: 10Faidon Liambotis)
[23:41:53] <twentyafterfour>	 is that all for swat? I need to deploy Tim's patch asap 
[23:42:57] <James_F>	 AndyRussG: No need to push I5205ec0d96cb06087624f2cf8d83b8ae2256df0e to 1.32.0-wmf.26, right?
[23:43:15] <James_F>	 The function called was only dropped in 1.33.0-wmf.1.
[23:43:23] <James_F>	 twentyafterfour: Clear, go for it.
[23:43:27] <wikibugs>	 (03PS4) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143)
[23:44:27] <twentyafterfour>	 well crud, still hasn't merged so I guess I was a little ahead of myself there
[23:44:32] <twentyafterfour>	 it's almost through CI 
[23:44:33] <James_F>	 :-D
[23:45:24] <AndyRussG>	 James_F: As far as actual site functionality goes, that's correct, no need to. As regards deployment procedure... I've heard it's a pain when patches merged on the CN wmf_deploy branch don't get deployed to all live versions
[23:45:41] <AndyRussG>	 because the submodule just points to the head of wmf_deploy
[23:45:50] <James_F>	 Oh, right. Let me look.
[23:46:01] <AndyRussG>	 so if someone comes along to deploy something else to wmf.26 then they're confused 'cause there's undeployed stuff
[23:46:03] <AndyRussG>	 something like that
[23:46:38] <twentyafterfour>	 submodules always just point to a detached head 
[23:46:38] <wikibugs>	 (03PS5) 10Mobrovac: service::node: Set config-vars.yaml's mode to 0440 [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143)
[23:46:56] <twentyafterfour>	 unless you explicitly check out the branch within the submodule 
[23:46:58] <James_F>	 Yeah, `diff php-1.33.0-wmf.1/extensions/CentralNotice/special/SpecialCentralNotice.php php-1.32.0-wmf.26/extensions/CentralNotice/special/SpecialCentralNotice.php` is not-empty.
[23:47:16] <James_F>	 twentyafterfour: Want to fix?
[23:47:18] <addshore>	 so, techconf is wrapping up, I will need to get https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/469796/ out of the door once the UBN things are deployed
[23:47:28] <James_F>	 Sorry addshore .
[23:47:32] <James_F>	 Too much excitement.
[23:47:39] <twentyafterfour>	 James_F: fix which? 
[23:47:44] <addshore>	 as beta wikidata will be broken until I do :), James_F yup, that's fine, I can come back in a little bit
[23:47:55] <James_F>	 twentyafterfour: Fix the submodule for wmf.26.
[23:48:08] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Add chart to pod labels [deployment-charts] - 10https://gerrit.wikimedia.org/r/469661
[23:48:10] <wikibugs>	 (03PS2) 10Alexandros Kosiaris: Support canary functionality [deployment-charts] - 10https://gerrit.wikimedia.org/r/469662
[23:48:22] <twentyafterfour>	 James_F: I'm not sure what it's supposed to point to?
[23:48:30] <wikibugs>	 (03PS5) 10Faidon Liambotis: Implement all the SSH agent bits and stop proxying [software/keyholder] - 10https://gerrit.wikimedia.org/r/458236
[23:48:34] <wikibugs>	 (03CR) 10Mobrovac: "PCC ok - https://puppet-compiler.wmflabs.org/compiler1002/13217/" [puppet] - 10https://gerrit.wikimedia.org/r/469791 (https://phabricator.wikimedia.org/T207143) (owner: 10Mobrovac)
[23:48:40] <twentyafterfour>	 should it be the same as 1.33.0-wmf.1?
[23:48:41] <wikibugs>	 (03CR) 10Dzahn: [C: 031] "https://puppet-compiler.wmflabs.org/compiler1002/13215/phab1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/467978 (owner: 10Muehlenhoff)
[23:48:52] <James_F>	 It should, yes, and it isn't.
[23:48:56] <James_F>	 I'll fix, one mo.
[23:48:58] <twentyafterfour>	 ok I can fix that
[23:49:00] <twentyafterfour>	 oh
[23:49:44] <James_F>	 AndyRussG: wmf.26 live on mwdebug1002.
[23:49:49] <James_F>	 AndyRussG: Does it need further testing?
[23:50:04] <AndyRussG>	 James_F: heh no, we can't actually test this change there
[23:50:12] <AndyRussG>	 I mean, other than checking that the site isn't down
[23:50:16] <James_F>	 Oh, right, it's only used on Meta.
[23:50:20] <AndyRussG>	 yeah
[23:50:41] <James_F>	 Site is indeed not down via mwdebug1002.
[23:50:44] <James_F>	 OK, I'll sync.
[23:51:14] <AndyRussG>	 James_F: yeah sounds great! thanks!!!
[23:52:28] <wikibugs>	 10Operations, 10monitoring, 10Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (10Dzahn) It has been fixed by @colewhite and by adding the binaries like so:   ``` 17:05 < moritzm> reprepro -C main in...
[23:53:16] <logmsgbot>	 !log jforrester@deploy1001 Synchronized php-1.32.0-wmf.26/extensions/CentralNotice/special/SpecialCentralNotice.php: SWAT Sync versions of SpecialCentralNotice to avoid dirty repo checkout T208004 (duration: 00m 56s)
[23:53:19] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:53:21] <stashbot>	 T208004: Call to undefined method LanguageEn::truncate() - https://phabricator.wikimedia.org/T208004
[23:53:28] <James_F>	 twentyafterfour: Conch is yours. Just in time for jenkins to merge 469799. :-)
[23:53:44] <twentyafterfour>	 phpunit is at 58% 
[23:53:54] * James_F sighs.
[23:53:57] <wikibugs>	 Operations, monitoring, Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (Dzahn) Mostly resolved but we might want to keep it open for the "suggest to upstream maintainer" part too.
[23:54:07] <wikibugs>	 Operations, monitoring, Patch-For-Review: newer version of nagios-nrpe-plugin nrpe (check_nrpe) with fixed logging issue on stretch icinga - https://phabricator.wikimedia.org/T207775 (Dzahn) p:Triage>Normal
[23:54:08] <James_F>	 Can we paint some go-faster stripes on the RAM sticks or something?
[23:54:21] * twentyafterfour thinks we need more jenkins slaves
[23:54:30] <twentyafterfour>	 some go-faster stripes would be nice as well
[23:54:38] <James_F>	 Dedicated ones that sit around waiting for deployment patches would be nice.
[23:54:51] <James_F>	 "Wasted" but fast when we need it.
[23:55:30] <twentyafterfour>	 like these? https://en.wikipedia.org/wiki/Heat_spreader
[23:55:31] <James_F>	 Or ultra-fast nodes that dumped their in-progress tests when a higher-priority task came along.
[23:55:40] <James_F>	 But that'd be… a messy hand-off.
[23:57:17] <twentyafterfour>	 and merged
[23:57:28] <twentyafterfour>	 !log deploying https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/469799/
[23:57:29] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:59:29] <logmsgbot>	 !log twentyafterfour@deploy1001 Synchronized php-1.33.0-wmf.1/includes/parser/Parser.php: deploy https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/469799/ refs T208000 (duration: 00m 56s)
[23:59:57] <twentyafterfour>	 ok done. addshore did you have something to deploy as well?