[00:24:18] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5
[00:24:49] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[00:32:59] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=All&var-cache_type=misc&var-status_type=5
[00:33:39] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3&fullscreen&orgId=1&var-site=esams&var-cache_type=All&var-status_type=5
[01:16:48] PROBLEM - High lag on wdqs1003 is CRITICAL: 3603 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[02:21:05] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.8) (duration: 08m 40s)
[02:21:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:39:05] !log l10nupdate@deploy1001 scap sync-l10n completed (1.32.0-wmf.999) (duration: 07m 37s)
[02:39:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:18:59] PROBLEM - WDQS HTTP Port on wdqs1004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - 380 bytes in 0.001 second response time
[03:21:08] PROBLEM - Check systemd state on wdqs1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[03:21:18] RECOVERY - WDQS HTTP Port on wdqs1004 is OK: HTTP OK: HTTP/1.1 200 OK - 434 bytes in 0.021 second response time
[03:22:09] RECOVERY - Check systemd state on wdqs1004 is OK: OK - running: The system is fully operational
[03:30:49] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 793.87 seconds
[03:35:18] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 264.75 seconds
[04:22:18] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 503 (expecting: 200)
[04:23:19] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy
[04:24:18] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[04:24:28] PROBLEM - MariaDB Slave Lag: s8 on db2085 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 604.57 seconds
[04:24:48] PROBLEM - MariaDB Slave Lag: s8 on db1099 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 623.85 seconds
[04:24:49] PROBLEM - MariaDB Slave Lag: s8 on db2086 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 628.36 seconds
[05:18:58] PROBLEM - High lag on wdqs1005 is CRITICAL: 3.041e+04 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[05:20:19] PROBLEM - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1979 bytes in 0.063 second response time
[05:22:18] RECOVERY - MariaDB Slave Lag: s8 on db2086 is OK: OK slave_sql_lag Replication lag: 0.33 seconds
[05:22:58] RECOVERY - MariaDB Slave Lag: s8 on db2085 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:28:26] (PS1) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/441761
[05:29:48] RECOVERY - MariaDB Slave Lag: s8 on db1099 is OK: OK slave_sql_lag Replication lag: 0.00 seconds
[05:29:58] Weird
[05:30:21] (Abandoned) Marostegui: db-eqiad.php: Depool db1099 [mediawiki-config] - https://gerrit.wikimedia.org/r/441761 (owner: Marostegui)
[05:35:17] (PS1) Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/441762 (https://phabricator.wikimedia.org/T191316)
[05:36:58] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen
[05:37:54] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/441762 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:39:40] (Merged) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/441762 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:39:55] (CR) jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - https://gerrit.wikimedia.org/r/441762 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[05:40:48] RECOVERY - wikidata.org dispatch lag is REALLY high ---4000s- on www.wikidata.org is OK: HTTP OK: HTTP/1.1 200 OK - 1977 bytes in 0.079 second response time
[05:41:12] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1080 for alter table (duration: 01m 17s)
[05:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:41:32] Operations, ops-codfw: db2056: disk with predictive failure - https://phabricator.wikimedia.org/T198048#4310204 (Marostegui)
[05:41:41] Operations, ops-codfw: db2056: disk with predictive failure - https://phabricator.wikimedia.org/T198048#4310216 (Marostegui) p:Triage>Normal
[05:42:13] ACKNOWLEDGEMENT - Device not healthy -SMART- on db2056 is CRITICAL: cluster=mysql device=cciss,0 instance=db2056:9100 job=node site=codfw Marostegui T198048 - The acknowledgement expires at: 2018-07-02 05:41:52. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2056&var-datasource=codfw%2520prometheus%252Fops
[05:43:02] !log Deploy schema change on db1080 T191316 T192926 T89737 T195193
[05:43:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:43:06] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[05:43:06] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[05:43:06] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[05:43:07] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[05:57:37] Operations, Wikidata: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC - https://phabricator.wikimedia.org/T198049#4310223 (Marostegui)
[06:05:49] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) is CRITICAL: Test Respond file not found for a nonexistent title returned the unexpected status 503 (expecting: 404)
[06:08:08] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[06:09:18] Operations, Wikidata: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC - https://phabricator.wikimedia.org/T198049#4310238 (jcrespo)
[06:11:28] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Respond file not found for a nonexistent title) is CRITICAL: Test Respond file not found for a nonexistent title returned the unexpected status 503 (expecting: 404)
[06:12:29] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy
[06:13:38] PROBLEM - proton endpoints health on proton2002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 503 (expecting: 200)
[06:14:48] RECOVERY - proton endpoints health on proton2002 is OK: All endpoints are healthy
[06:25:13] (CR) Elukey: [C: 2] profile::prometheus::alerts: remove old checks [puppet] - https://gerrit.wikimedia.org/r/441535 (owner: Elukey)
[06:29:48] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/cgroup-mediawiki-clean]
[06:31:38] PROBLEM - puppet last run on mw1308 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/ImageMagick-6/policy.xml]
[06:31:48] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/share/ca-certificates/GlobalSign_Organization_Validation_CA_-_SHA256_-_G2.crt]
[06:32:10] <_joe_> looks like a puppet server with some issues ^^
[06:32:58] PROBLEM - High lag on wdqs1003 is CRITICAL: 3620 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[06:32:59] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/php/7.0/cli/php.ini]
[06:33:40] Operations, Deployments, HHVM, Patch-For-Review, and 3 others: Translation cache exhaustion caused by changes to PHP code in file scope - https://phabricator.wikimedia.org/T103886#4310251 (Joe) >>! In T103886#4307439, @MoritzMuehlenhoff wrote: > Does that actually still make sense at this point?...
[06:35:01] (CR) Elukey: [V: 2 C: 2] "Thanks!" [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/440644 (https://phabricator.wikimedia.org/T197503) (owner: Krinkle)
[06:35:40] Operations, Analytics, Cleanup, Patch-For-Review: Archive operations/puppet/varnishkafka repository - https://phabricator.wikimedia.org/T197503#4310264 (elukey)
[06:40:28] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: WDQS timeout on the public eqiad cluster - https://phabricator.wikimedia.org/T198042#4310266 (Gehel) Looking at thread dumps on wdqs1005, there is > 5000 threads waiting logging (see stack trace...
[06:42:04] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4310268 (elukey)
[06:42:09] Operations, Analytics, Cleanup, Patch-For-Review: Archive operations/puppet/varnishkafka repository - https://phabricator.wikimedia.org/T197503#4310267 (elukey)
[06:42:39] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 60850 MB (12% inode=99%)
[06:42:59] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Enable async logging on Wikidata Query Service - https://phabricator.wikimedia.org/T198051#4310269 (Gehel)
[06:44:23] Operations, Wikidata: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC - https://phabricator.wikimedia.org/T198049#4310223 (jcrespo) Edits halved, which is consistent with a wikidata outage: {F22621659} No long running queries on master or db1099:3318: {F22621676} {F22621...
[06:49:04] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Enable async logging on Wikidata Query Service - https://phabricator.wikimedia.org/T198051#4310286 (Smalyshev) Another task being {T197645}.
[06:52:29] RECOVERY - Disk space on elastic1024 is OK: DISK OK
[06:57:08] RECOVERY - puppet last run on mw1308 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:18] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[06:58:19] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:09] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[07:01:18] RECOVERY - High lag on wdqs1004 is OK: (C)3600 ge (W)1200 ge 1024 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[07:08:47] (CR) Elukey: "No op: https://puppet-compiler.wmflabs.org/compiler02/11554/" [puppet] - https://gerrit.wikimedia.org/r/440540 (owner: Elukey)
[07:11:32] Operations, Wikidata: Investigate possible outage on wikidata on 25th June - 04:13AM UTC - 05:27AM UTC - https://phabricator.wikimedia.org/T198049#4310346 (jcrespo) 51,715 exceptions with: ``` [{exception_id}] {exception_url} Wikimedia\Rdbms\DBReplicationWaitError from line 426 of /srv/mediawiki/php-1.3...
[07:11:44] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: WDQS timeout on the public eqiad cluster - https://phabricator.wikimedia.org/T198042#4310347 (Gehel) The pattern of banned / throttled request as seen on wdqs matches a pattern of [[ https://logs...
[07:14:35] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Investigate HTTP 500 on POST request to WDQS - https://phabricator.wikimedia.org/T198055#4310350 (Gehel)
[07:16:01] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Investigate HTTP 500 on POST request to WDQS - https://phabricator.wikimedia.org/T198055#4310375 (Gehel)
[07:19:28] (PS1) Gehel: wdqs: enable async logging [puppet] - https://gerrit.wikimedia.org/r/441772 (https://phabricator.wikimedia.org/T198051)
[07:21:07] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4310383 (ayounsi)
[07:21:10] Operations, ops-codfw, DNS, Traffic, netops: switch port configuration for dns200[1-2] - https://phabricator.wikimedia.org/T197697#4310380 (ayounsi) Open>Resolved a:ayounsi Switch ports enabled and in the public vlans.
[07:30:28] Operations: Integrate jessie 8.11 point update - https://phabricator.wikimedia.org/T198058#4310395 (MoritzMuehlenhoff)
[07:31:19] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1178 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[07:32:18] PROBLEM - Disk space on elastic1024 is CRITICAL: DISK CRITICAL - free space: /srv 61873 MB (12% inode=99%)
[07:33:52] !log slow rollout of most of the remaining debmonitor client installations
[07:33:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:41:08] RECOVERY - Disk space on elastic1024 is OK: DISK OK
[07:46:25] !log depooling wdqs1005 to allow it to catch up on updates - T198042
[07:46:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:46:27] T198042: WDQS timeout on the public eqiad cluster - https://phabricator.wikimedia.org/T198042
[07:53:04] (CR) Smalyshev: [C: 1] wdqs: enable async logging [puppet] - https://gerrit.wikimedia.org/r/441772 (https://phabricator.wikimedia.org/T198051) (owner: Gehel)
[08:04:28] (CR) Gehel: [C: 2] wdqs: enable async logging [puppet] - https://gerrit.wikimedia.org/r/441772 (https://phabricator.wikimedia.org/T198051) (owner: Gehel)
[08:05:31] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/441773
[08:08:03] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/441773 (owner: Marostegui)
[08:08:33] (PS1) Gehel: wdqs: don't log MemoryManagerClosedException [puppet] - https://gerrit.wikimedia.org/r/441774 (https://phabricator.wikimedia.org/T198046)
[08:09:37] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/441773 (owner: Marostegui)
[08:09:50] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - https://gerrit.wikimedia.org/r/441773 (owner: Marostegui)
[08:11:01] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1080 after alter table (duration: 00m 58s)
[08:11:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:23] !log rolling restart of wdqs to enable async logging - T198051
[08:11:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:11:25] T198051: Enable async logging on Wikidata Query Service - https://phabricator.wikimedia.org/T198051
[08:12:09] (PS1) Marostegui: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/441775 (https://phabricator.wikimedia.org/T191316)
[08:13:52] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/441775 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[08:15:28] (Merged) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/441775 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[08:16:24] !log Stop replication on db1106 to change triggers for db1124 and db1095 - T192926
[08:16:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:16:26] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[08:16:41] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1106 for alter table (duration: 00m 56s)
[08:16:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:20:12] esams misc and restbase seem bad
[08:20:28] (CR) jenkins-bot: db-eqiad.php: Depool db1106 [mediawiki-config] - https://gerrit.wikimedia.org/r/441775 (https://phabricator.wikimedia.org/T191316) (owner: Marostegui)
[08:22:14] !log Deploy schema change on db1106 with replication, this will generate lag on s1 labs T191316 T192926 T89737 T195193
[08:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:18] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[08:22:18] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[08:22:19] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[08:22:19] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[08:22:40] Operations, Discovery, Wikidata, Wikidata-Query-Service, Discovery-Wikidata-Query-Service-Sprint: Investigate HTTP 500 on POST request to WDQS - https://phabricator.wikimedia.org/T198055#4310350 (Smalyshev) Looks like related to this in the logs: ``` java.time.temporal.UnsupportedTemporalTyp...
[08:43:11] (PS4) Marostegui: mariadb: Set db1095 as spare, remove unused code [puppet] - https://gerrit.wikimedia.org/r/437720 (https://phabricator.wikimedia.org/T196376)
[08:54:36] (PS1) Gehel: wdqs: async appender only support a single appender-ref [puppet] - https://gerrit.wikimedia.org/r/441781 (https://phabricator.wikimedia.org/T198051)
[08:55:17] (CR) Gehel: [C: 2] wdqs: async appender only support a single appender-ref [puppet] - https://gerrit.wikimedia.org/r/441781 (https://phabricator.wikimedia.org/T198051) (owner: Gehel)
[09:09:44] Operations, Puppet, User-herron: Improve puppet alerting - https://phabricator.wikimedia.org/T178628#4310614 (fgiunchedi) I think the flooding / spam on IRC is by far the most annoying, so I think we should: * Stop sending individual host puppet failures (`check_puppetrun`) to IRC, keeping them only...
[09:11:00] Operations, Mail, monitoring, User-herron, Wikimedia-Incident: Graph outbound mail volume on per-service or hostgroup level - https://phabricator.wikimedia.org/T197171#4310615 (fgiunchedi) The more immediate action would be to deploy `mtail` to mx servers and write a few rules to munge intere...
[09:15:57] Operations, ops-codfw, monitoring: graphite2001 crashed - https://phabricator.wikimedia.org/T198041#4310622 (fgiunchedi) p:Triage>Low Thanks @jcrespo ! We're replacing this machine soon in {T196483} so I'll triage this as low for now and set its parent.
[09:16:08] Operations, ops-codfw: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483#4258321 (fgiunchedi)
[09:17:58] (PS1) Gehel: wdqs: correct async appender package name [puppet] - https://gerrit.wikimedia.org/r/441803 (https://phabricator.wikimedia.org/T198051)
[09:18:45] (CR) Gehel: [C: 2] wdqs: correct async appender package name [puppet] - https://gerrit.wikimedia.org/r/441803 (https://phabricator.wikimedia.org/T198051) (owner: Gehel)
[09:24:54] (PS1) Vgutierrez: vcl: Bump AES128-SHA pageview replacement to 10% [puppet] - https://gerrit.wikimedia.org/r/441804 (https://phabricator.wikimedia.org/T192555)
[09:28:16] (CR) Vgutierrez: [C: -2] "Merge on Thursday 28th" [puppet] - https://gerrit.wikimedia.org/r/441804 (https://phabricator.wikimedia.org/T192555) (owner: Vgutierrez)
[09:33:25] (PS1) Aklapper: Phab: Explain to not-yet-approved users how they can access tasks [puppet] - https://gerrit.wikimedia.org/r/441806 (https://phabricator.wikimedia.org/T197550)
[09:33:48] (CR) Volans: [V: 2 C: 2] Packages details page: don't wrap badges [software/debmonitor] - https://gerrit.wikimedia.org/r/440657 (https://phabricator.wikimedia.org/T167504) (owner: Volans)
[09:34:06] (PS2) Volans: Query optimizations [software/debmonitor] - https://gerrit.wikimedia.org/r/440659 (https://phabricator.wikimedia.org/T191299)
[09:34:47] (CR) Filippo Giunchedi: [C: 1] phabricator: set smtp-host to localhost [puppet] - https://gerrit.wikimedia.org/r/440910 (https://phabricator.wikimedia.org/T196916) (owner: Herron)
[09:35:26] (CR) jerkins-bot: [V: -1] Query optimizations [software/debmonitor] - https://gerrit.wikimedia.org/r/440659 (https://phabricator.wikimedia.org/T191299) (owner: Volans)
[09:36:19] (CR) Filippo Giunchedi: [C: -1] "The script is already python3, I think we can ditch the list() suggestions?" [puppet] - https://gerrit.wikimedia.org/r/441208 (owner: Dzahn)
[09:38:29] PROBLEM - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) timed out before a response was received
[09:39:59] PROBLEM - HHVM rendering on mw2193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:40:48] RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy
[09:40:59] RECOVERY - HHVM rendering on mw2193 is OK: HTTP OK: HTTP/1.1 200 OK - 80847 bytes in 0.330 second response time
[09:41:25] (PS1) Elukey: role::cache::kafka:*: add more alarms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070)
[09:41:36] (CR) Volans: "> The script is already python3, I think we can ditch the list()" [puppet] - https://gerrit.wikimedia.org/r/441208 (owner: Dzahn)
[09:46:59] (CR) Elukey: "https://puppet-compiler.wmflabs.org/compiler02/11555/" [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070) (owner: Elukey)
[09:48:34] Operations, Analytics, SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4310779 (faidon) Yes, let's not block this for yet another week! Consider this approved, please go ahead.
[09:55:58] (CR) Filippo Giunchedi: "LGTM, see inline for the last modification and then we can merge" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 20after4)
[10:32:39] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4258543 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` dns2001.wikimedia.org ``` The...
[10:32:44] !log increase CPU count for proton machines from 2 to 10. T197862
[10:32:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:32:47] T197862: Increase the CPU count for proton[12]00[12] - https://phabricator.wikimedia.org/T197862
[10:35:48] Operations, Proton, Services (doing): Increase the CPU count for proton[12]00[12] - https://phabricator.wikimedia.org/T197862#4310975 (akosiaris) Agreed. While overall the proposed solution is probably the best, I went ahead with option A (increase the vCPU count to 10) alone for now in order to faci...
[10:35:52] (PS1) MarcoAurelio: Create site striker.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637)
[10:36:42] Operations, Proton, Services (doing): Increase the CPU count for proton[12]00[12] - https://phabricator.wikimedia.org/T197862#4310981 (akosiaris) p:High>Low Lowering priority to depict we currently have upgraded quite a bit the CPU count but the task is not yet resolved.
[10:39:42] (PS2) MarcoAurelio: Create site striker.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637)
[10:40:47] (PS3) MarcoAurelio: Create site striker.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637)
[10:41:32] (CR) MarcoAurelio: "I don't think we need a mobile entry for this site. If you disagree, we can certainly add it. Thanks." [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[10:44:15] (CR) Vgutierrez: [C: -2] "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler02/11556/" [puppet] - https://gerrit.wikimedia.org/r/440544 (https://phabricator.wikimedia.org/T182993) (owner: Vgutierrez)
[10:46:01] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311005 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['dns2001.wikimedia.org'] ``` Of which those **FAILED**: ``` ['dns2001.wikimedia.or...
[10:48:53] (PS1) Vgutierrez: install_server: Set netboot config for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441822 (https://phabricator.wikimedia.org/T196493)
[10:49:41] PROBLEM - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is CRITICAL: /api/rest_v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) timed out before a response was received
[10:50:41] RECOVERY - Restbase edge eqsin on text-lb.eqsin.wikimedia.org is OK: All endpoints are healthy
[10:52:38] (CR) Vgutierrez: [C: 2] install_server: Set netboot config for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441822 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[10:52:52] (PS2) Vgutierrez: install_server: Set netboot config for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441822 (https://phabricator.wikimedia.org/T196493)
[10:56:10] (PS3) Ppchelko: Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327)
[10:57:53] (CR) jerkins-bot: [V: -1] Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327) (owner: Ppchelko)
[10:58:05] Operations, Electron-PDFs, Proton, Readers-Web-Backlog, and 4 others: New service request: chromium-render/deploy - https://phabricator.wikimedia.org/T186748#4311065 (akosiaris) >>! In T186748#4301132, @pmiazga wrote: > .. > [snip] > > At the beginning we will render directly to the user - of co...
[10:59:22] (PS4) Ppchelko: Switch all jobs to the new queue and clean up the old queue configs. [mediawiki-config] - https://gerrit.wikimedia.org/r/437767 (https://phabricator.wikimedia.org/T190327)
[10:59:25] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311071 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` dns2001.wikimedia.org ``` The...
[11:00:04] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1100).
[11:00:40] (CR) Alexandros Kosiaris: [C: 2] "This does provide some implicit sudo privileges to the users but" [puppet] - https://gerrit.wikimedia.org/r/441379 (https://phabricator.wikimedia.org/T197857) (owner: Mobrovac)
[11:00:47] (PS2) Alexandros Kosiaris: Add niedzielski, pmiazga and phuedx to deploy-service [puppet] - https://gerrit.wikimedia.org/r/441379 (https://phabricator.wikimedia.org/T197857)
[11:00:52] (CR) Alexandros Kosiaris: [V: 2 C: 2] Add niedzielski, pmiazga and phuedx to deploy-service [puppet] - https://gerrit.wikimedia.org/r/441379 (https://phabricator.wikimedia.org/T197857) (owner: Mobrovac)
[11:09:51] RECOVERY - High lag on wdqs1005 is OK: (C)3600 ge (W)1200 ge 1142 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:20:44] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311178 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['dns2001.wikimedia.org'] ``` and were **ALL** successful.
[11:23:10] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311186 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` dns2002.wikimedia.org ``` The...
[11:23:53] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311189 (Vgutierrez)
[11:44:03] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311259 (Vgutierrez)
[11:44:27] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311260 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['dns2002.wikimedia.org'] ``` and were **ALL** successful.
[11:47:04] (PS1) Vgutierrez: site: set proper role for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441834 (https://phabricator.wikimedia.org/T196493)
[11:48:12] (PS2) Vgutierrez: site: set proper role for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441834 (https://phabricator.wikimedia.org/T196493)
[11:49:41] PROBLEM - kubelet operational latencies on kubernetes1002 is CRITICAL: instance=kubernetes1002.eqiad.wmnet operation_type={create_container,start_container} https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:50:41] RECOVERY - kubelet operational latencies on kubernetes1002 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/kubernetes-kubelets?orgId=1
[11:54:21] PROBLEM - DPKG on labvirt1006 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[11:55:22] RECOVERY - DPKG on labvirt1006 is OK: All packages OK
[11:56:34] !log mobrovac@deploy1001 Started deploy [restbase/deploy@f521e7e] (dev-cluster): Dev cluster: Include pagelanguage in title_revisions
[11:56:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:57:42] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy
[11:58:59] ^labvirt is the debmonitor rollout
[12:00:11] !log repooling wdqs1005 it has catched up on updates - T198042
[12:00:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:00:13] T198042: WDQS timeout on the public eqiad cluster - https://phabricator.wikimedia.org/T198042
[12:00:39] !log mobrovac@deploy1001 Finished deploy [restbase/deploy@f521e7e] (dev-cluster): Dev cluster: Include pagelanguage in title_revisions (duration: 04m 05s)
[12:00:39] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy
[12:00:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:03:52] (PS1) Urbanecm: Create TemplateEditor group on enwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441839 (https://phabricator.wikimedia.org/T198056)
[12:04:33] !log Restbase: Add the "headers" field to Cassandra schemae in production for T197789
[12:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:04:36] T197789: Schema upgrades to add headers field - https://phabricator.wikimedia.org/T197789
[12:05:39] PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /srv 60706 MB (12% inode=99%)
[12:10:10] jouncebot, reload
[12:10:19] jouncebot, refresh
[12:10:20] I refreshed my knowledge about deployments.
[12:11:29] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy
[12:15:54] (PS3) Gehel: maps: remove "style" parameter [puppet] - https://gerrit.wikimedia.org/r/439543
[12:22:25] (PS1) Gehel: elasticsearch: raise alerting threshold for disk space [puppet] - https://gerrit.wikimedia.org/r/441846
[12:23:01] (PS2) Urbanecm: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905)
[12:23:13] (PS2) Gehel: elasticsearch: raise alerting threshold for disk space [puppet] - https://gerrit.wikimedia.org/r/441846
[12:26:09] !log kartik@deploy1001 Started deploy [cxserver/deploy@cc6dc61]: Update cxserver to ece5e7a (T191874, T196354, T195768, T195768)
[12:26:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:26:14] T195768: CX2: Red links appear in the source article - https://phabricator.wikimedia.org/T195768
[12:26:14] T196354: Enable MT for "Simple English" as if it was (regular) English - https://phabricator.wikimedia.org/T196354
[12:26:15] T191874: TypeError: Cannot read property 'attributes' of undefined at MWImage - https://phabricator.wikimedia.org/T191874
[12:29:42] !log kartik@deploy1001 Finished deploy [cxserver/deploy@cc6dc61]: Update cxserver to ece5e7a (T191874, T196354, T195768, T195768) (duration: 03m 33s)
[12:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:29:59] (CR) DCausse: [C: 1] elasticsearch: raise alerting threshold for disk space [puppet] - https://gerrit.wikimedia.org/r/441846 (owner: Gehel)
[12:30:23] (CR) Gehel: [C: 2] elasticsearch: raise alerting threshold for disk space [puppet] - https://gerrit.wikimedia.org/r/441846 (owner: Gehel)
[12:32:22] (CR) Vgutierrez: [C: 2] site: set proper role for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441834 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[12:32:30] (PS3) Vgutierrez: site: set proper role for dns200[12].wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/441834 (https://phabricator.wikimedia.org/T196493)
[12:39:15] Operations, JADE, Scoring-platform-team (Current), User-Joe: Extension:JADE scalability concerns due to creating a page per revision - https://phabricator.wikimedia.org/T196547#4311554 (awight) I'm looking at two more data sources that we may decide to integrate with: PageTriage and FlaggedRevs....
[12:41:28] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311555 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on neodymium.eqiad.wmnet for hosts: ``` dns2001.wikimedia.org ``` The...
[12:43:20] RECOVERY - Disk space on elastic1019 is OK: DISK OK
[12:43:38] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4311561 (faidon) a:elukey>RobH That spare assignment sounds good to me, consider it approved. @RobH, you can go ahead :)
[12:46:20] Operations, Analytics: Broken apt config on kafka/analytics hosts - https://phabricator.wikimedia.org/T198092#4311574 (MoritzMuehlenhoff) p:Triage>High
[12:50:40] PROBLEM - restbase endpoints health on restbase-dev1005 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[12:51:09] PROBLEM - restbase endpoints health on restbase-dev1004 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[12:51:09] PROBLEM - restbase endpoints health on restbase-dev1006 is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) timed out before a response was received: /en.wikipedia.org/v1/feed/featured/{yyyy}/{mm}/{dd} (Retrieve aggregated feed content for April 29, 2016) timed out before a response was received
[12:51:09] PROBLEM - mobileapps endpoints health on scb2004 is CRITICAL: /{domain}/v1/page/metadata/{title}{/revision}{/tid} (retrieve extended metadata for Video article on English Wikipedia) timed out before a response was received
[12:51:10] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/metadata/{title}{/revision}{/tid} (retrieve extended metadata for Video article on English Wikipedia) timed out before a response was received
[12:51:10] PROBLEM - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is CRITICAL: /en.wikipedia.org/v1/feed/onthisday/{type}/{mm}/{dd} (Retrieve all events for Jan 15) is CRITICAL: Test Retrieve all events for Jan 15 returned the unexpected status 500 (expecting: 200)
[12:51:29] PROBLEM - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is CRITICAL: /{domain}/v1/page/references/{title}{/revision}{/tid} (retrieve structured reference data for the Cat article on English Wikipedia) is CRITICAL: Test retrieve structured reference data for the Cat article on English Wikipedia returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections/{title}{/revision}{/tid} (retrieve en.wp mai
[12:51:29] ections) is CRITICAL: Test retrieve en.wp main page via mobile-sections returned the unexpected status 500 (expecting: 200): /{domain}/v1/feed/onthisday/{type}/{month}/{day} (retrieve all events on January 15) is CRITICAL: Test retrieve all events on January 15 returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/mobile-sections-lead/{title}{/revision}{/tid} (retrieve lead section of en.wp Altrincham page via
[12:51:29] d) is CRITICAL: Test retrieve lead section of en.wp Altrincham page via mobile-sections-lead returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/news (get In the News content) is CRITICAL: Test get In the News content returned the unexpected status 500 (expecting: 200)
[12:51:30] PROBLEM - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is CRITICAL: /{domain}/v1/page/summary/{title}{/revision}{/tid} (Get summary for 2nd Earl of Derby) is CRITICAL: Test Get summary for 2nd Earl of Derby returned the unexpected status 500 (expecting: 200): /{domain}/v1/page/metadata/{title}{/revision}{/tid} (retrieve extended metadata for Video article on English Wikipedia) is CRITICAL: Test retrieve extended metadata
[12:51:30] n English Wikipedia returned the unexpected status 500 (expecting: 200): /{domain}/v1/feed/onthisday/{type}/{month}/{day} (retrieve all events on January 15) is CRITICAL: Test retrieve all events on January 15 returned the unexpected status 500 (expecting: 200)
[12:51:49] PROBLEM - Restbase LVS codfw on restbase.svc.codfw.wmnet is CRITICAL: /en.wikipedia.org/v1/page/metadata/{title}{/revision} (Get extended metadata of a test page) is CRITICAL: Test Get extended metadata of a test page returned the unexpected status 500 (expecting: 200): /en.wikipedia.org/v1/page/media/{title}{/revision} (Get media in test page) is CRITICAL: Test Get media in test page returned the unexpected status 500 (expecti
[12:51:49] edia.org/v1/page/mobile-sections/{title}{/revision} (Get mobile-sections for a test page on enwiki) is CRITICAL: Test Get mobile-sections for a test page on enwiki returned the unexpected status 500 (expecting: 200)
[12:53:04] just a typical monday ...
[12:53:20] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy
[12:54:04] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4311589 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['dns2001.wikimedia.org'] ``` Of which those **FAILED**: ``` ['dns2001.wikimedia.or...
[12:55:25] these RB/MCS errors ^ are known and transient, will clear out soon
[12:55:43] (PS1) Ladsgroup: labs: set dispatchLagToMaxLagFactor to 60 [mediawiki-config] - https://gerrit.wikimedia.org/r/441857 (https://phabricator.wikimedia.org/T194950)
[12:56:36] <_joe_> mobrovac: ok, I almost fainted :P
[12:56:39] RECOVERY - mobileapps endpoints health on scb2004 is OK: All endpoints are healthy
[12:56:51] (CR) Ladsgroup: [C: 2] labs: set dispatchLagToMaxLagFactor to 60 [mediawiki-config] - https://gerrit.wikimedia.org/r/441857 (https://phabricator.wikimedia.org/T194950) (owner: Ladsgroup)
[12:56:59] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy
[12:57:12] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/441858
[12:57:20] jouncebot: next
[12:57:20] In 0 hour(s) and 2 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1300)
[12:58:42] (Merged) jenkins-bot: labs: set dispatchLagToMaxLagFactor to 60 [mediawiki-config] - https://gerrit.wikimedia.org/r/441857 (https://phabricator.wikimedia.org/T194950) (owner: Ladsgroup)
[12:59:04] (CR) jenkins-bot: labs: set dispatchLagToMaxLagFactor to 60 [mediawiki-config] - https://gerrit.wikimedia.org/r/441857 (https://phabricator.wikimedia.org/T194950) (owner: Ladsgroup)
[12:59:06] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/441858 (owner: Marostegui)
[13:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: (Dis)respected human, time to deploy European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1300). Please do the needful.
[13:00:05] bmansurov, Jayprakash12345, revi, Amir1, Pchelolo, and Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[13:00:08] Here
[13:00:12] Here
[13:00:14] present
[13:00:17] here
[13:00:18] hoi hoi
[13:00:20] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1106 after alter table (duration: 00m 57s)
[13:00:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:30] o/
[13:00:39] I will do them in order
[13:00:42] o/
[13:00:49] RECOVERY - restbase endpoints health on restbase-dev1005 is OK: All endpoints are healthy
[13:01:04] hashar: you are doing the SWAT, nice :D
[13:01:05] Hi hashar
[13:01:19] RECOVERY - Restbase LVS eqiad on restbase.svc.eqiad.wmnet is OK: All endpoints are healthy
[13:01:20] zeljkof: yeah I have barely ran any :D
[13:01:29] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[13:01:33] hashar: I'm around if you get bored :D
[13:01:41] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[13:02:09] hashar: a few of people are deployers, I usually ask them if they want to deploy their changes :)
[13:02:10] RECOVERY - restbase endpoints health on restbase-dev1004 is OK: All endpoints are healthy
[13:02:16] bmansurov: hmm about https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/440557/2/wmf-config/InitialiseSettings.php isn't that going to be 100% :)
[13:02:21] or is that 1 out of 100 requests? :]
[13:02:31] hashar: yes 1 out of 100
[13:02:48] (PS2) Elukey: Move jmxtrans and kafkatee submodules to environments/production [puppet] - https://gerrit.wikimedia.org/r/440540
[13:02:55] bmansurov: I guess I will deploy it everywhere
[13:03:01] hashar: ok thanks
[13:03:02] there is probably not much we can test on mwdebug1001
[13:03:08] agree
[13:03:21] (CR) jerkins-bot: [V: -1] Move jmxtrans and kafkatee submodules to environments/production [puppet] - https://gerrit.wikimedia.org/r/440540 (owner: Elukey)
[13:04:04] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1106" [mediawiki-config] - https://gerrit.wikimedia.org/r/441858 (owner: Marostegui)
[13:04:25] Urbanecm: I will do the Add import sources to ta.wiktionary change next ( https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/441069 ) not sure why it is flagged as a WIP though
[13:04:39] RECOVERY - restbase endpoints health on restbase-dev1006 is OK: All endpoints are healthy
[13:04:57] (CR) Hashar: [C: 2] Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[13:05:04] For me that looks like Jayprakash12345's patch
[13:05:31] ugh, none of the mcs/restbase alerts paged at least for me
[13:05:50] hashar, it's not my patch (even I can take care about it as well, if its author isn't present).
[13:05:52] !log refresh cassandra certificates for 'restbase' cluster
[13:05:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:55] !log Deploy schema change on db2035 (codfw primary master for s2) with replication, this will generate lag on s2 codfw T191316 T192926 T89737 T195193
[13:05:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:05:59] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[13:05:59] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[13:06:00] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[13:06:00] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[13:06:28] I think we can merge it even it's work in progress officially
[13:06:40] (oh, seems you already +2ed it :D)
[13:07:20] RECOVERY - Restbase LVS codfw on restbase.svc.codfw.wmnet is OK: All endpoints are healthy
[13:07:34] Pchelolo: dont you need the Echo patch to be cherry picked to the next branch as well ( 1.32.0-wmf.999 )
[13:08:29] (PS3) Hashar: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[13:08:37] (CR) Hashar: [C: 2] Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[13:08:51] hashar: will the .9 even exist and did it cut already?
[13:09:02] Pchelolo: I have no idea :/
[13:09:13] according to this https://phabricator.wikimedia.org/T191055#4274103 it will not as far as I understand
[13:09:53] (PS4) Hashar: Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[13:09:55] (PS2) Hashar: Add wikimania2018wiki to commonsupload.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/441076 (https://phabricator.wikimedia.org/T197714) (owner: Revi)
[13:10:07] I have rebased the changes in order
[13:10:12] (PS2) Hashar: Fix ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/440974 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:10:18] (Merged) jenkins-bot: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[13:10:37] (CR) Filippo Giunchedi: [C: 1] cassandra: restore (most) G1GC settings to defaults [puppet] - https://gerrit.wikimedia.org/r/426152 (https://phabricator.wikimedia.org/T192112) (owner: Eevans)
[13:10:42] (PS3) Hashar: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905) (owner: Urbanecm)
[13:10:46] (CR) jenkins-bot: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/440557 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[13:11:12] (PS4) Filippo Giunchedi: prometheus: alert on config reload failure [puppet] - https://gerrit.wikimedia.org/r/432059
[13:11:21] (CR) jerkins-bot: [V: -1] Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[13:11:27] hashar: mine is not testable but it would be great if we wait for five minutes for fatals and errors
[13:12:59] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable logging for Schema:CitationUsage at 1% | T191086 (duration: 00m 57s)
[13:13:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:13:02] T191086: Instrument, collect data, and perform the first round of analysis on click-through data on citations/footnotes - https://phabricator.wikimedia.org/T191086
[13:13:07] (CR) Hashar: [C: 2] Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[13:13:12] bmansurov: deployed
[13:13:14] Amir1: ok :)
[13:13:21] hashar: ok checking
[13:13:26] it'll take some time
[13:14:39] (PS1) Vgutierrez: standard: add dns200[12] to codfw ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441860 (https://phabricator.wikimedia.org/T196493)
[13:16:17] oh my god
[13:16:21] I cant merge https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/441069
[13:17:07] rebase?
[13:17:08] Probably because WIP?
[13:17:14] yeah
[13:17:16] it is a WIP
[13:17:21] Then just skip it I guess
[13:17:37] Patch owner isn't here anyway
[13:18:00] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:18:41] wth?
[13:19:53] godog: ^ puppt on rb1007 complains about an invalid secret, might be connected to your cert refresh?
[13:20:09] invalid secret cassandra/restbase/restbase1007-a/restbase1007-a.kst
[13:21:19] (CR) Hashar: [C: 2] "I have not deployed this change since it is marked as a WIP and I could not figure out how to mark it ready for review." [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[13:21:22] the path seems correct, maybe missing on the puppet master?
[13:21:30] (PS3) Hashar: Add wikimania2018wiki to commonsupload.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/441076 (https://phabricator.wikimedia.org/T197714) (owner: Revi)
[13:21:31] (PS3) Hashar: Fix ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/440974 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:21:33] (PS4) Hashar: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905) (owner: Urbanecm)
[13:21:56] revi: ok progressing with your patch now
[13:21:59] kk
[13:22:24] (CR) Hashar: [C: 2] Add wikimania2018wiki to commonsupload.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/441076 (https://phabricator.wikimedia.org/T197714) (owner: Revi)
[13:22:27] mobrovac: very likely, I'll take a look
[13:22:41] thnx godog
[13:22:53] hashar, can you please remove the CR from 441069? I'm afraid than when Jay will remove WIP status, it will merge automatically => problems for us when deploying first next patch
[13:23:11] Urbanecm: yup :)
[13:23:18] Thank you :)
[13:24:26] (Merged) jenkins-bot: Add wikimania2018wiki to commonsupload.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/441076 (https://phabricator.wikimedia.org/T197714) (owner: Revi)
[13:25:13] revi: you can check on mwdebug1001 :)
[13:25:22] (CR) Hashar: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/440974 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:25:35] Checking
[13:25:51] (PS1) Elukey: confluent::kafka::common: add workaround for apt on jessie hosts [puppet] - https://gerrit.wikimedia.org/r/441861 (https://phabricator.wikimedia.org/T198092)
[13:25:56] aaaand looks good!
[13:25:56] mobrovac: fixed, for some reason 1007 secrets weren't in the config file yet, probably an oversight
[13:26:43] (Merged) jenkins-bot: Fix ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/440974 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:27:12] !log hashar@deploy1001 Synchronized dblists/commonsuploads.dblist: Add wikimania2018wiki to commonsupload.dblist | T197714 (duration: 00m 56s)
[13:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:14] T197714: Restrict uploads to sysop on wikimania2018wiki - https://phabricator.wikimedia.org/T197714
[13:27:41] hashar: thanks for deploying.
[13:28:10] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[13:28:26] Amir1: doing the ORES one on mwdbeug1001
[13:29:03] Operations, ops-ulsfo, Traffic, netops: troubleshoot cr3/cr4 link - https://phabricator.wikimedia.org/T196030#4311726 (ayounsi) I've been going back and forth with JTAC. The next physical change we need to try is a "loop test", for example connect cr3:et-0/0/1 to cr3:et-0/0/2 and see if the links...
[13:29:11] mwdebug1001?
[13:29:19] Amir1: yup
[13:29:27] it is deployed on mwdebug1001 now
[13:30:31] Pchelolo: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Echo/+/441818/ is now on mwdebug1001. Not sure whether it can be tested though
[13:31:00] Operations, Analytics, Cleanup, User-Elukey: Archive operations/puppet/jmxtrans repository - https://phabricator.wikimedia.org/T198097#4311727 (elukey) p:Triage>Low
[13:31:05] hashar: it's fine
[13:31:09] (CR) Hashar: [C: 2] Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905) (owner: Urbanecm)
[13:31:09] hashar: nope, it can't. The changed code that's actually important is running on the jobrunner which mwdebug is not
[13:31:26] Operations, Analytics, Analytics-Kanban, User-Elukey: Archive operations/puppet/kafkatee repository - https://phabricator.wikimedia.org/T198098#4311739 (elukey) p:Triage>Low
[13:31:47] Operations, ops-ulsfo, Traffic, netops, Patch-For-Review: Rack/cable/configure ulsfo MX204 - https://phabricator.wikimedia.org/T189552#4311756 (ayounsi)
[13:32:29] (Merged) jenkins-bot: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905) (owner: Urbanecm)
[13:32:34] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix ORES config - T197633 (duration: 00m 58s)
[13:32:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:32:36] T197633: ORES extension doesn't clean up parent in wp10 model - https://phabricator.wikimedia.org/T197633
[13:33:48] Pchelolo: syncing :]
[13:34:41] !log hashar@deploy1001 Synchronized php-1.32.0-wmf.8/extensions/Echo: Remove masterPos from the job specification - T192945 (duration: 00m 58s)
[13:34:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:34:43] T192945: Make EchoNotification job JSON-serializable - https://phabricator.wikimedia.org/T192945
[13:35:13] Urbanecm: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440652/ is on mwdebug1001 now
[13:35:18] ack
[13:35:59] hashar, seems to be working.
[13:37:12] hashar, if you can deploy other patches, please let me know, I'd like to have some other patches deployed, just didn't add them as the SWAT is officially full.
[13:37:16] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Remove BN alias for NS_USER on dewikivoyage - T196905 (duration: 00m 57s)
[13:37:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:37:18] T196905: bn (Bengali) language code does not work at German Wikivoyage, overwritten by local "Benutzer" alias - https://phabricator.wikimedia.org/T196905
[13:37:31] Urbanecm: sure I guess we can do a couple more
[13:37:46] Ok, I'll add them to the Calendar
[13:39:31] hashar, added, thank you
[13:39:48] hashar: Are you sure you deployed my change? var_dump( $wgOresModels ); in mwmaint1001 returns the old values
[13:40:44] (CR) Hashar: [C: 2] Add namespace aliases to zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441593 (https://phabricator.wikimedia.org/T198007) (owner: Urbanecm)
[13:40:50] Amir1: check whether /srv/mediawiki/ has the proper code?
[13:41:30] PROBLEM - Device not healthy -SMART- on db2052 is CRITICAL: cluster=mysql device=cciss,1 instance=db2052:9100 job=node site=codfw https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2052&var-datasource=codfw%2520prometheus%252Fops
[13:41:42] hashar: it doesn't (in mwmaint1001)
[13:41:48] ^ there is a ticket about it, I will silence it again cc papaul
[13:42:29] PROBLEM - CirrusSearch eqiad 95th percentile latency on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[13:42:30] terbium:/srv/mediawiki$ grep cleanParent wmf-config/InitialiseSettings.php
[13:42:30] 'wp10' => [ 'enabled' => false, 'namespaces' => [ 0 ], 'cleanParent' => true ],
[13:42:34] hashar: oooops, I made a mistake
[13:42:49] I fixed the default value not enwiki
[13:42:52] mwmaint1001:/srv/mediawiki$ grep cleanParent wmf-config/InitialiseSettings.php
[13:42:52] 'wp10' => [ 'enabled' => false, 'namespaces' => [ 0 ], 'cleanParent' => true ],
[13:42:54] ACKNOWLEDGEMENT - Device not healthy -SMART- on db2052 is CRITICAL: cluster=mysql device=cciss,1 instance=db2052:9100 job=node site=codfw Marostegui T197146 - The acknowledgement expires at: 2018-07-02 13:42:18. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2052&var-datasource=codfw%2520prometheus%252Fops
[13:42:59] I need to fix both
[13:43:03] (PS2) Hashar: Add namespace aliases to zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441593 (https://phabricator.wikimedia.org/T198007) (owner: Urbanecm)
[13:43:10] Do you have time to add one more?
[13:43:10] (CR) Hashar: [C: 2] Add namespace aliases to zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441593 (https://phabricator.wikimedia.org/T198007) (owner: Urbanecm)
[13:44:06] Amir1: yes add it to the calendar please
[13:44:15] hm hashar apparently you were right about needing to cherry-pick the change to wmf.999 branch..
[13:44:52] (Merged) jenkins-bot: Add namespace aliases to zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441593 (https://phabricator.wikimedia.org/T198007) (owner: Urbanecm)
[13:45:31] (PS1) Ladsgroup: Fix ORES config, part II [mediawiki-config] - https://gerrit.wikimedia.org/r/441867 (https://phabricator.wikimedia.org/T197633)
[13:45:46] hashar: ^
[13:45:49] adding it
[13:46:44] (CR) Hashar: "Really that white list does not scale. I am not sure how useful it is to add any governement website that might have free license. The " [mediawiki-config] - https://gerrit.wikimedia.org/r/441563 (https://phabricator.wikimedia.org/T197944) (owner: Urbanecm)
[13:47:44] Urbanecm: 441593 Add namespace aliases to zhwikivoyage is on mwdebug1001
[13:47:47] ack
[13:48:14] (PS2) Hashar: Autoconfirmed should require 10 edits&4 days on zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441592 (https://phabricator.wikimedia.org/T198006) (owner: Urbanecm)
[13:48:25] (CR) Hashar: [C: 2] Autoconfirmed should require 10 edits&4 days on zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441592 (https://phabricator.wikimedia.org/T198006) (owner: Urbanecm)
[13:48:55] hashar, working, please deploy
[13:50:06] (Merged) jenkins-bot: Autoconfirmed should require 10 edits&4 days on zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441592 (https://phabricator.wikimedia.org/T198006) (owner: Urbanecm)
[13:50:12] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Add namespace aliases to zhwikivoyage - T198007 (duration: 00m 56s)
[13:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:50:15] T198007: Add namespace aliases to zhwikivoyage - https://phabricator.wikimedia.org/T198007
[13:51:11] (PS2) Hashar: Whitelist two Indian government websites [mediawiki-config] - https://gerrit.wikimedia.org/r/441563 (https://phabricator.wikimedia.org/T197944) (owner: Urbanecm)
[13:51:13] (PS2) Hashar: Fix ORES config, part II [mediawiki-config] - https://gerrit.wikimedia.org/r/441867 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:51:23] (CR) Hashar: [C: 2] Whitelist two Indian government websites [mediawiki-config] - https://gerrit.wikimedia.org/r/441563 (https://phabricator.wikimedia.org/T197944) (owner: Urbanecm)
[13:51:39] Amir1: :]]]
[13:51:43] (CR) Hashar: [C: 2] Fix ORES config, part II [mediawiki-config] - https://gerrit.wikimedia.org/r/441867 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:51:47] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Autoconfirmed should require 10 edits&4 days on zhwikivoyage - T198006 (duration: 00m 56s)
[13:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:51:49] T198006: Request to change autoconfirmed settings on Chinese Wikivoyage - https://phabricator.wikimedia.org/T198006
[13:52:21] :P
[13:52:41] hashar: could I bother you a bit more and ask to get it to wmf.999 as well? https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Echo/+/441868/
[13:52:45] Urbanecm: there are some duplicates articles/namespaces in zhwikivoyage I will paste on the task
[13:52:57] ok, thank you.
[13:52:58] Pchelolo: sure :)
[13:53:07] (CR) Elukey: [C: 2] Move jmxtrans and kafkatee submodules to environments/production [puppet] - https://gerrit.wikimedia.org/r/440540 (owner: Elukey)
[13:53:10] (CR) Elukey: [V: 2 C: 2] Move jmxtrans and kafkatee submodules to environments/production [puppet] - https://gerrit.wikimedia.org/r/440540 (owner: Elukey)
[13:53:12] thank you
[13:53:12] (Merged) jenkins-bot: Whitelist two Indian government websites [mediawiki-config] - https://gerrit.wikimedia.org/r/441563 (https://phabricator.wikimedia.org/T197944) (owner: Urbanecm)
[13:53:32] (Merged) jenkins-bot: Fix ORES config, part II [mediawiki-config] - https://gerrit.wikimedia.org/r/441867 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[13:53:47] !log merging jmxtrans and kafkatee's submodules to operations/puppet - part 1 (moving them under environments/production)
[13:53:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:54:43] !log mwscript namespaceDupes.php --wiki=zhwikivoyage --fix | T198007
[13:54:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:26] Operations, User-herron: Improve visibility of incoming operations tasks - https://phabricator.wikimedia.org/T197624#4297708 (fgiunchedi) I'm +1 for switching to a board for clinic duty, also added bonus of displaying the task status beside #operations when browsing tasks
[13:57:48] argh
[13:57:55] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Autoconfirmed should require 10 edits&4 days on zhwikivoyage - T198006 (duration: 00m 56s)
[13:57:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:57:57] T198006: Request to change autoconfirmed settings on Chinese Wikivoyage - https://phabricator.wikimedia.org/T198006
[13:59:22] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Whitelist two Indian government websites - T197944 (duration: 00m 57s)
[13:59:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:59:25] T197944: Please add *.nic.in and *.gov.in to the wgCopyUploadsDomains whitelist - https://phabricator.wikimedia.org/T197944
[13:59:39] Amir1: syncing your change
[14:00:04] addshore and CFisch_WMDE: How many deployers does it take to do FileImporter and FileExporter deployment on production deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1400).
[14:00:12] Thanks!
[14:01:02] !log hashar@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix ORES config, part II - T197633 (duration: 00m 57s)
[14:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:01:05] T197633: ORES extension doesn't clean up parent in wp10 model - https://phabricator.wikimedia.org/T197633
[14:01:13] almost done
[14:01:14] o.
[14:01:16] o/
[14:01:19] hashar: coolio!
[14:01:21] just need Echo for 1.32.0-wmf.99
[14:01:22] https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Echo/+/441868/
[14:01:23] (PS1) Elukey: Move away jmxtrans/kafkatee's modules from environments/production [puppet] - https://gerrit.wikimedia.org/r/441870 (https://phabricator.wikimedia.org/T188377)
[14:01:30] but that can be done asynchronously / later
[14:01:44] at least operations/mediawiki-config is done
[14:02:21] Pchelolo: I will get your Echo patch for 1.32.9-wmf.999 to be synced eventually. It is still going through CI
[14:02:25] addshore: but I guess you can start?
[14:02:35] hashar: ack! :)
[14:02:40] CFisch_WMDE: so, whats first in the list? :D
[14:02:42] hashar: ye, I'm watching it progressing
[14:02:44] thank you
[14:02:54] addshore: meep
[14:03:21] yeah start with the top
[14:03:36] 440860
[14:03:56] atm I'm working on a very last patch for the FileImporter
[14:04:09] (PS5) Addshore: Add ar, de and fa wikipedia to FileImporter interwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/440860 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:04:18] but we can start with the other stuff
[14:04:29] ACKNOWLEDGEMENT - Memory correctable errors -EDAC- on cp1053 is CRITICAL: 151 ge 4 Filippo Giunchedi known, T165252 https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=cp1053&var-datasource=eqiad%2520prometheus%252Fops
[14:04:33] (CR) jenkins-bot: Add wikimania2018wiki to commonsupload.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/441076 (https://phabricator.wikimedia.org/T197714) (owner: Revi)
[14:04:35] (CR) jenkins-bot: Fix ORES config [mediawiki-config] - https://gerrit.wikimedia.org/r/440974 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[14:04:38] (CR) jenkins-bot: Remove BN alias for NS_USER on dewikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/440652 (https://phabricator.wikimedia.org/T196905) (owner: Urbanecm)
[14:04:40] (CR) jenkins-bot: Add namespace aliases to zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441593 (https://phabricator.wikimedia.org/T198007) (owner: Urbanecm)
[14:04:43] (CR) jenkins-bot: Autoconfirmed should require 10 edits&4 days on zhwikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/441592 (https://phabricator.wikimedia.org/T198006) (owner: Urbanecm)
[14:04:45] (CR) jenkins-bot: Whitelist two Indian government websites [mediawiki-config] - https://gerrit.wikimedia.org/r/441563 (https://phabricator.wikimedia.org/T197944) (owner: Urbanecm)
[14:04:45] CFisch_WMDE: is https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FileImporter/+/441009/ what uses the config in the other patch?
[14:04:47] (CR) jenkins-bot: Fix ORES config, part II [mediawiki-config] - https://gerrit.wikimedia.org/r/441867 (https://phabricator.wikimedia.org/T197633) (owner: Ladsgroup)
[14:04:55] (the jenkins-bot spam are coverage reports)
[14:05:09] addshore: yes
[14:05:12] but the config hs to exist before that is backported?
[14:05:14] having a coffee/water . I will keep looking at the logs anyway
[14:05:29] not really
[14:05:36] CFisch_WMDE: per https://tools.wmflabs.org/versions/ the backports also have to be on .999 branch
[14:05:48] *does that now*
[14:05:51] ok
[14:06:01] addshore: there will be another backport in a sec
[14:06:07] ack
[14:06:07] I will give you the links
[14:06:37] (PS3) Addshore: Enable license filters for the FileImporter in production [mediawiki-config] - https://gerrit.wikimedia.org/r/440864 (https://phabricator.wikimedia.org/T194502) (owner: WMDE-Fisch)
[14:06:47] (PS2) Addshore: Enable FileImporter on Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/441013 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:06:55] (PS3) Addshore: Enable FileExpoter on ar-, de- and fa-wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441014 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:07:44] (CR) Ottomata: [C: +1] "OO, thanks!" [puppet] - https://gerrit.wikimedia.org/r/441861 (https://phabricator.wikimedia.org/T198092) (owner: Elukey)
[14:07:58] (CR) Elukey: [C: +2] "https://puppet-compiler.wmflabs.org/compiler02/11557/" [puppet] - https://gerrit.wikimedia.org/r/441870 (https://phabricator.wikimedia.org/T188377) (owner: Elukey)
[14:08:22] addshore: https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FileImporter/+/441872/ & https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FileImporter/+/441873/
[14:08:33] Operations, netops: Rack/setup cr2-eqdfw - https://phabricator.wikimedia.org/T196941#4311985 (ayounsi) As the current MX80 uses XFP-10G-LR optics and the MX204 uses EX-SFP-10GE-LR we're going to need 5*EX-SFP-10GE-LR optics (+ at least one spare). @Papaul how many EX-SFP-10GE-LR optics do you have? I'll...
[14:08:36] * CFisch_WMDE hopes they work as cherry pick like that
[14:09:01] CFisch_WMDE: just a comment on that patch, SiteTableSourceInterWikiLookup might be able to have a better name now :)
[14:09:32] CFisch_WMDE: they can come after the other backport patch right?
[14:09:32] yeah we changed that already somewhere
[14:09:34] :-)
[14:09:46] addshore: yeah
[14:10:29] We just discovered that today and the patch is on the current master, so I had no time to see if they work like that but lets wait for jenkins
[14:11:53] (CR) Addshore: [C: +2] Add ar, de and fa wikipedia to FileImporter interwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/440860 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:12:00] CFisch_WMDE: right, well, lets start with this ^^
[14:12:17] hashar: finally jenkin's done..
[14:12:25] \o/
[14:12:42] addshore: as long as we are not doing the "enable" ones nothing will happen anyway ^^
[14:12:53] addshore: CFisch_WMDE: I could use some minutes at some point to deploy an Echo patch for group0 ( https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Echo/+/441868/ )
[14:13:06] though it is probably better to do it after you are done
[14:13:08] hashar: is it already merged?
[14:13:21] (Merged) jenkins-bot: Add ar, de and fa wikipedia to FileImporter interwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/440860 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:13:40] !log banning and depooling elastic1042 (high response times)
[14:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:13:44] hashar: feel free to sync it now, the above mediawiki-config patch is the only thing I have merged so far
[14:13:59] (PS2) Elukey: confluent::kafka::common: add workaround for apt on jessie hosts [puppet] - https://gerrit.wikimedia.org/r/441861 (https://phabricator.wikimedia.org/T198092)
[14:14:16] CFisch_WMDE: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440860 will probably break the beta file-importer
[14:14:25] (CR) jenkins-bot: Add ar, de and fa wikipedia to FileImporter interwiki config [mediawiki-config] - https://gerrit.wikimedia.org/r/440860 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:14:41] !log merging jmxtrans and kafkatee's submodules to operations/puppet - part 2 (moving them back from environments/production)
[14:14:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:14:48] CFisch_WMDE: you might have to add some config specific to beta in the -labs file too
[14:14:49] PROBLEM - recommendation_api endpoints health on scb2003 is CRITICAL: /{domain}/v1/translation/articles/{source}{/seed} (normal source and target with seed) is CRITICAL: Test normal source and target with seed returned the unexpected status 404 (expecting: 200)
[14:14:55] addshore: ok doing it
[14:15:00] hashar: ack
[14:15:14] addshore: yeahprobably
[14:16:28] Pchelolo: syncing the echo patch for 1.32.0-wmf.999
[14:16:38] addshore: but on the other hand, if no conifg is set he will just fall back to a default prefix
[14:16:41] cool, thank you hashar
[14:16:56] CFisch_WMDE: CommonSettings.php is loaded on beta too :)
[14:17:06] (CR) Elukey: [C: +2] confluent::kafka::common: add workaround for apt on jessie hosts [puppet] - https://gerrit.wikimedia.org/r/441861 (https://phabricator.wikimedia.org/T198092) (owner: Elukey)
[14:17:09] RECOVERY - recommendation_api endpoints health on scb2003 is OK: All endpoints are healthy
[14:17:13] !log hashar@deploy1001 Synchronized php-1.32.0-wmf.999/extensions/Echo: Remove masterPos from the job specification - T192945 (duration: 00m 59s)
[14:17:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:16] T192945: Make EchoNotification job JSON-serializable - https://phabricator.wikimedia.org/T192945
[14:18:10] addshore: ahhh :-)
[14:18:14] hashar: am I okay to continue?: :)
[14:18:15] Pchelolo: synced
[14:18:17] addshore: yes
[14:18:18] synced
[14:18:36] fatal monitor showed some spike on elastica/cirrussearch but that is a known issue
[14:18:50] thank you :)
[14:18:52] CFisch_WMDE: but I guess youll be able to fix that with 1 more little patch, either set it to an empty array for beta? or whatever it needs to be
[14:19:15] !log addshore@deploy1001 Synchronized wmf-config/CommonSettings.php: [[gerrit:440860|Add ar, de and fa wikipedia to FileImporter interwiki config]] T196969 T196976 (duration: 00m 56s)
[14:19:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:18] T196976: Allow two- and one-path interwiki linking for user names - https://phabricator.wikimedia.org/T196976
[14:19:18] T196969: Deploy FileExporter and FileImporter to de-,fa- and ar-wiki as a beta feature - https://phabricator.wikimedia.org/T196969
[14:20:05] (CR) Addshore: [C: +2] Enable license filters for the FileImporter in production [mediawiki-config] - https://gerrit.wikimedia.org/r/440864 (https://phabricator.wikimedia.org/T194502) (owner: WMDE-Fisch)
[14:20:08] hashar, Pchelolo: response times of Cirrus are going down again, we should be good. cc: dcausse
[14:20:29] gehel: we didn't touch cirrus at all, we've touched echo
[14:21:30] (Merged) jenkins-bot: Enable license filters for the FileImporter in production [mediawiki-config] - https://gerrit.wikimedia.org/r/440864 (https://phabricator.wikimedia.org/T194502) (owner: WMDE-Fisch)
[14:21:34] Pchelolo: I saw you in the conversation about fatalmonitor. The Cirrus latencies are not related to anything you were doing...
[14:21:40] Pchelolo: it's completely unrelated, we have perf issues from time to time so just a unfortunate coincidence
[14:21:44] (CR) jenkins-bot: Enable license filters for the FileImporter in production [mediawiki-config] - https://gerrit.wikimedia.org/r/440864 (https://phabricator.wikimedia.org/T194502) (owner: WMDE-Fisch)
[14:21:56] we can crash on our own, without any external help!
[14:23:13] CFisch_WMDE: which of these would you like to check on mwdebug1002?
[14:23:26] and fatal monitor is all happy :]
[14:23:30] I guess the config ones don't need it, other than the ones that actually turn the thing on
[14:23:46] yes
[14:23:53] !log addshore@deploy1001 Synchronized wmf-config/CommonSettings.php: [[gerrit:440864|Enable license filters for the FileImporter in production]] T194502 (duration: 00m 55s)
[14:23:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:55] T194502: Only allow moves of files that have one of the required templates - https://phabricator.wikimedia.org/T194502
[14:23:56] so I would start with https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/441013/
[14:24:07] that would be nice to test on debug
[14:24:38] (CR) Elukey: [C: +2] "Of course I am stupid and I have missed the rename of 'thirdparty/confluent' to 'thirdparty', that is clearly wrong. Going to fix it now!" [puppet] - https://gerrit.wikimedia.org/r/441861 (https://phabricator.wikimedia.org/T198092) (owner: Elukey)
[14:24:44] https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/441014/ could also be tested on debug, but it not as interessting
[14:24:54] CFisch_WMDE: okay, waiting for jenkins to +2 and merge the 2 "Add a temp config to allow interwiki linking" patchs
[14:25:10] cool
[14:25:25] CFisch_WMDE: then the other "Ignore namespace string..." patches should go in before we start enabeling?
[14:25:38] yeah that would be good
[14:25:43] okay!
[14:26:09] imports would be blocked otherwise because he was not able to recognize localized template names ^^'
[14:26:22] (PS1) Ottomata: Disable public revision-score events until we figure out a good schema [puppet] - https://gerrit.wikimedia.org/r/441875 (https://phabricator.wikimedia.org/T197000)
[14:26:28] and that's stupid when we start with non-english wikis :-D
[14:27:15] (PS1) Elukey: confluent::kafka::common: fix copy/paste mistake [puppet] - https://gerrit.wikimedia.org/r/441876 (https://phabricator.wikimedia.org/T198092)
[14:27:30] ottomata: --^ sorry copy/pasta error :(
[14:27:45] (CR) Elukey: [C: +2] confluent::kafka::common: fix copy/paste mistake [puppet] - https://gerrit.wikimedia.org/r/441876 (https://phabricator.wikimedia.org/T198092) (owner: Elukey)
[14:30:30] CFisch_WMDE: sycing https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/FileImporter/+/441871/
[14:30:58] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.999/extensions/FileImporter: [[gerrit:441871|Add a temp config to allow interwiki linking]] T196976 (duration: 00m 58s)
[14:31:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:31:01] T196976: Allow two- and one-path interwiki linking for user names - https://phabricator.wikimedia.org/T196976
[14:31:18] (PS2) Ottomata: Disable public revision-score events until we figure out a good schema [puppet] - https://gerrit.wikimedia.org/r/441875 (https://phabricator.wikimedia.org/T197000)
[14:31:28] (CR) Ottomata: [V: +2 C: +2] Disable public revision-score events until we figure out a good schema [puppet] - https://gerrit.wikimedia.org/r/441875 (https://phabricator.wikimedia.org/T197000) (owner: Ottomata)
[14:31:31] CFisch_WMDE: syncing 441009
[14:32:22] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.8/extensions/FileImporter: [[gerrit:441009|Add a temp config to allow interwiki linking]] T196976 (duration: 00m 57s)
[14:32:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:32:49] CFisch_WMDE: waiting for CI for the "Ignore namespace string" ones again now
[14:33:00] yepps
[14:33:10] RECOVERY - CirrusSearch eqiad 95th percentile latency on graphite1001 is OK: OK: Less than 20.00% above the threshold [500.0] https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?panelId=19&fullscreen&orgId=1&var-cluster=eqiad&var-smoothing=1
[14:36:00] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Broken apt config on kafka/analytics hosts - https://phabricator.wikimedia.org/T198092#4312118 (elukey)
[14:36:11] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Broken apt config on kafka/analytics hosts - https://phabricator.wikimedia.org/T198092#4311563 (elukey)
[14:36:22] CFisch_WMDE: will sync the next 2 backports now, starting with .999
[14:36:30] Operations, ops-codfw, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4312121 (Papaul) a:Papaul>Vgutierrez
[14:37:53] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.999/extensions/FileImporter: [[gerrit:441873|Ignore namespace string when checking templates and categories]] (duration: 00m 58s)
[14:37:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:38:08] Operations, Icinga, monitoring: Extend dpkg Icinga check to also check for inconsistent apt state - https://phabricator.wikimedia.org/T190693#4312141 (MoritzMuehlenhoff) This would also have been useful for T198092
[14:39:36] CFisch_WMDE: syncing that one to .8 now
[14:40:16] \o/ :-)
[14:40:26] !log addshore@deploy1001 Synchronized php-1.32.0-wmf.8/extensions/FileImporter: [[gerrit:441872|Ignore namespace string when checking templates and categories]] (duration: 00m 56s)
[14:40:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:09] CFisch_WMDE: right then, time to turn it on on commons
[14:41:09] ?
[14:41:28] (CR) Addshore: [C: +2] Enable FileImporter on Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/441013 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:41:29] jepps :-)
[14:43:00] (Merged) jenkins-bot: Enable FileImporter on Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/441013 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:43:26] CFisch_WMDE: it is on mwdebug1002
[14:43:38] (CR) jenkins-bot: Enable FileImporter on Wikimedia Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/441013 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:43:44] ok i run my tests now then..
[14:44:22] Without actually clicking the import button, it all seems to work for me :)
[14:44:44] using https://commons.wikimedia.org/wiki/Special:ImportFile?clientUrl=%2F%2Ftest2.wikipedia.org%2Fwiki%2FFile%3AOld_bike.jpg&importSource=FileExporter
[14:48:08] jepp :-)
[14:48:18] all good to go?
[14:48:19] Also the filter checks work
[14:48:25] let me just do a propper import
[14:48:29] ack
[14:48:32] * CFisch_WMDE fixes the template
[14:51:40] (CR) Vgutierrez: [C: +2] standard: add dns200[12] to codfw ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441860 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[14:51:57] (PS2) Vgutierrez: standard: add dns200[12] to codfw ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441860 (https://phabricator.wikimedia.org/T196493)
[14:53:00] CFisch_WMDE: any joy?
[14:53:25] addshore: \o/
[14:53:51] ( of cause I took a German image and had to rename all the stuff -.- )
[14:53:54] but it works
[14:54:29] looks pretty good to me
[14:54:52] CFisch_WMDE: so in the revision history now the user is interwiki linked, but in the file history on the page they are not
[14:55:10] yeah interwiki links in the file history do not work
[14:55:18] ack
[14:55:30] or at least the import process does not support that in an easy way
[14:55:33] we are aware
[14:56:04] cool, shall i sync?
[14:56:18] Hi. Who from operations handle requests/patches from operations/dns.git, appart from mutante (who's on vacation?)
[14:56:20] goat for it!
[14:56:32] Operations, Analytics, Cleanup, Patch-For-Review: Archive operations/puppet/varnishkafka repository - https://phabricator.wikimedia.org/T197503#4312224 (Krinkle)
[14:57:02] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:441013|Enable FileImporter on Wikimedia Commons]] T196969 (duration: 00m 56s)
[14:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:57:05] T196969: Deploy FileExporter and FileImporter to de-,fa- and ar-wiki as a beta feature - https://phabricator.wikimedia.org/T196969
[14:57:10] CFisch_WMDE: ^^
[14:57:16] on to the final one!
[14:57:18] Operations, Analytics, Cleanup, Patch-For-Review: Archive operations/puppet/varnishkafka repository - https://phabricator.wikimedia.org/T197503#4294420 (Krinkle) At , I've set the description to `[ARCHIVED] Merged int...
[14:57:21] wohooo
[14:57:21] (PS1) Aaron Schulz: Use ReplicatedBagOStuff for WAN cache on deployment-prep [mediawiki-config] - https://gerrit.wikimedia.org/r/441881
[14:57:22] * Hauskatze wants to test that
[14:57:23] (CR) Addshore: [C: +2] Enable FileExpoter on ar-, de- and fa-wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441014 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:58:10] !log Deploy schema change on dbstore1002:s2 T191316 T192926 T89737 T195193
[14:58:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:58:14] T89737: Make several mediawiki table fields unsigned ints on wmf databases - https://phabricator.wikimedia.org/T89737
[14:58:14] T192926: Schema change to drop archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T192926
[14:58:14] T195193: Schema change for ct_tag_id field to change_tag - https://phabricator.wikimedia.org/T195193
[14:58:14] T191316: Schema change to make archive.ar_rev_id NOT NULL - https://phabricator.wikimedia.org/T191316
[14:59:19] (Merged) jenkins-bot: Enable FileExpoter on ar-, de- and fa-wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441014 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:59:33] (CR) jenkins-bot: Enable FileExpoter on ar-, de- and fa-wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441014 (https://phabricator.wikimedia.org/T196969) (owner: WMDE-Fisch)
[14:59:42] PROBLEM - Host 2620:0:860:3:d294:66ff:fe5f:6a40 is DOWN: PING CRITICAL - Packet loss = 100%
[14:59:43] CFisch_WMDE: on mwdebug1002
[15:00:04] addshore and leszek_wmde: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for Enable WikibaseLexeme on Wikibase client wikis . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1500).
[15:00:15] leszek_wmde: will be 5 more min :)
[15:00:24] addshore: ack
[15:00:52] addshore: \o/ :-)
[15:00:55] (PS6) Addshore: Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454)
[15:00:57] leszek_wmde: there are no backports to do right, just the config changes? all the code is on the .8 and .999 branches that is needed?
[15:00:58] Looks good!
[15:01:03] CFisch_WMDE: cool! will sync!
[15:01:08] Wohoo
[15:01:19] addshore: let me check .999
[15:01:44] CFisch_WMDE: syncing
[15:01:51] (PS1) Elukey: Archive repository [puppet/kafkatee] - https://gerrit.wikimedia.org/r/441883
[15:02:02] CFisch_WMDE: you realy might want to check beta still functions as expected
[15:02:07] (CR) jerkins-bot: [V: -1] Archive repository [puppet/kafkatee] - https://gerrit.wikimedia.org/r/441883 (owner: Elukey)
[15:02:15] I imagine there are some things that might be a bit messed up there now
[15:02:16] addshore: yeah thanks
[15:02:22] will do so
[15:02:28] Might be worth filing a ticket for it so we dont forget
[15:02:28] (PS2) Elukey: Archive repository [puppet/kafkatee] - https://gerrit.wikimedia.org/r/441883 (https://phabricator.wikimedia.org/T198098)
[15:02:40] (CR) jerkins-bot: [V: -1] Archive repository [puppet/kafkatee] - https://gerrit.wikimedia.org/r/441883 (https://phabricator.wikimedia.org/T198098) (owner: Elukey)
[15:02:51] PROBLEM - Host 2620:0:860:3:d294:66ff:fe5f:6a40 is DOWN: CRITICAL - Destination Unreachable (2620:0:860:3:d294:66ff:fe5f:6a40)
[15:03:16] ^^ that is a very human readable problem :D
[15:03:27] that's me
[15:03:32] installing dns2001, sorry
[15:03:33] addshore: both branches look OK!
[15:03:41] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:441014|Enable FileExpoter on ar-, de- and fa-wiki]] T196969 T197066 (duration: 00m 57s)
[15:03:42] leszek_wmde: awesome
[15:03:43] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Broken apt config on kafka/analytics hosts - https://phabricator.wikimedia.org/T198092#4312269 (Nuria) a:elukey
[15:03:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:03:44] CFisch_WMDE: all done!
[15:03:44] T197066: Allow moves from mediawiki.org to Commons - https://phabricator.wikimedia.org/T197066
[15:03:44] T196969: Deploy FileExporter and FileImporter to de-,fa- and ar-wiki as a beta feature - https://phabricator.wikimedia.org/T196969
[15:03:55] \o/\o/ thanks addshore !
[15:04:00] (CR) Addshore: [C: +2] Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454) (owner: Addshore)
[15:04:11] leszek_wmde: ^^ lets try this again again then!
[15:05:22] !log FileImporter slot done, moving onto Lexeme slot
[15:05:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:05:39] (Merged) jenkins-bot: Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454) (owner: Addshore)
[15:06:07] addshore: is the file(exporter|importer) feat also active on mediawiki.org?
[15:06:08] right leszek_wmde, WikibaseLexeme is now loaded on test on mwdebug1002
[15:06:24] (CR) Ottomata: [C: +1] role::cache::kafka:*: add more alarms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070) (owner: Elukey)
[15:06:35] Hauskatze: yes
[15:06:39] :D
[15:06:46] mediawiki.org -> commons
[15:07:05] hopefully some day metawiki -> commons?
[15:07:15] tons of images to transfer still...
[15:07:20] Hauskatze: Ask CFisch_WMDE :)
[15:08:48] CFisch_WMDE: maybe you'd like to enable meta -> commons file transfer too (another day?)?
[15:08:54] CFisch_WMDE: remember to maybe re enable that abusefilter on test that we turned off :)
[15:09:14] leszek_wmde: I see no issues, how about you?
[15:09:25] addshore: trying
[15:09:49] addshore: you have an item with lexeme in statement on test handy?
[15:10:14] oh, Q33 would do
[15:10:24] nope, I was just looking for the issues that we had last time first, apis etc
[15:10:36] (CR) jenkins-bot: Load WikibaseLexeme on testwiki (again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438005 (https://phabricator.wikimedia.org/T197454) (owner: Addshore)
[15:10:59] (PS1) Elukey: Archive repository [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/441886 (https://phabricator.wikimedia.org/T198097)
[15:11:14] (CR) jerkins-bot: [V: -1] Archive repository [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/441886 (https://phabricator.wikimedia.org/T198097) (owner: Elukey)
[15:12:53] addshore: ok, I think all good
[15:13:12] leszek_wmde: did you make a page on testwiki using Q33?
[15:14:16] addshore: i did: https://test.wikipedia.org/wiki/Client_test
[15:14:21] aaah, I see it now :)
[15:14:23] also checked apis, special pages
[15:14:28] cool, will sync
[15:15:43] (PS6) Addshore: Load WikibaseLexeme on all of group0 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438006 (https://phabricator.wikimedia.org/T197454)
[15:15:50] (PS6) Addshore: Load WikibaseLexeme on group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/436498 (https://phabricator.wikimedia.org/T195615)
[15:16:01] (PS6) Addshore: Load WikibaseLexeme on all wikidata clients [mediawiki-config] - https://gerrit.wikimedia.org/r/436499 (https://phabricator.wikimedia.org/T195615)
[15:16:24] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:438005|Load WikibaseLexeme on testwiki (again again)]] T197454 (duration: 00m 55s)
[15:16:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:16:26] T197454: Deploy WikibaseLexeme to wikidata clients - https://phabricator.wikimedia.org/T197454
[15:16:29] addshore: before you do, could you please do action=purge on Client_test?
[15:16:34] nah, too late :)
[15:16:39] * addshore will do that now ;)
[15:16:56] heh leszek_wmde "Error: 500, Internal Server Error "
[15:16:59] yup
[15:17:04] suddenly
[15:18:04] leszek_wmde:
[15:18:04] [WzEHXwpAIDsAADiFxnEAAAAL] /w/api.php PHP Fatal Error from line 65 of /srv/mediawiki/php-1.32.0-wmf.999/extensions/WikibaseLexeme/WikibaseLexeme.datatypes.php: Class undefined: Wikibase\Repo\WikibaseRepo
[15:18:06] will revert
[15:18:21] addshore: yeah the Abusefilter rule can probably go back now ^^
[15:18:29] (PS1) Addshore: Revert "Load WikibaseLexeme on testwiki (again again)" [mediawiki-config] - https://gerrit.wikimedia.org/r/441889
[15:18:34] (CR) Addshore: [C: +2] Revert "Load WikibaseLexeme on testwiki (again again)" [mediawiki-config] - https://gerrit.wikimedia.org/r/441889 (owner: Addshore)
[15:18:39] Hauskatze: I will give that feedback to the PM ^^
[15:18:49] addshore: just dug up the same error from logstash. whyyyyy?
[15:18:57] CFisch_WMDE: ack, thanks
[15:19:03] leszek_wmde: it could be to do with some order, I'll revert and then look
[15:19:44] (CR) Addshore: [V: +2 C: +2] Revert "Load WikibaseLexeme on testwiki (again again)" [mediawiki-config] - https://gerrit.wikimedia.org/r/441889 (owner: Addshore)
[15:20:08] syncing
[15:20:25] addshore: no, I think this just can't work
[15:20:59] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: REVERT: [[gerrit:438005|Load WikibaseLexeme on testwiki (again again)]] T197454 (duration: 00m 57s)
[15:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:21:10] https://test.wikipedia.org/wiki/Client_test is aliveagain now
[15:21:14] (CR) 20after4: "Thanks for reviewing this, I haven't forgotten this change I just got busy with other unexpected and urgent work on Gerrit and Phabricator" [puppet] - https://gerrit.wikimedia.org/r/432528 (https://phabricator.wikimedia.org/T182085) (owner: 20after4)
[15:21:32] (PS1) Addshore: Load WikibaseLexeme on testwiki (again again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/441890
[15:21:35] (CR) jenkins-bot: Revert "Load WikibaseLexeme on testwiki (again again)" [mediawiki-config] - https://gerrit.wikimedia.org/r/441889 (owner: Addshore)
[15:21:40] (CR) Addshore: [C: -2] Load WikibaseLexeme on testwiki (again again again) [mediawiki-config] - https://gerrit.wikimedia.org/r/441890 (owner: Addshore)
[15:21:54] (PS7) Addshore: Load WikibaseLexeme on all of group0 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438006 (https://phabricator.wikimedia.org/T197454)
[15:21:58] (CR) Addshore: [C: -2] Load WikibaseLexeme on all of group0 (again) [mediawiki-config] - https://gerrit.wikimedia.org/r/438006 (https://phabricator.wikimedia.org/T197454) (owner: Addshore)
[15:22:04] (PS7) Addshore: Load WikibaseLexeme on group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/436498 (https://phabricator.wikimedia.org/T195615)
[15:22:08] (CR) Addshore: [C: -2] Load WikibaseLexeme on group1 [mediawiki-config] - https://gerrit.wikimedia.org/r/436498 (https://phabricator.wikimedia.org/T195615) (owner: Addshore)
[15:22:14] (PS7) Addshore: Load WikibaseLexeme on all wikidata clients [mediawiki-config] - https://gerrit.wikimedia.org/r/436499 (https://phabricator.wikimedia.org/T195615)
[15:22:17] (CR) Addshore: [C: -2] Load WikibaseLexeme on all wikidata clients [mediawiki-config] - https://gerrit.wikimedia.org/r/436499 (https://phabricator.wikimedia.org/T195615) (owner: Addshore)
[15:22:36] leszek_wmde: lets move to #wikimedia-de-tech
[15:24:51] (PS1) Vgutierrez: replace achernar with dns2002 [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493)
[15:25:33] (CR) jerkins-bot: [V: -1] replace achernar with dns2002 [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[15:25:42] ouch
[15:26:33] (PS2) Vgutierrez: replace achernar with dns2002 [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493)
[15:26:34] our lovely commit validator :)
[15:27:28] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4312330 (elukey) Sorry to cause all this noise @faidon and @RobH, but after a chat with my team we have some concern related to moving people around between sta...
[15:27:53] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4312331 (elukey) So to summarize: my team would prefer a new host rather than the spare one.
[15:30:07] (PS3) Vgutierrez: replace achernar with dns2002 [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493)
[15:31:10] (CR) Elukey: [V: 2 C: 2] Archive repository [puppet/jmxtrans] - https://gerrit.wikimedia.org/r/441886 (https://phabricator.wikimedia.org/T198097) (owner: Elukey)
[15:31:45] Operations, Analytics, Cleanup, Patch-For-Review, User-Elukey: Archive operations/puppet/jmxtrans repository - https://phabricator.wikimedia.org/T198097#4312344 (elukey)
[15:31:56] (CR) Elukey: [V: 2 C: 2] Archive repository [puppet/kafkatee] - https://gerrit.wikimedia.org/r/441883 (https://phabricator.wikimedia.org/T198098) (owner: Elukey)
[15:32:18] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Archive operations/puppet/kafkatee repository - https://phabricator.wikimedia.org/T198098#4312347 (elukey)
[15:35:01] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4312349 (RobH) We don't have to move users off a box as soon as the warranty expires, in fact we tend to run boxes for 4-5 years when warrantied for 3. @elukey...
[15:35:31] (PS8) EBernhardson: Prep work for multi-instance elasticsearch refactor [puppet] - https://gerrit.wikimedia.org/r/440498
[15:35:45] (PS1) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[15:43:42] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4312357 (Ottomata) We have budget for a new stat box next FY. We'd like to use that budget to order the new box, move stat1005 users to it, and then use stat10...
[15:47:55] (PS4) Vgutierrez: conftool-data: Add dns200[12] to pdns_recursor service [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493)
[15:50:24] (CR) Vgutierrez: [C: 2] conftool-data: Add dns200[12] to pdns_recursor service [puppet] - https://gerrit.wikimedia.org/r/441891 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[15:51:49] Operations, Analytics, SRE-Access-Requests: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4312386 (RobH)
[15:52:17] (PS1) RobH: add new shell user mbsantos [puppet] - https://gerrit.wikimedia.org/r/441900 (https://phabricator.wikimedia.org/T197237)
[15:52:28] gerrit is being slow.
[15:53:18] robh pushing or web ui?
[15:53:24] pushing to it
[15:53:36] ah yep, that's expected that repo is quite large.
[15:53:36] https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=cobalt&var-network=eth0
[15:53:47] i mean slower than usual ;D
[15:53:51] (PS2) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[15:53:53] (PS6) EBernhardson: prometheus/elasticsearch support multiple exporters per host [puppet] - https://gerrit.wikimedia.org/r/441321
[15:53:55] (PS9) EBernhardson: [WIP] Split instance define out of elasticsearch class [puppet] - https://gerrit.wikimedia.org/r/441338
[15:53:57] (PS37) EBernhardson: [WIP] Allow multiple elasticsearch instances per host [puppet] - https://gerrit.wikimedia.org/r/440049
[15:54:46] (CR) RobH: [C: 2] add new shell user mbsantos [puppet] - https://gerrit.wikimedia.org/r/441900 (https://phabricator.wikimedia.org/T197237) (owner: RobH)
[15:54:55] (PS2) RobH: add new shell user mbsantos [puppet] - https://gerrit.wikimedia.org/r/441900 (https://phabricator.wikimedia.org/T197237)
[15:55:12] (PS1) Bmansurov: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/441901 (https://phabricator.wikimedia.org/T191086)
[15:55:23] could just be me though, who knows. (doesnt happen until it happens half a dozen times!)
[15:56:33] (PS3) Bmansurov: Increase Schema:CitationUsage sampling rate to 15% [mediawiki-config] - https://gerrit.wikimedia.org/r/440867 (https://phabricator.wikimedia.org/T191086)
[15:56:35] (PS2) Bmansurov: Increase Schema:CitationUsage sampling rate to 100% [mediawiki-config] - https://gerrit.wikimedia.org/r/441567 (https://phabricator.wikimedia.org/T191086)
[15:56:37] (PS2) Bmansurov: Stop collecting data for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/441568 (https://phabricator.wikimedia.org/T191086)
[15:56:39] (PS2) Bmansurov: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/441901 (https://phabricator.wikimedia.org/T191086)
[16:00:24] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: name=dns2001.wikimedia.org
[16:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:03:30] (PS1) RobH: adding mbsantos to mulitple user groups [puppet] - https://gerrit.wikimedia.org/r/441903 (https://phabricator.wikimedia.org/T197237)
[16:04:03] (PS2) Rush: Phab: Explain to not-yet-approved users how they can access tasks [puppet] - https://gerrit.wikimedia.org/r/441806 (https://phabricator.wikimedia.org/T197550) (owner: Aklapper)
[16:04:29] (CR) RobH: [C: 2] adding mbsantos to mulitple user groups [puppet] - https://gerrit.wikimedia.org/r/441903 (https://phabricator.wikimedia.org/T197237) (owner: RobH)
[16:04:38] (CR) Rush: [C: 2] Phab: Explain to not-yet-approved users how they can access tasks [puppet] - https://gerrit.wikimedia.org/r/441806 (https://phabricator.wikimedia.org/T197550) (owner: Aklapper)
[16:04:52] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=yes; selector: name=dns2002.wikimedia.org
[16:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:28] (CR) Rush: [C: 2] "I understand there is ongoing disagreement about whether this should be needed, but since it is the current reality informing users seems " [puppet] - https://gerrit.wikimedia.org/r/441806 (https://phabricator.wikimedia.org/T197550) (owner: Aklapper)
[16:05:42] (PS3) Rush: Phab: Explain to not-yet-approved users how they can access tasks [puppet] - https://gerrit.wikimedia.org/r/441806 (https://phabricator.wikimedia.org/T197550) (owner: Aklapper)
[16:07:34] (PS1) Rush: Revert "Phab: Explain to not-yet-approved users how they can access tasks" [puppet] - https://gerrit.wikimedia.org/r/441904
[16:07:55] (CR) Rush: [C: 2] Revert "Phab: Explain to not-yet-approved users how they can access tasks" [puppet] - https://gerrit.wikimedia.org/r/441904 (owner: Rush)
[16:08:22] (PS2) Rush: Revert "Phab: Explain to not-yet-approved users how they can access tasks" [puppet] - https://gerrit.wikimedia.org/r/441904
[16:08:30] (CR) Rush: [V: 2 C: 2] Revert "Phab: Explain to not-yet-approved users how they can access tasks" [puppet] - https://gerrit.wikimedia.org/r/441904 (owner: Rush)
[16:08:33] Operations, Analytics, SRE-Access-Requests, Patch-For-Review: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4312438 (RobH) Open>Resolved @MSantos: Your access request has been merged live, with all of the groups you requested. Since this is a new account fo...
[16:09:06] (PS3) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[16:10:01] (CR) Jayprakash12345: "@SWAT members, We can't test it on mwdebug1002, So if will not find any error in logstash then go direct for sync." [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[16:13:24] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=no; selector: name=acamar.wikimedia.org
[16:13:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:13:29] (CR) Aklapper: "Uhm, big sorry! Guess I should have escaped apostrophes and not trust the (broken?) syntax highlighting of my text editor. :-/" [puppet] - https://gerrit.wikimedia.org/r/441904 (owner: Rush)
[16:16:11] (PS4) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[16:16:13] Operations, Proton, SRE-Access-Requests, Patch-For-Review: Add @pmiazga @Niedzielski and @phuedx to the deploy-service group - https://phabricator.wikimedia.org/T197857#4312458 (RobH) >>! In T197857#4311094, @gerritbot wrote: > Change 441379 **merged** by Alexandros Kosiaris: > [operations/puppet...
[16:18:23] Operations, DNS, Traffic, Patch-For-Review: rack/setup/install dns200[12].wikimedia.org - https://phabricator.wikimedia.org/T196493#4312463 (RobH)
[16:19:02] !log vgutierrez@puppetmaster1001 conftool action : set/pooled=no; selector: name=achernar.wikimedia.org
[16:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:21:57] Operations, Analytics, SRE-Access-Requests, Patch-For-Review: Requesting access for mbsantos - https://phabricator.wikimedia.org/T197237#4312472 (MSantos) Thanks, for all the support and thank you @RobH for the warnings.
[16:32:12] (PS1) Papaul: DNS: Add mgmt & production DNS entries for graphite2003 [dns] - https://gerrit.wikimedia.org/r/441908 (https://phabricator.wikimedia.org/T196483)
[16:32:27] (PS1) Addshore: BETA: Load WikibaseLexeme on beta enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441909
[16:34:05] Operations, ops-codfw, Patch-For-Review: rack/setup/install graphite2003 - https://phabricator.wikimedia.org/T196483#4312521 (Papaul)
[16:36:34] Operations, ops-codfw, netops: switch port configuration for graphite2003 - https://phabricator.wikimedia.org/T198119#4312529 (Papaul) p:Triage>Normal
[16:37:00] (PS1) Vgutierrez: conftool-data: Remove achernar and acamar from pdns_recursor [puppet] - https://gerrit.wikimedia.org/r/441911 (https://phabricator.wikimedia.org/T196493)
[16:39:52] (CR) Vgutierrez: [C: 2] conftool-data: Remove achernar and acamar from pdns_recursor [puppet] - https://gerrit.wikimedia.org/r/441911 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[16:40:05] (PS2) Vgutierrez: conftool-data: Remove achernar and acamar from pdns_recursor [puppet] - https://gerrit.wikimedia.org/r/441911 (https://phabricator.wikimedia.org/T196493)
[16:41:22] Operations, ops-codfw: rack/setup/install authdns2001.wikimedia.org - https://phabricator.wikimedia.org/T196664#4312570 (Papaul)
[16:41:27] (PS4) MarcoAurelio: Increase password policies for 'steward' to max [mediawiki-config] - https://gerrit.wikimedia.org/r/440834 (https://phabricator.wikimedia.org/T197577)
[16:41:57] * addshore is merging a beta only mediawiki-config patch
[16:42:02] jouncebot: now
[16:42:02] For the next 0 hour(s) and 17 minute(s): Enable WikibaseLexeme on Wikibase client wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1500)
[16:42:08] (CR) Addshore: [C: 2] BETA: Load WikibaseLexeme on beta enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441909 (owner: Addshore)
[16:42:13] oh, it's still my slot anyway!
[16:43:35] (Merged) jenkins-bot: BETA: Load WikibaseLexeme on beta enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441909 (owner: Addshore)
[16:43:52] (CR) jenkins-bot: BETA: Load WikibaseLexeme on beta enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441909 (owner: Addshore)
[16:45:18] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: Beta only [[gerrit:441909]] (duration: 00m 57s)
[16:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:45:37] (PS1) Vgutierrez: standard: Remove acamar and achernar from ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441912 (https://phabricator.wikimedia.org/T196493)
[16:45:52] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/441913 (https://phabricator.wikimedia.org/T128546)
[16:46:36] jouncebot: next
[16:46:36] In 0 hour(s) and 13 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1700)
[16:47:24] (CR) Vgutierrez: [C: 2] standard: Remove acamar and achernar from ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441912 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[16:47:34] (PS2) Vgutierrez: standard: Remove acamar and achernar from ntp peer list [puppet] - https://gerrit.wikimedia.org/r/441912 (https://phabricator.wikimedia.org/T196493)
[16:47:50] Operations, netops: Rack/setup cr2-eqdfw - https://phabricator.wikimedia.org/T196941#4312600 (Papaul) @ayounsi I have none. I had 12 left but I used them to connect the lvs2009 and lvs2010
[16:48:26] (CR) Filippo Giunchedi: DNS: Add mgmt & production DNS entries for graphite2003 (1 comment) [dns] - https://gerrit.wikimedia.org/r/441908 (https://phabricator.wikimedia.org/T196483) (owner: Papaul)
[16:49:39] (PS2) Elukey: role::cache::kafka:*: add more alarms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070)
[16:50:44] (PS2) Papaul: DNS: Add mgmt & production DNS entries for graphite2003 [dns] - https://gerrit.wikimedia.org/r/441908 (https://phabricator.wikimedia.org/T196483)
[16:52:49] (CR) Krinkle: "The naming difference (pecl vs nutcracker) were confusing at first, but underlying logic indeed matches now with https://gerrit.wikimedia." (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441881 (owner: Aaron Schulz)
[16:54:04] (CR) Filippo Giunchedi: [C: 1] DNS: Add mgmt & production DNS entries for graphite2003 [dns] - https://gerrit.wikimedia.org/r/441908 (https://phabricator.wikimedia.org/T196483) (owner: Papaul)
[16:54:25] papaul: I have to go btw, but patch looks good
[16:54:45] godog: thanks
[17:00:05] gehel: #bothumor I � Unicode. All rise for Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1700).
[17:01:04] (PS1) Vgutierrez: lvs: use the new dns200[12] recursive DNS servers [puppet] - https://gerrit.wikimedia.org/r/441916 (https://phabricator.wikimedia.org/T196493)
[17:01:42] jouncebot: wdqs deployment will be delayed, waiting for one more fix to T198042
[17:01:42] T198042: WDQS timeout on the public eqiad cluster - https://phabricator.wikimedia.org/T198042
[17:03:31] (PS1) Vgutierrez: hieradata: Get rid of acamar and achenar references [puppet] - https://gerrit.wikimedia.org/r/441918 (https://phabricator.wikimedia.org/T196493)
[17:04:18] (CR) Aaron Schulz: Use ReplicatedBagOStuff for WAN cache on deployment-prep (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441881 (owner: Aaron Schulz)
[17:05:27] (PS2) Vgutierrez: lvs: use the new dns200[12] recursive DNS servers [puppet] - https://gerrit.wikimedia.org/r/441916 (https://phabricator.wikimedia.org/T196493)
[17:07:22] (CR) Smalyshev: [C: 1] wdqs: don't log MemoryManagerClosedException [puppet] - https://gerrit.wikimedia.org/r/441774 (https://phabricator.wikimedia.org/T198046) (owner: Gehel)
[17:07:46] (PS1) Vgutierrez: smokeping: Replace acamar & achernar with dns2001 and dns2002 [puppet] - https://gerrit.wikimedia.org/r/441919 (https://phabricator.wikimedia.org/T196493)
[17:08:24] (PS5) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[17:12:03] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4312713 (elukey) >>! In T196345#4312349, @RobH wrote: > We don't have to move users off a box as soon as the warranty expires, in fact we tend to run boxes for...
[17:13:37] (CR) Krinkle: Use ReplicatedBagOStuff for WAN cache on deployment-prep (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/441881 (owner: Aaron Schulz)
[17:16:11] (PS3) 20after4: Fix phabricator rate limiting [puppet] - https://gerrit.wikimedia.org/r/441525
[17:16:43] (CR) jerkins-bot: [V: -1] Fix phabricator rate limiting [puppet] - https://gerrit.wikimedia.org/r/441525 (owner: 20after4)
[17:17:21] (PS4) 20after4: Fix phabricator rate limiting [puppet] - https://gerrit.wikimedia.org/r/441525 (https://phabricator.wikimedia.org/T197922)
[17:20:39] (PS2) Gehel: wdqs: don't log MemoryManagerClosedException [puppet] - https://gerrit.wikimedia.org/r/441774 (https://phabricator.wikimedia.org/T198046)
[17:22:13] (CR) Gehel: [C: 2] wdqs: don't log MemoryManagerClosedException [puppet] - https://gerrit.wikimedia.org/r/441774 (https://phabricator.wikimedia.org/T198046) (owner: Gehel)
[17:22:21] (PS1) Thcipriani: Scap clean: remove remote cache directory [mediawiki-config] - https://gerrit.wikimedia.org/r/441920 (https://phabricator.wikimedia.org/T157030)
[17:23:48] (PS3) Elukey: role::cache::kafka:*: add more alarms for varnishkafka [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070)
[17:24:07] (PS1) Papaul: DNS: Add mgmt DNS entries for authdns2001 [dns] - https://gerrit.wikimedia.org/r/441921 (https://phabricator.wikimedia.org/T196664)
[17:24:32] (CR) C. Scott Ananian: "> Sure, I mean why not beta meta or beta commons instead of" [mediawiki-config] - https://gerrit.wikimedia.org/r/438079 (https://phabricator.wikimedia.org/T143628) (owner: C. Scott Ananian)
[17:25:24] (CR) Ayounsi: "2 tiny changes needed. Other than that LGTM!" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/441919 (https://phabricator.wikimedia.org/T196493) (owner: Vgutierrez)
[17:25:51] (CR) Elukey: "The last PS adds more up to date dashboard links" [puppet] - https://gerrit.wikimedia.org/r/441808 (https://phabricator.wikimedia.org/T198070) (owner: Elukey)
[17:27:34] Operations, ops-codfw, Patch-For-Review: rack/setup/install authdns2001.wikimedia.org - https://phabricator.wikimedia.org/T196664#4312768 (Papaul)
[17:29:11] Operations, Analytics, Cleanup, Patch-For-Review, User-Elukey: Archive operations/puppet/jmxtrans repository - https://phabricator.wikimedia.org/T198097#4312769 (elukey)
[17:29:34] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Archive operations/puppet/kafkatee repository - https://phabricator.wikimedia.org/T198098#4312770 (elukey)
[17:30:17] Operations, ops-codfw, netops: Swith port information for authdns2001 - https://phabricator.wikimedia.org/T198126#4312781 (Papaul) p:Triage>Normal
[17:30:19] Operations, Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Import some Analytics git puppet submodules to operations/puppet - https://phabricator.wikimedia.org/T188377#4312793 (elukey) All modules imported into operations/puppet, the remaining thing to do is cleaning up (subtasks).
[17:30:49] Operations, ops-codfw, Patch-For-Review: rack/setup/install authdns2001.wikimedia.org - https://phabricator.wikimedia.org/T196664#4312794 (Papaul)
[17:36:04] !log Re-enable job and change-prop kafka topic mirroring from main-codfw -> main-eqiad
[17:36:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:19] !log gehel@deploy1001 Started deploy [wdqs/wdqs@e9a1e13]: new version of wdqs GUI and updater (wdqs1009 only)
[17:56:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:56:43] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@e9a1e13]: new version of wdqs GUI and updater (wdqs1009 only) (duration: 00m 24s)
[17:56:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:58:53] !log gehel@deploy1001 Started deploy [wdqs/wdqs@e9a1e13]: new version of wdqs GUI and updater
[17:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:00:05] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: How many deployers does it take to do Morning SWAT (Max 6 patches) deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T1800).
[18:00:05] Smalyshev, bmansurov, Jayprakash12345, and jan_drewniak: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[18:00:09] here
[18:00:33] here here
[18:04:05] I can SWAT
[18:06:23] SMalyshev: ping for SWAT
[18:07:02] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/441901 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[18:07:12] !log gehel@deploy1001 Finished deploy [wdqs/wdqs@e9a1e13]: new version of wdqs GUI and updater (duration: 08m 19s)
[18:07:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:08:19] SMalyshev: ^ wdqs deployment completed, tests are green
[18:08:27] (Merged) jenkins-bot: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/441901 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[18:08:30] gehel: excellent, thanks
[18:08:48] bmansurov: I will sync out the change to production (just so I don't surprise the next deployer), but it will be made live in beta by the next https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ build, FYI
[18:09:29] thcipriani: do you mean I can test the change tomorrow?
[18:11:00] bmansurov: no I mean that the change will be made live by the next run of this jenkins job: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ which should start in a few minutes (happens automatically)
[18:11:42] thcipriani: oops, misread your message. thanks!
[18:12:06] sure :)
[18:12:56] Hello Everyone
[18:13:13] Jayprakash12345: hiya
[18:13:31] !log thcipriani@deploy1001 Synchronized wmf-config/InitialiseSettings-labs.php: SWAT: [[gerrit:441901|Enable logging for Schema:CitationUsage]] T191086 (noop beta-only sync) (duration: 00m 57s)
[18:13:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:13:34] T191086: Instrument, collect data, and perform the first round of analysis on click-through data on citations/footnotes - https://phabricator.wikimedia.org/T191086
[18:14:43] hey friendly swatters, addshore hashar anomie aude_ MaxSem twentyafterfour RoanKattouw Dereckson thcipriani Niharika zeljkof etc
[18:14:59] parsoid would like to deploy a bit early so #services can get to bed before midnight
[18:15:17] AIUI it is already 9pm-ish where they are
[18:15:48] could you ping me when you are done setting wikis on fire so i can light up parsoid?
[18:16:08] cscott: sure thing, I'll ping you when complete
[18:16:35] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:18:34] (CR) Thcipriani: Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:18:46] (PS5) Thcipriani: Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:19:14] (CR) jenkins-bot: Enable logging for Schema:CitationUsage [mediawiki-config] - https://gerrit.wikimedia.org/r/441901 (https://phabricator.wikimedia.org/T191086) (owner: Bmansurov)
[18:19:35] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:20:59] (Merged) jenkins-bot: Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:21:55] Jayprakash12345: your change is live on mwdebug1002 if there's anything you'd like to check there
[18:22:14] There is nothing for me
[18:22:35] Go ahead if there is no error in logstash
[18:22:39] (PS2) Thcipriani: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/441913 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[18:22:49] k, going live
[18:25:24] !log thcipriani@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:441069|Add import sources to ta.wiktionary]] T196445 (duration: 00m 58s)
[18:25:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:25:26] T196445: enable special:Import option in ta.wiktionary - https://phabricator.wikimedia.org/T196445
[18:25:30] ^ Jayprakash12345 live now, thanks!
[18:25:55] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/441913 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[18:26:20] thcipriani: Thanks :)
[18:27:26] (CR) jenkins-bot: Add import sources to ta.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/441069 (https://phabricator.wikimedia.org/T196445) (owner: Jayprakash12345)
[18:27:33] (Merged) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/441913 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[18:27:46] (CR) jenkins-bot: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/441913 (https://phabricator.wikimedia.org/T128546) (owner: Jdrewniak)
[18:28:48] jan_drewniak: fetched portals changes to mwdebug1002, check please
[18:30:22] thcipriani: looks good!
[18:31:05] jan_drewniak: ok, running sync-portals
[18:32:28] !log thcipriani@deploy1001 Synchronized portals/wikipedia.org/assets: SWAT: [[gerrit:441913|Bumping portals to master]] T128546 (duration: 00m 57s)
[18:32:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:32:31] T128546: [Recurring Task] Update Wikipedia and sister projects portals statistics - https://phabricator.wikimedia.org/T128546
[18:33:26] !log thcipriani@deploy1001 Synchronized portals: SWAT: [[gerrit:441913|Bumping portals to master]] T128546 (duration: 00m 56s)
[18:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[18:33:32] jan_drewniak: ^ done!
[18:33:57] thcipriani: thanks!
[18:34:50] SMalyshev: are you available for SWAT for https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/440420 ?
[18:44:45] cscott: SWAT is complete, all yours!
[18:44:45] Operations, Performance-Team, Traffic, Patch-For-Review, Wikimedia-Incident: Collect Backend-Timing in Prometheus - https://phabricator.wikimedia.org/T131894#4313027 (Gilles) Open>stalled Stuck in review since May
[18:55:57] thcipriani: thanks!
[19:20:56] ok, folks, about to deploy parsoid
[19:21:27] Operations, Cloud-Services, Security: Disable agent forwarding to important hosts - https://phabricator.wikimedia.org/T198138#4313143 (Krenair)
[19:22:48] !log cscott@deploy1001 Started deploy [parsoid/deploy@5925200]: Updating Parsoid to b068bb51
[19:22:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:24:59] Operations, Cloud-Services, Security: Disable agent forwarding to important hosts - https://phabricator.wikimedia.org/T198138#4313147 (Krenair) Probably a good idea though I don't think anyone ever really minded for non-privileged users (i.e. people who can't do anything particularly sensitive in lab...
[19:45:07] cscott: still deploying?
[19:45:59] PROBLEM - keystone admin endpoint port 35357 on labcontrol1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:46:44] !log cscott@deploy1001 Started deploy [parsoid/deploy@5925200]: Updating Parsoid to b068bb51
[19:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:46:59] RECOVERY - keystone admin endpoint port 35357 on labcontrol1001 is OK: HTTP OK: HTTP/1.1 300 Multiple Choices - 783 bytes in 0.003 second response time
[19:46:59] keystone pages are legit I think but unsure why
[19:47:55] chasemp: ack, ping if needed ;)
[19:48:08] volans: thanks!
[19:49:19] mobrovac: yes, i had an scap issue (or ssh issue, not clear which)
[19:49:27] moral of the story, always run scap in a screen
[19:49:32] or tmux, if you are of that religion
[19:49:40] anyway, it is progressing now
[19:53:41] !log cscott@deploy1001 Finished deploy [parsoid/deploy@5925200]: Updating Parsoid to b068bb51 (duration: 06m 57s)
[19:53:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:53:46] yay
[20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: That opportune time is upon us again. Time for a Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T2000).
[20:02:17] (PS1) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:02:36] !log Completed deploy of Parsoid to version b068bb51 (T197949, T197702, T196799)
[20:02:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:02:41] T197949: Parsoid dies if trying to transform to not-yet-supported language variant - https://phabricator.wikimedia.org/T197949
[20:02:42] T197702: Parsoid should not set Vary: Content-Type - https://phabricator.wikimedia.org/T197702
[20:02:42] T196799: Crasher in handleLinkNeighbours dom pass - https://phabricator.wikimedia.org/T196799
[20:03:06] (CR) jerkins-bot: [V: -1] Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942 (owner: Andrew Bogott)
[20:04:37] (PS2) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:06:22] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@fb70d4f]: Update mobileapps to 362f2a0
[20:06:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:07:56] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@fb70d4f]: Update mobileapps to 362f2a0 (duration: 01m 34s)
[20:07:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:08:29] ^deploying to canary failed, rolling back to investigate.
[20:09:13] !log deploying mobileapps to canary (scb2001) failed, rolling back to investigate
[20:09:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:10:12] (PS1) MusikAnimal: Enable Draft namespace on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143)
[20:11:52] (CR) Framawiki: [C: 1] Enable Draft namespace on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: MusikAnimal)
[20:13:46] marostegui umm any idea about https://phabricator.wikimedia.org/T198111#4313305 ?
[20:23:38] Nikerabbit: db1064 is a slave, so read only is correct there
[20:24:12] marostegui: okay, sounds like a logic error in our code then
[20:25:36] (PS3) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:25:38] (PS1) Andrew Bogott: Glance: typo fix [puppet] - https://gerrit.wikimedia.org/r/441974
[20:26:54] (CR) Andrew Bogott: [C: 2] Glance: typo fix [puppet] - https://gerrit.wikimedia.org/r/441974 (owner: Andrew Bogott)
[20:36:00] (PS4) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:36:44] (CR) jerkins-bot: [V: -1] Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942 (owner: Andrew Bogott)
[20:39:09] (PS5) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:44:23] (PS6) Andrew Bogott: Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942
[20:47:28] (CR) Andrew Bogott: [C: 2] Allow the nova controller node to talk to the cloud puppet API [puppet] - https://gerrit.wikimedia.org/r/441942 (owner: Andrew Bogott)
[20:52:41] mobileapps deployment take 2 is on its way. we ID'd the problem.
[20:53:07] (PS1) Ottomata: Install python oauth libs for Google Search Console data [puppet] - https://gerrit.wikimedia.org/r/441978 (https://phabricator.wikimedia.org/T197896)
[20:53:56] (PS2) Ottomata: Install python oauth libs for Google Search Console data [puppet] - https://gerrit.wikimedia.org/r/441978 (https://phabricator.wikimedia.org/T197896)
[20:54:46] (CR) Ottomata: [C: 2] Install python oauth libs for Google Search Console data [puppet] - https://gerrit.wikimedia.org/r/441978 (https://phabricator.wikimedia.org/T197896) (owner: Ottomata)
[20:55:40] (CR) Alex Monk: "pythonpython-oauth2client ?" [puppet] - https://gerrit.wikimedia.org/r/441978 (https://phabricator.wikimedia.org/T197896) (owner: Ottomata)
[20:57:02] (PS1) Ottomata: Fix typo in python package name on stat box [puppet] - https://gerrit.wikimedia.org/r/441979
[20:57:07] Krenair: ^ :p
[20:57:18] (CR) Ottomata: [V: 2 C: 2] Fix typo in python package name on stat box [puppet] - https://gerrit.wikimedia.org/r/441979 (owner: Ottomata)
[20:58:52] !log mholloway-shell@deploy1001 Started deploy [mobileapps/deploy@770cdb0]: Update mobileapps to 8c76d52
[20:58:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:34] !log mholloway-shell@deploy1001 Finished deploy [mobileapps/deploy@770cdb0]: Update mobileapps to 8c76d52 (duration: 05m 41s)
[21:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:04:48] (CR) Samwilson: [C: 1] Enable Draft namespace on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/441944 (https://phabricator.wikimedia.org/T198143) (owner: MusikAnimal)
[21:06:05] bawolff and Reedy: Time to snap out of that daydream and deploy Weekly Security deployment window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T2100).
[21:07:45] (Abandoned) Ottomata: Add $kafka_enable_auto_commit and $kafka_enable_auto_offset_store params [puppet/kafkatee] - https://gerrit.wikimedia.org/r/421599 (owner: Ottomata)
[21:08:29] (Abandoned) Ottomata: Support looking up secrets in different modules [puppet] - https://gerrit.wikimedia.org/r/391214 (owner: Ottomata)
[21:08:54] (Abandoned) Ottomata: Include discovery-stats user in analytics_cluster::users [puppet] - https://gerrit.wikimedia.org/r/373689 (https://phabricator.wikimedia.org/T174110) (owner: Ottomata)
[21:15:29] (CR) Ottomata: "I think we talked about this in IRC...but I forget what we said!" [puppet] - https://gerrit.wikimedia.org/r/440507 (owner: Elukey)
[21:20:35] (CR) Ottomata: [C: 1] "Wow missed this one LONG ago. +1 :p" [puppet/cdh] - https://gerrit.wikimedia.org/r/350542 (owner: Elukey)
[21:34:37] Operations, Analytics, hardware-requests: eqiad: (1) new stat box to offload users from stat1005 - https://phabricator.wikimedia.org/T196345#4253175 (Tbayer) >>! In T196345#4312713, @elukey wrote: >>>! In T196345#4312349, @RobH wrote: >> We don't have to move users off a box as soon as the warranty e...
[21:43:11] Operations, Research, Research-collaborations, Research-management, and 2 others: Remove shell access for ironholds on 2018-06-29 - https://phabricator.wikimedia.org/T197895#4313530 (RobH)
[21:52:22] (CR) MarcoAurelio: [C: 1] Fix en-rtl in Special:SiteMatrix in beta [mediawiki-config] - https://gerrit.wikimedia.org/r/441422 (https://phabricator.wikimedia.org/T195675) (owner: C. Scott Ananian)
[21:54:57] (PS1) Urbanecm: Add some namespace aliases to ruwikiquote [mediawiki-config] - https://gerrit.wikimedia.org/r/441988 (https://phabricator.wikimedia.org/T197058)
[21:56:42] (PS1) Andrew Bogott: Cloud puppet enc api: Allow read/write access from labservices host [puppet] - https://gerrit.wikimedia.org/r/441989 (https://phabricator.wikimedia.org/T198123)
[22:00:08] Niharika: around?
[22:00:51] (CR) Andrew Bogott: [C: 2] Cloud puppet enc api: Allow read/write access from labservices host [puppet] - https://gerrit.wikimedia.org/r/441989 (https://phabricator.wikimedia.org/T198123) (owner: Andrew Bogott)
[22:01:18] Hauskatze: Partially. In a meeting. What's up?
[22:02:33] Niharika: well, it's a question that can wait. Enjoy your meeting.
[22:03:10] Hauskatze: Ask away. I'll be in meetings for another 2.5 hours. :P I can respond, but will be a little slow.
[22:04:16] Niharika: so I am adding support for Manipuri and I'm not sure you'd know if they speak some other language close to that that could be interchangeable (fallback for mediawiki interface)?
[22:05:09] Hauskatze: I wouldn't know. I can potentially ask someone.
[22:05:20] it's also called Meithei or Meitei
[22:05:36] Niharika: well, like I said, it's not urgent and fallbacks can be set later :)
[22:07:20] Hauskatze: Great. I'll reach out to folks and get back to you if I get an answer. :)
[22:08:12] Niharika: thanks! - probably T198132 would be a better place
[22:08:12] T198132: Add Meithei (mni) to Names.php - https://phabricator.wikimedia.org/T198132
[22:08:31] I'm not always on irc
[22:09:16] Subscribed.
[22:33:49] (PS1) Alex Monk: [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962)
[22:34:24] (CR) jerkins-bot: [V: -1] [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: Alex Monk)
[22:45:13] (PS2) Jforrester: Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878
[22:45:46] (CR) jerkins-bot: [V: -1] Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878 (owner: Jforrester)
[22:48:04] (PS2) Alex Monk: [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962)
[22:48:38] (CR) jerkins-bot: [V: -1] [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: Alex Monk)
[22:50:09] (PS3) Alex Monk: [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962)
[22:50:35] (CR) jerkins-bot: [V: -1] [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: Alex Monk)
[22:54:09] (PS4) Alex Monk: [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962)
[22:54:36] (CR) jerkins-bot: [V: -1] [WIP] Central certificates service [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: Alex Monk)
[22:54:38] Cherry pick failed: merge conflict <-- meh
[22:58:03] (CR) Alex Monk: "People should feel free to fix the wmf_style 'problems', I don't have time for it" [puppet] - https://gerrit.wikimedia.org/r/441991 (https://phabricator.wikimedia.org/T194962) (owner: Alex Monk)
[23:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for Evening SWAT (Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180625T2300).
[23:00:04] Smalyshev and James_F: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[23:00:13] here
[23:00:27] * James_F waves.
[23:03:45] I can SWAT
[23:04:40] (CR) 20after4: [C: 2] Add Lexemes to instant-index set [mediawiki-config] - https://gerrit.wikimedia.org/r/440420 (https://phabricator.wikimedia.org/T196896) (owner: Smalyshev)
[23:05:12] Operations, Analytics, Traffic: Size of headers processed by varnish? - https://phabricator.wikimedia.org/T198152#4313704 (Nuria)
[23:06:09] (Merged) jenkins-bot: Add Lexemes to instant-index set [mediawiki-config] - https://gerrit.wikimedia.org/r/440420 (https://phabricator.wikimedia.org/T196896) (owner: Smalyshev)
[23:07:51] twentyafterfour: unfortunately this patch is hard to test without creating new items, so I'd just verify it on organically created new ones
[23:08:15] SMalyshev: so no mwdebug?
[23:08:29] yeah no point in putting it there only
[23:08:37] ok I'll just sync it then ...
[23:08:51] yep, thanks
[23:10:17] syncing
[23:10:57] !log twentyafterfour@deploy1001 Synchronized wmf-config/Wikibase-production.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440420/ for SWAT (duration: 00m 57s)
[23:10:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:11:11] Operations, Analytics, Analytics-Kanban, Patch-For-Review: Puppet admin module should support adding system users to managed groups - https://phabricator.wikimedia.org/T174465#4313747 (Nuria) Open>Resolved
[23:12:59] SMalyshev: can you verify?
[23:14:06] (PS6) EBernhardson: [WIP] convert role::logstash::elasticsearch to profiles [puppet] - https://gerrit.wikimedia.org/r/441894
[23:14:23] (CR) Alex Monk: "I don't think non-wiki sites would normally get a mobile subdomain." [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[23:15:50] (PS3) Jforrester: Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878
[23:18:57] twentyafterfour: will do
[23:19:34] (CR) MarcoAurelio: "Thank you. Yes, my idea is to create this domain and later via apache (funnel?) puppet config point it to toolsadmin.wikimedia.org. I was " [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[23:20:41] (CR) 20after4: [C: 2] Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878 (owner: Jforrester)
[23:20:50] Thanks, twentyafterfour.
[23:20:52] James_F: looks like your patch is a no-op?
[23:20:59] twentyafterfour: Yup, just comment updates.
[23:22:00] (Merged) jenkins-bot: Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878 (owner: Jforrester)
[23:22:51] twentyafterfour: seems to be working ok
[23:23:03] SMalyshev: cool, thanks for verifying
[23:23:17] I don't see any scary errors in the logs so I think we're good
[23:23:46] !log twentyafterfour@deploy1001 Synchronized wmf-config/InitialiseSettings.php: sync https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/440878/ for SWAT: no-op (duration: 00m 56s)
[23:23:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:23:54] Thanks!
[23:24:35] James_F: you're welcome
[23:30:08] !log After monitoring logstash, everything appears stable. Evening SWAT is complete.
[23:30:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:39:18] (CR) jenkins-bot: Update BetaFeature natural retirement dates based on last user-facing change [mediawiki-config] - https://gerrit.wikimedia.org/r/440878 (owner: Jforrester)
[23:40:48] !log restarting jenkins for updates
[23:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:41:16] (CR) jenkins-bot: Add Lexemes to instant-index set [mediawiki-config] - https://gerrit.wikimedia.org/r/440420 (https://phabricator.wikimedia.org/T196896) (owner: Smalyshev)
[23:53:44] (PS7) EBernhardson: prometheus/elasticsearch support multiple exporters per host [puppet] - https://gerrit.wikimedia.org/r/441321
[23:53:46] (PS10) EBernhardson: [WIP] Split instance define out of elasticsearch class [puppet] - https://gerrit.wikimedia.org/r/441338
[23:53:48] (PS38) EBernhardson: [WIP] Allow multiple elasticsearch instances per host [puppet] - https://gerrit.wikimedia.org/r/440049
[23:54:48] (CR) jerkins-bot: [V: -1] prometheus/elasticsearch support multiple exporters per host [puppet] - https://gerrit.wikimedia.org/r/441321 (owner: EBernhardson)