[00:00:00] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[00:00:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:00:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:05:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:05:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:10:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:10:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:15:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:15:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:20:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:20:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:25:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:25:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:30:10] PROBLEM - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:30:10] PROBLEM - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days)
[00:31:01] ACKNOWLEDGEMENT - check_ssl on saiph is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) Filippo Giunchedi T154448
[00:31:01] ACKNOWLEDGEMENT - check_ssl on thulium is CRITICAL: SSL CRITICAL - Certificate payments-listener.wikimedia.org valid until 2017-01-09 20:31:03 +0000 (expires in 6 days) Filippo Giunchedi T154448
[00:31:30] Operations, fundraising-tech-ops: SSL cert for payments-listener.wikimedia.org expires on 2017-01-09 (~6 days) - https://phabricator.wikimedia.org/T154448#2911794 (fgiunchedi) cc @Jgreen @RobH
[00:32:49] Operations, fundraising-tech-ops: SSL cert for payments-listener.wikimedia.org expires on 2017-01-09 (~6 days) - https://phabricator.wikimedia.org/T154448#2911796 (Dereckson) p:Triage>High
[01:10:40] PROBLEM - puppet last run on db1048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:38:40] RECOVERY - puppet last run on db1048 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[01:48:00] PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 1801.152699 Seconds
[01:49:00] RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 25.492843 Seconds
[01:55:10] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:10:40] PROBLEM - puppet last run on ms-be1016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:23:10] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[02:40:40] RECOVERY - puppet last run on ms-be1016 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[02:50:02] PROBLEM - puppet last run on serpens is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:18:00] RECOVERY - puppet last run on serpens is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[03:21:30] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 727.17 seconds
[03:27:40] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.96 seconds
[03:35:08] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[04:03:00] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[04:22:50] PROBLEM - puppet last run on labsdb1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:50:50] RECOVERY - puppet last run on labsdb1001 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[04:55:40] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[05:22:40] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[05:27:40] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:56:40] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:38:00] PROBLEM - Check HHVM threads for leakage on mw1260 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:46:50] PROBLEM - Check HHVM threads for leakage on mw1259 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:51:00] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:55:50] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[06:59:26] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, Patch-For-Review: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2912020 (FilipGCI) Open>Resolved Patch got merged, so closing as 'resolved'.
[07:15:54] Operations, ops-codfw, DBA: db2035 reset - https://phabricator.wikimedia.org/T154189#2912029 (Marostegui) Open>Resolved a:akosiaris The server looks fine now, so I will close this task for now as looks like it was Papaul by mistake. Thanks for taking care of this Alex!
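The check_ssl alerts at the start of this log give both an absolute expiry (2017-01-09 20:31:03 +0000) and a day count ("expires in 6 days"). The minimal Python sketch below reproduces that arithmetic, assuming the log timestamps are UTC and fall on 2017-01-03 (the date in the deploycal links later in the log). Dropping the partial day yields the "6 days" shown. The WARNING/CRITICAL thresholds are hypothetical, because the actual check_ssl configuration does not appear here.

```python
from datetime import datetime, timezone


def days_until_expiry(not_after: datetime, now: datetime) -> int:
    """Whole days left before the certificate's notAfter time (remainder dropped)."""
    return (not_after - now).days


def classify(days_left: int, warn_days: int = 30, crit_days: int = 7) -> str:
    # Hypothetical thresholds: the real check_ssl settings are not in this log.
    if days_left < crit_days:
        return "CRITICAL"
    if days_left < warn_days:
        return "WARNING"
    return "OK"


# Values from the 00:00:10 alert, assuming the log is in UTC on 2017-01-03.
NOT_AFTER = datetime(2017, 1, 9, 20, 31, 3, tzinfo=timezone.utc)
NOW = datetime(2017, 1, 3, 0, 0, 10, tzinfo=timezone.utc)

left = days_until_expiry(NOT_AFTER, NOW)
print(left, classify(left))  # 6 CRITICAL
```

The difference here is 6 days and about 20.5 hours, so any threshold of 7 days or more would keep firing every 5 minutes until the certificate is renewed or the alert is acknowledged.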
[07:21:50] RECOVERY - Check HHVM threads for leakage on mw1259 is OK: OK
[07:30:53] !log Stop mysql db2048 and db2034 for maintenance - https://phabricator.wikimedia.org/T149553
[07:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:42:13] Operations, User-Elukey: hhvm root:adm owned log files cause failures for logrotate - https://phabricator.wikimedia.org/T146464#2912046 (elukey) Another occurrence happened today for all the codfw hosts, seems related to the same file in all the hosts: ``` elukey@neodymium:~$ sudo -i salt -C 'mw2211*' c...
[07:54:00] !log Run optimize table on a few large tables - db1044 - T153826
[07:54:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:04] T153826: Defragment db1044 - https://phabricator.wikimedia.org/T153826
[07:58:29] !log chown www-data:www-data all the root:adm hhvm log files on mw codfw hosts (T132324)
[07:58:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:32] T132324: Tracking and Reducing cron-spam from root@ - https://phabricator.wikimedia.org/T132324
[08:00:31] !log Run optimize table on a few large tables - db1015 - T153739
[08:00:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:00:34] T153739: Defragment db1015 - https://phabricator.wikimedia.org/T153739
[08:06:09] Operations, Phabricator: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407#2912091 (mmodell) I cleaned up some temp files, disk space is still very low.
[08:15:08] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, MW-1.29-release-notes, and 2 others: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2912112 (Nemo_bis) Resolved>Open Given the default for `$wgHTTPImportTime...
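Read literally, the "Check HHVM threads for leakage" alerts in this log describe a ratio test: HHVM's running plus queued threads compared against twice Apache's busy workers. The sketch below encodes that reading in Python. It assumes the three counts are already available as integers; how the real check collects them is not shown in this log, and the function names are hypothetical.

```python
def hhvm_threads_leaking(running: int, queued: int, apache_busy: int) -> bool:
    """True when HHVM has more than double as many threads running or queued
    as Apache has busy workers, i.e. running + queued > 2 * apache_busy."""
    return running + queued > 2 * apache_busy


def check_status(running: int, queued: int, apache_busy: int) -> str:
    # Status strings copied from the alert text in the log.
    if hhvm_threads_leaking(running, queued, apache_busy):
        return ("CRITICAL: HHVM has more than double threads running or queued "
                "than apache has busy workers")
    return "OK"


# Hypothetical sample counts, not taken from mw1260 or any other host.
print(check_status(running=30, queued=11, apache_busy=20))  # 41 > 40, so CRITICAL
print(check_status(running=30, queued=10, apache_busy=20))  # 40 is not > 40, so OK
```

The comparison is strict ("more than double"), so a host sitting exactly at twice its busy Apache workers stays OK.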
[08:16:00] RECOVERY - Check HHVM threads for leakage on mw1260 is OK: OK
[08:20:50] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[08:21:32] !log Run optimize table on db1038 on all the revision,templatelinks and pagelinks tables - T154465
[08:21:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:35] T154465: Defragment db1038 - https://phabricator.wikimedia.org/T154465
[08:25:39] (PS2) Muehlenhoff: eventbus: Restrict to domain networks [puppet] - https://gerrit.wikimedia.org/r/328665
[08:32:00] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[08:39:20] Operations, User-Elukey: hhvm root:adm owned log files cause failures for logrotate - https://phabricator.wikimedia.org/T146464#2912120 (elukey) Didn't find anything in the SAL and the puppet logs are gone, we might need to re-check periodically (before getting the cronspam) to catch one occurrence and g...
[08:40:00] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2912123 (zhuyifei1999)
[09:00:00] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[09:10:46] !log stop MySQL dbstore2001 for maintenance - T151552
[09:10:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:50] T151552: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552
[09:10:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 103.81, 101.08, 98.04
[09:14:00] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[09:16:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 102.61, 100.62, 98.72
[09:19:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 101.85, 100.28, 98.86
[09:23:40] !log stop MySQL dbstore2002 for maintenance - T151552
[09:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:23:43] T151552: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552
[09:39:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 106.36, 100.53, 99.23
[09:43:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 109.36, 102.18, 100.04
[09:45:30] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, Google-Code-In-2016, and 2 others: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2912186 (Aklapper) The pleasure of having tasks mixing up the general MediaWiki cod...
[10:03:14] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, Google-Code-In-2016, and 2 others: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2912218 (Joe) >>! In T17000#2908075, @TTO wrote: >> Any timeout longer than 60 seco...
[10:03:54] Operations, MediaWiki-Export-or-Import, Wikimedia-General-or-Unknown, Google-Code-In-2016, and 2 others: Special:Import error: "Import failed: Could not open import file" - https://phabricator.wikimedia.org/T17000#2912219 (Joe) >>! In T17000#2912186, @Aklapper wrote: > The pleasure of having task...
[10:05:35] (CR) Muehlenhoff: [C: 2] eventbus: Restrict to domain networks [puppet] - https://gerrit.wikimedia.org/r/328665 (owner: Muehlenhoff)
[10:07:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 104.50, 100.26, 99.13
[10:09:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 107.53, 102.34, 99.99
[10:14:33] !log start enabling ntpd again across the fleet. Starting with cp boxes on ulsfo and esams
[10:14:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:16:50] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 106.70, 101.01, 99.88
[10:17:42] (PS2) Alexandros Kosiaris: Revert NTP disabling for leap second [puppet] - https://gerrit.wikimedia.org/r/329785
[10:17:50] (CR) Alexandros Kosiaris: [V: 2 C: 2] Revert NTP disabling for leap second [puppet] - https://gerrit.wikimedia.org/r/329785 (owner: Alexandros Kosiaris)
[10:21:29] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T154396#2912231 (Volans)
[10:22:39] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T154396#2909758 (jcrespo) This is likely causing issues with lots of "mkfs.xfs -L swift-sdc1 -i size=512 /dev/sdc1" pile-ups blocked on "umount -fl /srv/swift-storage/sdc1".
[10:23:00] PROBLEM - MariaDB Slave IO: s1 on dbstore2002 is CRITICAL: CRITICAL slave_io_state could not connect
[10:23:00] PROBLEM - MariaDB Slave SQL: s5 on dbstore2002 is CRITICAL: CRITICAL slave_sql_state could not connect
[10:23:00] PROBLEM - MariaDB Slave SQL: s3 on dbstore2002 is CRITICAL: CRITICAL slave_sql_state could not connect
[10:23:00] PROBLEM - MariaDB Slave SQL: s1 on dbstore2002 is CRITICAL: CRITICAL slave_sql_state could not connect
[10:23:00] PROBLEM - MariaDB Slave SQL: s4 on dbstore2002 is CRITICAL: CRITICAL slave_sql_state could not connect
[10:23:00] PROBLEM - MariaDB Slave IO: s4 on dbstore2002 is CRITICAL: CRITICAL slave_io_state could not connect
[10:23:00] PROBLEM - MariaDB Slave IO: s5 on dbstore2002 is CRITICAL: CRITICAL slave_io_state could not connect
[10:23:10] PROBLEM - MariaDB Slave IO: s3 on dbstore2002 is CRITICAL: CRITICAL slave_io_state could not connect
[10:23:10] I thought I disabled the alert
[10:23:10] PROBLEM - MariaDB Slave SQL: x1 on dbstore2002 is CRITICAL: CRITICAL slave_sql_state could not connect
[10:23:10] PROBLEM - MariaDB Slave IO: m3 on dbstore2002 is CRITICAL: CRITICAL slave_io_state could not connect
[10:23:11] :(
[10:24:36] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1006 - https://phabricator.wikimedia.org/T154418#2912250 (Volans)
[10:26:19] RECOVERY - MariaDB Slave IO: s3 on dbstore2002 is OK: OK slave_io_state Slave_IO_Running: Yes
[10:26:19] RECOVERY - MariaDB Slave SQL: x1 on dbstore2002 is OK: OK slave_sql_state not a slave
[10:26:19] RECOVERY - MariaDB Slave IO: m3 on dbstore2002 is OK: OK slave_io_state not a slave
[10:26:59] RECOVERY - MariaDB Slave IO: s1 on dbstore2002 is OK: OK slave_io_state Slave_IO_Running: Yes
[10:26:59] RECOVERY - MariaDB Slave SQL: s3 on dbstore2002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[10:26:59] RECOVERY - MariaDB Slave SQL: s5 on dbstore2002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[10:27:09] RECOVERY - MariaDB Slave IO: s5 on dbstore2002 is OK: OK slave_io_state Slave_IO_Running: Yes
[10:27:09] RECOVERY - MariaDB Slave SQL: s1 on dbstore2002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[10:27:19] RECOVERY - MariaDB Slave SQL: s4 on dbstore2002 is OK: OK slave_sql_state Slave_SQL_Running: Yes
[10:27:19] RECOVERY - MariaDB Slave IO: s4 on dbstore2002 is OK: OK slave_io_state Slave_IO_Running: Yes
[10:30:30] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T154396#2912252 (jcrespo) https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=1482833816504&to=1483438616504&var-datasource=eqiad%20prometheus%2Fops&var-cluster=swift&cluster=...
[10:33:31] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T154396#2912253 (jcrespo) The processes are IO-blocked (cannot be killed). The proper long-term fix is to create a locking mechanism to avoid starting more process if the current ones are ongoing. A r...
[10:34:49] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 104.51, 100.77, 99.90
[10:38:39] Operations: debdeploy should show which servers need service restarts - https://phabricator.wikimedia.org/T154068#2912261 (MoritzMuehlenhoff) Open>Resolved a:MoritzMuehlenhoff Already implemented in debdeploy.git: | * List hosts needing a restart by default instead of only showing | the amount...
[10:39:00] !log reenabling ntpd on codfw cp boxes
[10:39:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:40:57] (Abandoned) Muehlenhoff: Prevent access to hidden directories [puppet] - https://gerrit.wikimedia.org/r/217794 (https://phabricator.wikimedia.org/T94570) (owner: Muehlenhoff)
[10:44:37] !log reenabling ntpd on eqiad cp boxes
[10:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:49:06] Operations, Performance-Team, Thumbor: Implement rate limiter in Thumbor - https://phabricator.wikimedia.org/T151067#2912278 (Gilles)
[10:49:09] Operations, Performance-Team, Thumbor, Patch-For-Review: Implement PoolCounter support in Thumbor - https://phabricator.wikimedia.org/T151066#2912279 (Gilles)
[10:50:38] !log reenabling ntpd on mw codfw boxes
[10:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:53:57] !log stopping mysql replication on db1035 (depooled)
[10:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:59:02] !log reenabling ntpd on mw eqiad boxes
[10:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:00:54] Operations, Phabricator: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407#2912315 (Paladox) thanks.
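jcrespo's 10:33:31 note proposes a locking mechanism so that a new `mkfs.xfs` run is not started while an earlier one is still stuck behind `umount -fl`. One common way to get that behaviour is a non-blocking `flock(2)` lock, which makes later invocations skip instead of queueing. The Python sketch below assumes that approach. The lock path and the `run_exclusive` helper are hypothetical and are not part of the swift or Puppet tooling mentioned in the log.

```python
import fcntl
import subprocess

# Hypothetical lock location; any path writable by the caller works.
LOCK_PATH = "/var/lock/swift-mkfs-sdc1.lock"


def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd unless another invocation already holds the lock.

    Returns the command's exit code, or None if the run was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # An earlier run (possibly stuck in IO) still holds the lock: skip.
            return None
        # The lock is released when lock_file is closed on leaving the block.
        return subprocess.run(cmd).returncode
```

A process stuck in uninterruptible IO keeps its file descriptor open, so it keeps holding the lock, and later runs return `None` instead of piling up. From a shell, util-linux `flock -n <lockfile> <command>` gives the same skip-if-busy behaviour.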
[11:07:38] !log reenabling ntpd on wtp codfw boxes
[11:07:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:08:27] Operations, Discovery, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408#2912317 (Gehel) Hadoop also [[ https://github.com/wikimedia/operations-puppet/blob/produc...
[11:13:58] !log reenabling ntpd on db* codfw boxes
[11:14:01] Operations, DBA, Patch-For-Review: Throttle mysql backups on dbstore1001 in order to not saturate the node - https://phabricator.wikimedia.org/T134977#2912319 (jcrespo) Open>Resolved This seems solved now, no alerts are generated and all backups are created as expected.
[11:14:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:15:04] (PS1) Giuseppe Lavagetto: docker::baseimages: add support for alpine linux [puppet] - https://gerrit.wikimedia.org/r/330206
[11:15:06] (PS1) Giuseppe Lavagetto: profile::docker::builder: add alpine linux [puppet] - https://gerrit.wikimedia.org/r/330207
[11:16:06] !log upgrade lilogstash-gelf on relforge - T150408
[11:16:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:16:10] T150408: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408
[11:16:21] (CR) jerkins-bot: [V: -1] profile::docker::builder: add alpine linux [puppet] - https://gerrit.wikimedia.org/r/330207 (owner: Giuseppe Lavagetto)
[11:18:54] (CR) Jcrespo: "Blocked on Andrew B. to know more about puppet and labspuppet." [puppet] - https://gerrit.wikimedia.org/r/328476 (owner: Jcrespo)
[11:21:05] (PS2) Giuseppe Lavagetto: profile::docker::builder: add alpine linux [puppet] - https://gerrit.wikimedia.org/r/330207
[11:27:24] !log upgrade liblogstash-gelf on deployment-elastic* - T150408
[11:27:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:27:28] T150408: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408
[11:31:42] Operations, Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2912333 (Paladox)
[11:31:49] (PS1) Giuseppe Lavagetto: Removed unused CNAME [dns] - https://gerrit.wikimedia.org/r/330208
[11:32:11] Operations, ops-eqiad, DC-Ops, Patch-For-Review, User-Joe: Hardware decommission mw1017, mw1099 - https://phabricator.wikimedia.org/T151303#2912345 (Joe) Yes, it's ok, I just removed that unused alias.
[11:32:21] !log installing tar security updates on trusty hosts
[11:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:33:07] (CR) Giuseppe Lavagetto: [C: 2] Removed unused CNAME [dns] - https://gerrit.wikimedia.org/r/330208 (owner: Giuseppe Lavagetto)
[11:33:12] Operations, Discovery, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408#2912346 (Gehel) `relforge*` and `deployment-elastic*` have the new logstash-gelf version...
[11:34:50] Operations, Discovery, Discovery-Search, Elasticsearch, Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2912348 (Paladox)
[11:38:04] Operations, Discovery, Discovery-Search, Elasticsearch, Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2912352 (Paladox)
[11:41:04] Operations, Discovery, Discovery-Search, Elasticsearch, Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2912355 (Paladox)
[11:43:51] Operations, Discovery, Discovery-Search, Elasticsearch, Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2912359 (Gehel) @bd808 might be interested in this one as well. Note that our logstash cluster is already upgraded to elasticsearc...
[11:46:59] !log reenabling ntpd on cobalt (gerrit)
[11:47:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:51:57] !log reenabling ntpd on db* eqiad boxes
[11:52:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:52:05] !log reenabling ntpd on logstash eqiad boxes
[11:52:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:54:55] !log reenabling ntpd on wtp eqiad boxes
[11:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:58:59] (CR) Giuseppe Lavagetto: [C: 2] docker::baseimages: add support for alpine linux [puppet] - https://gerrit.wikimedia.org/r/330206 (owner: Giuseppe Lavagetto)
[12:01:23] (PS4) Paladox: Gerrit: Enable config localUsernameToLowerCase [puppet] - https://gerrit.wikimedia.org/r/326150 (https://phabricator.wikimedia.org/T152640)
[12:01:33] (PS5) Paladox: Gerrit: Enable config localUsernameToLowerCase [puppet] - https://gerrit.wikimedia.org/r/326150 (https://phabricator.wikimedia.org/T152640)
[12:02:08] <_joe_> akosiaris: whenever you've enabled ntp on copper, can you run puppet there?
[12:02:15] PROBLEM - Postgres Replication Lag on maps-test2003 is CRITICAL: CRITICAL - Rep Delay is: 1911.68373 Seconds
[12:02:15] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 1915.670167 Seconds
[12:03:15] RECOVERY - Postgres Replication Lag on maps-test2003 is OK: OK - Rep Delay is: 0.0 Seconds
[12:05:02] _joe_: yup
[12:05:06] will do now
[12:05:15] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds
[12:06:05] _joe_: done
[12:06:14] !log reenabling ntpd on ganeti eqiad & codfw boxes
[12:06:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:06:55] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:07:09] !log reenabling ntpd on pc eqiad & codfw boxes
[12:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:09:15] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 2328.951029 Seconds
[12:10:15] RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 0.0 Seconds
[12:12:07] !log installing squid security updates
[12:12:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:12:43] (CR) Faidon Liambotis: "Why was this abandoned?" [puppet] - https://gerrit.wikimedia.org/r/304580 (owner: Dzahn)
[12:12:55] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[12:15:15] PROBLEM - Postgres Replication Lag on maps-test2003 is CRITICAL: CRITICAL - Rep Delay is: 2691.631121 Seconds
[12:16:15] RECOVERY - Postgres Replication Lag on maps-test2003 is OK: OK - Rep Delay is: 0.0 Seconds
[12:18:15] PROBLEM - Postgres Replication Lag on maps-test2002 is CRITICAL: CRITICAL - Rep Delay is: 2869.018796 Seconds
[12:19:15] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 2935.553906 Seconds
[12:21:16] RECOVERY - Postgres Replication Lag on maps-test2002 is OK: OK - Rep Delay is: 0.0 Seconds
[12:21:16] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds
[12:41:15] !log installing python security updates
[12:41:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:48:59] !log reenabling ntpd on analytics boxes
[12:49:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:52:11] !log reenabling ntpd on lvs boxes
[12:52:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:14] !log reenabling ntpd on ms-be boxes
[12:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:55:20] PROBLEM - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 18 seconds ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdk1]
[12:55:30] PROBLEM - puppet last run on ms-be1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 53 seconds ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdc1]
[13:00:03] !log installing libgd security updates
[13:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:01:32] !log reenabling ntpd on ms-fe boxes
[13:01:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:06:15] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 2143.856752 Seconds
[13:07:15] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds
[13:09:27] !log reenabling ntpd on mc* boxes
[13:09:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:34] !log uploaded Linux 4.4.39 for jessie-wikimedia to carbon
[13:09:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:10:15] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 2383.604556 Seconds
[13:11:15] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds
[13:14:15] PROBLEM - Postgres Replication Lag on maps-test2004 is CRITICAL: CRITICAL - Rep Delay is: 2623.810111 Seconds
[13:15:15] RECOVERY - Postgres Replication Lag on maps-test2004 is OK: OK - Rep Delay is: 0.0 Seconds
[13:15:24] !log reenabling ntpd on scb* boxes
[13:15:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:32] !log reenabling ntpd on es* boxes
[13:22:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:22:37] !log reenabling ntpd on conf boxes
[13:22:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:24:18] (CR) Giuseppe Lavagetto: [C: 2] profile::docker::builder: add alpine linux [puppet] - https://gerrit.wikimedia.org/r/330207 (owner: Giuseppe Lavagetto)
[13:27:58] !log reenabling ntpd on rdb boxes
[13:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:28:53] jouncebot: next
[13:28:53] In 0 hour(s) and 31 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1400)
[13:29:31] hashar: busy eu swat today ^
[13:29:43] ah yeah swat
[13:29:51] what's the plan? me? you? #together?
[13:33:04] gpgg
[13:33:07] gotta review them
[13:35:38] (CR) Hashar: [C: -1] "I dont think there is much point in adding your.org to the whitelist. Lets follow up on T153569" [mediawiki-config] - https://gerrit.wikimedia.org/r/328036 (https://phabricator.wikimedia.org/T153569) (owner: Urbanecm)
[13:37:01] !log reenabling ntpd on elastic boxes
[13:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:38:14] (CR) Gehel: [C: 2] New upstream version: 1.11.0 (1 comment) [debs/logstash-gelf] - https://gerrit.wikimedia.org/r/320992 (https://phabricator.wikimedia.org/T150408) (owner: Gehel)
[13:38:24] (CR) Gehel: [C: 2] Imported Upstream version 1.11.0 [debs/logstash-gelf] - https://gerrit.wikimedia.org/r/320991 (owner: Gehel)
[13:39:19] Operations, Commons, TimedMediaHandler-Transcode, Wikimedia-Video, and 3 others: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2912617 (zhuyifei1999)
[13:40:35] zeljkof: and I -1ed one already ;D
[13:41:06] hashar: that's the way to start new year ;)
[13:41:29] (CR) Hashar: [C: 1] Add new page protection level on etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327789 (https://phabricator.wikimedia.org/T153465) (owner: Urbanecm)
[13:41:46] I have -1ed 328036 Add ftpmirror.your.org to whitelist of commons
[13:49:28] (CR) Hashar: [C: 1] "We will want to optimize the png files later on." [mediawiki-config] - https://gerrit.wikimedia.org/r/328908 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[13:59:09] jouncebot, next
[13:59:09] In 0 hour(s) and 0 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1400)
[14:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1400). Please do the needful.
[14:00:05] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:11] Ready for deployment
[14:01:02] neat
[14:01:04] jouncebot: Nemo_bis
[14:01:06] jouncebot: next
[14:01:06] In 1 hour(s) and 58 minute(s): Wikimania Scholarships 2017 (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1600)
[14:01:07] bah
[14:01:29] (PS2) Hashar: Add new page protection level on etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327789 (https://phabricator.wikimedia.org/T153465) (owner: Urbanecm)
[14:01:33] (CR) Hashar: [C: 2] Add new page protection level on etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327789 (https://phabricator.wikimedia.org/T153465) (owner: Urbanecm)
[14:02:23] hashar, according to your comment at 328908. I've run optiPNG already so I think they are optimalized.
[14:02:24] (PS4) Hashar: Enable subpages in NS0 for arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:02:38] (Merged) jenkins-bot: Add new page protection level on etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327789 (https://phabricator.wikimedia.org/T153465) (owner: Urbanecm)
[14:02:43] Urbanecm: excellent :]
[14:02:49] (CR) jenkins-bot: Add new page protection level on etwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327789 (https://phabricator.wikimedia.org/T153465) (owner: Urbanecm)
[14:03:35] Urbanecm: page protection level for etwiki is on mwdebug1002
[14:04:03] hashar, that's great!
[14:04:53] (PS5) Hashar: Enable subpages in NS0 for arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:05:15] (CR) Hashar: Enable subpages in NS0 for arbcom_cswiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:05:32] (CR) Hashar: [C: 2] Enable subpages in NS0 for arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:05:33] hashar: you are doing the swat, I guess? ;)
[14:05:41] yeah will rush it
[14:06:24] (Merged) jenkins-bot: Enable subpages in NS0 for arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:07:07] hashar, page protection level for etwiki works correctly.
[14:07:09] !log reenabling ntpd on kafka boxes
[14:07:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:13] (CR) jenkins-bot: Enable subpages in NS0 for arbcom_cswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/327700 (https://phabricator.wikimedia.org/T154247) (owner: Urbanecm)
[14:08:16] !log reenabling ntpd on maps, maps-test boxes
[14:08:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:36] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Add new page protection level on etwiki - T153465 (duration: 00m 53s)
[14:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:08:39] T153465: Add new page protection level on et.wikipedia.org - https://phabricator.wikimedia.org/T153465
[14:09:33] Urbanecm: doing the NS0 for arbcom_cswiki
[14:10:39] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable subpages in NS0 for arbcom_cswiki - T154247 (duration: 00m 40s)
[14:10:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:10:42] T154247: Enable subpages in NS_MAIN in arbcom_cswiki - https://phabricator.wikimedia.org/T154247
[14:11:33] Urbanecm: about adding ftpmirror.your.org I am not convinced. I -1ed the change and gave my rational on the task
[14:11:34] hashar, subpages works :)
[14:11:36] so skipping it
[14:12:18] hashar, okay, you're right. If archive.org is added and files from commons are there in full-size, it's unneeded.
[14:13:00] (PS2) Hashar: Enable SandboxLink on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328502 (https://phabricator.wikimedia.org/T153855) (owner: Urbanecm)
[14:13:02] (PS3) Hashar: Set sortPrepend for gdwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328570 (https://phabricator.wikimedia.org/T153900) (owner: Urbanecm)
[14:13:41] (PS2) Hashar: Enable mapframe for nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328889 (https://phabricator.wikimedia.org/T154021) (owner: Urbanecm)
[14:13:43] (PS1) Giuseppe Lavagetto: docker::baseimages: improvements to script to build alpine linux [puppet] - https://gerrit.wikimedia.org/r/330218
[14:13:51] Operations, Discovery, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408#2912676 (Ottomata) > Hadoop also uses liblogstash-gelf Sort of. Early on in the Analyt...
[14:13:57] (CR) Hashar: [C: 2] Enable SandboxLink on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328502 (https://phabricator.wikimedia.org/T153855) (owner: Urbanecm)
[14:14:12] (CR) Hashar: [C: 2] Set sortPrepend for gdwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328570 (https://phabricator.wikimedia.org/T153900) (owner: Urbanecm)
[14:14:16] (CR) Hashar: [C: 2] Enable mapframe for nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328889 (https://phabricator.wikimedia.org/T154021) (owner: Urbanecm)
[14:14:50] (CR) jerkins-bot: [V: -1] docker::baseimages: improvements to script to build alpine linux [puppet] - https://gerrit.wikimedia.org/r/330218 (owner: Giuseppe Lavagetto)
[14:14:57] (Merged) jenkins-bot: Enable SandboxLink on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328502 (https://phabricator.wikimedia.org/T153855) (owner: Urbanecm)
[14:15:08] (CR) jenkins-bot: Enable SandboxLink on ruwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328502 (https://phabricator.wikimedia.org/T153855) (owner: Urbanecm)
[14:15:34] (Merged) jenkins-bot: Set sortPrepend for gdwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328570 (https://phabricator.wikimedia.org/T153900) (owner: Urbanecm)
[14:15:46] (CR) jenkins-bot: Set sortPrepend for gdwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328570 (https://phabricator.wikimedia.org/T153900) (owner: Urbanecm)
[14:16:00] (Merged) jenkins-bot: Enable mapframe for nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328889 (https://phabricator.wikimedia.org/T154021) (owner: Urbanecm)
[14:16:01] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable SandboxLink on ruwiki T153855 (duration: 00m 40s)
[14:16:01] Urbanecm: deploying 3 changes
[14:16:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:16:05] T153855: Enable SandboxLink on ruwiki - https://phabricator.wikimedia.org/T153855
[14:17:22] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Set sortPrepend for gdwiki T153900 (duration: 00m 40s)
[14:17:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:17:25] T153900: Set sortPrepend for the Scottish Gaelic Wikipedia, gdwiki - https://phabricator.wikimedia.org/T153900
[14:17:39] (CR) jenkins-bot: Enable mapframe for nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/328889 (https://phabricator.wikimedia.org/T154021) (owner: Urbanecm)
[14:17:41] !log reenabling ntpd on aqs boxes
[14:17:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:18:52] (PS5) Hashar: [throttle] New rules + remove obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/329500 (https://phabricator.wikimedia.org/T154245) (owner: Urbanecm)
[14:19:04] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: Enable mapframe for nowiki T154021 (duration: 00m 39s)
[14:19:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:09] T154021: Please turn on mapframe for Norwegian (bokmål) Wikipedia - https://phabricator.wikimedia.org/T154021
[14:19:10] hashar, which three exactly?
[14:19:12] Sorry if you already replied. I have some problems with my connection.
[14:19:28] (CR) Hashar: [C: 2] [throttle] New rules + remove obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/329500 (https://phabricator.wikimedia.org/T154245) (owner: Urbanecm)
[14:19:40] 328502 Enable SandboxLink on ruwiki
[14:19:44] 328570 Set sortPrepend for gdwiki
[14:19:48] 328889 Enable mapframe for nowiki
[14:20:03] I am now doing the throttle ruls
[14:20:09] Thanks, checking them.
[14:20:25] (Merged) jenkins-bot: [throttle] New rules + remove obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/329500 (https://phabricator.wikimedia.org/T154245) (owner: Urbanecm)
[14:20:35] (CR) jenkins-bot: [throttle] New rules + remove obsolete rules [mediawiki-config] - https://gerrit.wikimedia.org/r/329500 (https://phabricator.wikimedia.org/T154245) (owner: Urbanecm)
[14:20:55] Operations, Labs, Labs-Infrastructure, Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#2845101 (chasemp) A few things that happen that should not when labservices1001 dies (we do not see these same failures when labservices1002 is down):...
[14:21:19] (PS4) Hashar: Add HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/328908 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:21:23] Operations, Labs, Labs-Infrastructure, Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#2912706 (chasemp) p:Triage>High
[14:21:37] !log hashar@tin Synchronized wmf-config/throttle.php: New rules + remove obsolete rules T154245 (duration: 00m 40s)
[14:21:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:21:40] T154245: Account creation throttle exemption for en.wp on 2017-01-13 and 2017-01-20 - https://phabricator.wikimedia.org/T154245
[14:22:00] (CR) Hashar: [C: 2] "Urbanecm has confirmed:" [mediawiki-config] - https://gerrit.wikimedia.org/r/328908 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:22:25] (CR) Ottomata: "I likey! One comment" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/330154 (https://phabricator.wikimedia.org/T88640) (owner: Elukey)
[14:22:38] (Merged) jenkins-bot: Add HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/328908 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:22:49] (CR) jenkins-bot: Add HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/328908 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:23:02] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, Goal: Wikipedias with zh-* language codes waiting to be renamed (zh-min-nan -> nan, zh-yue -> yue, zh-classical -> lzh) - https://phabricator.wikimedia.org/T10217#2912722 (Liuxinyu970226)
[14:23:05] Urbanecm: and the new logos are on mwdebug1002
[14:23:15] then I have no idea how to test that or whether it makes sense to test
[14:23:19] might well just push it ?
[14:23:36] (PS2) Giuseppe Lavagetto: docker::baseimages: improvements to script to build alpine linux [puppet] - https://gerrit.wikimedia.org/r/330218
[14:25:14] brb, getting a cofee + nature(s call etc
[14:29:03] Were the throttles deployed? 329500
[14:29:04] hashar,
[14:29:04] !log reenabling ntpd on the rest of the boxes. Leaving restbase only out for last
[14:29:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:30:38] (CR) Giuseppe Lavagetto: [C: 2] docker::baseimages: improvements to script to build alpine linux [puppet] - https://gerrit.wikimedia.org/r/330218 (owner: Giuseppe Lavagetto)
[14:31:46] Urbanecm: yeah
[14:33:16] I am syncing the static logos
[14:33:25] then the initialisesettings file
[14:33:26] hashar, okay.
[14:33:36] Ok
[14:33:44] The throttles are done?
[14:33:49] !log hashar@tin Synchronized static/images/project-logos: [1/2] Add HD logos for multiple wikis - T150618 (duration: 00m 40s)
[14:33:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:33:52] T150618: Provide HD logos for all Wikipedias - https://phabricator.wikimedia.org/T150618
[14:35:03] !log hashar@tin Synchronized wmf-config/InitialiseSettings.php: [2/2] Add HD logos for multiple wikis - T150618 (duration: 00m 40s)
[14:35:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:35:27] Urbanecm: all deployed. The only exception being the your.org change
[14:35:31] PROBLEM - puppetmaster https on puppetmaster2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:37:21] PROBLEM - puppetmaster backend https on puppetmaster2001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:38:51] (PS1) Giuseppe Lavagetto: docker::baseimages: fix typo in build script [puppet] - https://gerrit.wikimedia.org/r/330221
[14:39:11] RECOVERY - puppetmaster backend https on puppetmaster2001 is OK: HTTP OK: Status line output matched 400 - 331 bytes in 0.198 second response time
[14:39:21] RECOVERY - puppetmaster https on puppetmaster2001 is OK: HTTP OK: Status line output matched 400 - 331 bytes in 0.714 second response time
[14:39:38] hashar, thanks for your deployment!
[14:43:17] !log upgrade liblogstash-gelf on elastic* - T150408
[14:43:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:43:20] T150408: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408
[14:45:18] Operations, Discovery, Elasticsearch, Discovery-Search (Current work), Patch-For-Review: Upgrade our logstash-gelf package to latest available upstream version - https://phabricator.wikimedia.org/T150408#2912783 (Gehel) `liblogstash-gelf-java` is now up to date on all elasticsearch servers. P...
[14:53:24] !log reenabling ntpd on the restbase in eqiad
[14:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:54:31] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 1.999 second response time
[14:55:31] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.246 second response time
[15:02:55] (Abandoned) Cmjohnson: Removing dns entries for decommissioned servers mw1017 and mw1099 T151303 [dns] - https://gerrit.wikimedia.org/r/324940 (owner: Cmjohnson)
[15:31:13] Is it just me or no one can do git pull because "ssh: connect to host gerrit.wikimedia.org port 29418: Network is unreachable". I checked my connection and my ssh. It was okay
[15:31:34] Amir1: Works for me.
[15:37:33] (PS1) Muehlenhoff: Enable enhanced sandbox privilege separation for sshd [puppet] - https://gerrit.wikimedia.org/r/330227
[15:37:46] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.199 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2
[15:37:46] PROBLEM - Juniper alarms on mr1-eqiad is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.154.199
[15:38:36] RECOVERY - Juniper alarms on mr1-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[15:38:36] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 38, down: 0, dormant: 0, excluded: 0, unused: 0
[15:38:39] (PS2) Muehlenhoff: Enable enhanced sandbox privilege separation for sshd [puppet] - https://gerrit.wikimedia.org/r/330227
[15:38:43] (PS4) Giuseppe Lavagetto: Revert "Revert "RESTBase configuration for fi.wikivoyage.org"" [puppet] - https://gerrit.wikimedia.org/r/324766 (https://phabricator.wikimedia.org/T151570) (owner: Alex Monk)
[15:39:35] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Revert "Revert "RESTBase configuration for fi.wikivoyage.org"" [puppet] - https://gerrit.wikimedia.org/r/324766 (https://phabricator.wikimedia.org/T151570) (owner: Alex Monk)
[15:45:45] Operations, ops-codfw, DBA: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2912918 (Marostegui) I have been talking to Papaul and he's kindly agreed to upgrade its BIOS on Thursday, so we will reboot and upgrade it.
[15:47:46] Operations, fundraising-tech-ops: SSL cert for payments-listener.wikimedia.org expires on 2017-01-09 (~6 days) - https://phabricator.wikimedia.org/T154448#2912920 (RobH) Open>declined This is a dupe of T153097. Processing it now.
[15:48:22] Operations, ops-eqiad, DBA: Degraded RAID on db1053 - https://phabricator.wikimedia.org/T151465#2912927 (Marostegui) @Cmjohnson can we get this disk replaced? Thanks!
[15:49:18] <_joe_> !log rolling restart of restbase on the test cluster
[15:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:54:54] Amir1, network is unreachable?
[15:54:57] uh
[15:55:06] no
[15:55:09] it works for me
[15:55:14] Amir1, can you ping gerrit?
[15:55:25] The network is not. That's why I can talk here
[15:55:35] I opened it in https and worked
[15:55:45] but no ssh on port 29418?
[15:55:50] it might be that they sysadmins closed that port
[15:56:00] It works now
[15:56:07] ok
[15:56:21] strange. Probably an issue in my network
[15:58:42] <_joe_> !log rolling restart of restbase on the production cluster
[15:58:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:00:04] bd808 and Niharika: Respected human, time to deploy Wikimania Scholarships 2017 (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1600). Please do the needful.
[16:00:08] o/
[16:01:00] o/
[16:04:40] (PS2) Thcipriani: Revert "Disable l10nupdate cron" [puppet] - https://gerrit.wikimedia.org/r/328839
[16:05:30] (CR) Thcipriani: [C: 1] "Should be ready to go now that we're back." [puppet] - https://gerrit.wikimedia.org/r/328839 (owner: Thcipriani)
[16:06:22] Operations, Gerrit, Release-Engineering-Team, Upstream: Gerrit: Restarting gerrit could lead to data loss + maybe accounts - https://phabricator.wikimedia.org/T154205#2912984 (Paladox) Apparently the way they fixed it was the wrong way, but they are keeping it as is on the stable branch as it wor...
[16:17:45] James_F: I want to put this in morning SWAT. Is it okay for you? https://gerrit.wikimedia.org/r/#/c/329026/
[16:18:14] Amir1: No.
[16:19:12] Amir1: Currently fawiki is an SET wiki. We're (slowly) moving all wikis to one tab. Going in the other direction is not great. :-(
[16:20:24] James_F: in the RfC they want to show the VE to all because they think it's great. it's kind of strange
[16:20:39] How should I explain this to them?
[16:20:48] Amir1: Yeah, it is kind of strange if they think it's great :p
[16:20:51] * ostriches runs and hides
[16:20:55] Operations, hardware-requests: Reclaim/Decommission (specify) stat1001 - https://phabricator.wikimedia.org/T154164#2913047 (Ottomata)
[16:21:17] Amir1: Talk to one of the Community Liaisons? Who mostly does fawiki stuff? Elitre is the expert overall.
[16:21:32] not-so-conservative Wikipedians :D
[16:22:13] James_F: okay, thanks for letting me know
[16:22:32] _joe_: do you have a minute to help with a trebuchet deploy? I'm trying to update scholarships on krypton and `git deploy sync` is not getting a response from krypton
[16:22:37] (Abandoned) Ladsgroup: Change VE tabs default preferences to multitab in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/329026 (https://phabricator.wikimedia.org/T154070) (owner: Ladsgroup)
[16:22:41] Sorry, I'm sure we can get agreement soon.
[16:22:57] <_joe_> bd808: 5 minutes and I'm all yours
[16:23:00] and of course I don't have a login on that box to check further or the ability to see salt logs
[16:23:11] Amir1: It's not an "absolutely not", just I'm not enough read into it to say go ahead right now.
[16:23:11] _joe_: <3 thanks
[16:23:32] I hope so, We can restore the patch later
[16:23:42] +1
[16:26:19] <_joe_> ReadOnlyError: You can't write against a read only slave.
[16:26:28] <_joe_> bd808: ok so now I know what's the error
[16:26:34] <_joe_> let's see why that happens
[16:26:49] _joe_: from the POV of trebuchet there is just no response to the fetch request
[16:26:50] <_joe_> I think this has to do with switching the deployment servers
[16:26:57] <_joe_> yeah that's not the case
[16:26:59] <_joe_> lemme see
[16:27:27] "fetch status: 0 [started: 6 mins ago, last-return: 1248 mins ago]" is what I see
[16:28:31] <_joe_> can you retry now?
[16:29:50] <_joe_> bd808: ? can you?
[16:29:57] _joe_: trying
[16:30:18] <_joe_> 2017-01-03 16:29:46,295 [salt.loaded.int.module.cmdmod][ERROR ] Command '/usr/bin/git show-ref refs/tags/scholarships/scholarships-sync-20170103-161945' failed with return code: 1
[16:30:30] <_joe_> this looks a bit better
[16:30:43] yeah, at least it is talking
[16:33:21] <_joe_> you should abort and retry, maybe?
[16:33:26] _joe_: I'll abort the current deploy and try from the top one more time
[16:33:38] <_joe_> please do
[16:34:00] syncing now
[16:34:06] PROBLEM - puppet last run on analytics1042 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:34:19] <_joe_> same error
[16:34:23] <_joe_> let me look at one thing
[16:34:51] <_joe_> yeah
[16:34:54] <_joe_> I thought so
[16:35:47] <_joe_> now it will work, hopefully
[16:36:05] <_joe_> retry the last deploy
[16:36:17] (CR) Alexandros Kosiaris: [C: 1] Enable enhanced sandbox privilege separation for sshd [puppet] - https://gerrit.wikimedia.org/r/330227 (owner: Muehlenhoff)
[16:36:34] _joe_: fetch completed :)
[16:37:05] _joe_: it worked. thanks!
[16:37:42] !log Updated scholarships to 1690808 on krypton; needed help from _joe_ to make trebuchet work
[16:37:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:41] !log iridium (phab) - reduce process accounting from 30 days to 10 days to save disk space used by /var/log/account, run /etc/cron.daily/acct (T154407)
[16:49:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:49:44] T154407: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407
[16:51:26] (CR) Giuseppe Lavagetto: [C: 2] Revert "Disable l10nupdate cron" [puppet] - https://gerrit.wikimedia.org/r/328839 (owner: Thcipriani)
[16:51:53] _joe_: thanks!
[16:52:24] <_joe_> thcipriani: it's up for puppetswat and it's the first one
[16:52:29] <_joe_> I'll go in a few
[16:52:53] okie doke
[16:53:39] (CR) Giuseppe Lavagetto: [C: 2] wikilabels: install nodejs package [puppet] - https://gerrit.wikimedia.org/r/329316 (https://phabricator.wikimedia.org/T154122) (owner: Ladsgroup)
[16:54:21] (CR) Giuseppe Lavagetto: "While the patch looks reasonable, let's merge it during the EU morning when I have more time to look after its consequences." [puppet] - https://gerrit.wikimedia.org/r/322601 (https://phabricator.wikimedia.org/T1256) (owner: Alex Monk)
[16:54:44] <_joe_> Krenair: ^^ maybe tomorrow
[16:54:57] <_joe_> Amir1: is your patch just for labs?
[16:55:15] ok
[16:55:41] _joe_: yes
[16:55:45] and one instance
[16:55:48] <_joe_> ok
[16:55:55] _joe_, did you see the other?
[16:56:02] <_joe_> thcipriani: merging your patch now
[16:56:15] <_joe_> Krenair: I'll look when I'm done with the rest of puppetSWAT
[16:56:17] _joe_: thanks!
[16:56:18] ok
[16:56:24] ok
[16:58:35] <_joe_> thcipriani: done, also ran puppet on tin
[16:58:51] (PS2) Giuseppe Lavagetto: wikilabels: install nodejs package [puppet] - https://gerrit.wikimedia.org/r/329316 (https://phabricator.wikimedia.org/T154122) (owner: Ladsgroup)
[16:58:56] (CR) Giuseppe Lavagetto: [V: 2 C: 2] wikilabels: install nodejs package [puppet] - https://gerrit.wikimedia.org/r/329316 (https://phabricator.wikimedia.org/T154122) (owner: Ladsgroup)
[16:58:57] _joe_: nice, I see the cron back in place. Thanks!
[16:59:38] !log enable l10nupdate cron post deployment-freeze
[16:59:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:00:05] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1700). Please do the needful.
[17:00:05] thcipriani, Amri1, and Krenair: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[17:00:16] <_joe_> Amir1: you've been served as well
[17:00:45] Thanks!
[17:00:47] PROBLEM - very high load average likely xfs on ms-be1001 is CRITICAL: CRITICAL - load average: 105.82, 107.71, 108.89
[17:02:06] RECOVERY - puppet last run on analytics1042 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[17:02:31] (CR) Giuseppe Lavagetto: [C: 1] Replace repeated UseMod rewrites in apache config with existing include [puppet] - https://gerrit.wikimedia.org/r/311648 (owner: Alex Monk)
[17:04:36] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[17:04:50] !log iridium (phab) - apt-get clean ; find /var/log/account/ -mtime +10 -delete ; find /var/log/atop/ -mtime +10 -delete (T154407)
[17:04:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:04:53] T154407: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407
[17:11:38] Operations, Phabricator: Iridium (phabricator): Disk space is low - https://phabricator.wikimedia.org/T154407#2913245 (Dzahn) Open>Resolved a:Dzahn back to > 3GB free ( ~ 64% use). This should (also) be solved by making the root partition larger when doing T152129 and by stopping Phabricator...
[17:24:30] !log starting branch cut for 1.29.0-wmf.7
[17:24:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:26:48] thcipriani: did you check blockers?
[17:27:40] Nemo_bis: I saw there was a blocker. Won't deploy until resolved. Wanted to get branch cut: takes a long while :(
[17:28:10] oki
[17:29:10] _joe_, still going?
[17:33:36] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [17:35:47] (03PS1) 10Reedy: Revert "mediawiki: disable 'generate captcha' maintenance job" [puppet] - 10https://gerrit.wikimedia.org/r/330250 [17:35:58] (03PS2) 10Reedy: Revert "mediawiki: disable 'generate captcha' maintenance job" [puppet] - 10https://gerrit.wikimedia.org/r/330250 [17:36:12] (03PS7) 10Paladox: Gerrit: Convert from utf8 to utf8mb4 [puppet] - 10https://gerrit.wikimedia.org/r/328571 (https://phabricator.wikimedia.org/T153899) [17:36:15] (03PS3) 10Reedy: Revert "mediawiki: disable 'generate captcha' maintenance job" [puppet] - 10https://gerrit.wikimedia.org/r/330250 (https://phabricator.wikimedia.org/T150029) [17:36:27] (03CR) 10Reedy: "We want to enable this again to be run in Feb :)" [puppet] - 10https://gerrit.wikimedia.org/r/330250 (https://phabricator.wikimedia.org/T150029) (owner: 10Reedy) [17:43:12] (03CR) 10Dzahn: [C: 032] Revert "mediawiki: disable 'generate captcha' maintenance job" [puppet] - 10https://gerrit.wikimedia.org/r/330250 (https://phabricator.wikimedia.org/T150029) (owner: 10Reedy) [17:44:40] !log terbium - Notice: /Stage[main]/Mediawiki::Maintenance::Generatecaptcha/Cron[generatecaptcha]/ensure: created(T150029) [17:44:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:44:44] T150029: Create cronjob for regular captcha regeneration - https://phabricator.wikimedia.org/T150029 [17:46:14] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2913402 (10Paladox) [17:46:43] (03PS8) 10Paladox: Gerrit: Convert from utf8 to utf8mb4 [puppet] - 10https://gerrit.wikimedia.org/r/328571 (https://phabricator.wikimedia.org/T153899) [17:46:50] (03PS9) 10Paladox: Gerrit: Convert from utf8 to utf8mb4 [puppet] - 10https://gerrit.wikimedia.org/r/328571 
(https://phabricator.wikimedia.org/T153899) [17:47:24] 06Operations, 10DBA, 10Gerrit, 13Patch-For-Review, 07Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2644059 (10Paladox) [17:56:55] (03CR) 10Eevans: RESTBase-Cassandra: Add the topk reporter (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: 10Mobrovac) [18:00:05] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1800). [18:01:18] !log installing fontconfig security updates [18:01:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:01:37] (03PS1) 10Chad: checkoutMediaWiki: kill this dumpster fire (with fire) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/330254 [18:02:34] (03CR) 10Eevans: RESTBase-Cassandra: Add the topk reporter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: 10Mobrovac) [18:04:08] (03PS2) 10Dzahn: gerrit: Indent @ssl_settings in Apache configuration [puppet] - 10https://gerrit.wikimedia.org/r/329738 (owner: 10Tim Landscheidt) [18:04:20] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/5016/cobalt.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/329738 (owner: 10Tim Landscheidt) [18:10:00] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: [epic] System level upgrade for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151324#2913481 (10Deskana) [18:11:42] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: [epic] System level upgrade for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151324#2814352 (10Deskana) [18:12:15] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: 
[epic] System level upgrade for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151324#2814352 (10Deskana) These tasks subtasks are closely related, but are not strict dependencies. [18:16:43] (03PS1) 10Chad: gerrit (2.13.4-wmf.1) jessie-wikimedia; urgency=low [debs/gerrit] - 10https://gerrit.wikimedia.org/r/330255 (https://phabricator.wikimedia.org/T154205) [18:17:26] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch, 10Wikimedia-Logstash: Update logstash on wikimedia to 2.x or 5.x - https://phabricator.wikimedia.org/T154473#2913540 (10bd808) The Logstash Forwarder and Lumberjack inputs are not used in WMF production and thus not a concern for upgrades. W... [18:17:41] 06Operations, 10DBA, 10Gerrit, 06Release-Engineering-Team, 13Patch-For-Review: Gerrit: Schedule downtime for T154205 (To do with data loss) - https://phabricator.wikimedia.org/T154327#2913541 (10demon) Gerrit's built against latest stable-2.13 and uploaded for review. Need to pick a (very lengthy, low-tr... [18:30:03] (03CR) 10Paladox: gerrit (2.13.4-wmf.1) jessie-wikimedia; urgency=low (031 comment) [debs/gerrit] - 10https://gerrit.wikimedia.org/r/330255 (https://phabricator.wikimedia.org/T154205) (owner: 10Chad) [18:36:20] (03PS3) 10Gehel: Add configuration for query endpoint URL [puppet] - 10https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: 10Smalyshev) [18:37:29] (03CR) 10Volans: [C: 04-1] "Some replies inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: 10Mobrovac) [18:38:02] (03CR) 10Gehel: [C: 032] Add configuration for query endpoint URL [puppet] - 10https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: 10Smalyshev) [18:39:36] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:44:16] (03CR) 10Paladox: "Needs rebase." 
[debs/gerrit] - https://gerrit.wikimedia.org/r/330255 (https://phabricator.wikimedia.org/T154205) (owner: Chad) [18:47:02] (PS3) Ottomata: Alert on EventBus service HTTP error rate [puppet] - https://gerrit.wikimedia.org/r/328239 (https://phabricator.wikimedia.org/T153034) [18:47:15] (CR) Ottomata: [V: 2 C: 2] Alert on EventBus service HTTP error rate [puppet] - https://gerrit.wikimedia.org/r/328239 (https://phabricator.wikimedia.org/T153034) (owner: Ottomata) [18:47:23] (PS4) Gehel: Add configuration for query endpoint URL [puppet] - https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: Smalyshev) [18:47:28] (CR) Gehel: [V: 2 C: 2] Add configuration for query endpoint URL [puppet] - https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: Smalyshev) [18:49:44] (PS5) Gehel: Add configuration for query endpoint URL [puppet] - https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: Smalyshev) [18:50:17] (CR) Gehel: [V: 2 C: 2] Add configuration for query endpoint URL [puppet] - https://gerrit.wikimedia.org/r/328582 (https://phabricator.wikimedia.org/T153897) (owner: Smalyshev) [18:51:09] (PS2) Chad: gerrit (2.13.4-wmf.1) jessie-wikimedia; urgency=low [debs/gerrit] - https://gerrit.wikimedia.org/r/330255 (https://phabricator.wikimedia.org/T154205) [18:52:57] (PS4) Yuvipanda: puppetmaster: Cleanup unused vars / crons in labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/312317 [18:53:22] (CR) Yuvipanda: [V: 2 C: 2] puppetmaster: Cleanup unused vars / crons in labs puppetmaster [puppet] - https://gerrit.wikimedia.org/r/312317 (owner: Yuvipanda) [18:53:41] (PS2) Yuvipanda: Add ruby webservice type [software/tools-webservice] - https://gerrit.wikimedia.org/r/324730 [18:54:13] (CR) Dzahn: "because the reviews said that it's not good, needs to run on all disks, be smart
about detecting RAID, not use hiera etc." [puppet] - https://gerrit.wikimedia.org/r/304580 (owner: Dzahn) [18:56:49] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:59:27] (CR) Chad: [C: 1] "Already do this for other public wikis, don't see why not. Get it in a swat window :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/329600 (https://phabricator.wikimedia.org/T154358) (owner: MarcoAurelio) [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T1900). Please do the needful. [19:00:04] Amir1: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [19:00:22] o/ [19:01:04] I can SWAT today [19:01:59] (CR) Paladox: "@Hashar hi I'm wondering could you review please?" [puppet] - https://gerrit.wikimedia.org/r/328051 (https://phabricator.wikimedia.org/T141450) (owner: Paladox) [19:02:33] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/329453 (https://phabricator.wikimedia.org/T153186) (owner: Ladsgroup) [19:03:14] Operations, DBA, Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit: Schedule downtime for T154205 (To do with data loss) - https://phabricator.wikimedia.org/T154327#2913752 (Paladox) @ostriches that could be done tonight as most of wikimedia is back but not everyone so it would be a...
[19:03:37] (Merged) jenkins-bot: Add badge for "digitaldocument" in Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/329453 (https://phabricator.wikimedia.org/T153186) (owner: Ladsgroup) [19:03:48] Operations, DBA, Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit: Schedule downtime for T154205 (To do with data loss) - https://phabricator.wikimedia.org/T154327#2913754 (demon) I'm not rushing into it **tonight**, it will be scheduled and announced. [19:05:13] Operations, DBA, Gerrit, Release-Engineering-Team, Patch-For-Review: Gerrit: Schedule downtime for T154205 (To do with data loss) - https://phabricator.wikimedia.org/T154327#2913761 (Paladox) Ok [19:07:28] Amir1: live on mwdebug1002, check please [19:07:39] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:07:43] okay [19:08:13] (PS1) Kaldari: Switch nowiki to uca-nb-u-kn collation [mediawiki-config] - https://gerrit.wikimedia.org/r/330264 [19:09:59] thcipriani: Worked just fine https://www.wikidata.org/w/index.php?title=Q21188118&diff=423855154&oldid=360786652 [19:10:18] Amir1: ok, going live everywhere [19:10:40] Thanks [19:12:17] (CR) jenkins-bot: Add badge for "digitaldocument" in Wikibase [mediawiki-config] - https://gerrit.wikimedia.org/r/329453 (https://phabricator.wikimedia.org/T153186) (owner: Ladsgroup) [19:12:21] (CR) Paladox: [C: 1] "> so, about the existing instances that use this role.
If we'd just" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:13:05] (PS1) BryanDavis: toollabs: bigbrother: stop tracking jobs when rcfile is deleted [puppet] - https://gerrit.wikimedia.org/r/330265 (https://phabricator.wikimedia.org/T94500) [19:13:39] !log thcipriani@tin Synchronized wmf-config/Wikibase-production.php: SWAT: [[gerrit:329453|Add badge for "digitaldocument" in Wikibase]] T153186 (duration: 01m 33s) [19:13:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:13:43] T153186: Extra interwiki badge required at Wikidata for Wikisource works to show "digital" documents - https://phabricator.wikimedia.org/T153186 [19:13:54] ^ Amir1 live everywhere [19:14:07] Thanks [19:15:06] thcipriani: Worked fine [19:15:10] Thanks! [19:22:27] (CR) Dzahn: "a +1 combined with "but not until we did X" is confusing, that's like +1 and -1 at the same time" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:23:43] (CR) Yuvipanda: [C: 2] Add ruby webservice type [software/tools-webservice] - https://gerrit.wikimedia.org/r/324730 (owner: Yuvipanda) [19:23:48] (CR) Dzahn: "either this role is used (then your last comment makes sense) or it's not used (which is what i asked on channel and you said yourself tha" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:24:49] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [19:25:09] (CR) Paladox: [C: -1] "We need to remove the class from deployment-phab-01 and deploymenet-phab-02 first" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:31:23] (CR) Paladox: [C: -1] "@20after4 hi could you remove the class from those two instances?"
[puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:42:41] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1006 - https://phabricator.wikimedia.org/T154418#2913897 (fgiunchedi) a:Cmjohnson @Cmjohnson the machine can be shut down at any time via graceful `shutdown`, please replace the disk, thanks! [19:42:49] RECOVERY - MegaRAID on ms-be1001 is OK: OK: optimal, 13 logical, 13 physical [19:42:50] RECOVERY - very high load average likely xfs on ms-be1001 is OK: OK - load average: 5.69, 1.26, 0.41 [19:46:51] (CR) Thcipriani: [C: 2] checkoutMediaWiki: kill this dumpster fire (with fire) [mediawiki-config] - https://gerrit.wikimedia.org/r/330254 (owner: Chad) [19:47:36] (Merged) jenkins-bot: checkoutMediaWiki: kill this dumpster fire (with fire) [mediawiki-config] - https://gerrit.wikimedia.org/r/330254 (owner: Chad) [19:47:53] (CR) jenkins-bot: checkoutMediaWiki: kill this dumpster fire (with fire) [mediawiki-config] - https://gerrit.wikimedia.org/r/330254 (owner: Chad) [19:49:13] ^ ostriches I went ahead and pulled this on mwdebug1002 to be on the safe-side, if there's anything new you can think to check there [19:49:29] can't find anywhere where it's used except in manual processes [19:49:32] Considering we don't ever run it anything other than tin/mira [19:49:41] (CR) Paladox: [C: 1] "Instances doint use any roles." [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:49:43] yarp [19:49:56] alright, Imma sync-dir multiversion [19:50:12] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1001 - https://phabricator.wikimedia.org/T154396#2914027 (fgiunchedi) a:Cmjohnson Indeed as @jcrespo mentioned a reboot fixed the issue, specifically in this case even `umount` was stuck and `mkfs.xfs` called by puppet piled up. Going fo...
[19:51:14] ACKNOWLEDGEMENT - puppet last run on ms-be1006 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[mkfs-/dev/sdk1] Filippo Giunchedi T154418 [19:52:19] (PS2) Dzahn: phabricator: delete labs role [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) [19:52:42] (CR) Dzahn: "how do you know?" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [19:52:44] !log thcipriani@tin Synchronized multiversion: SWAT: [[gerrit:330254|Remove checkoutMediaWiki]] (duration: 00m 58s) [19:52:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:36] (Abandoned) Hashar: Change $wgMaxRedirects to 3 on enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/296520 (https://phabricator.wikimedia.org/T67064) (owner: Mdann52) [19:54:49] (Abandoned) Hashar: Enable Flow beta feature on plwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/277529 (https://phabricator.wikimedia.org/T130009) (owner: Catrope) [19:54:51] thcipriani: That felt good :) [19:55:00] :D [19:55:08] (CR) Chad: [C: 2] Remove --dry-run option from updateBranchPointers [mediawiki-config] - https://gerrit.wikimedia.org/r/328329 (owner: Chad) [19:55:44] (Merged) jenkins-bot: Remove --dry-run option from updateBranchPointers [mediawiki-config] - https://gerrit.wikimedia.org/r/328329 (owner: Chad) [19:56:00] (PS3) Filippo Giunchedi: Add the HHVM and Apache videoscaler clusters to Prometheus polling [puppet] - https://gerrit.wikimedia.org/r/328913 (https://phabricator.wikimedia.org/T147316) (owner: Elukey) [19:56:16] (CR) Filippo Giunchedi: [C: 1] Enable enhanced sandbox privilege separation for sshd [puppet] - https://gerrit.wikimedia.org/r/330227 (owner: Muehlenhoff) [19:56:19] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog
fetch fail. Either compilation failed or puppetmaster has issues [19:56:52] !log demon@tin Synchronized multiversion/updateBranchPointers: Removing unused --dry-run option (duration: 00m 40s) [19:56:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:57:51] (CR) jenkins-bot: Remove --dry-run option from updateBranchPointers [mediawiki-config] - https://gerrit.wikimedia.org/r/328329 (owner: Chad) [20:00:05] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T2000). Please do the needful. [20:00:23] (CR) Dzahn: [C: 2] phabricator: delete labs role [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [20:01:10] (CR) Dzahn: "Sagan confirmed it wasn't used in the deployment-prep instances (deployment-phab) either" [puppet] - https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: Dzahn) [20:01:29] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [20:02:46] (CR) Filippo Giunchedi: [C: 2] Add the HHVM and Apache videoscaler clusters to Prometheus polling [puppet] - https://gerrit.wikimedia.org/r/328913 (https://phabricator.wikimedia.org/T147316) (owner: Elukey) [20:03:02] (PS4) Filippo Giunchedi: Add the HHVM and Apache videoscaler clusters to Prometheus polling [puppet] - https://gerrit.wikimedia.org/r/328913 (https://phabricator.wikimedia.org/T147316) (owner: Elukey) [20:04:14] (CR) Hashar: "recheck" [debs/geckodriver] - https://gerrit.wikimedia.org/r/294293 (https://phabricator.wikimedia.org/T137797) (owner: Hashar) [20:04:42] oh jouncebot I'm on train today /me fixes deployment page [20:04:47] also train is blocked. [20:05:51] twentyafterfour, hi, i just merged https://gerrit.wikimedia.org/r/#/c/330277/ to wmf7, could you check that it made it to the train please? [20:06:56] thcipriani, oh, something's wrong with the train? [20:07:27] yurik: yarp looks like this: https://phabricator.wikimedia.org/T153761 [20:07:57] thcipriani, ah, fun fun parser bugs :) [20:08:25] whee [20:08:40] yurik: I'll add your patch to the checkout though. [20:08:57] thx :) [20:09:07] it just merged to wmf7, finally [20:10:16] yup, it's staged on tin, it'll go out with train [20:11:19] PROBLEM - puppet last run on db1069 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures.
Failed resources (up to 3 shown): File[/usr/local/lib/nagios/plugins/check_raid] [20:12:36] (CR) Dzahn: "ok, yea, this is something for Andrew and Yuvi" [puppet] - https://gerrit.wikimedia.org/r/326312 (owner: Tim Landscheidt) [20:13:36] !log gehel@tin Starting deploy [wdqs/wdqs@cd7215c]: (no message) [20:13:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:18:29] !log gehel@tin Finished deploy [wdqs/wdqs@cd7215c]: (no message) (duration: 04m 54s) [20:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:25:19] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [20:26:43] (CR) Dzahn: [C: -1] "@paladox it should not be active in both DCs, the point would be to keep it stopped on the non-active server and have it running on the ac" [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [20:27:26] (CR) Dzahn: [C: -1] "@20after4 any comments on that? which services should be running and which should be stopped on the instance that is not currently the act" [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [20:29:29] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [20:31:10] RECOVERY - puppet last run on db1069 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [20:45:46] !log otto@tin Starting deploy [eventstreams/deploy@9095b4e]: (no message) [20:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:47:55] (PS1) Andrew Bogott: nova-network: Refresh service if some dependency files change.
[puppet] - https://gerrit.wikimedia.org/r/330308 (https://phabricator.wikimedia.org/T137460) [20:50:40] (PS2) Andrew Bogott: nova-network: Refresh service if config files change. [puppet] - https://gerrit.wikimedia.org/r/330308 (https://phabricator.wikimedia.org/T137460) [20:52:29] (PS3) Dzahn: toollabs: install opencv-data (trusty, jessie) [puppet] - https://gerrit.wikimedia.org/r/303416 (https://phabricator.wikimedia.org/T142321) (owner: Merlijn van Deen) [20:56:49] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [20:57:01] !log otto@tin Finished deploy [eventstreams/deploy@9095b4e]: (no message) (duration: 11m 15s) [20:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:49] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2874630 keys, up 64 days 12 hours - replication_delay is 0 [20:59:12] yuvipanda: were you involved in setting up the uwsgi puppet class? I have my thing mostly working but can't get logging sorted out. [20:59:37] !log gehel@tin Starting deploy [wdqs/wdqs@a25d3aa]: (no message) [20:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:00:32] !log gehel@tin Finished deploy [wdqs/wdqs@a25d3aa]: (no message) (duration: 00m 55s) [21:00:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:08:07] (CR) jerkins-bot: [V: -1] nova-network: Refresh service if config files change.
[puppet] - https://gerrit.wikimedia.org/r/330308 (https://phabricator.wikimedia.org/T137460) (owner: Andrew Bogott) [21:08:50] (PS1) MaxSem: Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) [21:13:23] (CR) Yurik: [C: -1] Enable mapframe on frwiki and fiwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) (owner: MaxSem) [21:14:56] (PS2) MaxSem: Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) [21:21:59] PROBLEM - swift-account-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [21:22:09] PROBLEM - swift-object-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [21:22:09] PROBLEM - swift-object-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-updater [21:22:09] PROBLEM - swift-container-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [21:22:29] PROBLEM - swift-account-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [21:22:29] PROBLEM - swift-object-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [21:22:29] PROBLEM - swift-container-server on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [21:22:39] PROBLEM - swift-container-updater on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-updater [21:22:39] PROBLEM -
swift-account-reaper on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [21:22:39] PROBLEM - swift-account-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [21:22:49] PROBLEM - swift-container-replicator on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [21:22:49] PROBLEM - swift-object-auditor on ms-be1001 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [21:24:29] (PS3) Andrew Bogott: nova-network: Refresh service if config files change. [puppet] - https://gerrit.wikimedia.org/r/330308 (https://phabricator.wikimedia.org/T137460) [21:25:55] (CR) Andrew Bogott: [C: 2] nova-network: Refresh service if config files change. [puppet] - https://gerrit.wikimedia.org/r/330308 (https://phabricator.wikimedia.org/T137460) (owner: Andrew Bogott) [21:30:26] andrewbogott: I did, but never touched logging. bd808 on the other hand is fresh off wrangling lots of logging and uwsgi, so maybe he has more context [21:30:52] bd808: here's the good bit: https://gerrit.wikimedia.org/r/#/c/328400/9/modules/role/manifests/labs/openstack/keystone.pp [21:31:05] * bd808 looks [21:31:12] the script that runs uses normal python logging… [21:31:25] It's been a couple of weeks, let me check and see what's actually happening. I think it's a permission problem I'm having now? [21:31:34] andrewbogott: have you looked at how I wired it up for Striker? [21:31:44] bd808: I don't think so...
[21:32:01] I mean, my code is c/p of striker but I haven't looked back at the logging bits [21:32:41] modules/striker/manifests/uwsgi.pp has the service setup config [21:32:59] logger + log-route + log-encoder [21:33:23] ok, will try, thanks [21:33:47] that just handles the uwsgi core logging [21:33:53] for striker anyway [21:34:05] the app logging is configured to write its own files [21:34:30] but if it wrote to stderr I think it would be captured in the main.log file [21:34:39] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:34:57] Operations, netops: Slight packet loss observed on the network starting Nov 2016 - https://phabricator.wikimedia.org/T154507#2914482 (Aklapper) [21:40:10] (PS10) Andrew Bogott: Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774) [21:48:44] bd808: what's the equivalent of 'service restart uwsgi' to see what's going on when things try to start up? [21:49:33] andrewbogott: there should be a unit for the specific service I think... [21:51:04] for striker I think it is `service uwsgi-striker *` [21:52:27] hm, I don't see anything in /etc/systemd [21:52:32] maybe this isn't getting applied at all... [21:52:59] (CR) Gergő Tisza: [C: 1] Labs: remove wmgUseCommonsMetadata [mediawiki-config] - https://gerrit.wikimedia.org/r/328868 (owner: MaxSem) [21:55:19] (CR) Paladox: [C: 1] Gerrit: Enable logstash in gerrit [puppet] - https://gerrit.wikimedia.org/r/326177 (https://phabricator.wikimedia.org/T141324) (owner: Paladox) [21:55:27] andrewbogott: are you testing on a jessie host or a trysty one? [21:55:31] *trusty [21:55:33] jessie [22:00:04] yurik, maxsem, and jgirault: Dear anthropoid, the time has come. Please deploy Interactive teamm depl (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170103T2200).
[22:00:27] MaxSem, should i or do you want to do it? [22:00:54] I can do [22:01:04] ok, will spot [22:02:15] (PS3) MaxSem: Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) [22:02:39] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [22:03:12] (PS2) Kaldari: Switch nowiki to uca-nb-u-kn collation [mediawiki-config] - https://gerrit.wikimedia.org/r/330264 [22:03:39] RECOVERY - swift-account-reaper on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper [22:03:39] RECOVERY - swift-account-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator [22:03:49] RECOVERY - swift-container-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [22:03:50] RECOVERY - swift-object-auditor on ms-be1001 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [22:03:59] RECOVERY - swift-account-server on ms-be1001 is OK: PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [22:04:09] RECOVERY - swift-object-replicator on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator [22:04:09] RECOVERY - swift-object-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater [22:04:10] RECOVERY - swift-container-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor [22:04:19] (CR) MaxSem: [C: 2] Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) (owner: MaxSem) [22:04:29] RECOVERY - swift-container-server on ms-be1001 is OK:
PROCS OK: 13 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [22:04:29] RECOVERY - swift-object-server on ms-be1001 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server [22:04:29] RECOVERY - swift-account-auditor on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor [22:04:39] RECOVERY - swift-container-updater on ms-be1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater [22:04:55] (Merged) jenkins-bot: Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) (owner: MaxSem) [22:04:57] (CR) Gergő Tisza: [C: -1] "This was removed from prod by accident in I3f93183a3. Forced recalculation is needed if you want to see beta code changes reflected in Com" [mediawiki-config] - https://gerrit.wikimedia.org/r/328869 (owner: MaxSem) [22:05:07] (CR) jenkins-bot: Enable mapframe on frwiki and fiwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/330311 (https://phabricator.wikimedia.org/T151591) (owner: MaxSem) [22:06:54] (PS1) Gergő Tisza: Re-add wgCommonsMetadataForceRecalculate [mediawiki-config] - https://gerrit.wikimedia.org/r/330318 [22:07:06] (CR) jerkins-bot: [V: -1] Re-add wgCommonsMetadataForceRecalculate [mediawiki-config] - https://gerrit.wikimedia.org/r/330318 (owner: Gergő Tisza) [22:07:25] (CR) Gergő Tisza: "Partial revert, ForceRecalculate is used on beta." [mediawiki-config] - https://gerrit.wikimedia.org/r/292622 (owner: Jforrester) [22:07:41] yurik, pulled on mdebug1002 [22:07:51] testing...
[22:09:25] MaxSem, seems good [22:10:15] (PS2) Gergő Tisza: Re-add wgCommonsMetadataForceRecalculate [mediawiki-config] - https://gerrit.wikimedia.org/r/330318 [22:10:23] (PS3) Gergő Tisza: Re-add wgCommonsMetadataForceRecalculate [mediawiki-config] - https://gerrit.wikimedia.org/r/330318 [22:11:09] (CR) Gergő Tisza: [C: -1] "Fix in I9c0b8fdc" [mediawiki-config] - https://gerrit.wikimedia.org/r/328869 (owner: MaxSem) [22:11:38] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: mapframe on fr: and fi: https://gerrit.wikimedia.org/r/#/c/330311/3 (duration: 00m 41s) [22:11:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:17:51] (PS11) Andrew Bogott: Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774) [22:17:53] (PS1) Andrew Bogott: Add mirantis backports repo for Openstack classes on Jessie [puppet] - https://gerrit.wikimedia.org/r/330319 [22:20:14] (PS4) Dzahn: toollabs: install opencv-data (trusty, jessie) [puppet] - https://gerrit.wikimedia.org/r/303416 (https://phabricator.wikimedia.org/T142321) (owner: Merlijn van Deen) [22:20:46] (CR) 20after4: "Right now none of the phabricator services should be running on the standby server.
We can run phd once we get the repository clustering " [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [22:21:35] (CR) 20after4: [C: -1] Phabricator: Set phabricator active server for iridium and phab2001 [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [22:23:15] (CR) Dzahn: [C: 2] toollabs: install opencv-data (trusty, jessie) [puppet] - https://gerrit.wikimedia.org/r/303416 (https://phabricator.wikimedia.org/T142321) (owner: Merlijn van Deen) [22:23:21] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2914781 (Paladox) [22:23:32] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2644059 (Paladox) [22:25:37] (CR) Dzahn: [C: -1] "ok, gotcha.
so just one thing, we wanted to test if it works as phab-new.wikimedia.org, right, so confirm we have a working standby on jes" [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [22:29:04] (CR) Filippo Giunchedi: [C: 1] l10nupdate: acquire scap lock before changing files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752) (owner: BryanDavis) [22:30:04] (Abandoned) Paladox: Phabricator: Set phabricator active server for iridium and phab2001 [puppet] - https://gerrit.wikimedia.org/r/324833 (https://phabricator.wikimedia.org/T137928) (owner: Paladox) [22:38:55] !log maxsem@tin Synchronized php-1.29.0-wmf.6/extensions/Kartographer: https://gerrit.wikimedia.org/r/#/c/330322/ (duration: 00m 42s) [22:38:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:23] MaxSem: yurik jgirault could you ping me when you're done with your deployment so I can get the train back on track? [22:42:36] ok [22:44:12] !log maxsem@tin Synchronized php-1.29.0-wmf.7/extensions/Kartographer: https://gerrit.wikimedia.org/r/#/c/330321/ (duration: 00m 42s) [22:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:21] thcipriani, I'm done [22:44:37] MaxSem: cool, thanks!
[22:49:49] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 617 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2875317 keys, up 64 days 14 hours - replication_delay is 617 [22:50:40] (PS3) Hashar: Puppet doc with strings/yard [puppet] - https://gerrit.wikimedia.org/r/309561 [22:52:42] (PS4) Hashar: Puppet doc with strings/yard [puppet] - https://gerrit.wikimedia.org/r/309561 (https://phabricator.wikimedia.org/T143233) [22:54:19] PROBLEM - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 0.002 second response time [22:54:19] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 341 bytes in 0.002 second response time [22:54:19] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 341 bytes in 0.002 second response time [22:54:53] false alarm ^ i got it [22:55:20] !log iptables block of tools-checker-01 to debug DNS SPoF [22:55:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:56:37] ugh should have that been paging?
it didn't page me [22:57:49] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 2874203 keys, up 64 days 14 hours - replication_delay is 1 [22:58:20] !log thcipriani@tin Started scap: testwiki to php-1.29.0-wmf.7 and rebuild l10n cache [22:58:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:58:40] Operations, Puppet, Continuous-Integration-Config, Patch-For-Review, Upstream: post build failures for operations/puppet on operations-puppet-doc - https://phabricator.wikimedia.org/T143233#2914951 (hashar) Early September I looked at [[ https://github.com/puppetlabs/puppet-strings/ | puppet-... [22:59:07] godog: it's a controlled failure to figure out why it's failing at all [23:00:15] chasemp: ack, thanks! did it page you? [23:00:57] godog: no but I think madhuvishy meant it to be silenced and barely caught it [23:01:15] (not every toolschecker test pages) [23:03:21] ah, that'd explain it, thanks [23:05:35] (PS5) Hashar: Puppet doc with strings/yard [puppet] - https://gerrit.wikimedia.org/r/309561 (https://phabricator.wikimedia.org/T143233) [23:09:18] Operations, MediaWiki-Internationalization: Norwegian messages inContentLanguage look for on-wiki overrides at the /nb subpage, not the root page - https://phabricator.wikimedia.org/T126146#2915023 (greg) Update? SWATs are open again.
[23:11:37] (PS2) Filippo Giunchedi: prometheus: extend ops recording rules [puppet] - https://gerrit.wikimedia.org/r/328842
[23:15:11] (CR) Filippo Giunchedi: [C: 2] prometheus: extend ops recording rules [puppet] - https://gerrit.wikimedia.org/r/328842 (owner: Filippo Giunchedi)
[23:19:01] !log gerrit: quick restart of services
[23:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:19:19] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.005 second response time
[23:21:11] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2915066 (Paladox) >>! In T145885#2897016, @jcrespo wrote: >> ahaha it works now. >> >> you have to set >> >> character-set-client-handshake = FA...
[23:24:12] (PS1) Gergő Tisza: Use Title_blacklist as a local page on meta [mediawiki-config] - https://gerrit.wikimedia.org/r/330328 (https://phabricator.wikimedia.org/T154112)
[23:27:17] (PS12) Andrew Bogott: Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774)
[23:27:30] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2915101 (Paladox) Apparently jdbc does not support utf8mb4 https://www.google.co.uk/#q=fatal:+++caused+by+java.sql.SQLException:+Unsupported+char...
[23:29:39] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2915102 (Paladox) https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-charsets.html
[23:29:42] (CR) jerkins-bot: [V: -1] Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774) (owner: Andrew Bogott)
[23:31:49] PROBLEM - HHVM rendering on mw1201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:32:49] RECOVERY - HHVM rendering on mw1201 is OK: HTTP OK: HTTP/1.1 200 OK - 69331 bytes in 3.495 second response time
[23:32:56] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting an emoji - https://phabricator.wikimedia.org/T145885#2915107 (Paladox) >>! In T145885#2897016, @jcrespo wrote: >> ahaha it works now. >> >> you have to set >> >> character-set-client-handshake = FA...
[23:33:41] (PS13) Andrew Bogott: Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774)
[23:41:06] Operations, ops-eqiad: rack/setup/install/track new ms-fe1005-1008 - https://phabricator.wikimedia.org/T154250#2915132 (fgiunchedi) re: racking me and @Cmjohnson chatted about it and concluded 1 in A5, 1 in C8 and two in row D (D2 + D7) once the row is fully online (10G required). We can proceed with row...
[23:42:21] (PS1) Alex Monk: check_graphite: Fix some KeyError exceptions in SeriesThreshold.format_message [puppet] - https://gerrit.wikimedia.org/r/330329 (https://phabricator.wikimedia.org/T154533)
[23:42:49] Operations, ops-eqiad, Patch-For-Review: eqiad: Rack and setup new restbase nodes - https://phabricator.wikimedia.org/T150964#2915137 (fgiunchedi) Open>Resolved All machines are fully in service, resolving
[23:45:40] (PS14) Andrew Bogott: Keystone: Move api service to uwsgi/nginx [puppet] - https://gerrit.wikimedia.org/r/328400 (https://phabricator.wikimedia.org/T150774)
[23:48:11] !log thcipriani@tin Finished scap: testwiki to php-1.29.0-wmf.7 and rebuild l10n cache (duration: 49m 50s)
[23:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:50:06] (CR) Paladox: [C: 1] "Important fix :)" [debs/gerrit] - https://gerrit.wikimedia.org/r/330255 (https://phabricator.wikimedia.org/T154205) (owner: Chad)
[23:51:25] (PS1) Thcipriani: Group0 to 1.29.0-wmf.7 [mediawiki-config] - https://gerrit.wikimedia.org/r/330331
[23:51:28] Operations, Patch-For-Review, Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2915173 (fgiunchedi)
[23:51:30] Operations, Patch-For-Review, Prometheus-metrics-monitoring, User-Joe: Port HHVM metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T147423#2915170 (fgiunchedi) Open>Resolved a:fgiunchedi All HHVM clusters now in Prometheus
[23:51:51] Operations, Patch-For-Review, Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2637292 (fgiunchedi)
[23:51:53] Operations, Patch-For-Review, Prometheus-metrics-monitoring, User-Elukey: Port apache httpd metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T147316#2915174 (fgiunchedi) Open>Resolved All apache clusters now in prometheus
[23:52:01] (CR) Thcipriani: [C: 2] Group0 to 1.29.0-wmf.7 [mediawiki-config] - https://gerrit.wikimedia.org/r/330331 (owner: Thcipriani)
[23:52:42] (PS2) Filippo Giunchedi: prometheus: use key/value for gdnsd rcodes [puppet] - https://gerrit.wikimedia.org/r/327282 (https://phabricator.wikimedia.org/T147426)
[23:52:44] (Merged) jenkins-bot: Group0 to 1.29.0-wmf.7 [mediawiki-config] - https://gerrit.wikimedia.org/r/330331 (owner: Thcipriani)
[23:53:01] (CR) jenkins-bot: Group0 to 1.29.0-wmf.7 [mediawiki-config] - https://gerrit.wikimedia.org/r/330331 (owner: Thcipriani)
[23:54:39] Operations, Patch-For-Review, Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2915180 (fgiunchedi)
[23:54:41] (CR) Filippo Giunchedi: [C: 2] prometheus: use key/value for gdnsd rcodes [puppet] - https://gerrit.wikimedia.org/r/327282 (https://phabricator.wikimedia.org/T147426) (owner: Filippo Giunchedi)
[23:55:43] (PS1) Alex Monk: check_graphite: Fix some IndexError exceptions in Threshold.parse_result [puppet] - https://gerrit.wikimedia.org/r/330332 (https://phabricator.wikimedia.org/T154533)
[23:57:31] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.29.0-wmf.7
[23:57:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:58:41] (PS1) BryanDavis: toollabs: remove host aliases for tools-exec-121[2-6] [puppet] - https://gerrit.wikimedia.org/r/330333 (https://phabricator.wikimedia.org/T154539)