[00:00:04] twentyafterfour: Respected human, time to deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T0000). Please do the needful.
[00:01:17] (PS10) Dzahn: gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692
[00:03:28] (CR) Dzahn: [C: -1] "@paladox: ok, true about host/master_host. what about "slave" though" [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[00:03:38] (CR) jerkins-bot: [V: -1] gerrit: convert to profile/role structure [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[00:03:51] (CR) Paladox: "Slaves could be configured without hiera if you want :)" [puppet] - https://gerrit.wikimedia.org/r/342692 (owner: Dzahn)
[00:08:21] Operations, Analytics, Analytics-Cluster, hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3104654 (RobH) Raid5 is being used in production on these boxes? That seems, non-ideal.... I'll start pulling together a quotes for this shortly. While this sys...
[00:10:27] (03PS11) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [00:13:27] (03CR) 10jerkins-bot: [V: 04-1] gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [00:21:28] (03CR) 10Paladox: gerrit: convert to profile/role structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [00:23:46] (03CR) 10Dzahn: "yes, fails when backup::host isn't in the same place and was told in comment before to move it" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [00:28:23] (03CR) 10Dzahn: gerrit: convert to profile/role structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [00:29:55] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch, 10Wikimedia-Logstash: Import new kibana and logstash .debs to wikimedia experimental repository - https://phabricator.wikimedia.org/T160597#3104732 (10EBernhardson) [00:30:43] (03PS12) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [00:31:39] (03CR) 10jerkins-bot: [V: 04-1] gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [00:31:56] (03CR) 10Jforrester: "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) (owner: 10Mbch331) [00:38:13] (03PS13) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [00:38:52] PROBLEM - puppet last run on elastic1029 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:39:32] (03CR) 10Jforrester: [C: 031] "Seems sane. Happy for this to be SWATed whenever." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) (owner: 10Mbch331) [00:40:32] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:51:31] (03Abandoned) 10Dzahn: Add python3-pil for ConfirmEdit captcha generation [puppet] - 10https://gerrit.wikimedia.org/r/337248 (owner: 10Reedy) [01:05:38] (03PS14) 10Dzahn: gerrit: convert to profile/role structure [puppet] - 10https://gerrit.wikimedia.org/r/342692 [01:06:52] RECOVERY - puppet last run on elastic1029 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [01:08:32] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [01:15:12] PROBLEM - check_puppetrun on bellatrix is CRITICAL: CRITICAL: Puppet has 1 failures [01:20:12] RECOVERY - check_puppetrun on bellatrix is OK: OK: Puppet is currently enabled, last run 83 seconds ago with 0 failures [01:22:26] 06Operations: provide download numbers for tails iso - https://phabricator.wikimedia.org/T160600#3104815 (10Dzahn) [01:25:32] 06Operations: provide download numbers for tails iso - https://phabricator.wikimedia.org/T160600#3104832 (10Dzahn) [01:29:03] (03CR) 10Dzahn: "almost now .. except the bacula thing: http://puppet-compiler.wmflabs.org/5789/" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [01:29:42] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:38:09] (03CR) 10Paladox: gerrit: convert to profile/role structure (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [01:59:42] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [02:24:13] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.15) (duration: 08m 51s) [02:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:55:47] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.16) (duration: 13m 39s) [02:55:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:01:37] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Mar 16 03:01:37 UTC 2017 (duration 5m 50s) [03:01:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:44:32] PROBLEM - puppet last run on mw1197 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:12:32] RECOVERY - puppet last run on mw1197 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [05:06:42] PROBLEM - puppet last run on ms-be1023 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:35:42] RECOVERY - puppet last run on ms-be1023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [05:38:42] PROBLEM - puppet last run on elastic1030 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:03:56] (03PS8) 10Mbch331: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) [06:06:42] RECOVERY - puppet last run on elastic1030 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [06:45:52] PROBLEM - puppet last run on analytics1048 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[tzdata] [06:47:23] 06Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1070 - https://phabricator.wikimedia.org/T158969#3105045 (10Marostegui) >>! In T158969#3071469, @Cmjohnson wrote: > db1070 is under warranty for 2 more months. Requested new part from DEll > > Congratulations: Work Order SR944780612 was successfully submi... [07:08:48] !log Starting pt-table-checksum on s6 (frwiki) - T160509 [07:08:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:08:55] T160509: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509 [07:13:53] RECOVERY - puppet last run on analytics1048 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [07:36:11] !log Deploy schema change on s7 - T160415 [07:36:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:36:17] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [07:51:21] !log Deploy schema change on s1 - T160415 [07:51:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:51:26] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [07:51:42] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [08:01:09] (03PS1) 10Marostegui: s1,s2.hosts: Move db1067 from s2 to s1 [software] - 10https://gerrit.wikimedia.org/r/342990 (https://phabricator.wikimedia.org/T160435) [08:01:42] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add redis switching task, some more stages boilerplate [switchdc] - 10https://gerrit.wikimedia.org/r/342498 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [08:08:13] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): File[/home/faidon],File[/home/gehel],File[/home/otto] [08:08:42] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [08:12:17] !log upgrading apache on einsteinium/icinga.wikimedia.org [08:12:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:13:42] PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:13:52] PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/home/akosiaris] [08:16:00] (03PS1) 10Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342991 (https://phabricator.wikimedia.org/T160415) [08:18:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342991 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:19:44] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342991 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:19:53] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342991 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:20:51] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1080 - T160415 (duration: 00m 47s) [08:20:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:20:59] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:21:27] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342992 [08:24:44] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342992 (owner: 10Marostegui) [08:25:53] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342992 (owner: 10Marostegui) [08:26:52] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1080 - T160415 (duration: 00m 46s) [08:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:59] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:27:01] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1080" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/342992 (owner: 10Marostegui) [08:29:12] (03PS1) 10Marostegui: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342993 (https://phabricator.wikimedia.org/T160415) [08:31:16] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342993 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:33:07] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342993 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:33:29] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342993 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:33:59] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 - T160415 (duration: 00m 41s) [08:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:34:05] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:35:21] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342994 [08:36:12] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [08:37:52] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342994 (owner: 10Marostegui) [08:38:24] 06Operations, 07HHVM, 13Patch-For-Review, 07Upstream: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3105208 (10MoritzMuehlenhoff) 05Open>03Resolved The crash hasn't happened again over the course of the last ~ 16 hours in Beta (and previously we've hit it about 10 times per h... 
[08:40:43] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342994 (owner: 10Marostegui) [08:40:55] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1089" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342994 (owner: 10Marostegui) [08:40:58] 06Operations, 07HHVM, 13Patch-For-Review, 07Upstream: Build / migrate to HHVM 3.18 - https://phabricator.wikimedia.org/T158176#3105211 (10MoritzMuehlenhoff) 05Resolved>03Open [08:41:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 - T160415 (duration: 00m 43s) [08:41:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:41:45] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:41:52] RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [08:42:42] RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [08:42:47] (03PS1) 10Marostegui: db-eqiad.php: Depool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342995 (https://phabricator.wikimedia.org/T160415) [08:47:04] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342995 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:49:45] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342995 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:49:58] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1073 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342995 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [08:51:09] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1073 - T160415 (duration: 00m 41s) [08:51:14] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:51:15] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [08:51:36] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1073" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342997 [08:53:01] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1073" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342997 (owner: 10Marostegui) [08:54:29] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1073" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342997 (owner: 10Marostegui) [08:54:37] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1073" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342997 (owner: 10Marostegui) [08:55:38] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1073 - T160415 (duration: 00m 41s) [08:55:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:11] (03CR) 10Alexandros Kosiaris: [C: 032] "Sent email to ops-l, no objections but rather an endorsement, change has been +1ed. 
I 'll merge, in case anything breaks we can always rev" [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [08:56:24] (03PS3) 10Alexandros Kosiaris: Update puppet-lint gem to 2.0.2 [puppet] - 10https://gerrit.wikimedia.org/r/342637 [08:56:37] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Update puppet-lint gem to 2.0.2 [puppet] - 10https://gerrit.wikimedia.org/r/342637 (owner: 10Alexandros Kosiaris) [08:57:16] (03PS3) 10Alexandros Kosiaris: Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 [08:57:57] !log codfw-prod: add ms-be203[1-9] - T158337 [08:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:58:04] T158337: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337 [08:58:38] (03CR) 10Jcrespo: [C: 032] s1,s2.hosts: Move db1067 from s2 to s1 [software] - 10https://gerrit.wikimedia.org/r/342990 (https://phabricator.wikimedia.org/T160435) (owner: 10Marostegui) [09:00:06] (03PS1) 10Marostegui: db-eqiad.php: Depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342998 (https://phabricator.wikimedia.org/T160415) [09:01:23] (03PS4) 10Alexandros Kosiaris: Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 [09:01:37] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342998 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:02:51] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342998 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:03:08] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1072 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342998 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:03:47] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1072 - T160415 (duration: 00m 
41s) [09:03:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:03:53] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [09:04:22] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1072" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342999 [09:06:00] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1072" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342999 (owner: 10Marostegui) [09:07:33] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1072" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342999 (owner: 10Marostegui) [09:07:42] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1072" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342999 (owner: 10Marostegui) [09:08:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1072 - T160415 (duration: 00m 42s) [09:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:09:11] (03PS1) 10Marostegui: db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343000 (https://phabricator.wikimedia.org/T160415) [09:10:34] !log upgrading apache on mendelevium/OTRS [09:10:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:11:29] !log upgrading apache on fermium/lists.wikimedia.org [09:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:13:10] (03Merged) 10jenkins-bot: s1,s2.hosts: Move db1067 from s2 to s1 [software] - 10https://gerrit.wikimedia.org/r/342990 (https://phabricator.wikimedia.org/T160435) (owner: 10Marostegui) [09:13:53] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1066 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343000 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:14:12] (03PS1) 10Gehel: maps - tuning of postgresql based on experience [puppet] - 
https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556)
[09:14:51] (Merged) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/343000 (https://phabricator.wikimedia.org/T160415) (owner: Marostegui)
[09:15:34] Operations, User-fgiunchedi: Enable HTTPS for swift clients - https://phabricator.wikimedia.org/T160616#3105285 (fgiunchedi)
[09:15:48] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1066 - T160415 (duration: 00m 42s)
[09:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:15:54] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415
[09:15:56] gehel dcausse when you have a minute, I've pinged you in ^ re: https and mediawiki
[09:16:06] (PS1) Volans: Add flake8 check to tox [switchdc] - https://gerrit.wikimedia.org/r/343002
[09:16:30] Operations, User-fgiunchedi: encrypt syslog traffic - https://phabricator.wikimedia.org/T136312#3105300 (fgiunchedi)
[09:16:31] godog: having a look...
[09:16:34] godog: looking
[09:17:11] (CR) jenkins-bot: db-eqiad.php: Depool db1066 [mediawiki-config] - https://gerrit.wikimedia.org/r/343000 (https://phabricator.wikimedia.org/T160415) (owner: Marostegui)
[09:17:38] wow that was fast, thanks!
[09:18:02] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/343003
[09:18:10] godog: your request came in just before my coffee break... :P
[09:18:27] godog: I'm replying on the task, but ping me if you want to discuss in more details
[09:18:51] gehel: ok will do!
[09:19:02] I'm taking notes on your coffee break times
[09:19:07] Operations, User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3105305 (Joe) @akosiaris are you sure about that? If replica is broken the rdb file is transferred and from what I see only some are larger than 500 MB. We should probably do the follo...
[09:19:11] in case I have more requests too
[09:19:54] * gehel will start publishing coffee breaks to graphite...
[09:20:08] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/343003 (owner: Marostegui)
[09:21:08] godog: not sure to understand, you want know what we did in mediawiki to properly emit https requests to an internal service?
[09:21:26] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/343003 (owner: Marostegui)
[09:21:34] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1066" [mediawiki-config] - https://gerrit.wikimedia.org/r/343003 (owner: Marostegui)
[09:22:02] dcausse: yep exactly!
[09:22:07] dcausse: the part that godog is probably interested with is the fact that we had to switch to HTTP connection pooling to mitigate SSL handshake overhead
[09:22:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1066 - T160415 (duration: 00m 42s)
[09:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:32] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415
[09:22:33] yeah and other gotchas like that
[09:23:29] Operations, User-fgiunchedi: Enable HTTPS for swift clients - https://phabricator.wikimedia.org/T160616#3105331 (Gehel) When activating HTTPS for mediawiki -> elasticsearch traffic, we had to enable HTTP connection pooling to mitigate the SSL handshake overhead. This was particularly important for cross...
[09:24:10] dcausse: you probably have more details on what was done on the mediawiki side. I added my bit of info already. I'll get back to it if I remember something else.
[09:24:31] sure, will do
[09:25:20] (PS1) Marostegui: db-eqiad.php: Depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/343004 (https://phabricator.wikimedia.org/T160415)
[09:26:01] <_joe_> dcausse: yes, I remember you had some perf issues when we switched to https
[09:26:14] <_joe_> gehel: btw, there is something I wanted to talk with you about
[09:26:24] yes connection init is slow with https over codfw
[09:26:27] <_joe_> regarding TLS termination
[09:26:43] (CR) Alexandros Kosiaris: "minor comment inline about wal files" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556) (owner: Gehel)
[09:26:47] <_joe_> gehel: you are still using the puppet cert for that, right?
[09:26:58] _joe_: right
[09:27:25] <_joe_> gehel: what is the host mediawiki connects to?
[09:27:37] <_joe_> search.svc.$site.wmnet?
[09:27:39] <_joe_> right?
[09:27:45] right again!
[09:28:13] <_joe_> so, the TLS cert puppet creates is RSA-based and has a large key
[09:28:30] <_joe_> which makes TLS expensive both in terms of handshake and performance
[09:28:47] <_joe_> I created a system to create ECDSA keys and make puppet sign them
[09:28:59] sounds interesting!
[09:29:08] <_joe_> so that they can be used for efficient internal TLS termination
[09:29:36] <_joe_> we also have a script in puppet/utils to make the whole process from your computer
[09:29:39] _joe_: you have a pointer to an example somewhere?
[09:29:56] <_joe_> gehel: swift and mediawiki :)
[09:30:07] <_joe_> I think there is a page on wikitech, let me find it
[09:30:07] (PS2) Marostegui: db-eqiad.php: Depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/343004 (https://phabricator.wikimedia.org/T160415)
[09:31:18] <_joe_> gehel: https://wikitech.wikimedia.org/wiki/Puppet-ecdsacert
[09:31:37] Operations, User-fgiunchedi: Enable HTTPS for swift clients - https://phabricator.wikimedia.org/T160616#3105341 (dcausse) Connection pooling is not available out of the box and only supported by HHVM. The code needed to enable connection pooling is not available in a generic fashion in mediawiki core but...
[09:32:40] * gehel is going to do some reading after coffee...
[09:33:00] <_joe_> gehel: also, if you do that, please ping me as we want to add dns discovery hostnames to the SAN
[09:34:01] <_joe_> oh, the script is waiting for someone to review it: https://gerrit.wikimedia.org/r/#/c/340107/
[09:34:08] _joe_: I will! Not sure I understand exactly what "dns discovery hostnames" are about (well, I get the idea)
[09:34:20] (PS1) Filippo Giunchedi: prometheus: set the correct owner/group for generated snmp config [puppet] - https://gerrit.wikimedia.org/r/343006
[09:34:29] <_joe_> gehel: so both dcs should answer to "search.discovery.wmnet"
[09:34:38] Ok, make sense.
[09:35:07] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/343004 (https://phabricator.wikimedia.org/T160415) (owner: Marostegui)
[09:35:52] _joe_: we probably need to have a discussion with you, Erik and David about that. Atm we have some routing in mw-config, which is a bit more complex than switching the whole DC at a time
[09:37:06] (CR) Giuseppe Lavagetto: [V: 2 C: 2] Decouple logging setup from importing the module [switchdc] - https://gerrit.wikimedia.org/r/342657 (owner: Giuseppe Lavagetto)
[09:37:26] (Merged) jenkins-bot: db-eqiad.php: Depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/343004 (https://phabricator.wikimedia.org/T160415) (owner: Marostegui)
[09:37:34] (CR) jenkins-bot: db-eqiad.php: Depool db1055 [mediawiki-config] - https://gerrit.wikimedia.org/r/343004 (https://phabricator.wikimedia.org/T160415) (owner: Marostegui)
[09:38:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1055 - T160415 (duration: 00m 47s)
[09:38:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:35] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415
[09:38:42] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/343007
[09:40:10] (PS2) Filippo Giunchedi: prometheus: set the correct owner/group for generated snmp config [puppet] - https://gerrit.wikimedia.org/r/343006
[09:43:09] (CR) Filippo Giunchedi: [C: 2] prometheus: set the correct owner/group for generated snmp config [puppet] - https://gerrit.wikimedia.org/r/343006 (owner: Filippo Giunchedi)
[09:44:17] (CR) Marostegui: [C: 2] Revert "db-eqiad.php: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/343007 (owner: Marostegui)
[09:44:27] (PS1) Giuseppe Lavagetto: profile::cumin::target: add parameters/tags to allow easy selection [puppet] - https://gerrit.wikimedia.org/r/343008
[09:45:36] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1055" [mediawiki-config] - https://gerrit.wikimedia.org/r/343007 (owner: Marostegui)
[09:46:03] !log upgrading
apache on cobalt/gerrit [09:46:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:46:30] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1055 - T160415 (duration: 00m 42s) [09:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:46:35] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [09:46:56] (03PS1) 10Marostegui: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343009 (https://phabricator.wikimedia.org/T160415) [09:47:14] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1055" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343007 (owner: 10Marostegui) [09:52:40] 06Operations, 10Traffic, 07HTTPS, 15User-fgiunchedi: Enable HTTPS for swift clients - https://phabricator.wikimedia.org/T160616#3105367 (10Aklapper) [09:53:15] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343009 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:54:09] (03CR) 10Alexandros Kosiaris: [C: 032] Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 (owner: 10Alexandros Kosiaris) [09:54:20] (03PS5) 10Alexandros Kosiaris: Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 [09:54:25] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Set up sync for icinga state between hosts [puppet] - 10https://gerrit.wikimedia.org/r/342833 (owner: 10Alexandros Kosiaris) [09:54:35] 06Operations, 10Monitoring, 13Patch-For-Review, 15User-fgiunchedi: Extract metrics from logs - https://phabricator.wikimedia.org/T147923#3105382 (10fgiunchedi) 05stalled>03Open Upstream patch has been merged, though there's another outstanding issue https://github.com/google/mtail/issues/56 which makes... 
[09:54:41] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343009 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:54:50] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1051 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343009 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [09:55:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:44] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [09:56:25] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343011 [09:56:35] !log installing libevent security updates [09:56:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:59:20] (03PS1) 10Alexandros Kosiaris: It's "partner" sync_icinga_state.sh.erb (typo fix) [puppet] - 10https://gerrit.wikimedia.org/r/343012 [09:59:40] (03PS2) 10Alexandros Kosiaris: It's "partner" sync_icinga_state.sh.erb (typo fix) [puppet] - 10https://gerrit.wikimedia.org/r/343012 [09:59:49] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] It's "partner" sync_icinga_state.sh.erb (typo fix) [puppet] - 10https://gerrit.wikimedia.org/r/343012 (owner: 10Alexandros Kosiaris) [10:01:22] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [10:03:29] (03PS1) 10Volans: Cumin: update config to new layout [puppet] - 10https://gerrit.wikimedia.org/r/343014 (https://phabricator.wikimedia.org/T160621) [10:03:46] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343011 (owner: 10Marostegui) [10:04:45] (03CR) 10Giuseppe Lavagetto: [C: 031] Cumin: update config to new layout [puppet] - 10https://gerrit.wikimedia.org/r/343014 
(https://phabricator.wikimedia.org/T160621) (owner: 10Volans) [10:06:22] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 18 probes of 277 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map [10:06:32] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343011 (owner: 10Marostegui) [10:07:22] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343011 (owner: 10Marostegui) [10:07:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 - T160415 (duration: 00m 41s) [10:07:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:43] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [10:08:55] (03CR) 10Volans: [C: 032] Cumin: update config to new layout [puppet] - 10https://gerrit.wikimedia.org/r/343014 (https://phabricator.wikimedia.org/T160621) (owner: 10Volans) [10:10:36] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add cache wipe + warmup phase implementation [switchdc] - 10https://gerrit.wikimedia.org/r/342666 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [10:11:12] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] Add conftool support [switchdc] - 10https://gerrit.wikimedia.org/r/342805 (https://phabricator.wikimedia.org/T160178) (owner: 10Giuseppe Lavagetto) [10:12:12] (03PS4) 10Gilles: Performance Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) [10:12:27] (03CR) 10Gilles: Performance Grafana alerts (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [10:12:34] !log upgraded cumin to version 0.0.2 in the repository and on neodymium/sarin [10:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log 
[10:17:51] 06Operations, 15User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3105448 (10elukey) @Joe we could use SCAN/KEYS in a loop and check TTL and OBJECT IDLETIME of all the Redis keys in the eqiad slaves, getting some useful statistics about the Job queues. [10:19:05] (03CR) 10Gilles: "The error is:" [puppet] - 10https://gerrit.wikimedia.org/r/342811 (https://phabricator.wikimedia.org/T151065) (owner: 10Gilles) [10:20:16] (03CR) 10Filippo Giunchedi: [C: 031] Performance Grafana alerts [puppet] - 10https://gerrit.wikimedia.org/r/342431 (https://phabricator.wikimedia.org/T156245) (owner: 10Gilles) [10:20:31] 06Operations: Upgrading python-phabricator in trusty - https://phabricator.wikimedia.org/T160408#3105455 (10MoritzMuehlenhoff) 05Open>03Resolved python-phabricator 0.6.1 for trusty has been uploaded to apt.wikimedia.org [10:21:45] (03PS1) 10Muehlenhoff: Install python-phabricator module [puppet] - 10https://gerrit.wikimedia.org/r/343019 [10:27:12] (03CR) 10Muehlenhoff: [C: 032] Install python-phabricator module [puppet] - 10https://gerrit.wikimedia.org/r/343019 (owner: 10Muehlenhoff) [10:34:59] 06Operations, 15User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3105488 (10akosiaris) >>! In T159850#3105305, @Joe wrote: > @akosiaris are you sure about that? If replica is broken the rdb file is transferred and from what I see only some are larger t... [10:40:42] PROBLEM - puppet last run on elastic1019 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:44:15] (03PS3) 10Ema: cache: different parity for start/end ip_local_port_range values [puppet] - 10https://gerrit.wikimedia.org/r/342832 [10:44:27] (03CR) 10Ema: [V: 032 C: 032] cache: different parity for start/end ip_local_port_range values [puppet] - 10https://gerrit.wikimedia.org/r/342832 (owner: 10Ema) [10:47:19] (03PS1) 10Jcrespo: mariadb: Repool db1067 after maintenanace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 [10:52:46] (03CR) 10Jcrespo: "This is so low in importance, that should be mixed with another merge at the same time." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [10:55:06] (03CR) 10Marostegui: [C: 031] "I can do it later, I have more servers to depool" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [10:56:50] !log joal@tin Started deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) [10:56:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:59:02] PROBLEM - aqs endpoints health on aqs1004 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.64.0.107, port=7232): Max retries exceeded with url: /analytics.wikimedia.org/v1/?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [10:59:03] !log joal@tin Finished deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) (duration: 02m 13s) [10:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:02] RECOVERY - aqs endpoints health on aqs1004 is OK: All endpoints are healthy [11:00:36] (03PS2) 10Alexandros Kosiaris: nagios: Specify a parents host relationship [puppet] - 10https://gerrit.wikimedia.org/r/334149 [11:00:41] joal: --$ [11:00:46] err --^ [11:01:26] !log joal@tin Started deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) [11:01:31] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:01:52] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 3 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#3105544 (10ema) If I understand the main issue at hand correctly, the goal here is to make sure that developers can quic... [11:01:57] joal: ping [11:03:00] elukey: sorry - other chan [11:03:12] elukey: error at restart due to test data - Problem corrected [11:03:24] okok :) [11:03:35] elukey: thanks for monitoring ;) [11:03:48] (03CR) 10Paladox: [C: 031] "@Muehlenhoff hi could you re review / merge please?" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [11:03:49] elukey: continuing the deploy through hosts [11:04:56] sure [11:04:57] !log joal@tin Finished deploy [analytics/aqs/deploy@006bf8c]: (no justification provided) (duration: 03m 30s) [11:05:01] \o/ [11:05:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:07:15] 06Operations, 15User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3105572 (10elukey) @akosiaris: I would also look in a different way - is it normal that the avg_ttl is so high for the keys in the job queues? If I got it correctly those are seconds, so... [11:08:19] (03PS1) 10Alexandros Kosiaris: redis: Increase the hard limit for slave output buffer [puppet] - 10https://gerrit.wikimedia.org/r/343027 (https://phabricator.wikimedia.org/T159850) [11:08:42] RECOVERY - puppet last run on elastic1019 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [11:13:12] PROBLEM - Check systemd state on bast3002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. 
[11:14:14] bast3002 is me [11:16:09] (03CR) 10Alexandros Kosiaris: [C: 032] nagios: Specify a parents host relationship [puppet] - 10https://gerrit.wikimedia.org/r/334149 (owner: 10Alexandros Kosiaris) [11:16:32] (03CR) 10Alexandros Kosiaris: "let's see how badly this will break icinga :-)" [puppet] - 10https://gerrit.wikimedia.org/r/334149 (owner: 10Alexandros Kosiaris) [11:16:53] (03CR) 10Elukey: redis: Increase the hard limit for slave output buffer (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343027 (https://phabricator.wikimedia.org/T159850) (owner: 10Alexandros Kosiaris) [11:18:12] RECOVERY - Check systemd state on bast3002 is OK: OK - running: The system is fully operational [11:21:13] (03CR) 10Alexandros Kosiaris: "So in the commit message I write" [puppet] - 10https://gerrit.wikimedia.org/r/343027 (https://phabricator.wikimedia.org/T159850) (owner: 10Alexandros Kosiaris) [11:22:24] (03PS2) 10Alexandros Kosiaris: redis: Increase the hard limit for slave output buffer [puppet] - 10https://gerrit.wikimedia.org/r/343027 (https://phabricator.wikimedia.org/T159850) [11:22:44] <_joe_> akosiaris: uhm [11:24:45] (03CR) 10Gehel: maps - tuning of postgresql based on experience (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556) (owner: 10Gehel) [11:24:52] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:26:43] !log enabled BBR as TCP congestion control algorithm on cp1008 [11:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:28:57] akosiaris: good for me, I checked the command that you executed and IIRC the soft limit was increased too, this is why I asked, but what you wrote makes sense. Worst that can happen is that we need to tune the soft limit too :) [11:29:16] moritzm: nice! 
[11:29:49] (03PS1) 10Addshore: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 [11:30:12] (03PS2) 10Addshore: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) [11:30:14] elukey: ttls are in milliseconds in the output of info keyspace [11:30:48] (03PS3) 10Addshore: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) [11:31:03] (03CR) 10Addshore: [C: 04-2] "Depends-On must be merged and deployed first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [11:31:12] PROBLEM - Check correctness of the icinga configuration on einsteinium is CRITICAL: Icinga configuration contains errors [11:31:20] <_joe_> akosiaris: ^^ [11:31:21] <_joe_> lol [11:31:34] so db0:keys=4043206,expires=4040878,avg_ttl=1160204506 would mean 4043206 keys, out of which 4040878 have an expiry set with an average of 11060204506/86400/1000 => 128 days [11:31:38] damn [11:31:40] <_joe_> why did you go down that rabbithole, alice? [11:31:45] * akosiaris looking into icinga [11:32:02] how does 128days sound for a TTL for the jobqueue btw ? [11:32:09] a little bit high I 'd guess [11:32:27] <_joe_> akosiaris: realistic [11:32:33] akosiaris: 1160204506 ms are ~13 days no? [11:32:34] <_joe_> :P [11:33:06] <_joe_> elukey: 13 days, 5 hours, give or take [11:33:07] did calc just fail me ? [11:33:17] <_joe_> akosiaris: a day is 86400 seconds [11:33:39] yes it's realistic [11:33:46] probably pebkac on my part [11:33:54] so yeah makes sense [11:34:00] <_joe_> a year is 3.15e7 seconds [11:34:31] <_joe_> #thingsAstrophysicistsRemember [11:34:50] <_joe_> so 13 days is actually optimistic :P [11:35:00] given our history ? 
yes [11:36:07] Warning: Duplicate definition found for service 'Varnishkafka log producer' on host 'cp4018' (config file '/etc/icinga/puppet_services.cfg', starting on line 99775) [11:36:12] ema: ^ [11:36:20] it's a warning, not urgent but icinga is not happy [11:36:26] probably exists for a long time [11:36:34] Warning: Duplicate definition found for service 'keystone http' on host 'labtestcontrol2001' (config file '/etc/icinga/puppet_services.cfg', starting on line 204221) [11:36:34] Warning: Duplicate definition found for service 'keystone http' on host 'labcontrol1001' (config file '/etc/icinga/puppet_services.cfg', starting on line 196772) [11:36:36] andrewbogott: ^ [11:36:42] same as above [11:36:47] and now let's look at the actual error [11:37:05] Error: 'asw-d-eqiad' is not a valid parent for host 'analytics1002' (file '/etc/icinga/puppet_hosts.cfg', line 281)! [11:37:08] ahmm [11:37:13] hmm [11:37:13] whaaat [11:37:26] seems like we don't have our switches in icinga [11:37:49] yup, we don't.. cause we rely on librenms [11:37:50] damn [11:37:54] reverting [11:38:18] should have checked that before merging .. lol [11:38:28] jouncebot refresh [11:38:34] I refreshed my knowledge about deployments. 
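For reference, the avg_ttl arithmetic from the 11:30:14 to 11:33:06 exchange above can be sketched as follows. This is a minimal Python sketch, not something anyone ran in the channel: the helper names `parse_keyspace_line` and `ms_to_days` are hypothetical, and the only inputs are the keyspace line and the mistyped figure quoted at 11:31:34. It assumes, as stated at 11:30:14, that `avg_ttl` in the `info keyspace` output is in milliseconds.

```python
MS_PER_DAY = 86400 * 1000  # 86400 seconds per day, 1000 ms per second


def parse_keyspace_line(line):
    """Split a 'db0:keys=...,expires=...,avg_ttl=...' line into (db_name, stats)."""
    db_name, _, fields = line.partition(":")
    stats = {}
    for field in fields.split(","):
        key, _, value = field.partition("=")
        stats[key] = int(value)
    return db_name, stats


def ms_to_days(ms):
    """Convert a duration in milliseconds to days."""
    return ms / MS_PER_DAY


# Keyspace line exactly as quoted in the channel at 11:31:34.
db, stats = parse_keyspace_line("db0:keys=4043206,expires=4040878,avg_ttl=1160204506")

avg_ttl_days = ms_to_days(stats["avg_ttl"])
# Figure actually typed into the division at 11:31:34 (one extra digit).
mistyped_days = ms_to_days(11060204506)

print(f"{db}: {stats['expires']} of {stats['keys']} keys have an expiry set")
print(f"avg_ttl from the keyspace line: about {avg_ttl_days:.1f} days")   # about 13.4 days
print(f"avg_ttl from the mistyped value: about {mistyped_days:.1f} days")  # about 128.0 days
```

The 128-day figure comes from an extra digit in the dividend (11060204506 rather than 1160204506). With the value from the keyspace line the result is roughly 13.4 days, in line with the ~13 days quoted at 11:32:33.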
[11:38:40] jouncebot next [11:38:40] In 0 hour(s) and 21 minute(s): InterwikiSorting (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1200) [11:39:09] (03PS1) 10Alexandros Kosiaris: Revert "nagios: Specify a parents host relationship" [puppet] - 10https://gerrit.wikimedia.org/r/343035 [11:39:16] (03CR) 10Alexandros Kosiaris: [V: 032 C: 032] Revert "nagios: Specify a parents host relationship" [puppet] - 10https://gerrit.wikimedia.org/r/343035 (owner: 10Alexandros Kosiaris) [11:41:48] (03PS1) 10Alexandros Kosiaris: Revert "Revert "nagios: Specify a parents host relationship"" [puppet] - 10https://gerrit.wikimedia.org/r/343036 [11:48:03] !log repair prometheus' leveldb database archived_fingerprint_to_metric on bast3002, upgrade prometheus to latest version from jessie-backports [11:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:59] (03CR) 10Alexandros Kosiaris: maps - tuning of postgresql based on experience (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556) (owner: 10Gehel) [11:52:52] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [11:56:18] (03PS4) 10Addshore: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) [12:00:04] addshore: Respected human, time to deploy InterwikiSorting (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1200). Please do the needful. [12:01:57] o/ [12:06:52] PROBLEM - puppet last run on mc1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:07:11] ebernhardson it looks like you merged something on extensions/ApiFeatureUsage, on php-1.29.0-wmf.16 but didn't deploy? [12:09:04] Actually looks like it was MModell! [12:11:56] hashar ^^ generally what should I do in this situation? 
:P [12:12:42] I mean, the instructions are just "If there are other changes besides yours, go yell at the culprit" [12:17:47] !log addshore@tin Synchronized php-1.29.0-wmf.15/extensions/InterwikiSorting: [[gerrit:343031|Use ExtensionFunctions instead of BeforeInitialize hook]] T160465 (duration: 00m 43s) [12:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:17:53] T160465: Interwiki order recently changed - https://phabricator.wikimedia.org/T160465 [12:19:06] addshore: was lunching sorry [12:19:12] no worries :) [12:19:15] addshore: looks like it was swat related [12:20:04] addshore: oohh hmmm ahhhh [12:20:14] Synchronized php-1.29.0-wmf.16/extensions/ApiFeatureUsage/ApiFeatureUsageQueryEngineElastica.php [12:20:16] gopt deployed [12:20:56] ack, as far as I can tell then, that deployed nothing? [12:21:10] unless im missing something [12:21:14] OHHHHHHHHHH [12:21:48] what shows up that it has not been deployed? [12:22:03] /srv/mediawiki-staging/php-1.29.0-wmf.16 shows the submodule has new commits [12:22:08] https://www.irccloud.com/pastebin/kWZUbZPV/ [12:22:28] which most probably is because Gerrit did not update the mediawiki/core submodules [12:24:23] hmm no [12:24:39] origin/wmf/1.29.0-wmf.16 definitely has Erik and your extensions submodules bumps [12:24:42] PROBLEM - puppet last run on mc1032 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:25:02] [origin/wmf/1.29.0-wmf.16: ahead 5, behind 2] [12:25:13] indeed, but it appears to already have the change? 
O_o [12:25:17] also [12:25:20] so most probably [12:25:29] people did a git pull in the extension dir [12:25:39] instead of from mediawiki/core dir [12:26:03] okay, as far as I can tell I can continue and nothing will happen ;) [12:26:17] (nothing bad) [12:28:10] and I guess I should do git submodule update --init --recursive for both my change and the other change [12:29:12] so yeah in /srv/mediawiki-staging/php-1.29.0-wmf.16 [12:29:14] do a git rebase [12:29:18] and the submodule update [12:29:25] and that would align them both [12:29:43] we should switch back to Subversion. It was easier [12:30:06] looking good :) [12:31:03] yup \o/ [12:31:29] !log addshore@tin Synchronized php-1.29.0-wmf.16/extensions/InterwikiSorting: [[gerrit:343032|Use ExtensionFunctions instead of BeforeInitialize hook]] T160465 (duration: 00m 43s) [12:31:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:36] T160465: Interwiki order recently changed - https://phabricator.wikimedia.org/T160465 [12:34:52] RECOVERY - puppet last run on mc1003 is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures [12:38:27] !log addshore@tin Synchronized php-1.29.0-wmf.16/extensions/InterwikiSorting: [[gerrit:343032|Use ExtensionFunctions instead of BeforeInitialize hook]] T160465 (duration: 00m 44s) [12:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:33] T160465: Interwiki order recently changed - https://phabricator.wikimedia.org/T160465 [12:39:20] !log Deploy schema change on s5 - T160415 [12:39:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:39:27] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [12:39:36] (03CR) 10Addshore: [C: 032] wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) (owner: 
10Addshore) [12:41:06] (03Merged) 10jenkins-bot: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [12:41:09] 06Operations, 10Traffic, 05MW-1.28-release (WMF-deploy-2016-08-09_(1.28.0-wmf.14)), 13Patch-For-Review: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#3105821 (10BBlack) 05Open>03Resolved a:03BBlack This was resolved on the server side back in early Dec when the MW conf... [12:41:16] (03CR) 10jenkins-bot: wmgUseInterwikiSorting true for wikidataclients [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343033 (https://phabricator.wikimedia.org/T160465) (owner: 10Addshore) [12:41:33] (03Restored) 10Hashar: prometheus: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [12:41:40] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [12:42:40] jouncebot, next [12:42:40] In 0 hour(s) and 17 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1300) [12:42:40] (03CR) 10jerkins-bot: [V: 04-1] prometheus: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [12:44:00] (03CR) 10Hashar: "puppet-lint got bumped to 2.0.2 meanwhile, but the arrow alignements still fail." [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [12:44:14] akosiaris: gotta ask the new puppet lint to ignore the 140chars check ? :] [12:44:55] (03Restored) 10Hashar: puppet-lint: ignore 'lines over 140 chars' warnings [puppet] - 10https://gerrit.wikimedia.org/r/322907 (owner: 10Dzahn) [12:47:19] (03PS5) 10Hashar: puppet-lint: ignore 'lines over 140 chars' warnings [puppet] - 10https://gerrit.wikimedia.org/r/322907 (owner: 10Dzahn) [12:47:42] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:48:28] (03CR) 10Hashar: "Rebased / fixed trivial conflict. 74a3d73d4f bumped puppet-lint to 2.0.2 so we now want to ignore lines longer than 140 chars. At least " [puppet] - 10https://gerrit.wikimedia.org/r/322907 (owner: 10Dzahn) [12:49:04] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:343033|wmgUseInterwikiSorting true for wikidataclients]] T160465 T150183 (duration: 00m 42s) [12:49:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:49:10] T160465: Interwiki order recently changed - https://phabricator.wikimedia.org/T160465 [12:49:10] T150183: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183 [12:49:42] 06Operations, 10MediaWiki-extensions-InterwikiSorting, 10Wikidata, 10Wikimedia-Extension-setup, and 3 others: Deploy InterwikiSorting extension to production - https://phabricator.wikimedia.org/T150183#3105852 (10Addshore) 05Open>03Resolved [12:49:44] 06Operations, 07Puppet, 13Patch-For-Review: Update puppet-lint to 2.* - https://phabricator.wikimedia.org/T144667#3105855 (10hashar) 05Open>03Resolved Done by @Dzahn / @akosiaris https://gerrit.wikimedia.org/r/#/c/342637/ - bump version https://gerrit.wikimedia.org/r/#/c/322907/ - ignore 140chars [12:51:17] Urbanecm: looks like the only thing for eu swat today is your patch, I will take care of it cc hashar [12:51:29] zeljkof, okay, thank you! [12:51:42] RECOVERY - puppet last run on mc1032 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [12:51:43] \o/ [12:52:04] (03PS2) 10Addshore: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342644 (https://phabricator.wikimedia.org/T160427) (owner: 10Urbanecm) [12:52:21] zeljkof: I just rebased it for you ;) [12:52:36] addshore: want to do the deployment? 
:) [12:52:53] Can do, I'm already logged in everywhere / have everything open :) [12:53:00] (because of the last deploy slow) [12:53:02] *slot [12:53:06] addshore: in that case, go ahead :) cc Urbanecm [12:53:59] (03CR) 10Addshore: [C: 032] Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342644 (https://phabricator.wikimedia.org/T160427) (owner: 10Urbanecm) [12:54:11] Urbanecm: going slightly early ;) [12:55:40] (03Merged) 10jenkins-bot: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342644 (https://phabricator.wikimedia.org/T160427) (owner: 10Urbanecm) [12:56:53] (03CR) 10jenkins-bot: Add new throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342644 (https://phabricator.wikimedia.org/T160427) (owner: 10Urbanecm) [12:58:04] (03CR) 10Gehel: maps - tuning of postgresql based on experience (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556) (owner: 10Gehel) [12:59:14] !log addshore@tin Synchronized wmf-config/throttle.php: SWAT: [[gerrit:342644|Add new throttle rule]] T160427 (lift of IP cap for RIT - March 25, 2017) (duration: 00m 43s) [12:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:20] T160427: Requesting temporary lift of IP cap for RIT - March 25, 2017 - https://phabricator.wikimedia.org/T160427 [12:59:23] Urbanecm ^^ [13:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1300). [13:00:04] Urbanecm: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:00:11] o/ all done alreadyy! 
[13:00:38] this was a quick eu swat :) [13:00:45] #instaSWAT ;) [13:00:52] (03PS1) 10Marostegui: db-eqiad.php: Depool db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343041 (https://phabricator.wikimedia.org/T160415) [13:00:53] addshore nice! so I can go ahead with ^ [13:00:56] :) [13:01:00] marostegui: yup! [13:01:24] thanks! [13:01:26] !log EU SWAT done [13:01:29] addshore, thank you for deploying! [13:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:34] (03PS2) 10Marostegui: mariadb: Repool db1067 after maintenanace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [13:03:14] (03CR) 10Marostegui: [C: 032] mariadb: Repool db1067 after maintenanace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [13:05:02] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:06:08] (03Merged) 10jenkins-bot: mariadb: Repool db1067 after maintenanace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [13:06:24] (03PS2) 10Marostegui: db-eqiad.php: Depool db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343041 (https://phabricator.wikimedia.org/T160415) [13:07:41] (03CR) 10jenkins-bot: mariadb: Repool db1067 after maintenanace [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343023 (owner: 10Jcrespo) [13:13:02] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343041 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:14:21] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343041 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:14:33] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1026 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343041 (https://phabricator.wikimedia.org/T160415) 
(owner: 10Marostegui) [13:14:42] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [13:15:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1026 - T160415, Repool db1067 - T160435 (duration: 00m 42s) [13:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:03] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [13:16:03] T160435: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435 [13:16:10] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1026" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343043 [13:17:40] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3105902 (10Ottomata) We want a RAID that gives us the most space with a little bit of redundancy. Is RAID 5 not the best choice? [13:18:45] Raymond_: ping? [13:19:08] Raymond_: do you wish we deploy swwiki Wikichanzo:? 
[13:21:00] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1026" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343043 (owner: 10Marostegui) [13:22:12] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1026" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343043 (owner: 10Marostegui) [13:22:23] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1026" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343043 (owner: 10Marostegui) [13:23:24] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1026 - T160415 (duration: 00m 42s) [13:23:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:29] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [13:32:52] PROBLEM - puppet last run on analytics1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:33:02] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [13:37:22] 06Operations, 10Traffic: Backport iproute2 4.x from debian testing -> our jessie - https://phabricator.wikimedia.org/T138591#3105941 (10MoritzMuehlenhoff) a:03MoritzMuehlenhoff [13:38:07] !log Shutdown es2015 for maintenance - T160242 [13:38:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:38:13] T160242: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242 [13:41:20] 06Operations, 10ops-codfw, 10DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3105953 (10Marostegui) @Papaul es2015 is now off. Please turn it on once you are done with the main board replacement. Thank you! 
[13:43:18] Dereckson: yes pls [13:43:33] (03PS1) 10Marostegui: db-eqiad.php: Depool db1045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343047 (https://phabricator.wikimedia.org/T160415) [13:46:17] Raymond_: okay, I'm preparing a urgent throttle change, and we can deploy them both [13:46:29] Dereckson: great. thanks [13:47:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343047 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:51:04] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343047 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:51:12] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1045 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343047 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [13:51:20] (03PS1) 10Dereckson: Add Odia Wikipedia's 100 Women Editathon throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343048 (https://phabricator.wikimedia.org/T160619) [13:51:28] marostegui: are you done with tin? [13:51:53] Dereckson: finishing the deployment now [13:52:11] * Dereckson nods [13:52:12] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1045 - T160415 (duration: 00m 58s) [13:52:13] Dereckson: all yours! 
[13:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:52:17] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [13:52:18] thanks [13:52:35] don't know why I read the CR+2 line as the deploy log line [13:52:46] !log Resume EU SWAT for two new changes [13:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:54:21] (03PS2) 10Dereckson: Create Wikichanzo namespace for swwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337555 (https://phabricator.wikimedia.org/T158041) (owner: 10Raimond Spekking) [13:54:27] (03CR) 10Dereckson: [C: 032] Create Wikichanzo namespace for swwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337555 (https://phabricator.wikimedia.org/T158041) (owner: 10Raimond Spekking) [13:55:32] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1045" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343049 [13:56:24] 06Operations, 10MediaWiki-Vagrant, 06Release-Engineering-Team, 07Epic, 13Patch-For-Review: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#3106025 (10Gilles) [13:57:54] (03Merged) 10jenkins-bot: Create Wikichanzo namespace for swwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337555 (https://phabricator.wikimedia.org/T158041) (owner: 10Raimond Spekking) [13:58:02] (03CR) 10jenkins-bot: Create Wikichanzo namespace for swwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/337555 (https://phabricator.wikimedia.org/T158041) (owner: 10Raimond Spekking) [13:59:09] Raymond_: live on mwdebug1002 [13:59:52] Dereckson: once you are done, let me know (no rush) [13:59:57] (03CR) 10Dereckson: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343048 (https://phabricator.wikimedia.org/T160619) (owner: 10Dereckson) [14:00:10] marostegui: ok [14:00:52] RECOVERY - puppet last run on analytics1039 is OK: OK: Puppet is 
currently enabled, last run 27 seconds ago with 0 failures [14:01:13] (03Merged) 10jenkins-bot: Add Odia Wikipedia's 100 Women Editathon throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343048 (https://phabricator.wikimedia.org/T160619) (owner: 10Dereckson) [14:01:23] (03CR) 10jenkins-bot: Add Odia Wikipedia's 100 Women Editathon throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343048 (https://phabricator.wikimedia.org/T160619) (owner: 10Dereckson) [14:01:51] throttle rule live on mwdebug1002 too [14:07:01] syncing throttle rule [14:07:07] Raymond_: that looks good for you? [14:07:40] !log dereckson@tin Synchronized wmf-config/throttle.php: Add Odia Wikipedia's 100 Women Editathon throttle rule (T160619) (duration: 00m 57s) [14:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:45] T160619: Odia Wikipedia's 100 Women Editathon: IP throttle exemption on 2017-03-18/19 - https://phabricator.wikimedia.org/T160619 [14:09:43] Dereckson: does it mean deploed on swwiki? I do not see the new namespace on https://sw.wikipedia.org/w/index.php?title=Maalum:Tafuta&profile=advanced&search=&fulltext=1 [14:09:58] marostegui: you can sync the repool change I think while Raymond_ tests its change [14:10:01] (03CR) 10Ottomata: "What's the status of this, still WIP? I think the required refinery stuff has been deployed." [puppet] - 10https://gerrit.wikimedia.org/r/335140 (owner: 10EBernhardson) [14:10:06] oh [14:10:23] Raymond_: yes it's live on sw.wikipedia if you use the debug extension [14:10:53] I see it [14:11:01] ok let's sync [14:11:07] Dereckson: how to use the debug extension? 
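Note on the question above: X-Wikimedia-Debug works by adding a request header that routes a single request to a debug backend such as mwdebug1002, so a change can be checked there before it is synced everywhere. A minimal Python sketch follows. The header value format `backend=<host>` and the fully qualified hostname `mwdebug1002.eqiad.wmnet` are assumptions not confirmed in this log; the wikitech page on X-Wikimedia-Debug is authoritative. The sketch only builds the request and does not send it.

```python
import urllib.request

# Assumption: hostname and "backend=<host>" value format are illustrative only;
# check the X-Wikimedia-Debug page on wikitech for the current syntax.
DEBUG_BACKEND = "mwdebug1002.eqiad.wmnet"


def debug_headers(backend=DEBUG_BACKEND):
    """Return the extra header that asks for a specific debug backend."""
    return {"X-Wikimedia-Debug": "backend=" + backend}


def build_debug_request(url, backend=DEBUG_BACKEND):
    """Build (but do not send) a request that carries the debug header."""
    return urllib.request.Request(url, headers=debug_headers(backend))


req = build_debug_request("https://sw.wikipedia.org/wiki/Maalum:Tafuta")
print(req.full_url)
print(debug_headers())
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out on purpose, so the sketch has no network side effects.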
[14:11:19] Raymond_: so for the future, we test the change on a canary server, you can customize headers, or we've an extension for that [14:11:33] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [14:11:40] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Create Wikichanzo namespace for swwiki T158041) (duration: 00m 42s) [14:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:11:46] T158041: Create Wikichanzo namespace for swwiki - https://phabricator.wikimedia.org/T158041 [14:12:01] Dereckson: Ok thanks. I was not aware of this extension [14:12:05] Dereckson: ok, thanks! [14:12:10] (03PS2) 10Marostegui: Revert "db-eqiad.php: Depool db1045" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343049 [14:12:21] !log EU SWAT, round 2, done [14:12:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:12:34] Raymond_: so now you should see it in prod too [14:13:01] Dereckson: confimed. thanks :) [14:15:51] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1045" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343049 (owner: 10Marostegui) [14:17:16] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1045" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343049 (owner: 10Marostegui) [14:17:24] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1045" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343049 (owner: 10Marostegui) [14:17:52] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 599159 [14:18:22] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1045 - T160415 (duration: 00m 43s) [14:18:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:27] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [14:21:45] (03CR) 10Alexandros Kosiaris: [C: 031] maps - tuning of postgresql 
based on experience (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343001 (https://phabricator.wikimedia.org/T160556) (owner: 10Gehel) [14:24:45] (03PS1) 10Marostegui: db-eqiad.php: Repool db1070 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343051 (https://phabricator.wikimedia.org/T157931) [14:28:57] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1070 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343051 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [14:29:59] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1070 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343051 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [14:30:11] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1070 with low weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343051 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [14:31:22] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1070 with low weight - T157931 (duration: 00m 45s) [14:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:31:27] T157931: s5: db1070 not using file per table - https://phabricator.wikimedia.org/T157931 [14:33:42] PROBLEM - Apache HTTP on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:33:42] PROBLEM - Nginx local proxy to apache on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:34:02] PROBLEM - HHVM rendering on mw1200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:10] (03PS1) 10Marostegui: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343053 (https://phabricator.wikimedia.org/T160415) [14:41:12] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343053 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:41:24] 06Operations, 10ops-eqiad, 
15User-fgiunchedi: Degraded RAID on ms-be1008 - https://phabricator.wikimedia.org/T160488#3106137 (10Cmjohnson) 05Open>03Resolved @fgiunchedi added disk back [14:42:37] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343053 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:42:48] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1082 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343053 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:43:32] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1082 - T160415 (duration: 00m 42s) [14:43:38] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343054 [14:43:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:39] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [14:44:52] RECOVERY - MegaRAID on ms-be1008 is OK: OK: optimal, 14 logical, 14 physical [14:47:48] 06Operations: provide download numbers for tails iso - https://phabricator.wikimedia.org/T160600#3106175 (10faidon) 05Open>03declined That's not very easy for us to do right now. Let's decline it for now in the interest of time; if they don't find the 2-3 mirrors that they want we could reevaluate. 
[14:48:45] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343054 (owner: 10Marostegui) [14:49:54] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343054 (owner: 10Marostegui) [14:50:03] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1082" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343054 (owner: 10Marostegui) [14:50:50] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1082 - T160415 (duration: 00m 41s) [14:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:55] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [14:51:28] (03PS1) 10Marostegui: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343057 (https://phabricator.wikimedia.org/T160415) [14:56:00] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343057 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:58:58] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343057 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:59:12] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1087 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343057 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [14:59:20] 06Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1070 - https://phabricator.wikimedia.org/T158969#3106257 (10Cmjohnson) @Marostegui disk is rebuilding Enclosure Device ID: 32 Slot Number: 10 Drive's position: DiskGroup: 0, Span: 5, Arm: 0 Enclosure position: 1 Device Id: 10 WWN: 500003978859EC70 Sequenc... 
[15:00:43] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1087 - T160415 (duration: 00m 42s) [15:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:51] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [15:00:56] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343062 [15:01:36] 06Operations, 10ops-eqiad, 10DBA: Degraded RAID on db1070 - https://phabricator.wikimedia.org/T158969#3106273 (10Marostegui) Awesome!! Thank you! ``` root@db1070:~# megacli -PDRbld -ShowProg -PhysDrv [32:10] -aALL Rebuild Progress on Device at Enclosure 32, Slot 10 Completed 1% in 5 Minutes. ``` [15:03:12] PROBLEM - HHVM rendering on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:42] PROBLEM - Apache HTTP on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:03:43] PROBLEM - Nginx local proxy to apache on mw1207 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:07:52] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 1096 [15:07:54] "load":128 [15:07:55] , "queued":1321 [15:08:03] mw1207 is not feeling really good [15:09:27] !log restart hhvm on mw1207, high load and queued requests - hhvm-dump-debug on /tmp/hhvm.27441.bt. 
[15:09:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:40] seems again the issue with all threads locked [15:09:47] I really hope that it will go away with 3.18 [15:10:32] RECOVERY - Apache HTTP on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.064 second response time [15:10:32] RECOVERY - Nginx local proxy to apache on mw1207 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 615 bytes in 0.343 second response time [15:11:02] RECOVERY - HHVM rendering on mw1207 is OK: HTTP OK: HTTP/1.1 200 OK - 74641 bytes in 1.956 second response time [15:11:20] is mw1200 still broken? [15:11:56] RECOVERY - Check correctness of the icinga configuration on einsteinium is OK: Icinga configuration is correct [15:12:14] 06Operations, 10ops-eqiad, 06Services (watching): Degraded RAID on restbase-dev1001 - https://phabricator.wikimedia.org/T157425#3106341 (10Cmjohnson) @gwicke the ssd has been replaced. you may need to reboot the server or add it back to the cfg. [15:12:53] yes it is.. [15:13:08] !log restart hhvm on mw1200, high load and queued requests - hhvm-dump-debug on /tmp/hhvm.27107.bt. 
[15:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:14:17] (03CR) 10Marostegui: [C: 032] Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343062 (owner: 10Marostegui) [15:14:36] RECOVERY - Nginx local proxy to apache on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.036 second response time [15:14:56] RECOVERY - HHVM rendering on mw1200 is OK: HTTP OK: HTTP/1.1 200 OK - 74639 bytes in 0.127 second response time [15:14:56] RECOVERY - Apache HTTP on mw1200 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.036 second response time [15:15:08] welcome back [15:16:08] 06Operations, 10ops-eqiad: Rack and Setup ms-be1028-ms-1033 - https://phabricator.wikimedia.org/T160640#3106352 (10Cmjohnson) [15:17:18] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343062 (owner: 10Marostegui) [15:17:26] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1087" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343062 (owner: 10Marostegui) [15:18:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1087 - T160415 (duration: 00m 42s) [15:18:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:41] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [15:19:20] (03PS1) 10Marostegui: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343065 (https://phabricator.wikimedia.org/T160415) [15:19:55] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3106376 (10RobH) Raid5 has very slow write (same as raid6), due to the calculations on the parity striping (redundancy) across the disks. The fastest raid for writes... 
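Note on the RAID5 comment above: the write cost it refers to comes from keeping a parity block consistent with the data blocks in a stripe. The sketch below is a simplified model of single-parity (XOR) striping, assuming equal-sized blocks held in memory; it is not how any particular controller or mdadm implements it, and all function names are hypothetical.

```python
from functools import reduce


def xor_blocks(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(blocks):
    """Parity of a stripe is the XOR of all its data blocks."""
    return reduce(xor_blocks, blocks)


def small_write_parity(old_data, new_data, old_parity):
    """Updating one block needs the old data and old parity first
    (two reads), then two writes: new data and new parity."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = stripe_parity(data)

# Overwrite the second block and update parity incrementally.
new_block = b"\xaa\xbb"
new_parity = small_write_parity(data[1], new_block, parity)
data[1] = new_block

# The incremental update matches a full recomputation.
assert new_parity == stripe_parity(data)

# A lost block can be rebuilt from the surviving blocks plus parity.
rebuilt = stripe_parity([data[0], data[2], new_parity])
assert rebuilt == data[1]
```

In this model every small write turns into a read-modify-write cycle, which is the usual explanation for the write penalty; real arrays also add caching and full-stripe writes that this sketch ignores.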
[15:20:31] (03CR) 10Hashar: Migrate typos check to a rake task (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342604 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [15:27:34] !log otto@tin Started deploy [eventlogging/eventbus@75ab39c]: /v1/schemas/:schema_uri endpoint, T159179 [15:27:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:27:39] T159179: Create /schema/:schema endpoint in eventbus service to serve schemas by schema_uri - https://phabricator.wikimedia.org/T159179 [15:27:48] !log otto@tin Finished deploy [eventlogging/eventbus@75ab39c]: /v1/schemas/:schema_uri endpoint, T159179 (duration: 00m 14s) [15:27:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:28:47] (03PS2) 10Hashar: Migrate typos check to a rake task [puppet] - 10https://gerrit.wikimedia.org/r/342604 (https://phabricator.wikimedia.org/T119140) [15:29:36] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me, we can merge this once Chad is fine with it and had a chance to test it." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [15:30:09] (03PS2) 10Jcrespo: mariadb: clear build related files [puppet] - 10https://gerrit.wikimedia.org/r/342506 (owner: 10Hashar) [15:31:19] (03CR) 10Hashar: "Seems to work as long as one has git 2+ (which they should) and git compiled with perl regular expression (which they should as well)." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/342604 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [15:38:27] papaul: o/ - Any chance that we could check mw2256 today/tomorrow? 
(I'll be off next week) [15:38:48] otherwise it is fine next week, I'll ask to another one to check it [15:38:58] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343065 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [15:41:07] (03PS1) 10Andrew Bogott: Kesytonehooks: Exclude 'novaobserver' user from posix user group. [puppet] - 10https://gerrit.wikimedia.org/r/343074 (https://phabricator.wikimedia.org/T158650) [15:41:43] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343065 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [15:41:51] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1092 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343065 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [15:42:30] (03PS1) 10Volans: Add IRC/SAL logging [switchdc] - 10https://gerrit.wikimedia.org/r/343078 [15:43:39] (03PS1) 10Marostegui: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343079 [15:43:54] 06Operations, 13Patch-For-Review, 15User-Elukey: JobQueue Redis codfw replicas periodically lagging - https://phabricator.wikimedia.org/T159850#3106496 (10akosiaris) Recapping the IRC discussion. The average TTL is around 13 days (the values are ms) which seems quite reasonable given the jobqueue history [15:44:13] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1092 - T160415 (duration: 00m 42s) [15:44:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:19] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [15:44:35] !log reboot ms-be1008 after disk swap to clear stuck mkfs.xfs [15:44:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:46:54] (03CR) 10Chad: "I do not have the time to test this. 
Either it works or it doesn't and we'll revert." [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [15:47:09] elukey: any task for that? [15:47:39] (03CR) 10Paladox: [C: 031] "> I do not have the time to test this. Either it works or it doesn't" [debs/gerrit] - 10https://gerrit.wikimedia.org/r/333475 (owner: 10Paladox) [15:49:13] (03CR) 10Andrew Bogott: "I've tested this a fair bit. There's a bug in Keystone which means project membership isn't updated as promptly as one might like, but th" [puppet] - 10https://gerrit.wikimedia.org/r/343074 (https://phabricator.wikimedia.org/T158650) (owner: 10Andrew Bogott) [15:49:57] papaul: yep! https://phabricator.wikimedia.org/T155180#3094640 [15:50:30] 06Operations, 10Graphite, 06Labs, 13Patch-For-Review: Move labs 'instances' data to graphite labs - https://phabricator.wikimedia.org/T143405#3106507 (10fgiunchedi) p:05Normal>03High a:05fgiunchedi>03None I can't work on this now, though `instances` is taking 165G on production graphite now. It'd b... 
[15:50:44] 06Operations, 10Graphite, 06Labs, 13Patch-For-Review, 15User-fgiunchedi: Move labs 'instances' data to graphite labs - https://phabricator.wikimedia.org/T143405#3106511 (10fgiunchedi) [15:51:08] (03CR) 10Marostegui: [V: 032 C: 032] Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343079 (owner: 10Marostegui) [15:52:34] (03Merged) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343079 (owner: 10Marostegui) [15:53:27] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1092 - T160415 (duration: 00m 42s) [15:53:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:53:32] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [15:54:47] elukey: the system is off line if yes i will take a look at it [15:58:13] papaul: shutting it down, 2 mins [15:58:17] (03CR) 10Jcrespo: [C: 032] mariadb: clear build related files [puppet] - 10https://gerrit.wikimedia.org/r/342506 (owner: 10Hashar) [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1600). 
[16:00:21] !log racadm serveraction powerdown on mw2256 for hw maintenance [16:00:25] papaul: should be down now [16:00:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:00:55] no patches for puppet swat [16:01:10] * elukey waits for godog's fig [16:01:12] https://media0.giphy.com/media/3o7abldj0b3rxrZUxW/giphy.mp4 [16:01:13] *gif [16:01:15] ahhahaah [16:01:30] dammit it doesn't loop [16:01:38] (03PS1) 10Marostegui: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343081 (https://phabricator.wikimedia.org/T160415) [16:01:41] volans should write a bot to sent these gifs [16:01:45] !log Deploy schema change on s4 (commonswiki) https://phabricator.wikimedia.org/T73563 and https://phabricator.wikimedia.org/T160415 [16:01:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:03:20] moritzm: rotfl [16:03:44] PROBLEM - puppet last run on db1065 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:03:59] elukey: thanks [16:04:14] PROBLEM - Check systemd state on prometheus2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [16:04:41] 06Operations, 10ORES, 10Revision-Scoring-As-A-Service-Backlog: [spec] Active-active setup for ORES across datacenters (eqiad, codfw) - https://phabricator.wikimedia.org/T159615#3106577 (10akosiaris) FWIW, ores is one of those services that could potentially work active/active in both DCs, having an extra RTT... [16:04:54] prometheus2003 is me [16:09:12] (03CR) 10jenkins-bot: Revert "db-eqiad.php: Depool db1092" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343079 (owner: 10Marostegui) [16:10:57] 06Operations, 10ops-eqiad, 06Services (watching): Degraded RAID on restbase-dev1001 - https://phabricator.wikimedia.org/T157425#3106635 (10Eevans) Is there a standard process for assigning this to someone in #operations to be completed? 
I do have root on these machines, but wouldn't want to intervene withou... [16:14:38] (03PS5) 10Gehel: maps - cleartables osm replication [puppet] - 10https://gerrit.wikimedia.org/r/341563 (https://phabricator.wikimedia.org/T157613) [16:17:16] (03PS4) 10Eevans: Enable Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) [16:19:39] (03PS5) 10Eevans: Enable (optional) Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) [16:20:42] (03PS5) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [16:24:47] urandom: it is weird that the disk still shows as failed in restbase-de1001 [16:25:28] elukey: only for md0 [16:25:55] ¯\_(ツ)_/¯ [16:26:38] urandom: even in megacli it is still showed as Failed [16:27:38] elukey: oh, right, this is one of those machines where you have to use the hardware raid, isn't it? [16:27:54] * urandom sighs [16:28:22] 06Operations, 10DBA: dbstore1001 troubleshoot IPMI issue - https://phabricator.wikimedia.org/T158893#3106763 (10Cmjohnson) @marostegui: nothing concrete has been determined since we upgraded the idrac f/w. My POC was holiday right after we started our conversation and then me. I reached out to them today. Wa... [16:28:49] urandom: usually we don't, the disk just needs to be set as JBOD and then mdadm takes care of the rest [16:29:02] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343081 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [16:29:52] (03CR) 10Eevans: [C: 031] "This is ready to be merged; It will flip the bit, enabling *optional* Cassandra client encryption in RESTBase production. 
A rolling resta" [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [16:30:11] 06Operations, 10DBA: dbstore1001 troubleshoot IPMI issue - https://phabricator.wikimedia.org/T158893#3106780 (10Marostegui) >>! In T158893#3106763, @Cmjohnson wrote: > @marostegui: nothing concrete has been determined since we upgraded the idrac f/w. My POC was holiday right after we started our conversation... [16:31:32] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343081 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [16:31:41] (03CR) 10jenkins-bot: db-codfw.php: Depool db2065 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343081 (https://phabricator.wikimedia.org/T160415) (owner: 10Marostegui) [16:32:44] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:32:51] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2065 - T160415 - T73563 (duration: 00m 42s) [16:32:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:57] T160415: Review schema changes for T125071 - Add index to image table on all wikis - https://phabricator.wikimedia.org/T160415 [16:32:57] T73563: *_minor_mime are varbinary(32) on WMF sites, out of sync with varbinary(100) in MW core - https://phabricator.wikimedia.org/T73563 [16:34:55] (03PS1) 10Marostegui: db-eqiad.php: Increase weight for db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343085 (https://phabricator.wikimedia.org/T157931) [16:41:38] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase weight for db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343085 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [16:41:58] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - 
https://phabricator.wikimedia.org/T155180#3106820 (10Papaul) The system log shows DIMM A1 faulty. I swapped DIMM A1 with DIMM B1 and did system check again and now the system log shows DIMM B1 faulty. The mem... [16:42:04] (03PS1) 10Gehel: postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 [16:43:02] (03Merged) 10jenkins-bot: db-eqiad.php: Increase weight for db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343085 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [16:43:12] (03CR) 10jenkins-bot: db-eqiad.php: Increase weight for db1070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343085 (https://phabricator.wikimedia.org/T157931) (owner: 10Marostegui) [16:43:17] (03CR) 10jerkins-bot: [V: 04-1] postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 (owner: 10Gehel) [16:43:53] elukey: we have a bad DIMM [16:44:37] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase weight for db1070 - T157931 (duration: 00m 41s) [16:44:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:43] T157931: s5: db1070 not using file per table - https://phabricator.wikimedia.org/T157931 [16:44:44] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:46:47] 06Operations, 10hardware-requests, 13Patch-For-Review: decom fluorine - https://phabricator.wikimedia.org/T159996#3106831 (10Dzahn) a:05Dzahn>03Cmjohnson This can now be wiped and go ahead with the process. [16:47:16] papaul: nice! Do we need to order a new one or is there any spare in the DC? 
[16:48:24] elukey: system is under warranty so i will ask Dell to send me a replacement [16:48:37] !log manually adjusted wikiversions on wasat.codfw.wmnet to point all wikis at wmf.16 to rebuild cirrus completion search indices before group2 rolls forward [16:48:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:49:16] (03PS2) 10Gehel: postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 [16:49:46] 06Operations, 06Commons, 10Datasets-General-or-Unknown, 07Community-Wishlist-Survey-2016: Back up of Commons files - https://phabricator.wikimedia.org/T160229#3092719 (10Dzahn) I am wondering wow is this task related to "skills required: javascript, css, etc" at all? [16:49:59] 06Operations, 10Analytics, 10Analytics-Cluster, 10hardware-requests: EQIAD: stat1003 replacement - https://phabricator.wikimedia.org/T159839#3106843 (10Ottomata) There aren’t any ‘services’ hosted on these nodes. The drives are only used for local data storage, so we aren’t really concerned with write io... [16:51:00] (03PS3) 10Gehel: postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 [16:51:47] 06Operations, 10ops-eqiad, 13Patch-For-Review: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435#3106846 (10jcrespo) ``` [16:49:04] db1057 has 1 bad pdu [16:49:07] do does db1055 ``` [16:54:28] papaul: thanks! [16:55:45] (03PS4) 10Gehel: postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 [16:55:47] elukey: yw [16:57:45] !log started cirrus completion indices rebuild for group2 on wasat.codfw.wmnet [16:57:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. 
Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1700). [17:00:14] no parsoid deploy today [17:00:25] 06Operations, 10ops-eqiad, 13Patch-For-Review: db1057 does not react to powercycle/powerdown/powerup commands - https://phabricator.wikimedia.org/T160435#3106920 (10jcrespo) It still doesn't come up, not even the bios checks. :-( [17:01:11] (03PS5) 10Gehel: postgresql - require postgresql / postgis packages for spatialdb [puppet] - 10https://gerrit.wikimedia.org/r/343088 [17:04:52] none for ores too [17:12:44] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:13:40] (03PS1) 10Dzahn: tor: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/343094 [17:16:24] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:18:20] 06Operations, 10Analytics, 06WMF-Legal, 07Privacy: Honor DNT header for access logs & varnish logs - https://phabricator.wikimedia.org/T98831#1278628 (10Nuria) >If we're striving to honor DNT in its purest ideological form, people who have it set shouldn't be tracked at all, even for our internal tracking... [17:21:35] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Rack and Setup ms-be1028-ms-1033 - https://phabricator.wikimedia.org/T160640#3107016 (10fgiunchedi) racking will be 3x systems per row, 10G where possible or 1G where we can't do 10G (see also T148647) [17:21:41] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Rack and Setup ms-be1028-ms-1033 - https://phabricator.wikimedia.org/T160640#3107024 (10fgiunchedi) [17:23:54] PROBLEM - puppet last run on db1093 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:24:40] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Rack and Setup ms-be1028-ms-1039 - https://phabricator.wikimedia.org/T160640#3107056 (10Cmjohnson) [17:27:25] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Rack and Setup ms-be1028-ms-1039 - https://phabricator.wikimedia.org/T160640#3107069 (10Cmjohnson) Going to go with 3 in 10G rack A5, C8 and D8. The remaining 3 will go to row B3, B5, C5 [17:28:34] (03PS2) 10Dzahn: tor: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/343094 [17:28:41] cmjohnson1: that'd be 4x per row non 3x per row [17:29:49] godog: I can't accommodate 4 in 10G racks except D8 [17:29:49] D8: Add basic .arclint that will handle pep8 and pylint checks - https://phabricator.wikimedia.org/D8 [17:33:23] cmjohnson1: in a meeting, done in 30' [17:35:44] PROBLEM - puppet last run on mw1236 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [17:41:13] (03CR) 10Dzahn: [C: 031] "no-op http://puppet-compiler.wmflabs.org/5798/" [puppet] - 10https://gerrit.wikimedia.org/r/343094 (owner: 10Dzahn) [17:43:33] (03PS9) 10Mbch331: Remove exception on Other Projects sidebar for Dutch Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341195 (https://phabricator.wikimedia.org/T159634) [17:45:24] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [17:47:14] (03PS3) 10Dzahn: tor: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/343094 [17:50:54] RECOVERY - puppet last run on db1093 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:51:20] (03PS6) 10Giuseppe Lavagetto: Add stages to manage maintenance [switchdc] - 10https://gerrit.wikimedia.org/r/342806 (https://phabricator.wikimedia.org/T160178) [17:51:22] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - 
https://phabricator.wikimedia.org/T156844#3107208 (10Ottomata) Hm, just looked myself, but I don't see any non EventLogging `log` databases on db1046 or db1047. At least, the 'eventlog' user can't see them. Not sure... [17:51:44] cmjohnson1: it is easier to think about it if they are more spread equally among rows, 3x per row should do it, it matters less 10G vs 1G in this case [17:53:33] okay [18:00:01] (03PS1) 10Subramanya Sastry: Update ruthenium nginx conf to handle updated parsoid test domains [puppet] - 10https://gerrit.wikimedia.org/r/343099 [18:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1800). Please do the needful. [18:00:04] DatGuy and James_F: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:17] * James_F waves. [18:00:18] Here [18:00:33] First SWAT deployment, note if I'm doing anything wrong please :) [18:01:10] I can SWAT today [18:01:54] mutante, i uploaded a patch to update the nginx conf [18:02:34] (03PS2) 10Subramanya Sastry: Update ruthenium nginx conf to handle updated parsoid test domains [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) [18:03:04] 06Operations, 10ops-eqiad, 15User-fgiunchedi: Rack and Setup ms-be1028-ms-1039 - https://phabricator.wikimedia.org/T160640#3107264 (10fgiunchedi) reporting from IRC ``` 17:51 cmjohnson1: it is easier to think about it if they are more spread equally among rows, 3x per row should do... 
[18:04:23] (03PS5) 10Thcipriani: Turn off patrolling for FlaggedRevs in bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) (owner: 10DatGuy) [18:04:44] RECOVERY - puppet last run on mw1236 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [18:04:51] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) (owner: 10DatGuy) [18:05:31] and now I wait on zuul for a bit [18:06:44] DatGuy: are you setup to test your change on mwdebug1002 with the X-Wikimedia-Debug extension? https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [18:07:19] (03Merged) 10jenkins-bot: Turn off patrolling for FlaggedRevs in bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) (owner: 10DatGuy) [18:07:32] (03CR) 10jenkins-bot: Turn off patrolling for FlaggedRevs in bswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/341350 (https://phabricator.wikimedia.org/T158662) (owner: 10DatGuy) [18:07:46] I have the extension ready [18:08:07] nice :) [18:08:51] does anyone know about what's happening with portals in mwconfig? seems like the submodule has changes? [18:08:58] on tin [18:11:13] thcipriani: They were deploying earlier this week IIRC [18:11:30] Uh [18:11:33] Dereckson: I think you messed up [18:11:34] https://github.com/wikimedia/operations-mediawiki-config/commit/b409ac620edb4d9c7268df36594c95efbf4a532c [18:11:40] Submodule portals updated from e576c1 to 90f81e [18:12:08] so thcipriani, what should I do? 
[18:12:35] DatGuy: nothing yet, trying to figure out something on the deployment hosts, hold please [18:12:45] ah I see, no problem :) [18:13:00] thcipriani: I'd be tempted to revert Dereckson's portal change, and commit it [18:13:16] That's gotta be unintentional and just tripping over git submodules [18:13:55] (PS1) Reedy: Revert "Add Odia Wikipedia's 100 Women Editathon throttle rule" [mediawiki-config] - https://gerrit.wikimedia.org/r/343101 [18:14:24] (PS2) Reedy: Revert portal changes [mediawiki-config] - https://gerrit.wikimedia.org/r/343101 [18:14:33] indeed [18:14:53] Reedy: thanks for the quick draw with git :) [18:15:11] (CR) Reedy: "Revert done via gerrit for laziness. Only partial to remove presumably unintended changes to portals" [mediawiki-config] - https://gerrit.wikimedia.org/r/343048 (https://phabricator.wikimedia.org/T160619) (owner: Dereckson) [18:15:14] (CR) Thcipriani: [C: 2] Revert portal changes [mediawiki-config] - https://gerrit.wikimedia.org/r/343101 (owner: Reedy) [18:15:49] it doesn't look like this change was deployed for portals, spot-checking a few boxes [18:15:58] I'll merge on tin and that should be the end of it [18:16:12] should be [18:16:21] ominous. [18:16:33] (Merged) jenkins-bot: Revert portal changes [mediawiki-config] - https://gerrit.wikimedia.org/r/343101 (owner: Reedy) [18:16:51] Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107353 (chasemp) >>! In T157359#3095669, @jcrespo wrote: > @aude @MaxSem @Kolossos Can you verify your applications (e.g. restarting them) and se... [18:17:11] (CR) jenkins-bot: Revert portal changes [mediawiki-config] - https://gerrit.wikimedia.org/r/343101 (owner: Reedy) [18:18:16] DatGuy: ok, your change is now live on mwdebug1002, check please and let me know if everything looks correct and I'll sync everywhere.
[18:18:33] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3107357 (10Ottomata) @DarTar @leila, @milimetric, @Tbayer: Q for yall. I need to figure out what we actually need to replace MySQL research slaves. I am unfamiliar with what... [18:19:24] thcipriani: Dereckson just confirmed on the phab task it was unintentional [18:19:47] cool, thanks for confirming. [18:21:29] gerrit makes it very easy to just revert one file from a bit [18:21:31] *commit [18:21:54] 06Operations, 05Prometheus-metrics-monitoring, 15User-fgiunchedi: Effects on adjusting Prometheus retention - https://phabricator.wikimedia.org/T160677#3107392 (10fgiunchedi) [18:23:01] that's good to know. I started fiddling with the change on tin, glanced at IRC, and it was done :) [18:23:23] I'd have to google unstage the file [18:23:27] via cli git [18:23:31] so I just press revert [18:23:32] edit [18:23:39] the undo arrow thing on the change I want to keep, edit commit summary [18:23:40] submit [18:23:41] profit [18:23:54] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:24:43] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (10Halfak) We can probably not replace db1047 but we'll want to take backups of the user created dbs there. [18:24:55] yeap. git under pressure is not fun. Although "git under pressure" does describe deployments pretty accurately :P [18:25:47] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107415 (10dschwen) Nope. My stuff fails now with: ``` ERROR: function setsrid(box3d, integer) does not exist LINE 6: SetSRID('BOX3D(-... 
[18:25:56] git revert [18:25:58] git commit [18:25:59] FUCK IT [18:26:06] :D [18:26:14] though, I'm more likely to use git reset HEAD~1 --hard && scap sync [18:26:54] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:30:09] DatGuy: how is your change looking on mwdebug1002? [18:30:33] language is kinda confusing, looking into it a bit more before giving the heads up :) [18:32:04] fair enough :) [18:32:43] just to make sure, you're using ?uselang=en (or your native language code) to check, like https://bs.wikipedia.org/wiki/Posebno:ListaKorisni%C4%8DkihPrava?uselang=en [18:34:08] yes. Just to make sure, do you need to click anything except "on" in the extension? [18:35:33] Operations, Analytics, DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#2987618 (Neil_P._Quinn_WMF) I've never used either s1 or s2, only analytics-store and (once or twice) x1. As an extra data point, the [data access docs](https://wikitech.wiki... [18:36:26] DatGuy: I don't know, sorry :( [18:40:33] Operations, OfflineContentGenerator, Reading-Web-Backlog, Services (watching): Confirm attribution needs - https://phabricator.wikimedia.org/T150875#3107445 (JKatzWMF) My understanding is that the above that @ZhouZ wrote is insufficient for the 'book' use case. Is that true? It turns out that... [18:41:07] Operations, Analytics, DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3107446 (Halfak) I've just backed up the `halfak` DB. I can also be responsible for backing up the `staeiou` DB. [18:41:07] addshore: I'm going to remove /srv/mediawiki-staging/php-1.29.0-wmf.16/addOption since it's not tracked [18:41:31] *looks* [18:41:37] er setOption rather. Untracked and undeployed. [18:42:21] thcipriani: you mean the patch?
[18:42:40] Operations, ops-codfw, DBA: es2015 crashed on 2017-03-11 - https://phabricator.wikimedia.org/T160242#3107447 (Marostegui) The Dell technician didn't show up and @Papaul has arranged another appointment for Monday. I will be off on Monday, so it will need to be powered off by @jcrespo. I have just pow... [18:43:15] yep thcipriani, got confirmation it looks good [18:43:37] addshore: it's not a patch, just looks like some random php in /srv/mediawiki-staging/php-1.29.0-wmf.16/setOption [18:43:48] DatGuy: awesome, going live :) [18:43:54] PROBLEM - Check systemd state on prometheus2004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [18:44:07] ahhh thcipriani, looks like that was a command gone wrong.... yes please remove... [18:45:11] Also thcipriani, there are currently users in a group that is going to be removed. What will happen to them? [18:46:54] (PS4) BBlack: [WIP/POC] DNS zones to puppet repo [puppet] - https://gerrit.wikimedia.org/r/342887 [18:47:01] DatGuy: I think they will just not be part of that group anymore since it doesn't exist. I know we have a script for changing a group name, but I'm not sure about removing a group. Reedy is there anything special that has to happen for group removal afayk? [18:47:20] Ideally, they want removing first [18:47:34] If they're not being migrated [18:47:44] The entries will still be in the DB [18:47:47] so really want cleaning up [18:48:09] srdjan_m, are any 'crats active right now? [18:49:54] DatGuy: a bureaucrat was active around 8 hours ago [18:50:26] they might be available again in a few hours [18:51:17] Reedy: there's no cleanup script for removing groups? [18:51:23] nope [18:51:32] How many people is it? [18:51:37] Which patch is this?
[18:51:48] https://gerrit.wikimedia.org/r/#/c/341350/5 [18:52:23] (03PS1) 10Filippo Giunchedi: install_server: tweak prometheus partman recipe for baremetal [puppet] - 10https://gerrit.wikimedia.org/r/343110 (https://phabricator.wikimedia.org/T148408) [18:52:35] https://bs.wikipedia.org/w/index.php?title=Posebno%3AListaKorisnika&username=&group=patroller&limit=50 [18:52:51] I'd probably just db query delete them tbh [18:52:54] RECOVERY - puppet last run on dbproxy1009 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures [18:53:53] there are also 4 people in the reviewer group and a bunch of em in autopatrolled [18:53:54] RECOVERY - Check systemd state on prometheus2004 is OK: OK - running: The system is fully operational [18:54:14] RECOVERY - Check systemd state on prometheus2003 is OK: OK - running: The system is fully operational [18:54:20] bunch = 26 [18:54:54] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [18:55:26] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107475 (10dschwen) After fixing those to `ST_SetSRID` my PostGIS query now fails with ``` ERROR: Operation on mixed SRID geometries ``` [18:55:40] hrm. Running hand-crafted deletes on the db during a deploy makes me twitchy. [18:56:49] 06Operations, 10Mail, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3107480 (10matmarex) [18:57:50] DatGuy: I'm going to revert for now, can you/someone reply on that task to let folks know to remove users from the groups to be removed for db cleanup sake, and then we'll deploy this Soon™? [18:58:07] alright, valve time? [18:58:52] I'd quite like my UBNs to be deployed at some point, BTW. :-) [18:58:52] "valve time"? 
[18:58:54] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:59:01] James_F: yes, on it. sorry. [18:59:15] No worries. [18:59:23] heh, it's a joke about a video game company releasing their promises months late [18:59:26] (or years) [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T1900). Please do the needful. [19:00:15] DatGuy: ah, gotcha :P [19:00:19] twentyafterfour: hold please [19:01:16] Operations, Prometheus-metrics-monitoring, User-fgiunchedi: Effects on adjusting Prometheus retention - https://phabricator.wikimedia.org/T160677#3107502 (fgiunchedi) [19:01:17] (PS1) Thcipriani: Revert "Turn off patrolling for FlaggedRevs in bswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/343112 [19:01:40] (CR) Thcipriani: [C: 2] Revert "Turn off patrolling for FlaggedRevs in bswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/343112 (owner: Thcipriani) [19:01:51] James_F: your change is live on mwdebug1002, check please [19:02:00] !log restart relforge to activate new plugins - T160674 [19:02:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:02:07] T160674: Install Chinese plugins for relforge - https://phabricator.wikimedia.org/T160674 [19:03:37] Operations, DBA, Labs, Labs-Infrastructure, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107510 (jcrespo) That is probably a consequence of a PostGIS upgrade. Can you post/point to the whole query/relevant code? [19:03:43] thcipriani: Yup, LGTM. [19:04:16] James_F: ok, going live [19:04:42] Trey314159: ^ plugin loaded, cluster green...
[19:05:04] (03Merged) 10jenkins-bot: Revert "Turn off patrolling for FlaggedRevs in bswiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343112 (owner: 10Thcipriani) [19:07:16] (03CR) 10jenkins-bot: Revert "Turn off patrolling for FlaggedRevs in bswiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343112 (owner: 10Thcipriani) [19:07:36] 06Operations, 10Traffic, 13Patch-For-Review, 05Prometheus-metrics-monitoring, 15User-fgiunchedi: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#3107532 (10fgiunchedi) [19:07:40] !log thcipriani@tin Synchronized php-1.29.0-wmf.16/extensions/VisualEditor/lib/ve: SWAT: [[gerrit:343072|Update VE core submodule to wmf/1.29.0-wmf.16 HEAD]] (50a6323d7) T154123 T160479 (duration: 00m 44s) [19:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:47] T154123: VisualEditor: when pasting wikified text from an article, the text style (including link) can't be modified - https://phabricator.wikimedia.org/T154123 [19:07:47] T160479: [Regression wmf.16] Cursor jumps to the beginning of the page after adding a focusable node - https://phabricator.wikimedia.org/T160479 [19:07:49] ^ James_F live now [19:08:04] Thank you! [19:08:31] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] "CI backed up" [puppet] - 10https://gerrit.wikimedia.org/r/343110 (https://phabricator.wikimedia.org/T148408) (owner: 10Filippo Giunchedi) [19:09:03] 06Operations, 10Mail, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3107542 (10Reedy) [19:09:43] twentyafterfour: all clear for train, sorry for the delay! 
[19:12:29] 06Operations, 10Mail, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3097403 (10Reedy) Filed as an OIT ticket under ticket number #12822 [19:19:05] (03PS1) 1020after4: all wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343114 [19:19:07] (03CR) 1020after4: [C: 032] all wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343114 (owner: 1020after4) [19:20:44] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107585 (10dschwen) Sure the query is here: https://github.com/dschwen/wikiminiatlas/blob/master/tiles/jsontile.php#L115 [19:21:17] (03Merged) 10jenkins-bot: all wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343114 (owner: 1020after4) [19:21:26] (03CR) 10jenkins-bot: all wikis to 1.29.0-wmf.16 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/343114 (owner: 1020after4) [19:25:04] Fatal error: Invalid operand type was used: cannot perform this operation with arrays in /srv/mediawiki/php-1.29.0-wmf.15/languages/Language.php on line 471 [19:25:15] that's happening a lot ... [19:26:39] $this->namespaceNames = $wgExtraNamespaces + [19:26:39] self::$dataCache->getItem( $this->mCode, 'namespaceNames' ); [19:26:40] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.29.0-wmf.16 [19:26:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:59] LocalisationCache ok? 
[19:27:54] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:29:24] Reedy: hmm I don't know but it seems like it isn't a problem in wmf.16 so I'm not gonna fret over it much [19:31:44] PROBLEM - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [2000.0] [19:32:33] ebernhardson: ^ [19:34:15] ebernhardson: is that related to the train? should I roll back [19:34:37] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107634 (10jcrespo) This could be the reason- requiring to setup the SRID for constants: http://gis.stackexchange.com/questions/68711/postgis-geomet... [19:36:44] RECOVERY - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is OK: OK: Less than 20.00% above the threshold [1200.0] [19:38:42] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107645 (10dschwen) Looks like the OSM data uses SRID 3857 and I compare to a Bounding Box with SRID 900913 [19:38:46] (03CR) 10GWicke: "In order to move this forward I listed this patch for puppet swat on the 21st." [puppet] - 10https://gerrit.wikimedia.org/r/341833 (https://phabricator.wikimedia.org/T159922) (owner: 10GWicke) [19:42:10] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107654 (10dschwen) Ok, next issue is that suddenly the column `the_geom` does not exist anymore in the `land_polygons` and `coastlines` tables.... 
[19:44:01] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107655 (10dschwen) Ok, I think I'm back in business! Testing a bit more now. [19:47:32] (03PS1) 10Yuvipanda: admin: Add ssh key on new machine for Yuvi [puppet] - 10https://gerrit.wikimedia.org/r/343116 [19:48:42] madhuvishy: can you merge ^? new laptop :) [19:48:54] PROBLEM - puppet last run on db1082 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:49:40] (03PS2) 10Madhuvishy: admin: Add ssh key on new machine for Yuvi [puppet] - 10https://gerrit.wikimedia.org/r/343116 (owner: 10Yuvipanda) [19:50:09] (03CR) 10Madhuvishy: [V: 032 C: 032] admin: Add ssh key on new machine for Yuvi [puppet] - 10https://gerrit.wikimedia.org/r/343116 (owner: 10Yuvipanda) [19:50:59] yuvipanda: done [19:52:15] 06Operations, 06Labs: Mount /public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3107675 (10chasemp) p:05Triage>03Normal @sabas88 we could configure /public/dumps to be available on all instances in this project. Is that acceptable? [19:53:54] something's up with mw2256 [19:54:08] I'm getting errors for wmf.15 even though every wiki is now on wmf.16 [19:54:17] all the errors are coming from mw2256 [19:54:54] PROBLEM - puppet last run on mc1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:55:14] twentyafterfour: Doesn't that have ram errors? 
[19:55:29] I'm not sure [19:55:39] https://phabricator.wikimedia.org/T155180#3101984 [19:55:45] maybe it should be shut down [19:55:47] Yeah, it shouldn't be pooled [19:55:57] it looks like it's doing something crazy [19:56:05] 06Operations, 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3107687 (10jcrespo) Good to know- feel free to test and communicate issues- we will soon otherwise irrevocably delete the previous instance. Thanks... [19:56:40] Well, also, isn't that codfw? [19:56:44] PROBLEM - Host elastic2020 is DOWN: PING CRITICAL - Packet loss = 100% [19:56:50] 06Operations, 10ops-codfw, 13Patch-For-Review, 15User-Elukey: codfw: mw2251-mw2260 rack/setup - https://phabricator.wikimedia.org/T155180#2936454 (10mmodell) this host is logging a ton of noise in logstash... can we shut down hhvm until the ram problem is fixed? [19:57:06] Reedy: all I know is it's logging [19:57:12] and uhm.... ebernhardson ^^ [19:57:19] grrr same problem with this elastic2020 node [19:57:33] :( [19:57:34] twentyafterfour: this node is known for having issues... [19:58:02] it's unclear but looks like it goes down few minutes after we switchover [19:58:20] it's the 3rd time it happens :/ [19:59:11] I hope my phabricator indexer job hasn't been putting too much load on elastic... 
it's been running since early this morning [19:59:19] Operations, ops-codfw, DC-Ops, Discovery, and 2 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#3107708 (dcausse) Resolved>Open Reopening: it happened today in exactly the same conditions, a few minutes after a switchover [19:59:26] twentyafterfour: ^ [19:59:47] this is very likely a hw issue [20:01:04] Operations, Wikimedia-Site-requests, Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3107727 (Reedy) [20:01:45] Operations, Wikimedia-Site-requests, Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3107696 (Reedy) I suggest we increase it everywhere, not just one place. Wants signoff from #operations and #performance-team before doing [20:04:08] Operations, Wikimedia-Site-requests, Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3107736 (jeblad) Yes, I believe that would be a better solution. ;) [20:05:31] !log restarted phd on iridium to fix workers dying [20:05:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:05:57] !log dzahn@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=elastic2010.codfw.wmnet [20:06:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:06:05] Operations, Performance-Team, Wikimedia-Site-requests, Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3107745 (Reedy) [20:06:59] !log depooled elastic2010 since it is powered-off/down.
(set/pooled=inactive) - (T149006) [20:07:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:04] T149006: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006 [20:07:09] mutante: it's elastic2020 :) [20:07:18] arg [20:07:34] dcausse: :/ fixing it [20:07:44] no big deal it can still serve traffic indirectly [20:07:53] ok [20:08:00] !log dzahn@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic2010.codfw.wmnet [20:08:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:08:25] !log dzahn@puppetmaster1001 conftool action : set/pooled=inactive; selector: name=elastic2020.codfw.wmnet [20:08:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:08:33] thanks! [20:08:35] !log repooled elastic2010, depooled correct host elastic2020 instead (T149006) [20:08:40] yw [20:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:10:26] 06Operations, 06Labs, 06Research-and-Data-Backlog, 10hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#3107768 (10chasemp) 05Open>03declined My understanding is this has been put on hold until TBD. I don't want to leave this request fo... [20:12:54] PROBLEM - puppet last run on etcd1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:13:19] 06Operations, 10ops-codfw, 06DC-Ops, 06Discovery, and 2 others: elastic2020 is powered off and does not want to restart - https://phabricator.wikimedia.org/T149006#2739503 (10Dzahn) @papaul confirmed it has the same behaviour again. It shows as status "powered down", then you can tell it to power on and i... 
[20:17:54] RECOVERY - puppet last run on db1082 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [20:18:38] (03PS4) 10Dzahn: tor: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/343094 [20:22:54] RECOVERY - puppet last run on mc1030 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [20:30:23] 06Operations, 06Labs: Mount /public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3107843 (10Sabas88) Yes of course! That would do. [20:33:02] (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/5799/ merging like this, if the "include passwords" part should move can still do it afterwards" [puppet] - 10https://gerrit.wikimedia.org/r/343094 (owner: 10Dzahn) [20:33:53] (03PS5) 10Dzahn: tor: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/343094 [20:38:50] twentyafterfour un related to any errors on prod wiki's but using wmf 16 on my own wiki i am getting this error [20:38:50] [6649cc54534fd13f0a6c5cb3] /wiki/Special:Version Error from line 663 of /home/randomwi/public_html/en/includes/cache/localisation/LocalisationCache.php: Class 'DOMDocument' not found [20:39:00] I've updated mediawiki to wmf 16 and also vendor too [20:39:33] See https://en.random-wikisaur.tk/wiki/Special:Version [20:39:37] paladox: weird [20:39:41] Yep [20:41:24] twentyafterfour fixed it [20:41:32] Turns out i was missing the php-dom module [20:41:54] RECOVERY - puppet last run on etcd1005 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:41:57] as my host does multi php versions. 
And i switched to php 7.1 today :) [20:44:14] PROBLEM - cassandra-c SSL 10.192.16.164:7001 on restbase2001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [20:44:24] PROBLEM - cassandra-c CQL 10.192.16.164:9042 on restbase2001 is CRITICAL: connect to address 10.192.16.164 and port 9042: Connection refused [20:44:54] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:45:14] PROBLEM - Check systemd state on restbase2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [20:45:14] PROBLEM - cassandra-c service on restbase2001 is CRITICAL: CRITICAL - Expecting active but unit cassandra-c is failed [20:50:55] apergos: Hey, do you think we can merge this now? https://gerrit.wikimedia.org/r/#/c/339332/ [20:51:23] Amir1: run isn't complete yet, in a few days it will be [20:51:52] apergos: Sorry. I checked https://dumps.wikimedia.org/backup-index.html and there was none. [20:52:41] Amir1: there were two jobs aborted (see the bottom), there's a ticket about what's going on in phab in the dumps-generation project [20:52:55] they should finish up in 1-2 days [20:52:59] no worries [20:53:03] Thanks. [20:53:06] don't worry, I'm watching :-) [20:53:24] * Amir1 is too excited [20:54:54] RECOVERY - MegaRAID on db1070 is OK: OK: optimal, 1 logical, 2 physical [20:55:54] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [21:00:31] (03PS7) 10EBernhardson: Drop mediawiki logs in HDFS after 90 days [puppet] - 10https://gerrit.wikimedia.org/r/335140 [21:00:58] (03CR) 10EBernhardson: "rebased and removed WIP tag after verifying the latest refinery release deployed to hdfs includes the dependent patch" [puppet] - 10https://gerrit.wikimedia.org/r/335140 (owner: 10EBernhardson) [21:01:29] (03CR) 10jerkins-bot: [V: 04-1] Drop mediawiki logs in HDFS after 90 days [puppet] - 10https://gerrit.wikimedia.org/r/335140 (owner: 10EBernhardson) [21:02:17] 06Operations, 10Analytics, 06WMF-Legal, 07Privacy: Honor DNT header for access logs & varnish logs - https://phabricator.wikimedia.org/T98831#3107961 (10Gilles) Aggregate counts isn't problematic, but the data we store is. If it's recorded, it can be compromised. I know we have retention policies, etc. but... [21:04:11] (03PS1) 10Rush: labstore: add dumps mount for osmit project [puppet] - 10https://gerrit.wikimedia.org/r/343126 [21:05:21] (03PS2) 10Rush: labstore: add dumps mount for osmit project [puppet] - 10https://gerrit.wikimedia.org/r/343126 [21:07:14] RECOVERY - Check systemd state on restbase2001 is OK: OK - running: The system is fully operational [21:07:14] RECOVERY - cassandra-c service on restbase2001 is OK: OK - cassandra-c is active [21:08:14] RECOVERY - cassandra-c SSL 10.192.16.164:7001 on restbase2001 is OK: SSL OK - Certificate restbase2001-c valid until 2017-09-12 15:13:30 +0000 (expires in 179 days) [21:08:24] RECOVERY - cassandra-c CQL 10.192.16.164:9042 on restbase2001 is OK: TCP OK - 0.036 second response time on 10.192.16.164 port 9042 [21:09:27] (03CR) 10Madhuvishy: [C: 031] labstore: add dumps mount for osmit project [puppet] - 10https://gerrit.wikimedia.org/r/343126 (owner: 10Rush) [21:09:42] (03CR) 10Rush: [C: 032] labstore: add dumps mount for osmit project [puppet] - 10https://gerrit.wikimedia.org/r/343126 (owner: 10Rush) [21:10:34] 06Operations, 06Labs: Mount 
/public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3108002 (10chasemp) It should appear on a Puppet run sometime in the next hour or so. [21:11:15] (03PS8) 10Ladsgroup: service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) [21:12:05] (03CR) 10jerkins-bot: [V: 04-1] service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [21:12:14] (03CR) 10Ladsgroup: service: Send uwsgi logs to logstash (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) (owner: 10Ladsgroup) [21:12:54] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [21:13:29] 06Operations, 06Labs: Mount /public/dumps for osmit project - https://phabricator.wikimedia.org/T156586#3108017 (10Sabas88) Thanks! [21:14:27] urandom: do you want to do the cassandra change now? (and handle the restarts) [21:14:50] mutante: sure [21:14:59] alrighty [21:15:01] (03CR) 10Dzahn: [C: 032] Enable (optional) Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [21:15:21] (03PS6) 10Dzahn: Enable (optional) Cassandra client encryption in RESTBase production [puppet] - 10https://gerrit.wikimedia.org/r/342898 (https://phabricator.wikimedia.org/T111113) (owner: 10Eevans) [21:16:56] (03PS9) 10Ladsgroup: service: Send uwsgi logs to logstash [puppet] - 10https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010) [21:17:09] urandom: submitted and merged on puppetmaster [21:17:17] mutante: awesome; thanks! [21:17:21] shall i run puppet? 
[21:17:26] !log reindexing group2 in cirrussearch for codfw downtime during 2.x -> 5.x upgrade [21:17:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:17:32] or are you doing both [21:17:36] mutante: i can [21:17:38] ok [21:17:48] mutante: i'm going to do a canary or two first anyway [21:17:57] the others will probably run by then anyway [21:17:57] yep, great [21:18:13] and puppet will change config but not do the restart, sounds good [21:18:19] yeah [21:19:33] !log T111113: Restarting Cassandra on restbase1007-a to enable (optional) client encryption [21:19:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:41] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [21:22:44] PROBLEM - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is CRITICAL: connect to address 10.64.0.230 and port 9042: Connection refused [21:23:04] PROBLEM - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [21:24:06] ^^^ got that [21:24:51] 06Operations, 06Performance-Team, 10Wikimedia-Site-requests, 07Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3107696 (10Krinkle) >>! In T160685#3107736, @jeblad wrote: > Yes, I believe that would be a better solution. ;) Saving a null revision... 
[21:25:55] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [21:26:28] (03PS1) 10Gehel: wdqs - remove unneeded puppet dependency [puppet] - 10https://gerrit.wikimedia.org/r/343128 [21:27:44] RECOVERY - cassandra-a CQL 10.64.0.230:9042 on restbase1007 is OK: TCP OK - 0.000 second response time on 10.64.0.230 port 9042 [21:28:04] RECOVERY - cassandra-a SSL 10.64.0.230:7001 on restbase1007 is OK: SSL OK - Certificate restbase1007-a valid until 2017-09-12 15:33:29 +0000 (expires in 179 days) [21:35:03] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3108227 (10leila) @Ottomata: I'm only using dbstore1002 these days. For my work, I'm fine if db1047 is gone. [21:36:23] !log T111113: Restarting Cassandra on restbase1007-{b,c} to enable (optional) client encryption [21:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:36:30] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [21:43:33] (03PS6) 10Dzahn: microsites: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/342164 [21:50:33] !log T111113: Rolling restarts of Cassandra in codfw, rack 'b' [21:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:50:38] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [21:50:58] (03PS7) 10Dzahn: microsites: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/342164 [21:57:14] PROBLEM - cassandra-b CQL 10.192.16.163:9042 on restbase2001 is CRITICAL: connect to address 10.192.16.163 and port 9042: Connection refused [21:57:14] PROBLEM - cassandra-b SSL 10.192.16.163:7001 on restbase2001 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [21:58:14] RECOVERY - cassandra-b CQL 10.192.16.163:9042 on restbase2001 is OK: TCP OK - 
0.036 second response time on 10.192.16.163 port 9042 [21:58:14] RECOVERY - cassandra-b SSL 10.192.16.163:7001 on restbase2001 is OK: SSL OK - Certificate restbase2001-b valid until 2017-09-12 15:13:28 +0000 (expires in 179 days) [22:00:14] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:00:33] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3108353 (10Halfak) I've confirmed that all tables in `staeiou` are cleared for deletion. @staeiou can come here to confirm if he gets the ping. [22:02:48] 06Operations, 10Analytics, 10DBA: Prep to decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3108355 (10Staeiou) >>! In T156844#3108353, @Halfak wrote: > I've confirmed that all tables in `staeiou` are cleared for deletion. > > @staeiou can come here to confirm if... [22:03:37] (03PS8) 10Dzahn: microsites: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/342164 [22:04:14] PROBLEM - cassandra-a SSL 10.192.16.165:7001 on restbase2002 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [22:05:14] RECOVERY - cassandra-a SSL 10.192.16.165:7001 on restbase2002 is OK: SSL OK - Certificate restbase2002-a valid until 2017-09-12 15:35:00 +0000 (expires in 179 days) [22:07:00] (03CR) 10Dzahn: [C: 032] "no-op http://puppet-compiler.wmflabs.org/5804/" [puppet] - 10https://gerrit.wikimedia.org/r/342164 (owner: 10Dzahn) [22:07:48] (03PS9) 10Dzahn: microsites: convert to profile/role-structure [puppet] - 10https://gerrit.wikimedia.org/r/342164 [22:11:14] PROBLEM - cassandra-c CQL 10.192.16.167:9042 on restbase2002 is CRITICAL: connect to address 10.192.16.167 and port 9042: Connection refused [22:11:14] PROBLEM - cassandra-c SSL 10.192.16.167:7001 on restbase2002 is CRITICAL: SSL CRITICAL - failed to connect or SSL 
handshake:Connection refused [22:12:14] RECOVERY - cassandra-c CQL 10.192.16.167:9042 on restbase2002 is OK: TCP OK - 0.037 second response time on 10.192.16.167 port 9042 [22:12:15] RECOVERY - cassandra-c SSL 10.192.16.167:7001 on restbase2002 is OK: SSL OK - Certificate restbase2002-c valid until 2017-09-12 15:35:06 +0000 (expires in 179 days) [22:15:04] PROBLEM - cassandra-a CQL 10.192.16.176:9042 on restbase2007 is CRITICAL: connect to address 10.192.16.176 and port 9042: Connection refused [22:15:51] is icinga polling more frequently these days? [22:16:04] RECOVERY - cassandra-a CQL 10.192.16.176:9042 on restbase2007 is OK: TCP OK - 0.036 second response time on 10.192.16.176 port 9042 [22:16:24] i'm surprised it's catching so many of these Cassandra restarts, it never did in the past [22:16:36] and they are less than 3 minutes in duration [22:16:58] (03CR) 10Paladox: "Should we keep modules/role/manifests/gerrit/server.pp for backwards combat as i know this will break gerrit-test3 when I'm not around. B" [puppet] - 10https://gerrit.wikimedia.org/r/342692 (owner: 10Dzahn) [22:18:14] PROBLEM - cassandra-b CQL 10.192.16.177:9042 on restbase2007 is CRITICAL: connect to address 10.192.16.177 and port 9042: Connection refused [22:18:14] PROBLEM - cassandra-b SSL 10.192.16.177:7001 on restbase2007 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [22:19:14] RECOVERY - cassandra-b CQL 10.192.16.177:9042 on restbase2007 is OK: TCP OK - 0.037 second response time on 10.192.16.177 port 9042 [22:19:15] RECOVERY - cassandra-b SSL 10.192.16.177:7001 on restbase2007 is OK: SSL OK - Certificate restbase2007-b valid until 2017-09-12 15:35:53 +0000 (expires in 179 days) [22:19:20] urandom: yea, probably it is. 
since it moved away from neon to einsteinium/tegmen [22:20:27] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: [epic] System level upgrade for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151324#3108484 (10Deskana) [22:20:31] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 2 others: Upgrade cirrus / elasticsearch to Jessie - https://phabricator.wikimedia.org/T151326#3108483 (10Deskana) 05Open>03Resolved [22:21:00] it's ok to see the rolling restart do its thing [22:24:38] k [22:28:24] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [22:29:19] 06Operations, 10Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3108551 (10Paladox) [22:29:24] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:39] 06Operations, 10Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3108565 (10Paladox) [22:34:12] !log T111113: Rolling restarts of Cassandra in codfw, rack 'a' [22:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:34:18] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [22:36:04] 06Operations, 10Phabricator: Upload php7.1 to apt.wm.org - https://phabricator.wikimedia.org/T160714#3108589 (10Paladox) [22:40:24] PROBLEM - cassandra-b SSL 10.192.32.135:7001 on restbase2003 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [22:41:24] RECOVERY - cassandra-b SSL 10.192.32.135:7001 on restbase2003 is OK: SSL OK - Certificate restbase2003-b valid until 2017-09-12 15:35:15 +0000 (expires in 179 days) [22:43:54] PROBLEM - cassandra-c CQL 10.192.32.136:9042 on restbase2003 is CRITICAL: connect to address 10.192.32.136 and port 9042: Connection refused [22:44:54] RECOVERY - cassandra-c CQL 10.192.32.136:9042 on 
restbase2003 is OK: TCP OK - 0.036 second response time on 10.192.32.136 port 9042 [22:45:39] (03PS2) 10Dzahn: RT: convert to profile/role-model [puppet] - 10https://gerrit.wikimedia.org/r/342771 [22:50:58] (03PS2) 10Andrew Bogott: Kesytonehooks: Exclude 'novaobserver' user from posix user group. [puppet] - 10https://gerrit.wikimedia.org/r/343074 (https://phabricator.wikimedia.org/T158650) [22:51:00] (03PS1) 10Andrew Bogott: Nova fullstack test: Switch to a testing image, temporarily [puppet] - 10https://gerrit.wikimedia.org/r/343207 [22:51:02] (03PS1) 10Andrew Bogott: Bootstrapvz: Simplify and update [puppet] - 10https://gerrit.wikimedia.org/r/343208 [22:53:25] (03PS3) 10Dzahn: RT: convert to profile/role-model [puppet] - 10https://gerrit.wikimedia.org/r/342771 [22:54:02] (03Draft1) 10Paladox: CI: Install php7.1 [puppet] - 10https://gerrit.wikimedia.org/r/343209 [22:54:04] (03PS2) 10Paladox: CI: Install php7.1 [puppet] - 10https://gerrit.wikimedia.org/r/343209 [22:54:27] (03PS3) 10Paladox: CI: Install php7.1 [puppet] - 10https://gerrit.wikimedia.org/r/343209 [22:55:56] (03PS4) 10Paladox: CI: Install php7.1 [puppet] - 10https://gerrit.wikimedia.org/r/343209 [22:56:14] (03PS2) 10Andrew Bogott: Nova fullstack test: Switch to a testing image, temporarily [puppet] - 10https://gerrit.wikimedia.org/r/343207 [22:56:16] (03PS2) 10Andrew Bogott: Bootstrapvz: Simplify and update [puppet] - 10https://gerrit.wikimedia.org/r/343208 [22:56:18] (03PS3) 10Andrew Bogott: Keystonehooks: Exclude 'novaobserver' user from posix user group. 
[puppet] - 10https://gerrit.wikimedia.org/r/343074 (https://phabricator.wikimedia.org/T158650) [22:57:24] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [23:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170316T2300). Please do the needful. [23:00:05] tgr and ebernhardson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:54] (03Draft1) 10Paladox: Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 [23:02:54] (03PS2) 10Paladox: Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 [23:04:14] PROBLEM - cassandra-c SSL 10.192.32.145:7001 on restbase2008 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [23:05:14] RECOVERY - cassandra-c SSL 10.192.32.145:7001 on restbase2008 is OK: SSL OK - Certificate restbase2008-c valid until 2017-09-12 15:36:03 +0000 (expires in 179 days) [23:05:49] (03PS5) 10Paladox: CI: Install php7.1 [puppet] - 10https://gerrit.wikimedia.org/r/343209 [23:08:07] (03CR) 10jerkins-bot: [V: 04-1] Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 (owner: 10Paladox) [23:08:42] (03PS4) 10Dzahn: RT: convert to profile/role-model [puppet] - 10https://gerrit.wikimedia.org/r/342771 [23:09:39] (03PS3) 10Paladox: Phabricator: Allow us to install php7.1 for testing on labs. 
[puppet] - 10https://gerrit.wikimedia.org/r/343211 [23:10:56] (03CR) 10Smalyshev: [C: 031] wdqs - remove unneeded puppet dependency [puppet] - 10https://gerrit.wikimedia.org/r/343128 (owner: 10Gehel) [23:12:43] (03CR) 10jerkins-bot: [V: 04-1] Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 (owner: 10Paladox) [23:13:42] (03PS4) 10Paladox: Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 [23:13:46] (03CR) 10Dzahn: [C: 032] "now no-op as well http://puppet-compiler.wmflabs.org/5807/" [puppet] - 10https://gerrit.wikimedia.org/r/342771 (owner: 10Dzahn) [23:14:17] (03PS5) 10Dzahn: RT: convert to profile/role-model [puppet] - 10https://gerrit.wikimedia.org/r/342771 [23:15:32] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3108631 (10eross) [23:16:32] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3097403 (10eross) Hi All, Thank you for the clarifications, this will be a job for ops. Ive tag them on here, so hopefully someone will get back to you shortly. Best, Eme... [23:16:58] No one doing swat? ebernhardson and tgr about for your patches? 
[23:17:19] here [23:17:28] I can do it, I got distracted [23:17:35] (03PS3) 10Reedy: Deploy PageViewInfo to group2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342685 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:17:39] I don't mind [23:17:44] Just scrolled up and was confused :) [23:17:47] in that case, thanks [23:17:50] (03CR) 10Reedy: [C: 032] Deploy PageViewInfo to group2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342685 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:17:57] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 2 others: Purge Varnish cache when a banner is saved - https://phabricator.wikimedia.org/T154954#2929493 (10Ejegg) Wouldn't Vary: Cookie explode caching all over the place due to things like the last visit date? [23:19:41] (03Merged) 10jenkins-bot: Deploy PageViewInfo to group2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342685 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:19:52] (03CR) 10jenkins-bot: Deploy PageViewInfo to group2 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/342685 (https://phabricator.wikimedia.org/T125917) (owner: 10Gergő Tisza) [23:19:58] 06Operations, 06Office-IT, 15User-Urbanecm: Request for email address seniori@wikimedia.org - https://phabricator.wikimedia.org/T160400#3108641 (10Dzahn) Hi, sorry to be that guy, but this should go back to OIT, since there is a long and ongoing effort to "Move most (all?) exim personal aliases to OIT" (T12... 
[23:24:34] !log T111113: Rolling restarts of Cassandra in codfw, rack 'b' [23:24:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:41] T111113: Cassandra client encryption - https://phabricator.wikimedia.org/T111113 [23:24:42] !log T111113: Rolling restarts of Cassandra in codfw, rack 'd' *correction* [23:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:14] !log reedy@tin Synchronized wmf-config/InitialiseSettings.php: Enable PageViewInfo to group2 T125917 (duration: 00m 49s) [23:25:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:25:20] T125917: Deploy the PageViewInfo extension to production - https://phabricator.wikimedia.org/T125917 [23:26:28] Reedy: tested, works [23:26:48] Are they magically hidden? I don't see it on https://en.wikipedia.org/wiki/Main_Page?action=info [23:28:18] ebernhardson: Ping [23:29:02] Reedy: last item in the first table [23:29:13] duhhh [23:29:14] :) [23:34:04] PROBLEM - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is CRITICAL: connect to address 10.192.48.48 and port 9042: Connection refused [23:35:04] RECOVERY - cassandra-c CQL 10.192.48.48:9042 on restbase2005 is OK: TCP OK - 0.036 second response time on 10.192.48.48 port 9042 [23:36:21] (03CR) 10Dzahn: "hrmm, in puppet-lint 2.1.2.pre this is the fix, and it even tells us which column to use:" [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [23:36:40] 06Operations, 06Performance-Team, 10Wikimedia-Site-requests, 07Performance: Increase $wgExpensiveParserFunctionLimit on nowiki - https://phabricator.wikimedia.org/T160685#3108695 (10jeblad) The module is a Wikidata project, and performance issues should be directed there. As I said on IRC, I don't think th... [23:37:30] (03CR) 10Dzahn: "so i guess this was fixed between 2.0.2 and 2.1.2.pre and i should just abandon it again until next time we upgrade." 
[puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [23:37:32] we are still using CodeReview in production? o_O [23:37:34] PROBLEM - cassandra-a CQL 10.192.48.49:9042 on restbase2006 is CRITICAL: connect to address 10.192.48.49 and port 9042: Connection refused [23:37:43] tgr: Data hasn't been migrated elsewhere [23:37:44] (03Abandoned) 10Dzahn: prometheus: fix lint warning [puppet] - 10https://gerrit.wikimedia.org/r/341965 (owner: 10Dzahn) [23:38:13] I thought it has its own database tables? [23:38:19] it does [23:38:29] But if we disable the extension, the general public can't view them [23:38:34] RECOVERY - cassandra-a CQL 10.192.48.49:9042 on restbase2006 is OK: TCP OK - 0.036 second response time on 10.192.48.49 port 9042 [23:38:34] why not wikitech or mediawiki.org then? [23:38:51] What about them? [23:39:34] in any case, CodeReview is the top error source in fatalmonitor [23:39:46] Is it the one from last week? [23:39:52] Something to do with a preg_ call? [23:40:08] so it seems [23:40:38] Brilliant [23:40:38] https://gerrit.wikimedia.org/r/#/c/341849/ [23:40:44] Jenkins never merged it [23:41:51] (03CR) 10Dzahn: Phabricator: Allow us to install php7.1 for testing on labs. (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/343211 (owner: 10Paladox) [23:42:58] (03CR) 10Dzahn: Phabricator: Allow us to install php7.1 for testing on labs. (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/343211 (owner: 10Paladox) [23:43:06] (03CR) 10Dzahn: [C: 04-1] Phabricator: Allow us to install php7.1 for testing on labs. [puppet] - 10https://gerrit.wikimedia.org/r/343211 (owner: 10Paladox) [23:46:22] (03CR) 10Dzahn: [C: 04-1] "please don't do the change from "ensure => 'latest'" to "ensure => 'installed'" in this same change. 
It is a rather big change that stops " [puppet] - 10https://gerrit.wikimedia.org/r/343209 (owner: 10Paladox) [23:46:44] !log reedy@tin Synchronized php-1.29.0-wmf.16/extensions/CodeReview: Fix preg_ error again (duration: 00m 47s) [23:46:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:47:27] (03CR) 10Dzahn: [C: 04-1] "would also be good if you add why an upgrade is desired (the thing about phab needing 7.1 and your separate ticket to get that uploaded to" [puppet] - 10https://gerrit.wikimedia.org/r/343209 (owner: 10Paladox) [23:50:55] (03PS3) 10Dzahn: Update ruthenium nginx conf to handle updated parsoid test domains [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [23:57:54] (03CR) 10Dzahn: [C: 04-1] "amended to remove literal tab chars." [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [23:58:08] (03CR) 10Dzahn: [C: 04-1] "nginx: [emerg] "server" directive is not allowed here in /etc/nginx/sites-enabled/nginx-parsoid-testing:1" [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry) [23:59:03] (03CR) 10Dzahn: [C: 04-1] "that said, the same thing happens on the old config before this change. ehmm.." [puppet] - 10https://gerrit.wikimedia.org/r/343099 (https://phabricator.wikimedia.org/T159995) (owner: 10Subramanya Sastry)