[00:00:05] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: It is that lovely time of the day again! You are hereby commanded to deploy Evening SWAT (Max 6 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20190215T0000).
[00:00:05] No GERRIT patches in the queue for this window AFAICS.
[00:00:12] marxarelli: uhmm. something about the footer lines.. probably a new line is expected
[00:00:19] Line 12: Expected 'Bug:' to be in footer
[00:00:54] oh maybe gerrit messed it up. i edited in the ui
[00:02:29] (PS5) Dduvall: ci: Permit git traffic between zuul mergers over ipv6 [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204)
[00:03:06] (CR) Dduvall: "recheck" [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: Dduvall)
[00:03:23] (CR) jerkins-bot: [V: -1] ci: Permit git traffic between zuul mergers over ipv6 [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: Dduvall)
[00:03:25] (CR) Dduvall: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: Dduvall)
[00:03:39] gah.
[00:04:10] (CR) jerkins-bot: [V: -1] ci: Permit git traffic between zuul mergers over ipv6 [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: Dduvall)
[00:04:41] marxarelli put a new line between Hosts and Bug
[00:05:43] (PS6) Dduvall: ci: Permit git traffic between zuul mergers over ipv6 [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204)
[00:06:54] paladox: nice :)
[00:07:08] :)
[00:08:56] (PS4) CDanis: WIP: puppet for rasdaemon testing [puppet] - https://gerrit.wikimedia.org/r/490787
[00:09:18] mutante: the compiler diff looks right to me
[00:10:05] it's only iptables. what could go wrong? :)
[00:10:19] marxarelli: yea, heh, agree. let's merge :)
[00:10:49] we can also revert by removing the generated ferm file, heh
[00:11:25] (CR) Dzahn: [C: +2] ci: Permit git traffic between zuul mergers over ipv6 [puppet] - https://gerrit.wikimedia.org/r/490790 (https://phabricator.wikimedia.org/T216204) (owner: Dduvall)
[00:12:15] deploying on 2001
[00:13:54] looks fine. ferm refreshed. though.. dont see in ip6tables -L yet.. sec
[00:14:12] mutante: `telnet -6 contint2001.wikimedia.org 9418` from contint1001 works now!
[00:15:09] and a `git fetch --ipv6` was speedy
[00:15:18] !log setting labsdb1005 back into read-write
[00:15:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:15:20] thanks for the quick review
[00:15:25] marxarelli: oh, of course, i looked in the wrong place :)
[00:15:29] glad it works, ok
[00:15:47] deploying on 1001
[00:16:01] that will speed up our service-jenkinsjob from 9 minutes to maybe 30 seconds
[00:16:36] nice, can use that in an achievement list
[00:16:47] so measurable
[00:16:56] haha :)
[00:17:01] greg-g: ^ {{done}}
[00:17:22] Operations, ops-eqiad, Patch-For-Review, cloud-services-team (Kanban): Degraded RAID on cloudvirt1020 - https://phabricator.wikimedia.org/T194855 (bd808)
[00:18:00] i'm always for adding more IPv6 records :)
[00:18:12] we just have to remember the ferm part
[00:20:05] mutante: yay!
[00:20:30] :)
[00:21:37] er, marxarelli yay!
[00:21:40] but both! :)
[00:23:10] (CR) Paladox: [C: +1] "This looks really nice (based on his test site). Im going to work on making it nice for all integration ci checks. Basing it on this (as i" [puppet] - https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: Thcipriani)
[00:25:20] greg-g: "SUCCESS in 33s" 3 seconds off our new KPI
[00:25:30] * marxarelli is fired
[00:26:45] oh?
[00:27:35] (PS6) Dzahn: remove bast3003 prod IP, bast3002 has been repaired [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936)
[00:28:25] (CR) Dzahn: [C: +2] "bast3003 never became actively used, then bast3002 was repaired, so 3002 is still the current one" [dns] - https://gerrit.wikimedia.org/r/489104 (https://phabricator.wikimedia.org/T184936) (owner: Dzahn)
[00:32:55] Operations, ops-esams, DC-Ops, decommission, Patch-For-Review: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (Dzahn) I removed production DNS entries but kept mgmt because the host does not have mgmt entries by asset tag. The DRAC IP is: 10.21.0.109 an...
[00:35:42] Operations, ops-esams, DC-Ops, decommission: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (Dzahn) a:RobH
[00:36:51] Operations, ops-esams, DC-Ops, decommission: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (Dzahn) I did not start the "non-interrupt steps" now or anything but i marked the first 2 check boxes there because it's already removed from puppet (was never in /...
[00:39:25] !log puppetmaster1001: sudo puppet node clean bast3003.wikimedia.org ; sudo puppet node deactivate bast3003.wikimedia.org (T216199)
[00:39:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:39:28] T216199: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199
[00:40:25] Operations, ops-esams, DC-Ops, decommission: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (Dzahn)
[00:49:09] Operations, ops-esams, DC-Ops, decommission: decom bast3003 (65R8Q4J, formerly amslvs4) - https://phabricator.wikimedia.org/T216199 (Dzahn)
[00:49:26] PROBLEM - Logstash rate of ingestion percent change compared to yesterday on icinga2001 is CRITICAL: 141.5 ge 130 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[00:49:43] Operations, ops-esams, Patch-For-Review: install/designate other machine as esams bastion - https://phabricator.wikimedia.org/T184936 (Dzahn) reverted in T216199
[00:54:48] (CR) Smalyshev: [C: -1] "until testing is complete" [mediawiki-config] - https://gerrit.wikimedia.org/r/490648 (https://phabricator.wikimedia.org/T215684) (owner: Jforrester)
[00:55:18] (CR) Smalyshev: [C: +1] Deploy WikibaseCirrusSearch: Part II, InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/490644 (owner: Jforrester)
[00:56:08] (CR) Smalyshev: [C: +1] Deploy WikibaseCirrusSearch: Part III, Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/490645 (owner: Jforrester)
[00:56:13] (CR) Smalyshev: [C: +1] [BETA] Enable WikibaseCirrusSearch on wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/490646 (owner: Jforrester)
[01:00:26] (CR) Dzahn: "addressed both comments,should be resolved" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:04:59] (PS8) Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[01:05:32] (PS1) Paladox: Fix replacing ${project-base-name} in gr-project [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/490796
[01:09:02] bug: the bug reporting bug quits when there are too many bugs
[01:11:19] (PS9) Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[01:12:12] (CR) jerkins-bot: [V: -1] puppetmaster/configmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:12:27] (CR) Dzahn: "manually rebased on top of prod due to changes in puppetmaster::passenger" [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:13:01] (CR) Paladox: [V: +2 C: +2] "Verified that this builds." [software/gerrit] (wmf/stable-2.15) - https://gerrit.wikimedia.org/r/490110 (owner: Paladox)
[01:13:35] (PS10) Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[01:14:39] (CR) Dzahn: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:20:16] (CR) Dzahn: "see https://puppet-compiler.wmflabs.org/compiler1001/14689/" [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:22:31] Operations, DC-Ops, Wikimedia-Incident, cloud-services-team (Kanban): Increase visibility of auto-generated tasks for RAID errors - https://phabricator.wikimedia.org/T216133 (Bstorm)
[01:30:27] (PS11) Dzahn: puppetmaster/configmaster: convert from apache to httpd module [puppet] - https://gerrit.wikimedia.org/r/451821
[01:39:34] (CR) Dzahn: [C: +1] "should be good now. any concerns? could you take another look? https://puppet-compiler.wmflabs.org/compiler1001/14689/" [puppet] - https://gerrit.wikimedia.org/r/451821 (owner: Dzahn)
[01:41:35] (CR) Dzahn: [C: +1] "you said you wanted to think about reorganizing this, right. maybe it would be ok to merge anyways and live with the duplication until we " [puppet] - https://gerrit.wikimedia.org/r/479335 (owner: Dzahn)
[01:43:02] (PS4) Dzahn: noc: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/416751
[01:46:57] (PS5) CDanis: rasdaemon: add hiera overrides for testing on stretch [puppet] - https://gerrit.wikimedia.org/r/490787 (https://phabricator.wikimedia.org/T205396)
[01:47:50] (CR) CDanis: "PCC for a stretch host with hiera, and a stretch and a buster host with no hiera:" [puppet] - https://gerrit.wikimedia.org/r/490787 (https://phabricator.wikimedia.org/T205396) (owner: CDanis)
[01:48:57] Operations, MediaWiki-Database, Performance-Team, Wikimedia-Logstash, and 6 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (aaron) I think Timo backported the change, so it should be live: ` 18:01 krinkle@deploy1001: Synchronized php-1.33.0-wmf.17/inclu...
[01:49:31] (CR) Dzahn: "mediawiki has been converted, so this should be unblocked now, rebased" [puppet] - https://gerrit.wikimedia.org/r/416751 (owner: Dzahn)
[01:58:47] (CR) Dzahn: [C: -1] "not there yet, still need to convert noc::php_engine as well" [puppet] - https://gerrit.wikimedia.org/r/416751 (owner: Dzahn)
[02:09:31] (CR) CDanis: "Oh, also -- I was quite careful to have Puppet revert to defaults if hiera is unset, but also am not sure if I implemented it well. Let m" [puppet] - https://gerrit.wikimedia.org/r/490787 (https://phabricator.wikimedia.org/T205396) (owner: CDanis)
[02:14:44] (PS10) Paladox: Introduce gr-wikimedia-prettify-ci-comments [software/gerrit/plugins/wikimedia] - https://gerrit.wikimedia.org/r/489483 (https://phabricator.wikimedia.org/T215658)
[02:35:01] looks like the cpu has gone up on gerrit again. (dosen't seem to be impacting it)
[02:37:37] (PS1) Paladox: Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797
[02:40:10] (PS2) Paladox: Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797
[02:40:12] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/490797 (owner: Paladox)
[02:41:06] (CR) jerkins-bot: [V: -1] Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797 (owner: Paladox)
[02:41:07] (CR) jerkins-bot: [V: -1] Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797 (owner: Paladox)
[02:41:55] (PS3) Paladox: Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797
[02:43:57] (PS4) Paladox: Gerrit: Remove socket config from log4j [puppet] - https://gerrit.wikimedia.org/r/490797
[02:44:04] (CR) Paladox: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/490797 (owner: Paladox)
[02:49:38] RECOVERY - Logstash rate of ingestion percent change compared to yesterday on icinga2001 is OK: (C)130 ge (W)110 ge 108.3 https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&panelId=2&fullscreen
[04:39:51] Operations, ops-eqiad: Disk failure on labsdb1005 - https://phabricator.wikimedia.org/T216202 (Bstorm) Manually dropped the disk from the RAID because the server is being very bad. The database keeps locking up.
[05:04:46] PROBLEM - MegaRAID on labsdb1005 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded)
[05:04:49] ACKNOWLEDGEMENT - MegaRAID on labsdb1005 is CRITICAL: CRITICAL: 1 failed LD(s) (Partially Degraded) nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T216223
[05:04:54] Operations, ops-eqiad: Degraded RAID on labsdb1005 - https://phabricator.wikimedia.org/T216223 (ops-monitoring-bot)
[06:11:14] (PS2) Marostegui: Revert "dbproxy1011: Depool labsdb1009" [puppet] - https://gerrit.wikimedia.org/r/490636
[06:11:56] (PS1) Marostegui: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/490801 (https://phabricator.wikimedia.org/T210713)
[06:12:31] (CR) Marostegui: [C: +2] Revert "dbproxy1011: Depool labsdb1009" [puppet] - https://gerrit.wikimedia.org/r/490636 (owner: Marostegui)
[06:13:57] !log Reload haproxy on dbproxy11 to repool labsdb1009
[06:13:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:14:26] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/490801 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:15:24] (Merged) jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/490801 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:16:35] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1109 (duration: 00m 49s)
[06:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:48] !log Deploy schema change on db1109 - T210713
[06:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:17:51] T210713: Drop change_tag.ct_tag column in production - https://phabricator.wikimedia.org/T210713
[06:21:00] (CR) jenkins-bot: db-eqiad.php: Depool db1109 [mediawiki-config] - https://gerrit.wikimedia.org/r/490801 (https://phabricator.wikimedia.org/T210713) (owner: Marostegui)
[06:25:27] (PS2) Ammarpad: Increase default thumb size to 260px on Dutch Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/490395 (https://phabricator.wikimedia.org/T215106)
[06:26:07] (PS3) Ammarpad: Set wgArticleCountMethod='any' for zhwikiversity [mediawiki-config] - https://gerrit.wikimedia.org/r/487115
[06:27:33] (PS10) Ammarpad: Add 'Author' namespace in Sanskrit Wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/486221 (https://phabricator.wikimedia.org/T214553)
[06:30:18] labsdb1005 is again with too many connections
[06:30:21] arturo: ^
[06:39:35] !log Restart labsdb1005 with max_user_connections = 20 T216208
[06:39:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:39:38] T216208: ToolsDB overload and cleanup - https://phabricator.wikimedia.org/T216208
[06:40:58] !log Stop puppet on labsdb1005 to leave "max_user_connections" on my.cnf - T216170 T216208
[06:41:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:41:02] T216170: toolsdb - Per-user connection limits - https://phabricator.wikimedia.org/T216170
[07:02:08] (PS1) Giuseppe Lavagetto: profile::services_proxy: make declaring services optional [puppet] - https://gerrit.wikimedia.org/r/490805 (https://phabricator.wikimedia.org/T216164)
[07:05:45] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) https://rocm.github.io/hardware.html is very handy. As said previously, I'd stick with a GFX9 card, I'd say a RX Vega 64 or Radeon VII...
[07:06:49] (PS1) Marostegui: tools.my.cnf: Temporary max_user_connections [puppet] - https://gerrit.wikimedia.org/r/490806 (https://phabricator.wikimedia.org/T216170)
[07:08:11] (CR) Marostegui: [C: +2] tools.my.cnf: Temporary max_user_connections [puppet] - https://gerrit.wikimedia.org/r/490806 (https://phabricator.wikimedia.org/T216170) (owner: Marostegui)
[07:19:35] (CR) Giuseppe Lavagetto: [C: +2] profile::services_proxy: make declaring services optional [puppet] - https://gerrit.wikimedia.org/r/490805 (https://phabricator.wikimedia.org/T216164) (owner: Giuseppe Lavagetto)
[07:19:48] (PS2) Giuseppe Lavagetto: profile::services_proxy: make declaring services optional [puppet] - https://gerrit.wikimedia.org/r/490805 (https://phabricator.wikimedia.org/T216164)
[07:21:34] Operations, ops-eqiad, Data-Services, cloud-services-team: Degraded RAID on labsdb1005 - https://phabricator.wikimedia.org/T216223 (Marostegui)
[07:23:10] Operations, ops-eqiad, Data-Services, cloud-services-team: Degraded RAID on labsdb1005 - https://phabricator.wikimedia.org/T216223 (Marostegui) #cloud-services-team I would suggest you coordinate with @Cmjohnson to get this disk replaced
[07:24:23] Operations, ops-eqiad, Toolforge, cloud-services-team: Degraded RAID on labsdb1005 - https://phabricator.wikimedia.org/T216223 (Marostegui)
[07:36:12] (CR) Alexandros Kosiaris: [C: +2] "Thanks!" [dns] - https://gerrit.wikimedia.org/r/489251 (owner: Alexandros Kosiaris)
[07:36:19] (PS9) Alexandros Kosiaris: Add kubernetes pods PTR records for IPv4 [dns] - https://gerrit.wikimedia.org/r/489251
[07:36:24] (CR) Alexandros Kosiaris: [V: +2 C: +2] Add kubernetes pods PTR records for IPv4 [dns] - https://gerrit.wikimedia.org/r/489251 (owner: Alexandros Kosiaris)
[07:55:40] !log bounce prometheus@ops on prometheus2004 to take a snapshot
[07:55:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:57] that'll cause a bunch of UNKNOWN for some minutes FYI
[08:06:03] Operations, Wikimedia-Logstash, Patch-For-Review, User-herron: Replace and expand Elasticsearch storage in eqiad and upgrade the cluster from Debian jessie to stretch - https://phabricator.wikimedia.org/T213898 (fgiunchedi) On stretch by default we're installing `elasticsearch-curator` from `stre...
[08:17:45] PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100%
[08:17:59] RECOVERY - Host backup2001 is UP: PING OK - Packet loss = 0%, RTA = 0.48 ms
[08:20:03] (CR) Alexandros Kosiaris: [C: +1] Gerrit: comment styles for the pipeline [puppet] - https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: Thcipriani)
[08:20:13] (CR) Alexandros Kosiaris: [C: +2] Gerrit: comment styles for the pipeline [puppet] - https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: Thcipriani)
[08:20:21] (PS2) Alexandros Kosiaris: Gerrit: comment styles for the pipeline [puppet] - https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: Thcipriani)
[08:20:23] (CR) Alexandros Kosiaris: [V: +2 C: +2] Gerrit: comment styles for the pipeline [puppet] - https://gerrit.wikimedia.org/r/490640 (https://phabricator.wikimedia.org/T177868) (owner: Thcipriani)
[08:28:54] !log rolling restart of apertium to pick up Python 3.4 security update
[08:28:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:30:48] (PS1) Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898)
[08:37:19] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - https://gerrit.wikimedia.org/r/490810
[08:37:25] Operations, hardware-requests: GPU for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey)
[08:38:22] Operations, Analytics, Research-management, Patch-For-Review, User-Elukey: GPU upgrade for stats machine - https://phabricator.wikimedia.org/T148843 (elukey) Opened T216226 to discuss hw requirements for the new GPU (everybody interested please subscribe/chime-in!), let's use this task to deb...
[08:38:37] Operations, Analytics, hardware-requests: GPU upgrade for stat1005 - https://phabricator.wikimedia.org/T216226 (elukey)
[08:38:39] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - https://gerrit.wikimedia.org/r/490810 (owner: Marostegui)
[08:39:42] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - https://gerrit.wikimedia.org/r/490810 (owner: Marostegui)
[08:40:40] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1109 (duration: 00m 46s)
[08:40:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:29] !log restart gerrit to pick up https://gerrit.wikimedia.org/r/490640 T177868
[08:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:42:34] T177868: Define pipeline failure developer feedback - https://phabricator.wikimedia.org/T177868
[08:45:29] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1109" [mediawiki-config] - https://gerrit.wikimedia.org/r/490810 (owner: Marostegui)
[08:45:33] PROBLEM - puppet last run on webperf2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[08:45:43] PROBLEM - puppet last run on cumin2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_operations/cookbooks]
[08:48:17] PROBLEM - puppet last run on webperf1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_performance/docroot]
[08:49:32] (CR) Filippo Giunchedi: "LGTM, typo in commit message" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/490695 (https://phabricator.wikimedia.org/T214608) (owner: Herron)
[08:54:07] (PS2) Elukey: Define the zookeeper rmstore path on each Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/490575
[09:02:46] (CR) Muehlenhoff: "The conditional case should not be necessary; mcelog operated on /dev/mcelog and rasdaemon on debugfs, so they can coexist along each othe" [puppet] - https://gerrit.wikimedia.org/r/490787 (https://phabricator.wikimedia.org/T205396) (owner: CDanis)
[09:08:19] (CR) Elukey: [C: +2] Define the zookeeper rmstore path on each Hadoop cluster [puppet] - https://gerrit.wikimedia.org/r/490575 (owner: Elukey)
[09:12:41] (CR) Volans: [C: -1] "There's a typo, looks good otherwise. Please test it on the test instance first." (1 comment) [software/netbox-reports] - https://gerrit.wikimedia.org/r/490700 (owner: CRusnov)
[09:14:56] (PS1) Elukey: Set the Yarn rmstore zookeeper path for the Hadoop testing cluster [puppet] - https://gerrit.wikimedia.org/r/490826
[09:16:05] (CR) Elukey: [C: +2] Set the Yarn rmstore zookeeper path for the Hadoop testing cluster [puppet] - https://gerrit.wikimedia.org/r/490826 (owner: Elukey)
[09:16:38] RECOVERY - puppet last run on webperf2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:16:48] RECOVERY - puppet last run on cumin2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures
[09:19:48] RECOVERY - puppet last run on webperf1001 is OK: OK: Puppet is currently enabled, last run 7 minutes ago with 0 failures
[09:26:10] (CR) Elukey: [C: +2] "I had to follow up with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/490826/, since the rmstore test cluster path was 'rmstore-c" [puppet] - https://gerrit.wikimedia.org/r/490575 (owner: Elukey)
[09:33:20] !log imported php-defaults debs to thirdparty/php72
[09:33:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:44:55] !log repool maps100[12]
[09:44:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:48:12] Operations, Puppet, Discovery-Search, Maps, Patch-For-Review: Fix maps puppet to make sure apt-get update runs after configuration change - https://phabricator.wikimedia.org/T214073 (Gehel) Open→Resolved a:Gehel
[09:55:12] PROBLEM - DPKG on contint1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:58:26] RECOVERY - DPKG on contint1001 is OK: All packages OK
[10:03:24] Operations, MediaWiki-Database, Performance-Team, Wikimedia-Logstash, and 6 others: MediaWiki errors overloading logstash - https://phabricator.wikimedia.org/T215611 (fgiunchedi) Apologies for the late reply -- and thanks to all who have helped in fixing the issue! >>! In T215611#4950057, @bd808...
[10:07:43] (PS1) Elukey: profile::analytics::refinery:job::data_purge: absent mediawiki-geoeditors-drop-month [puppet] - https://gerrit.wikimedia.org/r/490829
[10:07:54] (PS2) ArielGlenn: misc dumps: report names of most recent failed wikis if we bail out [dumps] - https://gerrit.wikimedia.org/r/488261
[10:08:36] (CR) Joal: [C: +1] "Thanks Luca :)" [puppet] - https://gerrit.wikimedia.org/r/490829 (owner: Elukey)
[10:08:43] (CR) jerkins-bot: [V: -1] profile::analytics::refinery:job::data_purge: absent mediawiki-geoeditors-drop-month [puppet] - https://gerrit.wikimedia.org/r/490829 (owner: Elukey)
[10:09:41] sorry jenkins
[10:09:42] fixing
[10:11:13] (PS2) Elukey: profile::analytics::refinery:job::data_purge: absent timer [puppet] - https://gerrit.wikimedia.org/r/490829
[10:12:11] (CR) Elukey: [C: +2] profile::analytics::refinery:job::data_purge: absent timer [puppet] - https://gerrit.wikimedia.org/r/490829 (owner: Elukey)
[10:21:59] (CR) Gehel: [C: -1] "see comments inline." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: Filippo Giunchedi)
[10:33:33] (PS1) GTirloni: cloudvirt1019/1020 - Reimage with Stretch [puppet] - https://gerrit.wikimedia.org/r/490831 (https://phabricator.wikimedia.org/T193264)
[10:33:50] Operations, Discovery-Search, Elasticsearch: cleanup reprepro configuration for elasticsearch-curator - https://phabricator.wikimedia.org/T216235 (Gehel)
[10:35:23] !log reboot cloudvirt1019
[10:35:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:35:57] (PS2) ArielGlenn: showcrcs: util to write out crc information from a bzip2 file [dumps/mwbzutils] - https://gerrit.wikimedia.org/r/490299 (https://phabricator.wikimedia.org/T216009)
[10:42:57] !log upgrade docker on contint1001 to 18.06.2 T216236
[10:42:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:48:36] !log upgrade docker on contint2001 to 18.06.2 T216236
[10:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:32] (PS2) Filippo Giunchedi: logstash: force use elasticsearch-curator 5 [puppet] - https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898)
[10:52:06] (CR) Filippo Giunchedi: logstash: force use elasticsearch-curator 5 (2 comments) [puppet] - https://gerrit.wikimedia.org/r/490809 (https://phabricator.wikimedia.org/T213898) (owner: Filippo Giunchedi)
[10:52:26] Operations, Discovery-Search, Elasticsearch, User-fgiunchedi: cleanup reprepro configuration for elasticsearch-curator - https://phabricator.wikimedia.org/T216235 (fgiunchedi)
[10:54:14] (PS3) Muehlenhoff: Drop requires_os checks for trusty [puppet] - https://gerrit.wikimedia.org/r/489625
[10:54:57] (PS1) Alexandros Kosiaris: Upgrade docker in CI [puppet] - https://gerrit.wikimedia.org/r/490832 (https://phabricator.wikimedia.org/T216236)
[10:56:00] (CR) Alexandros Kosiaris: [C: +2] Upgrade docker in CI [puppet] - https://gerrit.wikimedia.org/r/490832 (https://phabricator.wikimedia.org/T216236) (owner: Alexandros Kosiaris)
[10:56:02] (CR) Muehlenhoff: [C: +2] Drop requires_os checks for trusty [puppet] - https://gerrit.wikimedia.org/r/489625 (owner: Muehlenhoff)
[10:56:30] Operations, Traffic, VisualEditor, Wikimedia-Apache-configuration: Visual Editor gets stuck opening article (net::ERR_SPDY_PROTOCOL_ERROR 200/Loading failed for the