[00:00:04] MaxSem, RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Evening SWAT (Max 6 patches). Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191108T0000).
[00:00:04] Zoranzoki21: A patch you scheduled for Evening SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker.
[00:00:12] (PS3) Catrope: GrowthExperiments: Enable suggested edits, but as opt-in only [mediawiki-config] - https://gerrit.wikimedia.org/r/547856 (https://phabricator.wikimedia.org/T236968)
[00:00:25] I'll do the SWAT
[00:00:26] Here for SWAT
[00:02:28] (PS7) Jeena Huneidi: Modify Restrouter chart to allow for minikube development [deployment-charts] - https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910)
[00:03:39] PROBLEM - Backup freshness on backup1001 is CRITICAL: All failures: 2 (labtestpuppetmaster2001, ...), Fresh: 93 jobs https://wikitech.wikimedia.org/wiki/Backups%23Monitoring
[00:04:40] (CR) Jeena Huneidi: Modify Restrouter chart to allow for minikube development (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: Jeena Huneidi)
[00:06:57] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:08:31] (CR) Catrope: [C: +2] GrowthExperiments: Enable suggested edits, but as opt-in only [mediawiki-config] - https://gerrit.wikimedia.org/r/547856 (https://phabricator.wikimedia.org/T236968) (owner: Catrope)
[00:08:55] (Merged) jenkins-bot: GrowthExperiments: Enable suggested edits, but as opt-in only [mediawiki-config] - https://gerrit.wikimedia.org/r/547856 (https://phabricator.wikimedia.org/T236968) (owner: Catrope)
[00:10:25] Operations, ops-codfw, DBA: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Papaul)
[00:10:55] Operations, ops-codfw, DBA: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Papaul) p:Triage→High
[00:12:36] (CR) Jeena Huneidi: Modify Restrouter chart to allow for minikube development (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/545421 (https://phabricator.wikimedia.org/T228910) (owner: Jeena Huneidi)
[00:16:03] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[00:17:52] how's swat going?
[00:18:00] still waiting for jenkins?
[00:19:16] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable suggested edits as hidden preference on arwiki, cswiki, kowiki, viwiki (T236968) (duration: 00m 53s)
[00:19:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:19:22] T236968: Newcomer tasks: hidden preference - https://phabricator.wikimedia.org/T236968
[00:21:04] Operations, ops-codfw: find horizontal PDUs in codfw - https://phabricator.wikimedia.org/T221153 (RobH) Open→Resolved these are being sold off on T235792
[00:23:13] (PS1) Catrope: GrowthExperiments: Add remote API config that should not be needed but is [mediawiki-config] - https://gerrit.wikimedia.org/r/549678
[00:23:21] (CR) Catrope: [C: +2] GrowthExperiments: Add remote API config that should not be needed but is [mediawiki-config] - https://gerrit.wikimedia.org/r/549678 (owner: Catrope)
[00:24:16] (Merged) jenkins-bot: GrowthExperiments: Add remote API config that should not be needed but is [mediawiki-config] - https://gerrit.wikimedia.org/r/549678 (owner: Catrope)
[00:24:18] Zoranzoki21: Your semicolon patch is on mwdebug1001, please test
[00:24:47] Oh
[00:24:54] It works as should be
[00:25:39] Operations, hardware-requests: eqiad+codfw: 6x hardware request for swift backend (each site) - https://phabricator.wikimedia.org/T227314 (RobH)
[00:26:38] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Fix remote API configs for GrowthExperiments (duration: 00m 51s)
[00:26:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:12] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.5/resources/: Semicolon should appear after log entries (T237500) (duration: 00m 53s)
[00:34:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:34:16] T237500: Regression: [1.35.0-wmf.5] Classic RC lacks space between "(User creation log)"/"(Move log)"/"(Page translation log)" string and date - https://phabricator.wikimedia.org/T237500
[00:37:05] RoanKattouw: all done swatting?
[00:39:09] twentyafterfour: Almost, sorry
[00:40:18] Two more syncs
[00:40:45] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.5/includes: Sync DiffEngine changes that were needed to unbreak CI (duration: 00m 55s)
[00:40:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:42:16] RoanKattouw: no problem
[00:44:58] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.5/extensions/GrowthExperiments/modules/homepage/ext.growthExperiments.Homepage.Logging.js: Fix homepage instrumentation (T237600) (duration: 00m 52s)
[00:45:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:45:03] T237600: Homepage: unable to distinguish between clicks and impressions on mobile - https://phabricator.wikimedia.org/T237600
[00:45:07] Will you have time to sync and this? https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/549676/
[00:45:58] (PS1) 4nn1l2: Change the language of Votewiki back to English [mediawiki-config] - https://gerrit.wikimedia.org/r/549680
[00:46:36] I mean https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/549681/
[00:47:10] Zoranzoki21: On it
[00:47:14] +2ed the master patch too
[00:47:29] (PS2) 4nn1l2: Change the language of Votewiki back to English Bug:T230614 [mediawiki-config] - https://gerrit.wikimedia.org/r/549680
[00:47:32] +2 added already by VolkerE
[00:47:52] Oh hah I missed that
[00:48:18] No problem, nothing bad will not happen :D
[00:49:20] (PS3) 4nn1l2: Change the language of Votewiki back to English [mediawiki-config] - https://gerrit.wikimedia.org/r/549680
[00:57:38] (PS1) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[01:00:14] (CR) Krinkle: [C: +1] "MW isn't separable into endpoints like that. Most any code path can be called from any url or app server in some edge case. As far as I'm " [mediawiki-config] - https://gerrit.wikimedia.org/r/548923 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[01:01:42] (CR) Krinkle: [C: -1] "See CR at https://gerrit.wikimedia.org/r/548923." [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[01:02:02] (CR) Krinkle: [C: -1] "(-1 because I think this is not meant temporary, but feel free to override if only meant as a short-term measure)" [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[01:04:13] shdubsh: still around?
[01:04:24] Yep
[01:04:30] (CR) Krinkle: "We may want to do https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/548923/ first given the wpt* servers with rest.php enab" [mediawiki-config] - https://gerrit.wikimedia.org/r/549219 (https://phabricator.wikimedia.org/T237555) (owner: Tim Starling)
[01:04:36] I'm gonna try that deploy again if you don't mind
[01:05:49] (PS1) Huji: Revert "Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections" [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614)
[01:06:04] go for it!
[01:06:45] !log twentyafterfour@deploy1001 Started deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic
[01:06:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:07:59] (PS2) Huji: Revert "Change the language of Votewiki to Persian (fa) temporarily for the annual ArbCom elections" [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614)
[01:09:14] (PS3) Huji: Change the language of Votewiki back to English (en) [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614)
[01:09:50] !log twentyafterfour@deploy1001 Finished deploy [releng/phatality@11d4ad8]: deploying one more time, hopefully without killing elastic (duration: 03m 04s)
[01:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:11:04] hmm so deploy finished ok
[01:12:16] something went funny on 1007. checking now
[01:12:25] Oh I forgot to deploy Zoranzoki's second patch
[01:12:38] PROBLEM - Check systemd state on logstash1007 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[01:12:50] PROBLEM - ElasticSearch health check for shards on 9200 on logstash1007 is CRITICAL: CRITICAL - elasticsearch http://localhost:9200/_cluster/health error while fetching: HTTPConnectionPool(host=localhost, port=9200): Max retries exceeded with url: /_cluster/health (Caused by NewConnectionError(requests.packages.urllib3.connection.HTTPConnection object at 0x7f4dd1c9e358: Failed to establish a new connection: [Errno 111] Connection
[01:12:50] ://wikitech.wikimedia.org/wiki/Search%23Administration
[01:14:56] !log catrope@deploy1001 Synchronized php-1.35.0-wmf.5/resources/: Use internationalized semicolon separators (T233649) (duration: 00m 53s)
[01:15:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:15:01] T233649: [AMC] Stray semicolon in RecentChanges and Watchlist interface - https://phabricator.wikimedia.org/T233649
[01:17:39] (CR) 4nn1l2: "Duplicate of 549680" [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614) (owner: Huji)
[01:18:25] jouncebot: now
[01:18:25] No deployments scheduled for the next 70 hour(s) and 41 minute(s)
[01:18:28] jouncebot: next
[01:18:29] In 70 hour(s) and 41 minute(s): Veteran's Day (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20191111T0000)
[01:19:12] RECOVERY - ElasticSearch health check for shards on 9200 on logstash1007 is OK: OK - elasticsearch status production-logstash-eqiad: active_shards_percent_as_number: 100.0, task_max_waiting_in_queue_millis: 0, status: green, delayed_unassigned_shards: 0, number_of_nodes: 6, active_primary_shards: 267, number_of_in_flight_fetch: 0, timed_out: False, relocating_shards: 0, cluster_name: production-logstash-eqiad, initializing_shards
[01:19:12] s: 652, number_of_data_nodes: 3, unassigned_shards: 0, number_of_pending_tasks: 0 https://wikitech.wikimedia.org/wiki/Search%23Administration
[01:21:16] twentyafterfour: ES looks good now. I'll file a task about making some headroom for those deploys.
[01:21:42] cool, thanks shdubsh
[01:23:55] !log reedy@deploy1001 Synchronized php-1.35.0-wmf.5/extensions/GlobalBlocking/: Prevent some extra db queries (duration: 00m 53s)
[01:23:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:27:42] Operations, observability: Phatality deployments invoke oom-killer on logstash::collector nodes. - https://phabricator.wikimedia.org/T237706 (colewhite)
[01:30:50] (PS1) Dzahn: wikistats: change mysql datadir to /var/lib/mysql [puppet] - https://gerrit.wikimedia.org/r/549686
[01:33:07] (CR) Dzahn: [C: +2] "just a cloud VPS project instance moving to buster. we got this issue before with mariadb-server when importing data" [puppet] - https://gerrit.wikimedia.org/r/549686 (owner: Dzahn)
[01:37:00] (CR) Zoranzoki21: [C: -1] "Duplicate of https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/549685/" [mediawiki-config] - https://gerrit.wikimedia.org/r/549680 (owner: 4nn1l2)
[01:48:27] (CR) 4nn1l2: "> Patch Set 3: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/549680 (owner: 4nn1l2)
[01:49:37] (PS4) 4nn1l2: Change the language of Votewiki back to English [mediawiki-config] - https://gerrit.wikimedia.org/r/549680
[02:07:12] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:09:35] Operations, observability: Phatality deployments invoke oom-killer on logstash::collector nodes. - https://phabricator.wikimedia.org/T237706 (mmodell) An alternative might be to rewrite the install script in ~2 lines of bash. Something like: `lang=bash #!/bin/bash unzip /path/to/phatality-version.zip -o...
[02:24:52] (PS1) Ayounsi: Add PIM stanza for CR devices [homer/public] - https://gerrit.wikimedia.org/r/549689
[02:30:59] (PS1) Ayounsi: CR: add apply-groups [ re0 re1 ]; if multiple REs [homer/public] - https://gerrit.wikimedia.org/r/549690
[02:34:22] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[02:35:19] (CR) Vgutierrez: [C: +2] prometheus: Provide global aggregation rules for trafficserver requests [puppet] - https://gerrit.wikimedia.org/r/548954 (https://phabricator.wikimedia.org/T236482) (owner: Vgutierrez)
[02:36:16] PROBLEM - OSPF status on cr1-codfw is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:36:58] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 269, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:40:00] (PS2) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[02:41:10] (CR) jerkins-bot: [V: -1] prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744) (owner: CDanis)
[02:41:48] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 271, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[02:42:06] (PS1) Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp3052 [puppet] - https://gerrit.wikimedia.org/r/549691 (https://phabricator.wikimedia.org/T231627)
[02:42:08] (PS1) Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp3052 [puppet] - https://gerrit.wikimedia.org/r/549692 (https://phabricator.wikimedia.org/T231627)
[02:42:18] !log Switch from nginx to ats-tls on cp3052 - T231627
[02:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:42:24] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627
[02:42:42] RECOVERY - OSPF status on cr1-codfw is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[02:43:26] (CR) Vgutierrez: [C: +2] hiera: Move nginx from port 443 to port 4443 on cp3052 [puppet] - https://gerrit.wikimedia.org/r/549691 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[02:45:27] (CR) Vgutierrez: [C: +2] hiera: Move ats-tls from port 8443 to port 443 on cp3052 [puppet] - https://gerrit.wikimedia.org/r/549692 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[02:45:50] (PS2) Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp3052 [puppet] - https://gerrit.wikimedia.org/r/549692 (https://phabricator.wikimedia.org/T231627)
[02:48:46] PROBLEM - HTTPS Unified ECDSA on cp3052 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused https://wikitech.wikimedia.org/wiki/HTTPS
[02:49:39] ^^ expected
[02:50:22] RECOVERY - HTTPS Unified ECDSA on cp3052 is OK: SSL OK - OCSP staple validity for en.wikipedia.org has 578803 seconds left:Certificate *.wikipedia.org contains all required SANs:Certificate *.wikipedia.org (ECDSA) valid until 2020-10-06 12:00:00 +0000 (expires in 333 days) https://wikitech.wikimedia.org/wiki/HTTPS
[02:56:41] Operations, Traffic, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez)
[03:02:56] (PS1) Vgutierrez: hiera: Move nginx from port 443 to port 4443 on cp3054 [puppet] - https://gerrit.wikimedia.org/r/549694 (https://phabricator.wikimedia.org/T231627)
[03:02:58] (PS1) Vgutierrez: hiera: Move ats-tls from port 8443 to port 443 on cp3054 [puppet] - https://gerrit.wikimedia.org/r/549695 (https://phabricator.wikimedia.org/T231627)
[03:03:42] !log Switch from nginx to ats-tls on cp3054 - T231627
[03:03:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:03:47] T231627: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627
[03:04:32] (CR) Vgutierrez: [C: +2] hiera: Move nginx from port 443 to port 4443 on cp3054 [puppet] - https://gerrit.wikimedia.org/r/549694 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[03:05:34] (CR) Vgutierrez: [C: +2] hiera: Move ats-tls from port 8443 to port 443 on cp3054 [puppet] - https://gerrit.wikimedia.org/r/549695 (https://phabricator.wikimedia.org/T231627) (owner: Vgutierrez)
[03:20:32] Operations, Traffic, Patch-For-Review: Move cache text cluster from nginx to ats-tls - https://phabricator.wikimedia.org/T231627 (Vgutierrez)
[03:24:06] (PS3) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[03:28:35] Operations, Traffic, observability: Add ats-tls status and availability graphs to frontend-traffic - https://phabricator.wikimedia.org/T236482 (Vgutierrez) Open→Resolved a:Vgutierrez atls-tls availability panel is now ready: https://grafana.wikimedia.org/d/000000479/frontend-traffic?refre...
[04:17:12] RECOVERY - Check systemd state on labtestpuppetmaster2001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:22:00] PROBLEM - Check systemd state on labtestpuppetmaster2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:33:25] (CR) Subramanya Sastry: "> Patch Set 2: Code-Review-1" [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[04:39:57] (CR) Krinkle: [C: -1] "OK. Sounds good to me. Syncing the memory_limit (or making it enabled at run-time by Parsoid itself through a separate config var) is imho" [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[04:57:58] Operations, Parsoid-PHP, serviceops, Patch-For-Review: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Krinkle) Summary from various comments in code review: >>! From ("Bump wgMemoryLimit from 660MB to 760MB") > >>>! Krinkle wr...
[04:58:02] subbu: ^
[05:06:44] PROBLEM - Memory correctable errors -EDAC- on mw1239 is CRITICAL: 4.001 ge 4 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=mw1239&var-datasource=eqiad+prometheus/ops
[05:28:48] (CR) DLynch: "Would it be viable to just set the default in VisualEditor's extension.js instead?" [mediawiki-config] - https://gerrit.wikimedia.org/r/549633 (https://phabricator.wikimedia.org/T229074) (owner: Bartosz Dziewoński)
[05:32:08] RECOVERY - Memory correctable errors -EDAC- on elastic1018 is OK: (C)4 ge (W)2 ge 1 https://wikitech.wikimedia.org/wiki/Monitoring/Memory%23Memory_correctable_errors_-EDAC- https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=elastic1018&var-datasource=eqiad+prometheus/ops
[05:41:24] (PS1) Ammarpad: Remove old config settings for DPL [mediawiki-config] - https://gerrit.wikimedia.org/r/549697 (https://phabricator.wikimedia.org/T237698)
[05:58:26] PROBLEM - OSPF status on cr2-knams is CRITICAL: OSPFv2: 5/6 UP : OSPFv3: 5/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:58:48] PROBLEM - OSPF status on cr1-eqiad is CRITICAL: OSPFv2: 4/6 UP : OSPFv3: 4/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[05:59:08] PROBLEM - BFD status on cr1-eqiad is CRITICAL: CRIT: Down: 3 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:00:02] RECOVERY - OSPF status on cr2-knams is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:00:24] RECOVERY - OSPF status on cr1-eqiad is OK: OSPFv2: 6/6 UP : OSPFv3: 6/6 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status
[06:00:44] RECOVERY - BFD status on cr1-eqiad is OK: OK: UP: 11 AdminDown: 0 Down: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status
[06:42:00] Operations, observability: Phatality deployments invoke oom-killer on logstash::collector nodes. - https://phabricator.wikimedia.org/T237706 (MoritzMuehlenhoff) p:Triage→Normal
[06:51:15] Operations, SRE-Access-Requests: Requesting access to restricted production access for Nikki Nikkhoui - https://phabricator.wikimedia.org/T237689 (MoritzMuehlenhoff) We use group-based access control and based on the description that means membership in the "restricted" group. I see that you've already "...
[06:53:55] Operations, SRE-Access-Requests: Requesting access to restricted production access for Nikki Nikkhoui - https://phabricator.wikimedia.org/T237689 (MoritzMuehlenhoff) p:Triage→Normal
[07:00:54] (PS2) Elukey: analytics::refinery::job::data_purge: absent data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549612 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:00:55] (PS3) Elukey: analytics::refinery::job::data_purge: absent data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549612 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:04:58] (CR) Elukey: [C: +2] analytics::refinery::job::data_purge: absent data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549612 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:06:07] (PS2) Elukey: analytics::refinery::job::data_purge: remove data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549615 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:06:14] (PS3) Elukey: analytics::refinery::job::data_purge: remove data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549615 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:08:53] (CR) Elukey: [C: +2] analytics::refinery::job::data_purge: remove data quality deletion [puppet] - https://gerrit.wikimedia.org/r/549615 (https://phabricator.wikimedia.org/T235486) (owner: Mforns)
[07:32:27] (PS1) Muehlenhoff: Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689)
[07:34:04] (CR) jerkins-bot: [V: -1] Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689) (owner: Muehlenhoff)
[07:41:11] (PS2) Muehlenhoff: Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689)
[07:42:40] (CR) jerkins-bot: [V: -1] Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689) (owner: Muehlenhoff)
[07:45:57] (PS3) Muehlenhoff: Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689)
[07:48:05] (CR) jerkins-bot: [V: -1] Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689) (owner: Muehlenhoff)
[07:50:31] (PS4) Muehlenhoff: Add Nikki Nikkhoui to the restricted group [puppet] - https://gerrit.wikimedia.org/r/549700 (https://phabricator.wikimedia.org/T237689)
[08:00:57] Operations: Remove old builds on package builder - https://phabricator.wikimedia.org/T237713 (MoritzMuehlenhoff)
[08:01:06] Operations: Remove old builds on package builder - https://phabricator.wikimedia.org/T237713 (MoritzMuehlenhoff) p:Triage→Low
[08:01:43] Operations, User-jbond: Boron disk space alert - https://phabricator.wikimedia.org/T237649 (MoritzMuehlenhoff) Open→Resolved a:MoritzMuehlenhoff I removed a bunch of old data (e.g. trusty leftovers), we now have over 70G free diskspace again and I filed https://phabricator.wikimedia.org/T2377...
[08:03:05] (PS1) Giuseppe Lavagetto: phabricator: proxy websockets at the httpd level [puppet] - https://gerrit.wikimedia.org/r/549703
[08:06:19] Operations, Wikimedia-Logstash, observability, Core Platform Team Legacy (Watching / External), Services (watching): Reduce the number of fields declared in elasticsearch by logstash - https://phabricator.wikimedia.org/T180051 (dcausse) @Eevans I think this is the way forward, as for schemas...
[08:07:05] !log installing fribidi security updates on Buster
[08:07:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:08:02] Operations, Parsoid-PHP, serviceops, Patch-For-Review: wt2html: Out of memory crashers - https://phabricator.wikimedia.org/T236833 (Joe) I think it's practical, in the current moment and in general, to be able to vary the memory limits by payload and /or cluster. Not all clusters have the same s...
[08:10:31] (CR) Giuseppe Lavagetto: "> OK. Sounds good to me. Syncing the memory_limit (or making it" [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[08:17:43] (Abandoned) Urbanecm: Change the language of Votewiki back to English [mediawiki-config] - https://gerrit.wikimedia.org/r/549680 (owner: 4nn1l2)
[08:23:58] !log restart kafka on kafka-jumbo1001 to test the new openjdk
[08:24:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:24:26] (CR) Ema: Set up cache routing for schema.wikimedia.org (2 comments) [puppet] - https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) (owner: Ottomata)
[08:31:30] !log elukey@cumin1001 START - Cookbook sre.druid.roll-restart-workers
[08:31:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:31:56] (CR) Ema: phabricator: proxy websockets at the httpd level (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549703 (owner: Giuseppe Lavagetto)
[08:38:41] Operations, ops-codfw, DBA: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (jcrespo) Yes, the rack proposal seems ok.
[08:52:53] !log stop and upgrade db1124 (may create temporary lag on wikireplicas)
[08:52:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:34] !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
[08:57:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:13] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9562 and previous config saved to /var/cache/conftool/dbconfig/20191108-090012-jynus.json
[09:00:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:04:52] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9563 and previous config saved to /var/cache/conftool/dbconfig/20191108-090451-jynus.json
[09:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:05:38] !log elukey@cumin1001 START - Cookbook sre.druid.roll-restart-workers
[09:05:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:06:17] I am roll restarting druid public, that is the cluster that AQS uses for edit-related api calls. In theory there is a pool/repool step and no alert should fire
[09:06:42] but if any AQS alert does fire, it is almost surely due to me restarting
[09:07:49] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1106 fully', diff saved to https://phabricator.wikimedia.org/P9564 and previous config saved to /var/cache/conftool/dbconfig/20191108-090746-jynus.json
[09:07:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:32] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[09:08:42] !log installing Java security updates on kafka-jumbo
[09:08:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:10:11] !log update and restart db1106
[09:10:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:13:52] Operations, Packaging, serviceops: Build and upload envoy 1.12.0 package. - https://phabricator.wikimedia.org/T237235 (Joe) a:Joe
[09:16:32] Operations, DBA: Switchover s1 primary database master db1067 -> db1083 - 14th Nov 05:00 - 05:30 UTC - https://phabricator.wikimedia.org/T234800 (jcrespo) Hi, a reminder this was scheduled well in advance, provably announcement was already done, but I thought it was worth giving a reminder here one week...
[09:19:38] Operations: Track remaining jessie systems in production - https://phabricator.wikimedia.org/T224549 (MoritzMuehlenhoff)
[09:23:52] (PS1) Elukey: sre.druid.roll-restart-workers: add log before the restarts happen [cookbooks] - https://gerrit.wikimedia.org/r/549812
[09:26:22] (CR) Muehlenhoff: "No, I meant the OpenStack backend for" [puppet] - https://gerrit.wikimedia.org/r/548439 (owner: Dzahn)
[09:27:36] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db1106 at 10%', diff saved to https://phabricator.wikimedia.org/P9565 and previous config saved to /var/cache/conftool/dbconfig/20191108-092735-jynus.json
[09:27:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:28:09] (PS2) Ema: phabricator: proxy websockets at the httpd level [puppet] - https://gerrit.wikimedia.org/r/549703 (owner: Giuseppe Lavagetto)
[09:29:43] !log update and restart db2094
[09:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:31:44] (PS1) Muehlenhoff: Switch dumpsdata1003 to Buster [puppet] - https://gerrit.wikimedia.org/r/549813
[09:32:59] (CR) Ema: [C: +2] "wss replaced with ws in ProxyPass." [puppet] - https://gerrit.wikimedia.org/r/549703 (owner: Giuseppe Lavagetto)
[09:33:12] (CR) Ema: [C: +2] phabricator: proxy websockets at the httpd level [puppet] - https://gerrit.wikimedia.org/r/549703 (owner: Giuseppe Lavagetto)
[09:34:20] (CR) ArielGlenn: [C: +1] "Please do!" [puppet] - https://gerrit.wikimedia.org/r/549813 (owner: Muehlenhoff)
[09:35:47] !log elukey@cumin1001 END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0)
[09:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:36:26] (CR) Muehlenhoff: [C: +2] Switch dumpsdata1003 to Buster [puppet] - https://gerrit.wikimedia.org/r/549813 (owner: Muehlenhoff)
[09:39:02] (CR) Alexandros Kosiaris: "Wikia has been kind enough to create https://github.com/Wikia/poolcounter-prometheus-exporter" [puppet] - https://gerrit.wikimedia.org/r/549668 (owner: RLazarus)
[09:39:59] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db1106 at 50%', diff saved to https://phabricator.wikimedia.org/P9566 and previous config saved to /var/cache/conftool/dbconfig/20191108-093958-jynus.json
[09:40:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:02] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db2072', diff saved to https://phabricator.wikimedia.org/P9567 and previous config saved to /var/cache/conftool/dbconfig/20191108-094100-jynus.json
[09:41:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:26] !log update and restart db2072
[09:41:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:41:34] (CR) Mobrovac: profile::mediawiki::httpd: set a SERVERGROUP env variable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: Giuseppe Lavagetto)
[09:43:47] (PS1) Muehlenhoff: Remove now obsolete openstack-jessie-bpo filter [puppet] - https://gerrit.wikimedia.org/r/549814
[09:44:07] Operations, netops, observability: Determine & implement near-term method for escalating network alerts - https://phabricator.wikimedia.org/T237587 (Volans) I'd rather not do (3), seems a step back (not respecting awake hours and such). Regarding (1) we already have a proposal from last SRE summit,...
[09:44:49] (PS11) Hashar: rsync: readd incoming and outgoing chmod [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890)
[09:45:04] (CR) Hashar: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[09:47:07] (CR) Giuseppe Lavagetto: profile::mediawiki::httpd: set a SERVERGROUP env variable (1 comment) [puppet] - https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: Giuseppe Lavagetto)
[09:48:22] (CR) Hashar: "Attached to T237707" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[09:48:43] (PS1) Jbond: package_builder: clean up build directory [puppet] - https://gerrit.wikimedia.org/r/549815 (https://phabricator.wikimedia.org/T237713)
[09:49:03] (PS1) Ema: ATS: map phabricator ws to TLS encrypted wss [puppet] - https://gerrit.wikimedia.org/r/549816 (https://phabricator.wikimedia.org/T210411)
[09:49:05] (PS1) Ema: phabricator: include uri path in ProxyPass directive [puppet] - https://gerrit.wikimedia.org/r/549817 (https://phabricator.wikimedia.org/T210411)
[09:50:52] !log uploaded openjdk 8u232-b09-1~deb10u1 to component/jdk8 for apt.wikimedia.org/buster-wikimedia
[09:50:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:51:04] (CR) Ema: [C: +2] ATS: map phabricator ws to TLS encrypted wss [puppet] - https://gerrit.wikimedia.org/r/549816 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[09:51:58] (CR) Ema: [C: +2] phabricator: include uri path in ProxyPass directive [puppet] - https://gerrit.wikimedia.org/r/549817 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[09:53:49] Operations, DBA: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (jcrespo) db2072 got stuck on `Loading initial ramdisk ...`
[09:55:44] (PS1) Ema: ATS: fix typo in phabricator wss remap rule [puppet] - https://gerrit.wikimedia.org/r/549818 (https://phabricator.wikimedia.org/T210411)
[09:57:02] (CR) Ema: [C: +2] ATS: fix typo in phabricator wss remap rule [puppet] - https://gerrit.wikimedia.org/r/549818 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[10:01:29] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2072', diff saved to https://phabricator.wikimedia.org/P9568 and previous config saved to /var/cache/conftool/dbconfig/20191108-100128-jynus.json
[10:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:11] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db2071', diff saved to https://phabricator.wikimedia.org/P9569 and previous config saved to /var/cache/conftool/dbconfig/20191108-100310-jynus.json
[10:03:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:03:26] (CR) Alexandros Kosiaris: [C: -1] package_builder: clean up build directory (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549815 (https://phabricator.wikimedia.org/T237713) (owner: Jbond)
[10:04:55] Operations, Patch-For-Review: Remove old builds on package builder - https://phabricator.wikimedia.org/T237713 (akosiaris) >All builds done on boron end up in /var/cache/pbuilder/result/* and we don't expire debs from there. The build dirs are on /var/cache/pbuilder/build. The result/ subdir is where we...
[10:06:23] !log update and restart db2071
[10:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:09:38] !log restart jvm-based hadoop daemons on an-master100[1,2] to pick up the new openjdk version
[10:09:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:18:00] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2071, depool db2092', diff saved to https://phabricator.wikimedia.org/P9570 and previous config saved to /var/cache/conftool/dbconfig/20191108-101759-jynus.json
[10:18:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:19:01] !log update and restart db2092
[10:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:25:01] (PS1) Ema: phabricator: allow websockets via tls terminator [puppet] - https://gerrit.wikimedia.org/r/549821 (https://phabricator.wikimedia.org/T210411)
[10:25:42] (CR) Vgutierrez: [C: +1] phabricator: allow websockets via tls terminator [puppet] - https://gerrit.wikimedia.org/r/549821 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[10:25:50] (PS1) Jcrespo: Revert "wikireplica_analytics: Change query killer from 4h to 1h" [puppet] - https://gerrit.wikimedia.org/r/549822
[10:26:13] (CR) Ema: [C: +2] phabricator: allow websockets via tls terminator [puppet] - https://gerrit.wikimedia.org/r/549821 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[10:28:20] (CR) Jcrespo: [C: +2] Revert "wikireplica_analytics: Change query killer from 4h to 1h" [puppet] - https://gerrit.wikimedia.org/r/549822 (owner: Jcrespo)
[10:28:41] (PS2) Jcrespo: Revert "wikireplica_analytics: Change query killer from 4h to 1h" [puppet] - https://gerrit.wikimedia.org/r/549822
[10:30:01] ^heads up arturo
[10:32:19] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2092, depool db2103', diff saved to https://phabricator.wikimedia.org/P9571 and previous config saved to /var/cache/conftool/dbconfig/20191108-103218-jynus.json
[10:32:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:04] (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/549585 (owner: Elukey)
[10:33:31] (CR) Volans: [C: +1] "LGTM" [cookbooks] - https://gerrit.wikimedia.org/r/549812 (owner: Elukey)
[10:33:32] !log enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephosd[1-3] T224188
[10:33:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:33:37] T224188: rack/setup/install (3) new osd ceph nodes - https://phabricator.wikimedia.org/T224188
[10:34:01] !log enable IPMI `racadm set iDRAC.IPMILan.Enable 1` on cloudcephmon[1-3] T228102
[10:34:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:34:06] T228102: rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102
[10:34:34] (PS1) Giuseppe Lavagetto: envoy-tls-local-proxy: require configuration of the admin endpoint [docker-images/production-images] - https://gerrit.wikimedia.org/r/549825 (https://phabricator.wikimedia.org/T237234)
[10:38:22] !log update and restart db2103
[10:38:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:39:57] (CR) Hashar: "> The build issues are the result of newer versions of Pylint and Mypy. Pylint 2.3.1 which I used found not issues. Pylint 2.4.3 reported " (14 comments) [software/cassandra-table-properties] - https://gerrit.wikimedia.org/r/524921 (https://phabricator.wikimedia.org/T220246) (owner: Holger Knust)
[10:50:14] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2103, depool db2116', diff saved to https://phabricator.wikimedia.org/P9572 and previous config saved to /var/cache/conftool/dbconfig/20191108-105013-jynus.json
[10:50:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:51:13] !log update and restart db2116
[10:51:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:52:43] jynus: ?
[10:52:58] https://gerrit.wikimedia.org/r/c/operations/puppet/+/549822
[10:53:23] https://phabricator.wikimedia.org/T233986#5551982
[10:55:22] jynus: ACK will let my team know
[10:55:34] thanks
[10:56:04] it could be related for the other issue, if 10.1.42 fixhes the crases on toolsdb
[10:58:21] (PS1) BBlack: Add globalsign 2019 unified cert files [puppet] - https://gerrit.wikimedia.org/r/549827 (https://phabricator.wikimedia.org/T237650)
[10:58:23] (PS1) BBlack: Deploy (but not use) new 2019 globalsign unified [puppet] - https://gerrit.wikimedia.org/r/549828 (https://phabricator.wikimedia.org/T237650)
[10:58:33] !log running rebuildItemTerms on 8028 items (T234329)
[10:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[10:58:38] T234329: Investigate, fix and reimport failing item terms into new store - https://phabricator.wikimedia.org/T234329
[11:00:06] (CR) BBlack: [C: +2] Add globalsign 2019 unified cert files [puppet] - https://gerrit.wikimedia.org/r/549827 (https://phabricator.wikimedia.org/T237650) (owner: BBlack)
[11:00:12] \o/
[11:01:01] vgutierrez: can you double check https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/549828/ ?
[11:02:09] doing it right now
[11:02:50] (CR) Vgutierrez: [C: +1] Deploy (but not use) new 2019 globalsign unified [puppet] - https://gerrit.wikimedia.org/r/549828 (https://phabricator.wikimedia.org/T237650) (owner: BBlack)
[11:05:13] (CR) BBlack: [C: +2] "Confirmed OCSP is up and running on the CA's end via crt.sh" [puppet] - https://gerrit.wikimedia.org/r/549828 (https://phabricator.wikimedia.org/T237650) (owner: BBlack)
[11:11:27] !log jynus@cumin1001 dbctl commit (dc=all): 'Repool db2116, depool db2130', diff saved to https://phabricator.wikimedia.org/P9573 and previous config saved to /var/cache/conftool/dbconfig/20191108-111125-jynus.json
[11:11:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:31] Operations, Phabricator: Intermittent DB connectivity problem on phabricator, needs investigation - https://phabricator.wikimedia.org/T163507 (mmodell) Open→Declined This is ancient history.
[11:12:32] (PS1) Ema: phabricator: do not rewrite /ws/ [puppet] - https://gerrit.wikimedia.org/r/549832 (https://phabricator.wikimedia.org/T210411)
[11:12:37] !log update and restart db2130
[11:12:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:13:11] (CR) Elukey: [C: +1] phabricator: do not rewrite /ws/ [puppet] - https://gerrit.wikimedia.org/r/549832 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[11:15:37] (CR) Ema: [C: +2] phabricator: do not rewrite /ws/ [puppet] - https://gerrit.wikimedia.org/r/549832 (https://phabricator.wikimedia.org/T210411) (owner: Ema)
[11:24:39] Amir1: what's the latest on the wb terms migration? are we done? what's left?
[11:25:05] !log jynus@cumin1001 dbctl commit (dc=all): 'repool db2130', diff saved to https://phabricator.wikimedia.org/P9574 and previous config saved to /var/cache/conftool/dbconfig/20191108-112503-jynus.json
[11:25:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:25:09] apergos: it is the wb_terms migration
[11:25:33] we are not done but these were failed because of the deadlocks so I'm rerunning for those only
[11:25:42] ah
[11:26:01] when might i check back again? a week, a few weeks, ...?
[11:27:03] basically once we are writing to the new tables and the existing wb_terms data is copied into them too, it makes sense to start dumping those new tables
[11:27:34] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9575 and previous config saved to /var/cache/conftool/dbconfig/20191108-112733-jynus.json
[11:27:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:30:20] Amir1:
[11:31:31] sorry, I was afk
[11:31:45] btw. we fixed deadlocks issue https://phabricator.wikimedia.org/T234948
[11:32:31] apergos: so, we are 50% done in migrating the data apergos
[11:32:37] ah ok
[11:32:41] the whole wiki is write both
[11:33:02] I'll ask again in about am onth then. no worries
[11:33:17] thanks!
[11:33:21] cool!
[11:41:34] (PS1) Cparle: Turn off redirect on exact search match for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263)
[11:42:23] (CR) jerkins-bot: [V: -1] Turn off redirect on exact search match for commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263) (owner: Cparle)
[11:44:47] (PS2) Cparle: Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263)
[11:45:27] (CR) jerkins-bot: [V: -1] Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263) (owner: Cparle)
[11:46:48] (PS3) Cparle: Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263)
[11:48:29] (PS3) Giuseppe Lavagetto: Also check charts generated by helmfile [deployment-charts] - https://gerrit.wikimedia.org/r/549059
[11:48:30] (PS1) Giuseppe Lavagetto: blubberoid: add telemetry collection support for envoy [deployment-charts] - https://gerrit.wikimedia.org/r/549837 (https://phabricator.wikimedia.org/T237234)
[11:53:31] (PS2) Jbond: package_builder: clean up build directory [puppet] - https://gerrit.wikimedia.org/r/549815 (https://phabricator.wikimedia.org/T237713)
[11:54:11] (CR) Jbond: "updated, thanks" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549815 (https://phabricator.wikimedia.org/T237713) (owner: Jbond)
[11:55:54] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9576 and previous config saved to /var/cache/conftool/dbconfig/20191108-115553-jynus.json
[11:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:01:40] !log jynus@cumin1001 dbctl commit (dc=all): 'Depool db1118 fully', diff saved to https://phabricator.wikimedia.org/P9577 and previous config saved to /var/cache/conftool/dbconfig/20191108-120138-jynus.json
[12:01:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:02:35] !log update and restart db1118
[12:02:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:10:46] (CR) Muehlenhoff: package_builder: clean up build directory (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549815 (https://phabricator.wikimedia.org/T237713) (owner: Jbond)
[12:14:46] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118 at 10%', diff saved to https://phabricator.wikimedia.org/P9578 and previous config saved to /var/cache/conftool/dbconfig/20191108-121444-jynus.json
[12:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:21:02] (CR) Hashar: "I went with a full compile https://puppet-compiler.wmflabs.org/compiler1001/19315/" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[12:25:43] (PS1) Ema: ATS: X-Wikimedia-Debug request routing implementation [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687)
[12:34:24] !log elukey@cumin1001 START - Cookbook sre.cassandra.roll-restart
[12:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:34:39] aqs cluster --^
[12:50:59] (CR) Jbond: [C: +1] "lgtm" [puppet] - https://gerrit.wikimedia.org/r/525220 (https://phabricator.wikimedia.org/T46722) (owner: Muehlenhoff)
[12:53:38] Operations, SRE-tools, puppet-compiler, User-jbond: Puppet compiler: re-add the concurrency option NUM_THREADS - https://phabricator.wikimedia.org/T157002 (jbond) When you say it was removed, can you expand a bit. I see `puppet_compiler/templates/run_wrapper.erb ` was removed which made referenc...
[12:53:56] (CR) Alexandros Kosiaris: [C: -1] Also check charts generated by helmfile (4 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/549059 (owner: Giuseppe Lavagetto)
[13:05:28] (CR) Urbanecm: [C: +2] "needed ASAP" [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614) (owner: Huji)
[13:06:20] (Merged) jenkins-bot: Change the language of Votewiki back to English (en) [mediawiki-config] - https://gerrit.wikimedia.org/r/549685 (https://phabricator.wikimedia.org/T230614) (owner: Huji)
[13:09:00] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: ee2027c: Change the language of Votewiki back to English (en) (T230614) (duration: 00m 54s)
[13:09:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:09:05] T230614: Carry out the 2019 fawiki elections on votewiki - https://phabricator.wikimedia.org/T230614
[13:10:01] Urbanecm: re errors, they happend on mwdebug only at the same time (I think it was you) deployed a related login patch/run script
[13:12:58] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118 at 20%', diff saved to https://phabricator.wikimedia.org/P9580 and previous config saved to /var/cache/conftool/dbconfig/20191108-131257-jynus.json
[13:13:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:15:41] (PS1) Arturo Borrero Gonzalez: toolforge: enable tools-prometheus to monitor by SSH serveral servers [puppet] - https://gerrit.wikimedia.org/r/549844 (https://phabricator.wikimedia.org/T237557)
[13:19:29] jouncebot: now
[13:19:29] No deployments scheduled for the next 58 hour(s) and 40 minute(s)
[13:19:39] (PS4) Jforrester: Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263) (owner: Cparle)
[13:19:49] Doing a beta-only config merge.
[13:20:51] (CR) Jforrester: [C: +2] Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263) (owner: Cparle)
[13:21:23] (PS1) Jbond: puppet_compiler.prepare: Fail fast if the git commands fail [software/puppet-compiler] - https://gerrit.wikimedia.org/r/549845 (https://phabricator.wikimedia.org/T157001)
[13:21:26] (Merged) jenkins-bot: Turn off redirect on exact search match for beta commons [mediawiki-config] - https://gerrit.wikimedia.org/r/549835 (https://phabricator.wikimedia.org/T235263) (owner: Cparle)
[13:23:47] (PS1) Alexandros Kosiaris: md: Globally set lower sync limits [puppet] - https://gerrit.wikimedia.org/r/549847 (https://phabricator.wikimedia.org/T237197)
[13:26:24] (CR) jerkins-bot: [V: -1] md: Globally set lower sync limits [puppet] - https://gerrit.wikimedia.org/r/549847 (https://phabricator.wikimedia.org/T237197) (owner: Alexandros Kosiaris)
[13:27:40] !log elukey@cumin1001 END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0)
[13:27:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:44] Operations, Product-Infrastructure-Team-Backlog, Proton, Core Platform Team Workboards (Clinic Duty Team): Requests to MW 404 when on HTTPS - https://phabricator.wikimedia.org/T202982 (akosiaris) >>! In T202982#5627916, @Pchelolo wrote: > Seems like it's been fixed, the only thing left to be done...
[13:28:20] (CR) Arturo Borrero Gonzalez: [C: +2] "PCC https://puppet-compiler.wmflabs.org/compiler1003/19316/" [puppet] - https://gerrit.wikimedia.org/r/549844 (https://phabricator.wikimedia.org/T237557) (owner: Arturo Borrero Gonzalez)
[13:29:06] (CR) Alexandros Kosiaris: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[13:29:23] (CR) Mobrovac: [C: -2] "-2 per T229074#5647578" [mediawiki-config] - https://gerrit.wikimedia.org/r/549633 (https://phabricator.wikimedia.org/T229074) (owner: Bartosz Dziewoński)
[13:31:10] (PS2) Alexandros Kosiaris: md: Globally set lower sync limits [puppet] - https://gerrit.wikimedia.org/r/549847 (https://phabricator.wikimedia.org/T237197)
[13:37:09] (PS8) Jforrester: Rename DPL extension variable to non-ambiguous name, part 1 [mediawiki-config] - https://gerrit.wikimedia.org/r/548569 (https://phabricator.wikimedia.org/T237698) (owner: Ammarpad)
[13:37:11] (PS2) Jforrester: Rename DPL extension variable to non-ambiguous name, part 2 [mediawiki-config] - https://gerrit.wikimedia.org/r/549666 (https://phabricator.wikimedia.org/T237698) (owner: Ammarpad)
[13:37:13] (PS2) Jforrester: Rename DPL extension variable to non-ambiguous name, part 3 [mediawiki-config] - https://gerrit.wikimedia.org/r/549697 (https://phabricator.wikimedia.org/T237698) (owner: Ammarpad)
[13:37:51] Operations, Product-Infrastructure-Team-Backlog, Proton, Core Platform Team Workboards (Clinic Duty Team): Requests to MW 404 when on HTTPS - https://phabricator.wikimedia.org/T202982 (mobrovac) Open→Resolved a:mobrovac Indeed all's good here.
[13:38:46] (PS1) Arturo Borrero Gonzalez: acme_chief: cloud: fix typo in prometheus_fixup profile declaration [puppet] - https://gerrit.wikimedia.org/r/549854 (https://phabricator.wikimedia.org/T237557)
[13:40:12] !log stop and upgrade percona-server on test host db1114
[13:40:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:42:02] (CR) Arturo Borrero Gonzalez: [C: +2] acme_chief: cloud: fix typo in prometheus_fixup profile declaration [puppet] - https://gerrit.wikimedia.org/r/549854 (https://phabricator.wikimedia.org/T237557) (owner: Arturo Borrero Gonzalez)
[13:48:14] (PS1) Ema: Add dummy globalsign-2019 keys for pcc [labs/private] - https://gerrit.wikimedia.org/r/549856 (https://phabricator.wikimedia.org/T237650)
[13:49:07] (CR) Ema: [V: +2 C: +2] Add dummy globalsign-2019 keys for pcc [labs/private] - https://gerrit.wikimedia.org/r/549856 (https://phabricator.wikimedia.org/T237650) (owner: Ema)
[13:51:29] (PS1) Jbond: puppet-export-facts: use the certificate provided by localcacert [puppet] - https://gerrit.wikimedia.org/r/549857 (https://phabricator.wikimedia.org/T214472)
[13:56:18] Operations, puppet-compiler, Patch-For-Review, User-jbond: puppet: compiler-update-facts error and warning - https://phabricator.wikimedia.org/T214472 (jbond) @aborrero The current script references a file which is only avalible in the CA directory and therefore only available on the puppet mast...
[13:58:17] (PS2) Ema: ATS: X-Wikimedia-Debug request routing implementation [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687)
[13:58:19] (PS1) Ema: labs: add cache_{text,upload} to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/549858
[13:59:07] (PS2) Ema: labs: add cache_{text,upload} to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/549858
[14:04:07] Operations, Machine vision, serviceops, MW-1.35-notes (1.35.0-wmf.8; 2019-11-26), Product-Infrastructure-Team-Backlog (Kanban): Configure Google Cloud Vision credentials in production - https://phabricator.wikimedia.org/T236426 (Mholloway) Open→Resolved
[14:04:11] Operations, Machine vision, Product-Infrastructure-Team-Backlog, serviceops, Patch-For-Review: How should the MachineVision extension interact with external APIs from production? - https://phabricator.wikimedia.org/T236797 (Mholloway)
[14:19:25] (CR) Daniel Kinzler: "> Personally, I'd probably take it slightly slower. Put it to group0, wait a few days (maybe a week), then group1, then all." [mediawiki-config] - https://gerrit.wikimedia.org/r/549449 (https://phabricator.wikimedia.org/T198312) (owner: Daniel Kinzler)
[14:23:31] Operations, ops-codfw, DBA: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (jcrespo)
[14:23:51] PROBLEM - Host backup2001 is DOWN: PING CRITICAL - Packet loss = 100%
[14:24:59] RECOVERY - Host backup2001 is UP: PING OK - Packet loss = 0%, RTA = 36.22 ms
[14:27:36] oh
[14:28:20] papaul: ^potential loos cables while doing maintenance?
[14:28:25] *loose
[14:30:02] actually, no, it restarted
[14:30:27] checking hw logs
[14:33:55] "SYS1003 System CPU Resetting."
[14:34:14] "System is performing a CPU reset because of system power off, power on or a warm reset like CTRL-ALT-DEL."
[14:34:49] ah, but that is old
[14:35:09] no logs today
[14:40:23] !log volans@cumin1001 START - Cookbook sre.hosts.downtime
[14:40:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:41] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[14:42:30] !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[14:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:44:28] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (jcrespo)
[14:45:30] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (jcrespo)
[14:45:38] Operations, DBA, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
[14:48:37] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (jcrespo) a:Papaul Papaul: the only long shot is that it or one of the 2 arrays could have a loose power cable (I know, improbable, specially with redundant power) could I ask you to...
[14:48:48] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (jcrespo) p:Triage→Normal
[14:50:30] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118 at 50%', diff saved to https://phabricator.wikimedia.org/P9581 and previous config saved to /var/cache/conftool/dbconfig/20191108-145028-jynus.json
[14:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:45] (CR) Ottomata: "Giuseppe should I add discovery for schema after all? Ema says yes?" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) (owner: Ottomata)
[14:55:23] (PS1) Phamhi: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961)
[14:55:53] !log volans@cumin1001 START - Cookbook sre.hosts.decommission
[14:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:03] (CR) jerkins-bot: [V: -1] tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[14:56:27] !log volans@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[14:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:56:34] Operations, ops-eqiad, DC-Ops, decommission: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 - https://phabricator.wikimedia.org/T229586 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by volans@cumin1001 for hosts: `cp1072.eqiad.wmnet` - cp1072.eqiad.wmnet (**PASS**)...
[14:58:14] (PS2) Phamhi: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961)
[14:58:51] (CR) jerkins-bot: [V: -1] tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[15:01:28] (PS3) Phamhi: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961)
[15:02:10] (CR) jerkins-bot: [V: -1] tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[15:07:44] (CR) Ottomata: "Ah ok, he just said it isn't strictly needed but check with you all. Will add discovery then..." [puppet] - https://gerrit.wikimedia.org/r/549177 (https://phabricator.wikimedia.org/T233630) (owner: Ottomata)
[15:08:09] RECOVERY - Check the Netbox report puppetdb for fail status. on netbox1001 is OK: puppetdb.PuppetDB OK https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[15:09:01] (PS1) Mholloway: MachineVision: Enable testers-only mode on testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/549866
[15:09:14] (CR) Volans: [C: +1] "LGTM, see inline." (1 comment) [software/puppet-compiler] - https://gerrit.wikimedia.org/r/549845 (https://phabricator.wikimedia.org/T157001) (owner: Jbond)
[15:10:21] (CR) Mholloway: [C: +2] MachineVision: Enable testers-only mode on testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/549866 (owner: Mholloway)
[15:10:48] Krinkle, ty
[15:12:02] (Merged) jenkins-bot: MachineVision: Enable testers-only mode on testcommonswiki [mediawiki-config] - https://gerrit.wikimedia.org/r/549866 (owner: Mholloway)
[15:12:04] (PS4) Phamhi: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961)
[15:13:49] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Enable testers-only mode on testcommonswiki for debugging (duration: 00m 52s)
[15:13:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:14:58] (CR) Volans: [C: -1] "Two minor nits inline, LGTM otherwise." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/549847 (https://phabricator.wikimedia.org/T237197) (owner: Alexandros Kosiaris)
[15:21:04] Operations, ops-eqiad, DC-Ops, decommission: decommission cp1008, cp1071, cp1072, cp1073, cp1074, cp1099 - https://phabricator.wikimedia.org/T229586 (Volans)
[15:23:49] jynus: I finally took a look at the search you shared on Wednesday. Yes, they are very likely from testing of code I deployed around that time. However, all the messages are logged on DEBUG level, and once i exclude that level, nothing suspicious appears. The DEBUG logging feature of x-wikimedia-debug was likely enabled (cf https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Debug_logging).
[15:24:05] I see
[15:24:07] thanks for double-checking
[15:24:22] debug on mwmaint is treated as an error, right?
[15:24:36] jynus: what do you mean?
[15:24:41] so I got a spike of errors on my dashboard
[15:24:49] or mayby I don't ignore DEBUG
[15:24:54] only INFO and WARNING
[15:24:56] let me check
[15:25:05] maybe
[15:25:16] yes, there is my mistake
[15:25:39] https://logstash.wikimedia.org/app/kibana#/dashboard/87348b60-90dd-11e8-8687-73968bebd217
[15:25:48] I ignore WARNINGS AND INFO
[15:25:53] but forgot to ignore DEBUG
[15:26:01] so I though they were ERROR level
[15:26:10] okay, makes sense
[15:26:13] sorry about that, will fix the dashboard
[15:26:19] you can ignore me
[15:26:21] np, thanks for checking :)
[15:26:31] but as other people told me, better to ping and be wrong
[15:26:42] than not ping and be right :-D
[15:26:47] exactly :)
[15:27:30] (CR) Subramanya Sastry: "Marko tells me that 8001 is for Parsoid/JS on both servers. So, I gave you the wrong info." [mediawiki-config] - https://gerrit.wikimedia.org/r/549637 (https://phabricator.wikimedia.org/T229078) (owner: Catrope)
[15:28:05] Now it should be fixed: https://logstash.wikimedia.org/app/kibana#/dashboard/87348b60-90dd-11e8-8687-73968bebd217
[15:28:07] sorry again
[15:37:19] !log beginning rolling service restarts on logstash hosts for java security updates
[15:37:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:42:14] (PS1) Giuseppe Lavagetto: prometheus: add scraping of k8s envoy sidecars [puppet] - https://gerrit.wikimedia.org/r/549871
[15:42:16] (PS1) Giuseppe Lavagetto: kubernetes::deployment_server: Add a private/general.yaml file [puppet] - https://gerrit.wikimedia.org/r/549872 (https://phabricator.wikimedia.org/T237234)
[15:45:00] (PS4) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[15:46:02] (CR) Bstorm: "We found the problem and this version of the ingress object is correct." [software/tools-webservice] - https://gerrit.wikimedia.org/r/549613 (https://phabricator.wikimedia.org/T236202) (owner: Bstorm)
[15:48:57] Operations, ops-eqiad: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (Cmjohnson)
[15:49:34] RECOVERY - Check systemd state on logstash1008 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[15:50:39] (CR) Subramanya Sastry: "> > Still -1 because it introduces another file_exists call in the" [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[15:51:22] Operations: (Need by Aug 1) rack/setup/install dumpsdata1003.eqiad.wmnet - https://phabricator.wikimedia.org/T234076 (Cmjohnson) a:Cmjohnson→ArielGlenn @ArielGlenn All yours! If you have any issues please add the ops-eqiad tag back. Thanks!
[15:51:51] (Abandoned) Bartosz Dziewoński: Explicitly set wgVisualEditorRestbaseParsoidVariant='js' everywhere else [mediawiki-config] - https://gerrit.wikimedia.org/r/549633 (https://phabricator.wikimedia.org/T229074) (owner: Bartosz Dziewoński)
[15:57:01] !log jynus@cumin1001 dbctl commit (dc=all): 'Pool db1118, db1106 at 100%', diff saved to https://phabricator.wikimedia.org/P9582 and previous config saved to /var/cache/conftool/dbconfig/20191108-155700-jynus.json
[15:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:59:50] RECOVERY - Check systemd state on logstash1007 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[16:00:16] Operations, ops-eqiad, Cloud-Services, cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts: ` ['cloudcephmon1001.wikimedia.org'] ` The...
[16:00:48] (CR) Bstorm: [C: +2] new k8s: Fix ingress object and enable toolsbeta ingress creation [software/tools-webservice] - https://gerrit.wikimedia.org/r/549613 (https://phabricator.wikimedia.org/T236202) (owner: Bstorm)
[16:01:37] (PS1) Mobrovac: [Beta] Flow: Use Parsoid/PHP [mediawiki-config] - https://gerrit.wikimedia.org/r/549875 (https://phabricator.wikimedia.org/T229078)
[16:05:31] (CR) Bstorm: [C: +1] "But will need to be version 0.51" [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:06:05] (PS5) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[16:06:46] (PS1) RLazarus: When starting a cookbook, also log the args to IRC. [software/spicerack] - https://gerrit.wikimedia.org/r/549879
[16:10:43] (PS1) Mholloway: Revert "MachineVision: Enable testers-only mode on testcommonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/549880
[16:11:54] (PS3) Ema: ATS: X-Wikimedia-Debug request routing implementation [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687)
[16:11:56] (CR) Mholloway: [C: +2] Revert "MachineVision: Enable testers-only mode on testcommonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/549880 (owner: Mholloway)
[16:12:50] (Merged) jenkins-bot: Revert "MachineVision: Enable testers-only mode on testcommonswiki" [mediawiki-config] - https://gerrit.wikimedia.org/r/549880 (owner: Mholloway)
[16:15:00] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Revert "MachineVision: Enable testers-only mode on testcommonswiki for debugging" (duration: 00m 54s)
[16:15:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:15:43] !log jeh@cumin1001 START - Cookbook sre.hosts.downtime
[16:15:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:16:23] (CR) Bstorm: [C: -1] "I changed my review to -1 because the images are not on the registry yet, it seems: https://docker-registry.tools.wmflabs.org/v2/toolforge" [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:16:47] (CR) Krinkle: ATS: X-Wikimedia-Debug request routing implementation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687) (owner: Ema)
[16:17:01] (CR) Bstorm: [C: -1] "Wait, I see the non-sssd version https://docker-registry.tools.wmflabs.org/v2/toolforge-python37-web/tags/list" [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:17:54] (PS1) Volans: cookbooks: add argparse custom formatter [cookbooks] - https://gerrit.wikimedia.org/r/549883
[16:19:24] !log jeh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:20:02] (CR) Bstorm: [C: -1] tools-webservice: Add new buster image options to webservice (1 comment) [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:23:11] Operations, ops-eqiad, Cloud-Services, cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudcephmon1001.wikimedia.org'] ` and were **ALL** successful.
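The dashboard mix-up discussed between 15:24 and 15:28 above comes from filtering by exclusion: the dashboard hid WARNING and INFO but not DEBUG, so DEBUG entries showed up alongside real errors. A minimal Python sketch of the two filtering styles follows. It assumes log entries are plain dicts with a `level` field (a hypothetical shape, not the actual Logstash schema) and does not reproduce how the Kibana dashboard itself is configured.

```python
import logging

# Numeric severities reuse Python's logging constants
# (DEBUG=10, INFO=20, WARNING=30, ERROR=40, CRITICAL=50).
SEVERITY = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}

# Hypothetical sample records; the real entries lived in Logstash,
# which this sketch does not query.
records = [
    {"level": "DEBUG", "message": "X-Wikimedia-Debug trace"},
    {"level": "INFO", "message": "job started"},
    {"level": "WARNING", "message": "slow query"},
    {"level": "ERROR", "message": "connection refused"},
]


def exclude_levels(records, excluded):
    """Exclusion-list filter: any level missing from `excluded` slips through."""
    return [r for r in records if r["level"] not in excluded]


def at_or_above(records, threshold="ERROR"):
    """Threshold filter: keep only records at or above `threshold` severity."""
    minimum = SEVERITY[threshold]
    return [r for r in records if SEVERITY[r["level"]] >= minimum]


# Excluding only INFO and WARNING, as described above, leaves DEBUG
# entries mixed in with the errors.
print([r["level"] for r in exclude_levels(records, {"INFO", "WARNING"})])
# ['DEBUG', 'ERROR']
print([r["level"] for r in at_or_above(records)])
# ['ERROR']
```

Filtering by a minimum severity rather than by a list of excluded levels means a level nobody thought to list, such as DEBUG here, is dropped by default instead of surfacing as an apparent error.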
[16:23:19] (CR) Bstorm: "https://puppet-compiler.wmflabs.org/compiler1002/19320/tools-sgebastion-07.tools.eqiad.wmflabs/" [puppet] - https://gerrit.wikimedia.org/r/549661 (https://phabricator.wikimedia.org/T214513) (owner: Bstorm)
[16:23:56] (CR) Volans: [C: -1] "Thanks for the patch, unfortunately is not that trivial. There's already a task for it ( https://phabricator.wikimedia.org/T221212 ) and a" [software/spicerack] - https://gerrit.wikimedia.org/r/549879 (owner: RLazarus)
[16:24:07] (PS5) Phamhi: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961)
[16:26:01] (CR) Elukey: [C: +1] cookbooks: add argparse custom formatter [cookbooks] - https://gerrit.wikimedia.org/r/549883 (owner: Volans)
[16:26:29] (PS2) Volans: cookbooks: add argparse custom formatter [cookbooks] - https://gerrit.wikimedia.org/r/549883
[16:30:47] (CR) Volans: [C: +2] cookbooks: add argparse custom formatter [cookbooks] - https://gerrit.wikimedia.org/r/549883 (owner: Volans)
[16:32:23] (Merged) jenkins-bot: cookbooks: add argparse custom formatter [cookbooks] - https://gerrit.wikimedia.org/r/549883 (owner: Volans)
[16:33:15] (PS1) Jhedden: install_server: update partition layout for cloudcephmon100[1-3] [puppet] - https://gerrit.wikimedia.org/r/549889 (https://phabricator.wikimedia.org/T228102)
[16:34:57] (CR) Jhedden: [C: +2] install_server: update partition layout for cloudcephmon100[1-3] [puppet] - https://gerrit.wikimedia.org/r/549889 (https://phabricator.wikimedia.org/T228102) (owner: Jhedden)
[16:37:00] (PS4) Elukey: sre.zookeeper.roll-restart-zookeeper: add zookepeer-analytics cluster [cookbooks] - https://gerrit.wikimedia.org/r/549585
[16:37:02] (PS2) Elukey: sre.druid.roll-restart-workers: add log before the restarts happen [cookbooks] - https://gerrit.wikimedia.org/r/549812
[16:37:59] (PS4) Ema: ATS: X-Wikimedia-Debug request routing implementation [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687)
[16:38:11] Operations, ops-eqiad, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jeh on cumin1001.eqiad.wmnet for hosts: ` ['cloudcephmon1001....
[16:38:35] (CR) Ema: [C: +2] labs: add cache_{text,upload} to wikimedia_clusters [puppet] - https://gerrit.wikimedia.org/r/549858 (owner: Ema)
[16:39:04] (CR) Bstorm: [C: +1] "Got all the sssd images pushed up :) Looks great!" [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:39:41] (CR) Ema: ATS: X-Wikimedia-Debug request routing implementation (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687) (owner: Ema)
[16:40:15] (CR) Arlolra: [Beta] Flow: Use Parsoid/PHP (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/549875 (https://phabricator.wikimedia.org/T229078) (owner: Mobrovac)
[16:40:46] (CR) Phamhi: [C: +2] tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:41:10] (CR) Elukey: [C: +2] sre.zookeeper.roll-restart-zookeeper: add zookepeer-analytics cluster [cookbooks] - https://gerrit.wikimedia.org/r/549585 (owner: Elukey)
[16:41:18] (CR) Elukey: [C: +2] sre.druid.roll-restart-workers: add log before the restarts happen [cookbooks] - https://gerrit.wikimedia.org/r/549812 (owner: Elukey)
[16:41:22] (Merged) jenkins-bot: tools-webservice: Add new buster image options to webservice [software/tools-webservice] - https://gerrit.wikimedia.org/r/549863 (https://phabricator.wikimedia.org/T230961) (owner: Phamhi)
[16:41:32] (Abandoned) Elukey: Add defaults to argparse's output for kafka/hadoop/zookeeper/cassandra [cookbooks] - https://gerrit.wikimedia.org/r/549582 (owner: Elukey)
[16:43:02] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[16:45:10] (PS1) Cmjohnson: Adding dns entries for elastic1053-67 [dns] - https://gerrit.wikimedia.org/r/549891 (https://phabricator.wikimedia.org/T230746)
[16:45:20] Operations, SRE-tools, puppet-compiler, User-jbond: Puppet compiler: re-add the concurrency option NUM_THREADS - https://phabricator.wikimedia.org/T157002 (Volans) Open→Resolved a:Volans I don't recall the details, too much time has passed, but indeed, it seem it's still supported by...
[16:50:08] (PS2) Cmjohnson: Adding dns entries for elastic1053-67 [dns] - https://gerrit.wikimedia.org/r/549891 (https://phabricator.wikimedia.org/T230746)
[16:52:01] (CR) Cmjohnson: [C: +2] Adding dns entries for elastic1053-67 [dns] - https://gerrit.wikimedia.org/r/549891 (https://phabricator.wikimedia.org/T230746) (owner: Cmjohnson)
[16:52:29] !log jeh@cumin1001 START - Cookbook sre.hosts.downtime
[16:52:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:54:34] !log jeh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0)
[16:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:55:34] Operations, ops-eqiad, Cloud-Services, Patch-For-Review, cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cloudcephmon1001.wikimedia.org'] ` Of which those **FAILED**: `...
[16:56:34] (PS1) Elukey: Move cassandra/druid/hadoop/kafka/zookepeer cookbooks to ArgparseFormatter [cookbooks] - https://gerrit.wikimedia.org/r/549892
[16:58:12] (CR) jerkins-bot: [V: -1] Move cassandra/druid/hadoop/kafka/zookepeer cookbooks to ArgparseFormatter [cookbooks] - https://gerrit.wikimedia.org/r/549892 (owner: Elukey)
[16:58:34] if I had run tox!
[16:58:43] bad Luca
[16:59:01] how many chances do I have that Riccardo didn't see?
[16:59:20] rotfl
[16:59:21] (PS4) Thcipriani: logging: add logspam utilities [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[16:59:29] pro tip: if you ping me not many
[16:59:39] (CR) Thcipriani: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[17:01:28] Operations, Release-Engineering-Team, serviceops, Performance-Team (Radar), and 2 others: PHP Fatal error: The UdpSocket to 127.0.0.1:10514 has been closed (from Monolog/SyslogUdp on mwdebug1002) - https://phabricator.wikimedia.org/T214734 (ema) Notice that debug servers aren't pooled in etcd li...
[17:01:47] (CR) jerkins-bot: [V: -1] logging: add logspam utilities [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[17:04:09] PROBLEM - Maps tiles generation on icinga1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1
[17:05:37] (PS2) Elukey: Move cassandra/druid/hadoop/kafka/zookepeer cookbooks to ArgparseFormatter [cookbooks] - https://gerrit.wikimedia.org/r/549892
[17:09:59] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:15:26] (PS3) Volans: Move cassandra/druid/hadoop/kafka/zookepeer cookbooks to ArgparseFormatter [cookbooks] - https://gerrit.wikimedia.org/r/549892 (owner: Elukey)
[17:19:30] (PS1) Cmjohnson: Adding dhcpd file for elastic10[53-67] [puppet] - https://gerrit.wikimedia.org/r/549893 (https://phabricator.wikimedia.org/T230746)
[17:23:11] (CR) Subramanya Sastry: "Should this also undo the change in https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/549637/1/wmf-config/LabsServices.php th" (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/549875 (https://phabricator.wikimedia.org/T229078) (owner: Mobrovac)
[17:26:20] (CR) Elukey: [C: +2] Move cassandra/druid/hadoop/kafka/zookepeer cookbooks to ArgparseFormatter [cookbooks] - https://gerrit.wikimedia.org/r/549892 (owner: Elukey)
[17:27:48] (CR) RobH: [C: +1] Adding dhcpd file for elastic10[53-67] [puppet] - https://gerrit.wikimedia.org/r/549893 (https://phabricator.wikimedia.org/T230746) (owner: Cmjohnson)
[17:28:25] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (Papaul) Open→Resolved @jcrespo all looks good no power issue in that rack and also the IDRAC logs doesn't show any errors or problem. Resolving the task for now and if it happe...
[17:28:34] Operations, DBA, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (Papaul)
[17:28:45] Operations, ops-eqiad, Cloud-Services, cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (JHedden) @Jclark-ctr Could you help me with the cloudcephmon1002 and cloudcephmon1003 servers? I'm unable to power them on through iDRAC SSH, I...
[17:29:40] Operations, ops-eqiad, Discovery-Search (Current work): Degraded RAID on elastic1039 - https://phabricator.wikimedia.org/T236601 (Cmjohnson)
[17:30:41] Operations, ops-eqiad, Discovery-Search (Current work): Degraded RAID on elastic1039 - https://phabricator.wikimedia.org/T236601 (Cmjohnson) @Jclark-ctr I have the disk and will leave in data center for you to replace the disk next week.
[17:31:20] (CR) Cmjohnson: [C: +2] Adding dhcpd file for elastic10[53-67] [puppet] - https://gerrit.wikimedia.org/r/549893 (https://phabricator.wikimedia.org/T230746) (owner: Cmjohnson)
[17:34:53] Operations, ops-eqiad, Discovery-Search (Current work), Patch-For-Review: (Aug 30th, 2019) rack/setup/install elastic10[53-67].eqiad.wmnet - https://phabricator.wikimedia.org/T230746 (Cmjohnson)
[17:36:52] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[17:39:55] Operations, ops-codfw: backup2001 crashed with no logs on 2019-11-08 14:22 - https://phabricator.wikimedia.org/T237730 (Papaul) Also I upgrade the IDRAC from 3.15 to 3.34
[17:40:18] Operations, serviceops, Patch-For-Review, Performance-Team (Radar), and 2 others: Upgrade memcached for Debian Stretch/Buster - https://phabricator.wikimedia.org/T213089 (elukey) `stat items` is something that changed a bit. For example, take slab 63: * Jessie ` STAT items:63:number 1431 STAT i...
[17:40:32] Operations, ops-eqiad, Analytics, decommission, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Papaul) ` papaul@asw2-d-eqiad# show | compare [edit interfaces] - ge-1/0/7 { - description dbstore1002; - enable; - }
[17:40:51] (PS1) Giuseppe Lavagetto: role::debug_proxy: remove old/unused aliases, depool mwdebug1002 [puppet] - https://gerrit.wikimedia.org/r/549895
[17:41:17] Operations, ops-eqiad, Analytics, decommission, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Papaul)
[17:42:58] (CR) Krinkle: role::debug_proxy: remove old/unused aliases, depool mwdebug1002 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549895 (owner: Giuseppe Lavagetto)
[17:43:16] (CR) Krinkle: role::debug_proxy: remove old/unused aliases, depool mwdebug1002 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/549895 (owner: Giuseppe Lavagetto)
[17:43:48] Operations, ops-eqiad, decommission: decom einsteinium - https://phabricator.wikimedia.org/T209738 (Papaul) ` papaul@asw2-d-eqiad# show | compare [edit interfaces] - ge-3/0/7 { - description einsteinium; - enable; - }
[17:44:16] Operations, ops-eqiad, decommission: decom einsteinium - https://phabricator.wikimedia.org/T209738 (Papaul)
[17:44:55] (PS6) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[17:44:57] (PS1) CDanis: systemd::timer::job: add typing for $interval [puppet] - https://gerrit.wikimedia.org/r/549896
[17:46:25] Operations, ops-eqiad, decommission: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (Papaul) ` papaul@asw2-d-eqiad# show | compare [edit interfaces] - xe-7/0/13 { - description ms-be1027; - enable; - }
[17:46:30] (PS2) Giuseppe Lavagetto: role::debug_proxy: remove old/unused aliases, depool mwdebug1002 [puppet] - https://gerrit.wikimedia.org/r/549895
[17:47:13] (PS1) Dzahn: remove mgmt entries for einsteinium [dns] - https://gerrit.wikimedia.org/r/549898 (https://phabricator.wikimedia.org/T209738)
[17:47:57] (CR) Giuseppe Lavagetto: ATS: X-Wikimedia-Debug request routing implementation (2 comments) [puppet] - https://gerrit.wikimedia.org/r/549840 (https://phabricator.wikimedia.org/T237687) (owner: Ema)
[17:48:36] (PS5) Dzahn: gerrit: remove pre-buster support [puppet] - https://gerrit.wikimedia.org/r/548547
[17:49:11] (CR) Papaul: [C: +1] remove mgmt entries for einsteinium [dns] - https://gerrit.wikimedia.org/r/549898 (https://phabricator.wikimedia.org/T209738) (owner: Dzahn)
[17:49:32] Operations, ops-eqiad, decommission: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (Papaul)
[17:49:51] (CR) Dzahn: [C: +2] remove mgmt entries for einsteinium [dns] - https://gerrit.wikimedia.org/r/549898 (https://phabricator.wikimedia.org/T209738) (owner: Dzahn)
[17:49:53] Operations, ops-eqiad, decommission: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (Papaul)
[17:49:56] (PS2) Dzahn: remove mgmt entries for einsteinium [dns] - https://gerrit.wikimedia.org/r/549898 (https://phabricator.wikimedia.org/T209738)
[17:51:13] Operations, ops-eqiad, decommission, Patch-For-Review: decom einsteinium - https://phabricator.wikimedia.org/T209738 (Dzahn)
[17:51:21] Operations, ops-eqiad, decommission: decom einsteinium - https://phabricator.wikimedia.org/T209738 (Dzahn)
[17:52:34] (PS3) Giuseppe Lavagetto: role::debug_proxy: remove old/unused aliases, depool mwdebug1002 [puppet] - https://gerrit.wikimedia.org/r/549895
[17:52:57] (CR) jerkins-bot: [V: -1] role::debug_proxy: remove old/unused aliases, depool mwdebug1002 [puppet] - https://gerrit.wikimedia.org/r/549895 (owner: Giuseppe Lavagetto)
[17:53:43] (CR) CDanis: "small PCC looks good: https://puppet-compiler.wmflabs.org/compiler1002/19323/" [puppet] - https://gerrit.wikimedia.org/r/549896 (owner: CDanis)
[17:55:42] (CR) Krinkle: "recheck" [puppet] - https://gerrit.wikimedia.org/r/549895 (owner: Giuseppe Lavagetto)
[18:00:06] (PS1) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901
[18:00:13] (CR) Krinkle: [C: +1] role::debug_proxy: remove old/unused aliases, depool mwdebug1002 [puppet] - https://gerrit.wikimedia.org/r/549895 (owner: Giuseppe Lavagetto)
[18:01:11] (CR) jerkins-bot: [V: -1] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901 (owner: Paladox)
[18:01:15] :O
[18:01:19] (CR) Dzahn: [C: +2] gerrit: remove pre-buster support [puppet] - https://gerrit.wikimedia.org/r/548547 (owner: Dzahn)
[18:01:21] let me guess
[18:01:37] 18:01:07 Current Bazel version is 0.29.1; expected at least 1.1.0
[18:01:44] :p
[18:02:00] Operations, ops-eqiad, Analytics, DC-Ops, decommission: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (Papaul) ` papaul@asw2-c-eqiad# show | compare [edit interfaces] - ge-4/0/18 { - description "analytics1003 - no-bw-mon"; - }
[18:08:34] (PS5) Thcipriani: logging: add logspam utilities [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[18:08:42] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[18:09:07] (CR) Thcipriani: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/547777 (owner: Brennen Bearnes)
[18:09:53] (CR) Dzahn: [C: +2] "This is now ready to go since the favicon has been moved." [puppet] - https://gerrit.wikimedia.org/r/549208 (https://phabricator.wikimedia.org/T213223) (owner: Krinkle)
[18:16:12] Operations, serviceops, User-jijiki: mw2225 keeps sending cronspam for hhvm-needs-restart - https://phabricator.wikimedia.org/T236799 (Dzahn) Open→Resolved a:Dzahn After it sent the regular message Nov 3, 4, 5, 6 it has not sent one anymore on the 7th. The restart did it it looks.
[18:22:06] RECOVERY - Keyholder SSH agent on cumin1001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[18:23:20] RECOVERY - Keyholder SSH agent on cumin2001 is OK: OK: Keyholder is armed with all configured keys. https://wikitech.wikimedia.org/wiki/Keyholder
[18:24:16] PROBLEM - Check systemd state on netbox1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:29:32] (PS1) Dzahn: phabricator: enable phd service on phab2001 and monitor it [puppet] - https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883)
[18:31:32] PROBLEM - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is CRITICAL: CRITICAL: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[18:32:12] (PS2) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901
[18:33:10] (CR) Dzahn: "Or we only enable it but keep the monitoring change separate.. then figure out what happens with the phab1001/phab1003 situation." [puppet] - https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883) (owner: Dzahn)
[18:33:28] (CR) jerkins-bot: [V: -1] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901 (owner: Paladox)
[18:36:58] RECOVERY - Check systemd state on netbox1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[18:42:08] RECOVERY - Check the last execution of netbox_ganeti_codfw_sync on netbox1001 is OK: OK: Status of the systemd unit netbox_ganeti_codfw_sync https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[18:42:14] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[18:42:27] (PS3) Paladox: Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901
[18:43:13] (CR) jerkins-bot: [V: -1] Merge branch 'stable-2.16' into wmf/stable-2.16 [software/gerrit] (wmf/stable-2.16) - https://gerrit.wikimedia.org/r/549901 (owner: Paladox)
[18:45:23] (PS1) Papaul: DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027 [dns] - https://gerrit.wikimedia.org/r/549912
[18:48:02] (PS1) Dzahn: phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883)
[18:49:11] (CR) jerkins-bot: [V: -1] phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) (owner: Dzahn)
[18:50:05] (CR) Dzahn: [C: +1] DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027 [dns] - https://gerrit.wikimedia.org/r/549912 (owner: Papaul)
[18:52:01] (CR) Dzahn: "ok.. i think we should first do https://gerrit.wikimedia.org/r/c/operations/puppet/+/549913 and then remove the monitoring part from here" [puppet] - https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883) (owner: Dzahn)
[18:52:45] (PS2) Papaul: DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027 [dns] - https://gerrit.wikimedia.org/r/549912
[18:53:28] (CR) Papaul: [C: +2] DNS: Remove mgmt DNS for analytic1003, dbstore1002 abd ms-be1027 [dns] - https://gerrit.wikimedia.org/r/549912 (owner: Papaul)
[18:57:07] Operations, ops-eqiad, Analytics, DC-Ops, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (Papaul)
[18:57:17] (PS2) Dzahn: phabricator: enable phd service on phab2001 [puppet] - https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883)
[18:57:27] Operations, ops-eqiad, Analytics, DC-Ops, and 2 others: Decommission analytics1003 - https://phabricator.wikimedia.org/T206524 (Papaul) Open→Resolved Complete
[19:02:40] Operations, ops-eqiad, Analytics, decommission, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Papaul)
[19:02:59] Operations, ops-eqiad, Analytics, decommission, Patch-For-Review: Decommission dbstore1002 - https://phabricator.wikimedia.org/T216491 (Papaul) Open→Resolved Complete
[19:07:38] (PS2) Krinkle: Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:07:43] (PS3) Krinkle: Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:08:00] (PS4) Krinkle: Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:08:32] (PS5) Krinkle: Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:08:51] (CR) Krinkle: "Rebased on the now-merged patch that removed uploadwmo and www-wm-o" [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:09:09] (CR) Krinkle: [C: +1] Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:10:33] Operations, ops-eqiad, decommission: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (Papaul)
[19:12:54] Operations, ops-eqiad, decommission: Decommission ms-be1027 - https://phabricator.wikimedia.org/T233289 (Papaul) Open→Resolved Complete
[19:17:11] (CR) CDanis: [C: +1] rsync: readd incoming and outgoing chmod [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[19:18:36] (CR) Dzahn: [C: +2] Set CSP on doc.wikimedia.org to enforce. [puppet] - https://gerrit.wikimedia.org/r/547718 (https://phabricator.wikimedia.org/T213223) (owner: Brian Wolff)
[19:26:56] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[19:39:52] (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1002/19325/gerrit2001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/484304 (https://phabricator.wikimedia.org/T137890) (owner: Hashar)
[19:45:36] Operations, ops-eqiad, decommission: decom einsteinium - https://phabricator.wikimedia.org/T209738 (Papaul) Open→Resolved
[19:45:41] Operations, observability, Patch-For-Review: upgrade icinga server to stretch and replace einsteinium - https://phabricator.wikimedia.org/T202782 (Papaul)
[19:45:48] (CR) Bstorm: [C: +2] kubectl: upgrade /usr/bin/kubectl to 1.15.5 [puppet] - https://gerrit.wikimedia.org/r/549661 (https://phabricator.wikimedia.org/T214513) (owner: Bstorm)
[19:49:44] (PS2) CDanis: systemd::timer::job: add typing for $interval [puppet] - https://gerrit.wikimedia.org/r/549896
[19:49:46] (PS7) CDanis: prometheus: export NIC firmware versions [puppet] - https://gerrit.wikimedia.org/r/549683 (https://phabricator.wikimedia.org/T236744)
[19:50:24] (PS1) Phamhi: toolforge: Rename to toolforge-tool-role.yaml due to typo [puppet] - https://gerrit.wikimedia.org/r/549921 (https://phabricator.wikimedia.org/T227290)
[19:50:47] (CR) CDanis: [C: +2] "The few failures in the large PCC look to be unrelated." [puppet] - https://gerrit.wikimedia.org/r/549896 (owner: CDanis)
[19:53:17] (CR) Bstorm: [C: +1] "Yes please! :)" [puppet] - https://gerrit.wikimedia.org/r/549921 (https://phabricator.wikimedia.org/T227290) (owner: Phamhi)
[19:54:05] (CR) Phamhi: [C: +2] toolforge: Rename to toolforge-tool-role.yaml due to typo [puppet] - https://gerrit.wikimedia.org/r/549921 (https://phabricator.wikimedia.org/T227290) (owner: Phamhi)
[20:03:20] (CR) Dzahn: [C: +1] "https://puppet-compiler.wmflabs.org/compiler1003/19326/wtp1025.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: Giuseppe Lavagetto)
[20:08:42] (CR) Jhedden: [C: +1] toollabs::maintain_kubeusers remove cleanup code [puppet] - https://gerrit.wikimedia.org/r/549460 (owner: Phamhi)
[20:10:21] (PS1) Alex Monk: Revert "systemd::timer::job: add typing for $interval" [puppet] - https://gerrit.wikimedia.org/r/549924
[20:10:42] PROBLEM - Unmerged changes on repository puppet on labtestpuppetmaster2001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:10:50] (CR) CDanis: [C: +2] Revert "systemd::timer::job: add typing for $interval" [puppet] - https://gerrit.wikimedia.org/r/549924 (owner: Alex Monk)
[20:11:44] (CR) CDanis: [V: +2 C: +2] Revert "systemd::timer::job: add typing for $interval" [puppet] - https://gerrit.wikimedia.org/r/549924 (owner: Alex Monk)
[20:12:07] phamhi: is it okay to puppet-merge your change?
[20:12:16] cdanis: sure thing
[20:13:54] RECOVERY - Unmerged changes on repository puppet on labtestpuppetmaster2001 is OK: No changes to merge. https://wikitech.wikimedia.org/wiki/Monitoring/unmerged_changes
[20:15:07] (CR) Catrope: [Beta] Flow: Use Parsoid/PHP (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/549875 (https://phabricator.wikimedia.org/T229078) (owner: Mobrovac)
[20:21:28] (PS1) Mholloway: MachineVision: Delay annotation request jobs by 5 mins for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/549925
[20:21:39] (PS3) Dzahn: allow different memory limit settings for parsoid-php servers [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833)
[20:22:08] (PS2) Mholloway: MachineVision: Delay annotation request jobs by 5 mins for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/549925
[20:22:23] (CR) jerkins-bot: [V: -1] allow different memory limit settings for parsoid-php servers [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[20:22:55] (CR) jerkins-bot: [V: -1] MachineVision: Delay annotation request jobs by 5 mins for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/549925 (owner: Mholloway)
[20:24:02] (PS3) Mholloway: MachineVision: Delay annotation request jobs by 5 mins for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/549925
[20:24:56] (CR) Mholloway: [C: +2] MachineVision: Delay annotation request jobs by 5 mins for testing [mediawiki-config] - https://gerrit.wikimedia.org/r/549925 (owner: Mholloway)
[20:26:44] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Delay annotation request jobs by 5 mins for testing (duration: 00m 52s)
[20:26:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:28:14] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports
[20:38:54] (PS4) Dzahn: allow different memory limit settings for parsoid-php servers [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833)
[20:39:38] (CR) jerkins-bot: [V: -1] allow different memory limit settings for parsoid-php servers [mediawiki-config] - https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[20:41:17] (CR) Dzahn: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/548439 (owner: Dzahn)
[20:42:52] (Abandoned) Dzahn: puppetmaster: remove jessie support [puppet] - https://gerrit.wikimedia.org/r/548439 (owner: Dzahn)
[20:43:20] (PS1) Jhedden: install_server: add cloudcephosd partman config [puppet] - https://gerrit.wikimedia.org/r/549928 (https://phabricator.wikimedia.org/T228102)
[20:44:48] (CR) Dzahn: [C: -2] "per comments on https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/548944 the consensus seems to only change it on parsoid ser" [mediawiki-config] - https://gerrit.wikimedia.org/r/548923 (https://phabricator.wikimedia.org/T236833) (owner: Dzahn)
[20:48:21] (CR) Dzahn: "when introducing new server names please add them to" [puppet] - https://gerrit.wikimedia.org/r/549928 (https://phabricator.wikimedia.org/T228102) (owner: Jhedden)
[20:49:17] (PS9) Paladox: WIP: Update gerrit to 2.16.13 [software/gerrit] (deploy/wmf/stable-2.16) - https://gerrit.wikimedia.org/r/495012
[20:53:06] (CR) Jhedden: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/549928 (https://phabricator.wikimedia.org/T228102) (owner: Jhedden)
[20:56:10] PROBLEM - Check the Netbox report puppetdb for fail status.
on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [20:57:27] (03PS10) 10Paladox: WIP: Update gerrit to 2.16.13 [software/gerrit] (deploy/wmf/stable-2.16) - 10https://gerrit.wikimedia.org/r/495012 [20:59:01] (03PS17) 10Jhedden: ceph: add ceph storage cluster profiles and modules [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) [20:59:38] (03PS18) 10Jhedden: ceph: add ceph storage cluster profiles and modules [puppet] - 10https://gerrit.wikimedia.org/r/546182 (https://phabricator.wikimedia.org/T236290) [21:02:43] (03CR) 10Phamhi: [C: 03+2] toollabs::maintain_kubeusers remove cleanup code [puppet] - 10https://gerrit.wikimedia.org/r/549460 (owner: 10Phamhi) [21:10:40] (03PS1) 10Herron: icinga: add notes_url to aggregate ipsec alerts [puppet] - 10https://gerrit.wikimedia.org/r/549932 (https://phabricator.wikimedia.org/T230236) [21:13:57] (03PS2) 10Herron: icinga: add notes_url to aggregate ipsec alerts [puppet] - 10https://gerrit.wikimedia.org/r/549932 (https://phabricator.wikimedia.org/T230236) [21:18:13] (03CR) 10Herron: [C: 03+2] icinga: add notes_url to aggregate ipsec alerts [puppet] - 10https://gerrit.wikimedia.org/r/549932 (https://phabricator.wikimedia.org/T230236) (owner: 10Herron) [21:18:36] 10Operations, 10ops-eqiad, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): rack/setup/install cloudcephmon100[123] - https://phabricator.wikimedia.org/T228102 (10JHedden) [21:23:34] 10Operations, 10Wikimedia-Mailing-lists, 10Privacy, 10Security, 10User-Josve05a: Stop storing Mailman passwords in plain text - https://phabricator.wikimedia.org/T181803 (10Josve05a) [21:25:48] (03PS1) 10Ayounsi: msw: extend generic, make generic/system.conf compatible with msw [homer/public] - 10https://gerrit.wikimedia.org/r/549933 [21:28:16] (03PS2) 10Ayounsi: msw/asw: [homer/public] - 10https://gerrit.wikimedia.org/r/549933 [21:33:07] (03CR) 10Subramanya Sastry: 
[Beta] Flow: Use Parsoid/PHP (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/549875 (https://phabricator.wikimedia.org/T229078) (owner: 10Mobrovac) [21:34:07] (03PS1) 10Phamhi: toolforge: Add missing Package[python-yaml] dependency [puppet] - 10https://gerrit.wikimedia.org/r/549935 (https://phabricator.wikimedia.org/T237768) [21:36:35] (03CR) 10jerkins-bot: [V: 04-1] toolforge: Add missing Package[python-yaml] dependency [puppet] - 10https://gerrit.wikimedia.org/r/549935 (https://phabricator.wikimedia.org/T237768) (owner: 10Phamhi) [21:37:07] (03PS2) 10Phamhi: toolforge: Add missing Package[python-yaml] dependency [puppet] - 10https://gerrit.wikimedia.org/r/549935 (https://phabricator.wikimedia.org/T237768) [21:37:21] (03PS1) 10Andrew Bogott: wikitech: remove OSM settings related to OpenStack [mediawiki-config] - 10https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) [21:37:48] (03PS2) 10Andrew Bogott: wikitech: remove OSM settings related to OpenStack [mediawiki-config] - 10https://gerrit.wikimedia.org/r/549936 (https://phabricator.wikimedia.org/T161553) [21:39:46] Hi, Phabricator got update? [21:39:54] Details [21:39:54] RELATED GERRIT PATCHES: [21:39:57] It is something new? [21:40:00] yup [21:40:01] that's new [21:40:14] Oh cool :) [21:40:43] looks like it [21:40:44] pretty cool [21:41:15] Much easier to see gerrit pathces related to task [21:42:14] Thank you very much! [21:43:33] thank you to whoever did it [21:43:40] i also like it very much. 
[21:45:10] Krinkle twentyafterfour [21:45:13] err [21:45:15] wrong ping [21:45:17] sorry, Krenair ^ [21:45:32] (03CR) 10Jhedden: [C: 03+1] toolforge: Add missing Package[python-yaml] dependency [puppet] - 10https://gerrit.wikimedia.org/r/549935 (https://phabricator.wikimedia.org/T237768) (owner: 10Phamhi) [21:45:51] :) [21:46:15] (03PS2) 10Dzahn: phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) [21:46:29] twentyafterfour: ^ [21:46:32] (03CR) 1020after4: [C: 03+1] phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) (owner: 10Dzahn) [21:47:16] i think we want this before we enable on 2001 [21:47:23] so that's another change [21:47:27] (03PS3) 10Ayounsi: msw/asw: use same generic config [homer/public] - 10https://gerrit.wikimedia.org/r/549933 [21:47:41] mutante: +1 [21:48:42] (03CR) 10jerkins-bot: [V: 04-1] phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) (owner: 10Dzahn) [21:49:06] fixes syntax error [21:49:11] (03PS4) 10Ayounsi: msw/asw: use same generic config [homer/public] - 10https://gerrit.wikimedia.org/r/549933 [21:49:13] (03PS1) 10Ayounsi: msw: ensure no vlans are configured [homer/public] - 10https://gerrit.wikimedia.org/r/549938 [21:50:28] (03PS3) 10Dzahn: phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) [21:54:04] does anyone know how I would get Jenkins to run on PHP 7.3? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GraphQL/+/547844 [21:54:06] hrmm... 
Execution of '/srv/jenkins-workspace/puppet-compiler/19329/change/src/environments/production/get_config.sh' returned 1: Error: Could not execute posix command: Permission denied - /srv/jenkins-workspace/puppet-compiler/19329/change/src/environments/production/get_config.sh [21:56:43] (03CR) 10Phamhi: [C: 03+2] toolforge: Add missing Package[python-yaml] dependency [puppet] - 10https://gerrit.wikimedia.org/r/549935 (https://phabricator.wikimedia.org/T237768) (owner: 10Phamhi) [21:56:52] (03PS4) 10Dzahn: phabricator: monitor the PHD process if PHD is set to running in Hiera [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) [21:56:56] (03PS1) 10EBernhardson: mjolnir_bulk_daemon: Add new kafka topics for model upload [puppet] - 10https://gerrit.wikimedia.org/r/549939 [22:04:19] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/19330/" [puppet] - 10https://gerrit.wikimedia.org/r/549913 (https://phabricator.wikimedia.org/T232883) (owner: 10Dzahn) [22:06:23] (03PS3) 10Dzahn: phabricator: enable phd service on phab2001 [puppet] - 10https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883) [22:07:19] (03CR) 10Dzahn: "anything stopping us now?" 
[puppet] - 10https://gerrit.wikimedia.org/r/549906 (https://phabricator.wikimedia.org/T232883) (owner: 10Dzahn) [22:11:24] (03PS5) 10Dzahn: allow different memory limit settings for parsoid-php servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) [22:12:31] (03PS4) 10Dzahn: gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 [22:13:18] (03CR) 10Subramanya Sastry: [C: 03+1] allow different memory limit settings for parsoid-php servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: 10Dzahn) [22:13:42] (03CR) 10jerkins-bot: [V: 04-1] gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 (owner: 10Dzahn) [22:15:00] (03CR) 10Dzahn: allow different memory limit settings for parsoid-php servers (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) (owner: 10Dzahn) [22:17:18] (03PS6) 10Dzahn: allow different memory limit settings for parsoid-php servers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/548944 (https://phabricator.wikimedia.org/T236833) [22:17:46] (03CR) 10Dzahn: [C: 03+1] "Ib2e6dd89611c7 will now use this" [puppet] - 10https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: 10Giuseppe Lavagetto) [22:18:36] (03CR) 10Dzahn: [C: 04-1] "https://puppet-compiler.wmflabs.org/compiler1001/19331/gerrit1001.wikimedia.org/change.gerrit1001.wikimedia.org.err" [puppet] - 10https://gerrit.wikimedia.org/r/548554 (owner: 10Dzahn) [22:21:22] (03PS5) 10Dzahn: gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 [22:22:16] paladox: ^ starting to move stuff out of jetty.. 
it's way too large i think [22:22:32] (03CR) 10jerkins-bot: [V: 04-1] gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 (owner: 10Dzahn) [22:22:33] (and not really jetty) [22:22:38] ok [22:22:41] awesome! [22:22:44] so first the java stuff [22:24:11] ah, i also have to move nrpe monitoring for the gerrit process because that uses $java_home ..meh [22:24:56] great! [22:29:07] (03PS6) 10Dzahn: gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 [22:29:47] changes the monitoring command slightly to avoid needing the $java_home. '.*/bin/java .*-jar /var/lib/gerrit2/review_site/bin/gerrit.war .." is fine too [22:33:10] (03CR) 10Krinkle: profile::mediawiki::httpd: set a SERVERGROUP env variable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: 10Giuseppe Lavagetto) [22:34:49] (03CR) 10Krinkle: profile::mediawiki::httpd: set a SERVERGROUP env variable (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: 10Giuseppe Lavagetto) [22:36:49] paladox: Failed to parse template gerrit/gerrit.config.erb and doesnt work like i wanted.. 
heh [22:36:54] hmm [22:37:08] because it needs java_home and heap_limit in the template too of course [22:37:16] and it's used in jetty.pp [22:38:30] (03CR) 10Paladox: gerrit: refactor, move java setup to separate class (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/548554 (owner: 10Dzahn) [22:38:34] yup [22:44:40] (03CR) 10Dzahn: gerrit: refactor, move java setup to separate class (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/548554 (owner: 10Dzahn) [22:49:00] (03PS7) 10Dzahn: gerrit: refactor, move java setup to separate class [puppet] - 10https://gerrit.wikimedia.org/r/548554 [22:53:35] (03CR) 10Dzahn: [C: 03+1] "per https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Server_clusters a "cluster" is "eqiad" or "codfw". But in real l" [puppet] - 10https://gerrit.wikimedia.org/r/546448 (https://phabricator.wikimedia.org/T235899) (owner: 10Giuseppe Lavagetto) [23:42:08] PROBLEM - Check the Netbox report puppetdb for fail status. on netbox1001 is CRITICAL: puppetdb.PuppetDB CRITICAL https://wikitech.wikimedia.org/wiki/Netbox%23Reports [23:57:58] (03PS1) 10CDanis: rsync: provide a hiera default to unbreak cloud [puppet] - 10https://gerrit.wikimedia.org/r/549949 (https://phabricator.wikimedia.org/T237424) [23:59:49] (03CR) 10Bstorm: [C: 03+1] rsync: provide a hiera default to unbreak cloud [puppet] - 10https://gerrit.wikimedia.org/r/549949 (https://phabricator.wikimedia.org/T237424) (owner: 10CDanis)