[00:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor When your hammer is PHP, everything starts looking like a thumb. Rise for Evening SWAT (Max 8 patches). (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T0000). [00:00:05] kaldari, etonkovidova, and MaxSem: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:01:13] o/ [00:02:09] (03PS1) 10Dzahn: restbase-dev-cluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394228 (https://phabricator.wikimedia.org/T177225) [00:04:46] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:04:51] OK, I'll do it [00:06:12] etonkovidova: yt? [00:06:24] yes? [00:06:56] checking ... [00:06:59] checking for deployment! 
[00:07:29] (03PS2) 10MaxSem: Switch all wikis to HTML5 section IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394104 (https://phabricator.wikimedia.org/T152540) [00:07:38] (03CR) 10MaxSem: [C: 032] Switch all wikis to HTML5 section IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394104 (https://phabricator.wikimedia.org/T152540) (owner: 10MaxSem) [00:07:47] (03CR) 10Dzahn: [C: 032] restbase-dev-cluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394228 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:07:51] (03PS2) 10Dzahn: restbase-dev-cluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394228 (https://phabricator.wikimedia.org/T177225) [00:08:05] while submodules are waiting for a merge, I'll do my config change [00:09:46] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [00:10:53] (03Merged) 10jenkins-bot: Switch all wikis to HTML5 section IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394104 (https://phabricator.wikimedia.org/T152540) (owner: 10MaxSem) [00:10:54] (03CR) 10jenkins-bot: Switch all wikis to HTML5 section IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394104 (https://phabricator.wikimedia.org/T152540) (owner: 10MaxSem) [00:15:42] (03PS1) 10MaxSem: Revert "Switch all wikis to HTML5 section IDs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394229 [00:15:44] (03CR) 10MaxSem: [C: 032] Revert "Switch all wikis to HTML5 section IDs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394229 (owner: 10MaxSem) [00:15:55] etonkovidova: it seems patch is merged. [00:16:00] yes [00:16:09] checking in testwiki [00:16:53] kart_: I do not see it working in tetswiki - argh ... [00:17:07] wait etonkovidova [00:17:12] it's not live yet [00:17:15] ah.. [00:17:32] ... 
[00:17:45] now it's on mwdebug1002, please test [00:18:06] MaxSem: thx [00:18:30] (03Merged) 10jenkins-bot: Revert "Switch all wikis to HTML5 section IDs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394229 (owner: 10MaxSem) [00:18:39] (03CR) 10jenkins-bot: Revert "Switch all wikis to HTML5 section IDs" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394229 (owner: 10MaxSem) [00:21:57] MaxSem: kart_ the fix is working on testwiki - all seems to be fine [00:22:05] wee [00:23:29] etonkovidova: let me test again. [00:23:40] kart_: ok [00:23:44] (03PS2) 10Dzahn: pybal: support RunCommand everywhere, not just appservers? [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) [00:24:23] (03PS3) 10Dzahn: pybal: support RunCommand everywhere, not just appservers? [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) [00:24:27] !log maxsem@tin Synchronized php-1.31.0-wmf.10/extensions/ContentTranslation/: https://gerrit.wikimedia.org/r/#/c/394206/ (duration: 00m 52s) [00:24:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:24:44] etonkovidova: ^ [00:25:42] etonkovidova: looks good now. [00:25:53] etonkovidova: thanks a lot! [00:25:58] MaxSem: Thanks! [00:26:06] kart_: hurray! [00:27:08] !log maxsem@tin Started scap: Message updates for https://gerrit.wikimedia.org/r/#/c/394155/ [00:27:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:37:48] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3798760 (10Dzahn) @chasemp @faidon @jcrespo Ok, so what we have meanwhile is this in... 
[00:39:15] 10Operations, 10Scoring-platform-team, 10Wikimedia-Logstash, 10monitoring, 10Wikimedia-Incident: Send celery and wsgi service logs to logstash - https://phabricator.wikimedia.org/T181630#3798762 (10Dzahn) [00:39:32] MaxSem: Is SWAT deploy over? [00:39:51] not yet [00:40:03] (03CR) 10Volans: "my 2 cents inline ;)" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:40:13] k, then I'll wait. [00:45:06] (03CR) 10Dzahn: pybal: support RunCommand everywhere, not just appservers? (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [00:47:10] 10Operations, 10Gerrit, 10Phabricator, 10Traffic, 10periodic-update: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. - https://phabricator.wikimedia.org/T180655#3798765 (10mmodell) One easy way to implement this (at least for phabricator) would be to just bring... [00:53:46] !log maxsem@tin Finished scap: Message updates for https://gerrit.wikimedia.org/r/#/c/394155/ (duration: 26m 38s) [00:53:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:56:23] bearND: done [00:56:36] MaxSem: awesome. Thanks! [00:56:44] !log bsitzmann@tin Started deploy [mobileapps/deploy@fa2a877]: Update mobileapps to dcea7d3 (T181004) [00:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:56:53] T181004: Run Big English fundraising on apps - https://phabricator.wikimedia.org/T181004 [00:59:51] (03PS4) 10Dzahn: pybal: use lvs::config not ganglia_clusters to determine if appserver [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) [01:00:04] twentyafterfour: That opportune time is upon us again. Time for a Phabricator update deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T0100). 
[01:00:04] No GERRIT patches in the queue for this window AFAICS. [01:00:24] just a sec [01:01:51] PROBLEM - HHVM rendering on labweb1001 is CRITICAL: connect to address 10.64.16.200 and port 80: Connection refused [01:01:58] bearND: ? [01:02:01] PROBLEM - nutcracker process on labweb1001 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 113 (nutcracker), command name nutcracker [01:02:11] PROBLEM - nutcracker port on labweb1001 is CRITICAL: connect to address 127.0.0.1 and port 11212: Connection refused [01:02:32] PROBLEM - Apache HTTP on labweb1001 is CRITICAL: connect to address 10.64.16.200 and port 80: Connection refused [01:02:41] PROBLEM - HHVM processes on labweb1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm [01:02:52] !log bsitzmann@tin Finished deploy [mobileapps/deploy@fa2a877]: Update mobileapps to dcea7d3 (T181004) (duration: 06m 08s) [01:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:03:00] T181004: Run Big English fundraising on apps - https://phabricator.wikimedia.org/T181004 [01:03:03] !log Starting phabricator upgrade & maintenance. Service will be offline for less than 5 minutes. [01:03:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:03:20] twentyafterfour: done :) [01:06:39] !log finished phabricator upgrade, service is online and appears to be functioning normally [01:06:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:33:06] 10Operations, 10Gerrit, 10Phabricator, 10Traffic, 10periodic-update: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. - https://phabricator.wikimedia.org/T180655#3765427 (10Dzahn) I was thinking about "jouncebot: next" when reading this. I mean it already output... 
[02:29:20] !log l10nupdate@tin scap sync-l10n completed (1.31.0-wmf.8) (duration: 08m 43s) [02:29:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:45:15] !log ebernhardson@tin Started deploy [search/mjolnir/deploy@7aa39b7]: (no justification provided) [02:45:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:48:31] !log ebernhardson@tin Finished deploy [search/mjolnir/deploy@7aa39b7]: (no justification provided) (duration: 03m 15s) [02:48:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [03:24:41] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 786.08 seconds [03:54:21] PROBLEM - HHVM rendering on mw2100 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [03:55:12] RECOVERY - HHVM rendering on mw2100 is OK: HTTP OK: HTTP/1.1 200 OK - 73702 bytes in 0.308 second response time [04:05:42] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 294.15 seconds [05:02:41] (03PS12) 10TerraCodes: Add loginwiki and wikidata to $wgLocalVirtualHosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) [05:02:51] (03PS13) 10TerraCodes: Add loginwiki and wikidata to $wgLocalVirtualHosts [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) [05:03:36] (03PS16) 10TerraCodes: $wmf* -> $wmg* [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392184 (https://phabricator.wikimedia.org/T45956) [05:03:52] (03PS4) 10TerraCodes: Remove single editor tab for plwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393121 (https://phabricator.wikimedia.org/T181045) [05:10:41] PROBLEM - cassandra-a SSL 10.64.48.135:7001 on restbase1014 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [05:11:02] PROBLEM - cassandra-a CQL 10.64.48.135:9042 on restbase1014 is CRITICAL: 
connect to address 10.64.48.135 and port 9042: Connection refused [06:26:24] !log Deploy schema change on s3 db1077 - T174569 [06:26:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:26:33] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [06:28:31] PROBLEM - puppet last run on mw2258 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:30:12] PROBLEM - puppet last run on cp2024 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/varnishreqstats] [06:30:42] PROBLEM - puppet last run on mw2229 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/apache2/sites-enabled/wikimedia-legacy.incl] [06:44:22] (03PS1) 10Marostegui: db-eqiad.php: Depool db1056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394244 [06:48:31] !log Deploy schema change on dbstore1001 s3 - T174569 [06:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:48:38] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [06:55:42] RECOVERY - puppet last run on mw2229 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [06:58:31] RECOVERY - puppet last run on mw2258 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [06:58:53] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394244 (owner: 10Marostegui) [07:00:11] RECOVERY - puppet last run on cp2024 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [07:00:20] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394244 (owner: 10Marostegui) [07:00:30] (03CR) 10jenkins-bot: 
db-eqiad.php: Depool db1056 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394244 (owner: 10Marostegui) [07:01:53] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1056 (duration: 01m 18s) [07:01:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:00] !log installing curl security updates [07:48:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:48:31] PROBLEM - HHVM rendering on mw2114 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:48:34] (03CR) 10Krinkle: [C: 031] "No idea what this is used for or what it will change." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392999 (https://phabricator.wikimedia.org/T117302) (owner: 10TerraCodes) [07:49:21] RECOVERY - HHVM rendering on mw2114 is OK: HTTP OK: HTTP/1.1 200 OK - 73702 bytes in 0.318 second response time [08:20:51] (03PS14) 10Elukey: profile::hadoop::worker: add Prometheus JMX exporter configuration [puppet] - 10https://gerrit.wikimedia.org/r/394045 (https://phabricator.wikimedia.org/T177458) [08:21:53] !log Enable GTID on s8 eqiad hosts that do not have it enabled (db1109, db1104, db1101, db1092, db1087, db1063) - T177208 [08:22:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:22:01] T177208: Provide dedicated database resources for wikidata - https://phabricator.wikimedia.org/T177208 [08:25:20] !log rolling restart of mw canaries to pick up curl security update [08:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:26:43] (03PS3) 10Marostegui: filtered_tables: Add new columns [puppet] - 10https://gerrit.wikimedia.org/r/393725 (https://phabricator.wikimedia.org/T174569) [08:27:59] (03PS3) 10Giuseppe Lavagetto: profile::base: switch everything back to the default environment [puppet] - 10https://gerrit.wikimedia.org/r/394043 [08:31:17] <_joe_> uhm, on second thoughts, I'd wait for chasemp or andrewbogott to be around before merging 
this ^^ [08:32:31] (03PS4) 10Marostegui: filtered_tables: Add new columns [puppet] - 10https://gerrit.wikimedia.org/r/393725 (https://phabricator.wikimedia.org/T174569) [08:33:08] (03CR) 10Marostegui: [C: 032] filtered_tables: Add new columns [puppet] - 10https://gerrit.wikimedia.org/r/393725 (https://phabricator.wikimedia.org/T174569) (owner: 10Marostegui) [08:36:38] (03CR) 10Marostegui: "Actually, I think I am going to revert this and leave it pending to merge, otherwise private_data check will complain until we do all the " [puppet] - 10https://gerrit.wikimedia.org/r/393725 (https://phabricator.wikimedia.org/T174569) (owner: 10Marostegui) [08:36:46] (03PS1) 10Marostegui: Revert "filtered_tables: Add new columns" [puppet] - 10https://gerrit.wikimedia.org/r/394253 [08:37:22] (03CR) 10Marostegui: [C: 032] Revert "filtered_tables: Add new columns" [puppet] - 10https://gerrit.wikimedia.org/r/394253 (owner: 10Marostegui) [08:41:53] (03PS1) 10Marostegui: filtered_tables: Add new columns [puppet] - 10https://gerrit.wikimedia.org/r/394254 (https://phabricator.wikimedia.org/T174569) [08:46:10] (03Draft2) 10Jayprakash12345: Add import sources to de.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394255 [08:46:50] (03PS3) 10Jayprakash12345: Add import sources to de.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394255 (https://phabricator.wikimedia.org/T181695) [08:47:10] (03PS1) 10Elukey: profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) [08:55:21] (03PS2) 10Elukey: profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) [08:58:53] (03PS1) 10Muehlenhoff: Record extended MOU dates [puppet] - 10https://gerrit.wikimedia.org/r/394257 [09:00:06] (03CR) 10Muehlenhoff: [C: 032] Record extended MOU dates [puppet] - 
10https://gerrit.wikimedia.org/r/394257 (owner: 10Muehlenhoff) [09:05:22] RECOVERY - Disk space on graphite2002 is OK: DISK OK [09:05:36] !log add 200G of space to graphite2002 carbon lv [09:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:23] (03PS2) 10Addshore: wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) [09:13:27] (03PS2) 10Addshore: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 [09:13:33] (03PS2) 10Addshore: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) [09:13:39] (03PS2) 10Addshore: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) [09:13:45] (03PS2) 10Addshore: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) [09:13:52] (03PS2) 10Addshore: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) [09:13:55] (03PS2) 10Addshore: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) [09:13:58] (03PS2) 10Addshore: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) [09:14:58] !log drain and reboot analytics1029/1030 for jvm+kernel updates (Hadoop worker canaries) [09:15:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:16:26] !log installing exim security updates on stretch (jessie/trusty not affected) [09:16:31] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:56] (03PS2) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [09:19:04] (03PS2) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [09:20:25] (03CR) 10jerkins-bot: [V: 04-1] wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [09:28:52] (03PS3) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [09:29:15] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3799106 (10jcrespo) 05Open>03Resolved [09:29:50] 10Operations, 10monitoring, 10Patch-For-Review: ensure that services on labtest machines never create SMS from Icinga (not send sms pages for labtest* things to non-cloud folks) - https://phabricator.wikimedia.org/T178008#3677751 (10jcrespo) a:05chasemp>03Dzahn [09:30:07] (03CR) 10Elukey: "pcc: https://puppet-compiler.wmflabs.org/compiler02/9084/" [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) (owner: 10Elukey) [09:30:14] (03PS3) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [09:34:53] (03CR) 10Gehel: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/394021 (https://phabricator.wikimedia.org/T173772) (owner: 10Gehel) [09:47:05] 10Operations, 10ops-eqiad, 10DBA: db1016 m1 master: Possibly faulty BBU - https://phabricator.wikimedia.org/T166344#3799118 (10Marostegui) We can probably clone db1056 from db1001. 
[09:59:51] PROBLEM - kartotherian endpoints health on maps-test2004 is CRITICAL: /{src}/{z}/{x}/{y}.{format} (get a tile in the middle of the ocean, with overzoom) is CRITICAL: Test get a tile in the middle of the ocean, with overzoom returned the unexpected status 400 (expecting: 200): /img/{src},{z},{lat},{lon},{w}x{h}@{scale}x.{format} (Small scaled map) is CRITICAL: Test Small scaled map returned the unexpected status 400 (expecting: [09:59:51] }/{y}@{scale}x.{format} (default scaled tile) is CRITICAL: Test default scaled tile returned the unexpected status 400 (expecting: 200): /{src}/info.json (tile service info for osm-intl) is CRITICAL: Test tile service info for osm-intl returned the unexpected status 400 (expecting: 200): /{src}/info.json (tile service info for osm-pbf) is CRITICAL: Test tile service info for osm-pbf returned the unexpected status 400 (expecting [10:00:38] oops, downtime expired, fixing... [10:33:45] !log stopping replication on db1044 and db1095 (s3) [10:33:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:39:24] (03CR) 10Jcrespo: [C: 032] pr_index table is not private [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) [10:39:31] (03PS3) 10Jcrespo: pr_index table is not private [puppet] - 10https://gerrit.wikimedia.org/r/375347 (https://phabricator.wikimedia.org/T113842) (owner: 10Tpt) [10:40:47] 10Operations, 10ops-eqiad, 10DBA, 10Phabricator, 10hardware-requests: Decommission db1048 (was Move m3 slave to db1059) - https://phabricator.wikimedia.org/T175679#3799171 (10Marostegui) [10:41:48] (03PS4) 10Giuseppe Lavagetto: puppet: Move puppet CI to puppet 4.8.2 [puppet] - 10https://gerrit.wikimedia.org/r/393259 [10:41:50] (03PS1) 10Giuseppe Lavagetto: rsync::get: fix undefs for puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394266 [10:41:52] (03PS1) 10Giuseppe Lavagetto: git::clone: define all variables for puppet 4 compatibility [puppet] - 
10https://gerrit.wikimedia.org/r/394267 [10:41:53] <_joe_> huge patchset incoming [10:41:54] (03PS1) 10Giuseppe Lavagetto: monitoring::host: enhance puppet4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394268 [10:41:56] (03PS1) 10Giuseppe Lavagetto: Gemfile: update rspec-puppet, add rspec-puppet-facts [puppet] - 10https://gerrit.wikimedia.org/r/394269 [10:41:58] (03PS1) 10Giuseppe Lavagetto: mirrors: puppet 4 spec compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394270 [10:42:00] (03PS1) 10Giuseppe Lavagetto: wmflib: fix spec, compatibility with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394271 [10:42:02] (03PS1) 10Giuseppe Lavagetto: spec: make apt,authdns, install_server work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394272 [10:42:04] (03PS1) 10Giuseppe Lavagetto: spec: make jenkins, zuul, monitoring, profile work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394273 [10:42:06] (03PS1) 10Giuseppe Lavagetto: spec: make network, puppetmaster, service, systemd compatible with puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394274 [10:42:18] _joe_: is that splitting your huge patch ? :) [10:42:42] <_joe_> hashar: yeah I just told you in another channel :P [10:42:55] too many channels [10:42:57] <_joe_> hashar: can we add a non-voting job for puppet 4? 
[10:43:06] that is probably going to make it easier to review [10:43:10] yeah should be easy maybe [10:43:19] <_joe_> but I think we just can review the patches and go with them [10:43:22] the Gemfile can vary the puppet version via an env variable [10:43:40] and maybe I can craft a second job that has that variable injected [10:43:43] <_joe_> I know, let's first be sure CI gives a +1 to all these patches though [10:44:21] (03CR) 10jerkins-bot: [V: 04-1] spec: make apt,authdns, install_server work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394272 (owner: 10Giuseppe Lavagetto) [10:44:36] <_joe_> heh, kinda expected this could happen [10:45:55] <_joe_> and indeed, the issue is that facts are stringified with puppet 3.x in the module I'm using, meh [10:46:05] <_joe_> hashar: we will need to override that -1 I think [10:46:45] oh my god npm is not in Debian stretch :/ [10:46:51] <_joe_> hashar: yeah it's not [10:47:09] * hashar goes with curl | sudo bash [10:47:18] <_joe_> hashar: I'm gonna work on that problem for our containers [10:47:44] ah yeah that is https://phabricator.wikimedia.org/T180524 [10:50:50] <_joe_> hashar: I'll decompose for you what their bash script does :P [11:03:21] 10Operations, 10Goal, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3799228 (10fgiunchedi) [11:10:00] (03CR) 10Giuseppe Lavagetto: [C: 031] role: split prometheus redis jobs [puppet] - 10https://gerrit.wikimedia.org/r/393794 (https://phabricator.wikimedia.org/T148637) (owner: 10Filippo Giunchedi) [11:11:31] (03PS3) 10Filippo Giunchedi: role: split prometheus redis jobs [puppet] - 10https://gerrit.wikimedia.org/r/393794 (https://phabricator.wikimedia.org/T148637) [11:12:14] (03CR) 10Filippo Giunchedi: [C: 032] role: split prometheus redis jobs [puppet] - 10https://gerrit.wikimedia.org/r/393794 (https://phabricator.wikimedia.org/T148637) (owner: 10Filippo Giunchedi) [11:19:31] (03PS2) 
10Ema: prometheus: add varnish-canary job definition [puppet] - 10https://gerrit.wikimedia.org/r/394063 [11:19:33] (03CR) 10Ema: [V: 032 C: 032] prometheus: add varnish-canary job definition [puppet] - 10https://gerrit.wikimedia.org/r/394063 (owner: 10Ema) [11:19:50] (03CR) 10Giuseppe Lavagetto: [C: 032] "This is unused in our manifests, I'm just ensuring tests pass and the define can work in puppet 4." [puppet] - 10https://gerrit.wikimedia.org/r/394266 (owner: 10Giuseppe Lavagetto) [11:20:45] (03PS8) 10Muehlenhoff: Add Prometheus exporter to openldap/labs [puppet] - 10https://gerrit.wikimedia.org/r/394025 [11:21:16] (03PS2) 10Giuseppe Lavagetto: rsync::get: fix undefs for puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394266 [11:22:29] (03CR) 10Muehlenhoff: [C: 032] Add Prometheus exporter to openldap/labs [puppet] - 10https://gerrit.wikimedia.org/r/394025 (owner: 10Muehlenhoff) [11:24:15] <_joe_> grrrr merge wars [11:24:23] <_joe_> grrr [11:24:33] (03PS3) 10Giuseppe Lavagetto: rsync::get: fix undefs for puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394266 [11:26:02] PROBLEM - puppet last run on seaborgium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[prometheus-openldap-exporter] [11:28:12] (03CR) 10Faidon Liambotis: [C: 031] git::clone: define all variables for puppet 4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394267 (owner: 10Giuseppe Lavagetto) [11:28:43] (03CR) 10Volans: "Not blocked anymore, future parser syntax is now allowed" [puppet] - 10https://gerrit.wikimedia.org/r/392606 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [11:30:05] (03CR) 10Volans: "Not blocked anymore, future parser syntax allowed" [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [11:31:02] RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:31:30] (03CR) 10Volans: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [11:31:34] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring, 10User-fgiunchedi: Port redis statistics to Prometheus - https://phabricator.wikimedia.org/T148637#3799299 (10fgiunchedi) All redis (except for ores') have their metrics being pulled now! 
[11:31:43] (03CR) 10jerkins-bot: [V: 04-1] Metric alarms: convert dashboad_link to array [puppet] - 10https://gerrit.wikimedia.org/r/392607 (https://phabricator.wikimedia.org/T170353) (owner: 10Volans) [11:32:08] eh, manual rebase needed :D [11:35:19] (03PS3) 10Addshore: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) [11:36:03] (03PS3) 10Addshore: wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) [11:36:19] (03PS4) 10Addshore: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) [11:36:27] (03PS3) 10Addshore: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) [11:36:35] (03PS3) 10Addshore: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) [11:36:44] (03PS3) 10Addshore: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) [11:36:50] (03PS3) 10Addshore: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) [11:36:56] (03PS3) 10Addshore: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) [11:37:00] (03PS3) 10Addshore: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) [11:38:21] PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: 
/{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:39:11] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy [11:41:28] (03PS1) 10Filippo Giunchedi: role: add ldap Prometheus jobs [puppet] - 10https://gerrit.wikimedia.org/r/394280 (https://phabricator.wikimedia.org/T181511) [11:41:38] (03PS4) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [11:42:10] (03PS4) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [11:43:04] (03CR) 10jerkins-bot: [V: 04-1] wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [11:43:11] PROBLEM - mobileapps endpoints health on scb2003 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:43:12] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:43:21] PROBLEM - mobileapps endpoints health on scb2005 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:43:28] <_joe_> uhm [11:43:33] <_joe_> mobrovac: ^^ :P [11:44:08] (03PS5) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [11:44:09] <_joe_> that calls restbase right? 
[11:44:11] RECOVERY - mobileapps endpoints health on scb2003 is OK: All endpoints are healthy [11:44:12] RECOVERY - mobileapps endpoints health on scb2005 is OK: All endpoints are healthy [11:44:12] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:44:22] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 90.00% of data above the critical threshold [50.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1panelId=2fullscreen [11:44:25] (03PS5) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [11:44:29] <_joe_> uh [11:44:33] <_joe_> mediawiki fatals [11:45:11] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: /{domain}/v1/page/media/{title} (retrieve images and videos of en.wp Cat page via media route) timed out before a response was received [11:46:11] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [11:46:26] (03PS1) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 [11:46:33] (03PS1) 10Addshore: wdbuild: Switch wikidata extensions to json entrypoint where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 [11:47:12] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [11:47:13] (03PS2) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) [11:47:18] db1091 is doing something weird [11:48:03] (03PS2) 10Addshore: wdbuild: Switch wikidata extensions to json entrypoint where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 (https://phabricator.wikimedia.org/T123026) 
[11:48:11] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [11:48:45] (03CR) 10Faidon Liambotis: [C: 031] monitoring::host: enhance puppet4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394268 (owner: 10Giuseppe Lavagetto) [11:49:41] (03PS1) 10Muehlenhoff: Add apt pinning for Twisted from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394284 [11:51:25] (03CR) 10Faidon Liambotis: [C: 031] "Seems fine from a cursory look." [puppet] - 10https://gerrit.wikimedia.org/r/394271 (owner: 10Giuseppe Lavagetto) [11:54:03] (03CR) 10Muehlenhoff: [C: 031] role: add ldap Prometheus jobs [puppet] - 10https://gerrit.wikimedia.org/r/394280 (https://phabricator.wikimedia.org/T181511) (owner: 10Filippo Giunchedi) [11:54:39] (03PS1) 10Marostegui: db-eqiad.php: Give more traffic to db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394285 [11:55:53] (03CR) 10Jcrespo: [C: 031] db-eqiad.php: Give more traffic to db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394285 (owner: 10Marostegui) [11:56:10] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Give more traffic to db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394285 (owner: 10Marostegui) [11:57:34] (03Merged) 10jenkins-bot: db-eqiad.php: Give more traffic to db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394285 (owner: 10Marostegui) [11:57:44] (03CR) 10jenkins-bot: db-eqiad.php: Give more traffic to db1081 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394285 (owner: 10Marostegui) [11:58:31] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 70.00% above the threshold [25.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=2&fullscreen [11:58:55] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1081 and reduce traffic for db1091 (duration: 00m 50s) [11:59:02] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:02:20] (03PS2) 10Filippo Giunchedi: role: add ldap Prometheus jobs [puppet] - 10https://gerrit.wikimedia.org/r/394280 (https://phabricator.wikimedia.org/T181511) [12:02:45] (03PS1) 10Marostegui: db-eqiad.php: Tackle the weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394287 [12:03:00] (03CR) 10Filippo Giunchedi: [C: 032] role: add ldap Prometheus jobs [puppet] - 10https://gerrit.wikimedia.org/r/394280 (https://phabricator.wikimedia.org/T181511) (owner: 10Filippo Giunchedi) [12:03:32] (03PS2) 10Marostegui: db-eqiad.php: Tackle s4 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394287 [12:05:17] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Tackle s4 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394287 (owner: 10Marostegui) [12:06:31] (03Merged) 10jenkins-bot: db-eqiad.php: Tackle s4 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394287 (owner: 10Marostegui) [12:06:59] (03CR) 10jenkins-bot: db-eqiad.php: Tackle s4 weights [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394287 (owner: 10Marostegui) [12:07:39] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Tackle s4 DB weights to make them more equal (duration: 00m 48s) [12:07:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:25:39] (03PS2) 10Giuseppe Lavagetto: git::clone: define all variables for puppet 4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394267 [12:26:40] (03CR) 10Giuseppe Lavagetto: [C: 032] git::clone: define all variables for puppet 4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394267 (owner: 10Giuseppe Lavagetto) [12:27:10] (03PS2) 10Giuseppe Lavagetto: monitoring::host: enhance puppet4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394268 [12:28:46] (03PS1) 10Addshore: DNM Remove wikidatabuilder [puppet] - 10https://gerrit.wikimedia.org/r/394291 (https://phabricator.wikimedia.org/T181706) [12:32:57] 
mobrovac: re:table creation on rb1012, do you know from where it originated? from the linked logs I can see that source_host is rb1012 itself [12:33:31] (03PS2) 10Muehlenhoff: Add apt pinning for Twisted from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394284 [12:34:46] _joe_: if you are around. Given a single docker-pkg template can I build two containers (jessie & stretch) ? :D [12:35:12] but I guess I should get two copy pasted templates [12:35:34] (03CR) 10Muehlenhoff: [C: 032] Add apt pinning for Twisted from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394284 (owner: 10Muehlenhoff) [12:41:15] (03PS4) 10Muehlenhoff: Remove ffmpeg2theora from package list [puppet] - 10https://gerrit.wikimedia.org/r/373733 (https://phabricator.wikimedia.org/T172445) [12:42:10] (03CR) 10Muehlenhoff: [C: 032] Remove ffmpeg2theora from package list [puppet] - 10https://gerrit.wikimedia.org/r/373733 (https://phabricator.wikimedia.org/T172445) (owner: 10Muehlenhoff) [12:42:40] <_joe_> hashar: nope [12:48:08] _joe_: I found a way to isntall npm on stretch :] [12:48:40] <_joe_> argh expect some puppet failures [12:48:42] FROM wmfreleng/npm:latest as npm-jessie [12:48:43] COPY --from=npm-jessie /usr/local/lib/node_modules/npm/ /usr/local/lib/node_modules/npm/ [12:48:43] RUN ln -s ../lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm [12:48:55] or really use jessie npm 1.4 to bootstrap npm 3.8.3 [12:49:05] <_joe_> nah [12:49:11] and then copy the node_modules / js files from jessie to the stretch container :] [12:49:23] <_joe_> I will get to this once I resolved the issue we're having with puppet 4 agents [12:49:50] well it is really a dirty hack . I will later follow whatever steps you took for the mathoid/SSD containers I guess [12:52:32] PROBLEM - puppet last run on db2033 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check-fresh-files-in-dir.py] [12:53:41] PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/furl] [13:04:52] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [13:07:30] (03CR) 10Giuseppe Lavagetto: [C: 031] dns: restore puppet.codfw.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394110 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [13:09:52] (03CR) 10Giuseppe Lavagetto: [C: 032] monitoring::host: enhance puppet4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394268 (owner: 10Giuseppe Lavagetto) [13:10:14] (03PS3) 10Giuseppe Lavagetto: monitoring::host: enhance puppet4 compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394268 [13:11:52] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:12:32] PROBLEM - puppet last run on mw2230 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:13:15] <_joe_> looking [13:13:44] zeljkof: FYI i'm going to push out 2 small backports in the next 45 mins (before swat) as I'm busy during swat! [13:13:55] <_joe_> yeah that was my fault [13:13:58] <_joe_> but should be ok now [13:14:11] PROBLEM - puppet last run on mw2143 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:14:35] <_joe_> re-running puppet now fixes the issue [13:15:01] PROBLEM - puppet last run on mw2103 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:15:31] (03CR) 10Herron: [C: 032] dns: restore puppet.codfw.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394110 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [13:16:32] PROBLEM - puppet last run on mw2249 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:16:51] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:18:07] !log cutting codfw puppet agents over to puppetmaster2001.codfw.wmnet [13:18:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:18:32] RECOVERY - puppet last run on mw2232 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:19:32] PROBLEM - puppet last run on db2047 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:22:23] addshore: ok with me [13:22:30] Awesome! [13:22:31] RECOVERY - puppet last run on db2033 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [13:22:37] !log addshore@tin Synchronized php-1.31.0-wmf.8/extensions/AdvancedSearch/modules/ext.advancedSearch.init.js: pre-swat: T181644 Force search profile advanced in AdvancedSearch [[gerrit:394299]] (duration: 00m 50s) [13:22:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:45] T181644: Advanced Search: Namespace Filters not working in dewiki - https://phabricator.wikimedia.org/T181644 [13:22:51] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:23:10] I had already started the sync ;) [13:23:41] !log addshore@tin Synchronized php-1.31.0-wmf.10/extensions/AdvancedSearch/modules/ext.advancedSearch.init.js: pre-swat: T181644 Force search profile advanced in AdvancedSearch [[gerrit:394297]] (duration: 00m 48s) [13:23:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:32] RECOVERY - puppet last run on db2047 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:24:52] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:25:22] PROBLEM - puppet last run on mw2124 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:25:30] zeljkof: thats me all done [13:25:36] <_joe_> the puppet last run checks have a lot of lag [13:25:48] addshore: nice [13:31:03] <_joe_> all restbase hosts in codfw will fail puppet runs [13:31:16] <_joe_> some new puppet4 incompatibility was introduced in the meanwhile [13:32:31] PROBLEM - puppet last run on restbase2008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:32:32] RECOVERY - puppet last run on mw2230 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [13:32:41] PROBLEM - puppet last run on restbase2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:33:01] PROBLEM - puppet last run on db2036 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:33:01] PROBLEM - puppet last run on restbase2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:33:11] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/bin/varnishreqstats] [13:35:02] <_joe_> db2036 and cp2005 are false positives [13:35:11] PROBLEM - puppet last run on mw2113 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:36:21] PROBLEM - puppet last run on restbase2007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:37:51] PROBLEM - puppet last run on restbase-test2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:38:01] RECOVERY - puppet last run on db2036 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:38:19] (03PS1) 10Herron: puppet: fix cqlsh.erb template to parse under puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394301 (https://phabricator.wikimedia.org/T177254) [13:38:32] PROBLEM - puppet last run on restbase2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:41:21] PROBLEM - puppet last run on restbase-test2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:41:31] PROBLEM - puppet last run on restbase2011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:41:32] RECOVERY - puppet last run on mw2249 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [13:41:32] PROBLEM - puppet last run on restbase2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:42:01] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:42:03] (03CR) 10Herron: [C: 032] puppet: fix cqlsh.erb template to parse under puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394301 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [13:44:01] PROBLEM - puppet last run on restbase-test2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:44:11] RECOVERY - puppet last run on mw2143 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:44:52] RECOVERY - puppet last run on mw2103 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:46:21] RECOVERY - puppet last run on restbase2007 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [13:46:32] RECOVERY - puppet last run on restbase2011 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [13:46:32] RECOVERY - puppet last run on restbase2006 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [13:47:31] RECOVERY - puppet last run on restbase2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:47:41] RECOVERY - puppet last run on restbase2004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:48:01] RECOVERY - puppet last run on restbase2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:48:32] RECOVERY - puppet last run on restbase2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:52:51] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:54:52] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [13:54:55] (03CR) 10Jayprakash12345: "@SWAT member, Please Create shorturl table first before merge. 
By" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [13:55:22] RECOVERY - puppet last run on mw2124 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: #bothumor I � Unicode. All rise for European Mid-day SWAT(Max 8 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T1400). [14:00:08] Pl217: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [14:00:14] I can SWAT today [14:00:20] Pl217: around for SWAT? [14:00:22] RECOVERY - puppet last run on puppetmaster2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:00:26] Yes [14:00:31] Ok [14:00:36] * Nikerabbit is watching [14:00:58] * addshore continues eating his food [14:01:20] Pl217: I will ping you in a few minutes when you patch is at mwdebug1002, do you know how to test there? [14:01:42] Pl217: forgot to ask, do you want to deploy your patch yourself? [14:01:59] I have the extension. On what environment to test it? [14:02:06] You do the deployment [14:02:18] Pl217: you will be able to test it at mwdebug1002 [14:02:22] in a few minutes [14:02:36] zeljkof: I can't test 394255 Add import sources to de.wiki. Because I have not transwiki right. [14:02:51] RECOVERY - puppet last run on restbase-test2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:02:54] zeljkof: Ping me when ready [14:03:11] RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:03:12] zeljkof: So Please merge And Syn. 
[14:05:11] RECOVERY - puppet last run on mw2113 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:08:42] (03CR) 10Ottomata: [C: 031] "I think its fine, especially for the env.sh files. I am more concerned about keeping the .xml files the same." [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) (owner: 10Elukey) [14:09:15] (03PS4) 10Zfilipin: Enable WikiLove Extension on pa.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:10:40] Pl217: CI is taking a while to merge, should be done any minute now [14:11:02] zeljkof: Good. I'm waiting [14:11:10] Pl217: docs on how to test at mwdebug1002 https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Staging_changes [14:11:18] if case you need it [14:11:21] RECOVERY - puppet last run on restbase-test2003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [14:11:35] zeljkof: I have read the docs [14:11:41] and let me know if you have any questions [14:12:05] volans: re rb1012, yes, it originated there, but that's not the issue (we figured why that happened), the issue is that these tables hold data in them, which is possible iff there was live traffic directed at rb1012 (which shouldn't have been the case) [14:12:15] zeljkof: $ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=pawiki WikiLove [14:12:53] volans: so i suspect that rb1012's state was actually pooled=yes at some point [14:13:09] i.e. lvs was sending traffic to it [14:13:31] mobrovac: do you have an exact datetime? [14:13:49] I guess we can check pybal logs [14:14:01] RECOVERY - puppet last run on restbase-test2001 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [14:14:17] Pl217: the patch is at mwdebug1002, please test and let me know if I can deploy [14:14:38] mobrovac: are the times in the linked log correct? 
[14:15:01] volans: yes, so 12:54 utc for sure, but there are others preceding it [14:15:11] zeljkof: Great. Give me some time to test [14:15:25] Pl217: sure, let me know if you need more that 5 minutes [14:17:01] mobrovac: do you happen to know already the LVS involved? (or I look in config) [14:17:59] Jayprakash12345: maybe I can test 394255, can you let me know where to look for the new option? [14:18:15] nope volans, restbase.discovery.wmnet aka restbase.svc.eqiad.wmnet [14:18:25] ack, thanks [14:18:58] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394255 (https://phabricator.wikimedia.org/T181695) (owner: 10Jayprakash12345) [14:19:09] !log Deploy schema change on s3 - db1078 - T174569 [14:19:09] zeljkof: It's good, you can deploy [14:19:10] (03PS15) 10Elukey: profile::hadoop::worker: add Prometheus JMX exporter configuration [puppet] - 10https://gerrit.wikimedia.org/r/394045 (https://phabricator.wikimedia.org/T177458) [14:19:15] zeljkof: http://de.wikipedia.org/wiki/Special:Import [14:19:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:19:17] T174569: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569 [14:19:18] Pl217: deploying [14:19:33] zeljkof: Great, thank you so much! [14:20:12] !log zfilipin@tin Synchronized php-1.31.0-wmf.10/extensions/ContentTranslation/modules/dashboard/ext.cx.dashboard.js: SWAT: [[gerrit:394292|Set default languages after fetching valid languages]] (duration: 00m 49s) [14:20:18] zeljkof: You may see me around every once in a while, as I have developed a craft for breaking things :) [14:20:18] (03Merged) 10jenkins-bot: Add import sources to de.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394255 (https://phabricator.wikimedia.org/T181695) (owner: 10Jayprakash12345) [14:20:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:20:26] zeljkof: See 'v' option in Import Source. 
[14:20:38] Pl217: it's deployed, please test and thanks for releasing with #releng ;) [14:20:52] (don't forget to disable the browser extension) [14:20:58] Jayprakash12345: ok, looking [14:22:16] Pl217: "Note: If you break AND fix the wikis, you will be rewarded with a sticker." (I thought one also got a t-shirt saying "I broke Wikipedia") [14:22:32] 10Operations, 10Analytics-Kanban, 10DBA, 10Patch-For-Review, 10User-Elukey: Decommission old dbstore hosts (db1046, db1047) - https://phabricator.wikimedia.org/T156844#3799814 (10elukey) >>! In T156844#3786146, @Capt_Swing wrote: > The `shawn` table belonged to Shawn Walker, a research intern in 2011. Th... [14:22:44] mobrovac: so, when in theoyr was supposed to have been depooled restbase1012? [14:23:26] zeljkof: I have checked and it's fixed [14:23:36] (03CR) 10jenkins-bot: Add import sources to de.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394255 (https://phabricator.wikimedia.org/T181695) (owner: 10Jayprakash12345) [14:23:36] zeljkof: hhhh [14:23:55] zeljkof: They will give me a jacket instead of a t-shirt if I continue with this pace ;) [14:24:30] Jayprakash12345: ok, I can't test it [14:24:48] https://de.wikipedia.org/wiki/Spezial:Importieren?uselang=en says "You do not have permission to import pages from another wiki, for the following reason: " [14:25:51] zeljkof: So syn. Because It not have tipical code. I just add simple 'v'. So It can't broke anything. 
[14:26:18] Jayprakash12345: you would be surprised at how simple things can break something :) [14:26:26] but deploying, I don't think it will be a problem [14:26:38] please leave a comment in the task saying we could not test [14:28:30] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:394255|Add import sources to de.wiki (T181695)]] (duration: 00m 49s) [14:28:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:28:38] T181695: Add import sources to de.wiki - https://phabricator.wikimedia.org/T181695 [14:28:51] Jayprakash12345: 394255 is deployed [14:29:13] (03PS5) 10Zfilipin: Enable WikiLove Extension on pa.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:29:43] zeljkof: Thank. Please go ahead with second on. [14:30:52] zeljkof: In https://gerrit.wikimedia.org/r/#/c/387618/. First we need tocreate the table. [14:31:26] Jayprakash12345: so, I first run the script, then do the deploy? [14:31:57] zeljkof: yeah [14:32:04] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:32:07] ok [14:33:30] (03Merged) 10jenkins-bot: Enable WikiLove Extension on pa.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:33:42] zeljkof: ($ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=pawiki WikiLove) This works? [14:33:45] (03CR) 10jenkins-bot: Enable WikiLove Extension on pa.wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:34:41] Jayprakash12345: waiting for the commit to merge before running the script [14:34:44] (03CR) 10Elukey: [C: 031] "It looks great!" 
[puppet] - 10https://gerrit.wikimedia.org/r/394045 (https://phabricator.wikimedia.org/T177458) (owner: 10Elukey) [14:37:39] (03PS1) 10Herron: dns: set puppet.ulsfo.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394307 (https://phabricator.wikimedia.org/T177254) [14:37:41] (03PS1) 10Herron: dns: set puppet.esams.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394308 (https://phabricator.wikimedia.org/T177254) [14:38:30] (03CR) 10Zfilipin: "zfilipin@terbium:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=pawiki WikiLove" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/387618 (https://phabricator.wikimedia.org/T178919) (owner: 10Jayprakash12345) [14:39:09] Jayprakash12345: it's at mwdebug1002, please test and let me know if I can deploy [14:39:21] zeljkof: ok [14:40:14] I am looking at this user page, but I do not see the heart icon https://pa.wikipedia.org/wiki/%E0%A8%B5%E0%A8%B0%E0%A8%A4%E0%A9%8B%E0%A8%82%E0%A8%95%E0%A8%BE%E0%A8%B0:Satnam_S_Virdi?uselang=en [14:42:10] I see it now, I guess it needed a minute [14:42:16] looks good to you? can I deploy? [14:42:34] Wait for a min [14:45:25] zeljkof: Looking Good. See https://pa.wikipedia.org/wiki/%E0%A8%B5%E0%A8%B0%E0%A8%A4%E0%A9%8B%E0%A8%82%E0%A8%95%E0%A8%BE%E0%A8%B0_%E0%A8%97%E0%A9%B1%E0%A8%B2-%E0%A8%AC%E0%A8%BE%E0%A8%A4:Satdeep_Gill#A_cupcake_for_you.21 [14:45:57] ok, deploying [14:46:03] zeljkof: Go Ahead. [14:46:49] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:387618|Enable WikiLove Extension on pa.wiki (T178919)]] (duration: 00m 49s) [14:46:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:46:57] T178919: ShortUrl and WikiLove on Punjabi Wikipedia - https://phabricator.wikimedia.org/T178919 [14:47:32] Jayprakash12345: deployed, please check and thanks for deploying with #releng ;) [14:49:34] zeljkof: Everything is Ok. Thank for being here. 
[14:49:53] :) [14:49:57] looks like that is all for swat [14:50:02] !log EU SWAT finished [14:50:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:56] 10Operations, 10DBA, 10Epic: Eliminate SPOF at the main database infrastructure - https://phabricator.wikimedia.org/T119626#3799913 (10jcrespo) [14:52:49] (03PS1) 10Ottomata: Install cergen on Puppet CA host [puppet] - 10https://gerrit.wikimedia.org/r/394310 (https://phabricator.wikimedia.org/T166167) [14:55:28] (03CR) 10Elukey: [C: 031] "Patch Set 15: Code-Review+1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:04:07] (03CR) 10Dzahn: [C: 031] role::ci::slave::browsertests: Fix $redis_port by adding string [puppet] - 10https://gerrit.wikimedia.org/r/394096 (owner: 10Paladox) [15:06:24] (03CR) 10Ottomata: [C: 032] Install cergen on Puppet CA host [puppet] - 10https://gerrit.wikimedia.org/r/394310 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:06:27] (03PS1) 10Jcrespo: mariadb: Increase db1110 load to normal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394313 (https://phabricator.wikimedia.org/T181613) [15:06:31] (03PS2) 10Giuseppe Lavagetto: Gemfile: update rspec-puppet, add rspec-puppet-facts [puppet] - 10https://gerrit.wikimedia.org/r/394269 [15:07:50] (03CR) 10Giuseppe Lavagetto: [C: 032] Gemfile: update rspec-puppet, add rspec-puppet-facts [puppet] - 10https://gerrit.wikimedia.org/r/394269 (owner: 10Giuseppe Lavagetto) [15:08:19] (03PS2) 10Giuseppe Lavagetto: mirrors: puppet 4 spec compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394270 [15:09:00] (03CR) 10Giuseppe Lavagetto: [C: 032] mirrors: puppet 4 spec compatibility [puppet] - 10https://gerrit.wikimedia.org/r/394270 (owner: 10Giuseppe Lavagetto) [15:10:02] (03PS2) 10Giuseppe Lavagetto: wmflib: fix spec, compatibility with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394271 [15:10:23] (03CR) 
10Elukey: [C: 032] "The previous comment was of course for another code review :)" [puppet] - 10https://gerrit.wikimedia.org/r/394045 (https://phabricator.wikimedia.org/T177458) (owner: 10Elukey) [15:10:33] (03PS16) 10Elukey: profile::hadoop::worker: add Prometheus JMX exporter configuration [puppet] - 10https://gerrit.wikimedia.org/r/394045 (https://phabricator.wikimedia.org/T177458) [15:10:42] PROBLEM - puppet last run on puppetmaster1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Package[cergen] [15:10:52] RECOVERY - Disk space on furud is OK: DISK OK [15:10:55] <_joe_> ottomata: ^^ [15:11:10] yeah [15:11:18] cergen : Depends: python3-cryptography (>= 1.7.0) but 0.6.1-1+deb8u1 is to be installed [15:11:18] Depends: python3-openssl (>= 16.0.0) but 0.14-1 is to be installed [15:11:21] RECOVERY - Disk space on flerovium is OK: DISK OK [15:11:28] _joe_: i would have thought since those versions are in jessie backports [15:11:31] that they would just be installed [15:11:33] !log beginning cut over of ulsfo to codfw puppet 4 masters [15:11:38] do I need to manually install those versions in puppet? [15:11:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:11:42] <_joe_> ottomata: noope [15:11:44] or is there somethign I did wrong in the debian package? 
[15:11:59] (03CR) 10Herron: [C: 032] dns: set puppet.ulsfo.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394307 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [15:12:14] (03PS3) 10Elukey: profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) [15:12:35] (03CR) 10Marostegui: [C: 031] mariadb: Increase db1110 load to normal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394313 (https://phabricator.wikimedia.org/T181613) (owner: 10Jcrespo) [15:12:47] (03CR) 10Elukey: [C: 032] profile::hadoop::worker: increase Java Xmx to 4G for Datanode/Nodemanager [puppet] - 10https://gerrit.wikimedia.org/r/394256 (https://phabricator.wikimedia.org/T178876) (owner: 10Elukey) [15:13:03] (03CR) 10Jcrespo: [C: 032] mariadb: Increase db1110 load to normal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394313 (https://phabricator.wikimedia.org/T181613) (owner: 10Jcrespo) [15:13:15] <_joe_> ottomata: no idea about your package, but no, packages from backports won't be automatically installed [15:13:43] ok [15:14:23] (03Merged) 10jenkins-bot: mariadb: Increase db1110 load to normal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394313 (https://phabricator.wikimedia.org/T181613) (owner: 10Jcrespo) [15:14:34] (03PS1) 10Ottomata: Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) [15:15:08] (03CR) 10Giuseppe Lavagetto: [C: 032] wmflib: fix spec, compatibility with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394271 (owner: 10Giuseppe Lavagetto) [15:15:13] _joe_: https://gerrit.wikimedia.org/r/#/c/394314/1/modules/profile/manifests/puppetmaster/frontend.pp [15:15:17] PROBLEM - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:15:22] <_joe_> uh [15:15:23] 
(03PS3) 10Giuseppe Lavagetto: wmflib: fix spec, compatibility with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394271 [15:15:26] oops one copy paste [15:15:28] error [15:15:30] <_joe_> ottomata: not now [15:15:38] gehel: around? [15:15:42] <_joe_> gehel: know what's going on? [15:15:53] (03PS2) 10Ottomata: Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) [15:16:06] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Increase db1110 load (duration: 00m 48s) [15:16:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:14] looking [15:16:16] RECOVERY - LVS HTTP IPv4 on wdqs.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 434 bytes in 0.012 second response time [15:16:22] load? [15:16:28] probably [15:16:43] wdqs1004 was depooled? [15:16:47] <_joe_> nope [15:17:03] (03PS4) 10Ottomata: Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) [15:17:05] https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?refresh=1m&orgId=1&from=now-1h&to=now [15:17:08] (03CR) 10jenkins-bot: mariadb: Increase db1110 load to normal [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394313 (https://phabricator.wikimedia.org/T181613) (owner: 10Jcrespo) [15:17:25] it went down to inactive 20min ago [15:17:29] high heap usage, looks like an overly expensive query [15:18:37] gehel: see -tech there is one of you users ;) [15:18:41] 10Operations, 10ops-eqiad: Please disconnect flerovium's disk shelves - https://phabricator.wikimedia.org/T181724#3799996 (10faidon) [15:19:02] volans: thanks! 
[15:19:05] 10Operations, 10ops-eqiad: Disconnect flerovium's disk shelves - https://phabricator.wikimedia.org/T181724#3800012 (10faidon) [15:19:16] (03PS2) 10Giuseppe Lavagetto: spec: make apt,authdns, install_server work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394272 [15:19:21] !log restart blazegraph on wdqs1004 [15:19:24] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] spec: make apt,authdns, install_server work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394272 (owner: 10Giuseppe Lavagetto) [15:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:19:39] 10Operations, 10ops-codfw: Disconnect furud's disk shelves - https://phabricator.wikimedia.org/T181725#3800014 (10faidon) [15:19:54] just restarting the service for now. Then I'll dig into GC logs and see if I can spot something [15:19:58] (03CR) 10Ottomata: [C: 032] Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:20:08] (03PS5) 10Ottomata: Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) [15:20:10] (03CR) 10Ottomata: [V: 032 C: 032] Install some cergen python3 deps from jessie-backports [puppet] - 10https://gerrit.wikimedia.org/r/394314 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:20:40] (03PS2) 10Giuseppe Lavagetto: spec: make jenkins, zuul, monitoring, profile work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394273 [15:21:49] (03CR) 10Giuseppe Lavagetto: [C: 032] spec: make jenkins, zuul, monitoring, profile work with puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394273 (owner: 10Giuseppe Lavagetto) [15:22:09] (03PS2) 10Giuseppe Lavagetto: spec: make network, puppetmaster, service, systemd compatible with puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394274 [15:23:17] (03CR) 
10Giuseppe Lavagetto: [C: 032] spec: make network, puppetmaster, service, systemd compatible with puppet4 [puppet] - 10https://gerrit.wikimedia.org/r/394274 (owner: 10Giuseppe Lavagetto) [15:23:50] (03PS5) 10Giuseppe Lavagetto: puppet: Move puppet CI to puppet 4.8.2 [puppet] - 10https://gerrit.wikimedia.org/r/393259 [15:25:03] \o/ [15:25:12] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=eqiadvar-cache_type=Allvar-status_type=5 [15:25:32] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=Allvar-cache_type=miscvar-status_type=5 [15:25:35] (03CR) 10Giuseppe Lavagetto: [C: 032] puppet: Move puppet CI to puppet 4.8.2 [puppet] - 10https://gerrit.wikimedia.org/r/393259 (owner: 10Giuseppe Lavagetto) [15:25:53] (03PS1) 10Ottomata: Remove before => cergen, this creates a circular deb with require package [puppet] - 10https://gerrit.wikimedia.org/r/394317 [15:26:03] (03PS1) 10Marostegui: m4.hosts: Remove db1046 and db1047 [software] - 10https://gerrit.wikimedia.org/r/394318 (https://phabricator.wikimedia.org/T156844) [15:26:30] (03PS2) 10Ottomata: Remove before => cergen, this creates a circular dep with require package [puppet] - 10https://gerrit.wikimedia.org/r/394317 [15:26:50] (03PS3) 10Ottomata: Remove before => cergen, this creates a circular dep with require package [puppet] - 10https://gerrit.wikimedia.org/r/394317 [15:26:57] (03CR) 10Ottomata: [V: 032 C: 032] Remove before => cergen, this creates a circular dep with require package [puppet] - 10https://gerrit.wikimedia.org/r/394317 (owner: 10Ottomata) [15:27:06] the spikes seems to be mostly for query.wikidata.org 
[15:27:09] cc: gehel [15:27:54] Wdqs should be back to normal [15:30:34] (03CR) 10Marostegui: [C: 032] m4.hosts: Remove db1046 and db1047 [software] - 10https://gerrit.wikimedia.org/r/394318 (https://phabricator.wikimedia.org/T156844) (owner: 10Marostegui) [15:31:00] (03Merged) 10jenkins-bot: m4.hosts: Remove db1046 and db1047 [software] - 10https://gerrit.wikimedia.org/r/394318 (https://phabricator.wikimedia.org/T156844) (owner: 10Marostegui) [15:31:03] 10Operations, 10Goal, 10User-Elukey, 10User-fgiunchedi: Stop using jmx_exporter deployed via scap in favour of Debian package - https://phabricator.wikimedia.org/T181728#3800073 (10fgiunchedi) [15:31:31] PROBLEM - puppet last run on labpuppetmaster1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:32:12] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=eqiadvar-cache_type=Allvar-status_type=5 [15:33:32] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] https://grafana.wikimedia.org/dashboard/file/varnish-aggregate-client-status-codes.json?panelId=3fullscreenorgId=1var-site=Allvar-cache_type=miscvar-status_type=5 [15:35:21] labpuppetmaster1001 is me [15:35:24] working on it... [15:35:37] (03PS1) 10Ottomata: Don't use require_package for cergen [puppet] - 10https://gerrit.wikimedia.org/r/394320 [15:36:53] (03CR) 10Ottomata: [C: 032] Don't use require_package for cergen [puppet] - 10https://gerrit.wikimedia.org/r/394320 (owner: 10Ottomata) [15:37:01] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [15:39:19] (03PS1) 10Filippo Giunchedi: Depend on jre-headless and its versioned names. 
[debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/394322 [15:39:31] 10Operations, 10ops-eqiad, 10Patch-For-Review, 10User-Elukey, 10User-Joe: rack and setup mw1307-1348 - https://phabricator.wikimedia.org/T165519#3800098 (10jcrespo) [15:39:33] 10Operations, 10ops-eqiad, 10Patch-For-Review: Please move db1110 and change its ip - https://phabricator.wikimedia.org/T181613#3800096 (10jcrespo) 05Open>03Resolved a:03Cmjohnson [15:45:51] (03PS1) 10Ottomata: Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) [15:46:01] 10Operations, 10Domains, 10Traffic: Purchase domains mediawi.ki and media.wiki to use as a url shortener - https://phabricator.wikimedia.org/T180657#3800120 (10ema) p:05Triage>03Normal [15:46:26] (03CR) 10jerkins-bot: [V: 04-1] Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:46:56] (03CR) 10Ottomata: [C: 031] Depend on jre-headless and its versioned names. 
[debs/prometheus-jmx-exporter] - 10https://gerrit.wikimedia.org/r/394322 (owner: 10Filippo Giunchedi) [15:47:41] (03PS2) 10Ottomata: Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) [15:48:13] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3800123 (10ema) p:05Triage>03Low [15:48:17] (03CR) 10jerkins-bot: [V: 04-1] Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:49:25] (03PS1) 10Jcrespo: mariadb-eventlogging: db1108 will contain only log database [puppet] - 10https://gerrit.wikimedia.org/r/394326 (https://phabricator.wikimedia.org/T156844) [15:49:54] (03PS3) 10Ottomata: Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) [15:50:33] (03CR) 10Ottomata: [C: 032] Need to specify more dependencies for cergen, create cergen module [puppet] - 10https://gerrit.wikimedia.org/r/394325 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:51:03] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3775802 (10ema) Note that varnish does not speak TLS, so no h2. I'm not sure how stable varnish's h2c support (HTTP/2 without TLS) is. 
[15:52:22] (03CR) 10Elukey: [C: 031] mariadb-eventlogging: db1108 will contain only log database [puppet] - 10https://gerrit.wikimedia.org/r/394326 (https://phabricator.wikimedia.org/T156844) (owner: 10Jcrespo) [15:53:04] I am fixing prometheus not being able to connect to mysql [15:53:13] I think it is because of old credentials [15:55:05] (03PS1) 10Ottomata: require python3-pkg-resources for cergen in jessie [puppet] - 10https://gerrit.wikimedia.org/r/394327 (https://phabricator.wikimedia.org/T166167) [15:56:12] (03CR) 10Ottomata: [C: 032] require python3-pkg-resources for cergen in jessie [puppet] - 10https://gerrit.wikimedia.org/r/394327 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [15:59:20] !log setup prometheus with unix_socket on new server db1107 and db1108 [15:59:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:59:32] that should fix stats going to grafana [15:59:39] ^elukey [16:00:07] (03PS2) 10Jcrespo: mariadb-eventlogging: db1108 will contain only log database [puppet] - 10https://gerrit.wikimedia.org/r/394326 (https://phabricator.wikimedia.org/T156844) [16:00:31] jynus: thanks a lot!
Sorry that I didn't check that :( [16:00:44] RECOVERY - puppet last run on puppetmaster1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [16:00:55] (03CR) 10Jcrespo: [C: 032] mariadb-eventlogging: db1108 will contain only log database [puppet] - 10https://gerrit.wikimedia.org/r/394326 (https://phabricator.wikimedia.org/T156844) (owner: 10Jcrespo) [16:01:49] elukey: this is more of a self reminder [16:02:16] of pending actions after stretch/10.1 upgrade [16:04:31] it still doesn't work :-/ [16:06:59] it is trying to connect through /tmp/mysql.sock [16:09:57] (03PS1) 10Ottomata: Use install_options instead of manually specifying cergen deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) [16:10:27] (03CR) 10jerkins-bot: [V: 04-1] Use install_options instead of manually specifying cergen deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [16:11:20] (03PS2) 10Ottomata: Use install_options instead of manually specifying cergen deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) [16:12:17] !log drain and reboot analytics1031->39 to pick up jvm+kernel updates - T179943 [16:12:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:12:24] T179943: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943 [16:13:13] (03CR) 10Ottomata: [C: 032] Use install_options instead of manually specifying cergen deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [16:13:20] (03PS3) 10Ottomata: Use install_options instead of manually specifying cergen deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) [16:13:23] (03CR) 10Ottomata: [V: 032 C: 032] Use install_options instead of manually specifying cergen
deps [puppet] - 10https://gerrit.wikimedia.org/r/394328 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [16:14:57] (03PS1) 10Muehlenhoff: Also track operations monitoring metrics [debs/prometheus-openldap-exporter] - 10https://gerrit.wikimedia.org/r/394331 [16:15:41] 10Operations, 10ops-codfw: Disconnect furud's disk shelves - https://phabricator.wikimedia.org/T181725#3800212 (10RobH) 05Open>03stalled IRC discussion with both @faidon and myself has led to us wanting to test out how the systems handle the loss and addition of shelves. For now, don't touch this system... [16:16:34] (03PS1) 10Jcrespo: mariadb-eventlogging: Move prometheus monitoring to /run socket [puppet] - 10https://gerrit.wikimedia.org/r/394332 (https://phabricator.wikimedia.org/T156844) [16:17:38] (03PS2) 10Jcrespo: mariadb-eventlogging: Move prometheus monitoring to /run socket [puppet] - 10https://gerrit.wikimedia.org/r/394332 (https://phabricator.wikimedia.org/T156844) [16:18:02] do you know if db1047 and db1046 are still up? In any case, we are going to break them [16:18:18] 1047 is up, 1046 is stopped [16:18:38] break as in "monitoring will not work for grafana" [16:19:06] but I do not want to waste time to make a conditional there [16:19:16] I will just update the right socket [16:19:25] and make monitoring work on the new ones [16:19:36] 10Operations, 10ops-eqiad: Disconnect flerovium's disk shelves - https://phabricator.wikimedia.org/T181724#3799996 (10RobH) So there is some discussion on this, and how it will detect the disks being added, removed, and added again. For now, let's test this out with the following: * - Power down host * remove...
[16:19:54] ack [16:20:00] (03CR) 10Jcrespo: [C: 032] mariadb-eventlogging: Move prometheus monitoring to /run socket [puppet] - 10https://gerrit.wikimedia.org/r/394332 (https://phabricator.wikimedia.org/T156844) (owner: 10Jcrespo) [16:21:34] RECOVERY - puppet last run on labpuppetmaster1001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:22:39] elukey: now it works! [16:23:08] \o/ [16:29:03] PROBLEM - pdfrender on scb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:44:45] !log drop (erroneous) legacy tables from -ng cassandra cluster - T181689 [16:44:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:44:56] T181689: New RESTBase Cassandra cluster has legacy tables - https://phabricator.wikimedia.org/T181689 [16:49:37] (03CR) 10Ottomata: Puppetize SSL for Kafka broker (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [16:52:36] (03PS8) 10Ottomata: Puppetize SSL for Kafka broker [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) [16:55:31] RECOVERY - pdfrender on scb1002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.004 second response time [16:59:05] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3775802 (10elukey) Same problem as for gerrit: ``` elukey@phab1001:~$ dpkg --list | grep apache ii apache2 2.4.10-10+deb8u11+wmf1 amd64 Apache... [17:00:05] godog, moritzm, and _joe_: That opportune time is upon us again. Time for a Puppet SWAT(Max 8 patches) deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T1700). [17:00:05] No GERRIT patches in the queue for this window AFAICS. 
[17:00:19] (03CR) 10Arturo Borrero Gonzalez: [C: 032] apt: add --force-confold/--force-confdef dpkg option to apt calls (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/392421 (https://phabricator.wikimedia.org/T180811) (owner: 10Arturo Borrero Gonzalez) [17:01:32] GET https://godog.wikimedia.org/celebrating-gif [17:01:42] (03PS13) 10Arturo Borrero Gonzalez: apt: add --force-confold/--force-confdef dpkg option to apt calls [puppet] - 10https://gerrit.wikimedia.org/r/392421 (https://phabricator.wikimedia.org/T180811) [17:02:44] elukey: https://i.redd.it/67d9r21bt8zz.gif [17:03:06] :D [17:03:46] cmjohnson1: o/ [17:04:00] let me know when/if I can bother you for kafka1018 [17:06:00] !log beginning cut over of esams to codfw puppet 4 masters [17:06:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:21] (03CR) 10Herron: [C: 032] dns: set puppet.esams.wmnet CNAME puppetmaster2001.codfw.wmnet [dns] - 10https://gerrit.wikimedia.org/r/394308 (https://phabricator.wikimedia.org/T177254) (owner: 10Herron) [17:07:30] (03Abandoned) 10Rush: apt: unattended upgrades -updates suites by default [puppet] - 10https://gerrit.wikimedia.org/r/390431 (https://phabricator.wikimedia.org/T180254) (owner: 10Arturo Borrero Gonzalez) [17:10:59] (03CR) 10Filippo Giunchedi: [C: 031] Also track operations monitoring metrics [debs/prometheus-openldap-exporter] - 10https://gerrit.wikimedia.org/r/394331 (owner: 10Muehlenhoff) [17:12:03] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3775802 (10Dzahn) As Moritz said on a Gerrit patch to enable this for Gerrit/Planet. There are security concerns even with the version in stretch and it probably becomes realistic to do... 
[17:14:38] elukey: hi [17:15:50] cmjohnson1: o/ [17:16:11] wondering if you have time today/tomorrow to check if we can swap the motherboard [17:16:43] elukey: not today..did you get Faidon or Rob to approve [17:17:17] it's a little unconventional, it's typically policy to decom servers that are beyond standard repair and out of warranty [17:17:55] cmjohnson1: yep yep, we should have already migrated some clients to kafka-jumbo but things got delayed [17:18:50] if we are able to make it work again it would be great, otherwise an option is to reimage notebook1002 (same hw, not super used) to a new kafka (like kafka1023) and then force it to take kafka1018's spot [17:19:01] (03CR) 10Addshore: [C: 031] user homes: Allow git to control +x for $HOME files [puppet] - 10https://gerrit.wikimedia.org/r/377056 (owner: 10BryanDavis) [17:19:35] elukey: that may be the best option....the swift servers are slightly a different cfg and it may not work [17:19:43] all right [17:20:58] cmjohnson1 is right nevertheless, reassigning hardware for different purposes requires director approval [17:21:47] (and also right in the sense that swapping motherboards is unconventional :) [17:22:14] paravoid: ah didn't know it! [17:22:41] PROBLEM - Check systemd state on restbase1012 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[17:22:41] PROBLEM - cassandra-a service on restbase1012 is CRITICAL: CRITICAL - Expecting active but unit cassandra-a is failed [17:23:00] (03PS1) 10Filippo Giunchedi: cassandra: reprovision restbase1014 with cassandra 3 [puppet] - 10https://gerrit.wikimedia.org/r/394343 (https://phabricator.wikimedia.org/T179422) [17:23:01] PROBLEM - cassandra-a SSL 10.64.32.202:7001 on restbase1012 is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:Connection refused [17:23:02] PROBLEM - cassandra-a CQL 10.64.32.202:9042 on restbase1012 is CRITICAL: connect to address 10.64.32.202 and port 9042: Connection refused [17:23:16] (03CR) 10jerkins-bot: [V: 04-1] cassandra: reprovision restbase1014 with cassandra 3 [puppet] - 10https://gerrit.wikimedia.org/r/394343 (https://phabricator.wikimedia.org/T179422) (owner: 10Filippo Giunchedi) [17:23:19] paravoid: even repurposing notebook1002 to kafka1023 ? [17:23:30] correct [17:23:38] super, so I'll do my homeworks [17:26:52] (03PS2) 10Filippo Giunchedi: cassandra: reprovision restbase1014 with cassandra 3 [puppet] - 10https://gerrit.wikimedia.org/r/394343 (https://phabricator.wikimedia.org/T179422) [17:31:03] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3775802 (10demon) Do we even gain much from having Phab speak http2 for an internal connection to Varnish? [17:31:26] (03PS3) 10Rush: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) [17:31:30] (03CR) 10jerkins-bot: [V: 04-1] cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [17:33:49] 10Operations, 10Datasets-General-or-Unknown, 10User-ArielGlenn: Reboot of dumps hosts - https://phabricator.wikimedia.org/T180127#3800469 (10ArielGlenn) dumpsdata1001 done. 
dataset1001 can't be done now due to the wikidata weekly job; it's already had issues once this week. I'm looking at this weekend. [17:35:59] (03CR) 10Muehlenhoff: [V: 032 C: 032] Also track operations monitoring metrics [debs/prometheus-openldap-exporter] - 10https://gerrit.wikimedia.org/r/394331 (owner: 10Muehlenhoff) [17:36:20] (03CR) 10Mobrovac: [C: 031] cassandra: reprovision restbase1014 with cassandra 3 [puppet] - 10https://gerrit.wikimedia.org/r/394343 (https://phabricator.wikimedia.org/T179422) (owner: 10Filippo Giunchedi) [17:36:57] (03PS1) 10Muehlenhoff: Bump version [debs/prometheus-openldap-exporter] - 10https://gerrit.wikimedia.org/r/394346 [17:37:35] 10Operations, 10ops-eqiad, 10Analytics-Kanban: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3800473 (10elukey) Just spoke to Chris and Faidon on IRC, and with my team. The best option seems to be repurpose notebook1002.eqiad.wmnet to kafka1023.eqiad.wmnet (new hostname), and assign to... [17:37:45] paravoid,cmjohnson1: just added --^ [17:38:00] tried to put a summary in there [17:38:09] (03CR) 10Muehlenhoff: [V: 032 C: 032] Bump version [debs/prometheus-openldap-exporter] - 10https://gerrit.wikimedia.org/r/394346 (owner: 10Muehlenhoff) [17:38:31] elukey: what about on-site spares? [17:38:55] (03PS2) 10Jcrespo: Link to grafana rather than to ganglia on tendril [software/tendril] - 10https://gerrit.wikimedia.org/r/391558 (https://phabricator.wikimedia.org/T177225) [17:40:11] paravoid: notebook1002 seems to be a good candidate since it is not really used and it has the exact same hw specs, so I thought that it would have been less work for everybody. 
Plus we are ordering the new notebooks in these days :) [17:40:30] (03PS1) 10ArielGlenn: add labstore1007 to list of hosts permitted to rsync from dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/394348 [17:41:39] no, it's messier -- if we have a spare, way easier to do that [17:41:59] (03CR) 10Jcrespo: [V: 032 C: 032] Link to grafana rather than to ganglia on tendril [software/tendril] - 10https://gerrit.wikimedia.org/r/391558 (https://phabricator.wikimedia.org/T177225) (owner: 10Jcrespo) [17:42:48] all right, updating the task [17:44:05] (03CR) 10ArielGlenn: [C: 032] add labstore1007 to list of hosts permitted to rsync from dumps servers [puppet] - 10https://gerrit.wikimedia.org/r/394348 (owner: 10ArielGlenn) [17:44:27] 10Operations, 10ops-eqiad, 10Analytics-Kanban: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3800495 (10elukey) Updating after a chat with Faidon: better to see if there is a onsite spare to repurpose, but for that I'd need to ping @RobH :) [17:47:36] !log uploaded prometheus-openldap-exporter 0+git20171128-1 for jessie-wikimedia (T181511) [17:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:47:43] T181511: Package openldap collector for Prometheus and adapt metrics - https://phabricator.wikimedia.org/T181511 [17:48:22] (03PS4) 10Rush: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) [17:48:58] (03CR) 10jerkins-bot: [V: 04-1] cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [17:55:19] 10Operations, 10ops-eqiad, 10Analytics-Kanban: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3800512 (10RobH) So kafka1018 is a Dell PowerEdge R720xd. It is a 2U server, with 12 LFF disk bays. It has dual Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz, 48Gb memory, and 12 * 2TB disks. We... 
[17:55:31] (03PS1) 10Andrew Bogott: Move labtestpuppetmaster2001 to puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394353 (https://phabricator.wikimedia.org/T178717) [17:56:00] (03PS5) 10Rush: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) [17:56:42] (03CR) 10jerkins-bot: [V: 04-1] cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [17:56:55] (03CR) 10Andrew Bogott: [C: 032] Move labtestpuppetmaster2001 to puppet 4 [puppet] - 10https://gerrit.wikimedia.org/r/394353 (https://phabricator.wikimedia.org/T178717) (owner: 10Andrew Bogott) [17:57:33] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Package openldap collector for Prometheus and adapt metrics - https://phabricator.wikimedia.org/T181511#3800524 (10MoritzMuehlenhoff) https://github.com/jcollie/openldap_exporter has been packages as software/deb... [17:57:41] !log upgrading labtestpuppetmaster2001 to puppet 4.8 [17:57:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:59:24] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3800527 (10MoritzMuehlenhoff) [17:59:27] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Package openldap collector for Prometheus and adapt metrics - https://phabricator.wikimedia.org/T181511#3800525 (10MoritzMuehlenhoff) 05Open>03Resolved https://grafana-admin.wikimedia.org/dashboard/db/openlda... 
[17:59:56] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3650139 (10MoritzMuehlenhoff) [18:00:04] gwicke, cscott, arlolra, subbu, halfak, and Amir1: It is that lovely time of the day again! You are hereby commanded to deploy Services – Graphoid / Parsoid / Citoid / ORES. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T1800). [18:00:04] No GERRIT patches in the queue for this window AFAICS. [18:01:30] (03CR) 10Rush: cloud: setup for attended upgrade process (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [18:01:31] (03PS6) 10Rush: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) [18:01:57] (03PS7) 10Rush: WIP: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) [18:02:42] (03CR) 10jerkins-bot: [V: 04-1] WIP: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [18:06:03] (03CR) 10Smalyshev: [C: 04-1] "I am putting -1 on this temporarily as I want to manually run it before putting it into cron, and that is waiting for https://gerrit.wikim" [puppet] - 10https://gerrit.wikimedia.org/r/394021 (https://phabricator.wikimedia.org/T173772) (owner: 10Gehel) [18:08:59] (03PS1) 10Andrew Bogott: Move labtestpuppetmaster2001 over to a codfw puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/394355 [18:10:30] (03CR) 10Herron: [C: 031] Move labtestpuppetmaster2001 over to a codfw puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/394355 (owner: 10Andrew Bogott) [18:11:34] (03CR) 10Andrew Bogott: [C: 032] Move labtestpuppetmaster2001 over to a codfw 
puppetmaster [puppet] - 10https://gerrit.wikimedia.org/r/394355 (owner: 10Andrew Bogott) [18:14:00] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3800594 (10faidon) 05Open>03declined >>! In T180998#3800462, @demon wrote: > Do we even gain much from having Phab speak http2 for an internal connection to Varnish? As I mentioned... [18:14:53] 10Operations, 10Scoring-platform-team, 10Patch-For-Review, 10Wikimedia-Incident: ORES overload incident, 2017-11-28 - https://phabricator.wikimedia.org/T181538#3800604 (10akosiaris) [18:14:55] 10Operations, 10monitoring, 10Scoring-platform-team (Current): Investigate scb1001 and scb1002 available memory graphs in Grafana - https://phabricator.wikimedia.org/T181544#3800601 (10akosiaris) 05Open>03Resolved a:03akosiaris I 've delete the wrong graph and added a new one based on prometheus. This... [18:27:10] (03PS8) 10Arturo Borrero Gonzalez: WIP: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [18:27:51] (03CR) 10jerkins-bot: [V: 04-1] WIP: cloud: setup for attended upgrade process [puppet] - 10https://gerrit.wikimedia.org/r/394200 (https://phabricator.wikimedia.org/T181647) (owner: 10Rush) [18:28:05] 10Operations, 10Phabricator, 10Traffic: Switch on http/2 in phabricator apache - https://phabricator.wikimedia.org/T180998#3800621 (10demon) >>! In T180998#3800594, @faidon wrote: >>>! In T180998#3800462, @demon wrote: >> Do we even gain much from having Phab speak http2 for an internal connection to Varnish... 
[18:34:28] (03PS2) 10Dzahn: rename requesttracker_server to just requesttracker [puppet] - 10https://gerrit.wikimedia.org/r/393712 [18:35:18] (03PS1) 10Ottomata: Add dummy cergen created certificates for kafka-jumbo [labs/private] - 10https://gerrit.wikimedia.org/r/394361 [18:35:58] (03CR) 10Ottomata: [V: 032 C: 032] Add dummy cergen created certificates for kafka-jumbo [labs/private] - 10https://gerrit.wikimedia.org/r/394361 (owner: 10Ottomata) [18:37:34] (03PS3) 10Dzahn: rename requesttracker_server to just requesttracker [puppet] - 10https://gerrit.wikimedia.org/r/393712 [18:37:42] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/9097/ununpentium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/393712 (owner: 10Dzahn) [18:42:22] (03PS1) 10Ottomata: Add dummy profile::kafka::broker::ssl_password for kafka jumbo [labs/private] - 10https://gerrit.wikimedia.org/r/394362 [18:46:44] (03CR) 10Ottomata: [V: 032 C: 032] Add dummy profile::kafka::broker::ssl_password for kafka jumbo [labs/private] - 10https://gerrit.wikimedia.org/r/394362 (owner: 10Ottomata) [18:47:00] (03CR) 10Chad: "I don't think that test is even necessary anymore" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394199 (owner: 10Chad) [18:51:49] (03PS9) 10Ottomata: Puppetize SSL for Kafka broker [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) [18:52:53] (03CR) 10Ottomata: "OOOOK! LOOKING GOOD! https://puppet-compiler.wmflabs.org/compiler02/9099/kafka-jumbo1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [18:52:54] (03CR) 10Ottomata: [C: 032] Puppetize SSL for Kafka broker [puppet] - 10https://gerrit.wikimedia.org/r/394144 (https://phabricator.wikimedia.org/T166167) (owner: 10Ottomata) [18:54:38] (03CR) 10Chad: "I know, but they will be eventually, and nothing in the non-standard docroots really matters for beta afaict." 
[puppet] - 10https://gerrit.wikimedia.org/r/394203 (owner: 10Chad) [18:57:35] PROBLEM - Kafka Broker Server on kafka-jumbo1001 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args Kafka /etc/kafka/server.properties [18:57:58] ottomata: ^ that's paging [18:58:10] is it in prod already? [18:58:21] I'm not sure [18:58:23] (03PS1) 10Herron: puppet: update hiera function call in puppet_ssldir [puppet] - 10https://gerrit.wikimedia.org/r/394367 (https://phabricator.wikimedia.org/T179181) [18:58:28] <_joe_> hey whatsup? [18:58:30] 10Operations, 10Gerrit, 10Phabricator, 10Traffic, 10periodic-update: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. - https://phabricator.wikimedia.org/T180655#3800663 (10demon) That's good for people on IRC. This task is more about making it clear from the br... [18:58:53] I don't know if actionable in prod or not. elukey or ottomata? [18:59:06] yup [18:59:07] on it [18:59:09] at first sight might be related to the latest patches [18:59:13] oh paging [18:59:15] (03PS1) 10Ottomata: chgrp server.properties to 'kafka' so daemon can read it [puppet] - 10https://gerrit.wikimedia.org/r/394368 (https://phabricator.wikimedia.org/T121561) [18:59:16] yikes yea sorry [18:59:17] yep [18:59:20] its ^^ [18:59:30] i'm about to restart all, will silence first [18:59:42] (03PS3) 10Kaldari: Enable MP3 uploads on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393661 (https://phabricator.wikimedia.org/T120288) [18:59:48] (03CR) 10Ottomata: [C: 032] chgrp server.properties to 'kafka' so daemon can read it [puppet] - 10https://gerrit.wikimedia.org/r/394368 (https://phabricator.wikimedia.org/T121561) (owner: 10Ottomata) [19:00:05] addshore, hashar, anomie, RainbowSprinkles, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: My dear minions, it's time we take the moon! Just kidding. Time for Morning SWAT (Max 8 patches) deploy. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T1900). [19:00:05] subbu, cscott, kaldari, and MatmaRex: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [19:00:16] (03PS2) 10Andrew Bogott: puppet: update hiera function call in puppet_ssldir [puppet] - 10https://gerrit.wikimedia.org/r/394367 (https://phabricator.wikimedia.org/T179181) (owner: 10Herron) [19:00:22] hello [19:00:24] o/ [19:01:45] RECOVERY - Kafka Broker Server on kafka-jumbo1001 is OK: PROCS OK: 1 process with command name java, args Kafka /etc/kafka/server.properties [19:01:59] (03CR) 10Andrew Bogott: [C: 032] puppet: update hiera function call in puppet_ssldir [puppet] - 10https://gerrit.wikimedia.org/r/394367 (https://phabricator.wikimedia.org/T179181) (owner: 10Herron) [19:02:38] I can SWAT [19:03:40] (03PS4) 10Thcipriani: Enable MP3 uploads on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393661 (https://phabricator.wikimedia.org/T120288) (owner: 10Kaldari) [19:05:24] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393661 (https://phabricator.wikimedia.org/T120288) (owner: 10Kaldari) [19:05:46] subbu|workshop: cscott additional ping for swat if you all are around [19:06:24] cscott will be monitoring .. but, i can if required and he cannot. [19:06:30] but, that patch should go about. [19:06:43] yup [19:06:51] s/about/out/ [19:06:53] i'm hoping i don't get a sticker [19:07:00] :) [19:07:08] (03Merged) 10jenkins-bot: Enable MP3 uploads on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393661 (https://phabricator.wikimedia.org/T120288) (owner: 10Kaldari) [19:07:09] you and me both. 
[19:07:35] Ill take the sticker for you cscott [19:08:08] we really need to wok on the incentives surrounding deployment :) [19:08:21] kaldari: your change is live on mwdebug1002, check please [19:08:27] checking.... [19:08:54] kaldari thansk :) will update the local pages once merged&synchronized [19:08:59] *thanks [19:09:01] Offer a few hundred dollars thcipriani :P [19:09:28] Steinsplitter: Thanks!! [19:09:35] thcipriani: Looks good! [19:09:40] kaldari: ok, going live [19:10:08] Steinsplitter: I already removed folks from the MP3 uploaders group and will be adding them to the new group [19:10:28] ok thanks :) [19:11:59] !log thcipriani@tin Synchronized wmf-config: SWAT: [[gerrit:393661|Enable MP3 uploads on Commons]] T120288 (duration: 00m 51s) [19:12:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:05] ^ kaldari all live now [19:12:06] T120288: Enable MP3 uploads on Wikimedia Commons and TMH playback - https://phabricator.wikimedia.org/T120288 [19:12:16] thcipriani: testing... [19:12:49] (03PS3) 10Thcipriani: Revert "Temporary disable remex html" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392185 (https://phabricator.wikimedia.org/T178632) (owner: 10Subramanya Sastry) [19:14:41] thcipriani: Seems to work: https://commons.wikimedia.org/wiki/File:Chopin_-_Waltz_in_E_minor,_B_56.mp3 !! [19:14:42] o/ [19:14:55] kaldari: awesome, kudos :) [19:15:21] (03CR) 10jenkins-bot: Enable MP3 uploads on Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/393661 (https://phabricator.wikimedia.org/T120288) (owner: 10Kaldari) [19:15:42] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392185 (https://phabricator.wikimedia.org/T178632) (owner: 10Subramanya Sastry) [19:16:51] (03PS1) 10Andrew Bogott: nslcd.conf: add @ before some erb variables. 
[puppet] - 10https://gerrit.wikimedia.org/r/394372 [19:17:12] MatmaRex: you change is live on mwdebug1002, check please [19:17:27] looking [19:17:58] thcipriani: looks good, the broken heading on https://en.wikisource.org/wiki/Special:Preferences#mw-prefsection-gadgets looks fine [19:18:06] (03CR) 10Rush: [C: 031] "Needed to move to puppet 4" [puppet] - 10https://gerrit.wikimedia.org/r/394372 (owner: 10Andrew Bogott) [19:18:07] addshore: \o <- the high-five you were looking for :) [19:18:11] :D [19:18:20] MatmaRex: cool, thanks for checking, going live. [19:18:42] (03Merged) 10jenkins-bot: Revert "Temporary disable remex html" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392185 (https://phabricator.wikimedia.org/T178632) (owner: 10Subramanya Sastry) [19:20:40] !log thcipriani@tin Synchronized php-1.31.0-wmf.10/includes/htmlform/fields/HTMLMultiSelectField.php: SWAT: [[gerrit:394339|HTMLMultiSelectField: Allow formatting in section headings in OOUI mode]] T181698 (duration: 00m 49s) [19:20:42] (03PS1) 10Imarlier: webperf.py: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/394375 [19:20:46] ^ MatmaRex live everywhere [19:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:47] T181698: Headings of sections on "Gadgets" tab on Special:Preferences display escaped HTML after OOUI conversion - https://phabricator.wikimedia.org/T181698 [19:21:00] thanks thcipriani [19:21:05] yw :) [19:21:19] PROBLEM - HHVM jobrunner on mw1311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.002 second response time [19:21:20] (03CR) 10jerkins-bot: [V: 04-1] webperf.py: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/394375 (owner: 10Imarlier) [19:21:40] cscott: your change is live on mwdebug1002, check please [19:21:41] (03PS1) 10RobH: adding additional command to wipe lvm data [puppet] - 
10https://gerrit.wikimedia.org/r/394376 [19:22:37] (03PS1) 10Rush: base: seems to be needed for puppet4 on master [puppet] - 10https://gerrit.wikimedia.org/r/394377 [19:22:49] (03PS2) 10Imarlier: webperf.py: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/394375 (https://phabricator.wikimedia.org/T181413) [19:22:50] PROBLEM - Nginx local proxy to apache on mw1311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.005 second response time [19:23:20] RECOVERY - HHVM jobrunner on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [19:23:50] RECOVERY - Nginx local proxy to apache on mw1311 is OK: HTTP OK: HTTP/1.1 200 OK - 245 bytes in 0.006 second response time [19:24:09] (03CR) 10Andrew Bogott: [C: 032] nslcd.conf: add @ before some erb variables. [puppet] - 10https://gerrit.wikimedia.org/r/394372 (owner: 10Andrew Bogott) [19:31:02] thcipriani: looking (i haven't used x-wikimedia-debug before) [19:31:27] okie doke [19:31:40] (03PS2) 10Rush: base: seems to be needed for puppet4 on master [puppet] - 10https://gerrit.wikimedia.org/r/394377 [19:33:39] (03PS3) 10Rush: base: qualify syslogs::readable defined type call [puppet] - 10https://gerrit.wikimedia.org/r/394377 [19:34:59] thcipriani: seems to work. that is, https://www.mediawiki.org/wiki/User:SSastry_(WMF)/Sandbox has the remex appearance (not the tidy appearance) when i use the x-wikimedia-debug extension to select mw1002 (and purge the parser cache) [19:35:10] (03PS4) 10Rush: base: qualify syslogs::readable defined type call [puppet] - 10https://gerrit.wikimedia.org/r/394377 [19:35:29] cscott: cool. I'll sync out everywhere. 
[19:35:51] (03CR) 10Andrew Bogott: [C: 031] base: qualify syslogs::readable defined type call [puppet] - 10https://gerrit.wikimedia.org/r/394377 (owner: 10Rush) [19:36:32] (03CR) 10Krinkle: webperf.py: Handle oversamples differently than regular samples (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/394375 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [19:37:58] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:392185|Revert "Temporary disable remex html"]] T178632 (duration: 00m 49s) [19:38:07] ^ cscott live everywhere [19:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:38:08] T178632: Stack overflow in remex_html serializer - https://phabricator.wikimedia.org/T178632 [19:38:32] (03CR) 10jenkins-bot: Revert "Temporary disable remex html" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/392185 (https://phabricator.wikimedia.org/T178632) (owner: 10Subramanya Sastry) [19:41:42] (03Draft1) 10MarcoAurelio: Add https://studiezaal.nijmegen.nl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394379 (https://phabricator.wikimedia.org/T181713) [19:41:47] (03PS2) 10MarcoAurelio: Add https://studiezaal.nijmegen.nl to $wgCopyUploadsDomains [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394379 (https://phabricator.wikimedia.org/T181713) [19:42:02] thcipriani: thanks. looks the same now even with x-wikimedia-debug turned off. [19:42:22] awesome, glad all's working :) [19:42:25] (03PS3) 10Imarlier: webperf.py: Handle oversamples differently than regular samples [puppet] - 10https://gerrit.wikimedia.org/r/394375 (https://phabricator.wikimedia.org/T181413) [19:42:27] cscott, one thing to check is if there is a page reported in that original bug report (stack overflow one) and see if that page is now okay. 
[19:42:46] subbu|workshop: yep, was reading through that phab task now looking for that [19:43:13] k [19:43:42] https://fa.wikipedia.org/wiki/%DA%98%D8%A7%D9%86-%D9%BE%D9%84_%D8%B3%D8%A7%D8%B1%D8%AA%D8%B1 looks fine, even after a purge [19:44:12] (03CR) 10Imarlier: webperf.py: Handle oversamples differently than regular samples (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/394375 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [19:46:20] (03CR) 10Rush: [C: 032] base: qualify syslogs::readable defined type call [puppet] - 10https://gerrit.wikimedia.org/r/394377 (owner: 10Rush) [19:50:41] (03PS2) 10RobH: adding additional command to wipe lvm data [puppet] - 10https://gerrit.wikimedia.org/r/394376 [19:51:29] (03CR) 10RobH: [C: 032] adding additional command to wipe lvm data [puppet] - 10https://gerrit.wikimedia.org/r/394376 (owner: 10RobH) [19:51:45] (03PS1) 10Ottomata: Disable ssl for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/394383 (https://phabricator.wikimedia.org/T121561) [19:52:10] (03PS2) 10Ottomata: Disable ssl for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/394383 (https://phabricator.wikimedia.org/T121561) [19:52:14] (03CR) 10Ottomata: [V: 032 C: 032] Disable ssl for kafka-jumbo [puppet] - 10https://gerrit.wikimedia.org/r/394383 (https://phabricator.wikimedia.org/T121561) (owner: 10Ottomata) [19:56:49] cscott, great. [19:59:46] 10Operations, 10ops-eqiad, 10DC-Ops: decommission mobile 1004 and mobile1005 - https://phabricator.wikimedia.org/T181750#3800915 (10Cmjohnson) [20:00:04] no_justification: #bothumor Q:Why did functions stop calling each other? A:They had arguments. Rise for MediaWiki train . (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20171130T2000). [20:00:05] No GERRIT patches in the queue for this window AFAICS. [20:03:30] how did jouncebot develop a sense of humour? 
:) [20:05:01] no_justification: *waves* [20:05:08] give me a ping once the train is done :D [20:05:35] Oh, I should start that I s'pose [20:05:39] haha [20:08:13] (03PS1) 10Dzahn: configcluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394388 (https://phabricator.wikimedia.org/T177225) [20:08:42] (03PS1) 10Chad: group2 to wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394389 [20:08:44] (03CR) 10Chad: [C: 032] group2 to wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394389 (owner: 10Chad) [20:09:02] (03PS2) 10Dzahn: configcluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394388 (https://phabricator.wikimedia.org/T177225) [20:09:18] (03CR) 10Dzahn: [C: 032] configcluster: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394388 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [20:11:17] (03Merged) 10jenkins-bot: group2 to wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394389 (owner: 10Chad) [20:11:28] (03CR) 10jenkins-bot: group2 to wmf.10 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394389 (owner: 10Chad) [20:11:45] (03PS3) 10Dzahn: rename planet_server to just planet [puppet] - 10https://gerrit.wikimedia.org/r/393710 [20:11:51] * addshore rebases his chain [20:11:56] (03PS4) 10Addshore: wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) [20:12:07] (03PS5) 10Addshore: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) [20:12:11] (03PS4) 10Addshore: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) [20:12:15] (03PS4) 10Addshore: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 
(https://phabricator.wikimedia.org/T176948) [20:12:22] (03PS4) 10Addshore: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) [20:12:26] (03PS4) 10Addshore: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) [20:12:30] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.10 [20:12:31] (03PS4) 10Addshore: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) [20:12:35] (03PS4) 10Addshore: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) [20:12:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:12:39] (03PS6) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [20:12:44] (03PS3) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) [20:13:00] (03PS6) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [20:13:07] (03PS4) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) [20:13:15] (03PS3) 10Addshore: wdbuild: Switch wikidata extensions to json entrypoint where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 (https://phabricator.wikimedia.org/T123026) [20:17:04] addshore: Things look quiet on my end [20:17:32] Lovely"! 
[20:17:58] * addshore begins [20:18:38] (03CR) 10Addshore: [C: 032] wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [20:20:08] (03PS3) 10Andrew Bogott: DO NOT MERGE: no-op patch for testing [puppet] - 10https://gerrit.wikimedia.org/r/383942 [20:21:29] (03Merged) 10jenkins-bot: wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [20:21:36] addshore: "Add switch to ease killing" is my favorite commit message this week. <3 [20:22:06] hehe, I should have added "the wikidata build" at the end, but i figured the wdbuild prefix sould suffice [20:23:34] I'm going to watch each change roll out on beta before i sync too [20:25:00] or maybe not, hmmm, I thought the config changes went out quicker than they do [20:26:30] It's a shame there is only beta-scap-eqiad that does a full scap, and not something that syncs only wmf-config [20:28:42] addshore: time to see a psych dr? xD [20:30:11] Could manually trigger scap sync-file on beta [20:30:31] no_justification: yeh, let me ssh in there [20:30:40] !log addshore@tin Synchronized wmf-config: wdbuild: T173818: add switch to ease killing (duration: 00m 47s) [20:30:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:30:51] T173818: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818 [20:32:04] reverting, wtf already... 
[20:32:30] !log addshore@tin Synchronized wmf-config: REVERT wdbuild: T173818: add switch to ease killing (duration: 00m 47s) [20:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:49] thats a hella lot more notices for an undefined variable than I was expecting [20:34:22] (03PS1) 10Addshore: Revert "wdbuild: add switch to ease killing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394399 [20:34:25] (03CR) 10Addshore: [C: 032] Revert "wdbuild: add switch to ease killing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394399 (owner: 10Addshore) [20:35:51] (03Merged) 10jenkins-bot: Revert "wdbuild: add switch to ease killing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394399 (owner: 10Addshore) [20:35:58] (03CR) 10jenkins-bot: wdbuild: add switch to ease killing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394207 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [20:36:00] (03CR) 10jenkins-bot: Revert "wdbuild: add switch to ease killing" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394399 (owner: 10Addshore) [20:38:10] * addshore checks the order of stuff [20:39:19] no_justification: any idea what was up with that patch? 
[20:40:14] looking [20:40:22] I swear this is why I normally do them file by file [20:40:40] !log Sending Toolforge survey final reminder emails from silver for T177126 [20:40:43] Notice: Undefined variable: wmgUseWikidataBuild in /srv/mediawiki/wmf-config/Wikibase-buildentry.php on line 10 [20:40:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:40:49] T177126: 2017 Toolforge user survey - https://phabricator.wikimedia.org/T177126 [20:41:08] no_justification: indeed, but it is set in InitializeSettings [20:41:08] It quieted off immediately [20:41:15] Yeah, race condition [20:41:18] It spiked, then stopped [20:41:45] Also: Warning: include_once(/srv/mediawiki/php-1.31.0-wmf.10/extensions/Constraints/WikibaseQualityConstraints.php): File not found in /srv/mediawiki/wmf-config/Wikibase-buildentry.php on line 42 [20:41:48] hmm, I saw around 32000 events for it in logstahs for 2 misn until i reverted it [20:42:34] (03PS1) 10Addshore: Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 [20:43:54] (03CR) 10Addshore: [C: 04-1] Revert "Revert "wdbuild: add switch to ease killing"" (033 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 (owner: 10Addshore) [20:45:14] no_justification: yeh, so it wasn't set, and thus the if falls to the else which was the setup loading from the extensions rather than the build [20:45:27] Gotcha [20:46:01] I might flip that statement around and run it again..... 
[20:47:24] (03PS2) 10Addshore: Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 [20:49:14] (03PS3) 10Addshore: Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 [20:49:22] that feels safer [20:52:14] (03CR) 10Addshore: [C: 032] Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 (owner: 10Addshore) [20:53:44] (03Merged) 10jenkins-bot: Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 (owner: 10Addshore) [20:53:58] (03CR) 10jenkins-bot: Revert "Revert "wdbuild: add switch to ease killing"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394401 (owner: 10Addshore) [20:55:40] PROBLEM - puppet last run on labstore1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 1 minute ago with 1 failures. Failed resources (up to 3 shown): Mount[/srv/dumps] [20:55:50] and syncing on beta... [20:58:32] (03PS1) 10Dzahn: Revert "nagios_common: remove krinkle from perf-team contactgroup" [puppet] - 10https://gerrit.wikimedia.org/r/394404 [20:58:41] * addshore dislikes race these races [20:59:29] (03CR) 10Dzahn: "@krinkle instead i set your notification options in the contact definition to "n" for never. 
It should solve the issue of duplicate emails" [puppet] - 10https://gerrit.wikimedia.org/r/394404 (owner: 10Dzahn) [20:59:45] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: wdbuild: T173818: add switch to ease killing (again) (duration: 00m 45s) [20:59:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:59:54] T173818: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818 [21:01:28] !log addshore@tin Synchronized wmf-config: wdbuild: T173818: add switch to ease killing (again) (duration: 00m 46s) [21:01:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:01:44] thats better [21:02:11] (03CR) 10Addshore: [C: 032] wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [21:02:25] (03CR) 10Addshore: [C: 032] wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [21:03:33] (03Merged) 10jenkins-bot: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [21:03:48] (03CR) 10jenkins-bot: wdbuild: extension-list-labs stop using build entry points [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394208 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [21:03:59] 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3801188 (10Gilles) OK, we have the explanation as to why the 2+ minute request didn't show up in the Varnish slow log. By default the V... 
[21:04:19] 10Operations, 10Performance-Team, 10Traffic, 10Patch-For-Review: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3801189 (10Gilles) a:03Gilles [21:05:03] 10Operations, 10Performance-Team, 10Traffic: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3786217 (10Gilles) [21:05:18] (03PS5) 10Addshore: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) [21:05:23] (03CR) 10Addshore: [C: 032] wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [21:05:36] (03PS5) 10Addshore: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) [21:05:40] (03PS5) 10Addshore: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) [21:05:44] (03PS5) 10Addshore: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) [21:05:48] (03PS5) 10Addshore: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) [21:06:01] (03PS5) 10Addshore: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) [21:06:17] (03CR) 10Addshore: [C: 04-1] "need rebase for fix" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [21:06:48] (03Merged) 10jenkins-bot: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [21:06:58] (03CR) 10jenkins-bot: wdbuild: Stop using wikidata build on LABS / BETA [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394209 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [21:08:47] (03PS2) 10Dzahn: Revert "nagios_common: remove krinkle from perf-team contactgroup" [puppet] - 10https://gerrit.wikimedia.org/r/394404 [21:08:49] (03PS7) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [21:09:08] (03CR) 10jerkins-bot: [V: 04-1] wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [21:10:52] legoktm: no_justification well, beta is now running wikibase without using the build [21:11:00] :D [21:11:00] woohooo [21:12:29] (03PS8) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [21:13:49] addshore: are you planning to do the rest today? 
[21:14:03] legoktm: yup [21:14:09] sweet [21:14:11] as long as I manage to get throguh them all before swat [21:14:47] !log addshore@tin Synchronized wmf-config: wdbuild: BETA ONLY (duration: 00m 47s) [21:14:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:00] (03PS7) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [21:16:13] (03PS5) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) [21:16:17] (03PS4) 10Addshore: wdbuild: Switch wikidata extensions to json entrypoint where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 (https://phabricator.wikimedia.org/T123026) [21:16:27] right, thats the chain all ready again [21:16:32] time for some testing on beta quickly [21:17:24] legoktm: still need to fixup https://phabricator.wikimedia.org/T179663 somehow [21:17:48] elukey: do you know what's up with kafka1018 [21:18:39] couldnt find it in SAL or ticket but it's down and notifications are disabled [21:18:51] !log upgrade and restart db2078 [21:19:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:19:01] hmm, on beta... [21:19:01] [{exception_id}] {exception_url} ErrorException from line 73 of /srv/mediawiki-staging/php-master/extensions/Wikibase/lib/includes/Changes/ItemChange.php: PHP Warning: get_class() expects parameter 1 to be object, string given [21:19:03] (03CR) 10Krinkle: [C: 031] "Thanks" [puppet] - 10https://gerrit.wikimedia.org/r/394404 (owner: 10Dzahn) [21:19:12] also [{exception_id}] {exception_url} ErrorException from line 309 of /srv/mediawiki-staging/php-master/includes/debug/MWDebug.php: PHP Warning: Cannot get sitelink diff from . 
Change #1371936, type wikibase-item~update [Called from Wikibase\ItemChange::logW [21:19:26] aude ^^ :( [21:20:11] elukey: but i did see in SAL there were kernel upgrades on other kafka.. so guess that.. looking at console [21:20:49] hmm, they are all from deployment-tin O_o [21:21:26] 10Operations, 10Performance-Team, 10Traffic: load.php response taking 160s (of which only 0.031s in Apache) - https://phabricator.wikimedia.org/T181315#3801215 (10Samat) Thank you Gilles for your efforts! Am I correct that I should catch a slow loading longer than 60 secs now? Because delay between 20-60 se... [21:23:00] !log powercycling kafka1018 (was down in Icinga and saw in SAL: reboot kafka10[12-22] for kernel + jvm updates - T179943) [21:23:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:23:12] T179943: Restart Analytics JVM daemons for open-jdk security updates - https://phabricator.wikimedia.org/T179943 [21:23:22] no_justification: well, I seem to be able to produce a bunch of ErrorExceptions on the beta cluster [21:24:19] elukey: sorry, i found everything now. https://phabricator.wikimedia.org/T181518 got it [21:25:35] Looks like they are from code that was merged today.... [21:25:45] the thing is, i did not find that ticket by searching phab for host name like it used to work.. 
[21:25:57] even though host name is in ticket title itself [21:25:59] (03PS1) 10Addshore: Revert "wdbuild: Stop using wikidata build on LABS / BETA" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394408 [21:26:02] (03CR) 10Addshore: [C: 032] Revert "wdbuild: Stop using wikidata build on LABS / BETA" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394408 (owner: 10Addshore) [21:26:32] i have to confirm what others have said about phab search [21:27:33] (03Merged) 10jenkins-bot: Revert "wdbuild: Stop using wikidata build on LABS / BETA" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394408 (owner: 10Addshore) [21:27:42] (03CR) 10jenkins-bot: Revert "wdbuild: Stop using wikidata build on LABS / BETA" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394408 (owner: 10Addshore) [21:28:24] Making a new wikidata build on master so the code being switched between is actually the same.... https://gerrit.wikimedia.org/r/#/c/394409/ [21:28:39] and will see if I can reproduce while using the build.... [21:30:07] addshore: how that can cause problems? [21:30:35] Amir1: on beta! [21:30:46] well, that's expected [21:30:47] :D [21:30:52] [{exception_id}] {exception_url} ErrorException from line 73 of /srv/mediawiki-staging/php-master/extensions/Wikibase/lib/includes/Changes/ItemChange.php: PHP Warning: get_class() expects parameter 1 to be object, string given [21:30:59] not much though [21:31:02] let me check [21:31:05] also [{exception_id}] {exception_url} ErrorException from line 309 of /srv/mediawiki-staging/php-master/includes/debug/MWDebug.php: PHP Warning: Cannot get sitelink diff from . 
Change #1371937, type wikibase-item~add [Called from Wikibase\ItemChange::logWarn [21:31:15] think it has something to do with me loading recentchanges [21:31:43] but it poped up when i switched from the build to the extensions, as of course on beta the extensions are updated every 10 mins or so but the build is onyl daily [21:31:47] !log upgrading labsdb1010 and restarting mariadb [21:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:59] just made https://gerrit.wikimedia.org/r/#/c/394409 (updating the build) to get it in sync with master) [21:32:10] yeah [21:32:11] 10Operations, 10ops-eqiad, 10Analytics-Kanban: kafka1018 fails to boot - https://phabricator.wikimedia.org/T181518#3792843 (10Dzahn) I saw this host as DOWN when looking at Icinga as it was in the unacknowledeged section (though notifications were disabled). Then i searched Phab for the host name, which usu... [21:32:16] I think I can debug and fix it [21:32:26] addshore: mind if you file a bug and assign it to me? [21:32:43] Amir1: will do once I finish build killing [21:32:51] just wanted to make sure it wasn't related to what i was doing [21:32:57] but looks very unrelated [21:33:08] we will have a small downtime on proxy 10 and 11, ignore it [21:33:45] (03PS1) 10Addshore: Revert "Revert "wdbuild: Stop using wikidata build on LABS / BETA"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394411 [21:35:02] !log addshore@tin Synchronized wmf-config: wdbuild: BETA ONLY (duration: 00m 47s) [21:35:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:35:30] Amir1: your patch might actually make the build tests fail [21:35:38] https://integration.wikimedia.org/ci/job/mediawiki-extensions-hhvm-jessie/29173/console [21:36:09] Amir1: mind if I just revert on master for now? [21:36:22] I can fix it soon [21:36:32] okay! I'll wait! 
[21:36:37] thanks [21:40:40] RECOVERY - puppet last run on labstore1007 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [21:43:40] (03PS1) 10Madhuvishy: dumps: Turn off dumps auto-sync to labstore1006 for reimaging [puppet] - 10https://gerrit.wikimedia.org/r/394415 [21:45:59] (03CR) 10Madhuvishy: [C: 032] dumps: Turn off dumps auto-sync to labstore1006 for reimaging [puppet] - 10https://gerrit.wikimedia.org/r/394415 (owner: 10Madhuvishy) [21:46:21] Amir1: I also made https://phabricator.wikimedia.org/T181759 so we can look into why this made it through the Wikibase CI..... [21:46:24] seems a bit fishy [21:48:38] and Amir1 https://phabricator.wikimedia.org/T181760 is the issue for the errorexceptions [21:48:45] addshore: okay, I'm checking everything now [21:48:52] Thanks! [21:48:53] thanks [21:50:27] addshore: where is the backtrace, I thought I could find it in fatalmonitor in logstash-beta [21:50:42] *looks* [21:51:30] Amir1: it is in mediawiki-errors [21:51:47] Thanks [21:51:50] Amir1: I'll add some traces to the ticket [21:53:05] {{done}} [21:53:23] (03PS1) 10RobH: Revert "adding additional command to wipe lvm data" [puppet] - 10https://gerrit.wikimedia.org/r/394429 [21:53:29] addshore: I believe I found the problem, would you mind reviewing it? 
[21:53:29] (03PS2) 10RobH: Revert "adding additional command to wipe lvm data" [puppet] - 10https://gerrit.wikimedia.org/r/394429 [21:53:43] (03CR) 10RobH: [C: 032] Revert "adding additional command to wipe lvm data" [puppet] - 10https://gerrit.wikimedia.org/r/394429 (owner: 10RobH) [21:54:19] Amir1: sure [21:55:26] (03PS1) 10Dzahn: planet: add feeds of Megha Sharma and Neha Jha [puppet] - 10https://gerrit.wikimedia.org/r/394431 (https://phabricator.wikimedia.org/T181587) [21:56:47] (03PS2) 10Dzahn: planet: add feeds of Megha Sharma and Neha Jha [puppet] - 10https://gerrit.wikimedia.org/r/394431 (https://phabricator.wikimedia.org/T181587) [21:57:17] 10Operations, 10monitoring, 10Scoring-platform-team (Current): Investigate scb1001 and scb1002 available memory graphs in Grafana - https://phabricator.wikimedia.org/T181544#3801343 (10Halfak) Thanks @akosiaris! [21:58:08] (03PS3) 10Dzahn: planet: add feeds of Megha Sharma and Neha Jha [puppet] - 10https://gerrit.wikimedia.org/r/394431 (https://phabricator.wikimedia.org/T181587) [21:58:44] (03CR) 10Dzahn: [C: 032] planet: add feeds of Megha Sharma and Neha Jha [puppet] - 10https://gerrit.wikimedia.org/r/394431 (https://phabricator.wikimedia.org/T181587) (owner: 10Dzahn) [22:00:33] addshore: the patch is up for review :D [22:00:39] *looks* [22:02:29] 10Operations, 10Gerrit, 10Phabricator, 10Traffic, 10periodic-update: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. - https://phabricator.wikimedia.org/T180655#3801357 (10Dzahn) Yea, we should have both, it was meant to be additional not replacing the other. B... [22:03:14] Amir1: hehe, a is_array instead of is_string ? ;) [22:03:22] Amir1: looks like it could have done with a test ;) [22:03:37] addshore: basically it was a copy-paste error from the above part of the code [22:04:09] yeah, I will do that tomorrow [22:04:18] Thanks! 
[22:04:25] * addshore waits for it to merge [22:05:56] Thanks for spotting it :D [22:06:05] no worries :D [22:09:16] Amir1: https://integration.wikimedia.org/ci/job/mwext-testextension-php55-composer-jessie/718/console phpcs [22:09:46] addshore: f*** stupid phpstorm [22:09:53] I'm having this issue all the time [22:09:57] bwhahahaaa [22:11:06] addshore: uploaded new patchset [22:11:07] sorry [22:11:40] * addshore waits for jenkins to catch up [22:16:07] (03PS1) 10Ottomata: Improvements for Kafka + SSL [puppet] - 10https://gerrit.wikimedia.org/r/394438 (https://phabricator.wikimedia.org/T167304) [22:16:49] (03CR) 10jerkins-bot: [V: 04-1] Improvements for Kafka + SSL [puppet] - 10https://gerrit.wikimedia.org/r/394438 (https://phabricator.wikimedia.org/T167304) (owner: 10Ottomata) [22:17:36] (03PS2) 10Ottomata: Improvements for Kafka + SSL [puppet] - 10https://gerrit.wikimedia.org/r/394438 (https://phabricator.wikimedia.org/T167304) [22:18:25] (03CR) 10jerkins-bot: [V: 04-1] Improvements for Kafka + SSL [puppet] - 10https://gerrit.wikimedia.org/r/394438 (https://phabricator.wikimedia.org/T167304) (owner: 10Ottomata) [22:18:48] (03CR) 10Ottomata: "Luca, this changes what we talked about yesterday. Now there is a single cert&key for a Kafka cluster. This makes configuration WAY easi" [puppet] - 10https://gerrit.wikimedia.org/r/394438 (https://phabricator.wikimedia.org/T167304) (owner: 10Ottomata) [22:20:18] bah Amir1 https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/7458/console [22:20:47] Amir1: that failed for PS1 too. [22:21:00] random error? 
[22:21:24] Right, I'm going to continue with the build killing as this is on master only [22:21:34] (03CR) 10Addshore: [C: 032] Revert "Revert "wdbuild: Stop using wikidata build on LABS / BETA"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394411 (owner: 10Addshore) [22:23:16] addshore: what do you think of "submitting" it instead of jenkins [22:23:26] this probably didn't cause it [22:23:56] Amir1: probably not, but it has happened twice in a row :/ [22:24:38] yeah [22:24:41] okay [22:25:00] 10Operations, 10ops-eqiad, 10DC-Ops: Decommission niobium - https://phabricator.wikimedia.org/T181763#3801389 (10Cmjohnson) [22:25:51] (03PS1) 10Dzahn: kubernetes: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394439 (https://phabricator.wikimedia.org/T177225) [22:26:22] (03Merged) 10jenkins-bot: Revert "Revert "wdbuild: Stop using wikidata build on LABS / BETA"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394411 (owner: 10Addshore) [22:26:32] (03CR) 10jenkins-bot: Revert "Revert "wdbuild: Stop using wikidata build on LABS / BETA"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394411 (owner: 10Addshore) [22:27:49] !log addshore@tin Synchronized wmf-config: wdbuild: BETA ONLY [[gerrit:394411]] (duration: 00m 47s) [22:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:28:04] (03CR) 10Dzahn: [C: 032] kubernetes: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394439 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [22:28:22] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:29:11] addshore: it gives 
https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/7458/artifact/log/Use%20header%3A%20Save%20label,%20description%20and%20aliases%3A%20%7C%20press%20the%20RETURN%20key%20in%20the%20description%20input%20field%20%7C.png [22:29:26] but extremely weird [22:29:40] Amir1: does it have the debug.log anywhere? [22:30:00] Best bet is probably to revert that change on master for now Amir1 and just tackle it again tomorrow [22:30:07] yes [22:30:07] https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/7458/ [22:30:31] Amir1: https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/7458/artifact/log/mw-error.log [22:30:44] (03Merged) 10jenkins-bot: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:31:04] addshore: it doesn't have anything to do with my changes [22:31:10] maybe we merged something else today [22:31:11] (03CR) 10jenkins-bot: wdbuild: Stop loading from build on test and testwikidata [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394210 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:31:31] Amir1: could be mediawiki or something that broke it [22:33:36] I pulled master and checked localhost several times, can't reproduce it [22:33:55] the old "try again" hack to see if it might be just a random failure [22:35:07] addshore: Sort of unrelated, but while we're here.... T168035 [22:35:07] T168035: dewiktionary on beta gives Undefined variable: wmgWikibaseSiteGroup - https://phabricator.wikimedia.org/T168035 [22:35:29] *looks* [22:36:00] no_justification: looks like there is no default [22:36:22] might mean labs gets nothing, hmm. wmgWikibaseSiteGroup is set in InitialiseSettings [22:37:04] Maybe it was fixed? 
[22:37:29] I'll have a look tomorrow [22:37:39] no worries, just doing some #logspam grooming [22:37:44] :D [22:38:10] added myself to it for now [22:38:28] no issues on testwiki or testwikidata on mwdebug1002 :) [22:38:57] :D [22:39:13] Failed to acquire lock because demon is doing secret things [22:40:47] Is my lock file really that opaque? [22:40:53] !log demon@tin Pruned MediaWiki: 1.31.0-wmf.8 [keeping static files] (duration: 02m 33s) [22:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:41:07] haha :P [22:41:58] !log addshore@tin Synchronized wmf-config: wdbuild: BETA ONLY [[gerrit:394210|wdbuild: Stop loading from build on test and testwikidata]] (duration: 00m 47s) [22:42:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:09] (03CR) 10Krinkle: [C: 031] "I was expecting a unit test failure given the new output isn't "expected" yet, but they pass because the input doesn't contain usage of th" [puppet] - 10https://gerrit.wikimedia.org/r/394375 (https://phabricator.wikimedia.org/T181413) (owner: 10Imarlier) [22:42:12] !log BETA ONLY was a lie on that last one ... 
[22:42:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:37] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:42:45] (03CR) 10jerkins-bot: [V: 04-1] wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:42:56] (03PS6) 10Addshore: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) [22:43:03] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:45:57] perhaps gate-submit for mediawiki-config should be gate-submit-swat..... [22:47:37] (03Merged) 10jenkins-bot: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:47:47] (03PS1) 10MaxSem: Switch all wikis to HTML5 section IDs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394460 (https://phabricator.wikimedia.org/T152540) [22:47:51] (03CR) 10jenkins-bot: wdbuild: Stop loading from build on group0 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394211 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [22:54:25] looks fine on mw.org on debug1002 [22:55:41] !log addshore@tin Synchronized wmf-config: wdbuild: [[gerrit:394211|wdbuild: Stop loading from build on group0]] (duration: 00m 46s) [22:55:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:56:55] (03PS6) 10Addshore: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 
(https://phabricator.wikimedia.org/T176948) [22:58:38] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:03:11] (03Merged) 10jenkins-bot: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:03:25] (03CR) 10jenkins-bot: wdbuild: Stop loading from build on group1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394212 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:03:40] (03CR) 10Krinkle: [C: 04-1] "Hm.. not sure beta should differ from prod for that though. What is the plan for the search engine files (*.wikipedia.org) and FirefoxOS t" [puppet] - 10https://gerrit.wikimedia.org/r/394203 (owner: 10Chad) [23:03:45] (03PS1) 10Dzahn: parsoid: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394488 (https://phabricator.wikimedia.org/T177225) [23:03:47] (03PS1) 10Dzahn: logstash: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394489 (https://phabricator.wikimedia.org/T177225) [23:03:49] (03PS1) 10Dzahn: jobqueue_redis,restbase: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394490 (https://phabricator.wikimedia.org/T177225) [23:06:20] !log addshore@tin Synchronized wmf-config: wdbuild: [[gerrit:394212|wdbuild: Stop loading from build on group1]] (duration: 00m 46s) [23:06:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:06:31] no_justification: thats group1 done =o [23:07:02] and everything appears to be just fine.... 
[23:07:05] addshore: Probably CirrusSearch broke it [23:07:06] https://integration.wikimedia.org/ci/job/mwext-mw-selenium-composer-jessie/7458/artifact/log/mw-debug-www.log [23:07:25] I updated everything in my localhost and still can't reproduce it [23:07:32] (03PS6) 10Addshore: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) [23:07:38] Amir1: thanks! [23:09:48] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:09:49] addshore: ahhhh, I see this error, it's reproducible in wikidata-lexeme.wmflabs.org [23:10:06] that's weird that I can't do it locally but I know what's going on [23:10:49] addshore: I see the light at the end of the tunnel! [23:11:04] no_justification: yup [23:11:26] heh, I love how 'cautious' i have been with this commit, everything except enwiki [23:11:48] Ain't no test environment like enwiki [23:12:36] (03Merged) 10jenkins-bot: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:12:43] Would anyone here mind if I did a small MCS (Node service) deploy? [23:13:17] (03PS3) 10Dzahn: Revert "nagios_common: remove krinkle from perf-team contactgroup" [puppet] - 10https://gerrit.wikimedia.org/r/394404 [23:13:26] (03CR) 10Dzahn: [C: 032] Revert "nagios_common: remove krinkle from perf-team contactgroup" [puppet] - 10https://gerrit.wikimedia.org/r/394404 (owner: 10Dzahn) [23:13:45] bearND: s'ok [23:13:54] greg-g: Thanks! [23:14:05] !log bsitzmann@tin Started deploy [mobileapps/deploy@4305d96]: Update mobileapps to 4317ea5 (T181743) [23:14:08] (03CR) 10Chad: "The latter, preferably." 
[puppet] - 10https://gerrit.wikimedia.org/r/394203 (owner: 10Chad) [23:14:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:16] T181743: iOS app users in California seeing both US and Canada banners - https://phabricator.wikimedia.org/T181743 [23:14:22] (03CR) 10Chad: "But I can back wikipedia and wikimedia out of this for awhile if it'll make merging less confusing" [puppet] - 10https://gerrit.wikimedia.org/r/394203 (owner: 10Chad) [23:14:24] addshore: https://gerrit.wikimedia.org/r/#/c/394495/ [23:15:10] !log addshore@tin Synchronized wmf-config: wdbuild: [[gerrit:394213|wdbuild: Stop loading from build on all wikis (except the one i really dont want to break)]] (duration: 00m 46s) [23:15:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:41] (03PS2) 10Dzahn: mw_canary_appserver: mv firewall incl to role, use profile [puppet] - 10https://gerrit.wikimedia.org/r/392768 [23:15:47] (03PS6) 10Addshore: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) [23:15:49] (03CR) 10Addshore: [C: 032] wdbuild: Stop loading from build on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:15:55] last but not least [23:16:02] (03PS2) 10Chad: Beta: Moving all docroots to standard-docroot [puppet] - 10https://gerrit.wikimedia.org/r/394203 [23:16:46] (03CR) 10Krinkle: [C: 031] Beta: Moving all docroots to standard-docroot [puppet] - 10https://gerrit.wikimedia.org/r/394203 (owner: 10Chad) [23:16:49] (03CR) 10Dzahn: [C: 032] mw_canary_appserver: mv firewall incl to role, use profile [puppet] - 10https://gerrit.wikimedia.org/r/392768 (owner: 10Dzahn) [23:18:16] Amir1: also https://gerrit.wikimedia.org/r/#/c/394496/ [23:18:39] (03Merged) 10jenkins-bot: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:18:55] !log bsitzmann@tin Finished deploy [mobileapps/deploy@4305d96]: Update mobileapps to 4317ea5 (T181743) (duration: 04m 50s) [23:19:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:49] Amir1: I left a comment on https://gerrit.wikimedia.org/r/#/c/390224 [23:20:37] Thanks [23:20:49] (03CR) 10Dzahn: "no-op on mwdebug1001,mw2017. wmf-style: total violations delta -3" [puppet] - 10https://gerrit.wikimedia.org/r/392768 (owner: 10Dzahn) [23:21:55] !log addshore@tin Synchronized wmf-config: wdbuild: T173818 [[gerrit:394214|wdbuild: Stop loading from build on ALL WIKIS]] The build is dead! Mwahahaaa (duration: 00m 47s) [23:22:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:03] T173818: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818 [23:22:13] no_justification: greg-g legoktm ^^ [23:22:57] addshore: I love you. 
[23:23:05] wheeeeeeee :D [23:23:11] I need to email Lydia_WMDE and tell her to buy cake [23:23:19] There should be cake for days [23:23:22] (03PS9) 10Addshore: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 [23:23:25] (03CR) 10Addshore: [C: 032] wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [23:23:30] just got cleanup now [23:24:06] (03CR) 10Addshore: [C: 032] wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [23:24:41] (03PS8) 10Addshore: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 [23:24:46] (03PS6) 10Addshore: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) [23:24:55] (03PS5) 10Addshore: wdbuild: Switch wikidata extensions to json entrypoint where possible [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 (https://phabricator.wikimedia.org/T123026) [23:25:28] addshore: <3 [23:26:57] (03Merged) 10jenkins-bot: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [23:29:03] !log addshore@tin Synchronized wmf-config/Wikibase-buildentry.php: wdbuild: T173818 [[gerrit:394215|Remove wmgUseWikidataBuild]] (duration: 00m 45s) [23:29:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:13] T173818: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818 [23:29:43] (03CR) 10jenkins-bot: wdbuild: Stop loading from build on all wikis (except enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394213 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:29:45] (03CR) 10jenkins-bot: wdbuild: Stop loading from build on all wikis [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/394214 (https://phabricator.wikimedia.org/T176948) (owner: 10Addshore) [23:29:47] (03CR) 10jenkins-bot: wdbuild: Remove wmgUseWikidataBuild [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394215 (owner: 10Addshore) [23:30:38] !log addshore@tin Synchronized wmf-config: wdbuild: T173818 [[gerrit:394215|Remove wmgUseWikidataBuild]] (duration: 00m 46s) [23:30:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:50] addshore: can we all go home now? [23:31:02] haha :P sure! [23:31:11] I'm still in the office >.> [23:31:24] (03CR) 10Addshore: [C: 032] wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 (owner: 10Addshore) [23:31:30] 3 more [23:32:57] (03Merged) 10jenkins-bot: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 (owner: 10Addshore) [23:34:00] In case you didn't know there is a twitter bot that tweets operations SAL: https://twitter.com/wikimediatech [23:34:24] Yep that's mirrored since basically forever [23:34:33] !log addshore@tin Synchronized wmf-config/Wikibase.php: wdbuild: T173818 [[gerrit:394216|Remove Wikibase-buildentry.php config file (empty)]] (duration: 00m 45s) [23:34:42] addshore: <3 <3 <3 [23:34:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:34:43] T173818: [Epic] Kill the Wikidata build step - https://phabricator.wikimedia.org/T173818 [23:35:35] !log addshore@tin Synchronized wmf-config: wdbuild: T173818 [[gerrit:394216|Remove Wikibase-buildentry.php config file (empty)]] (duration: 00m 46s) [23:35:44] (03CR) 10Addshore: [C: 032] wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [23:35:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:28] 
(03CR) 10Addshore: [C: 04-1] "Should probably do this in Wikibase.php where they are loaded too." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394283 (https://phabricator.wikimedia.org/T123026) (owner: 10Addshore) [23:36:38] Right, last one merging [23:37:31] (03Merged) 10jenkins-bot: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [23:39:12] !log addshore@tin Synchronized wmf-config: wdbuild: T173818 T177060 [[gerrit:394282|Add wikidata extensions to extension-list]] (duration: 00m 46s) [23:39:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:39:22] T177060: Stop using extension-list-wikidata from the build in mediawiki-config - https://phabricator.wikimedia.org/T177060 [23:39:27] no_justification: thats it! [23:39:43] Wheeeeee :D [23:39:53] (03PS2) 10Dzahn: parsoid: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394488 (https://phabricator.wikimedia.org/T177225) [23:40:14] omg omg omg [23:42:35] I updated some docs... 
https://wikitech.wikimedia.org/wiki/How_to_determine_the_deployed_Wikidata_version hehe [23:42:56] (03PS1) 10Chad: Remove ArticleCreationWorkflow from extension-list-labs, it's in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394501 [23:43:39] (03CR) 10Dzahn: [C: 032] parsoid: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394488 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [23:45:33] Bahahahahahaha [23:46:18] * apergos eyeballs no_justification with plenty of justification [23:48:12] I'm calling it a day [23:48:14] o/ [23:50:07] (03PS1) 10Chad: Remove timeless inclusion in labs, prod has it [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394503 [23:50:09] (03PS1) 10Chad: Remove AdvancedSearch inclusion in beta, it's in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394504 [23:51:04] (03CR) 10BBlack: [C: 031] vcl: add hostname/layer info to syntethic healthcheck response [puppet] - 10https://gerrit.wikimedia.org/r/393251 (owner: 10Ema) [23:51:16] (03CR) 10Chad: [C: 032] Remove ArticleCreationWorkflow from extension-list-labs, it's in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394501 (owner: 10Chad) [23:52:53] (03Merged) 10jenkins-bot: Remove ArticleCreationWorkflow from extension-list-labs, it's in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394501 (owner: 10Chad) [23:53:08] (03CR) 10jenkins-bot: wdbuild: Remove Wikibase-buildentry.php config file (empty) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394216 (owner: 10Addshore) [23:53:10] (03CR) 10jenkins-bot: wdbuild: Add wikidata extensions to extension-list [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394282 (https://phabricator.wikimedia.org/T177060) (owner: 10Addshore) [23:53:12] (03CR) 10jenkins-bot: Remove ArticleCreationWorkflow from extension-list-labs, it's in prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/394501 (owner: 10Chad) [23:53:51] * addshore is heading home [23:54:04] (03CR) 
10BBlack: [C: 031] vcl: distinguish between hfp and hfm [puppet] - 10https://gerrit.wikimedia.org/r/391171 (https://phabricator.wikimedia.org/T180434) (owner: 10Ema) [23:54:27] !log demon@tin Synchronized wmf-config/extension-list-labs: no-op (duration: 00m 45s) [23:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:58:33] (03CR) 10BBlack: [C: 031] VCL: log TLS information to VSM [puppet] - 10https://gerrit.wikimedia.org/r/388064 (https://phabricator.wikimedia.org/T177199) (owner: 10Ema) [23:59:17] (03PS2) 10Dzahn: logstash: remove ganglia [puppet] - 10https://gerrit.wikimedia.org/r/394489 (https://phabricator.wikimedia.org/T177225)