[00:00:14] PROBLEM - Maps - OSM synchronization lag - eqiad on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1 [00:00:43] PROBLEM - Maps - OSM synchronization lag - codfw on einsteinium is CRITICAL: 1.728e+05 ge 1.728e+05 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [00:08:23] PROBLEM - puppet last run on druid1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:33:54] RECOVERY - puppet last run on druid1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:10:03] PROBLEM - DPKG on kubestage1002 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [01:54:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 2.918 second response time [01:58:13] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:08:24] !log repooling wdqs1003. 
It has caught up with others [02:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:22:54] RECOVERY - Maps - OSM synchronization lag - eqiad on einsteinium is OK: (C)1.728e+05 ge (W)9e+04 ge 8570 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=11&fullscreen&orgId=1 [02:23:33] RECOVERY - Maps - OSM synchronization lag - codfw on einsteinium is OK: (C)1.728e+05 ge (W)9e+04 ge 8606 https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=12&fullscreen&orgId=1 [03:34:33] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 842.93 seconds [04:03:33] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 157.69 seconds [05:06:54] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 277 bytes in 6.948 second response time [05:10:23] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [05:40:43] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 9.417 second response time [05:44:03] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [06:10:50] (03PS7) 10Giuseppe Lavagetto: beta: start using set_handler instead of the proxy passes [puppet] - 10https://gerrit.wikimedia.org/r/469203 [06:10:54] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] beta: start using set_handler instead of the proxy passes [puppet] - 10https://gerrit.wikimedia.org/r/469203 (owner: 10Giuseppe Lavagetto) [06:11:54] (03Abandoned) 10Giuseppe Lavagetto: mediawiki::web::beta_sites: enable serving content from php7-fpm [puppet] - 10https://gerrit.wikimedia.org/r/468990 (https://phabricator.wikimedia.org/T206338) (owner: 10Giuseppe Lavagetto) [06:12:28] (03Abandoned) 10Giuseppe Lavagetto: mediawiki::web::vhost: allow serving content from php7 [puppet] - 10https://gerrit.wikimedia.org/r/467878 (https://phabricator.wikimedia.org/T206338) (owner: 10Giuseppe 
Lavagetto) [06:13:48] (03Abandoned) 10Giuseppe Lavagetto: [PoC] How I'd like the mediawiki vhosts to be [puppet] - 10https://gerrit.wikimedia.org/r/468931 (owner: 10Giuseppe Lavagetto) [06:31:13] PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/etc/R/biocLite.R] [06:33:04] PROBLEM - puppet last run on aqs1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 7 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints] [06:46:40] (03PS3) 10Giuseppe Lavagetto: mediawiki: remove now unused common includes [puppet] - 10https://gerrit.wikimedia.org/r/468930 [06:56:43] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:06] (03CR) 10Giuseppe Lavagetto: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13231/" [puppet] - 10https://gerrit.wikimedia.org/r/468930 (owner: 10Giuseppe Lavagetto) [06:58:44] RECOVERY - puppet last run on aqs1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [07:30:27] 10Operations, 10Wikimedia-Site-requests, 10HHVM: Set hhvm.virtual_host[default][always_decode_post_data] = false - https://phabricator.wikimedia.org/T208191 (10Bawolff) [07:43:14] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 7.528 second response time [07:46:17] (03PS2) 10Muehlenhoff: Disable prometheus rsyncd module for now [puppet] - 10https://gerrit.wikimedia.org/r/469630 [07:46:34] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [07:48:43] (03CR) 10Muehlenhoff: [C: 032] Disable prometheus rsyncd module for now [puppet] - 10https://gerrit.wikimedia.org/r/469630 (owner: 10Muehlenhoff) [07:51:58] (03PS4) 10Muehlenhoff: Add initial profile for Kerberos client [puppet] - 10https://gerrit.wikimedia.org/r/469853 [07:52:21] 10Operations, 
10Wikimedia-Site-requests, 10HHVM: Set hhvm.virtual_host[default][always_decode_post_data] = false - https://phabricator.wikimedia.org/T208191 (10Bawolff) [07:53:14] 10Operations, 10Wikimedia-Site-requests, 10HHVM: Set hhvm.virtual_host[default][always_decode_post_data] = false - https://phabricator.wikimedia.org/T208191 (10Bawolff) For context, my specific issue is that The api dispatch code basically works by doing `$foo = $_POST + $_GET; $action = $foo['action'];` and... [07:53:49] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (10fgiunchedi) a:03Papaul Hi @Papaul, looks like this controller has a faulty battery, likely will need to have the battery replaced, what do you think? We have seen this issue before on a bunch of ms-be hosts i... [07:56:03] (03CR) 10Muehlenhoff: [C: 032] Add initial profile for Kerberos client [puppet] - 10https://gerrit.wikimedia.org/r/469853 (owner: 10Muehlenhoff) [08:00:04] gilles: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Performance perception survey (QuickSurveys) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T0800). 
[08:00:29] indeed [08:00:40] (03PS2) 10Muehlenhoff: Convert udp2log::rsyncd to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/469627 [08:00:54] !log Deploying time-sensitive backport to QuickSurveys [08:00:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:04:16] (03CR) 10Muehlenhoff: [C: 032] Convert udp2log::rsyncd to auto_ferm [puppet] - 10https://gerrit.wikimedia.org/r/469627 (owner: 10Muehlenhoff) [08:07:15] !log reformat ms-be1042 xfs filesystems - T199198 [08:07:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:07:19] T199198: Some swift filesystems reporting negative disk usage - https://phabricator.wikimedia.org/T199198 [08:07:32] (03CR) 10Gilles: [C: 032] Enable performance perception survey shuffling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470065 (https://phabricator.wikimedia.org/T208088) (owner: 10Gilles) [08:08:39] (03PS2) 10Muehlenhoff: Use auto_ferm for profile::analytics::database::meta::backup_dest [puppet] - 10https://gerrit.wikimedia.org/r/469635 [08:08:42] (03Merged) 10jenkins-bot: Enable performance perception survey shuffling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470065 (https://phabricator.wikimedia.org/T208088) (owner: 10Gilles) [08:09:02] jouncebot: now [08:09:02] For the next 0 hour(s) and 20 minute(s): Performance perception survey (QuickSurveys) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T0800) [08:09:07] jouncebot: next [08:09:07] In 2 hour(s) and 20 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1030) [08:11:53] 10Operations, 10DBA, 10monitoring: Create a script to regenerate prometheus mysqld exporter listing that works with puppetdb - https://phabricator.wikimedia.org/T145072 (10jcrespo) I think this should be moved to zarcillo mariadb metadata database, and centralize there the active database control (substituti... 
[08:14:24] RECOVERY - Check systemd state on db1117 is OK: OK - running: The system is fully operational [08:15:22] 10Operations, 10Cloud-Services, 10Cloud-VPS: labs precise and jessie instance not accessible after provisioning - https://phabricator.wikimedia.org/T117673 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi Haven't seen this either lately, resolving. [08:17:00] jouncebot: refresh [08:17:01] I refreshed my knowledge about deployments. [08:17:02] jouncebot: next [08:17:02] In 0 hour(s) and 12 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T0830) [08:19:39] (03CR) 10Filippo Giunchedi: [C: 031] Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:19:48] (03CR) 10Filippo Giunchedi: [C: 031] Remove Pybal Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469593 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:21:58] (03CR) 10jenkins-bot: Enable performance perception survey shuffling [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470065 (https://phabricator.wikimedia.org/T208088) (owner: 10Gilles) [08:27:25] (03PS1) 10Elukey: Allow tuning of the Yarn RM's zookeeper session timeout [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470327 (https://phabricator.wikimedia.org/T206943) [08:27:46] (03CR) 10jerkins-bot: [V: 04-1] Allow tuning of the Yarn RM's zookeeper session timeout [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470327 (https://phabricator.wikimedia.org/T206943) (owner: 10Elukey) [08:29:04] !log gilles@deploy1001 Started scap: T208088 Enable performance QuickSurvey shuffling [08:29:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:29:08] T208088: Add ability to randomize display order of answer options in QuickSurveys - https://phabricator.wikimedia.org/T208088 [08:30:04] addshore: It is that lovely time of the day 
again! You are hereby commanded to deploy Cleaning up Wikibase mw-config. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T0830). [08:30:15] gilles: let me know when you are done :) [08:30:30] addshore: sure thing, scap just started [08:30:51] thanks! [08:31:22] (03PS2) 10Elukey: Allow tuning of the Yarn RM's zookeeper session timeout [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470327 (https://phabricator.wikimedia.org/T206943) [08:31:24] (03PS1) 10Elukey: Fix some linting warnings [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470328 [08:32:20] (03PS3) 10Addshore: Wikibase, create and use wmgWikibaseMaxSerializedEntitySize [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470149 [08:32:24] (03CR) 10Addshore: [C: 032] Wikibase, create and use wmgWikibaseMaxSerializedEntitySize [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470149 (owner: 10Addshore) [08:32:37] gilles: aaah, it's a full scap? [08:32:46] yeah, I'm adding new files [08:33:30] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10jcrespo) so the error happens because it is tried to be run manually, which is not a big deal if it errors out- just delete any file you may have added. I ran `systemctl disable is it https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/QuickSurveys/+/470257/ ? 
[08:33:36] 10Operations, 10Beta-Cluster-Infrastructure, 10DNS, 10Traffic, and 4 others: Ferm's upstream Net::DNS Perl library questionable handling of NOERROR responses without records causing puppet errors when we try to @resolve AAAA in labs - https://phabricator.wikimedia.org/T153468 (10fgiunchedi) [08:33:52] I don't think that should need a full scap, only a scap of the QuickSurveys directory [08:33:54] addshore: yes, and a config change [08:34:05] (03CR) 10Elukey: [C: 032] Allow tuning of the Yarn RM's zookeeper session timeout [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470327 (https://phabricator.wikimedia.org/T206943) (owner: 10Elukey) [08:34:11] (03CR) 10Elukey: [C: 032] Fix some linting warnings [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470328 (owner: 10Elukey) [08:34:18] I expect the full scap will end up taking ~40 mins? but the dir scap should only take a couple [08:34:22] ah, didn't know that was possible, didn't see that mentioned in the "how to deploy code" wiki page [08:34:44] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10jcrespo) [08:35:05] ah it's if you're "adding directories" duh... 
[08:35:09] sorry about that [08:35:14] :D [08:35:27] you should be able to stop the sync and just do a dir sync [08:35:33] cool [08:35:34] !log gilles@deploy1001 sync aborted: T208088 Enable performance QuickSurvey shuffling (duration: 06m 30s) [08:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:38] T208088: Add ability to randomize display order of answer options in QuickSurveys - https://phabricator.wikimedia.org/T208088 [08:35:54] !log gilles@deploy1001 Started scap: T208088 Enable performance QuickSurvey shuffling [08:35:54] !log gilles@deploy1001 sync aborted: T208088 Enable performance QuickSurvey shuffling (duration: 00m 00s) [08:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:37:03] i changed the heading of that section on the page to include the word "directories" [08:38:48] CDB stuff spewed some errors, but it kept going, is that expected? [08:38:50] !log gilles@deploy1001 Synchronized php-1.33.0-wmf.1/extensions/QuickSurveys: T208088 Add ability to shuffle answers display order (duration: 01m 51s) [08:38:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:39:12] gilles: when cancelling the previous one? [08:39:22] nope, when running the actual sync-file [08:40:02] paste them somewhere? [08:40:06] 10Operations: Access requests process: People sometimes specify hostnames instead of admin groups in access requests - https://phabricator.wikimedia.org/T207754 (10MoritzMuehlenhoff) Can you clarify the scope/intent of this task, it's not obvious to me. Is this about the fact that some access::group permissions... 
[08:40:43] addshore: https://phabricator.wikimedia.org/P7728 [08:40:46] the sync-file won't be syncing cdb files so should be nothing to worry about [08:41:24] should be fine, as the next scap sync will redo all of those actions :) [08:42:33] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T208088 Enable performance perception survey shuffling (duration: 00m 47s) [08:42:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:42:37] T208088: Add ability to randomize display order of answer options in QuickSurveys - https://phabricator.wikimedia.org/T208088 [08:42:58] (03CR) 10Addshore: [C: 032] Wikibase, create and use wmgWikibaseMaxSerializedEntitySize [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470149 (owner: 10Addshore) [08:43:36] addshore: I am done [08:43:44] gilles: amazing, thanks! [08:43:52] thanks for the help [08:43:58] np [08:44:18] (03Merged) 10jenkins-bot: Wikibase, create and use wmgWikibaseMaxSerializedEntitySize [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470149 (owner: 10Addshore) [08:44:51] (03PS1) 10Elukey: Rename Yarn zookeeper session timeout property to include "ms" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470333 [08:45:07] (03PS3) 10Addshore: Wikibase, Split specialSiteLinkGroups and manage from IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470150 [08:45:43] (03CR) 10jenkins-bot: Wikibase, create and use wmgWikibaseMaxSerializedEntitySize [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470149 (owner: 10Addshore) [08:46:17] (03CR) 10Elukey: [V: 032 C: 032] Rename Yarn zookeeper session timeout property to include "ms" [puppet/cdh] - 10https://gerrit.wikimedia.org/r/470333 (owner: 10Elukey) [08:49:29] meh, looks like because the cdb files were stopped part way through their regeneration scap doesn't like them now, so running a "scap sync-l10n" for the deployed branch... 
[08:49:58] !log addshore@deploy1001 sync-l10n aborted: (no justification provided) (duration: 01m 19s) [08:50:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:25] looks like that won't work either and I just have to do the full sync anyway... [08:50:32] !log addshore@deploy1001 Started scap: (no justification provided) [08:50:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:36] !log addshore@deploy1001 sync aborted: (no justification provided) (duration: 00m 03s) [08:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:50:43] (03PS1) 10Elukey: profile::hadoop::common: raise Yarn zookeeper timeout to 20s [puppet] - 10https://gerrit.wikimedia.org/r/470336 (https://phabricator.wikimedia.org/T206943) [08:51:00] (03PS1) 10Addshore: Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470337 [08:51:04] (03CR) 10Addshore: [C: 032] Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470337 (owner: 10Addshore) [08:52:07] (03Merged) 10jenkins-bot: Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470337 (owner: 10Addshore) [08:52:35] !log addshore@deploy1001 Started scap: sync with no changes [08:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:52:42] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10Banyek) What cli command did you use for getting those logs? 
[08:55:14] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/compiler1002/13235/" [puppet] - 10https://gerrit.wikimedia.org/r/470336 (https://phabricator.wikimedia.org/T206943) (owner: 10Elukey) [08:55:25] (03PS5) 10Giuseppe Lavagetto: mediawiki: add httpd class, alternative to mediawiki::web [puppet] - 10https://gerrit.wikimedia.org/r/467643 [08:55:27] (03PS5) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [08:56:46] !log restart yarn on an-master100[1,2] to pick up new zookeeper timeout settings (10s -> 20s) - T206943 [08:56:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:56:50] T206943: JVM pauses cause Yarn master to failover - https://phabricator.wikimedia.org/T206943 [08:58:00] (03CR) 10Ema: [C: 031] Remove Pybal Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469593 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:58:09] (03CR) 10Ema: [C: 031] Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [08:59:47] 10Operations, 10User-jijiki: Redefine privileges and access for perf-roots group - https://phabricator.wikimedia.org/T207666 (10jijiki) p:05Triage>03Normal [09:02:02] 10Operations, 10monitoring, 10User-Joe, 10Wikimedia-Incident: Monitor redis memory/disk usage - https://phabricator.wikimedia.org/T110169 (10jijiki) Should we merge this with T148637 ? 
[09:03:09] (03CR) 10jenkins-bot: Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470337 (owner: 10Addshore) [09:07:14] !log addshore@deploy1001 Finished scap: sync with no changes (duration: 14m 39s) [09:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:10:04] (03PS6) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [09:12:24] (03PS1) 10Addshore: Revert "Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470339 [09:12:29] (03CR) 10Addshore: [C: 032] Revert "Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470339 (owner: 10Addshore) [09:14:18] (03Merged) 10jenkins-bot: Revert "Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470339 (owner: 10Addshore) [09:14:40] 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Concerns about icinga1001 check latency - https://phabricator.wikimedia.org/T208066 (10fgiunchedi) >>! In T208066#4697985, @Dzahn wrote: > other things i have tried but not puppetized so far: > > (point 5 from the tuning guide, Max Reaper Time.) >... 
[09:14:45] (03PS7) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [09:17:34] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.985 second response time [09:17:52] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/2 [[gerrit:470339]] Introduce wmgWikibaseMaxSerializedEntitySize (duration: 00m 47s) [09:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:18:39] (03CR) 10jenkins-bot: Revert "Revert "Wikibase, create and use wmgWikibaseMaxSerializedEntitySize"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470339 (owner: 10Addshore) [09:19:04] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT2/2 [[gerrit:470339]] Introduce wmgWikibaseMaxSerializedEntitySize (duration: 00m 46s) [09:19:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:19:08] (03PS1) 10Gilles: Double performance perception survey sampling on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470340 (https://phabricator.wikimedia.org/T187299) [09:19:38] addshore: if you happen to finish your window early, let me know [09:21:04] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [09:21:17] gilles: acl [09:21:18] ack [09:23:32] (03PS4) 10Addshore: Wikibase, Split specialSiteLinkGroups and manage from IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470150 [09:23:35] (03CR) 10Addshore: [C: 032] Wikibase, Split specialSiteLinkGroups and manage from IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470150 (owner: 10Addshore) [09:24:50] (03Merged) 10jenkins-bot: Wikibase, Split specialSiteLinkGroups and manage from IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470150 (owner: 10Addshore) [09:29:00] (03PS8) 10Banyek: mariadb: table checker for monitoring data drift [puppet] - 
10https://gerrit.wikimedia.org/r/469889 (https://phabricator.wikimedia.org/T207253) [09:29:37] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/2 [[gerrit:470150]] Wikibase, Split specialSiteLinkGroups and manage from IS.php (duration: 00m 47s) [09:29:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:09] !log resume cache hosts rolling reboots for kernel/microcode updates T203011 [09:30:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:30:41] (03PS6) 10Giuseppe Lavagetto: mediawiki: add httpd class, alternative to mediawiki::web [puppet] - 10https://gerrit.wikimedia.org/r/467643 [09:30:42] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT2/2 [[gerrit:470150]] Wikibase, Split specialSiteLinkGroups and manage from IS.php (duration: 00m 46s) [09:30:43] (03PS8) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [09:30:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:31:01] (03PS3) 10Addshore: Wikibase, move wmgWBSiteLinkGroups to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470151 [09:31:07] (03CR) 10Addshore: [C: 032] Wikibase, move wmgWBSiteLinkGroups to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470151 (owner: 10Addshore) [09:32:04] jouncebot: now [09:32:04] For the next 0 hour(s) and 57 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T0830) [09:32:48] (03Merged) 10jenkins-bot: Wikibase, move wmgWBSiteLinkGroups to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470151 (owner: 10Addshore) [09:33:49] (03PS4) 10Addshore: Wikibase, kill $wmgWBSharedSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470152 [09:33:51] (03CR) 10Filippo Giunchedi: create rsyslog::ship_logfile - simplified logstash shipper via kafka (033 
comments) [puppet] - 10https://gerrit.wikimedia.org/r/469945 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron) [09:34:13] (03CR) 10Addshore: [C: 032] Wikibase, kill $wmgWBSharedSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470152 (owner: 10Addshore) [09:34:15] (03CR) 10jenkins-bot: Wikibase, Split specialSiteLinkGroups and manage from IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470150 (owner: 10Addshore) [09:34:17] (03CR) 10jenkins-bot: Wikibase, move wmgWBSiteLinkGroups to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470151 (owner: 10Addshore) [09:34:26] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/2 [[gerrit:470151]] Wikibase, move wmgWBSiteLinkGroups to IS.php (duration: 00m 47s) [09:34:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:25] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT2/2 [[gerrit:470151]] Wikibase, move wmgWBSiteLinkGroups to IS.php (duration: 00m 46s) [09:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:35:48] (03Merged) 10jenkins-bot: Wikibase, kill $wmgWBSharedSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470152 (owner: 10Addshore) [09:37:14] (03PS3) 10Addshore: Wikibase, define $wgExtraNamespaces in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470153 [09:37:17] (03CR) 10Addshore: [C: 032] Wikibase, define $wgExtraNamespaces in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470153 (owner: 10Addshore) [09:37:33] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:470152]] Wikibase, kill $wmgWBSharedSettings (duration: 00m 47s) [09:37:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:38:29] (03Merged) 10jenkins-bot: Wikibase, define $wgExtraNamespaces in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470153 (owner: 10Addshore) [09:40:04] (03PS3) 
10Addshore: Wikibase, put all wgNamespaceAliases in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470154 [09:40:08] (03CR) 10Addshore: [C: 032] Wikibase, put all wgNamespaceAliases in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470154 (owner: 10Addshore) [09:40:40] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:470153]] PT1/2 Wikibase, define $wgExtraNamespaces in IS.php (duration: 00m 46s) [09:40:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:40] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:470153]] PT2/2 Wikibase, define $wgExtraNamespaces in IS.php (duration: 00m 47s) [09:41:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:41:47] (03Merged) 10jenkins-bot: Wikibase, put all wgNamespaceAliases in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470154 (owner: 10Addshore) [09:43:11] !log addshore@deploy1001 sync-file aborted: [[gerrit:470154]] PT1/2 (duration: 00m 00s) [09:43:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:07] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:470154]] PT1/2 Wikibase, put all wgNamespaceAliases in IS.php (duration: 00m 47s) [09:44:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:44:15] (03PS3) 10Addshore: Wikibase, create and use wmgWikibaseClientInjectRecentChanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470155 [09:44:20] (03PS4) 10Addshore: Wikibase, create and use wmgWikibaseClientInjectRecentChanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470155 [09:44:25] (03CR) 10Addshore: [C: 032] Wikibase, create and use wmgWikibaseClientInjectRecentChanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470155 (owner: 10Addshore) [09:45:25] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:470154]] PT2/2 
Wikibase, put all wgNamespaceAliases in IS.php (duration: 00m 46s) [09:45:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:45:46] (03Merged) 10jenkins-bot: Wikibase, create and use wmgWikibaseClientInjectRecentChanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470155 (owner: 10Addshore) [09:49:12] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: pt1/3 [[gerrit:470155]] Wikibase, create and use wmgWikibaseClientInjectRecentChanges (duration: 00m 47s) [09:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:01] (03CR) 10jenkins-bot: Wikibase, kill $wmgWBSharedSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470152 (owner: 10Addshore) [09:50:03] (03CR) 10jenkins-bot: Wikibase, define $wgExtraNamespaces in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470153 (owner: 10Addshore) [09:50:05] (03CR) 10jenkins-bot: Wikibase, put all wgNamespaceAliases in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470154 (owner: 10Addshore) [09:50:07] (03CR) 10jenkins-bot: Wikibase, create and use wmgWikibaseClientInjectRecentChanges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470155 (owner: 10Addshore) [09:50:10] (03PS3) 10Addshore: Wikibase, remove unused wmgWikibaseClientSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470156 [09:50:17] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: pt2/3 [[gerrit:470155]] Wikibase, create and use wmgWikibaseClientInjectRecentChanges (duration: 00m 47s) [09:50:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:50:42] (03CR) 10Addshore: [C: 032] Wikibase, remove unused wmgWikibaseClientSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470156 (owner: 10Addshore) [09:51:12] !log addshore@deploy1001 Synchronized wmf-config/Wikibase-production.php: pt3/3 [[gerrit:470155]] Wikibase, create and use 
wmgWikibaseClientInjectRecentChanges (duration: 00m 46s) [09:51:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:51:46] jouncebot: next [09:51:46] In 0 hour(s) and 38 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1030) [09:51:59] (03Merged) 10jenkins-bot: Wikibase, remove unused wmgWikibaseClientSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470156 (owner: 10Addshore) [09:53:24] PROBLEM - High lag on wdqs1003 is CRITICAL: 3642 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [09:53:28] (03PS2) 10Elukey: profile::statistics::private: move geoip archive to another dir [puppet] - 10https://gerrit.wikimedia.org/r/469890 (https://phabricator.wikimedia.org/T208028) [09:54:09] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT1/2 [[gerrit:470156]] Wikibase, remove unused wmgWikibaseClientSettings (duration: 00m 47s) [09:54:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:54:34] (03PS3) 10Addshore: Wikibase, Remove unused wmgUseWikibaseQualityExternalValidation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470158 [09:54:37] (03CR) 10Addshore: [C: 032] Wikibase, Remove unused wmgUseWikibaseQualityExternalValidation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470158 (owner: 10Addshore) [09:55:08] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT2/2 [[gerrit:470156]] Wikibase, remove unused wmgWikibaseClientSettings (duration: 00m 47s) [09:55:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:55:52] (03Merged) 10jenkins-bot: Wikibase, Remove unused wmgUseWikibaseQualityExternalValidation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470158 (owner: 10Addshore) [09:56:12] (03PS2) 10Muehlenhoff: Remove Pybal Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469593 
(https://phabricator.wikimedia.org/T183454) [09:56:40] (03PS3) 10Addshore: Wikibase, add IS.php setting for each possible extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470159 [09:56:57] (03CR) 10Muehlenhoff: [C: 032] Remove Pybal Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/469593 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [09:57:14] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:470158]] Wikibase, Remove unused wmgUseWikibaseQualityExternalValidation (duration: 00m 47s) [09:57:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:57:36] (03CR) 10Addshore: [C: 032] Wikibase, add IS.php setting for each possible extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470159 (owner: 10Addshore) [09:58:45] (03Merged) 10jenkins-bot: Wikibase, add IS.php setting for each possible extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470159 (owner: 10Addshore) [10:00:47] (03PS3) 10Addshore: Wikibase.php, move a bunch of config into 'clean' area [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470160 [10:00:51] (03CR) 10Addshore: [C: 032] Wikibase.php, move a bunch of config into 'clean' area [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470160 (owner: 10Addshore) [10:01:23] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: [[gerrit:470159]] PT1/2 Wikibase, add IS.php setting for each possible extension (duration: 00m 47s) [10:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:05] (03Merged) 10jenkins-bot: Wikibase.php, move a bunch of config into 'clean' area [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470160 (owner: 10Addshore) [10:02:24] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:470159]] PT2/2 Wikibase, add IS.php setting for each possible extension (duration: 00m 47s) [10:02:25] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:08] (03PS1) 10Filippo Giunchedi: rsyslog: add prometheus-rsyslog-exporter support [puppet] - 10https://gerrit.wikimedia.org/r/470345 (https://phabricator.wikimedia.org/T205862) [10:03:10] (03PS1) 10Filippo Giunchedi: WIP: temporary workaround co-installability of two roles [puppet] - 10https://gerrit.wikimedia.org/r/470346 [10:03:20] !log starting to switch wdqs1003 and wdqs1006 - T207947 [10:03:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:03:24] T207947: Switch wdqs1003 with one of the internal wdqs cluster - https://phabricator.wikimedia.org/T207947 [10:03:28] (03PS3) 10Gehel: wdqs: remove wdqs1006 from internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469685 (https://phabricator.wikimedia.org/T207947) [10:03:43] PROBLEM - Check systemd state on an-coord1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [10:03:48] (03CR) 10jenkins-bot: [V: 04-1] rsyslog: add prometheus-rsyslog-exporter support [puppet] - 10https://gerrit.wikimedia.org/r/470345 (https://phabricator.wikimedia.org/T205862) (owner: 10Filippo Giunchedi) [10:03:52] (03PS3) 10Addshore: Wikibase, Create and use wmgWikibaseRepoStatementSections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470161 [10:04:03] gilles: just a couple more, should have a little time left at the end of the slot for you [10:04:10] addshore: thanks :) [10:04:24] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: [[gerrit:470160]] Wikibase.php, move a bunch of config into clean area NOOP (duration: 00m 47s) [10:04:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:04:36] (03CR) 10Addshore: [C: 032] Wikibase, Create and use wmgWikibaseRepoStatementSections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470161 (owner: 10Addshore) [10:05:35] (03CR) 10jenkins-bot: Wikibase, remove unused wmgWikibaseClientSettings
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/470156 (owner: 10Addshore) [10:05:37] (03CR) 10jenkins-bot: Wikibase, Remove unused wmgUseWikibaseQualityExternalValidation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470158 (owner: 10Addshore) [10:05:39] (03CR) 10jenkins-bot: Wikibase, add IS.php setting for each possible extension [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470159 (owner: 10Addshore) [10:05:41] (03CR) 10jenkins-bot: Wikibase.php, move a bunch of config into 'clean' area [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470160 (owner: 10Addshore) [10:05:48] (03Merged) 10jenkins-bot: Wikibase, Create and use wmgWikibaseRepoStatementSections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470161 (owner: 10Addshore) [10:06:02] (03CR) 10jenkins-bot: Wikibase, Create and use wmgWikibaseRepoStatementSections [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470161 (owner: 10Addshore) [10:06:49] (03PS2) 10Gilles: Double performance perception survey sampling on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470340 (https://phabricator.wikimedia.org/T187299) [10:07:11] on the last 3 syncs [10:07:15] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/3 [[gerrit:470161]] Wikibase, Create and use wmgWikibaseRepoStatementSections (duration: 00m 50s) [10:07:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:07:54] (03PS2) 10Filippo Giunchedi: WIP: temporary workaround co-installability of two roles [puppet] - 10https://gerrit.wikimedia.org/r/470346 [10:07:56] (03PS2) 10Filippo Giunchedi: rsyslog: add prometheus-rsyslog-exporter support [puppet] - 10https://gerrit.wikimedia.org/r/470345 (https://phabricator.wikimedia.org/T205862) [10:08:07] (03CR) 10Gehel: [C: 032] wdqs: remove wdqs1006 from internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469685 (https://phabricator.wikimedia.org/T207947) (owner: 10Gehel) [10:08:15] !log addshore@deploy1001 
Synchronized wmf-config/Wikibase.php: PT2/3 [[gerrit:470161]] Wikibase, Create and use wmgWikibaseRepoStatementSections (duration: 00m 47s) [10:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:09:14] !log addshore@deploy1001 Synchronized wmf-config/Wikibase-production.php: PT3/3 [[gerrit:470161]] Wikibase, Create and use wmgWikibaseRepoStatementSections (duration: 00m 47s) [10:09:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:33] !log addshore@deploy1001 Synchronized wmf-config: final sync (duration: 00m 47s) [10:10:35] gilles: its all yours [10:10:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:10:38] jouncebot: next [10:10:38] In 0 hour(s) and 19 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1030) [10:10:56] (03CR) 10Gilles: [C: 032] Double performance perception survey sampling on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470340 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [10:12:01] !log Moving l10n_cache-ti.cdb files on deploy1001: sudo -u l10nupdate mv l10n_cache-ti.cdb.json l10n_cache-ti.cdb.json-back-T208196 && sudo -u l10nupdate mv l10n_cache-ti.cdb.MD5 l10n_cache-ti.cdb.MD5-back-T208196 # T208196 [10:12:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:12:08] T208196: l10nupdate aborts due to l10n_cache-ti.cdb.json being truncated - https://phabricator.wikimedia.org/T208196 [10:12:11] (03Merged) 10jenkins-bot: Double performance perception survey sampling on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470340 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [10:13:06] (03PS3) 10Gehel: wdqs: add wdqs1006 to public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469686 (https://phabricator.wikimedia.org/T207947) [10:13:22] (03PS7) 10Giuseppe Lavagetto: mediawiki: add httpd class, 
alternative to mediawiki::web [puppet] - 10https://gerrit.wikimedia.org/r/467643 [10:13:24] (03PS9) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [10:13:27] (03PS1) 10Giuseppe Lavagetto: httpd: add httpd::env [puppet] - 10https://gerrit.wikimedia.org/r/470347 [10:14:39] (03CR) 10Gehel: [C: 032] wdqs: add wdqs1006 to public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469686 (https://phabricator.wikimedia.org/T207947) (owner: 10Gehel) [10:15:59] !log gilles@deploy1001 Synchronized wmf-config/InitialiseSettings.php: T187299 Double sampling of performance perception survey on ruwiki (duration: 00m 47s) [10:16:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:18:50] (03PS3) 10Gehel: wdqs: remove wdqs1003 from public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469687 (https://phabricator.wikimedia.org/T207947) [10:18:52] (03CR) 10Gehel: [C: 032] wdqs: remove wdqs1003 from public cluster [puppet] - 10https://gerrit.wikimedia.org/r/469687 (https://phabricator.wikimedia.org/T207947) (owner: 10Gehel) [10:19:34] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:19:49] (03PS3) 10Gehel: wdqs: add wdqs1003 to internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469688 (https://phabricator.wikimedia.org/T207947) [10:20:59] (03CR) 10Gehel: [C: 032] wdqs: add wdqs1003 to internal cluster [puppet] - 10https://gerrit.wikimedia.org/r/469688 (https://phabricator.wikimedia.org/T207947) (owner: 10Gehel) [10:21:07] (03CR) 10jenkins-bot: Double performance perception survey sampling on ruwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470340 (https://phabricator.wikimedia.org/T187299) (owner: 10Gilles) [10:21:21] (03PS1) 10Addshore: Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 [10:21:21] !log Restore ti l10n files on deploy1001:/srv/mediawiki-staging/php-1.33.0-wmf.1/cache/l10n/upstream # T208196 [10:21:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:21:26] T208196: l10nupdate aborts due to l10n_cache-ti.cdb.json being truncated - https://phabricator.wikimedia.org/T208196 [10:21:31] hashar: fixed it? :) [10:22:55] !log switch wdqs1003 and wdqs1006 completed, wdqs1003 still depooled to catch up on update lag - T207947 [10:22:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:22:58] T207947: Switch wdqs1003 with one of the internal wdqs cluster - https://phabricator.wikimedia.org/T207947 [10:23:07] (03PS1) 10Addshore: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470349 [10:30:05] jan_drewniak: It is that lovely time of the day again! You are hereby commanded to deploy Wikimedia Portals Update. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1030). 
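Note on the "High lag on wdqs1003" alerts in this log: the check output reads as "value ge threshold", for example "3642 ge 3600" at 09:53:24, and the later recovery line at 12:12:44 shows both thresholds as "(C)3600 ge (W)1200 ge 1187". A minimal sketch of that comparison follows, assuming "ge" simply means greater-than-or-equal; the function name `classify` is hypothetical and this is not the monitoring system's actual code.

```python
def classify(value, warning, critical):
    """Map a measured value to an alert state using 'ge' (>=) thresholds.

    Hypothetical helper for illustration only; thresholds are taken from
    the alert text in the log, not from any real check configuration.
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# Values quoted in the wdqs1003 lag alerts (seconds of update lag).
print(classify(3642, warning=1200, critical=3600))  # CRITICAL
print(classify(1187, warning=1200, critical=3600))  # OK
```

Under this reading, wdqs1003 only returns to OK once its lag drops below the 1200 warning threshold, which would explain why the host stays depooled while it catches up.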
[10:30:21] (03PS2) 10Muehlenhoff: Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454) [10:31:38] (03CR) 10Muehlenhoff: [C: 032] Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/469594 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [10:33:28] (03PS1) 10Addshore: Wikibase, Move property suggester settings to IS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470351 [10:34:24] PROBLEM - High lag on wdqs1003 is CRITICAL: 3918 ge 3600 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:35:35] ACKNOWLEDGEMENT - High lag on wdqs1003 is CRITICAL: 3978 ge 3600 Gehel depooled, should recover soon https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:35:44] PROBLEM - puppet last run on lvs2006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:35:44] PROBLEM - puppet last run on lvs2005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:35:52] that's me, fixing [10:36:13] PROBLEM - puppet last run on lvs2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:36:47] (03PS1) 10Muehlenhoff: Revert "Remove Diamond from LVSes" [puppet] - 10https://gerrit.wikimedia.org/r/470352 [10:37:31] (03CR) 10Muehlenhoff: [C: 032] Revert "Remove Diamond from LVSes" [puppet] - 10https://gerrit.wikimedia.org/r/470352 (owner: 10Muehlenhoff) [10:37:33] (03PS1) 10Ema: check_vcl_reload: no unknowns if reload-vcl still has to run [puppet] - 10https://gerrit.wikimedia.org/r/470353 (https://phabricator.wikimedia.org/T206950) [10:38:15] (03CR) 10Fdans: [C: 031] "There's a case to be made that maybe we don't need these files both in hdfs and in disk, but we don't have to decide that now. 
This patch " [puppet] - 10https://gerrit.wikimedia.org/r/469890 (https://phabricator.wikimedia.org/T208028) (owner: 10Elukey) [10:38:34] PROBLEM - puppet last run on lvs3001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:39:08] (03PS2) 10Ema: check_vcl_reload: no unknowns if reload-vcl still has to run [puppet] - 10https://gerrit.wikimedia.org/r/470353 (https://phabricator.wikimedia.org/T206950) [10:39:56] (03PS3) 10Elukey: profile::statistics::private: move geoip archive to another dir [puppet] - 10https://gerrit.wikimedia.org/r/469890 (https://phabricator.wikimedia.org/T208028) [10:40:44] RECOVERY - puppet last run on lvs2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:41:14] RECOVERY - puppet last run on lvs2004 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:41:54] PROBLEM - puppet last run on lvs4005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:42:23] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 6.179 second response time [10:43:03] PROBLEM - puppet last run on lvs2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [10:43:43] RECOVERY - puppet last run on lvs3001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:44:00] (03CR) 10Elukey: [C: 032] profile::statistics::private: move geoip archive to another dir [puppet] - 10https://gerrit.wikimedia.org/r/469890 (https://phabricator.wikimedia.org/T208028) (owner: 10Elukey) [10:44:13] (03PS1) 10Addshore: Wikibase, move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 [10:44:37] (03PS2) 10Addshore: Wikibase, Move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 [10:44:54] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [10:45:34] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [10:47:40] (03CR) 10Arturo Borrero Gonzalez: "> Other than a bin/sbin typo, it seems okay so far. You can check" [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez) [10:52:33] (03PS1) 10Elukey: profile::statistics::private: fix wrong file ensure [puppet] - 10https://gerrit.wikimedia.org/r/470357 (https://phabricator.wikimedia.org/T208028) [10:52:52] (03PS2) 10Rxy: Remove global action related permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469864 (https://phabricator.wikimedia.org/T208035) [10:52:54] (03PS10) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [10:53:07] (03CR) 10Elukey: [C: 032] profile::statistics::private: fix wrong file ensure [puppet] - 10https://gerrit.wikimedia.org/r/470357 (https://phabricator.wikimedia.org/T208028) (owner: 10Elukey) [10:53:45] PROBLEM - puppet last run on stat1007 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/srv/geoip] [10:53:52] this is me --^ [10:57:16] jouncebot: next [10:57:16] In 0 hour(s) and 2 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1100) [10:58:53] RECOVERY - puppet last run on stat1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:00:04] addshore, hashar, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the European Mid-day SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1100). [11:00:04] rxy: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:10] o/ [11:00:56] (03PS1) 10Muehlenhoff: Remove absented Diamond collector for Pybal [puppet] - 10https://gerrit.wikimedia.org/r/470358 [11:01:45] o/ [11:02:09] hi [11:02:47] 10Operations: Access requests process: People sometimes specify hostnames instead of admin groups in access requests - https://phabricator.wikimedia.org/T207754 (10Krenair) >>! In T207754#4701935, @MoritzMuehlenhoff wrote: > Can you clarify the scope/intent of this task, it's not obvious to me. Change the acces... 
[11:04:28] rxy: sorry, Europe changed time zone over the weekend, and this event is locked to a US timezone, I forgot it's an hour earlier this week :) [11:05:15] (03CR) 10Ema: [C: 031] Remove absented Diamond collector for Pybal [puppet] - 10https://gerrit.wikimedia.org/r/470358 (owner: 10Muehlenhoff) [11:05:32] haha [11:05:43] 10Operations, 10monitoring, 10Patch-For-Review, 10User-fgiunchedi: Deprovision Diamond collectors no longer in use - https://phabricator.wikimedia.org/T183454 (10MoritzMuehlenhoff) [11:06:13] RECOVERY - puppet last run on lvs2006 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:06:14] DST is confusable. ( JST is not effects DST ) [11:06:19] (03PS11) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [11:07:31] zeljkof: anyway, can you SWAT the my patch? :) [11:07:44] rxy: sure, sorry, I'm reviewing it, forgot to say that [11:07:59] ah , k [11:08:33] RECOVERY - puppet last run on lvs2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [11:08:52] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469864 (https://phabricator.wikimedia.org/T208035) (owner: 10Rxy) [11:10:09] (03Merged) 10jenkins-bot: Remove global action related permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469864 (https://phabricator.wikimedia.org/T208035) (owner: 10Rxy) [11:12:33] RECOVERY - puppet last run on lvs4005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [11:14:46] rxy: the patch is at mwdebug1002, please test and let me know if it's ok to deploy it [11:15:01] I will testing that ... 
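Note on the timezone remark at 11:04:28: Europe left daylight saving time on 2018-10-28 while the US did so on 2018-11-04, so an event pinned to a US wall-clock time lands one hour earlier on European clocks during that week. The sketch below illustrates the shift. The zone names and the 04:00 start hour are assumptions chosen so that the event falls at 11:00 UTC, matching the log timestamps if those are UTC; the actual calendar settings are not shown in the log.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

US = ZoneInfo("America/Los_Angeles")  # assumed zone the event is locked to
EU = ZoneInfo("Europe/Berlin")        # assumed zone of a European deployer


def local_start(year, month, day, hour=4):
    """European wall-clock time of an event fixed at `hour` in the US zone."""
    start_us = datetime(year, month, day, hour, tzinfo=US)
    return start_us.astimezone(EU)


before = local_start(2018, 10, 22)  # both regions still on summer time
after = local_start(2018, 10, 29)   # Europe back on standard time, US not yet

print(before.strftime("%H:%M %Z"))  # 13:00 CEST
print(after.strftime("%H:%M %Z"))   # 12:00 CET
```

The UTC time of the event is the same on both dates (11:00 UTC); only the European local time moves, which is consistent with "it's an hour earlier this week".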
[11:15:30] (03CR) 10Muehlenhoff: [C: 032] Remove absented Diamond collector for Pybal [puppet] - 10https://gerrit.wikimedia.org/r/470358 (owner: 10Muehlenhoff) [11:17:09] (03PS12) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [11:17:38] zeljkof: ok, It work correctly at mwdebug1002. Please deploy :) [11:18:05] rxy: ok [11:18:47] (03PS1) 10Muehlenhoff: Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/470359 (https://phabricator.wikimedia.org/T183454) [11:19:18] !log zfilipin@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:469864|Remove global action related permissions (T208035)]] (duration: 00m 48s) [11:19:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:22] T208035: Remove global action related permissions except meta wikimedia - https://phabricator.wikimedia.org/T208035 [11:21:09] rxy: it's deployed, please test [11:22:47] zeljkof: Request URL: https://en.wikipedia.org/wiki/Special:ListGroupRights ; server: mw1248.eqiad.wmnet -> ok, Request URL: https://meta.wikimedia.org/wiki/Special:ListGroupRights server: mw1322.eqiad.wmnet -> ok. It work correctly. 
Thanks :) [11:23:11] (03CR) 10Muehlenhoff: [C: 032] Remove Diamond from LVSes [puppet] - 10https://gerrit.wikimedia.org/r/470359 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [11:23:23] !log EU SWAT finished [11:23:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:26:13] (03PS1) 10Addshore: Wikibase BETA, use the same siteLinkGroups as prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470360 [11:26:15] (03PS1) 10Addshore: Wikibase, move some property lists to IS php files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470361 [11:30:19] (03PS1) 10Addshore: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470362 [11:31:41] (03CR) 10jenkins-bot: Remove global action related permissions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/469864 (https://phabricator.wikimedia.org/T208035) (owner: 10Rxy) [11:34:27] (03CR) 10Tarrow: [C: 04-1] "Typo I think?" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 (owner: 10Addshore) [11:35:12] (03CR) 10Addshore: Wikibase, move dispatching settings to IS.php (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 (owner: 10Addshore) [11:35:14] (03PS13) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [11:35:46] (03CR) 10BBlack: [C: 031] check_vcl_reload: no unknowns if reload-vcl still has to run [puppet] - 10https://gerrit.wikimedia.org/r/470353 (https://phabricator.wikimedia.org/T206950) (owner: 10Ema) [11:36:56] (03PS1) 10Addshore: Wikibase, cleanup some duplicated settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470363 [11:37:58] jouncebot: now [11:37:58] For the next 0 hour(s) and 22 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1100) [11:38:00] jouncebot: next [11:38:00] In 5 
hour(s) and 21 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1700) [11:42:37] (03PS1) 10Muehlenhoff: Absent Kubernetes diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/470364 (https://phabricator.wikimedia.org/T183454) [11:42:39] (03PS1) 10Muehlenhoff: Remove now obsolete Diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/470365 (https://phabricator.wikimedia.org/T183454) [11:44:31] !log installing graphicsmagick update for stretch [11:44:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:44:54] 10Operations, 10Traffic: Varnish won't purge thumbnails of specific file - https://phabricator.wikimedia.org/T207615 (10BBlack) Most likely, this is related to URI normalization rules (note %-encoded chars in the relevant titles) and/or the generation of purges at the origins (tracking known thumbnails for pur... [11:47:10] (03PS1) 10Addshore: Wikibase, Move quality contraints settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470366 [11:49:46] (03PS1) 10Addshore: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470368 [11:49:48] (03PS1) 10Addshore: Remove unused wmgArticlePlaceholderSearchEngineIndexed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 [11:55:24] PROBLEM - Check systemd state on lvs3002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [11:55:33] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. 
Failed resources (up to 3 shown): Package[diamond],Package[python-diamond] [11:56:53] moritzm: FYI ^^^ but I think it was just a race with the above upgrade, dpkg locked [11:58:12] 10Operations: Access requests process: People sometimes specify hostnames instead of admin groups in access requests - https://phabricator.wikimedia.org/T207754 (10MoritzMuehlenhoff) We don't want to shift any responsibilities, in the end the handling of the access request is a collaborative process between the... [11:59:03] volans: ack, rerunning puppet to clean this out [11:59:27] thx [11:59:54] RECOVERY - Check systemd state on lvs3002 is OK: OK - running: The system is fully operational [12:00:34] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [12:07:50] (03PS1) 10Addshore: Wikibase, move more misc settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470370 [12:09:52] (03PS1) 10Addshore: Wikibase, move 2 usage tracking configs to Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470371 [12:12:44] RECOVERY - High lag on wdqs1003 is OK: (C)3600 ge (W)1200 ge 1187 https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [12:17:52] (03CR) 10Tarrow: [C: 031] "haha, almost left a comment in the previous one before seeing why it was split" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 (owner: 10Addshore) [12:25:21] (03PS1) 10Addshore: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 [12:25:24] tarrow: ^^ that one could do with eyes [12:25:48] looking [12:26:29] tarrow: actually, already spotted some more stuff to go into that change [12:28:20] ok [12:31:28] (03PS2) 10Addshore: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 [12:33:29] (03PS1) 10Addshore: Totally empty Wikibase-* files [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/470374 [12:33:31] (03PS1) 10Addshore: Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 [12:33:33] (03PS1) 10Addshore: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 [12:33:48] tarrow: right, I think I got to the next check point, of being able to remove those evil files :D [12:34:14] Then I just have ~114 lines of Wikibase.php to finish cleaning up [12:34:17] jouncebot: now [12:34:17] No deployments scheduled for the next 4 hour(s) and 25 minute(s) [12:34:23] * addshore goes to claim the current slot [12:35:15] jouncebot: refresh [12:35:30] I refreshed my knowledge about deployments. [12:35:40] fine, still looking at 470373 [12:37:20] jouncebot: refresh [12:37:21] I refreshed my knowledge about deployments. [12:37:23] jouncebot: now [12:37:24] For the next 1 hour(s) and 22 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1230) [12:37:26] :> [12:38:17] (03PS2) 10Addshore: Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 [12:38:21] (03PS3) 10Addshore: Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 [12:39:38] (03CR) 10Addshore: [C: 032] Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 (owner: 10Addshore) [12:41:16] (03Merged) 10jenkins-bot: Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 (owner: 10Addshore) [12:42:34] (03PS2) 10Addshore: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470349 [12:42:53] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/2 [[gerrit:470348]] Wikibase, move dispatching settings to IS.php (duration: 00m 48s) [12:43:02] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:43:42] (03CR) 10Addshore: [C: 032] Wikibase, Move $wgPropertySuggesterMinProbability to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470349 (owner: 10Addshore) [12:43:50] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT2/2 [[gerrit:470348]] Wikibase, move dispatching settings to IS.php (duration: 00m 47s) [12:43:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:44:31] (03PS2) 10Addshore: Wikibase, Move property suggester settings to IS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470351 [12:44:38] (03Merged) 10jenkins-bot: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470349 (owner: 10Addshore) [12:46:05] (03CR) 10Addshore: [C: 032] Wikibase, Move property suggester settings to IS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470351 (owner: 10Addshore) [12:46:40] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php PT 1/2 (duration: 00m 47s) [12:46:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:52] (03Restored) 10Hashar: test: puppet-syntax now fails on deprecation notices [puppet] - 10https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) (owner: 10Hashar) [12:47:02] (03PS5) 10Hashar: test: puppet-syntax now fails on deprecation notices [puppet] - 10https://gerrit.wikimedia.org/r/333012 (https://phabricator.wikimedia.org/T154915) [12:47:21] (03Merged) 10jenkins-bot: Wikibase, Move property suggester settings to IS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470351 (owner: 10Addshore) [12:47:37] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php PT 2/2 (duration: 00m 47s) [12:47:39] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:48:58] 10Operations, 10Puppet, 10Continuous-Integration-Config, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): Get rid of "import realm.pp" in manifests/site.pp - https://phabricator.wikimedia.org/T154915 (10hashar) 05stalled>03Open The reason for this task is to have PuppetSyntax to whine on a de... [12:51:39] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT1/2 Wikibase, Move property suggester settings to IS files (duration: 00m 47s) [12:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:03] (03PS3) 10Addshore: Wikibase, Move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 [12:52:06] (03CR) 10Addshore: [C: 032] Wikibase, Move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 (owner: 10Addshore) [12:52:25] (03CR) 10jenkins-bot: Wikibase, move dispatching settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470348 (owner: 10Addshore) [12:52:27] (03CR) 10jenkins-bot: Wikibase, Move $wgPropertySuggesterMinProbability to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470349 (owner: 10Addshore) [12:52:29] (03CR) 10jenkins-bot: Wikibase, Move property suggester settings to IS files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470351 (owner: 10Addshore) [12:52:41] !log addshore@deploy1001 Synchronized wmf-config: PT2/2 Wikibase, Move property suggester settings to IS files (duration: 00m 47s) [12:52:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:22] (03Merged) 10jenkins-bot: Wikibase, Move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 (owner: 10Addshore) [12:55:57] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, Move badge related config to IS.php PT 1/2 (duration: 00m 45s) [12:55:59] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:56:27] la la la la sync sync sync [12:57:04] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, Move badge related config to IS.php PT 2/2 (duration: 00m 47s) [12:57:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:57:11] (03PS2) 10Addshore: Wikibase BETA, use the same siteLinkGroups as prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470360 [12:57:15] (03CR) 10Addshore: [C: 032] Wikibase BETA, use the same siteLinkGroups as prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470360 (owner: 10Addshore) [12:57:27] (03PS2) 10Addshore: Wikibase, move some property lists to IS php files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470361 [12:57:32] (03CR) 10Addshore: [C: 032] Wikibase, move some property lists to IS php files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470361 (owner: 10Addshore) [12:58:20] (03Merged) 10jenkins-bot: Wikibase BETA, use the same siteLinkGroups as prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470360 (owner: 10Addshore) [12:59:17] (03Merged) 10jenkins-bot: Wikibase, move some property lists to IS php files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470361 (owner: 10Addshore) [12:59:42] (03PS2) 10Addshore: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470362 [12:59:44] 10Operations, 10SRE-Access-Requests: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10jijiki) a:05jijiki>03None [13:00:00] !log addshore@deploy1001 Synchronized wmf-config: BETA FILES only (duration: 00m 47s) [13:00:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:01:19] (03PS1) 10Muehlenhoff: Ship base directory for keytabs [puppet] - 10https://gerrit.wikimedia.org/r/470378 [13:01:38] !log addshore@deploy1001 sync-file aborted: 
Wikibase, move some property lists to IS php files PT 1/2 (duration: 00m 04s) [13:01:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:32] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, move some property lists to IS php files PT 1/2 (duration: 00m 47s) [13:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:02:53] (03CR) 10Addshore: [C: 032] Wikibase, Move wgArticlePlaceholderImageProperty to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470362 (owner: 10Addshore) [13:03:02] (03PS2) 10Addshore: Wikibase, cleanup some duplicated settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470363 [13:03:32] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, move some property lists to IS php files PT 2/2 (duration: 00m 47s) [13:03:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:04:13] (03Merged) 10jenkins-bot: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470362 (owner: 10Addshore) [13:05:30] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php PT 1/2 (duration: 00m 46s) [13:05:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:06:42] (03CR) 10Addshore: [C: 032] Wikibase, cleanup some duplicated settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470363 (owner: 10Addshore) [13:06:47] (03PS2) 10Addshore: Wikibase, Move quality contraints settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470366 [13:06:51] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php PT 2/2 (duration: 00m 46s) [13:06:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:07:40] tarrow: all going well so far :) [13:08:05] (03Merged) 
10jenkins-bot: Wikibase, cleanup some duplicated settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470363 (owner: 10Addshore) [13:08:35] (03CR) 10jenkins-bot: Wikibase, Move badge related config to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470355 (owner: 10Addshore) [13:08:37] (03CR) 10jenkins-bot: Wikibase BETA, use the same siteLinkGroups as prod [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470360 (owner: 10Addshore) [13:08:39] (03CR) 10jenkins-bot: Wikibase, move some property lists to IS php files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470361 (owner: 10Addshore) [13:08:41] (03CR) 10jenkins-bot: Wikibase, Move wgArticlePlaceholderImageProperty to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470362 (owner: 10Addshore) [13:08:43] (03CR) 10jenkins-bot: Wikibase, cleanup some duplicated settings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470363 (owner: 10Addshore) [13:08:47] (03PS2) 10Addshore: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470368 [13:08:51] (03CR) 10Addshore: [C: 032] Wikibase, Move quality contraints settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470366 (owner: 10Addshore) [13:09:20] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: Wikibase, cleanup some duplicated settings PT 1/2 (duration: 00m 46s) [13:09:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:16] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, cleanup some duplicated settings PT 2/2 (duration: 00m 47s) [13:10:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:25] (03Merged) 10jenkins-bot: Wikibase, Move quality contraints settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470366 (owner: 10Addshore) [13:12:09] (03CR) 10Addshore: [C: 032] Wikibase, set wgArticlePlaceholderSearchEngineIndexed 
in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470368 (owner: 10Addshore) [13:12:16] (03PS2) 10Addshore: Remove unused wmgArticlePlaceholderSearchEngineIndexed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 [13:12:49] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, Move quality contraints settings to IS.php PT 1/2 (duration: 00m 47s) [13:12:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:13:30] (03Merged) 10jenkins-bot: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470368 (owner: 10Addshore) [13:13:48] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, Move quality contraints settings to IS.php PT 2/2 (duration: 00m 47s) [13:13:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:15:46] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php PT 1/2 (duration: 00m 47s) [13:15:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:21] (03CR) 10Addshore: [C: 032] Remove unused wmgArticlePlaceholderSearchEngineIndexed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 (owner: 10Addshore) [13:16:29] (03PS2) 10Addshore: Wikibase, move more misc settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470370 [13:16:49] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php PT 2/2 (duration: 00m 47s) [13:16:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:58] (03PS1) 10Andrew Bogott: Horizon: move some projects to eqiad1: antiharassment, catgraph, codereview, cvn [puppet] - 10https://gerrit.wikimedia.org/r/470380 (https://phabricator.wikimedia.org/T204745) [13:17:35] (03Merged) 10jenkins-bot: Remove unused 
wmgArticlePlaceholderSearchEngineIndexed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 (owner: 10Addshore) [13:17:42] (03CR) 10Addshore: [C: 032] Wikibase, move more misc settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470370 (owner: 10Addshore) [13:18:00] (03PS2) 10Addshore: Wikibase, move 2 usage tracking configs to Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470371 [13:19:02] (03Merged) 10jenkins-bot: Wikibase, move more misc settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470370 (owner: 10Addshore) [13:19:07] !log addshore@deploy1001 Synchronized wmf-config: Remove unused wmgArticlePlaceholderSearchEngineIndexed (duration: 00m 48s) [13:19:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:21:32] (03CR) 10Addshore: [C: 032] Wikibase, move 2 usage tracking configs to Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470371 (owner: 10Addshore) [13:21:38] (03PS3) 10Addshore: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 [13:21:40] (03CR) 10Andrew Bogott: [C: 032] Horizon: move some projects to eqiad1: antiharassment, catgraph, codereview, cvn [puppet] - 10https://gerrit.wikimedia.org/r/470380 (https://phabricator.wikimedia.org/T204745) (owner: 10Andrew Bogott) [13:21:42] (03PS2) 10Addshore: Totally empty Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470374 [13:21:49] (03PS2) 10Addshore: Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 [13:21:53] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, move more misc settings to IS.php PT 1/2 (duration: 00m 49s) [13:21:55] (03PS2) 10Addshore: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 [13:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:22:52] 
(03Merged) 10jenkins-bot: Wikibase, move 2 usage tracking configs to Wikibase.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470371 (owner: 10Addshore) [13:22:55] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, move more misc settings to IS.php PT 2/2 (duration: 00m 47s) [13:22:55] 10Operations, 10SRE-Access-Requests: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10jijiki) a:03jijiki @jlinehan we'll add you to `deployment`, `statistics-privatedata-users`, `analytics-privatedata-users`, and `researchers... [13:22:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:23:19] 10Operations, 10SRE-Access-Requests: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10jijiki) Pending today's SRE meeting and approval from manager [13:24:17] (03CR) 10jenkins-bot: Wikibase, Move quality contraints settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470366 (owner: 10Addshore) [13:24:18] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: Wikibase, move 2 usage tracking configs to Wikibase.php PT 1/2 (duration: 00m 46s) [13:24:19] (03CR) 10jenkins-bot: Wikibase, set wgArticlePlaceholderSearchEngineIndexed in IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470368 (owner: 10Addshore) [13:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:24:21] (03CR) 10jenkins-bot: Remove unused wmgArticlePlaceholderSearchEngineIndexed [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470369 (owner: 10Addshore) [13:24:23] (03CR) 10jenkins-bot: Wikibase, move more misc settings to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470370 (owner: 10Addshore) [13:24:25] (03CR) 10jenkins-bot: Wikibase, move 2 usage tracking configs to Wikibase.php [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/470371 (owner: 10Addshore) [13:25:10] (03CR) 10GTirloni: [C: 032] toolforge: refactor/bootstrap service node puppet code [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez) [13:25:17] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, move 2 usage tracking configs to Wikibase.php PT 2/2 (duration: 00m 47s) [13:25:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:20] (03PS2) 10Pikne: Add dty, gor, inh, kbp and lfn to InterwikiSortOrders [mediawiki-config] - 10https://gerrit.wikimedia.org/r/468337 (https://phabricator.wikimedia.org/T208217) (owner: 10Gerrit Patch Uploader) [13:25:42] jouncebot: now [13:25:42] For the next 0 hour(s) and 34 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1230) [13:25:54] * addshore is going to stop there as he has a meeting for the next 30 mins... [13:26:40] jouncebot: refresh [13:26:41] I refreshed my knowledge about deployments. 
[13:26:42] 10Operations, 10SRE-Access-Requests, 10User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10jijiki) [13:26:53] 10Operations, 10SRE-Access-Requests, 10User-jijiki: Requesting access to deployment, operational logs, and analytics cluster for jlinehan - https://phabricator.wikimedia.org/T207951 (10jijiki) [13:29:03] jouncebot: now [13:29:03] For the next 0 hour(s) and 0 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1230) [13:29:06] =] [13:29:53] (03PS8) 10Arturo Borrero Gonzalez: toolforge: refactor/bootstrap service node puppet code [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) [13:30:46] (03PS1) 10Filippo Giunchedi: hieradata: enable syslog-tls in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/470382 (https://phabricator.wikimedia.org/T136312) [13:36:19] (03CR) 10Filippo Giunchedi: [C: 031] Absent Kubernetes diamond collector [puppet] - 10https://gerrit.wikimedia.org/r/470364 (https://phabricator.wikimedia.org/T183454) (owner: 10Muehlenhoff) [13:44:33] (03PS1) 10GTirloni: tools-services: Fix typo in updatetools service exec path [puppet] - 10https://gerrit.wikimedia.org/r/470386 (https://phabricator.wikimedia.org/T207591) [13:45:29] (03CR) 10GTirloni: [C: 032] tools-services: Fix typo in updatetools service exec path [puppet] - 10https://gerrit.wikimedia.org/r/470386 (https://phabricator.wikimedia.org/T207591) (owner: 10GTirloni) [13:53:04] (03CR) 10Bstorm: toolforge: refactor/bootstrap service node puppet code (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469614 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez) [13:54:54] (03PS4) 10Gehel: wdqs: rate limit log sent to logstash [puppet] - 10https://gerrit.wikimedia.org/r/468979 (https://phabricator.wikimedia.org/T207656) [14:05:04] RECOVERY - pdfrender on scb1003 
is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.178 second response time [14:08:34] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:09:43] (03PS1) 10Arturo Borrero Gonzalez: toolforge: add missing grid base profile to services role [puppet] - 10https://gerrit.wikimedia.org/r/470397 (https://phabricator.wikimedia.org/T207591) [14:11:13] (03CR) 10Bstorm: [C: 031] toolforge: add missing grid base profile to services role [puppet] - 10https://gerrit.wikimedia.org/r/470397 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez) [14:11:41] jouncebot: next [14:11:41] In 2 hour(s) and 48 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1700) [14:12:31] (03CR) 10Arturo Borrero Gonzalez: [C: 032] toolforge: add missing grid base profile to services role [puppet] - 10https://gerrit.wikimedia.org/r/470397 (https://phabricator.wikimedia.org/T207591) (owner: 10Arturo Borrero Gonzalez) [14:13:46] jouncebot: refresh [14:13:47] I refreshed my knowledge about deployments. 
[14:13:50] jouncebot: now [14:13:50] For the next 1 hour(s) and 46 minute(s): Cleaning up Wikibase mw-config (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1400) [14:16:03] !log Restarting Jenkins [14:16:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:17:18] !log repooling wdqs1003, catched up on lag [14:17:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:18:17] (03CR) 10Herron: create rsyslog::ship_logfile - simplified logstash shipper via kafka (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/469945 (https://phabricator.wikimedia.org/T206454) (owner: 10Herron) [14:20:37] (03PS14) 10Giuseppe Lavagetto: mediawiki::webserver: introduce profile, use it on mwdebug* [puppet] - 10https://gerrit.wikimedia.org/r/467644 [14:21:01] (03PS1) 10BBlack: bugfix gdnsd prometheus edns client subnet stat [puppet] - 10https://gerrit.wikimedia.org/r/470402 [14:21:40] (03CR) 10BBlack: [C: 032] bugfix gdnsd prometheus edns client subnet stat [puppet] - 10https://gerrit.wikimedia.org/r/470402 (owner: 10BBlack) [14:21:48] (03CR) 10jerkins-bot: [V: 04-1] bugfix gdnsd prometheus edns client subnet stat [puppet] - 10https://gerrit.wikimedia.org/r/470402 (owner: 10BBlack) [14:22:18] lol [14:22:24] (03PS1) 10Vgutierrez: secret: Add dummy LE ACMEv2 staging private key for certcentral2001 [labs/private] - 10https://gerrit.wikimedia.org/r/470403 [14:22:25] I can't put the word "bugfix" in a commit title? 
:P [14:22:46] (03PS2) 10BBlack: fix gdnsd prometheus edns client subnet stat [puppet] - 10https://gerrit.wikimedia.org/r/470402 [14:22:48] (03PS4) 10Addshore: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 [14:22:58] <_joe_> bblack: wat [14:23:14] <_joe_> those are wmf-wide commit message guidelines AIUI :P [14:23:22] (03CR) 10Addshore: [C: 032] Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 (owner: 10Addshore) [14:23:34] yeah, but the intent seems to be "don't put the bug reference in the title" [14:23:42] which I wasn't doing [14:24:17] the word bugfix seems useful in commit titles in general. or at least, annoying to exclude categorically [14:25:48] (03PS2) 10Vgutierrez: secret: Add dummy LE ACMEv2 staging private key for certcentral2001 [labs/private] - 10https://gerrit.wikimedia.org/r/470403 (https://phabricator.wikimedia.org/T208212) [14:26:25] (03CR) 10Vgutierrez: [V: 032 C: 032] secret: Add dummy LE ACMEv2 staging private key for certcentral2001 [labs/private] - 10https://gerrit.wikimedia.org/r/470403 (https://phabricator.wikimedia.org/T208212) (owner: 10Vgutierrez) [14:27:18] (03CR) 10Elukey: [C: 031] Ship base directory for keytabs [puppet] - 10https://gerrit.wikimedia.org/r/470378 (owner: 10Muehlenhoff) [14:27:25] * addshore waits for jenkins to schedule some jobs.... 
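Note on the "bugfix" rejection above: one possible explanation, offered only as a guess, is that the commit-title check matches a pattern without word boundaries, so a rule meant to catch a bug reference also fires on "bugfix". The sketch below is a minimal illustration in Python. The two patterns and the `is_rejected` helper are hypothetical and are not taken from the actual validator.

```python
import re

# Hypothetical rule: reject any title containing "bug" anywhere (no word boundaries).
NAIVE_RULE = re.compile(r"bug", re.IGNORECASE)

# Hypothetical alternative: only reject "bug" when it stands alone as a word.
BOUNDED_RULE = re.compile(r"\bbug\b", re.IGNORECASE)


def is_rejected(title: str, rule: re.Pattern) -> bool:
    """Return True if the (assumed) rule would flag this commit title."""
    return rule.search(title) is not None


if __name__ == "__main__":
    titles = (
        "bugfix gdnsd prometheus edns client subnet stat",
        "fix gdnsd prometheus edns client subnet stat",
    )
    for title in titles:
        print(title, is_rejected(title, NAIVE_RULE), is_rejected(title, BOUNDED_RULE))
```

Under these assumptions the naive rule flags the "bugfix ..." title while the word-boundary rule lets it through, and neither flags the reworded "fix ..." title.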
[14:27:36] aaah, there we go [14:27:37] (03Merged) 10jenkins-bot: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 (owner: 10Addshore) [14:29:13] it probably doesn't check for word separator, hence matching also bugfix [14:29:51] (03CR) 10Herron: [C: 031] hieradata: enable syslog-tls in eqiad [puppet] - 10https://gerrit.wikimedia.org/r/470382 (https://phabricator.wikimedia.org/T136312) (owner: 10Filippo Giunchedi) [14:33:21] (03CR) 10jenkins-bot: Wikibase, move repo definitions to IS.php [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470373 (owner: 10Addshore) [14:34:34] (03PS1) 10Vgutierrez: certcentral: Provide unique LE accounts for each certcentral hosts [puppet] - 10https://gerrit.wikimedia.org/r/470404 (https://phabricator.wikimedia.org/T208212) [14:34:44] (03CR) 10Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/compiler1002/13249/mwdebug1001.eqiad.wmnet/ I looked at the diff in a lot of detail and I think there " [puppet] - 10https://gerrit.wikimedia.org/r/467644 (owner: 10Giuseppe Lavagetto) [14:35:31] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: PT 1/2 Wikibase, move repo definitions to IS.php [[gerrit:470373]] (duration: 00m 48s) [14:35:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:35:54] 10Operations, 10ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T208141 (10Papaul) a:05Papaul>03Banyek Disk replacement complete [14:36:39] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (10Papaul) p:05Triage>03Normal [14:37:14] (03CR) 10Vgutierrez: "pcc looks happy: https://puppet-compiler.wmflabs.org/compiler1002/13250/" [puppet] - 10https://gerrit.wikimedia.org/r/470404 (https://phabricator.wikimedia.org/T208212) (owner: 10Vgutierrez) [14:37:19] (03PS2) 10Muehlenhoff: Ship base directory for keytabs [puppet] - 10https://gerrit.wikimedia.org/r/470378 [14:37:49] 
!log addshore@deploy1001 Synchronized wmf-config: PT 2/2 Wikibase, move repo definitions to IS.php [[gerrit:470373]] (duration: 00m 47s) [14:37:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:13] 10Operations, 10ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T208141 (10Banyek) Thank you! As the drive is getting rebuilt I close the ticket ``` root@db2048:~# /usr/local/lib/nagios/plugins/get-raid-status-hpssacli Smart Array P420i in Slot 0 (Embedded) array A Logical Dr... [14:38:26] 10Operations, 10ops-codfw: Degraded RAID on db2048 - https://phabricator.wikimedia.org/T208141 (10Banyek) 05Open>03Resolved [14:41:21] 10Operations, 10ops-codfw: Degraded RAID on heze-array1 - https://phabricator.wikimedia.org/T206909 (10Papaul) [14:45:28] (03PS1) 10Addshore: Wikibase.php, check $wmgWBRepoSettingsSparqlEndpoint is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470405 [14:45:33] 10Operations, 10ops-codfw: Degraded RAID on heze-array1 - https://phabricator.wikimedia.org/T206909 (10Papaul) There are 2 failed disks on the system. Slot 7 and slot 9 . This system is out of warranty and I have no 4TB SAS disks on site for replacement. I will have to open a procurement task for this. 
[14:47:26] 10Operations, 10ops-codfw: Degraded RAID on heze - https://phabricator.wikimedia.org/T164955 (10jijiki) a:05jijiki>03Papaul [14:49:20] (03PS1) 10Vgutierrez: Release 0.4 [software/certcentral] - 10https://gerrit.wikimedia.org/r/470407 (https://phabricator.wikimedia.org/T207927) [14:49:22] 10Operations, 10ops-codfw: relabel server saiph.frack.codfw.wmnet to frpig2001.frack.codfw.wmnet - https://phabricator.wikimedia.org/T207036 (10Papaul) 05Open>03Resolved Complete [14:50:28] (03CR) 10Addshore: [C: 032] Wikibase.php, check $wmgWBRepoSettingsSparqlEndpoint is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470405 (owner: 10Addshore) [14:51:26] (03Merged) 10jenkins-bot: Wikibase.php, check $wmgWBRepoSettingsSparqlEndpoint is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470405 (owner: 10Addshore) [14:51:57] 10Operations, 10ops-codfw: Degraded RAID on heze - https://phabricator.wikimedia.org/T164955 (10jijiki) 05Resolved>03Open @Papaul up to you if you want this task open or not, I marked it as resolved as T206909 was opened. [14:52:14] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10Cmjohnson) @jcrespo @banyek It is not clearly a RAM issue it could be a CPU issue as well on CPU1 ....i will need to do a few things tot he server...Swap the supposedly bad DIMM to B side and see if the error fo... [14:52:20] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10Cmjohnson) idrac log ------------------------------------------------------------------------------- Record: 2 Date/Time: 10/27/2018 21:08:57 Source: system Severity: Non-Critical Description: C... 
[14:52:54] 10Operations, 10ops-eqiad, 10Analytics: Degraded RAID on aqs1006 - https://phabricator.wikimedia.org/T206915 (10Cmjohnson) The disk is being sent and should arrive today or tomorrow [14:52:58] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: Wikibase.php, check $wmgWBRepoSettingsSparqlEndpoint is set (duration: 00m 46s) [14:53:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:22] (03PS3) 10Addshore: Totally empty Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470374 [14:53:29] (03PS3) 10Addshore: Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 [14:53:34] (03PS3) 10Addshore: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 [14:53:40] (03PS2) 10Giuseppe Lavagetto: httpd: add httpd::env [puppet] - 10https://gerrit.wikimedia.org/r/470347 [14:53:43] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10Banyek) Sure! Can I shut down the server now for you? 
[14:54:39] 10Operations, 10ops-codfw: Degraded RAID on heze - https://phabricator.wikimedia.org/T164955 (10Papaul) 05Open>03Resolved resolving duplicate T206909 [14:56:42] 10Operations, 10Commons, 10MediaWiki-File-management, 10MediaWiki-Maintenance-scripts, and 2 others: cronspam cleanup: Cron /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null - https://phabricator.wikimedia.org/T150375 (10jijiki) [14:56:46] (03CR) 10Herron: rsyslog: add prometheus-rsyslog-exporter support (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/470345 (https://phabricator.wikimedia.org/T205862) (owner: 10Filippo Giunchedi) [14:58:46] !log shutting down db1117 for hardware maintenance (T208150) [14:58:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:58:49] T208150: db1117 went away - https://phabricator.wikimedia.org/T208150 [14:59:25] 10Operations, 10ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (10Papaul) @fgiunchedi we can try to drain the power, unplug and plug back the controller cable and update the server firmware as well. Done this on some DB servers. Let me know what you think . 
[15:00:52] (03CR) 10Addshore: [C: 032] Totally empty Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470374 (owner: 10Addshore) [15:00:54] (03PS3) 10Giuseppe Lavagetto: httpd: add httpd::env [puppet] - 10https://gerrit.wikimedia.org/r/470347 [15:01:35] (03CR) 10jerkins-bot: [V: 04-1] httpd: add httpd::env [puppet] - 10https://gerrit.wikimedia.org/r/470347 (owner: 10Giuseppe Lavagetto) [15:02:13] (03Merged) 10jenkins-bot: Totally empty Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470374 (owner: 10Addshore) [15:03:43] PROBLEM - haproxy failover on dbproxy1006 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:03:54] PROBLEM - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:04:04] PROBLEM - haproxy failover on dbproxy1001 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:04:28] (03CR) 10jenkins-bot: Wikibase.php, check $wmgWBRepoSettingsSparqlEndpoint is set [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470405 (owner: 10Addshore) [15:04:30] (03CR) 10jenkins-bot: Totally empty Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470374 (owner: 10Addshore) [15:04:34] PROBLEM - haproxy failover on dbproxy1007 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:04:34] PROBLEM - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:04:44] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:04:55] !log addshore@deploy1001 Synchronized wmf-config/Wikibase.php: PT1/2 Totally empty Wikibase-* files (duration: 00m 46s) [15:04:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:33] !log addshore@deploy1001 Synchronized wmf-config: PT2/2 Totally empty Wikibase-* files (duration: 00m 47s) [15:06:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:42] (03CR) 10Addshore: [C: 032] 
Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 (owner: 10Addshore) [15:09:45] (03Merged) 10jenkins-bot: Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 (owner: 10Addshore) [15:12:08] 10Operations, 10SRE-Access-Requests, 10User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10sbassett) @jijiki - I think I just need `deployment` and `analytics-privatedata-users`. I modeled it off of what @Bawolff has here,... [15:13:04] 10Operations, 10ops-eqiad, 10DBA: db1117 went away - https://phabricator.wikimedia.org/T208150 (10jcrespo) > It is not clearly a RAM issue Sorry, my comment was in the context of "it is not a software issue" and "it expresses/reveals itself as a memory error", so we could discard MySQL issues, as it is what... [15:14:02] !log addshore@deploy1001 Synchronized wmf-config: Stop loading Wikibase-* files (duration: 00m 47s) [15:14:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:14:44] (03PS4) 10Addshore: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 [15:14:57] 10Operations, 10SRE-Access-Requests, 10User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (10JBennett) +1 from me [15:15:07] (03CR) 10Addshore: [C: 032] Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 (owner: 10Addshore) [15:16:08] (03Merged) 10jenkins-bot: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 (owner: 10Addshore) [15:16:44] (03PS1) 10Cwhite: update graphite-in to use graphite1004 [dns] - 10https://gerrit.wikimedia.org/r/470410 (https://phabricator.wikimedia.org/T196484) [15:16:55] (03CR) 10Muehlenhoff: [C: 032] Ship base directory for keytabs [puppet] - 
10https://gerrit.wikimedia.org/r/470378 (owner: 10Muehlenhoff) [15:17:17] !log addshore@deploy1001 Synchronized wmf-config: Remove Wikibase-* config files (duration: 00m 47s) [15:17:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:18:25] 10Operations, 10WMF-NDA: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (10jijiki) p:05Triage>03Normal [15:19:50] (03CR) 10jenkins-bot: Stop loading Wikibase-* files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470375 (owner: 10Addshore) [15:19:52] (03CR) 10jenkins-bot: Remove Wikibase-* config files [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470376 (owner: 10Addshore) [15:19:53] !log cloudvirt1019 going down to re-seat the backplane cables [15:19:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:00] ^andrewbogott [15:21:23] (03PS1) 10Bstorm: sonofgridengine: try using the gridengine_queue puppet type [puppet] - 10https://gerrit.wikimedia.org/r/470411 (https://phabricator.wikimedia.org/T200557) [15:22:24] PROBLEM - Host cloudvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% [15:22:48] (03PS1) 10BBlack: gdnsd: experiment with higher tcp_timeout [puppet] - 10https://gerrit.wikimedia.org/r/470412 [15:23:05] cmjohnson1: no worries, thanks [15:23:08] (03PS1) 10Addshore: Create and use testwikidataclient.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470413 [15:24:44] RECOVERY - Device not healthy -SMART- on cloudvirt1019 is OK: All metrics within thresholds. 
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cloudvirt1019&var-datasource=eqiad%2520prometheus%252Fops [15:25:03] (03CR) 10BBlack: [C: 032] gdnsd: experiment with higher tcp_timeout [puppet] - 10https://gerrit.wikimedia.org/r/470412 (owner: 10BBlack) [15:26:15] (03CR) 10Vgutierrez: [C: 032] Release 0.4 [software/certcentral] - 10https://gerrit.wikimedia.org/r/470407 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez) [15:26:24] RECOVERY - haproxy failover on dbproxy1001 is OK: OK check_failover servers up 2 down 0 [15:26:31] 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: Systemd restart loop of timer filled the disk on tegmen - https://phabricator.wikimedia.org/T199413 (10Volans) 05Open>03Resolved a:03Volans This hasn't repro in months and we're moving to stretch on the Icinga hosts. Resolving for now, feel fr... [15:26:53] RECOVERY - haproxy failover on dbproxy1007 is OK: OK check_failover servers up 2 down 0 [15:26:54] RECOVERY - haproxy failover on dbproxy1002 is OK: OK check_failover servers up 2 down 0 [15:27:04] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0 [15:27:04] RECOVERY - haproxy failover on dbproxy1006 is OK: OK check_failover servers up 2 down 0 [15:27:23] RECOVERY - haproxy failover on dbproxy1008 is OK: OK check_failover servers up 2 down 0 [15:27:30] (03PS2) 10Addshore: Create and use wikidataclient-test.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/470413 [15:28:00] (03CR) 10Alex Monk: [C: 032] Release 0.4 [software/certcentral] - 10https://gerrit.wikimedia.org/r/470407 (https://phabricator.wikimedia.org/T207927) (owner: 10Vgutierrez) [15:28:02] (03PS1) 10Vgutierrez: certcentral: Implement slow retries on challenge rejection by ACME dir. 
[software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470414 (https://phabricator.wikimedia.org/T207927) [15:28:04] (PS1) Vgutierrez: certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470415 (https://phabricator.wikimedia.org/T207927) [15:28:07] (PS1) Vgutierrez: Release 0.4 [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470416 (https://phabricator.wikimedia.org/T207927) [15:28:11] (CR) jenkins-bot: Release 0.4 [software/certcentral] - https://gerrit.wikimedia.org/r/470407 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:28:37] (CR) Alex Monk: [C: 2] Release 0.4 [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470416 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:29:04] (CR) Alex Monk: [C: 2] certcentral: Implement slow retries on challenge rejection by ACME dir. [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470414 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:29:29] (CR) Alex Monk: [C: 2] certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470415 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:30:10] (PS1) Addshore: Split make wikidataclient dblist compute from www and test [mediawiki-config] - https://gerrit.wikimedia.org/r/470417 [15:31:09] (CR) Alex Monk: [C: 1] certcentral: Provide unique LE accounts for each certcentral hosts [puppet] - https://gerrit.wikimedia.org/r/470404 (https://phabricator.wikimedia.org/T208212) (owner: Vgutierrez) [15:32:02] (CR) jerkins-bot: [V: -1] Split make wikidataclient dblist compute from www and test [mediawiki-config] - https://gerrit.wikimedia.org/r/470417 (owner: Addshore) [15:32:33] PROBLEM - haproxy failover on dbproxy1007 is CRITICAL: CRITICAL 
check_failover servers up 1 down 1 [15:32:34] PROBLEM - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:32:43] PROBLEM - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:32:43] banyek: ^^ [15:32:44] PROBLEM - haproxy failover on dbproxy1006 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:32:47] (CR) jenkins-bot: Release 0.4 [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470416 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:32:54] Ah [15:32:58] it was me, sorry [15:33:03] PROBLEM - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:33:12] I downtimed the host, but forgot do downtime the haproxy checks [15:33:13] PROBLEM - haproxy failover on dbproxy1001 is CRITICAL: CRITICAL check_failover servers up 1 down 1 [15:33:17] (CR) jenkins-bot: certcentral: Avoid fast retry on local errors after cert is issued [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470415 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:33:21] everything is under control :( [15:33:29] (CR) jenkins-bot: certcentral: Implement slow retries on challenge rejection by ACME dir. 
[software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470414 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:34:24] ACKNOWLEDGEMENT - haproxy failover on dbproxy1001 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:25] ACKNOWLEDGEMENT - haproxy failover on dbproxy1002 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:25] ACKNOWLEDGEMENT - haproxy failover on dbproxy1003 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:25] ACKNOWLEDGEMENT - haproxy failover on dbproxy1006 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:25] ACKNOWLEDGEMENT - haproxy failover on dbproxy1007 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:25] ACKNOWLEDGEMENT - haproxy failover on dbproxy1008 is CRITICAL: CRITICAL check_failover servers up 1 down 1 Banyek db1117 is getting kernel and mariadb upgrades [15:34:28] Operations, SRE-Access-Requests, User-jijiki: Requesting access to deployment and analytics-privatedata-users for sbassett - https://phabricator.wikimedia.org/T207852 (jijiki) [15:34:42] (PS1) Vgutierrez: debian: Add release 0.4 to changelog [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470418 (https://phabricator.wikimedia.org/T207927) [15:35:52] (PS2) Bstorm: sonofgridengine: try using the gridengine_queue puppet type [puppet] - https://gerrit.wikimedia.org/r/470411 (https://phabricator.wikimedia.org/T200557) [15:37:51] Operations, Wikimedia-Logstash: Rationalize default logrotate "rotated" file extensions - https://phabricator.wikimedia.org/T207296 (colewhite) +1 from me for dateext going forward [15:38:20] (CR) Bstorm: [C: 2] sonofgridengine: try using the 
gridengine_queue puppet type [puppet] - https://gerrit.wikimedia.org/r/470411 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm) [15:39:36] (CR) Addshore: [C: 2] Create and use wikidataclient-test.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/470413 (owner: Addshore) [15:39:55] (CR) Vgutierrez: [C: 2] debian: Add release 0.4 to changelog [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470418 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:40:43] RECOVERY - haproxy failover on dbproxy1006 is OK: OK check_failover servers up 2 down 0 [15:40:47] (Merged) jenkins-bot: Create and use wikidataclient-test.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/470413 (owner: Addshore) [15:40:54] RECOVERY - haproxy failover on dbproxy1008 is OK: OK check_failover servers up 2 down 0 [15:41:04] RECOVERY - haproxy failover on dbproxy1001 is OK: OK check_failover servers up 2 down 0 [15:41:34] RECOVERY - haproxy failover on dbproxy1007 is OK: OK check_failover servers up 2 down 0 [15:41:34] RECOVERY - haproxy failover on dbproxy1002 is OK: OK check_failover servers up 2 down 0 [15:41:44] RECOVERY - haproxy failover on dbproxy1003 is OK: OK check_failover servers up 2 down 0 [15:41:47] (CR) jenkins-bot: debian: Add release 0.4 to changelog [software/certcentral] (debian) - https://gerrit.wikimedia.org/r/470418 (https://phabricator.wikimedia.org/T207927) (owner: Vgutierrez) [15:42:06] !log addshore@deploy1001 Synchronized dblists: Create and use wikidataclient-test.dblist PT 1/2 (duration: 00m 48s) [15:42:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:42:19] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? 
- https://phabricator.wikimedia.org/T207966 (Dzahn) a:Dzahn [15:43:04] !log addshore@deploy1001 Synchronized wmf-config: Create and use wikidataclient-test.dblist PT 2/2 (duration: 00m 48s) [15:43:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:43:35] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Dzahn) Could you give an example which service is currently in warn or was in warn and didn't notify you? [15:44:00] !log uploaded certcentral 0.4 to apt.wikimedia.org (stretch) - T207927 [15:44:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:44:04] T207927: Take into account LE rate limits on sensitive operations - https://phabricator.wikimedia.org/T207927 [15:44:16] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Papaul) [15:44:23] RECOVERY - Device not healthy -SMART- on db2048 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db2048&var-datasource=codfw%2520prometheus%252Fops [15:44:27] (PS2) Vgutierrez: certcentral: Provide unique LE accounts for each certcentral hosts [puppet] - https://gerrit.wikimedia.org/r/470404 (https://phabricator.wikimedia.org/T208212) [15:44:40] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Dzahn) Also, are we talking about email notifications or IRC notifications or another method? 
[15:45:28] (CR) Vgutierrez: [C: 2] certcentral: Provide unique LE accounts for each certcentral hosts [puppet] - https://gerrit.wikimedia.org/r/470404 (https://phabricator.wikimedia.org/T208212) (owner: Vgutierrez) [15:46:57] (PS5) Gehel: wdqs: rate limit log sent to logstash [puppet] - https://gerrit.wikimedia.org/r/468979 (https://phabricator.wikimedia.org/T207656) [15:47:45] Operations, Cloud-Services, Mail, Patch-For-Review, User-herron: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (herron) How are these working so far? Hopefully no news is good news! Is there anything that needs to be addressed before resolving this? [15:47:51] (CR) Gehel: [C: 2] wdqs: rate limit log sent to logstash [puppet] - https://gerrit.wikimedia.org/r/468979 (https://phabricator.wikimedia.org/T207656) (owner: Gehel) [15:48:13] !log power off ms-be2021 for controller alarms troubleshooting - T208096 [15:48:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:48:17] T208096: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 [15:49:53] PROBLEM - Host ms-be2021 is DOWN: PING CRITICAL - Packet loss = 100% [15:50:19] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (jcrespo) @Papaul: @Banyek will be your contact point as he will be the person in charge of the related goal while Manuel is out. 
[15:50:48] (CR) jenkins-bot: Create and use wikidataclient-test.dblist [mediawiki-config] - https://gerrit.wikimedia.org/r/470413 (owner: Addshore) [15:51:16] (PS2) Gehel: wdqs: cleanup logback configuration [puppet] - https://gerrit.wikimedia.org/r/469611 (https://phabricator.wikimedia.org/T207834) [15:52:42] (CR) Gehel: [C: 2] wdqs: cleanup logback configuration [puppet] - https://gerrit.wikimedia.org/r/469611 (https://phabricator.wikimedia.org/T207834) (owner: Gehel) [15:53:29] (PS1) Addshore: BETA, wmgWikibaseClientRepositories, load wikidata lexeme NS [mediawiki-config] - https://gerrit.wikimedia.org/r/470420 [15:53:58] (CR) Addshore: [C: 2] BETA, wmgWikibaseClientRepositories, load wikidata lexeme NS [mediawiki-config] - https://gerrit.wikimedia.org/r/470420 (owner: Addshore) [15:54:04] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:54:34] doesn't look dramatic [15:55:01] (Merged) jenkins-bot: BETA, wmgWikibaseClientRepositories, load wikidata lexeme NS [mediawiki-config] - https://gerrit.wikimedia.org/r/470420 (owner: Addshore) [15:55:03] (PS1) Ottomata: Update YARN fair share premption for production and essential queues [puppet] - https://gerrit.wikimedia.org/r/470421 (https://phabricator.wikimedia.org/T208208) [15:55:33] PROBLEM - Host ms-be2021.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:56:09] (Abandoned) Gehel: wdqs: cleanup logback configuration [puppet] - https://gerrit.wikimedia.org/r/463254 (https://phabricator.wikimedia.org/T207834) (owner: Gehel) [15:56:09] !log addshore@deploy1001 Synchronized wmf-config: BETA ONLY (duration: 00m 48s) [15:56:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:56:23] the memcached errors are the recurrent gadget-def issue 
[15:56:23] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [15:56:36] not that dramatic, it happens sporadically now [15:58:37] (PS2) Gehel: wdqs: increase restart interval of wdqs-updater [puppet] - https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) [15:59:01] (PS3) Gehel: wdqs: increase restart interval of wdqs-updater [puppet] - https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) [16:01:04] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Jgreen) >>! In T207966#4703201, @Dzahn wrote: > Could you give an example which service is currently in warn or... [16:01:16] Operations, ops-codfw, DBA, Patch-For-Review: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Papaul) [16:01:54] Operations, ops-codfw: unrack/decom cr1-eqord - https://phabricator.wikimedia.org/T208049 (Papaul) Received the router . 
[16:01:58] (CR) Mathew.onipe: [C: 1] wdqs: increase restart interval of wdqs-updater [puppet] - https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) (owner: Gehel) [16:05:20] !og remove emailbot from acl*sre-team in phab, afaik this is unused now and unmaintained [16:05:56] jouncebot: now [16:05:56] No deployments scheduled for the next 0 hour(s) and 54 minute(s) [16:06:45] (CR) jenkins-bot: BETA, wmgWikibaseClientRepositories, load wikidata lexeme NS [mediawiki-config] - https://gerrit.wikimedia.org/r/470420 (owner: Addshore) [16:07:24] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.734 second response time [16:09:22] Operations, ops-codfw, DBA, Patch-For-Review, User-Banyek: rack/setup/install pc2007-pc2010 - https://phabricator.wikimedia.org/T207259 (Banyek) [16:10:54] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:12:14] Operations, Cloud-Services, Mail, Patch-For-Review, User-herron: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (Krenair) >>! In T41785#4703228, @herron wrote: > How are these working so far? Hopefully no news is good news! Is there anything that needs to be ad... 
[16:17:52] (PS1) Addshore: Wikibase, move namespace config to IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470423 [16:19:55] PROBLEM - IPMI Sensor Status on cloudvirt1021 is CRITICAL: Sensor Type(s) Temperature, Power_Supply Status: Critical [PS Redundancy = Critical, Status = Critical] [16:20:33] chasemp: you may want to re-log the "!og" line you tried here at 16:04Z [16:20:49] thanks bd808 [16:20:52] !log remove emailbot from acl*sre-team in phab, afaik this is unused now and unmaintained [16:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:23:53] (PS1) Addshore: Wikibase, move client list config to nice part of Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470425 [16:27:30] (PS1) Addshore: Wikibase.php, finish the grand cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/470426 [16:27:44] right, the final 3 patches for the grand wikibase.php etc cleanup! [16:28:15] (CR) Addshore: [C: 2] Wikibase, move namespace config to IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470423 (owner: Addshore) [16:28:23] RECOVERY - Host ms-be2021.mgmt is UP: PING OK - Packet loss = 0%, RTA = 36.90 ms [16:29:22] (Merged) jenkins-bot: Wikibase, move namespace config to IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470423 (owner: Addshore) [16:29:42] Operations, WMF-NDA: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26) - https://phabricator.wikimedia.org/T208231 (chasemp) (meta note) Just a heads up, this task is currently public and not restricted to #wmf-nda (top left) > Open, Normal Public If I `edit task` and set the v... 
[16:31:52] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Wikibase, move namespace config to IS.php PT 1/2 (duration: 00m 47s) [16:31:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:35] (CR) Addshore: [C: 2] Wikibase, move client list config to nice part of Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470425 (owner: Addshore) [16:32:49] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, move namespace config to IS.php PT 2/2 (duration: 00m 47s) [16:32:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:33:04] Operations, ops-eqiad, netops: asw2-a-eqiad FPC7 faulty PEM0 - https://phabricator.wikimedia.org/T206972 (Cmjohnson) I boxed the broken pem and will be shipping it today [16:33:39] (Merged) jenkins-bot: Wikibase, move client list config to nice part of Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470425 (owner: Addshore) [16:34:48] (CR) Addshore: [C: 2] Wikibase.php, finish the grand cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/470426 (owner: Addshore) [16:35:02] !log addshore@deploy1001 Synchronized wmf-config: Wikibase, move client list config to nice part of Wikibase.php (duration: 00m 47s) [16:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:35:52] (Merged) jenkins-bot: Wikibase.php, finish the grand cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/470426 (owner: Addshore) [16:36:34] !log create wikimaniawiki_general indices for eqiad and codfw elasticsearch clusters [16:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:28] !log addshore@deploy1001 Synchronized wmf-config: Wikibase.php, finish the grand cleanup (duration: 00m 48s) [16:37:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:47] (CR) jenkins-bot: Wikibase, move namespace 
config to IS.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470423 (owner: Addshore) [16:38:49] (CR) jenkins-bot: Wikibase, move client list config to nice part of Wikibase.php [mediawiki-config] - https://gerrit.wikimedia.org/r/470425 (owner: Addshore) [16:38:51] (CR) jenkins-bot: Wikibase.php, finish the grand cleanup [mediawiki-config] - https://gerrit.wikimedia.org/r/470426 (owner: Addshore) [16:38:53] (PS1) Addshore: Wikibase.php, update 2 comments [mediawiki-config] - https://gerrit.wikimedia.org/r/470427 [16:39:02] (CR) Addshore: [C: 2] Wikibase.php, update 2 comments [mediawiki-config] - https://gerrit.wikimedia.org/r/470427 (owner: Addshore) [16:40:10] (CR) Herron: [C: -1] "https://puppet-compiler.wmflabs.org/compiler1002/13251/mx1001.wikimedia.org/" [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [16:41:17] herron, well that's interesting. [16:41:20] (Merged) jenkins-bot: Wikibase.php, update 2 comments [mediawiki-config] - https://gerrit.wikimedia.org/r/470427 (owner: Addshore) [16:41:36] herron, those errors from realm.pp are on line 24... [16:41:50] does PCC work on those hosts in general? [16:42:26] !log addshore@deploy1001 Synchronized wmf-config: phpdoc comments only (duration: 00m 48s) [16:42:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:43:04] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1001 is CRITICAL: 2.089e+07 ge 5e+06 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [16:43:34] ottomata: are you working on jumbo? 
[16:44:03] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1003 is CRITICAL: 2.098e+07 ge 5e+06 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [16:44:04] PROBLEM - Kafka Broker Replica Max Lag on kafka-jumbo1005 is CRITICAL: 2.172e+07 ge 5e+06 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [16:45:04] Krenair: yes, for instance at https://puppet-compiler.wmflabs.org/compiler1002/13251/mx1001.wikimedia.org/ “production catalog” is the current version and “change catalog” is with the gerrit patch [16:45:08] the warnings are to be expected [16:45:33] sorry, errors/warnings rather than catalog [16:46:01] oooh right [16:46:05] still wtf [16:46:15] heh yeah odd indeed [16:46:19] the error doesn't make any sense :S [16:48:39] elukey: yes [16:48:44] i just moved some partitions, see lgo in analytics [16:48:51] sorry didn't expec that.... 
i think that's ok [16:48:54] that's just some partitions rereplicating [16:49:01] yeah let's log in here too [16:49:06] I missed the log in the other chan :( [16:49:08] Reassignment of partition eventlogging_ReadingDepth-0 is still in progress [16:49:11] k [16:49:19] !log reassigning eventlogging_ReadingDepth partition 0 from 1002,1004,1006 to 1003,1001,1005 to move preferred leadership from 1002 to 1003 [16:49:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:50:05] RECOVERY - IPMI Sensor Status on cloudvirt1021 is OK: Sensor Type(s) Temperature, Power_Supply Status: OK [16:50:13] RECOVERY - HP RAID on db2048 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12 - Controller: OK - Battery/Capacitor: OK [16:53:55] PROBLEM - configured eth on labvirt1015 is CRITICAL: eth1 reporting no carrier. [16:54:22] (CR) jenkins-bot: Wikibase.php, update 2 comments [mediawiki-config] - https://gerrit.wikimedia.org/r/470427 (owner: Addshore) [16:54:24] (CR) Alex Monk: "Yep totally stumped by that one." [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [16:55:49] (CR) Alex Monk: "Ah I've missed an apostrophe in common.yaml" [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [16:56:06] (PS2) Alex Monk: Move mail_smarthost (and wikimail_smarthost) to hiera [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) [16:56:27] (CR) jerkins-bot: [V: -1] Move mail_smarthost (and wikimail_smarthost) to hiera [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [16:56:43] (CR) Thcipriani: [C: 2] "Can update keyholder script in a follow-on patch." 
(1 comment) [software/keyholder] - https://gerrit.wikimedia.org/r/458240 (owner: Faidon Liambotis) [16:56:45] bah [16:57:30] (Merged) jenkins-bot: Add permission checks for various commands [software/keyholder] - https://gerrit.wikimedia.org/r/458240 (owner: Faidon Liambotis) [16:57:31] herron, so I guess a better question is [16:57:39] How did it pass jenkins? [16:57:58] I missed an apostrophe in common.yaml, that should be a syntax error [16:58:05] hmm, not sure if jenkins is linting hiera yaml off hand [16:59:04] (PS3) Alex Monk: Move mail_smarthost (and wikimail_smarthost) to hiera [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) [16:59:22] mind sticking that one through PCC? ^ [16:59:35] RECOVERY - configured eth on labvirt1015 is OK: OK - interfaces up [17:00:04] gehel and onimisionipe: I seem to be stuck in Groundhog week. Sigh. Time for (yet another) Wikidata Query Service weekly deploy deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1700). [17:00:17] here here! 
[17:01:04] just did, looking better now [17:01:16] Operations: Ensure jenkins on puppet.git checks for yaml syntax errors - https://phabricator.wikimedia.org/T208240 (Krenair) [17:01:16] I've opened https://phabricator.wikimedia.org/T208240 [17:01:43] (CR) Alex Monk: "made T208240 for that" [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [17:02:27] going to pull a better list of hosts and run it through pcc again for paranoias sake [17:03:21] yeah [17:03:29] realm.pp is probably a good place for paranoia [17:03:46] (PS1) Bstorm: sonofgridengine: Fix type reference for gridengine_queues [puppet] - https://gerrit.wikimedia.org/r/470431 (https://phabricator.wikimedia.org/T200557) [17:04:04] <_joe_> herron: pro tip: if you leave the host list empty, the compiler will do a full run on a smart-built list of hosts [17:04:14] <_joe_> which is not perfect, but almost [17:04:32] _joe_: thanks! that should be perfect for this [17:04:32] <_joe_> it will take ~ 2 hours to run though, but it's worth it [17:04:32] and will take forever ;) [17:04:51] thats ok, there is lunch to be had! [17:05:01] (CR) Bstorm: [C: 2] sonofgridengine: Fix type reference for gridengine_queues [puppet] - https://gerrit.wikimedia.org/r/470431 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm) [17:09:13] RECOVERY - Host ms-be2021 is UP: PING OK - Packet loss = 0%, RTA = 36.32 ms [17:09:34] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (Papaul) a:Papaul>fgiunchedi @fgiunchedi battery is dead, it needs to be replaced. I update the firmware. Server is back up. [17:09:39] Operations, New-Readers: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (Nirzar) >Could the content as well be in a wiki page given that we can upload images and format the text? 
(and if you imagine it would be locked, so regular users can't edit it) yes, there is n... [17:11:34] (PS1) Andrew Bogott: m5: allow nova@labcontrol1001 to access nova_eqiad1 [puppet] - https://gerrit.wikimedia.org/r/470438 [17:11:36] (CR) Herron: "kicked off another pcc run this time against a much wider set of hosts. it will take some time to complete, but can refresh https://puppe" [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [17:11:54] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on einsteinium is CRITICAL: 41.19 le 60 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:12:03] (PS1) Bstorm: sonofgridengine: move queue definitions to the master [puppet] - https://gerrit.wikimedia.org/r/470439 (https://phabricator.wikimedia.org/T200557) [17:12:41] (CR) Andrew Bogott: [C: 2] m5: allow nova@labcontrol1001 to access nova_eqiad1 [puppet] - https://gerrit.wikimedia.org/r/470438 (owner: Andrew Bogott) [17:13:04] (PS2) Bstorm: sonofgridengine: move queue definitions to the master [puppet] - https://gerrit.wikimedia.org/r/470439 (https://phabricator.wikimedia.org/T200557) [17:15:32] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Dzahn) confirmed the "Service Alert History" does show this alert: Service Warning[2018-10-29 17:11:33] SERVIC... [17:17:22] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208096 (fgiunchedi) a:fgiunchedi>Papaul Thanks @papaul, the host is back up. Please proceed with ordering a replacement battery and let me know when ready to swap. 
[17:18:36] Operations, Performance-Team, Traffic: Investigate using RFC 7838 Alternate Services to better optimize edge connections - https://phabricator.wikimedia.org/T208242 (BBlack) [17:19:43] (PS1) Andrew Bogott: Region-migrate: use the eqiad1 database when updating the VM user_id [puppet] - https://gerrit.wikimedia.org/r/470440 [17:20:27] Operations, Performance-Team, Traffic: Investigate using RFC 7838 Alternate Services to better optimize edge connections - https://phabricator.wikimedia.org/T208242 (BBlack) [17:21:57] (CR) Andrew Bogott: [C: 2] Region-migrate: use the eqiad1 database when updating the VM user_id [puppet] - https://gerrit.wikimedia.org/r/470440 (owner: Andrew Bogott) [17:22:40] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Dzahn) It's the notication options for the hosts and services in frack. All hosts in the "nsca_frack.cfg" confi... [17:22:51] (PS3) Bstorm: sonofgridengine: move queue definitions to the master [puppet] - https://gerrit.wikimedia.org/r/470439 (https://phabricator.wikimedia.org/T200557) [17:24:30] (CR) Bstorm: [C: 2] sonofgridengine: move queue definitions to the master [puppet] - https://gerrit.wikimedia.org/r/470439 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm) [17:25:20] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Jgreen) >>! In T207966#4703590, @Dzahn wrote: > It's the notication options for the hosts and services in frack.... [17:28:41] Operations, Icinga, fundraising-tech-ops, monitoring: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? 
- https://phabricator.wikimedia.org/T207966 (Dzahn) Ok, actually it's just the one line in the template for services. Hosts don't have a "w" option. For ho... [17:28:44] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on einsteinium is OK: (C)60 le (W)70 le 70.16 https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1 [17:30:47] (PS1) Bstorm: sonofgridengine: assign qname as the namevar for gridengine_queue [puppet] - https://gerrit.wikimedia.org/r/470442 (https://phabricator.wikimedia.org/T200557) [17:30:48] ACKNOWLEDGEMENT - HP RAID on ms-be2021 is CRITICAL: CRITICAL: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4 - Controller: OK - Cache: Permanently Disabled - Battery count: 0 nagiosadmin RAID handler auto-ack: https://phabricator.wikimedia.org/T208245 [17:30:57] Operations, ops-codfw: Degraded RAID on ms-be2021 - https://phabricator.wikimedia.org/T208245 (ops-monitoring-bot) [17:31:41] git fat issues on wdqs deployment, it is going to be delayed a bit [17:31:45] SMalyshev: ^ [17:31:58] (CR) Bstorm: [C: 2] sonofgridengine: assign qname as the namevar for gridengine_queue [puppet] - https://gerrit.wikimedia.org/r/470442 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm) [17:32:08] (PS1) Dzahn: icinga/nsca/frack: enable notifications for service warnings in frack [puppet] - https://gerrit.wikimedia.org/r/470443 (https://phabricator.wikimedia.org/T207966) [17:34:10] gehel: ok [17:39:26] (PS1) Bstorm: sonofgridengine: correct declaration of namevar for qname [puppet] - https://gerrit.wikimedia.org/r/470444 (https://phabricator.wikimedia.org/T200557) [17:40:19] (CR) Faidon Liambotis: [C: -1] cloudvps: eqiad1: add cloudinstances2b virtual router FQDNs (3 comments) [dns] - https://gerrit.wikimedia.org/r/460320 (https://phabricator.wikimedia.org/T202886) (owner: Arturo Borrero 
Gonzalez) [17:42:25] Operations, MediaWiki-Page-deletion, Performance-Team, MW-1.32-notes, and 3 others: Deleting pages on the English Wikipedia is very slow - https://phabricator.wikimedia.org/T207530 (jcrespo) >>! In T207530#4690475, @tstarling wrote: > I logged a deletion on en.wikipedia.org using X-Wikimedia-Debu... [17:43:38] (CR) Bstorm: [C: 2] sonofgridengine: correct declaration of namevar for qname [puppet] - https://gerrit.wikimedia.org/r/470444 (https://phabricator.wikimedia.org/T200557) (owner: Bstorm) [17:48:16] Operations, Cloud-VPS, netops: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Dzahn) [17:50:27] Operations, Cloud-VPS, netops: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Dzahn) @ayounsi @faidon are there router ACLs to allow udp/123 NTP and these don't have the new cloud IP ranges but do have old cloud IP ranges ? [17:50:39] (PS1) Andrew Bogott: cloudservices: include ntp servers. [puppet] - https://gerrit.wikimedia.org/r/470445 (https://phabricator.wikimedia.org/T208244) [17:50:41] (PS1) Andrew Bogott: ntp: use cloud-specific ntp servers for cloud VMS [puppet] - https://gerrit.wikimedia.org/r/470446 (https://phabricator.wikimedia.org/T208244) [17:53:07] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew) I've attached patches that propose running a cloud-specific NTP server. I'd also be OK with changing the network ACLs to allow the new region to access the standard NTP... [17:53:11] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (faidon) Can we set up a couple of NTP servers within VPS e.g. in the cloudinfra project instead? Should be just a couple of generic instances with `role::ntp` applied, right?
[17:55:38] (PS1) Smalyshev: Reduce small file size for lexemes [puppet] - https://gerrit.wikimedia.org/r/470447 (https://phabricator.wikimedia.org/T207030) [17:56:14] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Krenair) I just noticed profile::ntp also contains some ACLs of it's own which restrict source addresses and don't use ferm::service @faidon: Well, standard::ntp and profile::n... [18:00:04] Deploy window Morning SWAT (Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T1800) [18:00:04] RoanKattouw, hoo, and TheJair: A patch you scheduled for Morning SWAT (Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:04:12] I'll get my patch out of the way… but would prefer if someone else could pick up then [18:04:20] Oh sorry, I missed the IRC ping somehow [18:04:24] I can do the SWAT [18:04:30] hoo: Including your patch if you like, just let me know [18:04:53] both is fine with me [18:05:43] OK I'll deploy your patch first the [18:05:44] n [18:05:55] Cool :) [18:06:00] (CR) Catrope: [C: 2] Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419) (owner: Hoo man) [18:07:22] (PS2) Jgreen: icinga/nsca/frack: enable notifications for service warnings in frack [puppet] - https://gerrit.wikimedia.org/r/470443 (https://phabricator.wikimedia.org/T207966) (owner: Dzahn) [18:07:54] (PS2) Catrope: Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419) (owner: Hoo man) [18:08:02] (CR) Catrope: Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419)
(owner: Hoo man) [18:08:06] (CR) Catrope: [C: 2] Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419) (owner: Hoo man) [18:09:11] (Merged) jenkins-bot: Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419) (owner: Hoo man) [18:09:26] (CR) jenkins-bot: Enable Wikidata data access on trwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/470224 (https://phabricator.wikimedia.org/T204419) (owner: Hoo man) [18:09:29] (CR) Jgreen: [C: 2] icinga/nsca/frack: enable notifications for service warnings in frack [puppet] - https://gerrit.wikimedia.org/r/470443 (https://phabricator.wikimedia.org/T207966) (owner: Dzahn) [18:09:48] !log onimisionipe@deploy1001 Started deploy [wdqs/wdqs@7eeede7]: Improved time handling for Kafka, GUI Update and caching removal from updater [18:09:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:10:27] hoo: Your change is on mwdebug1002, please test [18:10:43] TheJair: Are you here for your SWAT patches? [18:10:48] yes [18:11:01] (PS3) Catrope: Enable PageTriage/Copyvio in testwiki and enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/469438 (owner: Sbisson) [18:11:11] (CR) Catrope: [C: 2] Enable PageTriage/Copyvio in testwiki and enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/469438 (owner: Sbisson) [18:11:43] RoanKattouw, just so you're aware, TheJair is Google Code In student and getting a site config change deployed is one of tasks. [18:11:48] Oh cool! [18:12:09] RoanKattouw: Looks good [18:12:17] TheJair: Do you have the WikimediaDebug browser extension installed?
[18:12:25] (Merged) jenkins-bot: Enable PageTriage/Copyvio in testwiki and enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/469438 (owner: Sbisson) [18:12:25] yes I do [18:12:33] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) Yeah I'd agree that's the direction we should go. We don't offer our ntp servers to the globe for good reasons, and we similarly probably shouldn't be offering them to W... [18:12:39] (CR) jenkins-bot: Enable PageTriage/Copyvio in testwiki and enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/469438 (owner: Sbisson) [18:12:43] Awesome. [18:13:06] I'm deploying hoo's patch now, then I'll do mine, and yours will be next after that [18:13:26] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable Wikidata data access on trwiktionary (T204419) (duration: 00m 48s) [18:13:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:13:31] T204419: enable arbitrary access on tr.wiktionary - https://phabricator.wikimedia.org/T204419 [18:13:36] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) (or alternatively, we could look at this as one of the clear examples where a separate WMCS puppetization would be far simpler).
[18:13:53] (CR) Catrope: [C: 2] Update logo for Hebrew Wikivoyage, Add HD hewikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:13:58] (CR) Catrope: [C: 2] Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:15:00] (Merged) jenkins-bot: Update logo for Hebrew Wikivoyage, Add HD hewikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:15:14] (CR) jenkins-bot: Update logo for Hebrew Wikivoyage, Add HD hewikivoyage logos [mediawiki-config] - https://gerrit.wikimedia.org/r/470236 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:16:01] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable PageTriage/Copyvio on testwiki and enwiki (duration: 00m 47s) [18:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:22] (PS3) Catrope: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:16:35] (CR) Catrope: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:16:40] (CR) Catrope: [C: 2] Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:18:26] (Merged) jenkins-bot: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba)
[18:18:41] (CR) jenkins-bot: Add Hebrew Wikivoyage HD logo location in InitialiseSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/470242 (https://phabricator.wikimedia.org/T208148) (owner: Stibba) [18:19:02] TheJair: Alright, both of your changes are on mwdebug1002, please test using the WikimediaDebug extension [18:19:09] (CR) Herron: [C: 2] "PCC shows a nice big noop with the exception of a small set of hosts which look to have unrelated issues. Going forward with this" [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [18:19:15] (PS4) Herron: Move mail_smarthost (and wikimail_smarthost) to hiera [puppet] - https://gerrit.wikimedia.org/r/469524 (https://phabricator.wikimedia.org/T207887) (owner: Alex Monk) [18:20:30] !log onimisionipe@deploy1001 Finished deploy [wdqs/wdqs@7eeede7]: Improved time handling for Kafka, GUI Update and caching removal from updater (duration: 10m 42s) [18:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:20:55] onimisionipe: did you deploy wdqs update yet?
[18:21:22] if not wait for a bit I think updater binary may not be the latest version [18:21:33] RoanKattouw: Looks like it's working [18:21:35] I'll check and tell you in 5 mins or so [18:21:43] Alright, deploying everywhere [18:22:03] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1001 is OK: (C)5e+06 ge (W)1e+06 ge 8.159e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1001 [18:22:18] !log moving mail_smarthost (and wikimail_smarthost) to hiera (gerrit 469524) [18:22:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:24] !log catrope@deploy1001 sync-file aborted: Update logo for hewikivoyage, add HD logos (duration: 00m 07s) [18:22:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:22:45] herron, what's the deal with the unrelated issues? some hosts currently can't succeed in PCC? [18:23:07] RoanKattouw, I'm wondering what does sync-file aborted message from logmsgbot above mean... [18:23:09] yeah they have issues compiling before the change [18:23:19] !log catrope@deploy1001 Synchronized static/images: Update logo for hewikivoyage, add HD logos (T208148) (duration: 00m 48s) [18:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:22] can see them at the bottom of the pcc link [18:23:23] T208148: Change Hebrew Wikivoyage logo - https://phabricator.wikimedia.org/T208148 [18:23:25] Oh whoops, that's because I changed the message to include the bug number [18:23:40] I hit Ctrl+C shortly after starting the sync, I didn't realize that got logged now [18:23:40] SMalyshev: I just finished now. Had some issues with git fat. 
[18:23:44] oh yes [18:23:49] thank you RoanKattouw [18:24:00] onimisionipe: ok, nm, I'll check anyway and redeploy if the version is wrong [18:24:56] SMalyshev: Ok then [18:24:57] !log catrope@deploy1001 Synchronized wmf-config/InitialiseSettings.php: Enable hewikivoyage HD logo (T208148) (duration: 00m 47s) [18:25:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:07] All done! [18:26:09] no the version seems to be correct [18:27:39] Operations, Icinga, fundraising-tech-ops, monitoring, Patch-For-Review: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Jgreen) Woohoo, fixed! Subject: ** PROBLEM alert - frbackup2001/check_rsyslog_backlog is... [18:27:49] Operations, Icinga, fundraising-tech-ops, monitoring, Patch-For-Review: Why doesn't icinga notify the team-fr-tech-ops contact for services in WARNING state? - https://phabricator.wikimedia.org/T207966 (Jgreen) Open>Resolved [18:29:51] hoo|away: There are a lot of SiteLinkLookup errors in the log, but they appear to predate today's SWAT [18:30:01] e.g. ErrorException from line 309 of /srv/mediawiki/php-1.33.0-wmf.1/includes/debug/MWDebug.php: PHP Warning: According to a SiteLinkLookup Q9748852 is linked to zhwiktionary while it is not or it does not exist.
[Called from Wikibase\Client\ParserOutput\ClientParserOutputDataUpdater::setBadgesProperty in /srv/mediawiki/php-1.33.0-wmf.1/extensions/Wikibase/client/includes/ParserOutput/ClientParserOutputDataUpdater.php at line 140] [18:30:28] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 4.964 second response time [18:30:58] Hm never mind, there's a task for that already: T183993 [18:30:59] T183993: Investigate & Fix "According to a SiteLinkLookup Q47013093 is linked to frwiki while it is not or it does not exist" in logs - https://phabricator.wikimedia.org/T183993 [18:31:38] !log deploy1001:sudo -u l10nupdate scap cdb-refresh-json --directory /srv/mediawiki-staging/php-1.33.0-wmf.1/cache/l10n (ref: T208196) [18:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:42] T208196: l10nupdate aborts due to l10n_cache-ti.cdb.json being truncated - https://phabricator.wikimedia.org/T208196 [18:33:07] !log deploy1001:scap pull (ref: T208196) [18:33:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:18] Operations, ops-eqiad: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T196507 (Dzahn) [18:33:23] Operations, ops-eqiad, Cloud-Services, cloud-services-team: Degraded RAID on cloudvirt1019 - https://phabricator.wikimedia.org/T207868 (Dzahn) [18:33:38] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:34:50] ACKNOWLEDGEMENT - Host cloudvirt1019 is DOWN: PING CRITICAL - Packet loss = 100% daniel_zahn https://phabricator.wikimedia.org/T196507 [18:34:59] Operations, Cloud-Services, Mail, Patch-For-Review, User-herron: Create a Cloud VPS SMTP smarthost - https://phabricator.wikimedia.org/T41785 (herron) Just to close the loop within this task https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469524/ has been merged [18:42:44] herron, new MX is looking good [18:42:58] we should probably set it up
for SPF/DKIM etc. at some point but this is a good start [18:43:02] excellent! [18:43:10] yeah good call [18:43:12] https://phabricator.wikimedia.org/T207887#4703883 [18:44:35] nice [18:45:08] thanks much for the mail_smarthost patch btw! [18:53:57] !log installing tiff security updates for jessie [18:53:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:29] Operations, Analytics, EventBus, Wikidata, and 7 others: WDQS Updater ran into issue and stopped working - https://phabricator.wikimedia.org/T207817 (Smalyshev) [18:55:30] Operations, Performance-Team, Wikidata, Wikidata-Query-Service: Errors trying to fetch RDF from Wikidata - https://phabricator.wikimedia.org/T207718 (Smalyshev) Still happening. Anybody from #performance-team could assist? [18:57:17] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1005 is OK: (C)5e+06 ge (W)1e+06 ge 8.439e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1005 [19:00:25] (PS1) Herron: role::logstash::collector: migrate to profile::logstash::collector [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) [19:01:12] (CR) jerkins-bot: [V: -1] role::logstash::collector: migrate to profile::logstash::collector [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) (owner: Herron) [19:01:17] PROBLEM - DPKG on contint2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [19:02:13] (PS2) Herron: role::logstash::collector: migrate to profile::logstash::collector [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) [19:03:17] !log restart apache on phab1001 to hotfix T208254 [19:03:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:21] T208254: Legalpad access controls are
confusing and seemingly broken - https://phabricator.wikimedia.org/T208254 [19:03:38] RECOVERY - Kafka Broker Replica Max Lag on kafka-jumbo1003 is OK: (C)5e+06 ge (W)1e+06 ge 7.567e+05 https://grafana.wikimedia.org/dashboard/db/kafka?panelId=16&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops&var-kafka_cluster=jumbo-eqiad&var-kafka_broker=kafka-jumbo1003 [19:12:06] !log redirect ns0 to authdns2001 [19:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:12:37] PROBLEM - Check systemd state on ms-be1042 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [19:13:26] Operations, New-Readers: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (Dzahn) In this case, let's pick something in wikimedia.org and deploy it as a micro-site along other static sites like design.wikimedia.org, research.wikimedia.org and others. Additionally you... [19:14:07] Operations, New-Readers: Create URL for Mexico Awareness Campaign - https://phabricator.wikimedia.org/T207816 (Dzahn) a:Dzahn [19:15:04] (CR) Krinkle: [C: -1] "The other files in this directory use "wikimedia" to refer to the movement, and "wmf" to refer to the organisation. For consistency this s" [mediawiki-config] - https://gerrit.wikimedia.org/r/469214 (https://phabricator.wikimedia.org/T198946) (owner: Niedzielski) [19:19:17] RECOVERY - Device not healthy -SMART- on db1073 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1073&var-datasource=eqiad%2520prometheus%252Fops [19:22:17] RECOVERY - Device not healthy -SMART- on labsdb1005 is OK: All metrics within thresholds.
https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labsdb1005&var-datasource=eqiad%2520prometheus%252Fops [19:23:24] !log rebooting authdns1001 for kernel security update [19:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:32] Operations, Operations-Software-Development: cumin tries to downtime Icinga even with --no-downtime - https://phabricator.wikimedia.org/T208100 (Dzahn) Open>Invalid ACK! alright, yea, this make sense. thank you. And the reason that puppet didn't finish succesfully on first run was from T208108... [19:24:37] (PS1) Herron: logstash: add generic kafka input config [puppet] - https://gerrit.wikimedia.org/r/470454 (https://phabricator.wikimedia.org/T206454) [19:25:47] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 5.720 second response time [19:27:45] !log rollback redirect ns0 to authdns2001 [19:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:17] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:31:38] (Restored) Paladox: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832) (owner: Paladox) [19:31:45] !log redirect ns1 to authdns1001 [19:31:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:37] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.401 second response time [19:35:08] Operations: stop using mod_php anywhere - https://phabricator.wikimedia.org/T208257 (Dzahn) [19:36:07] PROBLEM - pdfrender on scb1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:36:40] (PS7) Jforrester: [Beta Cluster] Enable wgMediaInfoEnable on Beta Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466954 (https://phabricator.wikimedia.org/T180981) [19:36:42] (PS4) Jforrester: Install but don't
enable the WikibaseMediaInfo extension, part IV [mediawiki-config] - https://gerrit.wikimedia.org/r/446844 (https://phabricator.wikimedia.org/T180981) [19:36:44] (PS3) Jforrester: Enable WikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/466955 (https://phabricator.wikimedia.org/T159708) [19:36:46] (PS1) Jforrester: [Beta Cluster] Enable wmgUseWikibaseRepo,…UseWikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/470457 (https://phabricator.wikimedia.org/T180981) [19:36:49] (PS1) Jforrester: [Beta Cluster] UploadWizard: Enable Structured Data captions when WBMI is enabled [mediawiki-config] - https://gerrit.wikimedia.org/r/470458 (https://phabricator.wikimedia.org/T180981) [19:36:51] (PS1) Jforrester: [Beta Cluster] Cleanup SDC config, all same as prod now [mediawiki-config] - https://gerrit.wikimedia.org/r/470459 [19:37:00] (PS7) Paladox: phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832) [19:37:48] Operations, Wikimedia-Mailing-lists, User-jijiki: New list request for 1lib1ref - https://phabricator.wikimedia.org/T207283 (jijiki) [19:38:58] jouncebot: next [19:38:58] In 0 hour(s) and 21 minute(s): Services – Parsoid / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T2000) [19:39:05] OK, I'll be quick.
[19:39:14] (CR) Jforrester: [C: 2] [Beta Cluster] Enable wmgUseWikibaseRepo,…UseWikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/470457 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester) [19:39:17] :> [19:39:50] And if that works, https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/466954/7 [19:40:10] (CR) jerkins-bot: [V: -1] phabricator: Replace mod_php with php-fpm [puppet] - https://gerrit.wikimedia.org/r/407958 (https://phabricator.wikimedia.org/T182832) (owner: Paladox) [19:40:31] !log redirect ns1 to authdns1001 - try 2 [19:40:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:41:57] (Merged) jenkins-bot: [Beta Cluster] Enable wmgUseWikibaseRepo,…UseWikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/470457 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester) [19:42:09] RECOVERY - Host cloudvirt1019 is UP: PING OK - Packet loss = 0%, RTA = 0.22 ms [19:42:17] Operations, Release: OSError: [Errno 1] Operation not permitted when running git fat pull - https://phabricator.wikimedia.org/T208259 (Mathew.onipe) [19:42:42] Operations, Release: OSError: [Errno 1] Operation not permitted when running git fat pull - https://phabricator.wikimedia.org/T208259 (Mathew.onipe) p:Triage>Normal [19:43:40] Operations, Deployments, Release: OSError: [Errno 1] Operation not permitted when running git fat pull - https://phabricator.wikimedia.org/T208259 (Mathew.onipe) [19:46:29] (CR) jenkins-bot: [Beta Cluster] Enable wmgUseWikibaseRepo,…UseWikibaseMediaInfo on Commons [mediawiki-config] - https://gerrit.wikimedia.org/r/470457 (https://phabricator.wikimedia.org/T180981) (owner: Jforrester) [19:47:09] !log replace radon IPs with authdns1001 on cr1/2-eqiad [19:47:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:48:04] !log redirect ns1 to authdns1001 - try 3
[19:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:00] !log rebooting authdns2001 for kernel security update [19:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:38] PROBLEM - Device not healthy -SMART- on db1073 is CRITICAL: cluster=mysql device=megaraid,3 instance=db1073:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=db1073&var-datasource=eqiad%2520prometheus%252Fops [20:00:04] cscott, arlolra, subbu, bearND, halfak, and Amir1: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Parsoid / Citoid / Mobileapps / ORES / … deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20181029T2000). [20:01:50] Operations, Traffic: Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) p:Triage>Normal [20:01:53] !log rollback redirect ns1 to authdns1001 [20:01:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:02:18] PROBLEM - MediaWiki memcached error rate on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [5000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:02:37] PROBLEM - Device not healthy -SMART- on labsdb1005 is CRITICAL: cluster=mysql device=megaraid,8 instance=labsdb1005:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=labsdb1005&var-datasource=eqiad%2520prometheus%252Fops [20:03:08] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (ayounsi) >>! In T208244#4703730, @Krenair wrote: > I just noticed profile::ntp also contains some ACLs of it's own which restrict source addresses and don't use ferm::service Goo...
[20:05:55] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew) Running an ntp server or two on a cloud VM is probably not a big deal. But, before I go down that road... does anyone want to argue against us just using pool.ntp.org fo... [20:05:58] PROBLEM - Device not healthy -SMART- on cloudvirt1019 is CRITICAL: cluster=misc device={cciss,6,cciss,7,cciss,8,cciss,9} instance=cloudvirt1019:9100 job=node site=eqiad https://grafana.wikimedia.org/dashboard/db/host-overview?var-server=cloudvirt1019&var-datasource=eqiad%2520prometheus%252Fops [20:08:57] RECOVERY - MediaWiki memcached error rate on graphite1001 is OK: OK: Less than 40.00% above the threshold [1000.0] https://grafana.wikimedia.org/dashboard/db/mediawiki-graphite-alerts?orgId=1&panelId=1&fullscreen [20:13:53] (PS4) Gehel: wdqs: increase restart interval of wdqs-updater [puppet] - https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) [20:15:32] (CR) Gehel: [C: 2] wdqs: increase restart interval of wdqs-updater [puppet] - https://gerrit.wikimedia.org/r/469447 (https://phabricator.wikimedia.org/T207843) (owner: Gehel) [20:20:50] (CR) Herron: "pcc looks alright https://puppet-compiler.wmflabs.org/compiler1002/13254/" [puppet] - https://gerrit.wikimedia.org/r/470452 (https://phabricator.wikimedia.org/T206454) (owner: Herron) [20:23:20] !log arlolra@deploy1001 Started deploy [parsoid/deploy@e36608c]: Updating Parsoid to b9fa661 [20:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:23:41] Operations, Wikidata, Wikidata-Query-Service, cloud-services-team: delete t206636-3 VM and revert quota bumps for project wikidata-query - https://phabricator.wikimedia.org/T207101 (Andrew) [20:35:48] !log arlolra@deploy1001 Finished deploy [parsoid/deploy@e36608c]: Updating Parsoid to b9fa661 (duration: 12m 27s) [20:35:50] Logged the message at
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:37:35] Operations, Wikimedia-Mailing-lists, User-jijiki: New list request for 1lib1ref - https://phabricator.wikimedia.org/T207283 (jijiki) Open>Resolved @AVasanth_WMF Here are URLs for [[ https://lists.wikimedia.org/mailman/listinfo/1lib1ref | listinfo]], [[ https://lists.wikimedia.org/mailman/ad... [20:40:31] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (BBlack) So, a few things I can say along those lines: * Production ultimately derives its clock sources from various pool.ntp.org sources (mediated by our per-DC server pools).... [20:40:46] Operations, Cloud-VPS, netops, Patch-For-Review: ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Krenair) >>! In T208244#4704231, @Andrew wrote: > what is the external source of ntp authority that the production NTP servers use? seems it's all 0.*.pool.ntp.org. for eqiad it... [20:42:58] RECOVERY - pdfrender on scb1003 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 3.783 second response time [20:45:28] !log Updated Parsoid to b9fa661 (T100841, T186965, T167349, T198618, T206040, T207956) [20:45:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:45:40] T206040: Timeout parsing largish page - https://phabricator.wikimedia.org/T206040 [20:45:40] T167349: Tidy on WMF sites removes `

` - https://phabricator.wikimedia.org/T186965 [20:45:41] T100841: Support for dynamically enabling new wikis - https://phabricator.wikimedia.org/T100841 [20:45:42] T207956: Token stream patcher table start retokenizing doesn't handle non-string tokens in table attribute position - https://phabricator.wikimedia.org/T207956 [20:45:42] T198618: TOC processing should strip contents of