[00:04:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[00:05:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[00:10:26] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:17:56] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:22:46] Operations, OCG-General, Wiktionary: Download as PDF does not work in English Wiktionary: "There was an error while attempting to render your book." - https://phabricator.wikimedia.org/T150604#2791443 (Krenair)
[00:31:16] Operations, OCG-General, Wiktionary: Download as PDF does not work in English Wiktionary: "There was an error while attempting to render your book." - https://phabricator.wikimedia.org/T150604#2791445 (Krenair) Hm, no, I think that's just `Unexpected error, killing thread: Error: Unsupported content...
[00:36:39] Operations, OCG-General, Wiktionary: Download as PDF does not work in English Wiktionary: "There was an error while attempting to render your book." - https://phabricator.wikimedia.org/T150604#2791446 (Krenair) Okay so I sent the JSON I posted above through json_decode then wfArrayToCgi: ```krenair@m...
[00:39:26] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:40:25] Operations, OCG-General, Wiktionary: Download as PDF does not work in English Wiktionary: "There was an error while attempting to render your book." - https://phabricator.wikimedia.org/T150604#2791447 (Krenair) So it's this: https://gerrit.wikimedia.org/r/#/c/313915/
[00:41:19] Operations, OCG-General, Patch-For-Review: Tons of OCG jobs caused a massive increase in queue length - https://phabricator.wikimedia.org/T147211#2685145 (Krenair) This is now causing {T150604}
[00:45:56] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[01:07:26] RECOVERY - puppet last run on mw1237 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[01:35:17] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 1804.078317 Seconds
[01:36:16] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 39.864455 Seconds
[01:56:06] PROBLEM - puppet last run on cp3016 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:07:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[02:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[02:25:06] RECOVERY - puppet last run on cp3016 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[02:27:56] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:36:56] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.2) (duration: 25m 04s)
[02:37:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:41:13] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Nov 14 02:41:12 UTC 2016 (duration 4m 16s)
[02:41:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:56:57] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[03:06:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[03:07:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[03:23:06] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 674.78 seconds
[03:36:06] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 293.81 seconds
[03:44:44] (PS1) Dereckson: Add Abenaki language (abe) to Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/321345 (https://phabricator.wikimedia.org/T150633)
[04:02:47] (CR) Dereckson: "@jhsoby So the next step is to schedule this patch for deployment during a SWAT window." [mediawiki-config] - https://gerrit.wikimedia.org/r/315912 (https://phabricator.wikimedia.org/T113408) (owner: Jon Harald Søby)
[04:20:26] (PS6) Dereckson: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[04:44:42] (CR) Dereckson: [C: 1] HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[04:46:44] (CR) Dereckson: [C: -1] "You need to keep the former one to avoid to break namespaces links." [mediawiki-config] - https://gerrit.wikimedia.org/r/321121 (owner: XXN)
[04:50:45] (CR) Dereckson: [C: 1] "@Arseny1992 Please schedule this patch to a SWAT window." [mediawiki-config] - https://gerrit.wikimedia.org/r/320352 (https://phabricator.wikimedia.org/T150146) (owner: Arseny1992)
[04:54:55] (CR) Dereckson: [C: -1] "Commit title > 72 characters must be avoided, as they are cut by GitHub, some git one log formats, etc." [mediawiki-config] - https://gerrit.wikimedia.org/r/298187 (https://phabricator.wikimedia.org/T139903) (owner: Urbanecm)
[04:56:02] (CR) Dereckson: [C: -1] "Please use CR-1 and not commit message to notify about deployment change, the commit message should focus on the how and the why of the ch" [mediawiki-config] - https://gerrit.wikimedia.org/r/318515 (https://phabricator.wikimedia.org/T149019) (owner: Cenarium)
[04:56:56] PROBLEM - puppet last run on mc1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:57:19] (CR) Dereckson: [C: 1] "Looks good to me too, can you schedule this for SWAT deployment?" [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903) (owner: Bartosz Dziewoński)
[04:58:52] Dereckson would you be available in the morning swat? 1900 UTC
[05:03:31] (CR) Dereckson: [C: -1] "First, these PNG are rather badly cut, require to the team who manage the logo refresh a correct SVG as the basis to avoid such quality de" [mediawiki-config] - https://gerrit.wikimedia.org/r/307475 (https://phabricator.wikimedia.org/T144254) (owner: Urbanecm)
[05:07:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[05:07:23] (PS3) Cenarium: Remove patrol from autoconfirmed and reviewer for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/318515 (https://phabricator.wikimedia.org/T149019)
[05:07:52] (PS4) Cenarium: Remove patrol from autoconfirmed and reviewer for enwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/318515 (https://phabricator.wikimedia.org/T149019)
[05:09:19] (PS1) Dereckson: Fix whitespace issue [mediawiki-config] - https://gerrit.wikimedia.org/r/321350
[05:10:16] (CR) Dereckson: Enable Wikibase #statements parser function on all test wikis (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/317840 (https://phabricator.wikimedia.org/T142940) (owner: Thiemo Mättig (WMDE))
[05:10:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[05:17:31] (PS1) Dereckson: Improve wmf-config/ style [mediawiki-config] - https://gerrit.wikimedia.org/r/321351
[05:25:56] RECOVERY - puppet last run on mc1021 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures
[06:06:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[06:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[07:06:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[07:09:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[07:16:41] (PS1) Marostegui: db-codfw: Depool db2066 for mainteance [mediawiki-config] - https://gerrit.wikimedia.org/r/321353 (https://phabricator.wikimedia.org/T150518)
[07:17:49] Operations, ops-codfw, DBA: install new disks into dbstore2001 - https://phabricator.wikimedia.org/T149457#2791727 (Marostegui) dbstore2001 caught up without any problems. This ticket is ready to be closed as soon as @Papaul confirms that the old disks are wiped.
[07:47:04] (CR) Marostegui: [C: 2] db-codfw: Depool db2066 for mainteance [mediawiki-config] - https://gerrit.wikimedia.org/r/321353 (https://phabricator.wikimedia.org/T150518) (owner: Marostegui)
[07:47:35] (Merged) jenkins-bot: db-codfw: Depool db2066 for mainteance [mediawiki-config] - https://gerrit.wikimedia.org/r/321353 (https://phabricator.wikimedia.org/T150518) (owner: Marostegui)
[07:52:18] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2066 - T150518 (duration: 03m 09s)
[07:53:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:53:03] T150518: Import S5 to dbstore2001 and dbstore2002 + compression - https://phabricator.wikimedia.org/T150518
[08:01:50] (PS1) Marostegui: db-eqiad.php: Depool db1064 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/321354 (https://phabricator.wikimedia.org/T149079)
[08:07:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[08:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[08:14:01] (PS2) Yuvipanda: Route all logs to /dev/null [software/tools-webservice] - https://gerrit.wikimedia.org/r/319798 (https://phabricator.wikimedia.org/T149946)
[08:15:41] Operations, MediaWiki-General-or-Unknown, Traffic: Failure to save recent changes - https://phabricator.wikimedia.org/T150503#2791741 (ema) @Marshallsumter we've deployed a configuration change on Friday evening that should have fixed the problem. Can you please confirm whether that is the case? Thanks!
[08:16:09] (PS3) Ladsgroup: ores: Send logs to logstash [puppet] - https://gerrit.wikimedia.org/r/321096 (https://phabricator.wikimedia.org/T149010)
[08:20:57] !log installing curl security updates/rolling restart of app servers in eqiad
[08:21:34] !log Deploy schema change labsdb1001 s4 commonswiki revision table (T147305)
[08:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:22:18] T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305
[08:25:40] (CR) Yuvipanda: [C: 2] Route all logs to /dev/null [software/tools-webservice] - https://gerrit.wikimedia.org/r/319798 (https://phabricator.wikimedia.org/T149946) (owner: Yuvipanda)
[08:29:26] (CR) Gehel: Switch discovery-stats cronjob to a dedicated script (1 comment) [puppet] - https://gerrit.wikimedia.org/r/319252 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem)
[08:48:22] (CR) Jcrespo: [C: 1] "Monitor db1068, I have never tried this." [mediawiki-config] - https://gerrit.wikimedia.org/r/321354 (https://phabricator.wikimedia.org/T149079) (owner: Marostegui)
[08:51:10] (CR) Marostegui: "Thanks - I will monitor it, but I remembered we did something similar and I did the same thing we did here a month ago or so: https://gerr" [mediawiki-config] - https://gerrit.wikimedia.org/r/321354 (https://phabricator.wikimedia.org/T149079) (owner: Marostegui)
[08:52:42] (CR) Gehel: "I'm still not convinced by the naming. "Hourly" is more a technical detail than a defining attribute of those scripts. That being said, th" [puppet] - https://gerrit.wikimedia.org/r/319252 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem)
[08:54:06] (CR) Marostegui: [C: 2] db-eqiad.php: Depool db1064 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/321354 (https://phabricator.wikimedia.org/T149079) (owner: Marostegui)
[08:54:43] (Merged) jenkins-bot: db-eqiad.php: Depool db1064 for maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/321354 (https://phabricator.wikimedia.org/T149079) (owner: Marostegui)
[08:56:52] good morning
[08:58:45] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1064 - T149079 (duration: 03m 07s)
[08:59:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:59:32] T149079: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079
[09:06:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[09:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[09:11:26] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:11:36] !log Deploy schema change s4 commonswiki templatelinks db1064 - T149079
[09:12:16] RECOVERY - HHVM rendering on mw1232 is OK: HTTP OK: HTTP/1.1 200 OK - 76742 bytes in 0.107 second response time
[09:12:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:18] T149079: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079
[09:18:56] PROBLEM - DPKG on auth1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:19:56] RECOVERY - DPKG on auth1001 is OK: All packages OK
[09:20:56] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[09:23:20] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2791840 (Gilles) The Thumbor servers are...
[09:24:56] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[09:30:06] PROBLEM - DPKG on auth2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:30:12] Operations, Performance-Team, Thumbor, Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2791862 (Gilles)
[09:30:15] Operations, Performance-Team, Thumbor, Patch-For-Review: Log thumbnail requests that fail on Thumbor and not on Mediawiki and vice versa - https://phabricator.wikimedia.org/T147918#2791861 (Gilles) Open>Resolved
[09:31:04] Operations, Performance-Team, Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2791865 (Gilles)
[09:31:58] Operations, Performance-Team, Thumbor: Match cache headers between thumbor and mediawiki - https://phabricator.wikimedia.org/T150642#2791879 (Gilles)
[09:32:06] RECOVERY - DPKG on auth2001 is OK: All packages OK
[09:34:06] PROBLEM - puppet last run on auth2001 is CRITICAL: CRITICAL: Puppet has 3 failures. Last run 2 minutes ago with 3 failures. Failed resources (up to 3 shown): Package[openssh-client],Package[openssl],Package[openssh-server]
[09:34:23] Operations, Performance-Team, Thumbor: Record OOM kills as a metric with mtail - https://phabricator.wikimedia.org/T148962#2791893 (Gilles) So, no change after the IM limits were introduced? Maybe the difference between 900M and 1G isn't enough. I should check how much memory Thumbor consumes when it...
[09:35:22] Operations, Performance-Team, Thumbor, Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2791896 (Gilles)
[09:35:25] Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2791894 (Gilles) Open>Resolved The original disk issue this task was about should be fixed now.
[09:37:19] !log elastic@eqiad T150232: reindexing commonswiki from terbium (logs in ~dcausse/commons_reindex/cirrus_log)
[09:37:26] Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2791903 (Gilles)
[09:37:29] Operations, Performance-Team, Thumbor, Patch-For-Review: Avoid thumbor generating log files > 1GB - https://phabricator.wikimedia.org/T150208#2791901 (Gilles) Open>Resolved The size-based rotation appears to be working fine: ``` gilles@thumbor1001:/srv/log/thumbor$ ls -alh total 2.3G drw...
[09:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:38:03] T150232: CirrusSearch: normal word get obscure used as keyword (namespace) - https://phabricator.wikimedia.org/T150232
[09:40:51] Operations, Performance-Team, Thumbor: Thumbor OOMs too much - https://phabricator.wikimedia.org/T150643#2791905 (Gilles)
[09:41:15] Operations, Performance-Team, Thumbor: Thumbor should handle page redirects like Mediawiki does - https://phabricator.wikimedia.org/T148410#2791919 (Gilles)
[09:42:36] RECOVERY - mediawiki-installation DSH group on mw1239 is OK: OK
[09:43:15] Operations, Performance-Team, Thumbor: Make Thumbor IM engine based on a subprocess - https://phabricator.wikimedia.org/T149903#2791920 (Gilles) I'm going to keep this low-priority and a non-blocker for production deployment. For the whole time we've had Thumbor running, we haven't run into anything...
[09:43:23] Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor imagemagick filling up /tmp on thumbor1002 - https://phabricator.wikimedia.org/T145878#2791922 (Gilles)
[09:43:26] Operations, Performance-Team, Thumbor: Make Thumbor IM engine based on a subprocess - https://phabricator.wikimedia.org/T149903#2791921 (Gilles)
[09:43:29] Operations, Performance-Team, Thumbor: Make Thumbor IM engine based on a subprocess - https://phabricator.wikimedia.org/T149903#2768408 (Gilles) p:Normal>Low
[09:44:33] Operations, Performance-Team, Thumbor: Thumbor OOMs too much - https://phabricator.wikimedia.org/T150643#2791925 (Gilles) Removing the parent task and making this low priority, since nginx retries mean that this isn't a blocker for launch.
[09:44:40] Operations, Performance-Team, Thumbor, Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2791927 (Gilles)
[09:44:43] Operations, Performance-Team, Thumbor: Thumbor OOMs too much - https://phabricator.wikimedia.org/T150643#2791926 (Gilles)
[09:44:46] Operations, Performance-Team, Thumbor: Thumbor OOMs too much - https://phabricator.wikimedia.org/T150643#2791905 (Gilles) p:Normal>Low
[09:45:26] PROBLEM - puppet last run on mw1254 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[09:47:26] RECOVERY - puppet last run on mw1254 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[09:47:54] Operations, Performance-Team, Thumbor: Record OOM kills as a metric with mtail - https://phabricator.wikimedia.org/T148962#2791933 (Gilles) Moved follow-up work to new task: {T150643}
[09:51:07] Operations, Performance-Team, Thumbor: Investigate why oom_kill mtail program doesn't work properly - https://phabricator.wikimedia.org/T149980#2791939 (Gilles) I'm still seeing only nulls here? https://graphite.wikimedia.org/render/?width=811&height=399&_salt=1478208622.423&target=mtail.lithium.kern...
[09:57:46] (PS7) Urbanecm: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618)
[10:01:06] RECOVERY - puppet last run on auth2001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[10:05:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0]
[10:05:44] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2791967 (matmarex) I see nothing in http:...
[10:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[10:08:26] PROBLEM - puppet last run on mw1277 is CRITICAL: CRITICAL: Puppet has 2 failures. Last run 2 minutes ago with 2 failures. Failed resources (up to 3 shown): Package[hhvm],Package[hhvm-dbg]
[10:10:33] (PS8) Urbanecm: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618)
[10:12:26] RECOVERY - puppet last run on mw1277 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[10:15:46] (PS9) Urbanecm: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618)
[10:22:00] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2791983 (Gilles) It does sharpen, but not...
[10:33:39] (CR) DCausse: [C: 1] Use logstash's prune filter for api-feature-usage-sanitized [puppet] - https://gerrit.wikimedia.org/r/313035 (owner: Anomie)
[10:43:19] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792016 (matmarex) Hmm, I found `-unsharp...
[10:52:10] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792027 (Gilles) I would advise against s...
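Dereckson's review at 04:54:55 rejects a patch because its commit title is longer than 72 characters. As an illustration only, a local check for that rule could look like the Python sketch below. The helper name `title_too_long` is hypothetical, and treating the first line of the message as the title is an assumption; no Gerrit or Wikimedia tool is implied.

```python
# Sketch of the 72-character commit-title rule from the 04:54:55 review.
# Hypothetical helper, not part of Gerrit or any Wikimedia tooling.
MAX_TITLE_LENGTH = 72  # limit stated in the review comment


def title_too_long(commit_message: str, limit: int = MAX_TITLE_LENGTH) -> bool:
    """Return True when the first line (assumed to be the title) exceeds `limit`."""
    lines = commit_message.splitlines()
    title = lines[0] if lines else ""
    return len(title) > limit


if __name__ == "__main__":
    for message in ["HD logos for multiple wikis", "x" * 73]:
        print(title_too_long(message))
```

Only the title line is measured, so a long body below a short title passes.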
[10:59:07] Operations, Beta-Cluster-Infrastructure, Thumbor: Thumbor keeps losing Swift auth on beta - https://phabricator.wikimedia.org/T150649#2792043 (Gilles)
[11:05:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[11:06:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[11:07:24] jouncebot: next
[11:07:24] In 2 hour(s) and 52 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T1400)
[11:07:27] zeljkof: ^^
[11:10:50] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792077 (Gilles) Generated in production....
[11:11:03] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792078 (matmarex) Um, perhaps you're see...
[11:12:01] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792079 (matmarex) Yes, I was doing sharp...
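jouncebot reports the time left until the 14:00 UTC SWAT window in whole hours and minutes. The Python sketch below shows one way to produce that string. It assumes the log timestamps are UTC, which the 02:41:12 UTC ResourceLoader entry supports, and that leftover seconds are truncated rather than rounded. Both replies in this log are consistent with truncation: 11:07:24 leaves 2:52:36, and 13:57:05 leaves 0:02:55. The sketch is not jouncebot's actual code, and `countdown` is a hypothetical name.

```python
from datetime import datetime


def countdown(now: datetime, window_start: datetime) -> str:
    """Render the time left before a deployment window in whole hours/minutes.

    Assumes the window is in the future; leftover seconds are dropped
    (truncated, not rounded), which is an assumption about jouncebot.
    """
    remaining = int((window_start - now).total_seconds())
    hours, rest = divmod(remaining, 3600)
    minutes = rest // 60
    return f"In {hours} hour(s) and {minutes} minute(s)"


if __name__ == "__main__":
    swat = datetime(2016, 11, 14, 14, 0)  # European Mid-day SWAT, 14:00 UTC
    print(countdown(datetime(2016, 11, 14, 11, 7, 24), swat))
    print(countdown(datetime(2016, 11, 14, 13, 57, 5), swat))
```

Rounding to the nearest minute would have produced "52" only by coincidence for neither case (52.6 rounds to 53), so truncation is the simpler fit for the two replies shown.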
[11:14:35] Operations, DC-Ops: Information missing from racktables - https://phabricator.wikimedia.org/T150651#2792080 (akosiaris)
[11:16:44] Operations, DC-Ops: Information missing from racktables - https://phabricator.wikimedia.org/T150651#2792097 (akosiaris) I am still working through the list, namely the ones I can in someway get information about, but there are many (for example spares) that I can obviously do nothing about.
[11:16:55] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792099 (Gilles) >>! In T141739#2792078,...
[11:19:29] Operations, Commons, MediaWiki-File-management, Multimedia, Upstream: Issues with displaying thumbnails for CMYK JPG images due to buggy version of ImageMagick (black horizontal stripes, black color missing) - https://phabricator.wikimedia.org/T141739#2792102 (Gilles) Note that Thumbor, just...
[11:24:26] PROBLEM - puppet last run on mw1303 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 25 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[hhvm-dbg]
[11:29:26] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures
[11:36:40] Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2792118 (Gilles) Passing context like the original dimensions seems like a hack. The thumbnailing server doesn't need it. The API should be agnostic...
[11:37:08] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2792119 (ArielGlenn) Minor gcs for the last week: ariel@cobalt:/srv/gerrit/jvmlogs$ grep 'real=' jvm_gc.pid56593.l...
[11:41:29] Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2792125 (Gilles) As for face recognition to guide focused cropping, this is a functionality Thumbor provides out of the box. It even supports doing t...
[11:47:34] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2792138 (ArielGlenn) 2016-11-11T01:40:38.808+0000: 26.025: [GC (Allocation Failure) Desired survivor size 19451084...
[11:51:04] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2792153 (ArielGlenn) I don't know that we're going to get much more out of the logs until the next incident happen...
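ArielGlenn summarizes Gerrit's JVM garbage-collection pauses by grepping the GC log for `real=`. The Python sketch below performs a similar summary. It assumes the HotSpot `-XX:+PrintGCDetails` format, where each collection record ends with `[Times: user=… sys=…, real=… secs]`. The sample lines are invented for illustration and are not taken from cobalt. The GC log filename in the message above is truncated and is not reconstructed here.

```python
import re

# Wall-clock part of a HotSpot "[Times: user=... sys=..., real=... secs]" record
# (format assumed from -XX:+PrintGCDetails output).
REAL_RE = re.compile(r"real=(\d+(?:\.\d+)?) secs")


def gc_pauses(lines):
    """Yield the real (wall-clock) seconds of every GC record that reports one."""
    for line in lines:
        match = REAL_RE.search(line)
        if match:
            yield float(match.group(1))


def summarize(lines):
    """Return the number of pauses, the longest one and their total, in seconds."""
    pauses = list(gc_pauses(lines))
    return {
        "count": len(pauses),
        "max": max(pauses, default=0.0),
        "total": sum(pauses),
    }


if __name__ == "__main__":
    # Made-up sample lines; a real run would iterate over the GC log file.
    sample = [
        "[Times: user=0.10 sys=0.01, real=0.03 secs]",
        "Desired survivor size ... (no timing on this line)",
        "[Times: user=1.20 sys=0.05, real=0.45 secs]",
    ]
    print(summarize(sample))
```

Only the wall-clock `real=` value is used, because it reflects how long application threads actually waited; `user=` and `sys=` add up CPU time across GC threads.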
[11:57:58] (PS6) Muehlenhoff: Load connection tracking sysctl values via a separate systemd unit [puppet] - https://gerrit.wikimedia.org/r/320197 (https://phabricator.wikimedia.org/T136094)
[12:03:16] !log upgrading pillow on thumbor1001 to 3.4.2 (latest version from backports with security fixes)
[12:04:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:05:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[12:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[12:23:27] !log installing pillow security updates on mediawiki canary servers
[12:24:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:33:04] (CR) Bartosz Dziewoński: "Heh, yeah, I probably should. I forgot about this for a bit. Let's do it today." [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903) (owner: Bartosz Dziewoński)
[12:33:15] (PS2) Bartosz Dziewoński: Verify license tags for custom license in Commons' UploadWizard [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903)
[12:41:26] RECOVERY - Improperly owned -0:0- files in /srv/mediawiki-staging on mira is OK: Files ownership is ok.
[12:43:26] PROBLEM - puppet last run on db1065 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
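The recurring graphite1001 alerts report what share of recent datapoints exceeds a threshold: 50.0 for CRITICAL and 25.0 for WARNING. The recovery message suggests the alert fires once that share reaches 1%. The Python sketch below is one way to implement that kind of check, but several details are assumptions rather than facts about the actual Icinga plugin: the trigger percentage, the use of a strict `>` comparison, and skipping Graphite nulls (`None`). The 20.00% and 40.00% figures in the log are what one or two points out of a five-point window would produce; the five-point window is itself a guess.

```python
from typing import Optional, Sequence, Tuple


def evaluate(datapoints: Sequence[Optional[float]], warning: float,
             critical: float, percentage: float = 1.0) -> Tuple[str, float]:
    """Classify a window of datapoints by the share above each threshold.

    Assumptions (not taken from the real Icinga plugin): None values are
    Graphite nulls and are skipped, "above" means strictly greater, and an
    alert fires once the share reaches `percentage` percent.
    """
    values = [v for v in datapoints if v is not None]
    if not values:
        return "UNKNOWN", 0.0
    over_critical = 100.0 * sum(v > critical for v in values) / len(values)
    over_warning = 100.0 * sum(v > warning for v in values) / len(values)
    if over_critical >= percentage:
        return "CRITICAL", over_critical
    if over_warning >= percentage:
        return "WARNING", over_warning
    return "OK", over_warning


if __name__ == "__main__":
    # One spike in a hypothetical five-point window of exceptions/fatals per minute.
    print(evaluate([10, 12, 80, 9, 11], warning=25.0, critical=50.0))
```

Because the check counts a proportion of points rather than an average, a single one-minute spike is enough to flip it to CRITICAL, and it recovers as soon as that spike leaves the window. This fits the one- to three-minute PROBLEM/RECOVERY pairs seen throughout the log.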
[12:45:16] RECOVERY - puppet last run on db1065 is OK: OK: Puppet is currently enabled, last run 13 minutes ago with 0 failures
[12:48:28] !log installing pillow security updates on non-mediawiki systems
[12:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:59:17] (Abandoned) BBlack: test commit, 8k default buffer [debs/openssl11] - https://gerrit.wikimedia.org/r/320950 (owner: BBlack)
[12:59:34] (Abandoned) BBlack: openssl (1.1.0c-1+wmf2) jessie-wikimedia; urgency=medium [debs/openssl11] - https://gerrit.wikimedia.org/r/320951 (owner: BBlack)
[13:00:31] (PS1) BBlack: Patch: increase default buffer size to 8K [debs/openssl11] - https://gerrit.wikimedia.org/r/321366
[13:00:34] (PS1) BBlack: openssl (1.1.0c-1+wmf2) jessie-wikimedia; urgency=medium [debs/openssl11] - https://gerrit.wikimedia.org/r/321367
[13:10:03] (PS2) BBlack: openssl (1.1.0c-1+wmf2) jessie-wikimedia; urgency=medium [debs/openssl11] - https://gerrit.wikimedia.org/r/321367
[13:10:05] (PS2) BBlack: Patch: increase default buffer size to 8K [debs/openssl11] - https://gerrit.wikimedia.org/r/321366
[13:21:51] !log upgrading pillow on thumbor1002 to 3.4.2 (latest version from backports with security fixes)
[13:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:26:40] Hi. I'm trying to do https://phabricator.wikimedia.org/T148241 but after getting into the deployment prep, I don't know where to navigate to for running the script. Ideas? (Sorry if this is the wrong channel for this query)
[13:26:50] Operations, Puppet, Patch-For-Review, RfC: RFC: New puppet code organization paradigm/coding standards - https://phabricator.wikimedia.org/T147718#2792322 (akosiaris) I 've been thinking about this RFC over the weekend so I 'll try and give my POV on the points there in a semi-structured way For...
[13:39:02] !log Stopping replication in db2066 for maintenance - T150518
[13:39:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:39:45] T150518: Import S5 to dbstore2001 and dbstore2002 + compression - https://phabricator.wikimedia.org/T150518
[13:53:55] Operations, Services (doing), User-mobrovac: Investigate better protection modes for electron render service (xvfb setuid) - https://phabricator.wikimedia.org/T143336#2792355 (mobrovac) a:GWicke>mobrovac Thank you @GWicke for the awesome work-around! I am currently in the process of integrati...
[13:56:03] Operations, Services (doing), User-mobrovac: Investigate better protection modes for electron render service (xvfb setuid) - https://phabricator.wikimedia.org/T143336#2792361 (MoritzMuehlenhoff) Just for the record, firejail 0.9.44 is available on carbon.
[13:57:05] jouncebot next
[13:57:06] In 0 hour(s) and 2 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T1400)
[14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T1400). Please do the needful.
[14:00:04] Dereckson, Urbanecm, and MatmaRex: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[14:00:40] Present
[14:00:47] hi
[14:01:45] I can SWAT today!
[14:02:27] Puppet, Beta-Cluster-Infrastructure, Beta-Cluster-reproducible, Easy, User-Ladsgroup: "Connect to 'deployment.eqiad.wmnet' instead" when you ssh into deployment-tin on Beta - https://phabricator.wikimedia.org/T146505#2792386 (Ladsgroup)
[14:04:48] Dereckson: around for swat?
[14:06:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[14:06:17] !log downgrading varnish on cp3043 to 4.1.3-1wm3
[14:06:37] while waiting for Dereckson to confirm he is around...
[14:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:07:12] Urbanecm: can you test 321234 at mw1099?
[14:07:24] (PS10) Zfilipin: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:07:38] (PS1) Muehlenhoff: Tools proxy: Restrict to labs networks [puppet] - https://gerrit.wikimedia.org/r/321371
[14:07:41] zeljkof, yes, 321234 is testable at mw1099.
[14:07:45] * zeljkof is rebasing 321234
[14:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[14:08:24] Urbanecm: great! rebasing; will merge; push to mw1099 and let you know so you can test
[14:09:19] * Urbanecm waits for merge
[14:09:33] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:10:07] (Merged) jenkins-bot: HD logos for multiple wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/321234 (https://phabricator.wikimedia.org/T150618) (owner: Urbanecm)
[14:13:10] Urbanecm: 321234 is at mw1099, please test and let me know if I can continue with the deployment
[14:13:49] * Urbanecm is testing 321234 at mw1099
[14:14:49] !log elastic@eqiad T150232: commonswiki reindex failed again
[14:15:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:15:34] T150232: CirrusSearch: normal word get obscure used as keyword (namespace) - https://phabricator.wikimedia.org/T150232
[14:16:22] zeljkof, you can deploy 321234 everywhere.
[14:16:39] Urbanecm: great! deploying...
[14:17:18] zeljkof, thanks
[14:17:24] it will take a few minutes, since there is 15 files
[14:17:56] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:18:30] zeljkof, I can wait :). Is there any limit? E.g. this is maximum number of files I can add to one patch or something like it?
[14:18:56] Urbanecm: no, as far as I know, we can sync folders too, if needed
[14:19:10] !log zfilipin@tin Synchronized static/images/project-logos/avwiki-1.5x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 01m 10s)
[14:19:44] zeljkof, okay, thanks.
[14:19:46] Is it deployed?
[14:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:19:52] T150618: [GOAL] HD logo for all Wikipedias - https://phabricator.wikimedia.org/T150618
[14:20:08] Urbanecm: the first file (out of 15) :)
[14:20:16] deploying the second
[14:20:46] !log zfilipin@tin Synchronized static/images/project-logos/avwiki-2x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 51s)
[14:21:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:22:19] zeljkof, yeah, you're doing this per file :). I through it will sync whatever needs to be sync.
[14:22:28] !log zfilipin@tin Synchronized static/images/project-logos/bawiki-1.5x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 48s)
[14:23:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:18] Urbanecm: I am relatively new to swat, I have synced files only so far, in this case I should have probably synced the folder, but I am not sure how resource intensive that is
[14:23:20] !log zfilipin@tin Synchronized static/images/project-logos/bawiki-2x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 47s)
[14:24:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:04] Operations, Analytics, ChangeProp, Citoid, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2748980 (MoritzMuehlenhoff) Another nodejs-based service we're running is etherpad-lite (running etherpad.wikimedia.org), I've added @akosiaris to the ticket.
[14:24:17] !log zfilipin@tin Synchronized static/images/project-logos/bgwiki-1.5x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 47s)
[14:24:29] (PS1) Gehel: Kartotherian: deploy application configuration with scap3 [puppet] - https://gerrit.wikimedia.org/r/321374 (https://phabricator.wikimedia.org/T150021)
[14:24:31] (PS1) Gehel: Kartotherian: deploy application configuration with scap3 [puppet] - https://gerrit.wikimedia.org/r/321375 (https://phabricator.wikimedia.org/T150021)
[14:24:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:58] T150618: [GOAL] HD logo for all Wikipedias - https://phabricator.wikimedia.org/T150618
[14:25:15] !log zfilipin@tin Synchronized static/images/project-logos/bgwiki-2x.png: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 47s)
[14:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:26:01] zeljkof, I can't help you. I didn't sync even a file!
[14:27:11] Urbanecm: trying `scap sync-dir`... https://doc.wikimedia.org/mw-tools-scap/scap2/commands.html#scap-sync-dir
[14:27:18] !log zfilipin@tin Synchronized static/images/project-logos: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 49s)
[14:27:20] !log elastic@eqiad T150232: reindex commonswiki (content and general indices only) (logs terbium:~dcausse/commons_reindex/cirrus_log)
[14:27:45] well, that was quick
[14:28:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:23] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:321234|HD logos for multiple wikis (T150618)]] (duration: 00m 48s)
[14:28:37] Urbanecm: if sync-dir worked, everything should be deployed :)
[14:28:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:28:40] T150232: CirrusSearch: normal word get obscure used as keyword (namespace) - https://phabricator.wikimedia.org/T150232
[14:28:49] please check all logos and let me know if anything is missing
[14:28:59] * Urbanecm is going to check it again
[14:29:15] Dereckson: around for swat?
[14:29:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:29:38] MatmaRex: can you check 318518 at mw1099?
[14:29:50] (PS3) Zfilipin: Verify license tags for custom license in Commons' UploadWizard [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903) (owner: Bartosz Dziewoński)
[14:30:06] zeljkof: sure. it's live already?
[14:30:07] * zeljkof is rebasing 318518
[14:30:52] MatmaRex: not yet, rebasing, will let you know if a minute or two when it is ready
[14:31:13] * Urbanecm confirms all logos works.
[14:31:26] Urbanecm: great! :D
[14:31:57] And thanks for your work zeljkof!
[14:32:22] Urbanecm: no problem at all :)
[14:33:10] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903) (owner: Bartosz Dziewoński)
[14:33:35] Operations, Analytics, ChangeProp, Citoid, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792476 (akosiaris) That's gonna be a problem. Etherpad is practically unmaintained these days. The leading developer has moved on to other projects (http://mclear.co.uk/2...
[14:33:42] (Merged) jenkins-bot: Verify license tags for custom license in Commons' UploadWizard [mediawiki-config] - https://gerrit.wikimedia.org/r/318518 (https://phabricator.wikimedia.org/T140903) (owner: Bartosz Dziewoński)
[14:35:57] MatmaRex: 318518 is at mw1099, please test
[14:37:16] Dereckson: looks like you are not around, in that case I will not deploy 321345, please add it to another deployment window (if you are around, ping me)
[14:37:45] Hello
[14:38:02] yes I'm here zeljkof
[14:38:28] zeljkof: thanks, works as expected
[14:38:43] Dereckson: ok, you are next then, please hold, your patch is important to us... :)
[14:38:50] MatmaRex: great! deploying to the cluster then
[14:40:54] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:318518|Verify license tags for custom license in Commons UploadWizard (T140903)]] (duration: 00m 47s)
[14:41:15] MatmaRex: deployed everywhere, please check
[14:41:31] Dereckson: rebasing 321345
[14:41:37] (PS2) Zfilipin: Add Abenaki language (abe) to Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/321345 (https://phabricator.wikimedia.org/T150633) (owner: Dereckson)
[14:41:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:41:41] T140903: Verify license tags for custom licenses ("Another reason not mentioned above") - https://phabricator.wikimedia.org/T140903
[14:42:22] zeljkof: all fine. thanks
[14:42:39] MatmaRex: it was a pleasure deploying for you, please come back ;)
[14:42:56] PROBLEM - HHVM rendering on mw1239 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 30129 bytes in 0.147 second response time
[14:43:08] (CR) Zfilipin: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/321345 (https://phabricator.wikimedia.org/T150633) (owner: Dereckson)
[14:43:47] (Merged) jenkins-bot: Add Abenaki language (abe) to Wikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/321345 (https://phabricator.wikimedia.org/T150633) (owner: Dereckson)
[14:45:23] Dereckson: can you check 321345 at mw1099?
[14:45:58] Testing.
[14:46:14] Dereckson: wait, that was just a question, did not push yet :)
[14:46:21] * zeljkof is pushing...
[14:46:56] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[14:47:05] (PS8) Muehlenhoff: Check whether ferm has been correctly started [puppet] - https://gerrit.wikimedia.org/r/318527 (https://phabricator.wikimedia.org/T148986)
[14:47:47] Dereckson: ok, please test now, 321345 is at mw1099
[14:47:54] ok
[14:48:13] the first question was if it was possible to test the patch there (sometimes it isn't possible), I guess my question should be more explicit
[14:49:01] your question were right, I wonder if these codes aren't api/bots only
[14:49:35] Dereckson: sorry, did not understand you
[14:51:18] okay tested, looks good as it appears in Special:Preferences
[14:51:38] Dereckson: ok, deploying to the universe then
[14:52:46] zeljkof: your question about if the change was testable was relevant: I was at start confident these extra languages appear in the regular Wikidata UI, but it doesn't seem so.
[14:52:56] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:321345|Add Abenaki language (abe) to Wikidata (T150633)]] (duration: 00m 48s)
[14:53:24] Dereckson: the patch is deployed everywhere, please check
[14:53:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:53:38] T150633: Please enable abe or Western Abenaki - https://phabricator.wikimedia.org/T150633
[14:53:55] zeljkof: still looks good to me
[14:54:18] Dereckson: great, in that case, we are done with this episode of EU SWAT!
[14:54:33] see you all tomorrow, same bat-time, same bat-channel
[14:54:40] !log EU SWAT finished
[14:55:00] Thanks for the deploy.
[14:55:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:55:37] Dereckson: thank you for the patch! :D
[15:07:16] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[15:07:48] Operations, ops-eqiad, media-storage: Degraded RAID on ms-be1027 - https://phabricator.wikimedia.org/T150498#2792521 (Volans)
[15:08:16] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[15:08:33] Operations, ops-eqiad: mw1239: memory scrubbing error - https://phabricator.wikimedia.org/T148421#2792527 (MoritzMuehlenhoff) a:Cmjohnson
[15:08:56] RECOVERY - HHVM rendering on mw1239 is OK: HTTP OK: HTTP/1.1 200 OK - 76742 bytes in 2.443 second response time
[15:09:37] <_joe_> !log ran sync-common on mw1239, T148421
[15:10:00] Operations, ops-eqiad: mw1239: memory scrubbing error - https://phabricator.wikimedia.org/T148421#2722348 (Joe) For the record, after a reboot the server is working correctly with no such errors in kern.log or dmesg. I will repool the server now.
[15:10:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:10:21] T148421: mw1239: memory scrubbing error - https://phabricator.wikimedia.org/T148421
[15:11:06] !log statsv: deployed Ie471fa762
[15:11:06] Operations, Analytics, ChangeProp, Citoid, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792535 (GWicke) @akosiaris, in the short term, I would propose to test if it works with node 6. The level of compatibility between 4 & 6 has been generally high with most...
[15:11:07] !log oblivian@puppetmaster1001 conftool action : set/pooled=yes; selector: name=mw1239.*
[15:11:24] thanks _joe_ for mw1239, my bad I haven't tried to reboot it :(
[15:11:45] <_joe_> I didn't either
[15:11:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:11:55] * elukey is happy to see ori committing things :
[15:11:57] (PS2) Ori.livneh: statsv: use systemd's process watchdog [puppet] - https://gerrit.wikimedia.org/r/321231
[15:11:57] :)
[15:12:21] hi!
[15:12:23] <_joe_> it was rebooted for the kernel upgrade
[15:12:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:13] !log uploaded libssl1.1 1.1.0c-1+wmf2 to jessie-wikimedia/backports - T150561
[15:13:26] (CR) Ori.livneh: [C: 2] statsv: use systemd's process watchdog [puppet] - https://gerrit.wikimedia.org/r/321231 (owner: Ori.livneh)
[15:13:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:13:54] T150561: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561
[15:19:10] Operations, Analytics, ChangeProp, Citoid, and 10 others: Node 6 upgrade planning - https://phabricator.wikimedia.org/T149331#2792545 (MoritzMuehlenhoff) There's at least anecdotal evidence that it works/worked with 6.2: https://github.com/ether/etherpad-lite/issues/2956
[15:21:17] !log statsv: deployed I01b0e885d; service now running with systemd watchdog supervision.
[15:21:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:24:13] (CR) Muehlenhoff: [C: 2] Check whether ferm has been correctly started [puppet] - https://gerrit.wikimedia.org/r/318527 (https://phabricator.wikimedia.org/T148986) (owner: Muehlenhoff)
[15:24:18] (PS9) Muehlenhoff: Check whether ferm has been correctly started [puppet] - https://gerrit.wikimedia.org/r/318527 (https://phabricator.wikimedia.org/T148986)
[15:31:27] !log upgrade libssl1.1 package to 1.1.0c-1+wmf2 on cache clusters - T150561
[15:32:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:32:07] T150561: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561
[15:33:17] !log cache_misc - seamless nginx restart for libssl1.1 upgrade - T150561
[15:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:35:04] !log cache_maps - seamless nginx restart for libssl1.1 upgrade - T150561
[15:35:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:41:53] (PS1) Giuseppe Lavagetto: calico: add module to build calico/node and calicoctl [puppet] - https://gerrit.wikimedia.org/r/321383
[15:42:58] Operations, Patch-For-Review: Firewall sets not being loaded post-reboot due to a @resolve race - https://phabricator.wikimedia.org/T148986#2792567 (MoritzMuehlenhoff) p:Unbreak!>High We now have an Icinga check which tests whether ferm is loaded.
[15:44:09] (CR) jenkins-bot: [V: -1] calico: add module to build calico/node and calicoctl [puppet] - https://gerrit.wikimedia.org/r/321383 (owner: Giuseppe Lavagetto)
[15:44:34] PROBLEM - swift-container-replicator on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:44:44] PROBLEM - swift-object-replicator on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:44:54] PROBLEM - dhclient process on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:44:54] PROBLEM - Check size of conntrack table on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:44:54] PROBLEM - configured eth on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:04] PROBLEM - swift-account-reaper on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:04] PROBLEM - very high load average likely xfs on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:04] PROBLEM - swift-account-auditor on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:20] Operations, Patch-For-Review: Firewall sets not being loaded post-reboot due to a @resolve race - https://phabricator.wikimedia.org/T148986#2792570 (MoritzMuehlenhoff) As for fixing the actual race; systemd.special(7) lists an nss-lookup.target, which looks promising. I'll test this.
[15:45:24] PROBLEM - Disk space on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:24] PROBLEM - swift-container-auditor on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:24] PROBLEM - salt-minion processes on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:24] PROBLEM - swift-container-server on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:24] PROBLEM - swift-container-updater on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:25] PROBLEM - puppet last run on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:25] PROBLEM - DPKG on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:26] PROBLEM - swift-account-server on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:26] PROBLEM - swift-account-replicator on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:27] PROBLEM - swift-object-auditor on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:27] PROBLEM - swift-object-updater on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:28] PROBLEM - swift-object-server on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:45:28] PROBLEM - MD RAID on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[15:46:56] (PS2) Giuseppe Lavagetto: calico: add module to build calico/node and calicoctl [puppet] - https://gerrit.wikimedia.org/r/321383
[15:50:43] (PS2) Rush: Tools proxy: Restrict to labs networks [puppet] - https://gerrit.wikimedia.org/r/321371 (owner: Muehlenhoff)
[15:52:07] can T150570 be deployed outside of any window?
[15:52:08] T150570: Ban 10 most popular passwords on Persian Wikipedia - https://phabricator.wikimedia.org/T150570
[15:52:53] it liiks still not merged although Reedy said ensure gets done on monday anyway
[15:52:58] looks*
[15:55:01] (CR) jenkins-bot: [V: -1] calico: add module to build calico/node and calicoctl [puppet] - https://gerrit.wikimedia.org/r/321383 (owner: Giuseppe Lavagetto)
[15:56:27] (PS1) Ema: 4.1.3-1wm4: gethdr_extrachance [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/321384
[15:59:56] Operations, DNS, Traffic, Patch-For-Review: Set SPF (... -all) for toolserver.org - https://phabricator.wikimedia.org/T131930#2183433 (Dzahn) bump, there is a patch waiting in Gerrit for this since many months
[16:00:08] PROBLEM - check_puppetrun on betelgeuse is CRITICAL: CRITICAL: Puppet has 25 failures
[16:00:08] PROBLEM - check_puppetrun on bellatrix is CRITICAL: CRITICAL: Puppet has 19 failures
[16:00:22] Operations, DNS, Traffic, Patch-For-Review: Set SPF (... -all) for toolserver.org - https://phabricator.wikimedia.org/T131930#2792581 (Dzahn) a:Mschon>None
[16:00:35] Operations, DNS, Labs, Labs-Infrastructure, and 2 others: Set SPF (... -all) for toolserver.org - https://phabricator.wikimedia.org/T131930#2792584 (Dzahn)
[16:00:43] ^^ betelgeuse/bellatrix are a non-issue, it's just because I rebooted their puppetmaster
[16:00:55] Operations, DNS, Labs, Labs-Infrastructure, and 3 others: Set SPF (... -all) for toolserver.org - https://phabricator.wikimedia.org/T131930#2792585 (Dzahn)
[16:01:19] PROBLEM - MegaRAID on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[16:01:28] PROBLEM - Check whether ferm is active by checking the default input chain on ms-be2003 is CRITICAL: Return code of 255 is out of bounds
[16:04:16] !log Setting new password on User:Ckoerner and requesting an unlock so user can recover account
[16:05:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:05:08] RECOVERY - check_puppetrun on betelgeuse is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[16:06:18] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[16:07:18] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[16:07:47] Operations, Traffic: Post Varnish 4 migration cleanup - https://phabricator.wikimedia.org/T150660#2792598 (ema)
[16:07:56] !log cache_text - seamless nginx restart for libssl1.1 upgrade - T150561
[16:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:08:39] T150561: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561
[16:10:18] RECOVERY - check_puppetrun on bellatrix is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[16:12:18] RECOVERY - Disk space on ms-be2003 is OK: DISK OK
[16:12:18] RECOVERY - swift-container-auditor on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[16:12:18] RECOVERY - swift-container-updater on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[16:12:18] RECOVERY - swift-container-server on ms-be2003 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[16:12:18] RECOVERY - salt-minion processes on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[16:12:19] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[16:12:28] RECOVERY - DPKG on ms-be2003 is OK: All packages OK
[16:12:28] RECOVERY - swift-account-replicator on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[16:12:28] RECOVERY - swift-account-server on ms-be2003 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[16:12:28] RECOVERY - swift-object-updater on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[16:12:28] RECOVERY - swift-object-server on ms-be2003 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[16:12:29] RECOVERY - swift-object-auditor on ms-be2003 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[16:12:29] RECOVERY - MD RAID on ms-be2003 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[16:12:30] RECOVERY - Check whether ferm is active by checking the default input chain on ms-be2003 is OK: OK ferm input default policy is set
[16:12:30] RECOVERY - swift-container-replicator on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[16:12:38] RECOVERY - swift-object-replicator on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[16:12:39] (CR) BBlack: [C: 1] 4.1.3-1wm4: gethdr_extrachance [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/321384 (owner: Ema)
[16:12:58] RECOVERY - dhclient process on ms-be2003 is OK: PROCS OK: 0 processes with command name dhclient
[16:12:58] RECOVERY - Check size of conntrack table on ms-be2003 is OK: OK: nf_conntrack is 11 % full
[16:12:58] RECOVERY - configured eth on ms-be2003 is OK: OK - interfaces up
[16:12:59] RECOVERY - swift-account-reaper on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[16:13:08] RECOVERY - very high load average likely xfs on ms-be2003 is OK: OK - load average: 6.90, 7.67, 7.35
[16:13:08] RECOVERY - swift-account-auditor on ms-be2003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[16:18:30] (CR) Daniel Kinzler: "@Aude well, it only does anything when explicitly invoked, so having a feature switch seems redundant. But not a problem, of course." [mediawiki-config] - https://gerrit.wikimedia.org/r/317840 (https://phabricator.wikimedia.org/T142940) (owner: Thiemo Mättig (WMDE))
[16:19:05] !log cache_upload - seamless nginx restart for libssl1.1 upgrade - T150561
[16:19:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:19:46] T150561: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561
[16:21:18] RECOVERY - MegaRAID on ms-be2003 is OK: OK: optimal, 14 logical, 14 physical
[16:23:01] Operations, Traffic: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561#2792649 (BBlack) Above SAL entries deployed the workaround (setting default libssl buffer size to 8K at compile-time), which solves the immediate issue and restores normal handshake performance. Leaving this ope...
[16:25:38] Operations, Gerrit, Release-Engineering-Team, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2792668 (demon) >>! In T148478#2792153, @ArielGlenn wrote: > I don't know that we're going to get much more out of...
[16:27:08] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[16:27:58] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3189848 keys, up 14 days 8 hours - replication_delay is 0
[16:29:58] PROBLEM - puppet last run on db1060 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:35:30] (PS18) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[16:36:36] (CR) jenkins-bot: [V: -1] PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129) (owner: Mobrovac)
[16:37:57] (PS19) Mobrovac: PDF Render Service: Role and module [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129)
[16:38:33] Operations, ops-codfw: rack spare pool servers and update tracking sheet - https://phabricator.wikimedia.org/T150341#2792707 (Papaul)
[16:38:53] (PS1) Giuseppe Lavagetto: Allow building upon wikimedia-jessie [calico-containers] - https://gerrit.wikimedia.org/r/321388
[16:38:55] (PS1) Giuseppe Lavagetto: Add files to build debian binary package [calico-containers] - https://gerrit.wikimedia.org/r/321389
[16:39:09] Operations, ops-codfw: rack spare pool servers and update tracking sheet - https://phabricator.wikimedia.org/T150341#2782701 (Papaul) Open>Resolved Complete
[16:50:56] (CR) Thiemo Mättig (WMDE): [C: 1] Fix whitespace issue [mediawiki-config] - https://gerrit.wikimedia.org/r/321350 (owner: Dereckson)
[16:58:58] RECOVERY - puppet last run on db1060 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[17:32:44] (CR) BBlack: [C: 2 V: 2] Patch: increase default buffer size to 8K [debs/openssl11] - https://gerrit.wikimedia.org/r/321366 (owner: BBlack)
[17:32:45] (CR) BBlack: [C: 2 V: 2] openssl (1.1.0c-1+wmf2) jessie-wikimedia; urgency=medium [debs/openssl11] - https://gerrit.wikimedia.org/r/321367 (owner: BBlack)
[17:32:45] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: CRITICAL - Socket timeout after 20 seconds
[17:32:46] ^ we are on it
[17:32:54] (PS1) Chad: Gerrit: Also install openjdk8 alongside 7, make it configurable [puppet] - https://gerrit.wikimedia.org/r/321398
[17:32:56] (CR) Muehlenhoff: "openjdk-7 and openjdk-8 both serve the "alternatives" for the various Java runtimes(java, javac), it would be better to handle these via a" [puppet] - https://gerrit.wikimedia.org/r/321398 (owner: Chad)
[17:33:13] (CR) Chad: "For the binary things like java, javac we'd probably be fine using alternatives. I'm not quite sure how you'd do it for something like the" [puppet] - https://gerrit.wikimedia.org/r/321398 (owner: Chad)
[17:33:28] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3677 bytes in 7.235 second response time
[17:35:09] (CR) Hashar: "The puppet Docker module drops the extended disk space and overall I guess it is probably heavily coupled with the Kubernetes setup we hav" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/320942 (https://phabricator.wikimedia.org/T150502) (owner: Dduvall)
[17:38:18] (CR) Dduvall: [WIP] contint: New role for Docker based CI slave (1 comment) [puppet] - https://gerrit.wikimedia.org/r/320942 (https://phabricator.wikimedia.org/T150502) (owner: Dduvall)
[17:40:58] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 617 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3195053 keys, up 14 days 9 hours - replication_delay is 617
[17:41:58] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3193014 keys, up 14 days 9 hours - replication_delay is 0
[17:42:30] has tools.wmflabs just fallen over and died ? Getting all manner of 502 and 503 errors when using geohack
[17:42:55] yes tiddlywink
[17:43:02] see -labs
[17:43:52] ah, just the maintenance Krenair ?
[17:44:20] tiddlywink, maintenance gone wrong I think
[17:44:50] it was supposed to be read-only
[17:45:12] ah, no probs. Just trying to identify some monuments using the geohack stuff, nothing critical.
[17:45:34] !log (20m late) stopping puppet on labstore1001, change tools share to ro
[17:49:39] Operations, Traffic: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561#2792910 (GWicke) My latest round of benchmarks confirm the fix as well. @bblack, thank you for investigating & addressing this regression so quickly!
[18:00:04] gehel: Dear anthropoid, the time has come. Please deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T1800).
[18:00:33] nothing is scheduled for wikidata query service this week
[18:05:09] Operations, Traffic: Extra RTT on TLS handshakes - https://phabricator.wikimedia.org/T150561#2792961 (BBlack)
[18:14:10] Operations, Parsoid, Release-Engineering-Team: Provide a /parsoid directory on releases.wikimedia.org - https://phabricator.wikimedia.org/T150672#2792988 (ssastry)
[18:15:50] https://tools.wmflabs.org/replag/ < commons seems to have an issue.
[18:21:00] Revent, there was maintenance ongoing
[18:21:13] it is currently going down if you reload the page
[18:21:26] Operations, EventBus, hardware-requests: eqiad/codfw: 1+1 Kafka broker in main clusters in eqiad and codfw - https://phabricator.wikimedia.org/T145082#2793018 (RobH) stalled>Resolved Allocation of codfw box took place on T150340. Resolving this hw-request.
[18:22:08] Revent: There is one server from commons (db1064) and labsdb1004 (for the commons database) with a long running ALTER table
[18:22:20] Operations, hardware-requests: eqiad: (4) worker servers for kubernetes - https://phabricator.wikimedia.org/T141624#2793023 (RobH) Open>Resolved servers were purchased and allocated on T145026
[18:24:01] Ok, as long as it’s something that’s known.
[18:24:14] Revent: Yep, it is. Thanks for the heads up though!
[18:26:20] Operations, Labs, Patch-For-Review, Tracking: Migrate tools to secondary labstore HA cluster (Scheduled on 11/14) [tracking] - https://phabricator.wikimedia.org/T146154#2793044 (madhuvishy)
[18:29:10] Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Thumb API: Varnish / CDN questions - https://phabricator.wikimedia.org/T150673#2793046 (GWicke)
[18:31:07] Operations, Parsoid: Parsoid deb upload failed: Need ops intervention - https://phabricator.wikimedia.org/T150674#2793063 (ssastry)
[18:39:28] PROBLEM - puppet last run on cp3033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:40:38] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:00:05] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T1900). Please do the needful.
[19:00:05] arseny92 and phuedx: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process.
[19:00:48] o/
[19:01:51] .
[19:01:54] !log demon@tin Synchronized w/static.php: Removing useless function param (duration: 01m 00s)
[19:02:18] I can SWAT today
[19:02:43] I've never run ttmserver-export
[19:02:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:03:22] any idea how long it will take/any resources concerns to be aware of?
[19:03:54] I'm not sure though if I'd be able to test it. Should be there UI changes in Translate visible to whom arent translarors?
[19:04:07] (PS3) Thcipriani: Enable translation memory of Translate for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/320352 (https://phabricator.wikimedia.org/T150146) (owner: Arseny1992)
[19:04:23] Nikerabbit
[19:06:00] hrm, lemme get a few of the others for this window out of the way and we'll circle back to this one.
[19:06:13] (PS2) Thcipriani: MF Beta: Enable moving first paragraph before infobox [mediawiki-config] - https://gerrit.wikimedia.org/r/320993 (https://phabricator.wikimedia.org/T149830) (owner: Bmansurov)
[19:07:18] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[19:07:43] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/320993 (https://phabricator.wikimedia.org/T149830) (owner: Bmansurov)
[19:07:47] thcipriani: i'm standing in for bmansurov on that one
[19:07:59] phuedx: ok :)
[19:08:18] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[19:08:28] RECOVERY - puppet last run on cp3033 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[19:08:38] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[19:08:39] (Merged) jenkins-bot: MF Beta: Enable moving first paragraph before infobox [mediawiki-config] - https://gerrit.wikimedia.org/r/320993 (https://phabricator.wikimedia.org/T149830) (owner: Bmansurov)
[19:10:43] phuedx: change is live on mw1099, if there's anything to test there
[19:10:50] thcipriani: on it
[19:11:08] PROBLEM - puppet last run on cp1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:13:04] thcipriani: lgtm
[19:13:12] phuedx: ok, going live everywhere
[19:13:28] thcipriani: sec -- just checking fatalmonitor
[19:13:35] (well for warnings)
[19:13:47] sure, we have a dashboard for 1099....somewhere...
[19:14:26] https://logstash.wikimedia.org/app/kibana#/dashboard/mw1099
[19:14:35] (PS5) EBernhardson: Setup CirrusSearch interwiki load test [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740)
[19:14:56] thcipriani: ok
[19:15:04] 👍
[19:15:26] (terminal too old to see...)
[19:15:42] or, I guess, fonts not configured correctly enough to see... :)
[19:15:48] thcipriani: it was the poop emoji D:
[19:15:53] thumbsup
[19:15:53] thcipriani: i've just tacked one more patch onto the end, hope you don't mind. It's a config patch, but i'll need to watch it semi-carefully at first
[19:15:56] :D :D :D
[19:15:58] :)
[19:16:05] phuedx: ok, going live everywhere.
[19:16:15] ebernhardson: ok, np.
[19:16:53] * phuedx is seeing "service unavailable" on logstash
[19:17:57] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:320993|MF Beta: Enable moving first paragraph before infobox (T149830)]] (duration: 00m 52s)
[19:18:04] phuedx: works for me... maybe try a hard reload?
[19:18:06] ^ live everywhere
[19:18:31] (PS1) Chad: Remove old 2007 donation PSA page [mediawiki-config] - https://gerrit.wikimedia.org/r/321471
[19:18:39] Heh ^
[19:18:44] (PS1) Papaul: Add mgmt DNS entries for restbase201[0-2] Bug:T150680 [dns] - https://gerrit.wikimedia.org/r/321472 (https://phabricator.wikimedia.org/T150680)
[19:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:18:48] T149830: Re-enable moving the first paragraph before infobox on MF beta - https://phabricator.wikimedia.org/T149830
[19:19:03] stashbot lags
[19:19:04] See https://wikitech.wikimedia.org/wiki/Tool:Stashbot for help.
[19:19:14] yes indeed.
[19:19:23] bd808: hard reload worked -- ta
[19:19:28] Amir1: ping for swat?
[19:19:33] took a minute between log and reply
[19:19:49] pong
[19:20:06] Amir1: ok :)
[19:20:19] (PS3) Thcipriani: Ban ten most popular passwords from fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) (owner: Ladsgroup)
[19:20:41] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) (owner: Ladsgroup)
[19:20:53] Um, comment on https://phabricator.wikimedia.org/T150570#2791399 needs at least replying to
[19:21:04] (even if it's a "that won't be a problem")
[19:21:20] (Merged) jenkins-bot: Ban ten most popular passwords from fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321093 (https://phabricator.wikimedia.org/T150570) (owner: Ladsgroup)
[19:22:19] Amir1: any concerns about: https://phabricator.wikimedia.org/T150570#2791399 ?
[19:22:53] thcipriani: it's an upstream security issue
[19:22:59] checkout the parent task
[19:24:06] I'm aware of the parent task, but that doesn't answer the question.
[19:24:34] Well, answers half the question
[19:24:38] The second half is what matters.
[19:24:53] guys this has to be consistent https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=973179&oldid=973134
[19:25:23] (when scheduling include the task)
[19:25:49] arseny92: new thing? in the past always referred to the linked gerrit patch for that information
[19:27:42] arseny92: I don't think there is any hard and fast requirement. I think you meant to ask nicely rather than being declarative.
[19:28:12] sorry
[19:28:32] it is nice if the line in the SWAT queue is easy to cut-n-paste as a !log message that will ping the task though
[19:28:54] yes thats what i meant
[19:29:00] https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#Full_deployment
[19:29:42] * ostriches just !logs stuff like "Sync this cool thing to do that other cool thing" :)
[19:29:48] as well as the summary reason for scap
[19:29:49] * ostriches is also a rebel without a cause
[19:29:50] :D
[19:30:12] which in turn !log's via logmsgsbot
[19:30:31] ostriches: I don't believe you don't have a cause ;)
[19:31:32] True. But it's a secret :p
[19:31:41] thcipriani sup with translate? As yyou seen in the gerrit log Dereckson asked to schedule it but he doesn't seem to be around
[19:32:18] ostriches: "delete all the code and get rid of authed accounts" based on the 3.5 yrs I've been around to hear your rants :P
[19:32:53] Amir1: your change is live on mw1099, check please
[19:33:23] bd808: Damn straight
[19:33:25] thcipriani: sure
[19:33:26] arseny92: Are you able to check if I deploy? Didn't you have some question about that?
[19:33:37] ostriches: oh hmm, is it too late to add https://gerrit.wikimedia.org/r/#/c/321162/ to swat?
[19:33:47] Ask thcipriani, he's swattin'
[19:33:56] thcipriani: ---^
[19:34:52] bawolff: I think I can get it out pretty quickly. Update the swat calendar and I'll do my best :)
[19:35:11] (CR) BryanDavis: [C: 1] "Seems reasonable to me. This is left from ye olden days when scap was a bash script and l10nudpate used dsh to ship things around." [puppet] - https://gerrit.wikimedia.org/r/321401 (owner: Chad)
[19:35:36] (CR) Ppchelko: PDF Render Service: Role and module (1 comment) [puppet] - https://gerrit.wikimedia.org/r/305256 (https://phabricator.wikimedia.org/T143129) (owner: Mobrovac)
[19:36:52] thcipriani: added
[19:36:58] thank you!
[19:37:38] thcipriani : [21:03] I'm not sure though if I'd be able to test it. Should be there UI changes in Translate visible to whom arent translarors?
[19:38:23] thcipriani: https://usercontent.irccloud-cdn.com/file/l1fsNLhK/
[19:38:51] Amir1: :) Assume that means it's working on mw1099?
[19:38:58] The error message translates to "can't use popular passwords, choose another one"
[19:39:05] arseny92: I don't know the answer there :(
[19:39:07] so, yes, it works
[19:39:08] RECOVERY - puppet last run on cp1063 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[19:39:13] Thanks
[19:39:14] Amir1: cool, going live
[19:40:15] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:321093|Ban ten most popular passwords from fawiki (T150570)]] (duration: 00m 46s)
[19:40:21] thcipriani then i guess lets just do it? it should be working as its just a bool switch to enable the feature
[19:40:24] ^ Amir1 live everywhere
[19:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:40:57] T150570: Ban 10 most popular passwords on Persian Wikipedia - https://phabricator.wikimedia.org/T150570
[19:41:22] (PS1) Jcrespo: Repool db2042 after maintenance [mediawiki-config] - https://gerrit.wikimedia.org/r/321476 (https://phabricator.wikimedia.org/T150334)
[19:42:51] Tested, I've got an issue with fawiki abuse filter
[19:43:05] (PS4) Thcipriani: Enable translation memory of Translate for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/320352 (https://phabricator.wikimedia.org/T150146) (owner: Arseny1992)
[19:43:15] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/320352 (https://phabricator.wikimedia.org/T150146) (owner: Arseny1992)
[19:43:45] (Merged) jenkins-bot: Enable translation memory of Translate for frwiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/320352 (https://phabricator.wikimedia.org/T150146) (owner: Arseny1992)
[19:43:46] Amir1: ?
[19:44:19] Abuse filter doesn't let me make an account because I tried with my ip already
[19:44:38] ah
[19:44:39] let me try with my admin account
[19:45:02] arseny92: live on mw1099, want to check for UI change?
[19:45:27] i.e. as you can see the patch removes the line, so frwiktionaru falls under the default=true; definition... Checking atm
[19:46:24] thcipriani: Using my admin account, I was able to not to make an account using popular passwords
[19:46:39] Amir1: cool, thanks for double-checking :)
[19:47:37] :)
[19:49:25] afk, contact me if anything went wrong
[19:49:54] I can't tell that fc-list is used anywhere....
[19:50:55] didn't notice changes. Seem to be privileged to translationadmins. So let's sync and wait if the original requester for the change confirms it (i.e. i'll leave the task open for couple of days)
[19:51:11] ostriches: last i checked fc-list on noc was super out of date
[19:51:28] It was mostly used so it could be linked from the meta help page on svg fonts
[19:51:33] arseny92: ack
[19:51:40] bawolff: That's...dumb...
[19:53:24] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:320352|Enable translation memory of Translate for frwiktionary (T150146)]] (duration: 00m 47s)
[19:53:44] (CR) Jcrespo: "Give it a look, see what you think, test it live on labsdb1009." [puppet] - https://gerrit.wikimedia.org/r/320822 (https://phabricator.wikimedia.org/T150446) (owner: Jcrespo)
[19:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:54:06] T150146: Enable "translation memory" of translate extension on frwiktionary - https://phabricator.wikimedia.org/T150146
[19:54:08] !log ran: mwscript extensions/Translate/scripts/ttmserver-export.php --wiki=frwiktionary
[19:54:21] arseny92: should be live everywhere
[19:54:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:55:05] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: EBernhardson)
[19:55:11] (PS6) Thcipriani: Setup CirrusSearch interwiki load test [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: EBernhardson)
[19:55:17] (CR) Thcipriani: Setup CirrusSearch interwiki load test [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: EBernhardson)
[19:55:21] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: EBernhardson)
[19:55:31] ugh. Why do you do this to me gerrit?
[19:55:33] :)
[19:55:52] (Merged) jenkins-bot: Setup CirrusSearch interwiki load test [mediawiki-config] - https://gerrit.wikimedia.org/r/320220 (https://phabricator.wikimedia.org/T149740) (owner: EBernhardson)
[19:57:11] ebernhardson: so from the looks of it, sync: -interwikisources, InitialiseSettings, -production. Correct?
[19:57:25] well, guess I'll pull to 1099 first :)
[19:57:59] wow
[19:58:15] ebernhardson: live on mw1099
[19:58:41] thcipriani: checking
[19:58:52] thcipriani: for order, yes that's correct
[19:58:56] Nikerabbit: ?
[20:00:02] thcipriani: might take a moment, this only runs on 5% of requests and i need to get it to trigger a few times
[20:00:09] ok
[20:01:14] (PS1) Chad: Adding fonts to version control [mediawiki-config] - https://gerrit.wikimedia.org/r/321481 (https://phabricator.wikimedia.org/T147481)
[20:01:24] bawolff: if you've got a bit outside of the SWAT window, I can get your change out. We won't run into another deployment for an hour according to the calendar.
[20:01:40] yep, i'm not going anywhere
[20:01:42] Thanks :)
[20:02:03] okie doke :)
[20:03:47] (CR) Jforrester: "Yay for less crap." [mediawiki-config] - https://gerrit.wikimedia.org/r/321471 (owner: Chad)
[20:04:25] thcipriani: going to need to revert it, it will probably spam up our logs something crazy
[20:04:42] ebernhardson: ok, doing. Thanks for checking.
[20:05:09] it's querying a field on sister wikis that only exists for super popular wikis, and logging about it ... will hopefully have something by evening swat :)
[20:05:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[20:05:21] (PS1) Thcipriani: Revert "Setup CirrusSearch interwiki load test" [mediawiki-config] - https://gerrit.wikimedia.org/r/321482
[20:06:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[20:06:27] (PS1) Chad: Remove boardvote.gpg from ignore list. File doesn't exist [mediawiki-config] - https://gerrit.wikimedia.org/r/321483
[20:06:40] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/321482 (owner: Thcipriani)
[20:07:10] (Merged) jenkins-bot: Revert "Setup CirrusSearch interwiki load test" [mediawiki-config] - https://gerrit.wikimedia.org/r/321482 (owner: Thcipriani)
[20:07:47] ebernhardson: reverted on mw1099
[20:07:52] thcipriani: thanks
[20:08:32] (PS2) Thcipriani: Allow 2FA for the abusefilter group if enabled on wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 (owner: Brian Wolff)
[20:08:53] (CR) Thcipriani: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 (owner: Brian Wolff)
[20:09:30] (Merged) jenkins-bot: Allow 2FA for the abusefilter group if enabled on wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/321162 (owner: Brian Wolff)
[20:10:20] bawolff: your change is now live on mw1099, check please
[20:10:33] (CR) BryanDavis: [C: 1] "LGTM. Andrew could/should add a Co-Authored-By header to the commit message unless he wants to blame me for whatever may go wrong with the" [puppet] - https://gerrit.wikimedia.org/r/321169 (owner: BryanDavis)
[20:10:43] umm hmm, does testwiki even have that group
[20:11:46] you should be able to test with whatever wiki with the debug extension https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[20:12:02] Oh ok, for some reason i had it in my mind that that only worked on testwiki
[20:12:05] just a second
[20:12:23] ok, thanks :)
[20:13:35] yep works
[20:13:38] Thanks thcipriani
[20:14:01] bawolff: ok, going live everywhere
[20:15:49] !log thcipriani@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:321162|Allow 2FA for the abusefilter group if enabled on wiki]] (duration: 00m 53s)
[20:15:58] ^ live everywhere
[20:16:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:16:58] (CR) Chad: [C: 2] Remove boardvote.gpg from ignore list. File doesn't exist [mediawiki-config] - https://gerrit.wikimedia.org/r/321483 (owner: Chad)
[20:17:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:17:31] :D
[20:19:42] (PS2) Chad: Remove boardvote.gpg from ignore list. File doesn't exist [mediawiki-config] - https://gerrit.wikimedia.org/r/321483
[20:19:45] (CR) Chad: [V: 2] Remove boardvote.gpg from ignore list. File doesn't exist [mediawiki-config] - https://gerrit.wikimedia.org/r/321483 (owner: Chad)
[20:20:11] (PS1) Dduvall: docker: apt repo before installing package [puppet] - https://gerrit.wikimedia.org/r/321485
[20:21:02] !log demon@tin Synchronized .gitignore: For completeness (duration: 00m 46s)
[20:21:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:21:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:22:09] (PS2) Dduvall: [WIP] contint: New role for Docker based CI slave [puppet] - https://gerrit.wikimedia.org/r/320942 (https://phabricator.wikimedia.org/T150502)
[20:28:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:43:44] (CR) Hashar: [C: 1] "Lets do that, it is probably the last thing that was untracked on the deployment servers." [mediawiki-config] - https://gerrit.wikimedia.org/r/321481 (https://phabricator.wikimedia.org/T147481) (owner: Chad)
[20:44:09] (CR) Hashar: "(confirmed that the fonts.git repo content match what is on the deployment server)" [mediawiki-config] - https://gerrit.wikimedia.org/r/321481 (https://phabricator.wikimedia.org/T147481) (owner: Chad)
[20:47:58] !log deploying regression update for ghostscript (DSA-3691-2) on MW API appservers
[20:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:49:21] (PS1) Ottomata: Debian release 0.9.2-1 [debs/python-confluent-kafka] (debian) - https://gerrit.wikimedia.org/r/321486
[20:49:30] https://commons.wikimedia.org/w/index.php?title=File:Transmission_icon.png&action=edit&redlink=1
[20:50:00] ^ all revisions should be restored, after a history split.
[20:50:17] “Error undeleting file: The file "mwstore://local-multiwrite/local-public/6/6d/Transmission_icon.png" is in an inconsistent state within the internal storage backends”
[20:50:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:57:05] I get "Read-only file system" on /home on wmflabs
[20:57:55] fnielsen_: maint in progress see labs-l
[20:57:56] fnielsen_: yes. there is tool labs wide NFS maintenance. You can get more info in #wikimedia-labs
[20:58:02] (CR) Chad: [C: 2] Adding fonts to version control [mediawiki-config] - https://gerrit.wikimedia.org/r/321481 (https://phabricator.wikimedia.org/T147481) (owner: Chad)
[20:58:06] (PS2) Chad: Adding fonts to version control [mediawiki-config] - https://gerrit.wikimedia.org/r/321481 (https://phabricator.wikimedia.org/T147481)
[20:58:07] Thanks!
[20:59:35] !log deploying regression update for ghostscript (DSA-3691-2) on all codfw mw appservers
[21:00:04] gwicke, cscott, arlolra, subbu, bearND, mdholloway, halfak, Amir1, and yurik: Respected human, time to deploy Services – Parsoid / OCG / Citoid / Mobileapps / ORES / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T2100). Please do the needful.
[21:00:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:00:18] no parsoid deploy
[21:00:22] no mobileapps deploy today
[21:00:45] (CR) Gehel: [C: 2] package_builder: add javahelper to standard build dependencies [puppet] - https://gerrit.wikimedia.org/r/321468 (https://phabricator.wikimedia.org/T150408) (owner: Gehel)
[21:01:55] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:02:25] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:05:44] !log demon@tin Synchronized fonts/: For completeness. Also for co-master git sync (duration: 01m 09s)
[21:05:55] hasharAway: Yay!! ^^^^
[21:06:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:03] !log deploying regression update for ghostscript (DSA-3691-2) on all eqiad mw appservers
[21:07:27] (PS1) Nemo bis: [Planet Wikimedia] Remove Nimish Gautam [puppet] - https://gerrit.wikimedia.org/r/321489
[21:07:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:07:51] mutante: sorta urgent https://gerrit.wikimedia.org/r/#/c/321489/
[21:09:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[21:10:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[21:11:56] Nemo_bis: i see just 2 posts in the last 3 days?
[21:12:32] does sound not Wiki related though, agreed
[21:13:01] (PS2) Chad: Fix whitespace issue [mediawiki-config] - https://gerrit.wikimedia.org/r/321350 (owner: Dereckson)
[21:13:55] (CR) Chad: [C: 2] Fix whitespace issue [mediawiki-config] - https://gerrit.wikimedia.org/r/321350 (owner: Dereckson)
[21:14:25] (Merged) jenkins-bot: Fix whitespace issue [mediawiki-config] - https://gerrit.wikimedia.org/r/321350 (owner: Dereckson)
[21:14:59] (PS2) Chad: Improve wmf-config/ style [mediawiki-config] - https://gerrit.wikimedia.org/r/321351 (owner: Dereckson)
[21:15:39] (CR) Chad: [C: 2] Improve wmf-config/ style [mediawiki-config] - https://gerrit.wikimedia.org/r/321351 (owner: Dereckson)
[21:16:10] (Merged) jenkins-bot: Improve wmf-config/ style [mediawiki-config] - https://gerrit.wikimedia.org/r/321351 (owner: Dereckson)
[21:16:38] (PS1) EBernhardson: Revert "Revert "Setup CirrusSearch interwiki load test"" [mediawiki-config] - https://gerrit.wikimedia.org/r/321492
[21:17:20] (PS2) EBernhardson: Revert "Revert "Setup CirrusSearch interwiki load test"" [mediawiki-config] - https://gerrit.wikimedia.org/r/321492 (https://phabricator.wikimedia.org/T149740)
[21:17:44] !log demon@tin Synchronized wmf-config/: Coding style fixes (duration: 00m 49s)
[21:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:19:37] (CR) Chad: [C: 1] "This will be fine. It's the last reference to bits.wikimedia.org anywhere in puppet, dns, or mediawiki-config." [puppet] - https://gerrit.wikimedia.org/r/305536 (https://phabricator.wikimedia.org/T107430) (owner: BBlack)
[21:20:51] mutante: I see 5 posts
[21:21:21] in 4 days, and the recent ones at 2 hours of distance only
[21:24:09] (CR) Chad: [C: 2] `scap patch` tool for applying patches to a wmf/branch [mediawiki-config] - https://gerrit.wikimedia.org/r/312013 (owner: 20after4)
[21:24:13] (PS8) Chad: `scap patch` tool for applying patches to a wmf/branch [mediawiki-config] - https://gerrit.wikimedia.org/r/312013 (owner: 20after4)
[21:24:15] PROBLEM - puppet last run on cp3045 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:26:44] !log demon@tin Synchronized scap/plugins/patch.py: Patching tool for fun and profit (duration: 00m 49s)
[21:26:54] twentyafterfour: Yay ^
[21:27:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[21:30:55] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[21:33:04] (PS1) Hashar: Move EasyTimeline config to its own file [mediawiki-config] - https://gerrit.wikimedia.org/r/321493 (https://phabricator.wikimedia.org/T22825)
[21:33:36] (CR) Dzahn: [C: 2] "i'd have to agree the last couple posts were not wiki related. would be great to re-add if there is some way to separate the wiki posts vi" [puppet] - https://gerrit.wikimedia.org/r/321489 (owner: Nemo bis)
[21:34:28] (PS1) Yuvipanda: base: Do not have classes clash with base::standard_packages [puppet] - https://gerrit.wikimedia.org/r/321494
[21:34:30] (PS1) Yuvipanda: base: Move package list to hiera [puppet] - https://gerrit.wikimedia.org/r/321495
[21:36:22] (PS1) Chad: scap patch: minor pep8/flake8 fixes [mediawiki-config] - https://gerrit.wikimedia.org/r/321496
[21:42:00] (PS2) Yuvipanda: base: Do not have classes clash with base::standard_packages [puppet] - https://gerrit.wikimedia.org/r/321494
[21:42:02] (PS2) Yuvipanda: base: Move package list to hiera [puppet] - https://gerrit.wikimedia.org/r/321495
[21:42:53] (CR) Yuvipanda: [C: 2 V: 2] base: Do not have classes clash with base::standard_packages [puppet] - https://gerrit.wikimedia.org/r/321494 (owner: Yuvipanda)
[21:51:35] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:52:55] (PS3) Yuvipanda: base: Move package list to hiera [puppet] - https://gerrit.wikimedia.org/r/321495
[21:53:15] RECOVERY - puppet last run on cp3045 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[21:55:53] (CR) Ottomata: [C: 2] Debian release 0.9.2-1 [debs/python-confluent-kafka] (debian) - https://gerrit.wikimedia.org/r/321486 (owner: Ottomata)
[21:56:16] ostriches: thanks!
[21:56:23] yw
[22:00:04] dapatrick, bawolff, and Reedy: Respected human, time to deploy Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161114T2200). Please do the needful.
[22:06:49] (CR) Alexandros Kosiaris: "Inline comments. Also I am not sure the idea of running this on the analytics cluster is the best one forward. It's going to be a producti" (4 comments) [puppet] - https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: Ottomata)
[22:07:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[22:07:43] akosiaris: quick comment about analytics kafka cluster vs main
[22:07:50] (PS2) Andrew Bogott: wikistatus: Handle a few more state changes [puppet] - https://gerrit.wikimedia.org/r/321233
[22:07:52] (PS5) Andrew Bogott: openstack: cache mwclient connection in wikistatus [puppet] - https://gerrit.wikimedia.org/r/321169 (owner: BryanDavis)
[22:07:54] i agree, but i'm not so sure the kafka main clusters are good either
[22:08:10] the main ones are lower capacity, and back critical production services
[22:08:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[22:08:44] analytics kafka has a lotta womp behind it, and is already used for 'production' things like statsv
[22:08:57] but ja, if there was an intermediate kafka cluster, it'd be ideal
[22:09:48] but, i think the analytics one is suited for this, its just named 'analytics' which feels maybe wrongish
[22:09:49] :)
[22:12:31] wikibugs: hi, you seem to be borked
[22:15:20] (PS1) Papaul: Add prod DNS for restbase201[0-2] Bug:T150680 [dns] - https://gerrit.wikimedia.org/r/321546 (https://phabricator.wikimedia.org/T150680)
[22:15:38] bblack labs is doing maintenance
[22:15:55] So wikibugs may need to be restarted like grrrit-wm had too
[22:16:09] mumbles bots-prod.ganeti.eqiad.wmnet
[22:16:19] lol
[22:16:25] paladox: do you know how to restart?
[22:16:35] Nope, but i can find the instructions
[22:16:48] i found the page on mediawiki.org.. but ..
[22:16:50] yes please
[22:16:55] mediawiki https://www.mediawiki.org/wiki/Wikibugs
[22:17:01] woops i meant mutante ^^
[22:17:55] paladox: we need the same thing grrrit-wm got :)
[22:18:07] Yep :)
[22:18:21] mutante: that's something I've been thinking about lately, too ("needed" bots not in labs) but, dogfooding is good too I guess
[22:18:53] greg-g: yea, the most important bots.. i keep thinking it
[22:19:02] sorry both, i have to step afk but be back soon
[22:19:38] (PS1) Yuvipanda: statistics: Don't install git explicitly [puppet] - https://gerrit.wikimedia.org/r/321547
[22:20:16] (CR) MaxSem: "The code is live now." [puppet] - https://gerrit.wikimedia.org/r/319252 (https://phabricator.wikimedia.org/T149722) (owner: MaxSem)
[22:21:07] (PS2) Yuvipanda: statistics: Don't install git explicitly [puppet] - https://gerrit.wikimedia.org/r/321547
[22:21:10] (PS4) Yuvipanda: base: Move package list to hiera [puppet] - https://gerrit.wikimedia.org/r/321495
[22:24:18] (PS2) Papaul: Add mgmt DNS entries for restbase201[0-2] change asset tag from upper to lower case letters Bug:T150680 [dns] - https://gerrit.wikimedia.org/r/321472 (https://phabricator.wikimedia.org/T150680)
[22:25:56] (CR) Papaul: [C: 1] consistent capitalization of mgmt asset tag names [dns] - https://gerrit.wikimedia.org/r/320959 (owner: Dzahn)
[22:29:12] (PS3) Yuvipanda: Don't install git explicitly [puppet] - https://gerrit.wikimedia.org/r/321547
[22:29:14] (PS5) Yuvipanda: base: Move package list to hiera [puppet] - https://gerrit.wikimedia.org/r/321495
[22:32:06] (CR) Papaul: [C: 2] fix mgmt names in wrong data center [dns] - https://gerrit.wikimedia.org/r/320954 (https://phabricator.wikimedia.org/T149875) (owner: Dzahn)
[22:32:35] (CR) Yuvipanda: [C: 2] Don't install git explicitly [puppet] - https://gerrit.wikimedia.org/r/321547 (owner: Yuvipanda)
[22:33:43] Are you doing any "cleanup" work right now?
[22:34:06] I just got a "Bad Gateway" error, and wanted to localise the fault to being my ISP
[22:35:35] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[22:49:05] (PS1) Madhuvishy: labstore: Symlink /data/project and /home on tools to mounts from labstore-secondary [puppet] - https://gerrit.wikimedia.org/r/321556 (https://phabricator.wikimedia.org/T146154)
[22:51:09] (CR) jenkins-bot: [V: -1] labstore: Symlink /data/project and /home on tools to mounts from labstore-secondary [puppet] - https://gerrit.wikimedia.org/r/321556 (https://phabricator.wikimedia.org/T146154) (owner: Madhuvishy)
[22:53:01] (PS2) Madhuvishy: labstore: Symlink /data/project and /home on tools to mounts from labstore-secondary [puppet] - https://gerrit.wikimedia.org/r/321556 (https://phabricator.wikimedia.org/T146154)
[22:56:54] (CR) Rush: [C: 1] labstore: Symlink /data/project and /home on tools to mounts from labstore-secondary [puppet] - https://gerrit.wikimedia.org/r/321556 (https://phabricator.wikimedia.org/T146154) (owner: Madhuvishy)
[22:57:11] (CR) Madhuvishy: [C: 2] labstore: Symlink /data/project and /home on tools to mounts from labstore-secondary [puppet] - https://gerrit.wikimedia.org/r/321556 (https://phabricator.wikimedia.org/T146154) (owner: Madhuvishy)
[23:00:40] (PS1) Hashar: Test for $wgTimelineFontFile values [mediawiki-config] - https://gerrit.wikimedia.org/r/321558 (https://phabricator.wikimedia.org/T22825)
[23:01:18] (CR) jenkins-bot: [V: -1] Test for $wgTimelineFontFile values [mediawiki-config] - https://gerrit.wikimedia.org/r/321558 (https://phabricator.wikimedia.org/T22825) (owner: Hashar)
[23:04:59] (CR) XXN: "@Dereckson, correcting misspellings in namespaces will not affect content on the wikis, because they aren't used at all (see https://ro.wi" [mediawiki-config] - https://gerrit.wikimedia.org/r/321121 (owner: XXN)
[23:05:15] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[23:05:44] (PS4) XXN: Fixes for namespace definitions for some Romanian (ro) projects. [mediawiki-config] - https://gerrit.wikimedia.org/r/321121
[23:06:52] (PS1) Hashar: Symlink fonts for ploticus [mediawiki-config/fonts] - https://gerrit.wikimedia.org/r/321560 (https://phabricator.wikimedia.org/T22825)
[23:06:59] (CR) BBlack: [C: -1] "re: codfw - if the internal LVS service is going on a standard cluster like scb which already supports both sites, there's no reason to av" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/320690 (https://phabricator.wikimedia.org/T143925) (owner: Ottomata)
[23:07:33] !log silence disk space alerts on labsdb1004 for 4h while investigating reoccurence - T150553
[23:08:04] (CR) Hashar: [C: -2] "Pending." [mediawiki-config/fonts] - https://gerrit.wikimedia.org/r/321560 (https://phabricator.wikimedia.org/T22825) (owner: Hashar)
[23:08:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:08:15] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[23:08:15] T150553: High replication activity filled up labsdb1004 with binlogs - https://phabricator.wikimedia.org/T150553
[23:09:23] (PS1) Hashar: Drop '.ttf' from $wgTimelineFontFile settings [mediawiki-config] - https://gerrit.wikimedia.org/r/321561 (https://phabricator.wikimedia.org/T22825)
[23:10:15] (CR) jenkins-bot: [V: -1] Drop '.ttf' from $wgTimelineFontFile settings [mediawiki-config] - https://gerrit.wikimedia.org/r/321561 (https://phabricator.wikimedia.org/T22825) (owner: Hashar)
[23:10:19] (CR) Hashar: [C: -2] "Not ready" [mediawiki-config] - https://gerrit.wikimedia.org/r/321558 (https://phabricator.wikimedia.org/T22825) (owner: Hashar)
[23:13:44] (PS1) Madhuvishy: labstore: Remove old home and project mounts from tools [puppet] - https://gerrit.wikimedia.org/r/321562
[23:15:05] (CR) Madhuvishy: [C: 2] labstore: Remove old home and project mounts from tools [puppet] - https://gerrit.wikimedia.org/r/321562 (owner: Madhuvishy)
[23:18:09] !log add 150G to labsdb1004:/srv/labsdb to get it out of warning threshold T150553
[23:18:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:18:49] T150553: High replication activity filled up labsdb1004 with binlogs - https://phabricator.wikimedia.org/T150553
[23:21:22] paladox: thanks!
[23:21:43] Your welcome :)
[23:21:45] PROBLEM - puppet last run on db1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:30:40] !log Assigned email for my bot, RoboMaxCyberSem
[23:31:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[23:43:10] (PS1) Rush: nfs_mount: do mount{} absent prior to nfs-mount-manager [puppet] - https://gerrit.wikimedia.org/r/321564
[23:44:40] (CR) Rush: [C: 2] nfs_mount: do mount{} absent prior to nfs-mount-manager [puppet] - https://gerrit.wikimedia.org/r/321564 (owner: Rush)
[23:49:45] RECOVERY - puppet last run on db1024 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[23:53:57] (PS1) Filippo Giunchedi: prometheus: rename varnish_config to cluster_config [puppet] - https://gerrit.wikimedia.org/r/321567
[23:53:59] (PS1) Filippo Giunchedi: role: add Prometheus job for memcached_exporter [puppet] - https://gerrit.wikimedia.org/r/321568 (https://phabricator.wikimedia.org/T147326)
[23:57:27] (PS1) Rush: nfs-mount-manager: improve 'check' grep to be more exact [puppet] - https://gerrit.wikimedia.org/r/321569
[23:58:54] (CR) Rush: [C: 2] nfs-mount-manager: improve 'check' grep to be more exact [puppet] - https://gerrit.wikimedia.org/r/321569 (owner: Rush)