[00:25:18] 503 on phabricator just now?
[00:25:31] working now...
[00:26:22] WFM too
[00:26:59] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2549960 (Tgr) One concern that came up (in the code review for https://gerrit.wik...
[01:00:50] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2550004 (Smalyshev) If Mediawiki gates access to auth service then Mediawiki woul...
[01:46:07] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0 xe-5/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]
[02:27:25] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.14) (duration: 11m 28s)
[02:27:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:33:32] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Aug 13 02:33:32 UTC 2016 (duration 6m 7s)
[02:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[05:28:19] PROBLEM - puppet last run on db2034 is CRITICAL: CRITICAL: puppet fail
[05:55:28] RECOVERY - puppet last run on db2034 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:05:48] PROBLEM - puppet last run on ms-be2024 is CRITICAL: CRITICAL: puppet fail
[06:32:59] RECOVERY - puppet last run on ms-be2024 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:38:47] PROBLEM - Disk space on kafka1022 is CRITICAL: DISK CRITICAL - free space: /var/spool/kafka/b 73862 MB (3% inode=99%)
[07:48:21] checking --^
[07:49:50] I restarted all the kafka brokers this week and we still have a bug that is only fixed in kafka 0.10. Basically, upon restart kafka changes the mtime of all the logs to now due to a weird jvm truncate() behavior, and this messes up age-based log file removal.
[07:51:04] we have a maximum log partition size of 500GB IIRC, and /var/spool/kafka/b is about 480.. so theoretically the log deletion should kick in soon but the disk is already full
[08:14:33] !log added temporary override to Kafka topic settings to free disk space: retention.bytes=483183820800
[08:14:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:22:09] !log added temporary override to Kafka topic settings to free disk space: retention.bytes=429496729600
[08:22:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:25:10] (as FYI, all instructions are in https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Kafka/Administration#Temporarily_Modify_Per_Topic_Retention_Settings)
[08:27:09] RECOVERY - Disk space on kafka1022 is OK: DISK OK
[08:27:46] good, now at 88% and the maximum partition size will be 400GB, so we should be ok
[08:32:28] I would be inclined to go down even further to 350 but only if necessary
[08:32:43] so I'll keep an eye
[08:41:59] !log restarting cassandra on aqs100[456] (non live cluster) for jvm upgrades and new settings
[08:42:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:57:53] !log added temporary override to Kafka topic settings to free disk space: retention.bytes=375809638400
[08:57:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:58:31] I did this --^ since usually the max partition size of text is around 280GiB
[08:59:01] so 350GiB is enough and it guarantees we won't run with the disk partition ~90% full
[09:07:45] (PS1) Elukey: Lower down the maximum kafka topic partition size to 350GiB [puppet] - https://gerrit.wikimedia.org/r/304609
[09:07:55] This is a more permanent fix --^
[09:08:11] for the moment I'll leave the temporary override and I'll alert Andrew to follow up on Monday
[09:13:36] !log extended the maximum kafka topic partition size (350GiB) to upload as well (it was only text before)
[09:13:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:43:42] (PS3) Reedy: 8 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800)
[09:44:27] (CR) Reedy: [C: 2] 8 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[09:44:55] (Merged) jenkins-bot: 8 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[09:46:14] !log reedy@tin Synchronized wmf-config/extension-list: Many more to extension.json in extension-list (duration: 00m 50s)
[09:46:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:49:59] (PS1) Reedy: Fix path to FancyCaptcha extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304610
[09:50:33] (CR) Reedy: [C: 2] Fix path to FancyCaptcha extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304610 (owner: Reedy)
[09:51:00] (Merged) jenkins-bot: Fix path to FancyCaptcha extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304610 (owner: Reedy)
[09:52:12] !log reedy@tin Synchronized wmf-config/extension-list: Fix path to FancyCaptcha extension.json in extension-list (duration: 00m 47s)
[09:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:00:15] (PS5) Reedy: Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852)
[10:28:02] (PS1) Reedy: Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613
[10:28:32] (CR) jenkins-bot: [V: -1] Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613 (owner: Reedy)
[10:29:38] (PS1) Reedy: Remove orphaned OpenStreetMapSlippyMap config [mediawiki-config] - https://gerrit.wikimedia.org/r/304614
[10:31:54] (PS1) Reedy: Remove s from VipsScaler extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304615
[10:32:17] (CR) Reedy: [C: 2] Remove s from VipsScaler extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304615 (owner: Reedy)
[10:32:47] (Merged) jenkins-bot: Remove s from VipsScaler extension.json [mediawiki-config] - https://gerrit.wikimedia.org/r/304615 (owner: Reedy)
[10:33:48] (PS6) Reedy: Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852)
[10:33:50] (CR) Reedy: [C: 2] Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[10:33:53] !log reedy@tin Synchronized wmf-config/extension-list: Fix erroneous plural (duration: 00m 50s)
[10:33:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:41:43] (CR) Reedy: [C: 2] Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[10:42:10] (Merged) jenkins-bot: Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852) (owner: Reedy)
[10:43:27] (PS2) Reedy: Remove orphaned OpenStreetMapSlippyMap config [mediawiki-config] - https://gerrit.wikimedia.org/r/304614
[10:43:48] (CR) Reedy: [C: 2] Remove orphaned OpenStreetMapSlippyMap config [mediawiki-config] - https://gerrit.wikimedia.org/r/304614 (owner: Reedy)
[10:44:15] (Merged) jenkins-bot: Remove orphaned OpenStreetMapSlippyMap config [mediawiki-config] - https://gerrit.wikimedia.org/r/304614 (owner: Reedy)
[10:45:43] !log reedy@tin Synchronized wmf-config/: Simplify WikimediaMessages loading, Remove Orphaned OpenStreetMapSlippyMap config (duration: 00m 52s)
[10:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:50:00] (PS2) Reedy: Load CentralNotice via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/304126 (https://phabricator.wikimedia.org/T140852)
[10:51:43] (PS2) Reedy: Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613
[10:54:53] (PS1) Reedy: Remove .php from WikimediaMessages [mediawiki-config] - https://gerrit.wikimedia.org/r/304616
[10:55:19] (CR) Reedy: [C: 2] Remove .php from WikimediaMessages [mediawiki-config] - https://gerrit.wikimedia.org/r/304616 (owner: Reedy)
[10:55:47] (Merged) jenkins-bot: Remove .php from WikimediaMessages [mediawiki-config] - https://gerrit.wikimedia.org/r/304616 (owner: Reedy)
[10:57:05] !log reedy@tin Synchronized wmf-config/: Simplify WikimediaMessages loading, Remove Orphaned OpenStreetMapSlippyMap config (duration: 00m 50s)
[11:07:46] (PS1) Reedy: Only require_once JsonConfig.php in one place [mediawiki-config] - https://gerrit.wikimedia.org/r/304617
[11:08:21] (PS3) Reedy: Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613
[11:09:09] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[11:09:57] (CR) Reedy: [C: 2] Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613 (owner: Reedy)
[11:10:28] (Merged) jenkins-bot: Switch many extensions to use wfLoadExtension() [mediawiki-config] - https://gerrit.wikimedia.org/r/304613 (owner: Reedy)
[11:10:43] * Reedy is gonna let that one sit on beta for a few mins
[11:11:07] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[11:18:11] (PS1) Reedy: Fix wfLoadExtension typo [mediawiki-config] - https://gerrit.wikimedia.org/r/304618
[11:18:41] (CR) Reedy: [C: 2] Fix wfLoadExtension typo [mediawiki-config] - https://gerrit.wikimedia.org/r/304618 (owner: Reedy)
[11:19:10] (Merged) jenkins-bot: Fix wfLoadExtension typo [mediawiki-config] - https://gerrit.wikimedia.org/r/304618 (owner: Reedy)
[11:19:19] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - Received genError(5) error-status at error-index 1
[11:21:08] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[11:38:08] !log reedy@tin Synchronized wmf-config/CommonSettings.php: wfLoadExtension all around (duration: 00m 49s)
[11:38:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:10:46] Operations, Reading-Infrastructure-Team, Services, Services-next, Security-General: Protect sensitive user-related information with a UserData / auth / session service - https://phabricator.wikimedia.org/T140813#2550324 (Anomie) > and the only weird effect may be stale auth data in storage th...
[12:14:18] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[12:16:08] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[13:42:58] PROBLEM - HTTPS-wmfusercontent on phab.wmfusercontent.org is CRITICAL: SSL CRITICAL - Certificate *.wmfusercontent.org valid until 2016-09-12 13:41:12 +0000 (expires in 29 days)
[14:42:51] (PS1) Aude: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/304622
[14:46:48] (PS2) Aude: Update interwiki map [mediawiki-config] - https://gerrit.wikimedia.org/r/304622
[15:11:24] (CR) Ottomata: [C: 2] Lower down the maximum kafka topic partition size to 350GiB [puppet] - https://gerrit.wikimedia.org/r/304609 (owner: Elukey)
[17:16:57] PROBLEM - puppet last run on ms-be2003 is CRITICAL: CRITICAL: Puppet has 1 failures
[17:44:18] RECOVERY - puppet last run on ms-be2003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[19:30:30] anybody here to deploy simple but very needed patch please? https://phabricator.wikimedia.org/T141965
[19:30:46] mutante: iianm you care of the first round of it...
[19:55:11] Danny_B: raise task priority perhaps? or plan it at https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160818T1600 if it's not urgent
[19:57:43] well, it does not block me in my work at all (so *personally* i don't care pretty much about timing), but i know many people were complaining that it is breaking their stuff...
[19:59:15] priority should be about general need, so it's probably a good idea to prioritize the task as high and not normal
[20:19:23] (PS1) Dereckson: wmgUseWPB → wmgUseWikidataPageBanner [mediawiki-config] - https://gerrit.wikimedia.org/r/304631
[20:19:25] (PS1) Dereckson: Enable WikidataPageBanner on he.wikivoyage [mediawiki-config] - https://gerrit.wikimedia.org/r/304632 (https://phabricator.wikimedia.org/T140717)
[20:59:47] !log labtestcontrol2001: restarted mysql to unbreak labtesthorizon login again. we really need to figure out why this becomes necessary
[20:59:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:09:38] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[21:38:38] PROBLEM - puppet last run on ms-be2023 is CRITICAL: CRITICAL: puppet fail
[21:39:45] JEM + ASIMOVBOT = AMOR
[21:39:49] JEM + ASIMOVBOT = AMOR
[21:43:57] Platonides: heh, you're fast
[21:58:23] think he might have it scripted
[22:07:57] RECOVERY - puppet last run on ms-be2023 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:06:47] Yeah.
[23:09:55] (PS1) Ladsgroup: ores: Enable uwsgi-specific statsd setup [puppet] - https://gerrit.wikimedia.org/r/304678 (https://phabricator.wikimedia.org/T141543)
[23:20:03] (CR) Ladsgroup: [C: 1] "Tested in ores-beta.wmflabs.org (deployment-sca03.eqiad.wmflabs), works just fine" [puppet] - https://gerrit.wikimedia.org/r/304678 (https://phabricator.wikimedia.org/T141543) (owner: Ladsgroup)