[00:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T0000). Please do the needful. [00:00:04] jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [00:00:33] \o [00:00:51] meh [00:02:29] (03CR) 10Andrew Bogott: [C: 032] Remove unneeded lines from observerenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/325443 (owner: 10Andrew Bogott) [00:02:51] jdlrobson, https://gerrit.wikimedia.org/r/#/c/325365/ doesn't do what it says on tin [00:03:52] MaxSem: ah.. because it's not in wmf-config/CommonSettings.php ? [00:04:34] (03CR) 10Jdlrobson: [C: 04-1] "dblist is not loaded. New patch coming." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325365 (https://phabricator.wikimedia.org/T151346) (owner: 10Bmansurov) [00:04:43] (03PS3) 10Jdlrobson: Enable ReadMore on mobile jawiki and eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325365 (https://phabricator.wikimedia.org/T151346) (owner: 10Bmansurov) [00:04:47] ^ how about now? [00:05:28] ah, it's a blacklist [00:05:47] yeh i moaned about the configuration name when that patch was written... 
[00:06:03] the implementation is a little strange [00:06:10] but the dblist was important :) [00:07:51] (03CR) 10MaxSem: [C: 032] Enable ReadMore on mobile jawiki and eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325365 (https://phabricator.wikimedia.org/T151346) (owner: 10Bmansurov) [00:08:29] (03Merged) 10jenkins-bot: Enable ReadMore on mobile jawiki and eswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325365 (https://phabricator.wikimedia.org/T151346) (owner: 10Bmansurov) [00:09:00] 06Operations, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2849457 (10fgiunchedi) I went over the ganglia rrds updated in the last 30d in P4571 and audited their origin and what to do. [] fundrai... [00:09:12] 06Operations, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2849458 (10fgiunchedi) [00:09:14] jdlrobson, pulled on mwdebug1002 [00:09:58] MaxSem: testing [00:11:29] MaxSem: looks good! ship it! 
[00:14:12] !log maxsem@tin Synchronized dblists/related-articles-footer-blacklisted-skins.dblist: https://gerrit.wikimedia.org/r/#/c/325365/ (duration: 00m 59s) [00:14:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:15:28] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/325365/ (duration: 00m 48s) [00:15:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:16:44] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/325365/ (duration: 00m 45s) [00:16:53] jdlrobson, ^ [00:16:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:17:53] (03CR) 10MaxSem: Enable banners on Finnish Wikivoyage (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:18:08] sweet! looks good! ready for the next one :) [00:18:46] MaxSem: The problem is now the projects which have banners are in a majority [00:18:55] it's my goal to try and get all those projects migrated over [00:19:00] meh [00:19:14] (03CR) 10MaxSem: [C: 032] Enable banners on Finnish Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:22:48] oh shoot Max.. I accidentally disabled on Greek so will need to add a follow up.. [00:23:07] not greek.. 
romanian [00:23:12] (03PS2) 10MaxSem: Enable banners on Finnish Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:24:07] (03CR) 10MaxSem: Enable banners on Finnish Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:24:10] (03CR) 10MaxSem: [C: 032] Enable banners on Finnish Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:24:15] sigh [00:24:46] (03Merged) 10jenkins-bot: Enable banners on Finnish Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325369 (https://phabricator.wikimedia.org/T152344) (owner: 10Jdlrobson) [00:25:01] jdlrobson, pulled on mwdebug1002 [00:25:35] jdlrobson, I lied, not pulled for realz [00:25:39] 06Operations, 06Performance-Team, 10Thumbor: Record OOM kills as a metric with mtail - https://phabricator.wikimedia.org/T148962#2849472 (10Gilles) 05Open>03declined Mtail is too unstable at the moment. [00:26:27] (03PS1) 10Jdlrobson: Re-enable banners on Romanian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325447 [00:28:50] jdlrobson, I meant *now* pulled for realz [00:29:04] MaxSem: im confused.. so it's live without debug? 
[00:29:16] so it's live on debug [00:31:06] MaxSem: it's looking good but i need https://gerrit.wikimedia.org/r/325447 live before deploying [00:31:13] as otherwise we'll cause some fatals on Romanian [00:31:43] (03CR) 10MaxSem: [C: 032] Re-enable banners on Romanian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325447 (owner: 10Jdlrobson) [00:32:15] although not even sure how Romanian uses this if at all :) [00:32:16] (03Merged) 10jenkins-bot: Re-enable banners on Romanian Wikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325447 (owner: 10Jdlrobson) [00:32:41] jdlrobson, ^ [00:33:13] thanks MaxSem ! [00:33:31] jdlrobson, can be pushed? [00:33:38] yup ship it! [00:34:21] 06Operations, 10MediaWiki-JobRunner, 10Wikimedia-General-or-Unknown: jobrunner memory leaks - https://phabricator.wikimedia.org/T122069#2849490 (10Krinkle) [00:35:38] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/325369/2 + https://gerrit.wikimedia.org/r/#/c/325447/ (duration: 00m 44s) [00:35:47] jdlrobson, ^ [00:35:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:36:15] (03PS3) 10MaxSem: Disable Wikipedia beta banner experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325370 (https://phabricator.wikimedia.org/T148634) (owner: 10Jdlrobson) [00:44:27] jdlrobson, ??? [00:44:40] MaxSem: ? [00:44:58] can you be more specific with your question? [00:45:05] still waiting for you to confirm we can continue [00:45:18] oh sorry i didnt realise. yes please. 
[00:45:29] all is good with the Finnish one [00:45:33] (03CR) 10MaxSem: [C: 032] Disable Wikipedia beta banner experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325370 (https://phabricator.wikimedia.org/T148634) (owner: 10Jdlrobson) [00:46:07] (03Merged) 10jenkins-bot: Disable Wikipedia beta banner experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325370 (https://phabricator.wikimedia.org/T148634) (owner: 10Jdlrobson) [00:46:46] jdlrobson, pulled [00:47:33] MaxSem: ship it! All good [00:49:17] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/325370/ (duration: 00m 44s) [00:49:32] jdlrobson, ^ [00:49:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:49:59] MaxSem: all good! [00:50:28] (03PS2) 10MaxSem: Roll out wikidata description taglines to French and German Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325366 (https://phabricator.wikimedia.org/T151345) (owner: 10Jdlrobson) [00:50:50] (03CR) 10MaxSem: [C: 032] Roll out wikidata description taglines to French and German Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325366 (https://phabricator.wikimedia.org/T151345) (owner: 10Jdlrobson) [00:51:23] (03Merged) 10jenkins-bot: Roll out wikidata description taglines to French and German Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325366 (https://phabricator.wikimedia.org/T151345) (owner: 10Jdlrobson) [00:52:08] jdlrobson, pulled [00:53:05] MaxSem: on 1002 ? [00:53:14] yep [00:53:46] hmm,.. not seeing the result [00:53:47] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:54:14] now i am [00:54:32] good to ship MaxSem ! [00:55:00] jdlrobson, in which order? [00:55:27] MaxSem: order of what? 
[00:55:38] of files [00:55:49] can't do it with a single command [00:55:57] (unless it's full scap) [00:56:43] MaxSem: I'm still not understanding what you mean... [00:56:50] we're talking about https://gerrit.wikimedia.org/r/#/c/325366/ still ? [00:56:56] aha [00:57:40] you mean wmf-config/InitialiseSettings.php VS dblists/nowikidatadescriptiontaglines.dblist ? [00:58:00] yes [00:58:15] the former should be deployed first i guess [00:58:24] since that's a stronger way of saying the other thing [00:59:48] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/325366/2 (duration: 00m 44s) [00:59:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:00:55] !log maxsem@tin Synchronized dblists/nowikidatadescriptiontaglines.dblist: https://gerrit.wikimedia.org/r/#/c/325366/2 (duration: 00m 43s) [01:01:03] jdlrobson, ^ [01:01:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [01:02:23] MaxSem: perfect :) [01:02:27] thank you for your patience today :) [01:08:01] (03PS3) 10Dzahn: (WIP) services: create global service restart script [puppet] - 10https://gerrit.wikimedia.org/r/325039 [01:08:51] (03PS1) 10Catrope: Rename $wgFlagRestrctions to $wgFlaggedRevsTagsRestrictions [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325456 [01:09:02] AaronSchulz: FYI ---^^ [01:09:04] (03CR) 10jenkins-bot: [V: 04-1] (WIP) services: create global service restart script [puppet] - 10https://gerrit.wikimedia.org/r/325039 (owner: 10Dzahn) [01:12:07] (03PS1) 10Tim Landscheidt: Remove files/apache/ports.conf and files/apache/ports.conf.ssl [puppet] - 10https://gerrit.wikimedia.org/r/325457 [01:19:39] (03PS4) 10Dzahn: (WIP) services: create global service restart script [puppet] - 10https://gerrit.wikimedia.org/r/325039 [01:20:47] (03CR) 10jenkins-bot: [V: 04-1] (WIP) services: create global service restart script [puppet] - 10https://gerrit.wikimedia.org/r/325039 (owner: 10Dzahn) 
[01:21:47] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [01:23:46] (03PS5) 10Dzahn: (WIP) services: create global service restart script [puppet] - 10https://gerrit.wikimedia.org/r/325039 [01:28:59] (03PS1) 10Dzahn: move install_console from global /files to modules/role/files/ [puppet] - 10https://gerrit.wikimedia.org/r/325460 [01:34:07] (03PS1) 10Dzahn: ganeti: move id_dsa.pub from /files to modules/role/file/ganeti/ [puppet] - 10https://gerrit.wikimedia.org/r/325461 [01:37:45] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0 beta - https://phabricator.wikimedia.org/T152473#2849567 (10Gilles) [01:38:15] (03PS1) 10Dzahn: jsbench: move files to modules/role/files/jsbench/ [puppet] - 10https://gerrit.wikimedia.org/r/325462 [01:38:44] (03PS2) 10Dzahn: jsbench: move files to modules/role/files/jsbench/ [puppet] - 10https://gerrit.wikimedia.org/r/325462 [01:41:11] (03CR) 10Dzahn: [C: 031] "probably, yea, i remember adding and looking at them again later, it was something that was different pre-jessie, will take one last look " [puppet] - 10https://gerrit.wikimedia.org/r/325457 (owner: 10Tim Landscheidt) [01:45:37] PROBLEM - puppet last run on db1033 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [01:57:47] (03PS1) 10Filippo Giunchedi: prometheus: add redis_exporter class [puppet] - 10https://gerrit.wikimedia.org/r/325466 (https://phabricator.wikimedia.org/T148637) [02:01:32] (03PS1) 10Aaron Schulz: Add "objectcache" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325467 [02:04:41] (03PS2) 10Aaron Schulz: Add "objectcache" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325467 [02:09:35] (03CR) 10Aaron Schulz: [C: 032] Add "objectcache" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325467 (owner: 10Aaron Schulz) [02:10:06] (03Merged) 10jenkins-bot: Add "objectcache" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325467 (owner: 10Aaron Schulz) [02:12:31] !log aaron@tin Synchronized wmf-config/InitialiseSettings.php: Added objectcache group (duration: 00m 58s) [02:12:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:13:37] RECOVERY - puppet last run on db1033 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [02:13:53] (03PS1) 10Tim Landscheidt: base: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325468 [02:14:10] (03PS1) 10Tim Landscheidt: beta: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325469 [02:14:57] PROBLEM - Apache HTTP on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:14:58] PROBLEM - HHVM rendering on mw1285 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [02:15:16] (03PS1) 10Tim Landscheidt: contint: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325470 [02:15:18] (03PS1) 10Filippo Giunchedi: Initial debianization [debs/prometheus-redis-exporter] - 10https://gerrit.wikimedia.org/r/325471 [02:15:27] (03PS1) 10Tim Landscheidt: dataset: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325472 [02:17:33] (03PS1) 10Tim Landscheidt: iegreview: Fix 
puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325473 [02:17:42] (03PS1) 10Tim Landscheidt: install_server: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325474 [02:17:56] (03PS1) 10Tim Landscheidt: ldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325475 [02:18:04] (03PS1) 10Tim Landscheidt: mediawiki: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325476 [02:18:12] (03PS1) 10Tim Landscheidt: noc: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325477 [02:18:20] (03PS1) 10Tim Landscheidt: openldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325478 [02:18:36] (03PS1) 10Tim Landscheidt: openstack: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325479 [02:18:47] (03PS1) 10Tim Landscheidt: smokeping: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325480 [02:18:56] (03PS1) 10Tim Landscheidt: snapshot: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325481 [02:19:05] (03PS1) 10Tim Landscheidt: statistics: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325482 [02:19:13] (03PS1) 10Tim Landscheidt: udp2log: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325483 [02:19:18] (03PS1) 10Tim Landscheidt: wikimania_scholarships: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325484 [02:19:43] (03PS1) 10Tim Landscheidt: aptly: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325485 [02:19:51] (03PS1) 10Tim Landscheidt: elasticsearch: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325486 [02:19:57] (03PS1) 10Tim Landscheidt: eventlogging: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325487 [02:20:03] (03PS1) 10Tim Landscheidt: exim4: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325488 
[02:20:10] (03PS1) 10Tim Landscheidt: extdist: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325489 [02:20:17] (03PS1) 10Tim Landscheidt: gerrit: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325490 [02:20:23] (03PS1) 10Tim Landscheidt: icinga: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325491 [02:20:29] (03PS1) 10Tim Landscheidt: ipmi: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325492 [02:20:34] (03PS1) 10Tim Landscheidt: labstore: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325493 [02:20:40] (03PS1) 10Tim Landscheidt: logstash: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325494 [02:20:49] (03PS1) 10Tim Landscheidt: mediawiki_singlenode: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325495 [02:20:53] (03PS1) 10Tim Landscheidt: ocg: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325496 [02:20:58] (03PS1) 10Tim Landscheidt: publichtml: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325497 [02:21:06] (03PS1) 10Tim Landscheidt: racktables: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325498 [02:21:12] (03PS1) 10Tim Landscheidt: role: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325499 [02:21:19] (03PS1) 10Tim Landscheidt: rsync: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325500 [02:21:23] (03PS1) 10Tim Landscheidt: scap: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325501 [02:21:29] (03PS1) 10Tim Landscheidt: toolserver_legacy: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325502 [02:21:38] (03PS1) 10Tim Landscheidt: wikistats: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325503 [02:21:44] (03PS1) 10Tim Landscheidt: zuul: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325504 
[02:27:45] (03CR) 10Tim Landscheidt: "For posterity:" [puppet] - 10https://gerrit.wikimedia.org/r/325485 (owner: 10Tim Landscheidt) [02:30:19] (03PS1) 10Aaron Schulz: Turn off duplicate key reporting for parser cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325505 [02:30:42] (03CR) 10Aaron Schulz: [C: 032] Turn off duplicate key reporting for parser cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325505 (owner: 10Aaron Schulz) [02:30:46] (03PS2) 10Aaron Schulz: Turn off duplicate key reporting for parser cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325505 [02:31:50] (03CR) 10Aaron Schulz: [C: 032] Turn off duplicate key reporting for parser cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325505 (owner: 10Aaron Schulz) [02:32:33] (03Merged) 10jenkins-bot: Turn off duplicate key reporting for parser cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325505 (owner: 10Aaron Schulz) [02:36:23] !log aaron@tin Synchronized wmf-config/InitialiseSettings.php: Turn off duplicate key reporting for parser cache (duration: 02m 15s) [02:36:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:49:27] PROBLEM - puppet last run on prometheus2002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:50:37] PROBLEM - puppet last run on druid1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:13:37] PROBLEM - puppet last run on kubernetes1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [03:17:27] RECOVERY - puppet last run on prometheus2002 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [03:18:37] RECOVERY - puppet last run on druid1001 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [03:25:47] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 705.94 seconds [03:32:47] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 165.87 seconds [03:41:30] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0 - https://phabricator.wikimedia.org/T152473#2849683 (10Peter) [03:42:37] RECOVERY - puppet last run on kubernetes1002 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [04:04:25] (03PS1) 10Tim Landscheidt: Remove remnant files and templates of mha classes [puppet] - 10https://gerrit.wikimedia.org/r/325509 [04:12:38] PROBLEM - puppet last run on elastic1037 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [04:15:25] (03PS1) 10Tim Landscheidt: Remove obsolete PPA key files [puppet] - 10https://gerrit.wikimedia.org/r/325510 [04:40:37] RECOVERY - puppet last run on elastic1037 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [05:31:27] (03PS1) 10Aaron Schulz: Turn off duplicate key reporting for parser cache (2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325514 [05:31:51] (03CR) 10Aaron Schulz: [C: 032] Turn off duplicate key reporting for parser cache (2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325514 (owner: 10Aaron Schulz) [05:32:25] (03Merged) 10jenkins-bot: Turn off duplicate key reporting for parser cache (2) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325514 (owner: 10Aaron Schulz) [05:33:39] !log aaron@tin Synchronized wmf-config/InitialiseSettings.php: Turn off duplicate key reporting for parser cache (2) (duration: 00m 44s) [05:33:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [05:47:12] !log aaron@tin Synchronized wmf-config/CommonSettings.php: Turn off duplicate key reporting for parser cache (duration: 00m 46s) [05:47:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:07:17] PROBLEM - puppet last run on db1037 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:36:17] RECOVERY - puppet last run on db1037 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:50:57] PROBLEM - puppet last run on db1026 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): Package[gdisk] [07:18:47] RECOVERY - puppet last run on db1026 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [07:48:21] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 5 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2849909 (10Joe) >>! In T152074#2848311, @GWicke wrote: > There are pros & cons for dividing the API cluster in... [07:52:37] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [07:53:02] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 5 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2849912 (10Joe) So, assuming parsoid can do TLS to its backend (I'll check that), my proposed plan would be: #... [07:54:40] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 6 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2849914 (10Joe) [08:16:07] PROBLEM - puppet last run on helium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:20:38] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [08:31:57] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. 
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [08:44:07] RECOVERY - puppet last run on helium is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [08:47:38] !log restarting hhvm on mw1285 (hhvm debug in /tmp/hhvm.100918.bt) [08:47:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:47:56] same thing happened yesterday evening (EU time) [08:49:47] RECOVERY - Apache HTTP on mw1285 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 612 bytes in 0.030 second response time [08:49:47] RECOVERY - HHVM rendering on mw1285 is OK: HTTP OK: HTTP/1.1 200 OK - 71740 bytes in 0.183 second response time [08:59:06] (03PS2) 10Gehel: elasticsearch - upgrade codfw cluster to Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/325276 (https://phabricator.wikimedia.org/T151325) [08:59:58] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [09:00:09] (03CR) 10Gehel: [C: 032] elasticsearch - upgrade codfw cluster to Java 8 [puppet] - 10https://gerrit.wikimedia.org/r/325276 (https://phabricator.wikimedia.org/T151325) (owner: 10Gehel) [09:07:38] RECOVERY - MariaDB Slave Lag: m3 on db1048 is OK: OK slave_sql_lag Replication lag: 48.89 seconds [09:09:57] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [09:24:47] what are you guys doing [09:25:52] sorry? [09:26:37] 06Operations, 06Analytics-Kanban, 10Traffic, 13Patch-For-Review: Ganglia varnishkafka python module crashing repeatedly - https://phabricator.wikimedia.org/T152093#2849995 (10elukey) 05Open>03Resolved [09:34:17] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [09:36:57] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures [10:02:17] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [10:16:43] (03CR) 10Ema: [C: 032] Release 4.1.4-1wm1 [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/325289 (owner: 10Ema) [10:34:56] (03CR) 10Giuseppe Lavagetto: [C: 032] Add missing Build-Depends [software/service-checker] - 10https://gerrit.wikimedia.org/r/325319 (owner: 10Volans) [10:42:50] 06Operations, 10Wikimedia-General-or-Unknown, 07Availability, 13Patch-For-Review, and 2 others: Job queue size growing since ~12:00 on 2016-11-19 - https://phabricator.wikimedia.org/T151196#2850063 (10Joe) 05Open>03Resolved [10:53:34] (03CR) 10Faidon Liambotis: [C: 031] "LGTM, but see inline comments (and Jenkins' -1)" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) [11:08:07] (03PS1) 10MarcoAurelio: Allow private.dblist wikis to manage more permissions internally [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325531 (https://phabricator.wikimedia.org/T152489) [11:17:57] !log varnish 4.1.4-1wm1 uploaded to carbon [11:18:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:22:47] PROBLEM - puppet last run on analytics1036 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [11:24:15] (03PS1) 10Addshore: Add note regarding source of wdqs lag metrics [puppet] - 10https://gerrit.wikimedia.org/r/325533 [11:50:46] (03PS8) 10Kaldari: Re-enable 'centralauth-rename' rights for when maintenance is done [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [11:51:47] RECOVERY - puppet last run on analytics1036 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [12:07:45] (03CR) 10Alexandros Kosiaris: [C: 032] Remove files/apache/ports.conf and files/apache/ports.conf.ssl [puppet] - 10https://gerrit.wikimedia.org/r/325457 (owner: 10Tim Landscheidt) [12:07:49] (03PS2) 10Alexandros Kosiaris: Remove files/apache/ports.conf and files/apache/ports.conf.ssl [puppet] - 10https://gerrit.wikimedia.org/r/325457 (owner: 10Tim Landscheidt) [12:07:51] (03CR) 10Alexandros Kosiaris: [V: 032] Remove files/apache/ports.conf and files/apache/ports.conf.ssl [puppet] - 10https://gerrit.wikimedia.org/r/325457 (owner: 10Tim Landscheidt) [12:08:14] (03CR) 10Alexandros Kosiaris: [C: 032] Add note regarding source of wdqs lag metrics [puppet] - 10https://gerrit.wikimedia.org/r/325533 (owner: 10Addshore) [12:08:23] (03PS2) 10Alexandros Kosiaris: Add note regarding source of wdqs lag metrics [puppet] - 10https://gerrit.wikimedia.org/r/325533 (owner: 10Addshore) [12:08:26] (03CR) 10Alexandros Kosiaris: [V: 032] Add note regarding source of wdqs lag metrics [puppet] - 10https://gerrit.wikimedia.org/r/325533 (owner: 10Addshore) [12:09:38] !log starting elasticsearch codfw cluster restart for Java 8 upgrade - T151325 [12:09:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:09:49] T151325: Upgrade to Java 8 for cirrus / elasticsearch - https://phabricator.wikimedia.org/T151325 [12:11:28] 06Operations, 10ArchCom-RfC, 06Commons, 10MediaWiki-File-management, and 14 others: Define an official 
thumb API - https://phabricator.wikimedia.org/T66214#2850204 (10Ciencia_Al_Poder) >>! In T66214#2827486, @GWicke wrote: > Since the need for explicit control should be rare, I think using the Accept heade... [12:49:37] PROBLEM - puppet last run on mw1179 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [12:59:55] (03CR) 10Alexandros Kosiaris: [C: 032] ganeti: move id_dsa.pub from /files to modules/role/file/ganeti/ [puppet] - 10https://gerrit.wikimedia.org/r/325461 (owner: 10Dzahn) [12:59:59] (03PS2) 10Alexandros Kosiaris: ganeti: move id_dsa.pub from /files to modules/role/file/ganeti/ [puppet] - 10https://gerrit.wikimedia.org/r/325461 (owner: 10Dzahn) [13:00:01] (03CR) 10Alexandros Kosiaris: [V: 032] ganeti: move id_dsa.pub from /files to modules/role/file/ganeti/ [puppet] - 10https://gerrit.wikimedia.org/r/325461 (owner: 10Dzahn) [13:08:33] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0.1 - https://phabricator.wikimedia.org/T152473#2850316 (10Gilles) [13:18:37] RECOVERY - puppet last run on mw1179 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [13:30:40] 06Operations, 06Performance-Team, 10Thumbor: Record OOM kills as a metric with mtail - https://phabricator.wikimedia.org/T148962#2850372 (10Gilles) [13:30:43] 06Operations, 06Performance-Team, 10Thumbor: Investigate why oom_kill mtail program doesn't work properly - https://phabricator.wikimedia.org/T149980#2850370 (10Gilles) 05Open>03declined Won't pursue mtail, will look at surfacing the kills in logstash instead [13:32:50] 06Operations, 06Performance-Team, 10Thumbor: Thumbor resource consumption is spiky - https://phabricator.wikimedia.org/T151851#2850375 (10Gilles) Seems to have gotten better since I fixed the big memory leak, but we'll have to wait and see. 
[13:33:20] (03CR) 10Alexandros Kosiaris: [C: 032] zuul: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325504 (owner: 10Tim Landscheidt) [13:33:24] (03PS2) 10Alexandros Kosiaris: zuul: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325504 (owner: 10Tim Landscheidt) [13:33:27] (03CR) 10Alexandros Kosiaris: [V: 032] zuul: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325504 (owner: 10Tim Landscheidt) [13:33:49] (03PS2) 10Alexandros Kosiaris: wikistats: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325503 (owner: 10Tim Landscheidt) [13:33:53] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] wikistats: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325503 (owner: 10Tim Landscheidt) [13:36:46] (03PS2) 10Gehel: logstash: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325494 (owner: 10Tim Landscheidt) [13:38:01] (03CR) 10Gehel: [C: 032] logstash: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325494 (owner: 10Tim Landscheidt) [13:39:39] (03PS2) 10Gehel: elasticsearch: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325486 (owner: 10Tim Landscheidt) [13:40:57] PROBLEM - puppet last run on labstore1002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:42:26] (03CR) 10Gehel: [C: 032] elasticsearch: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325486 (owner: 10Tim Landscheidt) [13:42:57] RECOVERY - puppet last run on labstore1002 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [13:44:07] (03PS2) 10Alexandros Kosiaris: otrs: Provision mpm_prefork.conf [puppet] - 10https://gerrit.wikimedia.org/r/271543 [13:48:20] (03PS3) 10Alexandros Kosiaris: otrs: Provision mpm_prefork.conf [puppet] - 10https://gerrit.wikimedia.org/r/271543 [13:50:48] (03CR) 10Steinsplitter: [C: 031] "T148242#2850188" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [13:52:28] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Thumbor can't render a few SVGs that Mediawiki can - https://phabricator.wikimedia.org/T150754#2850432 (10Gilles) Quite strange, this works fine locally. The magic string is definitely within the first 4k of data for that file. [13:53:05] (03Abandoned) 10Cenarium: Remove 'validate' from enwiki reviewers [mediawiki-config] - 10https://gerrit.wikimedia.org/r/318018 (owner: 10Cenarium) [13:54:57] hashar: looks like it'll be a busy swat today [13:55:15] what's the plan? should I do it, or are you planning to do it? [13:56:15] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Thumbor can't render a few SVGs that Mediawiki can - https://phabricator.wikimedia.org/T150754#2850450 (10Gilles) Ah, I think I understand why... The SVG engine, which is what runs in the tests, looks at 4K of data. But the https loaderr won'... [14:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Respected human, time to deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T1400). Please do the needful. 
[14:00:04] Urbanecm, dcausse, and kaldari: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [14:00:11] Present [14:00:40] here [14:00:43] o/ [14:01:08] anybody wants to do swat, or should I? [14:01:45] o/ [14:02:48] ok, looks like I'm doing the swat then! :D [14:04:30] Urbanecm: can your patches be tested at mwdebug1002? (once they are there) [14:05:24] Only the abusefilter one as I'm cswiki sysop. I can't test import because I'm not a sysop. I'll ask them. [14:06:06] (03PS3) 10Zfilipin: Create import sources list for hsbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325275 (https://phabricator.wikimedia.org/T152382) (owner: 10Urbanecm) [14:07:54] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325275 (https://phabricator.wikimedia.org/T152382) (owner: 10Urbanecm) [14:08:34] (03Merged) 10jenkins-bot: Create import sources list for hsbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325275 (https://phabricator.wikimedia.org/T152382) (owner: 10Urbanecm) [14:09:15] (03PS4) 10Zfilipin: Disable wgAbuseFilterProfile at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325323 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm) [14:13:34] zeljkof, Am I supposed to do something? [14:14:01] Urbanecm: sorry, "scap pull" is taking really long now :| [14:14:08] will try again [14:14:23] zeljkof, okay, will wait. I only asked in case you forgot to ping me :) [14:14:42] Urbanecm: sorry, forgot to let you know I'm waiting [14:14:46] Urbanecm: 325275 is at mwdebug1002, please test [14:15:18] zeljkof, I said I can test only the abusefilter one. As I can see 325275 is the import one. [14:15:29] I'm not a hsbwiki sysop so I must ask them...
[14:15:48] Urbanecm: oops, sorry [14:15:48] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Thumbor can't render a few SVGs that Mediawiki can - https://phabricator.wikimedia.org/T150754#2850490 (10Gilles) [14:16:00] are they around now? or should we deploy to cluster? [14:16:26] I asked one hsbwiki sysop. Can we move to the abusefilter one and wait if they will reply? [14:17:45] Urbanecm: sure, the patches touch different files, so we should be able to deploy them independently [14:17:59] zeljkof, okay. [14:20:30] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325323 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm) [14:21:08] (03Merged) 10jenkins-bot: Disable wgAbuseFilterProfile at cswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325323 (https://phabricator.wikimedia.org/T149899) (owner: 10Urbanecm) [14:21:39] (03PS3) 10Zfilipin: Add a wiki configuration tag for configured language [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319253 (https://phabricator.wikimedia.org/T149755) (owner: 10EBernhardson) [14:22:30] Does somebody see this messsage? [14:22:40] Urbanecm: I do [14:23:00] Urbanecm: 325323 is at mwdebug1002, please test [14:23:34] 06Operations, 06Performance-Team, 10Thumbor, 15User-Joe: Thumbor instances exit with exit code 0 even when crashing/failing - https://phabricator.wikimedia.org/T149560#2850504 (10Gilles) I'm going to assume that this was an OOM kill by systemd. In which case, I don't know if Thumbor or firejail's behavior... [14:24:29] zeljkof, thanks. I tested my internet connection. I'm at school and there is outage of electric energy :D [14:26:02] Urbanecm: ouch [14:26:13] did you get my message? [14:26:20] "Urbanecm: 325323 is at mwdebug1002, please test" [14:26:21] About 325323? [14:26:23] Yes. [14:26:28] testing? [14:26:33] Trying to. [14:27:19] 325323 works, please deploy it to the whole cluster. 
[14:27:54] Urbanecm: great [14:28:42] And the hsbwiki sysop did not reply. Please deploy it too, hsbwiki didn't crash :). I'll ask them at their Village Pump for testing and maybe you for reverting. [14:28:54] Urbanecm: ok, doing that too [14:29:21] !log zfilipin@tin Synchronized wmf-config/abusefilter.php: SWAT: [[gerrit:325323|Disable wgAbuseFilterProfile at cswiki (T149899)]] (duration: 00m 44s) [14:29:30] Thanks zeljkof. [14:29:35] (03PS4) 10Alexandros Kosiaris: otrs: Provision mpm_prefork.conf [puppet] - 10https://gerrit.wikimedia.org/r/271543 [14:29:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:29:35] T149899: Enable $wgAbuseFilterProfile at cswiki for 14 days - https://phabricator.wikimedia.org/T149899 [14:29:37] (03PS1) 10Alexandros Kosiaris: apache: Allow configuring mpm parameters [puppet] - 10https://gerrit.wikimedia.org/r/325563 [14:29:42] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2850514 (10Gilles) I'm going to move all the remaining tasks to the parent, since for all intents and purposes, Thumbor has been running in production for some time now, just not... 
[14:30:20] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:325275|Create import sources list for hsbwiki (T152382)]] (duration: 00m 44s) [14:30:24] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2850515 (10Gilles) 05Open>03Resolved a:03Gilles [14:30:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:32] T152382: Create import sources list for hsbwiki - https://phabricator.wikimedia.org/T152382 [14:30:46] Urbanecm: both your patches are deployed, please check on production and thanks for flying with #releng :) [14:31:26] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2654782 (10Gilles) [14:31:26] dcausse: you are next! can your patches be tested at mwdebug1002? (once they are there) [14:31:31] 06Operations, 10MediaWiki-Maintenance-scripts, 06Performance-Team, 10Thumbor: ensure thumbor container access is preserved by mw filebackend setzoneaccess - https://phabricator.wikimedia.org/T144479#2850520 (10Gilles) [14:31:41] 06Operations, 06Performance-Team, 10Thumbor: Investigate differences in status codes between thumbor and image scalers - https://phabricator.wikimedia.org/T150641#2850524 (10Gilles) [14:31:43] 06Operations, 06Performance-Team, 10Thumbor: Thumbor inexplicably 504s intermittently on files that render fine later - https://phabricator.wikimedia.org/T150746#2850523 (10Gilles) [14:31:44] zeljkof: sure, if I can mwscript there [14:31:59] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2654800 (10Gilles) [14:32:04] 06Operations, 06Performance-Team, 10Thumbor, 13Patch-For-Review: Thumbor should handle "temp" thumbnail requests - https://phabricator.wikimedia.org/T151441#2850526 (10Gilles) [14:32:13] 06Operations, 06Performance-Team, 10Thumbor: add 
thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2654805 (10Gilles) [14:32:18] 06Operations, 06Performance-Team, 10Thumbor: Implement rate limiter in Thumbor - https://phabricator.wikimedia.org/T151067#2850529 (10Gilles) [14:32:19] zeljkof: the first one should be a noop so nothing to test except that nothing blow up :) [14:32:28] 06Operations, 06Performance-Team, 10Thumbor: Implement PoolCounter support in Thumbor - https://phabricator.wikimedia.org/T151066#2850532 (10Gilles) [14:32:42] dcausse: should I push it to mwdebug1002? or cluster? [14:32:43] 06Operations, 06Performance-Team, 10Thumbor: Implement DC-local cache failure limiter in Thumbor - https://phabricator.wikimedia.org/T151065#2850535 (10Gilles) [14:32:47] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2655233 (10Gilles) [14:32:49] mwdebug1002 first? [14:32:57] zeljkof: yes [14:33:00] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319253 (https://phabricator.wikimedia.org/T149755) (owner: 10EBernhardson) [14:33:05] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2655235 (10Gilles) [14:33:26] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2655241 (10Gilles) [14:33:31] 06Operations, 06Performance-Team, 10Thumbor: Thumbor should reject thumbnail requests that are the same size as the original or bigger - https://phabricator.wikimedia.org/T150741#2850541 (10Gilles) [14:33:36] (03Merged) 10jenkins-bot: Add a wiki configuration tag for configured language [mediawiki-config] - 10https://gerrit.wikimedia.org/r/319253 (https://phabricator.wikimedia.org/T149755) (owner: 10EBernhardson) [14:33:37] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 605 600 - 
REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4788564 keys, up 36 days 6 hours - replication_delay is 605 [14:33:56] (03PS3) 10Zfilipin: [cirrus] enable BM25 on all but wikis with spaceless languages [step 1/3] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324738 (https://phabricator.wikimedia.org/T152092) (owner: 10DCausse) [14:34:01] 06Operations, 06Performance-Team, 10Thumbor: Add request URL to thumbor errors - https://phabricator.wikimedia.org/T151553#2850547 (10Gilles) [14:34:01] sigh... mwscript on mwdebug1002: PHP Fatal error: Class 'Memcached' not found in /srv/mediawiki/php-1.29.0-wmf.4/includes/libs/objectcache/MemcachedPeclBagOStuff.php on line 63 [14:34:04] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2655257 (10Gilles) [14:34:17] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2692233 (10Gilles) [14:34:21] 06Operations, 06Performance-Team, 10Thumbor: Track incoming HTTP request count on the Thumbor boxes - https://phabricator.wikimedia.org/T151554#2850551 (10Gilles) [14:34:21] dcausse: wait, did not push it there yet, I'm a bit slow :| [14:34:23] zeljkof, it works. [14:34:29] 06Operations, 06Performance-Team, 10Thumbor: Thumbor resource consumption is spiky - https://phabricator.wikimedia.org/T151851#2850558 (10Gilles) [14:34:33] 06Operations, 06Performance-Team, 10Thumbor: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2850559 (10Gilles) [14:34:38] Urbanecm: yeah! :) [14:34:42] !log removed varnish 4.1.3-1wm4 and varnishkafka 1.0.12-1 from experimental on carbon [14:34:48] And outage is away :). 
[14:34:48] zeljkof: I was just testing mwscript on mwdebug1002 but it does not work, it needs some deb package I think [14:34:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:56] dcausse: I see [14:35:05] I'll try to test without mwscript [14:36:22] dcausse: 319253 is at mwdebug1002, please test [14:36:29] ok trying [14:38:56] zeljkof: looks good to me [14:39:09] dcausse: great, pushing to the cluster then [14:40:41] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:319253|Add a wiki configuration tag for configured language (T149755)]] (duration: 00m 47s) [14:40:46] dcausse: 319253 is live, please test on production [14:40:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:54] T149755: Enable making configuration changes on a per-language basis - https://phabricator.wikimedia.org/T149755 [14:41:38] (03PS2) 10Alexandros Kosiaris: apache: Allow configuring mpm parameters [puppet] - 10https://gerrit.wikimedia.org/r/325563 [14:41:43] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] apache: Allow configuring mpm parameters [puppet] - 10https://gerrit.wikimedia.org/r/325563 (owner: 10Alexandros Kosiaris) [14:41:54] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324738 (https://phabricator.wikimedia.org/T152092) (owner: 10DCausse) [14:41:59] (03PS5) 10Alexandros Kosiaris: otrs: Provision mpm_prefork.conf [puppet] - 10https://gerrit.wikimedia.org/r/271543 [14:42:03] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] otrs: Provision mpm_prefork.conf [puppet] - 10https://gerrit.wikimedia.org/r/271543 (owner: 10Alexandros Kosiaris) [14:42:21] (03PS9) 10Zfilipin: Re-enable 'centralauth-rename' rights for when maintenance is done [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [14:42:28] zeljkof: sounds good [14:42:38] PROBLEM - puppet last run on mc1001 is CRITICAL: 
CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [14:42:38] (03Merged) 10jenkins-bot: [cirrus] enable BM25 on all but wikis with spaceless languages [step 1/3] [mediawiki-config] - 10https://gerrit.wikimedia.org/r/324738 (https://phabricator.wikimedia.org/T152092) (owner: 10DCausse) [14:44:56] dcausse: 324738 is at mwdebug1002, please test [14:45:21] zeljkof: hard to test with mwscript but I'll try [14:45:25] *without [14:47:05] zeljkof: well... I can't really test, but I see nothing obviously wrong when browsing with mwdebug1002 [14:47:25] dcausse: ok, in that case deploying [14:47:29] ok [14:49:04] !log zfilipin@tin Synchronized tests/cirrusTest.php: SWAT: [[gerrit:324738|[cirrus] enable BM25 on all but wikis with spaceless languages [step 1/3] (T152092)]] (duration: 00m 43s) [14:49:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:15] T152092: Activate BM25 on all but wikis with spaceless languages - https://phabricator.wikimedia.org/T152092 [14:49:50] (03PS10) 10Zfilipin: Re-enable 'centralauth-rename' rights for when maintenance is done [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [14:49:59] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:324738|[cirrus] enable BM25 on all but wikis with spaceless languages [step 1/3] (T152092)]] (duration: 00m 44s) [14:50:07] dcausse: 324738 is live, please test on production [14:50:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:50:15] zeljkof: looking [14:50:42] kaldari: can you test 322667 at mwdebug1002? (once it is there) [14:50:52] zeljkof: looks good, thanks! 
[14:51:02] yes [14:51:11] dcausse: great, thanks for flying with #releng :) [14:51:15] ;) [14:51:22] kaldari: great, will be there in a minute or two [14:53:23] (03CR) 10Zfilipin: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [14:53:34] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0.1 - https://phabricator.wikimedia.org/T152473#2850595 (10Krinkle) [14:54:28] kaldari: uh oh, looks like jenkins is busy, merging the patch could take a while... [14:54:36] ok [14:54:54] like 5 minutes or 30 minutes? [14:56:01] kaldari: hard to say, a few minutes I think [14:56:28] ok, cleared, should be done in a minute [14:56:38] (03Merged) 10jenkins-bot: Re-enable 'centralauth-rename' rights for when maintenance is done [mediawiki-config] - 10https://gerrit.wikimedia.org/r/322667 (https://phabricator.wikimedia.org/T148242) (owner: 10MarcoAurelio) [14:56:52] (03CR) 10Alexandros Kosiaris: kubelet: Amend to support more than labs (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/324210 (owner: 10Alexandros Kosiaris) [14:57:34] (03CR) 10Alexandros Kosiaris: Kube-proxy: Amend to support more than labs (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/324211 (owner: 10Alexandros Kosiaris) [14:58:36] kaldari: 322667 is at mwdebug1002, please test [15:00:10] (03PS3) 10Yuvipanda: toollabs: remove host aliases for tools-exec-12[01-11] [puppet] - 10https://gerrit.wikimedia.org/r/324623 (https://phabricator.wikimedia.org/T151980) (owner: 10BryanDavis) [15:01:06] (03CR) 10Yuvipanda: [C: 032 V: 032] "Whelp. I was going to recover by saying 'I' have never done it before, but I clearly have!" [puppet] - 10https://gerrit.wikimedia.org/r/324623 (https://phabricator.wikimedia.org/T151980) (owner: 10BryanDavis) [15:03:23] kaldari: testing? [15:03:34] looking now... 
[15:03:38] ok [15:04:56] zeljkof: looks good on 1002, feel free to sync [15:05:05] kaldari: great, will do [15:06:31] !log zfilipin@tin Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:322667|Re-enable centralauth-rename rights for when maintenance is done (T148242 T151155)]] (duration: 00m 43s) [15:06:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:44] T148242: Fully populate local_user_id and global_user_id fields in production - https://phabricator.wikimedia.org/T148242 [15:06:44] T151155: Suspend centralauth-rename (global rename) rights until 28 November 2016 - https://phabricator.wikimedia.org/T151155 [15:07:25] !log zfilipin@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:322667|Re-enable centralauth-rename rights for when maintenance is done (T148242 T151155)]] (duration: 00m 43s) [15:07:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:07:38] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4761748 keys, up 36 days 6 hours - replication_delay is 59 [15:07:43] kaldari: it's live, please test on production [15:07:46] kaldari: interface is back in prod, i can test renaming a user if needed. btw. i think the global renamers should be ntifxed via mailinglist. and i though that the same not moore than 10 concurrent renamings rule apply. ? [15:07:46] zeljkof: looks good live. Thanks! [15:07:54] !log EU SWAT finished [15:08:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:08:14] kaldari: great, and thanks for flying with #releng today! 
;) [15:08:43] Steinsplitter: Would be great if you could test an actual rename [15:08:49] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0.1 - https://phabricator.wikimedia.org/T152473#2850634 (10Gilles) [15:09:22] Steinsplitter: Can't parse your other 2 sentences [15:09:29] --> https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/Schirrupm [15:10:31] ntifxed = notified ? [15:10:37] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [15:10:42] 06Operations, 10ArchCom-RfC, 06Commons, 10MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2850639 (10Fjalapeno) Not sure if it is helpful to examine, but here is a commercial image service API: https://docs.imgix.com/setup/serving-images ht... [15:10:47] andre__ yes. [15:10:59] ah [15:15:53] 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to testwikis and mediawikiwiki - https://phabricator.wikimedia.org/T150944#2802107 (10Addshore) a:03Addshore [15:16:20] 06Operations, 10Electron-PDFs, 06TCB-Team, 13Patch-For-Review, and 2 others: Deploy ElectronPdfService Extension to testwikis and mediawikiwiki - https://phabricator.wikimedia.org/T150944#2802107 (10Addshore) T152390 and T152424 are related [15:17:33] kaldari: slow, but seems to work :) green light? [cc: legoktm] [15:18:18] Steinsplitter: great. Feel free to continue renaming. [15:20:00] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 6 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2850671 (10Joe) Turns out parsoid's way of contacting the backends wasn't able to support TLS termination, but... [15:24:57] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. 
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [15:33:07] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=147.00 Read Requests/Sec=263.80 Write Requests/Sec=9.70 KBytes Read/Sec=32985.20 KBytes_Written/Sec=299.20 [15:38:05] grrrit-wm1: nick [15:38:27] grrrit-wm1: help [15:38:28] My current commands are: grrrit-wm1: restart, grrrit-wm1: force-restart, and grrrit-wm1: nick [15:39:10] grrrit-wm1: nick [15:39:14] grrrit-wm1: force-restart [15:39:15] Re-connecting to Gerrit and IRC. [15:39:57] re-connected to Gerrit and IRC. [15:43:07] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=200.70 Read Requests/Sec=221.90 Write Requests/Sec=58.00 KBytes Read/Sec=2341.60 KBytes_Written/Sec=5100.40 [15:52:40] (03PS1) 10Elukey: Initial debianization [debs/prometheus-apache-exporter] - 10https://gerrit.wikimedia.org/r/325568 (https://phabricator.wikimedia.org/T147316) [15:52:47] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [15:52:58] PROBLEM - puppet last run on cp3031 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:53:32] 06Operations, 10ops-eqiad: rack/setup prometheus100[3-4] - https://phabricator.wikimedia.org/T152504#2850801 (10Cmjohnson) [15:53:49] (03PS1) 10Cmjohnson: Adding dns entriesf or prometheus1003 and 1004 both production and mgmt. T152504 [dns] - 10https://gerrit.wikimedia.org/r/325569 [15:54:00] (03PS1) 10Hashar: apt: skip commenting non existent old comment [puppet] - 10https://gerrit.wikimedia.org/r/325570 [15:57:34] (03CR) 10Cmjohnson: [C: 032] Adding dns entriesf or prometheus1003 and 1004 both production and mgmt. 
T152504 [dns] - 10https://gerrit.wikimedia.org/r/325569 (owner: 10Cmjohnson) [16:01:19] (03CR) 10Hashar: "The original implementation of comment_out is https://gerrit.wikimedia.org/r/#/c/33098/ and the recent change to enable backports (https:/" [puppet] - 10https://gerrit.wikimedia.org/r/325570 (owner: 10Hashar) [16:03:34] 06Operations, 10MediaWiki-ResourceLoader, 06Performance-Team, 10Traffic: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#2850851 (10Krinkle) >>! In T105657#2613260, @Krinkle wrote: > Or maybe we can bump the startup module exp... [16:07:05] 06Operations, 10Analytics, 10Traffic: A/B Testing solid framework - https://phabricator.wikimedia.org/T135762#2850856 (10Gilles) [16:08:55] 06Operations, 10ops-codfw, 06DC-Ops: ms-be2025 controller failure - https://phabricator.wikimedia.org/T151201#2850860 (10Papaul) HP didn't have the replacement part. They called me this morning to let me know that they do have the part now and a Tech is schedule to me onsite tomorrow Dec. 7th between 10AM... [16:11:27] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 6 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2837406 (10Ottomata) > Not sure what the kafka replication status to codfw Status is A-ok! https://grafana.wik... 
[16:20:35] (03PS1) 10Milimetric: [WIP] Upgrade edit_history to mediawiki_history [puppet] - 10https://gerrit.wikimedia.org/r/325572 [16:21:03] RECOVERY - puppet last run on cp3031 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [16:23:43] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 631 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4765526 keys, up 36 days 8 hours - replication_delay is 631 [16:30:25] (03CR) 10Dzahn: base/ipmi: install freeipmi globally, move to ipmi module (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) [16:31:56] (03PS12) 10Dzahn: base/ipmi: install freeipmi globally, move to ipmi module [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) [16:33:20] (03CR) 10jenkins-bot: [V: 04-1] base/ipmi: install freeipmi globally, move to ipmi module [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) [16:36:31] (03CR) 10Faidon Liambotis: "> The reason am dropping sources.list is default one from Debian ends up clashing with our sources.d/ or introducing duplicates URL having" [puppet] - 10https://gerrit.wikimedia.org/r/325570 (owner: 10Hashar) [16:39:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4761347 keys, up 36 days 8 hours - replication_delay is 0 [16:44:18] (03PS13) 10Dzahn: base/ipmi: install freeipmi globally, move to ipmi module [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) [16:45:00] (03CR) 10Tim Landscheidt: [C: 031] "I did not test it, but looks good to me." 
[puppet] - 10https://gerrit.wikimedia.org/r/325462 (owner: 10Dzahn) [16:45:02] (03CR) 10Hashar: "It came with a line having a line with just the 'main' component:" [puppet] - 10https://gerrit.wikimedia.org/r/325570 (owner: 10Hashar) [16:49:27] (03CR) 10Faidon Liambotis: "How are you building this image?" [puppet] - 10https://gerrit.wikimedia.org/r/325570 (owner: 10Hashar) [16:58:48] (03PS1) 10Tim Landscheidt: Remove obsolete logrotate files [puppet] - 10https://gerrit.wikimedia.org/r/325577 [17:00:05] godog, moritzm, and _joe_: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T1700). Please do the needful. [17:01:40] (03CR) 10Dzahn: [C: 032] "recompiled one last time. http://puppet-compiler.wmflabs.org/4809/ it's like all we said before, catalogs look fine (besides ignoring the" [puppet] - 10https://gerrit.wikimedia.org/r/320246 (https://phabricator.wikimedia.org/T150160) (owner: 10Dzahn) [17:05:41] (03PS1) 10Tim Landscheidt: Remove obsolete file misc/geoiplogtag [puppet] - 10https://gerrit.wikimedia.org/r/325583 [17:13:07] 06Operations, 10Cassandra, 10RESTBase, 06Services (doing): RESTBase k-r-v as Cassandra anti-pattern (or: revision retention policies considered harmful) - https://phabricator.wikimedia.org/T144431#2851034 (10Eevans) [17:14:23] (03PS1) 10Tim Landscheidt: Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 [17:20:12] 06Operations, 13Patch-For-Review: Remote IPMI doens't work for ~17% of the fleet - https://phabricator.wikimedia.org/T150160#2851050 (10Dzahn) @Volans ^ freeipmi-tools, freeipmi-ipmidetect and freeipmi-bmc-watchdog should now also be installed on all trusty hosts (everywhere except VMs really, but i also did n... 
[17:27:25] (03PS3) 10Dzahn: jsbench: move files to modules/role/files/jsbench/ [puppet] - 10https://gerrit.wikimedia.org/r/325462 [17:27:38] (03Restored) 10BBlack: VCL backends 1/N [WIP] [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) (owner: 10BBlack) [17:27:54] (03Restored) 10BBlack: VCL backends 2/N: sort misc req_handling [puppet] - 10https://gerrit.wikimedia.org/r/300579 (https://phabricator.wikimedia.org/T110717) (owner: 10BBlack) [17:28:00] (03PS11) 10BBlack: VCL app_directors 2/N: sort misc req_handling [puppet] - 10https://gerrit.wikimedia.org/r/300579 (https://phabricator.wikimedia.org/T110717) [17:28:02] (03PS5) 10BBlack: rcstream: single-backend with manual failover [puppet] - 10https://gerrit.wikimedia.org/r/317132 (https://phabricator.wikimedia.org/T147845) [17:28:04] (03PS2) 10BBlack: simplify security_audit backend for deployment-prep [puppet] - 10https://gerrit.wikimedia.org/r/324947 [17:28:06] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 6 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2851079 (10Joe) >>! In T152074#2850862, @Ottomata wrote: >> Not sure what the kafka replication status to codfw... 
[17:28:06] (03PS3) 10BBlack: misc: get rid of hash support and maintenance [puppet] - 10https://gerrit.wikimedia.org/r/324941 [17:28:08] (03PS7) 10BBlack: VCL refactor: split cache/app backend support [puppet] - 10https://gerrit.wikimedia.org/r/324942 [17:28:10] (03PS11) 10BBlack: VCL app_directors refactor 1/N [WIP] [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) [17:28:15] <_joe_> ottomata: see the ticket [17:28:17] <_joe_> :) [17:29:08] Ooo [17:29:42] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 6 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2851080 (10Ottomata) Yup! [17:30:23] bblack :o :) [17:30:28] lunchin...bbl [17:30:41] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/4810/osmium.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/325462 (owner: 10Dzahn) [17:30:53] (03PS1) 10Giuseppe Lavagetto: role::mediawiki::webserver: add TLS local proxy [puppet] - 10https://gerrit.wikimedia.org/r/325591 (https://phabricator.wikimedia.org/T152074) [17:32:56] bblack: could i add this misc-web backend? https://gerrit.wikimedia.org/r/#/c/324797/ [17:35:26] mutante: can it wait a couple days? [17:35:39] bblack: yes [17:36:00] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 7 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2851115 (10greg) >>! In T152074#2849092, @greg wrote: > @Joe Should I make this an explicit follow-up from the... [17:37:22] 06Operations, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: labservices1001 down - https://phabricator.wikimedia.org/T152340#2845101 (10greg) This is now back up, yes? :) Incident report filed at https://wikitech.wikimedia.org/wiki/Incident_documentation/20161204-labservices1001 Closable? Follow-ups... 
[17:38:10] jouncebot: next [17:38:10] In 0 hour(s) and 21 minute(s): Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T1800) [17:39:17] (03CR) 1020after4: [C: 031] Move mwdeploy home to /var/lib where it belongs, it's a system user [puppet] - 10https://gerrit.wikimedia.org/r/323867 (https://phabricator.wikimedia.org/T86971) (owner: 10Chad) [17:39:35] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/#/c/318451/" [puppet] - 10https://gerrit.wikimedia.org/r/325510 (owner: 10Tim Landscheidt) [17:39:50] (03CR) 10Dzahn: "https://gerrit.wikimedia.org/r/325510" [puppet] - 10https://gerrit.wikimedia.org/r/318451 (owner: 10Dzahn) [17:41:50] (03CR) 1020after4: [C: 032] Update Kafka analytics broker list for deployment-prep [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287741 (owner: 10Ottomata) [17:41:59] (03CR) 10jenkins-bot: [V: 04-1] Update Kafka analytics broker list for deployment-prep [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287741 (owner: 10Ottomata) [17:43:15] (03CR) 10Chad: [C: 04-1] "Daniel is right, we shouldn't merge just yet until .etcdrc is fixed to go to the right place." 
[puppet] - 10https://gerrit.wikimedia.org/r/323867 (https://phabricator.wikimedia.org/T86971) (owner: 10Chad) [17:43:36] (03PS2) 10Dzahn: racktables: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325498 (owner: 10Tim Landscheidt) [17:43:40] (03CR) 10Dzahn: [C: 032] racktables: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325498 (owner: 10Tim Landscheidt) [17:45:09] (03PS2) 10Dzahn: publichtml: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325497 (owner: 10Tim Landscheidt) [17:45:13] (03CR) 10Dzahn: [C: 032] publichtml: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325497 (owner: 10Tim Landscheidt) [17:47:25] (03CR) 10Dzahn: [C: 04-1] "this is a file not a template, shouldn't it be puppet:///modules/icinga/logrotate.conf ?" [puppet] - 10https://gerrit.wikimedia.org/r/325491 (owner: 10Tim Landscheidt) [17:49:20] (03PS2) 10Dzahn: wikimania_scholarships: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325484 (owner: 10Tim Landscheidt) [17:50:07] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 7 others: Separate clusters for asynchronous processing from the ones for public consumption - https://phabricator.wikimedia.org/T152074#2851136 (10GWicke) > the issue we're seeing here is excessive request rate from Change Propagation and probably... 
[17:51:13] (03CR) 10Dzahn: [C: 032] wikimania_scholarships: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325484 (owner: 10Tim Landscheidt) [17:51:27] (03CR) 1020after4: [C: 031] "This will be needed soon enough, upstream has already committed fixes for IPv6" [puppet] - 10https://gerrit.wikimedia.org/r/324841 (owner: 1020after4) [17:52:51] (03PS2) 10Dzahn: iegreview: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325473 (owner: 10Tim Landscheidt) [17:54:11] (03CR) 10Dzahn: [C: 04-1] "this one won't work, just renamed that file to give it an .sh extension" [puppet] - 10https://gerrit.wikimedia.org/r/325492 (owner: 10Tim Landscheidt) [17:54:31] (03CR) 10Dzahn: [C: 032] iegreview: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325473 (owner: 10Tim Landscheidt) [17:55:04] (03PS2) 10Dzahn: rsync: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325500 (owner: 10Tim Landscheidt) [17:56:47] (03CR) 10Dzahn: [C: 032] rsync: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325500 (owner: 10Tim Landscheidt) [17:57:07] (03PS2) 10Dzahn: gerrit: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325490 (owner: 10Tim Landscheidt) [17:58:19] (03CR) 10Dzahn: [C: 031] Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [17:59:13] (03CR) 10Dzahn: [C: 031] "looks really old" [puppet] - 10https://gerrit.wikimedia.org/r/325583 (owner: 10Tim Landscheidt) [18:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T1800). Please do the needful. 
[18:00:13] (03CR) 10Dzahn: [C: 032] gerrit: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325490 (owner: 10Tim Landscheidt) [18:00:54] (03PS2) 10Dzahn: kibana: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325499 (owner: 10Tim Landscheidt) [18:00:58] (03PS3) 10Dzahn: kibana: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325499 (owner: 10Tim Landscheidt) [18:02:25] (03CR) 10Dzahn: [C: 032] kibana: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325499 (owner: 10Tim Landscheidt) [18:03:08] (03PS1) 10Andrew Bogott: puppet panel: teeny, tiny cleanup [puppet] - 10https://gerrit.wikimedia.org/r/325594 [18:03:10] (03PS1) 10Andrew Bogott: puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 [18:03:26] 06Operations, 10Gerrit, 06Release-Engineering-Team, 10hardware-requests: Requesting 1 spare misc box for Gerrit in codfw - https://phabricator.wikimedia.org/T148187#2851196 (10RobH) The server allocation of WMF6408 was approved by @mark on sub-task T150885, that sub-task also includes the ordering of dual... 
[18:03:38] (03PS2) 10Dzahn: extdist: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325489 (owner: 10Tim Landscheidt) [18:04:57] (03PS2) 10Andrew Bogott: puppet panel: teeny, tiny cleanup [puppet] - 10https://gerrit.wikimedia.org/r/325594 [18:04:59] (03CR) 10jenkins-bot: [V: 04-1] puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (owner: 10Andrew Bogott) [18:05:04] (03PS2) 10Dzahn: exim4: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325488 (owner: 10Tim Landscheidt) [18:06:46] 06Operations, 06Analytics-Kanban, 06Zero, 05Security, 07audits-data-retention: Purge > 90 days stat1002:/a/squid/archive/zero - https://phabricator.wikimedia.org/T92343#2851206 (10Nuria) 05Open>03Resolved [18:06:51] (03CR) 10Dzahn: [C: 032] extdist: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325489 (owner: 10Tim Landscheidt) [18:07:04] (03PS3) 10Dzahn: exim4: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325488 (owner: 10Tim Landscheidt) [18:07:04] 06Operations, 06Analytics-Kanban, 13Patch-For-Review, 05Security, 07audits-data-retention: Purge > 90 days stat1002:/a/squid/archive/glam_nara - https://phabricator.wikimedia.org/T92340#2851209 (10Nuria) 05Open>03Resolved [18:07:41] 06Operations, 06Analytics-Kanban, 06Zero, 07Mobile, and 2 others: Purge > 90 days stat1002:/a/squid/archive/mobile - https://phabricator.wikimedia.org/T92341#2851213 (10Nuria) 05Open>03Resolved [18:07:57] 06Operations, 06Analytics-Kanban, 06Zero, 05Security, 07audits-data-retention: Purge > 90 days stat1002:/a/squid/archive/sampled - https://phabricator.wikimedia.org/T92342#2851215 (10Nuria) 05Open>03Resolved [18:08:08] (03CR) 10Andrew Bogott: [C: 032] puppet panel: teeny, tiny cleanup [puppet] - 10https://gerrit.wikimedia.org/r/325594 (owner: 10Andrew Bogott) [18:08:13] (03PS3) 10Andrew Bogott: puppet panel: teeny, tiny cleanup [puppet] - 10https://gerrit.wikimedia.org/r/325594 
[18:08:31] no parsoid deploy today [18:11:23] (03PS2) 10Andrew Bogott: puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 [18:11:30] (03CR) 10Dzahn: [C: 032] exim4: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325488 (owner: 10Tim Landscheidt) [18:11:31] PROBLEM - Check if rsync server is running on labsdb1006 is CRITICAL: PROCS CRITICAL: 0 processes with command name rsync, regex args /usr/bin/rsync --no-detach --daemon [18:11:37] (03PS4) 10Dzahn: exim4: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325488 (owner: 10Tim Landscheidt) [18:12:38] (03PS3) 10Andrew Bogott: puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 [18:15:24] 06Operations, 06Analytics-Kanban, 05Security, 07audits-data-retention: Purge > 90 days stat1002:/a/squid/archive/api - https://phabricator.wikimedia.org/T92338#1106670 (10Dzahn) @Milimetric is this resolved? just noticed the parent task was closed today. [18:16:19] (03CR) 10Andrew Bogott: puppettab: Add 'unknown classes' widget (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/325595 (owner: 10Andrew Bogott) [18:16:52] (03PS2) 10Dzahn: eventlogging: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325487 (owner: 10Tim Landscheidt) [18:19:47] 06Operations, 06Performance-Team: Upgrade labmon1001 Grafana to 4.0.1 - https://phabricator.wikimedia.org/T152473#2851253 (10fgiunchedi) I'd be ok with trying the 4.0 on grafana-labs, adding #labs for visibility. The process would be to download/verify/install the 4.0 deb on labmon and if it goes well update t... [18:21:56] 06Operations, 10Ops-Access-Requests, 10Gerrit: Root for Mukunda for Gerrit machine(s) - https://phabricator.wikimedia.org/T152236#2842574 (10Florian) Would be great to have one more person able to maintain gerrit, and from what I know from him, I would trust him! 
(However, I'm not sure, if my vote counts :P) [18:22:59] 06Operations, 06Analytics-Kanban, 05Security, 07audits-data-retention: Purge > 90 days stat1002:/a/squid/archive/api - https://phabricator.wikimedia.org/T92338#1106670 (10Milimetric) 05Open>03Resolved (just forgot to move this over to the done, the data was deleted and this task is resolved.) [18:23:03] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851281 (10GWicke) [18:23:51] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 48 failures. Last run 2 minutes ago with 48 failures. Failed resources (up to 3 shown): Package[apt-transport-https],Package[tree],Package[ngrep],Package[git] [18:25:49] (03PS1) 10Dzahn: add gerrit2001.mgmt for WMF6408.mgmt [dns] - 10https://gerrit.wikimedia.org/r/325596 (https://phabricator.wikimedia.org/T148186) [18:25:59] (03CR) 10Dzahn: [C: 032] eventlogging: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325487 (owner: 10Tim Landscheidt) [18:26:09] (03PS2) 10Dzahn: statistics: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325482 (owner: 10Tim Landscheidt) [18:26:51] PROBLEM - puppet last run on ms-be1022 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [18:27:10] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851297 (10GWicke) [18:27:56] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2837385 (10GWicke) [18:30:08] (03PS2) 10Dzahn: add gerrit2001.mgmt for WMF6408.mgmt [dns] - 10https://gerrit.wikimedia.org/r/325596 (https://phabricator.wikimedia.org/T148186) [18:30:19] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851305 (10GWicke) [18:30:28] (03CR) 10jenkins-bot: [V: 04-1] add gerrit2001.mgmt for WMF6408.mgmt [dns] - 10https://gerrit.wikimedia.org/r/325596 (https://phabricator.wikimedia.org/T148186) (owner: 10Dzahn) [18:31:41] (03PS4) 10Andrew Bogott: puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) [18:31:54] (03PS3) 10Dzahn: add gerrit2001.mgmt for WMF6408.mgmt [dns] - 10https://gerrit.wikimedia.org/r/325596 (https://phabricator.wikimedia.org/T148186) [18:32:32] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851319 (10Pchelolo) [18:32:47] (03CR) 10Dzahn: [C: 032 V: 032] statistics: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325482 (owner: 10Tim Landscheidt) [18:32:57] (03PS2) 10Dzahn: openldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325478 (owner: 10Tim Landscheidt) [18:34:27] 06Operations, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Port 
application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2851331 (10fgiunchedi) [18:34:29] 06Operations, 10Traffic, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Port vhtcpd statistics from ganglia to prometheus - https://phabricator.wikimedia.org/T147429#2851328 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi This is deployed, I've updated https://grafana.wikimedia.org/dashboard/db... [18:36:08] 06Operations, 10Ops-Access-Requests, 10Gerrit: Root for Mukunda for Gerrit machine(s) - https://phabricator.wikimedia.org/T152236#2842574 (10mmodell) Thanks Florian :) [18:38:20] (03CR) 10Dzahn: [C: 031] openldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325478 (owner: 10Tim Landscheidt) [18:38:24] (03CR) 10Dzahn: [C: 031] noc: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325477 (owner: 10Tim Landscheidt) [18:40:31] RECOVERY - Check if rsync server is running on labsdb1006 is OK: PROCS OK: 1 process with command name rsync, regex args /usr/bin/rsync --no-detach --daemon [18:43:23] I have no clue what that is doing tbh [18:43:52] osm master iirc on labsdb1006, maybe related [18:44:03] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851364 (10GWicke) [18:46:44] (03CR) 10Dzahn: [C: 032] openldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325478 (owner: 10Tim Landscheidt) [18:47:12] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851397 (10GWicke) [18:47:44] (03PS2) 10Dzahn: noc: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325477 (owner: 10Tim Landscheidt) [18:48:11] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 
others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851403 (10Pchelolo) [18:49:56] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851406 (10GWicke) [18:50:38] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 3 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2851407 (10Pchelolo) [18:51:41] (03PS5) 10Andrew Bogott: puppettab: Add 'unknown classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) [18:51:41] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 628 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4767681 keys, up 36 days 10 hours - replication_delay is 628 [18:52:52] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [18:54:46] (03PS1) 10Chad: group0 to wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325600 [18:54:51] RECOVERY - puppet last run on ms-be1022 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [18:55:39] !log demon@tin Started scap: testwiki to wmf.5 to bootstrap [18:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:56:17] 06Operations, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: labservices1001 down - https://phabricator.wikimedia.org/T152340#2851417 (10fgiunchedi) @greg yeah now back up! 
this task is one of the followups though, I'll clarify it a bit [18:57:09] 06Operations, 06Labs, 10Labs-Infrastructure, 07Wikimedia-Incident: labservices1001 down, suspected overheating - https://phabricator.wikimedia.org/T152340#2851418 (10fgiunchedi) [18:58:23] (03CR) 10Dzahn: [C: 032] noc: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325477 (owner: 10Tim Landscheidt) [19:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T1900). [19:00:27] looks like i have the only patch, will deploy [19:00:55] Um, I'm mid-scap [19:01:00] * ostriches stole swat window [19:01:17] ostriches: lemme know when you're done [19:04:18] Go ahead and merge it to master, wmf.4 and wmf.5 [19:04:30] That way we can do it quickly :) [19:04:34] When times comes [19:04:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4760425 keys, up 36 days 10 hours - replication_delay is 39 [19:05:02] No morning SWAT, it's night and foggy here :P [19:05:08] ostriches: sure, i'm just waiting on gerrit atm [19:06:00] mafk: and european midday swat is before the sun comes up this time of year ;-) [19:06:13] Moral of the story: timezones suck, utc 4 life [19:07:23] (03Abandoned) 10Tim Landscheidt: Remove obsolete PPA key files [puppet] - 10https://gerrit.wikimedia.org/r/325510 (owner: 10Tim Landscheidt) [19:07:38] I feel like lighting the fireplace [19:07:48] (03CR) 10Tim Landscheidt: [C: 031] "They are no longer used (IIRC once for Openstack)." [puppet] - 10https://gerrit.wikimedia.org/r/318451 (owner: 10Dzahn) [19:10:24] jouncebot: next [19:10:24] In 0 hour(s) and 49 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T2000) [19:10:41] mafk here too. 
[19:14:30] (03CR) 10Ottomata: "Isn't hash needed for RCStream? We don't plan to turn that off until probably the end of Q4." [puppet] - 10https://gerrit.wikimedia.org/r/324941 (owner: 10BBlack) [19:15:25] (03PS2) 10EBernhardson: contint: Install php7.0-ast for phan [puppet] - 10https://gerrit.wikimedia.org/r/315711 (https://phabricator.wikimedia.org/T132636) (owner: 10Legoktm) [19:15:38] (03CR) 10Ottomata: "Oh, I see, next patch. Cool." [puppet] - 10https://gerrit.wikimedia.org/r/324941 (owner: 10BBlack) [19:16:08] (03CR) 10Ottomata: [C: 031] "This should be fine, especially since we have a tentative deprecation timeline for RCStream." [puppet] - 10https://gerrit.wikimedia.org/r/317132 (https://phabricator.wikimedia.org/T147845) (owner: 10BBlack) [19:16:12] (03CR) 10Ottomata: [C: 031] misc: get rid of hash support and maintenance [puppet] - 10https://gerrit.wikimedia.org/r/324941 (owner: 10BBlack) [19:16:28] (03CR) 10Ottomata: [C: 031] VCL app_directors refactor 1/N [WIP] [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) (owner: 10BBlack) [19:17:41] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 634 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4760789 keys, up 36 days 10 hours - replication_delay is 634 [19:27:01] 06Operations, 13Patch-For-Review: Remote IPMI doesn't work for ~17% of the fleet - https://phabricator.wikimedia.org/T150160#2851569 (10Aklapper) [19:33:52] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-3/2/3: down - Core: cr2-codfw:xe-5/0/1 (Zayo, OGYX/120003//ZYO) 36ms {#2909} [10Gbps wave]BR [19:34:37] 06Operations, 10Ops-Access-Requests: Requesting access to Labs Root for bd808 - https://phabricator.wikimedia.org/T152520#2851625 (10Andrew) [19:35:31] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 41 probes of 
413 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [19:36:21] PROBLEM - Apache HTTP on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:36:51] PROBLEM - HHVM rendering on mw1232 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:37:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4760016 keys, up 36 days 11 hours - replication_delay is 10 [19:38:56] (03CR) 10Dzahn: "@Tim Landscheidt, i merged the ones for templates, but i believe the ones for files are actually not the right puppet URL." [puppet] - 10https://gerrit.wikimedia.org/r/325492 (owner: 10Tim Landscheidt) [19:40:31] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 413 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map [19:42:59] (03PS3) 10EBernhardson: contint: Install php7.0-ast for phan [puppet] - 10https://gerrit.wikimedia.org/r/315711 (https://phabricator.wikimedia.org/T132636) (owner: 10Legoktm) [19:43:20] (03CR) 10EBernhardson: "updated package name to php-ast, and verified on integration-slave-jessie-1002 that the package name is correct." [puppet] - 10https://gerrit.wikimedia.org/r/315711 (https://phabricator.wikimedia.org/T132636) (owner: 10Legoktm) [19:44:38] !log demon@tin Finished scap: testwiki to wmf.5 to bootstrap (duration: 48m 58s) [19:44:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:47:33] ebernhardson: I'm done until noon. [19:47:36] You got 15m :p [19:49:11] 06Operations, 10Gerrit, 06Release-Engineering-Team: setup/install gerrit2001/WMF6408 - https://phabricator.wikimedia.org/T152525#2851729 (10RobH) [19:49:20] 06Operations, 10Gerrit, 06Release-Engineering-Team: setup/install gerrit2001/WMF6408 - https://phabricator.wikimedia.org/T152525#2851729 (10RobH) [19:49:27] (03CR) 10Tim Landscheidt: [C: 031] "Didn't test, but LGTM." 
[puppet] - 10https://gerrit.wikimedia.org/r/325460 (owner: 10Dzahn) [19:49:58] 06Operations, 10Gerrit, 06Release-Engineering-Team: setup/install gerrit2001/WMF6408 - https://phabricator.wikimedia.org/T152525#2851729 (10RobH) [19:50:41] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 631 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4760548 keys, up 36 days 11 hours - replication_delay is 631 [19:51:55] (03CR) 1020after4: [C: 031] contint: Install php7.0-ast for phan [puppet] - 10https://gerrit.wikimedia.org/r/315711 (https://phabricator.wikimedia.org/T132636) (owner: 10Legoktm) [19:51:56] 06Operations, 10ops-codfw: update the label and racktables entry for gerrit2001/WMF6408 & install SSDs - https://phabricator.wikimedia.org/T152527#2851774 (10RobH) [19:51:59] !log asw2-d swapping cable fpc2 <-> fpc5 (paravoid) [19:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:53:43] 06Operations, 10ops-codfw: update the label and racktables entry for gerrit2001/WMF6408 & install SSDs - https://phabricator.wikimedia.org/T152527#2851794 (10RobH) [19:54:01] ostriches: :P thanks [19:56:37] !log ebernhardson@tin Synchronized php-1.29.0-wmf.5/extensions/PageImages: T152155: Add job queue option to PageImages initImageData maint script (duration: 00m 45s) [19:56:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:56:48] T152155: Thumbnails are not showing in search on multiple platforms - https://phabricator.wikimedia.org/T152155 [19:57:41] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4759648 keys, up 36 days 11 hours - replication_delay is 38 [19:58:39] !log ebernhardson@tin Synchronized php-1.29.0-wmf.4/extensions/PageImages: T152155: Add job queue option to PageImages initImageData maint script (duration: 00m 45s) [19:58:51] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:00:04] ostriches: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161206T2000). [20:00:21] ostriches: ok all done [20:00:34] Awesome [20:03:35] (03CR) 10Ottomata: [C: 031] Update Kafka analytics broker list for deployment-prep [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287741 (owner: 10Ottomata) [20:03:41] (03PS1) 10Dzahn: move "pcntl" from operations/puppet repo over here [software] - 10https://gerrit.wikimedia.org/r/325616 [20:03:50] (03CR) 10Tim Landscheidt: "*argl* I'm sure I have been bitten by this inconsistency in the past. I'll amend/revert the changes accordingly." [puppet] - 10https://gerrit.wikimedia.org/r/325492 (owner: 10Tim Landscheidt) [20:03:52] (03PS1) 10Rush: tools: setup a 'logs' dir in each tool on creation [puppet] - 10https://gerrit.wikimedia.org/r/325617 [20:04:03] (03CR) 10Tim Landscheidt: "(Thanks.)" [puppet] - 10https://gerrit.wikimedia.org/r/325492 (owner: 10Tim Landscheidt) [20:05:13] (03CR) 10Chad: [C: 032] group0 to wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325600 (owner: 10Chad) [20:05:25] (03CR) 10Dzahn: [C: 032] "just importing file from puppet repo" [software] - 10https://gerrit.wikimedia.org/r/325616 (owner: 10Dzahn) [20:05:57] (03Merged) 10jenkins-bot: group0 to wmf.5 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325600 (owner: 10Chad) [20:06:12] (03PS2) 10Dzahn: Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:06:26] (03CR) 10Dzahn: [V: 032] move "pcntl" from operations/puppet repo over here [software] - 10https://gerrit.wikimedia.org/r/325616 (owner: 10Dzahn) [20:08:22] (03PS3) 10Dzahn: Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:08:38] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions 
files: group0 to wmf.5 [20:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:05] (03CR) 10Dzahn: "moved to operations/software repo" [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:10:53] (03CR) 10Filippo Giunchedi: [C: 04-1] "LGTM overall, only invalid flag in default file" (031 comment) [debs/prometheus-apache-exporter] - 10https://gerrit.wikimedia.org/r/325568 (https://phabricator.wikimedia.org/T147316) (owner: 10Elukey) [20:12:19] (03PS2) 10Tim Landscheidt: base: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325468 [20:12:59] (03PS4) 10Dzahn: Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:14:41] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 622 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4760268 keys, up 36 days 11 hours - replication_delay is 622 [20:17:34] (03PS6) 10Andrew Bogott: puppettab: Add 'Other classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) [20:18:34] (03CR) 10Dzahn: "kind of strange how PS3 did not remove the file anymore, eh?" 
[puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:18:42] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4759105 keys, up 36 days 11 hours - replication_delay is 49 [20:22:04] (03PS2) 10Tim Landscheidt: scap: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325501 [20:22:49] (03CR) 10Dzahn: [C: 032] Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:23:13] (03CR) 10Dzahn: [V: 032] Remove obsolete file misc/scripts/pcntl [puppet] - 10https://gerrit.wikimedia.org/r/325587 (owner: 10Tim Landscheidt) [20:24:49] (03Abandoned) 10Tim Landscheidt: aptly: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325485 (owner: 10Tim Landscheidt) [20:27:15] (03CR) 10BryanDavis: "a few inline nits. I'm not sure if it's worth the effort to rename all the files and classes with s/unknown/other/ or not." (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) (owner: 10Andrew Bogott) [20:27:54] (03PS2) 10Tim Landscheidt: beta: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325469 [20:29:14] ostriches hi, I'm wondering if this cron https://github.com/wikimedia/operations-puppet/blob/1be74f76d12bd31e1989bcab99822284d6113d7e/modules/gerrit/manifests/crons.pp#L2 is still needed? [20:29:28] Yes [20:29:41] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:29:50] Ok [20:31:37] (03PS2) 10Tim Landscheidt: contint: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325470 [20:32:43] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:32:43] (03CR) 10Alex Monk: [C: 04-1] "Have any of them asked for it?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/325531 (https://phabricator.wikimedia.org/T152489) (owner: 10MarcoAurelio) [20:32:51] PROBLEM - puppet last run on elastic1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:33:50] (03PS2) 10Tim Landscheidt: dataset: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325472 [20:36:56] (03PS2) 10Tim Landscheidt: icinga: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325491 [20:37:36] (03PS7) 10Andrew Bogott: puppettab: Add 'Other classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) [20:40:42] (03Abandoned) 10Tim Landscheidt: install_server: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325474 (owner: 10Tim Landscheidt) [20:43:18] (03PS2) 10Tim Landscheidt: ipmi: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325492 [20:45:21] (03PS2) 10Tim Landscheidt: labstore: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325493 [20:47:14] (03CR) 10MarcoAurelio: "> Have any of them asked for it?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/325531 (https://phabricator.wikimedia.org/T152489) (owner: 10MarcoAurelio) [20:47:17] (03Abandoned) 10Tim Landscheidt: ldap: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325475 (owner: 10Tim Landscheidt) [20:48:44] (03PS2) 10Tim Landscheidt: mediawiki: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325476 [20:49:47] (03PS2) 10Tim Landscheidt: mediawiki_singlenode: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325495 [20:51:12] (03Abandoned) 10Tim Landscheidt: ocg: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325496 (owner: 10Tim Landscheidt) [20:53:55] (03CR) 10Chad: "Is ipblock-exempt all that common on private wikis?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/325531 (https://phabricator.wikimedia.org/T152489) (owner: 10MarcoAurelio) [20:54:01] (03PS2) 10Tim Landscheidt: openstack: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325479 [20:56:39] (03Abandoned) 10Tim Landscheidt: smokeping: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325480 (owner: 10Tim Landscheidt) [20:56:58] (03Abandoned) 10Tim Landscheidt: snapshot: Fix puppet URLs in comments [puppet] - 10https://gerrit.wikimedia.org/r/325481 (owner: 10Tim Landscheidt) [20:58:18] (03PS2) 10Tim Landscheidt: udp2log: Fix puppet URL in comment [puppet] - 10https://gerrit.wikimedia.org/r/325483 [20:58:41] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [21:00:01] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [21:00:03] (03PS8) 10Andrew Bogott: puppettab: Add 'Other classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) [21:00:11] PROBLEM - check_puppetrun on pay-lvs1002 is CRITICAL: CRITICAL: Puppet has 10 failures [21:00:44] (03PS1) 10Cmjohnson: Adding mac addresses for prometheus1003 and 1004 T152504 [puppet] - 10https://gerrit.wikimedia.org/r/325628 [21:00:51] RECOVERY - puppet last run on elastic1028 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [21:02:31] PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 54%, RTA = 2609.19 ms [21:02:33] (03PS1) 10Tim Landscheidt: zuul: Fix puppet URL in comment correctly [puppet] - 10https://gerrit.wikimedia.org/r/325629 [21:03:01] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 74.62 ms [21:03:09] mutante ^^ [21:03:15] why is google showing up [21:03:31] PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 0%, RTA = 2135.37 ms [21:03:51] RECOVERY - Host google is UP: 
PING OK - Packet loss = 0%, RTA = 152.01 ms [21:03:58] lol google [21:04:00] ostriches: still deploying? I'm a genius and didn't submodule update before syncing ... [21:04:27] it was a maint script only change so didn't check immediatly (lunch time!) [21:04:41] ebernhardson: Nah I'm done [21:04:49] paladox: it's because there are special checks like this: [21:04:50] check google safe browsing for mediawiki.org [21:05:09] Oh [21:05:12] RECOVERY - check_puppetrun on pay-lvs1002 is OK: OK: Puppet is currently enabled, last run 225 seconds ago with 0 failures [21:05:21] !log CI has some kind of overloading since roughly 19:00UTC investigating. [21:05:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:05:42] paladox: it's about checking that we are not on https://en.wikipedia.org/wiki/Google_Safe_Browsing [21:06:03] oh [21:06:08] paladox: then there needs to be a virtual host those checks are associated with, and that is the Google IP [21:06:18] !log ebernhardson@tin Synchronized php-1.29.0-wmf.4/extensions/PageImages: T152155: Add job queue option to PageImages initImageData maint script (duration: 00m 46s) [21:06:21] oh [21:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:06:31] T152155: Thumbnails are not showing in search on multiple platforms - https://phabricator.wikimedia.org/T152155 [21:07:30] !log ebernhardson@tin Synchronized php-1.29.0-wmf.5/extensions/PageImages: T152155: Add job queue option to PageImages initImageData maint script (duration: 00m 57s) [21:07:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:12:47] !log CI is working. Lot of changes caused it to reach the limit of queries Nodepool can do to wmflabs, the queue is being processed just fine though. 
[21:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:17:32] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2852322 (10hashar) There are a bunch of flake8 errors. Most probably flake8 is not pinned to a specific version and thus use the latest on... [21:19:43] 06Operations, 10Continuous-Integration-Config, 06Operations-Software-Development: tox-jessie is failing on operations/software - https://phabricator.wikimedia.org/T152549#2852326 (10hashar) Looking at previously merged changes, https://gerrit.wikimedia.org/r/#/c/322619/ failed on November 21th. So something... [21:21:11] (03PS1) 10Andrew Bogott: wmfkeystonehooks: Don't bother adding novaobserver to projects [puppet] - 10https://gerrit.wikimedia.org/r/325633 (https://phabricator.wikimedia.org/T150092) [21:21:22] !log CI: pushed a new Jessie image that is faster to boot, should slightly help the current load. 
T113342 [21:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:21:34] T113342: Speed up the time to get a Nodepool instances to achieve READY state - https://phabricator.wikimedia.org/T113342 [21:22:32] (03CR) 10BryanDavis: [C: 031] puppettab: Add 'Other classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) (owner: 10Andrew Bogott) [21:25:20] (03CR) 10Andrew Bogott: [C: 032] puppettab: Add 'Other classes' widget [puppet] - 10https://gerrit.wikimedia.org/r/325595 (https://phabricator.wikimedia.org/T152472) (owner: 10Andrew Bogott) [21:30:11] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [21:33:32] (03PS1) 10Tim Landscheidt: Revert "elasticsearch: Fix puppet URL in comment" [puppet] - 10https://gerrit.wikimedia.org/r/325639 [21:33:45] (03Abandoned) 10Andrew Bogott: wmfkeystonehooks: Don't bother adding novaobserver to projects [puppet] - 10https://gerrit.wikimedia.org/r/325633 (https://phabricator.wikimedia.org/T150092) (owner: 10Andrew Bogott) [21:35:11] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures [21:36:19] (03PS3) 10Dzahn: installserver: move http to own class (kill carbon WIP) [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) [21:37:35] (03CR) 10Dzahn: [C: 032] Revert "elasticsearch: Fix puppet URL in comment" [puppet] - 10https://gerrit.wikimedia.org/r/325639 (owner: 10Tim Landscheidt) [21:38:09] (03PS1) 10Andrew Bogott: Novaobserver: novaobserver isn't in the admin project. 
[puppet] - 10https://gerrit.wikimedia.org/r/325643 (https://phabricator.wikimedia.org/T150092) [21:40:11] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 85 seconds ago with 0 failures [21:42:52] (03PS4) 10Dzahn: installserver: move http to own class [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) [21:42:57] (03CR) 10Dzahn: "http://puppet-compiler.wmflabs.org/4812/" [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [21:49:04] (03PS2) 10Rush: tools: setup a 'logs' dir in each tool on creation [puppet] - 10https://gerrit.wikimedia.org/r/325617 [21:49:26] (03CR) 10Dzahn: [C: 032] "Leroy Jenkins (https://www.youtube.com/watch?v=Rj22DbRoAPM): reeeeeecheck" [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [21:50:07] mutante: haven't seen a good leroy jenkins ref in a while, thanks [21:50:41] chasemp: haha, i just found the video "reimagined as a short film" :) [21:52:01] such a pity the original video got recompressed :( [21:52:06] cant hear the famous phrase anymore [21:53:09] 06Operations, 10Ops-Access-Requests, 10Gerrit: Root for Mukunda for Gerrit machine(s) - https://phabricator.wikimedia.org/T152236#2852453 (10MarcoAurelio) Same as Florian. Not sure if my opinion counts (left a lovely token above just in case) but I think @mmodell would be a good Gerrit mantainer. [21:56:31] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py] [21:57:29] ^ andrewbogott any ideas? 
[21:57:51] (03CR) 10Rush: [C: 032 V: 032] tools: setup a 'logs' dir in each tool on creation [puppet] - 10https://gerrit.wikimedia.org/r/325617 (owner: 10Rush) [21:58:28] chasemp: looking [22:03:01] PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py] [22:03:25] (03CR) 10Cmjohnson: [C: 032] Adding mac addresses for prometheus1003 and 1004 T152504 [puppet] - 10https://gerrit.wikimedia.org/r/325628 (owner: 10Cmjohnson) [22:03:31] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [22:03:35] (03PS2) 10Cmjohnson: Adding mac addresses for prometheus1003 and 1004 T152504 [puppet] - 10https://gerrit.wikimedia.org/r/325628 [22:03:40] (03CR) 10Cmjohnson: [V: 032] Adding mac addresses for prometheus1003 and 1004 T152504 [puppet] - 10https://gerrit.wikimedia.org/r/325628 (owner: 10Cmjohnson) [22:08:46] andrewbogott: that seemed transient to me? [22:09:00] chasemp: 'cause I fixed it :) [22:09:05] well then :D [22:09:30] I mean, I broke it, and then fixed it [22:11:05] (03CR) 10jenkins-bot: [V: 04-1] installserver: move http to own class [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [22:12:28] (03PS5) 10Dzahn: installserver: move http to own class (kill carbon WIP) [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) [22:14:16] (03PS2) 10Dzahn: Revert "elasticsearch: Fix puppet URL in comment" [puppet] - 10https://gerrit.wikimedia.org/r/325639 (owner: 10Tim Landscheidt) [22:15:16] I'm deploying for https://phabricator.wikimedia.org/T152542 [22:16:55] 06Operations, 10ops-eqiad, 13Patch-For-Review: rack/setup prometheus100[3-4] - https://phabricator.wikimedia.org/T152504#2852532 (10fgiunchedi) Thanks @Cmjohnson ! 
re: raid/partman setup it'll be the same as prometheus200[34] in {T151338}. Namely hw raid, first VD on raid1 for the ssd and then raid10 for hdd [22:19:21] (03PS1) 10Andrew Bogott: Horizon: refresh apache anytime django is refreshed [puppet] - 10https://gerrit.wikimedia.org/r/325692 [22:22:13] (03CR) 10Dzahn: [V: 032] Revert "elasticsearch: Fix puppet URL in comment" [puppet] - 10https://gerrit.wikimedia.org/r/325639 (owner: 10Tim Landscheidt) [22:22:41] PROBLEM - puppet last run on ms-be1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:24:41] PROBLEM - Host google is DOWN: PING CRITICAL - Packet loss = 50%, RTA = 2432.47 ms [22:25:11] RECOVERY - Host google is UP: PING OK - Packet loss = 0%, RTA = 8.21 ms [22:27:23] 06Operations, 10media-storage: cronspam cleanup: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly ) - https://phabricator.wikimedia.org/T152440#2852588 (10fgiunchedi) p:05Triage>03Normal The problem is that the cronjob require internet access to download an up... [22:28:04] (03CR) 10Dzahn: [C: 032] installserver: move http to own class (kill carbon WIP) [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [22:28:12] (03PS6) 10Dzahn: installserver: move http to own class (kill carbon WIP) [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) [22:29:51] Confirming it's working mwdebug1002 [22:29:53] going live [22:32:01] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [22:32:17] !log scap sync-dir php-1.29.0-wmf.4/extensions/ORES/includes/ 'Deploy gerrit:325624 (UBN! 
T152542)' [22:32:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:32:30] T152542: Undefined method ORES\Hooks::getDamagingThreshold() - https://phabricator.wikimedia.org/T152542 [22:32:39] !log ladsgroup@tin Synchronized php-1.29.0-wmf.4/extensions/ORES/includes: Deploy gerrit:325624 (UBN! T152542) (duration: 00m 46s) [22:32:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:33:08] Okay, let's backport to wmf.5 [22:33:37] ostriches: ^ [22:33:57] k cool [22:34:56] (03PS3) 10Tim Landscheidt: labstore: Use explicit groups for file resources [puppet] - 10https://gerrit.wikimedia.org/r/324729 (https://phabricator.wikimedia.org/T152095) [22:37:40] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852598 (10greg) [22:38:47] (03CR) 10Dzahn: "confirmed no-op on install1001, on carbon the only change is the motd" [puppet] - 10https://gerrit.wikimedia.org/r/322829 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [22:40:16] okay, deploying the backport for wmf.5. It can be tested so I skip the mwdebug part [22:40:22] *can't [22:43:29] !log scap sync-dir php-1.29.0-wmf.5/extensions/ORES/includes/ 'Deploy gerrit:325624 (UBN! T152542)' [22:43:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:43:40] T152542: Undefined method ORES\Hooks::getDamagingThreshold() - https://phabricator.wikimedia.org/T152542 [22:43:58] !log ladsgroup@tin Synchronized php-1.29.0-wmf.5/extensions/ORES/includes: Deploy gerrit:325624 (UBN! T152542) (duration: 00m 45s) [22:44:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:44:20] ostriches: I'm done [22:44:35] awesome! 
Thanks for the quick turnaround [22:45:08] Happy to help, this was my first mediawiki deployment [22:45:11] that was fun [22:45:53] I'm so adding that to our customer testimonials :P [22:45:55] "That was fun" [22:45:57] :) [22:46:10] that's bash worthy [22:46:22] :)) [22:48:06] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852627 (10GWicke) [22:48:10] Amir1: https://office.wikimedia.org/w/index.php?title=Bash&action=historysubmit&type=revision&diff=202478&oldid=202186 [22:48:22] use !quip :) [22:48:31] https://tools.wmflabs.org/bash/help [22:48:35] he won't be able to read that greg-g [22:48:36] I should get an account in office wiki [22:48:40] eh, !bash :p [22:48:54] https://tools.wmflabs.org/bash/top [22:49:22] good old time bugzilla [22:50:11] Krenair: bah! [22:50:55] there: https://tools.wmflabs.org/bash/quip/AVjWVRGnQMK9DA-FJsIR [22:51:02] Amir1: ^ [22:51:12] Now I see it [22:51:20] thanks greg-g [22:51:42] RECOVERY - puppet last run on ms-be1024 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [22:51:51] productivity [22:52:13] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852629 (10GWicke) [22:52:26] 06Operations, 10fundraising-tech-ops: Port fundraising stats off Ganglia - https://phabricator.wikimedia.org/T152562#2852630 (10fgiunchedi) [22:52:31] PROBLEM - puppet last run on mc1025 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:54:11] 06Operations, 10Cassandra, 10RESTBase, 06Services (doing): RESTBase k-r-v as Cassandra anti-pattern - https://phabricator.wikimedia.org/T144431#2852646 (10Eevans) [22:54:36] 06Operations, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Port application-specific metrics from ganglia to prometheus - https://phabricator.wikimedia.org/T145659#2852647 (10fgiunchedi) [22:54:43] (03PS2) 10Dzahn: installserver: split squid proxy to own class [puppet] - 10https://gerrit.wikimedia.org/r/322830 (https://phabricator.wikimedia.org/T132757) [22:55:44] (03PS1) 10Filippo Giunchedi: swift: get rid of monthly ieee-data cronjob [puppet] - 10https://gerrit.wikimedia.org/r/325699 (https://phabricator.wikimedia.org/T152440) [22:55:48] (03CR) 10jenkins-bot: [V: 04-1] installserver: split squid proxy to own class [puppet] - 10https://gerrit.wikimedia.org/r/322830 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [22:57:06] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852656 (10GWicke) @ssastry @Arlolra: We need to determine what the actual upper bound on render times we want to sup... 
[22:57:13] (03PS3) 10Dzahn: installserver: split squid proxy to own class [puppet] - 10https://gerrit.wikimedia.org/r/322830 (https://phabricator.wikimedia.org/T132757) [23:00:49] !log upgrade grafana on labmon1001 - T152473 [23:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:01:03] T152473: Upgrade labmon1001 Grafana to 4.0.1 - https://phabricator.wikimedia.org/T152473 [23:04:48] (03PS1) 10Tim Landscheidt: Move ve files to role module [puppet] - 10https://gerrit.wikimedia.org/r/325701 [23:05:44] gilles: ^ [23:07:46] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852671 (10ssastry) @gwicke, we set a 3-min render timeout in parsoid so that restbase retries have a chance of succe... [23:20:30] (03PS1) 10Dzahn: add donatetowikipedia.[com|org] as parked domain [dns] - 10https://gerrit.wikimedia.org/r/325706 [23:20:31] RECOVERY - puppet last run on mc1025 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [23:22:35] 06Operations, 10ChangeProp, 06Parsing-Team, 10Parsoid, and 4 others: Check concurrency/retry/timeout limits and syncronize those between services - https://phabricator.wikimedia.org/T152073#2852686 (10GWicke) > But, we can limit it to 2 mins. In that case, RB need not bump its timeout value on retry. 110... [23:22:41] PROBLEM - MD RAID on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [23:23:31] RECOVERY - MD RAID on thumbor1002 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [23:38:51] PROBLEM - puppet last run on elastic1025 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:47:41] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [23:52:59] 06Operations, 10Traffic, 13Patch-For-Review, 05Prometheus-metrics-monitoring: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479#2852882 (10fgiunchedi) I took another look at the cause of UUID/VCL churn, concentrating for now only on the backend va... [23:53:28] !log upgrade prometheus-varnish-exporter on cache boxes in esams - T150479 [23:53:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:41] T150479: Error collecting metrics from varnish_exporter on some misc hosts - https://phabricator.wikimedia.org/T150479