[00:33:25] Operations, Security-Reviews, Security-Team, Wikimedia-Site-requests: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2587813 (Bawolff) [Since I was asked to comment] I personally can't think of any objections to this, h...
[01:03:19] Operations, Security-Reviews, Security-Team, Wikimedia-Site-requests: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2143360 (demon) I'm not sure it would need a url-downloader configuration change anyway? I thought any...
[01:10:33] Operations, Security-Reviews, Security-Team, Wikimedia-Site-requests: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2587826 (TTO) Open>Resolved a:TTO Works for me: https://test.wikipedia.org/wiki/File:Shell...
[01:15:40] who never went to bed
[01:15:59] AND turned the light back on (when did I do that, and why? I don't even remember it)
[01:16:25] and is currently checking dump file integrity because they don't have the sense at 4am to go to sleep
[01:16:27] ?
[01:16:42] someday this ocd thing is going to kill me
[01:19:16] right. trying again... see folks MUCH later
[01:34:58] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 1800.229663 Seconds
[01:37:28] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 119.801113 Seconds
[02:06:40] (PS14) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[02:06:43] (PS1) Andrew Bogott: Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047
[02:07:56] (CR) jenkins-bot: [V: -1] Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[02:08:07] (CR) jenkins-bot: [V: -1] Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047 (owner: Andrew Bogott)
[02:11:50] (PS2) Andrew Bogott: Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047
[02:11:52] (PS15) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[02:13:23] (CR) jenkins-bot: [V: -1] Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[02:13:33] (CR) jenkins-bot: [V: -1] Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047 (owner: Andrew Bogott)
[02:17:36] (PS3) Andrew Bogott: Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047
[02:17:38] (PS16) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[02:18:51] (CR) jenkins-bot: [V: -1] Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047 (owner: Andrew Bogott)
[02:19:59] (PS4) Andrew Bogott: Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047
[02:20:02] (PS17) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[02:21:51] (CR) Andrew Bogott: [C: 2] Compress static content for Horizon [puppet] - https://gerrit.wikimedia.org/r/307047 (owner: Andrew Bogott)
[02:24:45] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.16) (duration: 11m 12s)
[02:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:25:28] (PS18) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[02:25:30] (PS1) Andrew Bogott: Specify a path for the django compression exec [puppet] - https://gerrit.wikimedia.org/r/307048
[02:27:11] (CR) Andrew Bogott: [C: 2] Specify a path for the django compression exec [puppet] - https://gerrit.wikimedia.org/r/307048 (owner: Andrew Bogott)
[02:28:22] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: puppet fail
[02:30:36] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Aug 27 02:30:35 UTC 2016 (duration 5m 50s)
[02:30:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:31:02] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[02:36:30] (PS1) Andrew Bogott: Forward horizon settings to mitaka, for LabTest [puppet] - https://gerrit.wikimedia.org/r/307049
[02:37:49] (PS2) Andrew Bogott: Forward horizon settings to mitaka, for LabTest [puppet] - https://gerrit.wikimedia.org/r/307049
[02:39:19] (CR) Andrew Bogott: [C: 2] Forward horizon settings to mitaka, for LabTest [puppet] - https://gerrit.wikimedia.org/r/307049 (owner: Andrew Bogott)
[06:47:08] PROBLEM - puppet last run on labvirt1011 is CRITICAL: CRITICAL: Puppet has 2 failures
[07:11:50] RECOVERY - puppet last run on labvirt1011 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[08:07:05] (CR) Alex Monk: [C: -1] "This is cherry-picked on deployment-puppetmaster and it's causing issues, 3.2.4-1 is not in jessie-wikimedia so puppet is failing on deplo" [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[08:18:59] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/1/3: down - Core: cr2-esams:xe-0/1/3 (Level3, BDFS2448, 84ms) {#2013} [10Gbps wave]BR
[08:18:59] PROBLEM - Router interfaces on cr2-esams is CRITICAL: CRITICAL: host 91.198.174.244, interfaces up: 57, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-0/1/3: down - Core: cr2-eqiad:xe-4/1/3 (Level3, BDFS2448, 84ms) {#A0010621} [10Gbps wave]BR
[08:32:17] (CR) Alex Monk: "also deployment-mediawiki03" [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[09:06:38] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0
[09:06:38] RECOVERY - Router interfaces on cr2-esams is OK: OK: host 91.198.174.244, interfaces up: 59, down: 0, dormant: 0, excluded: 0, unused: 0
[09:10:14] Operations, Beta-Cluster-Infrastructure, Traffic, Patch-For-Review, WorkType-Maintenance: On beta cluster varnish stats process points to production statsd - https://phabricator.wikimedia.org/T116898#2588202 (hashar)
[09:13:50] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2588203 (hashar)
[09:16:04] (PS4) Hashar: contint: bump pip 7.0.1 -> 8.1.2 [puppet] - https://gerrit.wikimedia.org/r/289639
[09:19:49] (CR) Hashar: [C: 1] "Status:" [puppet] - https://gerrit.wikimedia.org/r/289639 (owner: Hashar)
[09:21:21] (PS2) Hashar: rsync: allow extra settings in rsyncd.conf [puppet] - https://gerrit.wikimedia.org/r/290895 (https://phabricator.wikimedia.org/T136276)
[09:21:28] (PS2) Hashar: contint: disable DNS lookup for castor rsyncd [puppet] - https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276)
[09:22:54] (CR) jenkins-bot: [V: -1] contint: disable DNS lookup for castor rsyncd [puppet] - https://gerrit.wikimedia.org/r/290896 (https://phabricator.wikimedia.org/T136276) (owner: Hashar)
[09:24:01] (CR) Hashar: "Proposed to puppet swat on August 30" [puppet] - https://gerrit.wikimedia.org/r/276346 (https://phabricator.wikimedia.org/T129092) (owner: Hashar)
[10:47:02] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[10:52:05] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[12:29:51] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium - https://phabricator.wikimedia.org/T144088#2588273 (hashar)
[12:30:12] Operations, Continuous-Integration-Infrastructure, Zuul: Upgrade Zuul on gallium to 2.5.0-8-gcbc7f62-wmf1precise1 - https://phabricator.wikimedia.org/T144088#2588290 (hashar)
[12:47:13] (CR) Hashar: [C: -1] "Using packages from the upstream distributions has proven to be a nightmare. They are stuck at an arbitrary version which often has bugs." [puppet] - https://gerrit.wikimedia.org/r/306851 (owner: 20after4)
[14:36:19] (PS19) Andrew Bogott: Horizon tab for modifying instance puppet config [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990)
[14:36:27] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1753.93 seconds
[15:06:41] ACKNOWLEDGEMENT - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 3523.58 seconds Jcrespo Expected lag during schema change - The acknowledgement expires at: 2016-08-29 08:00:00.
[15:08:31] (CR) Paladox: "recheck" [debs/contenttranslation/giella-sme] - https://gerrit.wikimedia.org/r/294430 (https://phabricator.wikimedia.org/T120087) (owner: KartikMistry)
[15:15:34] (CR) Thcipriani: "> This is cherry-picked on deployment-puppetmaster and it's causing" [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[15:23:21] paladox: any reason to recheck on giella-sme, build is already passing.
[15:23:48] kart_ hi, I'm doing some tests due to the test failing on other repos
[15:23:56] paladox: nice. thanks!
[15:24:31] kart_ You're welcome, I am trying to fix https://phabricator.wikimedia.org/T144094
[16:02:45] (CR) Paladox: "recheck" [debs/gerrit] - https://gerrit.wikimedia.org/r/301841 (owner: Paladox)
[16:04:20] (PS9) Paladox: Add gbp.conf file for debian [debs/gerrit] - https://gerrit.wikimedia.org/r/301841
[17:39:44] jynus: ping?
[18:12:49] Dereckson, yes?
[18:16:41] jynus: I wanted your opinion about the best strategy to solve "what is the 33 000 000th file uploaded on Commons?". The community wants it for celebration / press release purpose.
[18:16:54] lol
[18:16:55] lego.ktm showed me the method used
[18:17:10] to guess it
[18:17:27] yes, define it properly and I can help, the first part is the most complex one
[18:18:38] send me an email, though, I suppose this will take some days to happen?
[18:18:51] they are at 33,010,293
[18:19:01] oh
[18:19:12] figure according to https://commons.wikimedia.org/wiki/Special:Statistics?setlang=en
[18:19:58] then that could be almost impossible because of agreement/concurrency for the nature of uploads and deletes
[18:20:29] but if you have an arbitrary method to use, just tell me
[18:24:25] Method seems to be 1. take the max page_id in page table, and the current stat. 2. subtract from page_id the offset currently in the stats. 3. adjust to ignore pages created in namespaces other than the image namespace. 4. take a span of five or six pictures in the surroundings.
[18:27:04] ok, where do I enter there?
[18:27:26] as in, anybody can do that, right? (well, anybody with db access)
[18:28:07] or you just want my opinion on that method?
[18:28:20] opinion would be nice, yes
[18:28:44] I'll do that in half an hour. I've already noted when stat was at 33 010 227, MAX(page_id) was at 50922927
[18:29:21] for me, I would take the image page, order by timestamp, LIMIT 33 million-5, 10 ignoring deleted/suppressed images
[18:29:58] run it on the research slave to avoid creating extra unnecessary load
[18:30:29] it may take an hour to run, but less work
[18:31:29] yes, I suppose you can also do that
[18:40:57] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: puppet fail
[19:08:24] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[19:10:54] (CR) Legoktm: [C: 1] contint: bump pip 7.0.1 -> 8.1.2 [puppet] - https://gerrit.wikimedia.org/r/289639 (owner: Hashar)
[19:16:32] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479
[19:18:59] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4891771 keys - replication_delay is 0
[19:41:55] (CR) Paladox: [C: 1] contint: bump pip 7.0.1 -> 8.1.2 [puppet] - https://gerrit.wikimedia.org/r/289639 (owner: Hashar)
[20:34:15] Operations, Continuous-Integration-Infrastructure: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2588748 (hashar)
[20:35:21] Operations, Continuous-Integration-Infrastructure: Upgrade jenkins-debian-glue on Jessie slaves from 0.13.0 to latest (0.17.0) - https://phabricator.wikimedia.org/T141114#2487529 (hashar) Added to Puppet SWAT of Tuesday, August 30
[21:19:59] (PS1) Andrew Bogott: Set up a root password for Labs instances [puppet] - https://gerrit.wikimedia.org/r/307086 (https://phabricator.wikimedia.org/T142531)
[21:20:47] (CR) Andrew Bogott: [C: -2] "This is totally untested, since I can't actually make a puppet::self instance to test on, because of https://phabricator.wikimedia.org/T14" [puppet] - https://gerrit.wikimedia.org/r/307086 (https://phabricator.wikimedia.org/T142531) (owner: Andrew Bogott)
[21:21:29] Blocked-on-Operations, Puppet, Beta-Cluster-Infrastructure, Labs, Patch-For-Review: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2588790 (hashar) Seems to cause {T144108} :(
[21:21:34] (CR) jenkins-bot: [V: -1] Set up a root password for Labs instances [puppet] - https://gerrit.wikimedia.org/r/307086 (https://phabricator.wikimedia.org/T142531) (owner: Andrew Bogott)
[21:22:16] (CR) Paladox: "recheck" [debs/gerrit] - https://gerrit.wikimedia.org/r/301841 (owner: Paladox)
[21:57:04] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2586022 (AlexMonk-WMF) We don't actually have separate API appservers in beta, and I'm not sure what 'Script servers' refers to?
[21:57:56] Operations, Beta-Cluster-Infrastructure, HHVM: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2588845 (AlexMonk-WMF) Oh, it's in the parent task - that means maintenance script servers. We don't really have separate servers for that either - we could, with a labs qu...
[22:11:04] (CR) Alex Monk: "Okay. So this is blocked on ops uploading the new package to apt.wm.o?" [puppet] - https://gerrit.wikimedia.org/r/307028 (owner: Thcipriani)
[22:23:18] (PS1) ArielGlenn: pep8 for clouseau and remove rules for ignoring flake8 errors [software] - https://gerrit.wikimedia.org/r/307101
[22:31:18] (CR) ArielGlenn: [C: 2] pep8 for clouseau and remove rules for ignoring flake8 errors [software] - https://gerrit.wikimedia.org/r/307101 (owner: ArielGlenn)
[22:33:59] (CR) Alex Monk: Horizon tab for modifying instance puppet config (12 comments) [puppet] - https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) (owner: Andrew Bogott)
[22:40:24] (CR) Volans: pep8 for clouseau and remove rules for ignoring flake8 errors (2 comments) [software] - https://gerrit.wikimedia.org/r/307101 (owner: ArielGlenn)
[22:49:18] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.90 seconds
[23:00:49] (PS16) Alex Monk: Add python version of maintain-replicas script [software] - https://gerrit.wikimedia.org/r/295607 (https://phabricator.wikimedia.org/T138450)
[23:32:24] PROBLEM - MariaDB Slave Lag: s3 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1727.84 seconds
[23:33:24] Puppet, Continuous-Integration-Config, Jenkins: There is no sane way to get arcanist's conduit tokens onto nodepool CI slaves - https://phabricator.wikimedia.org/T140417#2588879 (Paladox) @mmodell we could use puppet to get a the .arcrc file installed where ever it should go with the token of the use...
[23:37:42] Puppet, Continuous-Integration-Config, Jenkins: There is no sane way to get arcanist's conduit tokens onto nodepool CI slaves - https://phabricator.wikimedia.org/T140417#2588881 (Paladox) Or we can get upstream to support some type of anonymous http support without requiring authenticating like that...
[23:39:05] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures
[23:42:18] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2182293 (Paladox) Actually phabricator did this. It is required for http cloning to fix this all you have to do is '''cd /support/bin''' '''sudo ln -sv /user/li...
[23:44:20] Puppet, Labs, Phabricator: Phabricator labs puppet role configures phabricator wrong - https://phabricator.wikimedia.org/T131899#2588885 (Paladox) But I think we need to instead update the main puppet role to work on both production and labs and remove the labs phabricator role once we have done that.
[23:45:02] RECOVERY - MariaDB Slave Lag: s3 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 0.56 seconds
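
Note on the 18:24-18:29 exchange about finding the 33,000,000th Commons upload: the two approaches discussed there can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what was actually run. The function names are hypothetical, the first function covers only steps 1 and 2 of the page_id method (it does not adjust for pages in other namespaces or for deletions), and the second stands in for the "order by timestamp, LIMIT 33 million-5, 10" idea using an in-memory list rather than a database query.

```python
def naive_candidate_page_id(max_page_id, current_file_count, target_count):
    """Steps 1-2 of the page_id method: back off from MAX(page_id) by the
    amount the file counter has already passed the target.
    Hypothetical helper; ignores non-file namespaces and deletions."""
    overshoot = current_file_count - target_count
    if overshoot < 0:
        raise ValueError("the counter has not reached the target yet")
    return max_page_id - overshoot


def window_around_nth(upload_timestamps, n, before=5, size=10):
    """Sort by timestamp, skip n - before rows, return the next `size` rows,
    mirroring an offset/row-count LIMIT of "n-5, 10".
    Hypothetical helper operating on an in-memory iterable."""
    ordered = sorted(upload_timestamps)
    start = max(n - before, 0)
    return ordered[start:start + size]


if __name__ == "__main__":
    # Figures quoted at 18:28:44: stat 33 010 227, MAX(page_id) 50922927.
    print(naive_candidate_page_id(50922927, 33010227, 33000000))  # 50912700
    # Toy data: 20 "uploads" with timestamps 1..20, looking for the 10th.
    print(window_around_nth(range(1, 21), 10))
```

The first result is only a starting point for step 3 and step 4 of the method; the second function shows why a window of ten rows starting five before the target position always contains the target row itself.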