[00:08:23] (03PS1) 10Alex Monk: labs dnsrecursor: tidy up paths [puppet] - 10https://gerrit.wikimedia.org/r/299499 [00:08:25] (03PS1) 10Alex Monk: labs dnsrecursor metaldns: use hiera's labs_tld instead of assuming its value [puppet] - 10https://gerrit.wikimedia.org/r/299500 [00:08:27] (03PS1) 10Alex Monk: labs dnsrecursor metaldns: Resolve PTR records too [puppet] - 10https://gerrit.wikimedia.org/r/299501 (https://phabricator.wikimedia.org/T139438) [00:11:30] (03CR) 10Alex Monk: "Have been testing this in labs-dnsrecursor-test.openstack.eqiad.wmflabs (self-hosted puppet with labsaliaser patched out so it can work wi" [puppet] - 10https://gerrit.wikimedia.org/r/299501 (https://phabricator.wikimedia.org/T139438) (owner: 10Alex Monk) [00:15:00] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [01:47:47] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [01:47:47] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [01:58:46] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[02:04:46] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [02:08:17] (03PS1) 10Alex Monk: dnsrecursor labsaliaser: Set up instance-$instance.$project.wmflabs.org domains for every instance with a public IP [puppet] - 10https://gerrit.wikimedia.org/r/299503 (https://phabricator.wikimedia.org/T104521) [02:20:41] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.10) (duration: 08m 23s) [02:20:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:26:25] !log l10nupdate@tin ResourceLoader cache refresh completed at Mon Jul 18 02:26:24 UTC 2016 (duration 5m 43s) [02:26:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:33:06] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:33:45] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:46] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [02:35:35] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [02:39:25] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures [02:56:29] (03PS1) 10Kharkiv07: Enable RevisionSlider for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299504 (https://phabricator.wikimedia.org/T140551) [03:04:46] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [03:45:27] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:45:38] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [03:48:28] PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: Puppet has 1 failures [04:16:21] RECOVERY - puppet last run on mw2080 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures 
[05:23:15] PROBLEM - puppet last run on cp3006 is CRITICAL: CRITICAL: Puppet has 1 failures [05:33:23] PROBLEM - puppet last run on cp4018 is CRITICAL: CRITICAL: puppet fail [05:49:03] RECOVERY - puppet last run on cp3006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:00:23] RECOVERY - puppet last run on cp4018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:04:43] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 1.429 second response time [06:13:52] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 57.135 second response time [06:20:03] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: Puppet has 1 failures [06:27:52] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 57.024 second response time [06:28:44] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.563 second response time [06:31:32] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:44] PROBLEM - puppet last run on nobelium is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:23] PROBLEM - puppet last run on db1028 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:43] PROBLEM - puppet last run on mw2095 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:43] PROBLEM - puppet 
last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures [06:38:32] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: puppet fail [06:41:23] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [06:45:24] RECOVERY - puppet last run on cp3010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:47:12] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [06:49:03] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.242 second response time [06:50:13] PROBLEM - tools homepage -admin tool- on tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 531 bytes in 0.024 second response time [06:56:43] RECOVERY - puppet last run on nobelium is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [06:57:13] PROBLEM - puppet last run on labcontrol1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:57:14] RECOVERY - puppet last run on db1028 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:58:02] RECOVERY - tools homepage -admin tool- on tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.036 second response time [06:58:32] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:43] RECOVERY - puppet last run on mw2095 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [06:58:43] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:06:25] RECOVERY - puppet last run on mw1164 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [07:22:35] RECOVERY - puppet last run on 
labcontrol1002 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [07:30:55] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 57.233 second response time [07:34:25] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [07:35:04] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 58.168 second response time [07:36:15] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [07:42:35] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Puppet has 1 failures [07:45:04] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 58.635 second response time [07:45:04] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 49.339 second response time [07:45:04] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 49.681 second response time [07:48:56] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[07:50:55] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [07:57:05] (03PS3) 10ArielGlenn: lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) [07:59:55] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.270 second response time [08:03:46] PROBLEM - mobileapps endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [08:05:45] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy [08:07:15] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 58.294 second response time [08:08:54] PROBLEM - HTTPS-wmflabs on tools.wmflabs.org is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: SSL connect attempt failed with unknown error error:00000000:lib(0):func(0):reason(0) [08:09:36] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:14:39] (03CR) 10Mobrovac: "See in-lined question. Also, would you mind cherry-picking this on beta and testing it, since it's a beta-only change?" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) (owner: 10KartikMistry) [08:19:14] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 55.807 second response time [08:22:03] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469863 (10jayvdb) [08:23:46] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469880 (10jayvdb) p:05Triage>03Low [08:26:19] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469863 (10Paladox) Hi it is here, https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/parsers/graveyard/flexbisonparse/ It will need to be migrated to a git repo. [08:31:45] !log gallium: upgrading Zuul 2.1.0-95-g66c8e52-wmf1precise1 .. zuul_2.1.0-151-g30a433b-wmf3precise1 T137525 [08:31:46] T137525: Investigate Zuul 2.1.0-151-g30a433b that stops processing Gerrit events - https://phabricator.wikimedia.org/T137525 [08:31:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:34:37] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) 
sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2469924 (10fgiunchedi) [08:38:36] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 1.414 second response time [08:38:36] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 1.637 second response time [08:38:42] (03PS6) 10KartikMistry: Beta: Fix cxserver restbase_url [puppet] - 10https://gerrit.wikimedia.org/r/298947 (https://phabricator.wikimedia.org/T129284) [08:42:55] 06Operations, 10ops-eqiad, 10Analytics-Cluster: Analytics hosts showed high temperature alarms - https://phabricator.wikimedia.org/T132256#2469960 (10elukey) @Cmjohnson sorry for the late answer! Can we schedule maintenance for a couple of servers to see if it fixes the issue? These are part of the Hadoop cl... [08:43:43] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469964 (10jayvdb) >>! In T140594#2469901, @Paladox wrote: > Hi it is here, https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/parsers/graveyard/flexbisonparse/ I already said that in the original comment. > It wi... [08:45:49] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469969 (10Paladox) Well you can request it here https://www.mediawiki.org/wiki/Git/New_repositories/Requests asking to import https://phabricator.wikimedia.org/diffusion/SVN/browse/trunk/parsers/graveyard/flexbisonparse/ pl... [08:52:34] 06Operations, 10Diffusion: flexbisonparse - https://phabricator.wikimedia.org/T140594#2469973 (10jayvdb) That only solves the problem for this one tool ( flexbisonparse ) . 
The problem exists for all of the other old tools in that SVN repo, most of which are useless now. Doing them all one by one isnt sensi... [08:54:53] !log swift eqiad-prod: ms-be102[3-6] to weight 500 - T136631 [08:54:53] T136631: rack/setup/deploy ms-be102[2-7] - https://phabricator.wikimedia.org/T136631 [08:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:05:14] (03PS1) 10Alex Monk: Delegate 208.80.155.128/25 (labs instances) PTR records to labs-ns* so they can be managed automatically [dns] - 10https://gerrit.wikimedia.org/r/299513 (https://phabricator.wikimedia.org/T104521) [09:05:39] (03CR) 10jenkins-bot: [V: 04-1] Delegate 208.80.155.128/25 (labs instances) PTR records to labs-ns* so they can be managed automatically [dns] - 10https://gerrit.wikimedia.org/r/299513 (https://phabricator.wikimedia.org/T104521) (owner: 10Alex Monk) [09:07:06] (03PS2) 10Alex Monk: Delegate 208.80.155.128/25 (labs instances) PTR records to labs-ns* so they can be managed automatically [dns] - 10https://gerrit.wikimedia.org/r/299513 (https://phabricator.wikimedia.org/T104521) [09:07:09] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [09:07:51] (03PS3) 10Ema: upload VCL: prep for easier V4 migration [puppet] - 10https://gerrit.wikimedia.org/r/299126 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:08:12] (03CR) 10Ema: [C: 032 V: 032] upload VCL: prep for easier V4 migration [puppet] - 10https://gerrit.wikimedia.org/r/299126 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:08:38] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [09:15:02] (03PS5) 10Ema: cache_upload VCL forward port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/298744 (https://phabricator.wikimedia.org/T131502) [09:15:22] (03CR) 10Ema: [C: 032 V: 032] cache_upload VCL forward 
port to Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/298744 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [09:16:38] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:17:08] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [09:18:29] (03PS4) 10Ema: VCL: add call for cluster/layer vcl_backend_fetch for V4 [puppet] - 10https://gerrit.wikimedia.org/r/299129 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:18:44] (03CR) 10Ema: [C: 032 V: 032] VCL: add call for cluster/layer vcl_backend_fetch for V4 [puppet] - 10https://gerrit.wikimedia.org/r/299129 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:25:03] (03PS4) 10ArielGlenn: lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) [09:25:28] (03CR) 10jenkins-bot: [V: 04-1] lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) (owner: 10ArielGlenn) [09:26:33] (03PS5) 10Ema: upload VCL: X-Range hack for V4 [puppet] - 10https://gerrit.wikimedia.org/r/299130 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:27:10] (03CR) 10Ema: [C: 032 V: 032] upload VCL: X-Range hack for V4 [puppet] - 10https://gerrit.wikimedia.org/r/299130 (https://phabricator.wikimedia.org/T131502) (owner: 10BBlack) [09:29:57] woo hoo, first swim of the season! 
back in a couple of hours [09:38:40] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.242 second response time [09:53:00] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 1.475 second response time [09:54:52] (03PS2) 10Addshore: RevisionSlider enables: dewiki, hrwiki, arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298933 (https://phabricator.wikimedia.org/T140232) [09:55:25] (03CR) 10Addshore: [C: 04-1] "Please see the first deployment patch @ https://gerrit.wikimedia.org/r/#/c/298933/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299504 (https://phabricator.wikimedia.org/T140551) (owner: 10Kharkiv07) [09:58:19] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [09:58:49] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [10:06:33] (03PS3) 10Addshore: Add more to stats:wmde config [puppet] - 10https://gerrit.wikimedia.org/r/298931 [10:08:20] 06Operations, 10DBA: db2056 freezed - https://phabricator.wikimedia.org/T140598#2470054 (10jcrespo) [10:08:44] Hi godog! Just wondering if you have time to quickly look at https://gerrit.wikimedia.org/r/#/c/298931/3 ? If not that's okay I'll try and grab otto later! 
[10:10:39] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [10:10:41] !log hard reset for db2056 T140598 [10:10:42] T140598: db2056 freezed - https://phabricator.wikimedia.org/T140598 [10:10:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:13:35] (03PS1) 10Elukey: Sort hash keys in varnishkafka configuration file to avoid file changes. [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/299518 [10:14:24] 06Operations, 10DBA: db2056 freezed - https://phabricator.wikimedia.org/T140598#2470079 (10jcrespo) ``` Slot 0 HP Smart Array P420i Controller (1 GB, v6.00) 1 Logical Drive 1719-Slot 0 Drive Array - A controller failure event occurred prior to this power-up. (Previous lock up code = 0x13) 1792-S... [10:15:19] RECOVERY - Host db2056 is UP: PING OK - Packet loss = 0%, RTA = 37.06 ms [10:19:42] (03CR) 10Elukey: "https://puppet-compiler.wmflabs.org/3360/" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/299518 (owner: 10Elukey) [10:19:56] addshore: yeah sorry today I don't think I can take a look, though feel free to add me to the code review! [10:20:16] That's okay! I'll grab otto later :) [10:23:50] mysql seems healthy, and GTID may have saved the day again [10:24:11] (03PS5) 10Addshore: Introduce wmde-analytics-users group [puppet] - 10https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342) [10:24:25] now to see if I can see anything wrong with the machine itself [10:24:40] 06Operations, 10DBA: db2056 freezed - https://phabricator.wikimedia.org/T140598#2470110 (10jcrespo) No mysql-related logs before the freeze: ``` 160524 12:34:13 [Note] Slave I/O thread: connected to master 'repl@db2017.codfw.wmnet:3306',replication starts at GTID position '0-171970567-2955826593' 160718 10:20... 
[10:25:42] (03PS1) 10Addshore: admin: add addshore to analytics-wmde-users [puppet] - 10https://gerrit.wikimedia.org/r/299522 (https://phabricator.wikimedia.org/T140342) [10:25:51] (03CR) 10Ema: [C: 031] "Good catch!" [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/299518 (owner: 10Elukey) [10:26:15] 06Operations, 10Ops-Access-Requests, 06WMDE-Analytics-Engineering, 13Patch-For-Review, 15User-Addshore: Requesting sudo access to analytics-wmde user on stat1002 for Addshore - https://phabricator.wikimedia.org/T140342#2470112 (10Addshore) [10:27:25] (03CR) 10Elukey: [C: 032] Sort hash keys in varnishkafka configuration file to avoid file changes. [puppet/varnishkafka] - 10https://gerrit.wikimedia.org/r/299518 (owner: 10Elukey) [10:27:26] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-Addshore: MediaWiki deployment shell access request for addshore - https://phabricator.wikimedia.org/T140276#2470114 (10Addshore) [10:28:49] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.307 second response time [10:31:24] (03PS1) 10Elukey: Update the varnishkafka module to the latest revision. [puppet] - 10https://gerrit.wikimedia.org/r/299524 [10:32:55] PROBLEM - puppet last run on cp3003 is CRITICAL: CRITICAL: Puppet has 1 failures [10:36:26] RECOVERY - puppet last run on cp3003 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [10:36:27] 06Operations, 10DBA: db2056 freezed - https://phabricator.wikimedia.org/T140598#2470128 (10jcrespo) No syslog/kernel messages. But I found this on the management logs: ``` hpiLO-> show record6 status=0 status_tag=COMMAND COMPLETED Mon Jul 18 10:34:35 2016 /system1/log1/record6 Targets... 
[10:37:25] ^RAID controller failure, not the first time this has happened, but I have to look up when/where it last happened [10:38:06] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [10:38:23] (03CR) 10Elukey: [C: 032] "https://puppet-compiler.wmflabs.org/3362/" [puppet] - 10https://gerrit.wikimedia.org/r/299524 (owner: 10Elukey) [10:38:28] 06Operations, 10DBA: db2056 RAID controller (temporary) failure - https://phabricator.wikimedia.org/T140598#2470134 (10jcrespo) [10:44:35] PROBLEM - MariaDB Slave Lag: s2 on db2056 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 45340.56 seconds [10:44:56] (03PS16) 10Paladox: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 [10:45:15] (03PS17) 10Paladox: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 [10:46:18] (03CR) 10Paladox: "Added some css to make links blue in gerrit 2.12. They're currently blue in gerrit 2.8 but not in gerrit 2.12." 
[puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [10:48:15] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 2.019 second response time [10:50:41] 06Operations, 10ops-eqiad: db2056 RAID controller (temporary) failure - https://phabricator.wikimedia.org/T140598#2470145 (10jcrespo) [11:05:15] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 2539 [11:06:18] 06Operations, 10ops-codfw, 10media-storage, 13Patch-For-Review: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630#2470171 (10fgiunchedi) [11:08:36] !log swift codfw-prod: ms-be202[567] weight 3000 - T136630 [11:08:37] T136630: rack/setup/deploy ms-be202[2-7] - https://phabricator.wikimedia.org/T136630 [11:08:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:10:15] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1855 [11:13:07] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 1.436 second response time [11:20:15] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 2984676 Threads: 1 Questions: 42998566 Slow queries: 18167 Opens: 3958 Flush tables: 2 Open tables: 583 Queries per second avg: 14.406 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [11:29:55] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.232 second response time [11:35:25] PROBLEM - 
mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [11:35:57] (03PS1) 10Ema: Revert "cache_misc: puppetize switch to file storage" [puppet] - 10https://gerrit.wikimedia.org/r/299526 [11:43:16] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [11:46:43] (03PS5) 10ArielGlenn: lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) [11:58:46] RECOVERY - MariaDB Slave Lag: s2 on db2056 is OK: OK slave_sql_lag Replication lag: 0.81 seconds [12:02:49] (03PS6) 10ArielGlenn: lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) [12:03:16] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.280 second response time [12:03:55] !log Gerrit was slow processing requests such as git pull since 11:17 UTC . 
Fixed by killing all idling/waiting tasks T140604 [12:03:56] T140604: Gerrit internal queue is filled / not processing - https://phabricator.wikimedia.org/T140604 [12:04:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:05:25] (03CR) 10ArielGlenn: [C: 032] lock wikis for dump runs by date, permitting runs across multiple dates [dumps] - 10https://gerrit.wikimedia.org/r/299448 (https://phabricator.wikimedia.org/T126341) (owner: 10ArielGlenn) [12:06:45] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [12:07:36] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [12:16:45] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:21:55] PROBLEM - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 185 bytes in 1.710 second response time [12:22:27] (03Abandoned) 10Kharkiv07: Enable RevisionSlider for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299504 (https://phabricator.wikimedia.org/T140551) (owner: 10Kharkiv07) [12:24:47] (03CR) 10Kharkiv07: [C: 031] RevisionSlider enables: dewiki, hrwiki, arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298933 (https://phabricator.wikimedia.org/T140232) (owner: 10Addshore) [12:25:36] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:27:30] (03PS1) 10ArielGlenn: extend dumps cron script to be able to run partial dumps [puppet] - 10https://gerrit.wikimedia.org/r/299527 (https://phabricator.wikimedia.org/T126339) [12:28:55] PROBLEM - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found 
on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 38.567 second response time [12:28:55] PROBLEM - Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 39.554 second response time [12:30:16] RECOVERY - Verify internal DNS from within Tools on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.410 second response time [12:33:40] (03PS1) 10Yuvipanda: tools: Timeout on toolschecker etcd instances [puppet] - 10https://gerrit.wikimedia.org/r/299529 [12:34:10] (03CR) 10jenkins-bot: [V: 04-1] tools: Timeout on toolschecker etcd instances [puppet] - 10https://gerrit.wikimedia.org/r/299529 (owner: 10Yuvipanda) [12:34:32] (03PS2) 10Yuvipanda: tools: Timeout on toolschecker etcd instances [puppet] - 10https://gerrit.wikimedia.org/r/299529 [12:35:46] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Timeout on toolschecker etcd instances [puppet] - 10https://gerrit.wikimedia.org/r/299529 (owner: 10Yuvipanda) [12:37:06] PROBLEM - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 7.184 second response time [12:37:11] (03PS1) 10BBlack: insecure post: 100% failure, loophole closed [puppet] - 10https://gerrit.wikimedia.org/r/299532 (https://phabricator.wikimedia.org/T136674) [12:37:12] PROBLEM - Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 341 bytes in 7.179 second response time [12:37:13] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 341 bytes in 20.851 
second response time [12:37:13] PROBLEM - Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 20.835 second response time [12:37:13] PROBLEM - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/dumps - 341 bytes in 20.860 second response time [12:37:13] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.627 second response time [12:37:56] PROBLEM - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.369 second response time [12:38:17] ACKNOWLEDGEMENT - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/flannel - 341 bytes in 7.184 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:18] ACKNOWLEDGEMENT - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/etcd/k8s - 341 bytes in 41.510 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:18] ACKNOWLEDGEMENT - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/dumps 
- 341 bytes in 20.860 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:18] ACKNOWLEDGEMENT - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 341 bytes in 20.851 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:18] ACKNOWLEDGEMENT - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.627 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:18] ACKNOWLEDGEMENT - Start and verify pages via webservices on kubernetes on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/webservice/kubernetes - 299 bytes in 0.369 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:24] ACKNOWLEDGEMENT - Test LDAP for query on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/ldap - 341 bytes in 7.179 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:24] ACKNOWLEDGEMENT - Verify internal DNS from within Tools on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 502 Bad Gateway - string OK not found on http://checker.tools.wmflabs.org:80/labs-dns/private - 341 bytes in 20.835 second response time Yuvi Panda https://gerrit.wikimedia.org/r/#/c/299529/ + actual k8s etcd node being stuck [12:38:55] RECOVERY - Verify internal DNS from within Tools on 
checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.023 second response time [12:38:55] RECOVERY - Make sure enwiki dumps are not empty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.021 second response time [12:38:56] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.382 second response time [12:39:13] RECOVERY - Test LDAP for query on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 3.775 second response time [12:40:46] PROBLEM - puppet last run on mw2171 is CRITICAL: CRITICAL: Puppet has 1 failures [12:41:57] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [12:44:45] RECOVERY - All k8s etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.713 second response time [12:47:56] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:54:35] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [12:57:04] (03CR) 10Florianschmidtwelzow: [C: 031] RevisionSlider enables: dewiki, hrwiki, arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298933 (https://phabricator.wikimedia.org/T140232) (owner: 10Addshore) [13:05:35] RECOVERY - puppet last run on mw2171 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [13:10:45] PROBLEM - puppet last run on mw2137 is CRITICAL: CRITICAL: Puppet has 1 failures [13:13:04] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[13:13:42] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470483 (10chasemp) [13:14:49] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (10chasemp) [13:16:01] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470486 (10yuvipanda) Should get a new key setup for labs root, and probably good to rotate all keys anyway. [13:17:14] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [13:17:22] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470490 (10yuvipanda) [13:18:17] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470494 (10chasemp) [13:18:21] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470495 (10yuvipanda) [13:19:41] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470499 (10chasemp) [13:21:06] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (10chasemp) [13:21:56] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2470506 (10chasemp) [13:23:45] madhuvishy: ---^ :( :( :( [13:34:40] (03PS2) 10ArielGlenn: extend dumps cron job to run partial dumps as well [puppet] - 
10https://gerrit.wikimedia.org/r/299527 (https://phabricator.wikimedia.org/T126339) [13:37:04] RECOVERY - puppet last run on mw2137 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [13:38:35] RECOVERY - BGP status on cr2-eqiad is OK: BGP OK - up: 131, down: 0, shutdown: 0 [13:38:45] YuviPanda: your last puppet commit wasn't merged on strontium, I just did so [13:38:54] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [13:39:06] ah, thanks paravoid [13:39:18] I missed looking for it in my merge... [13:40:15] RECOVERY - All Flannel etcd nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.818 second response time [13:51:48] 06Operations, 06Commons, 10media-storage: Install mscorefonts on scaling servers for SVG rendering - https://phabricator.wikimedia.org/T140141#2470645 (10fgiunchedi) @kaldari indeed you are right `fonts-liberation` is already installed! I'll check with legal about the EULA [13:54:49] 06Operations, 07Performance: Query for Special:Contributions?contribs=newbie&dir=prev is slow on enwiki - https://phabricator.wikimedia.org/T140537#2470666 (10jcrespo) Sorry, I am not very familiar with this functionality "newbies"- correct me if any assumption I make is wrong. What I can infer from the querie... [13:57:10] Is it just me, or is this not operations? This is mediawiki-something, right? https://phabricator.wikimedia.org/T140537#2470666 [13:57:28] I meant the ticket, not the comment [13:57:50] jynus: probably #DBA instead of #operations ? [13:57:56] DBA? [13:58:16] https://phabricator.wikimedia.org/project/profile/1060/ [13:58:21] the DBA seems to have commented on the question already [13:58:42] I was referring to potentially correcting associated tags :) [13:58:50] no, that is ok [13:58:57] I think it's tagged because it's causing db issues on the wmf side [13:59:04] ok [13:59:04] ah, ok... 
apart from that there's also #MediaWiki-Database for the DB code in MW core [13:59:14] but the core issue is the db query generating it [13:59:33] as in "ping the DBA/ask for advice" it is ok [13:59:52] or as you said, p858snake, especially if it caused infrastructure problems [14:00:09] as an aside the project description on #DBA is pretty awesome [14:00:22] ha, chasemp [14:00:32] someone forced me to add that [14:00:38] I didn't want to [14:01:15] well it looks good either way :) [14:02:14] cmjohnson1: hi! Could you check https://phabricator.wikimedia.org/T119488 later on if you have time? I'd need to alert analytics related mailing lists if this is going to happen to give enough heads up time to people [14:03:22] Yes. That project desc is damn good. Thanks a lot for taking the time to write that! [14:04:22] elukey, chris was on vacation a few days ago, so it is my fault for not asking him directly- I will do that and set up another date [14:05:09] jynus: we can still do tomorrow [14:05:21] around this time if that works for you [14:05:25] elukey, is that ok for scheduled maintenance? 
[14:05:55] feel free to discuss it internally and let us know ASAP [14:06:07] sure, I'd only need to ask my team who should be alerted and what is the exact message to put in there to avoid confusion :) Will get back to you asap [14:06:51] in general, dbstore (analytics store) does not have a dependency on any analytics infrastructure [14:07:02] except on users and tools querying it [14:07:18] (it is not like the master, which would create lots of errors on eventlogging) [14:11:29] !log mobileapps deploying fb65cea [14:11:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:12:10] jynus: I was reading https://wikitech.wikimedia.org/wiki/MariaDB#dbstore1002_.26_dbstore2002 to get some info, I always confuse the db names :) I believe that an email to analytics@ and research@ should be enough [14:12:13] (03PS1) 10Filippo Giunchedi: puppetmaster: generate prometheus targets from ganglia [puppet] - 10https://gerrit.wikimedia.org/r/299539 (https://phabricator.wikimedia.org/T126785) [14:12:15] (03PS1) 10Filippo Giunchedi: prometheus: monitor hosts in the current site [puppet] - 10https://gerrit.wikimedia.org/r/299540 (https://phabricator.wikimedia.org/T126785) [14:13:13] elukey, I can send such an email, if this was more or less ok [14:13:22] (that was my intention all along) [14:13:30] sure, as you prefer! 
[14:13:44] you know better than me what to write in there [14:13:45] :) [14:14:15] jynus: let me double check first with them [14:14:33] I'll get back to you in a bit [14:15:03] thanks [14:15:28] basically, eventlogging and s1 and s2 will continue to be available on analytics-slave [14:15:33] not a full downtime [14:15:55] (we have 2 "slaves" for log on eqiad, just the first one is not complete) [14:17:36] PROBLEM - mobileapps endpoints health on scb2001 is CRITICAL: Generic error: Generic connection error: HTTPConnectionPool(host=10.192.32.132, port=8888): Max retries exceeded with url: /?spec (Caused by ProtocolError(Connection aborted., error(111, Connection refused))) [14:17:38] jynus: analytics-slave is db1047 right? [14:17:51] 06Operations, 10ops-eqiad: db2056 RAID controller (temporary) failure - https://phabricator.wikimedia.org/T140598#2470828 (10jcrespo) 05Open>03Resolved a:03jcrespo Back to normal, we will keep it monitored in case it happens again. [14:17:54] elukey, yes [14:18:07] I know it by the A record [14:18:21] but most people in analytics know it by its CNAMEs [14:18:27] analytics-slave [14:18:34] or s1-analytics-slave, etc. [14:19:10] I have never worked with these so I always confuse who does what :) [14:19:20] there are 3 "analytics" db boxes on eqiad [14:19:37] db1046, db1047 and dbstore1002 [14:19:52] they basically provide HA for m4 (eventlogging) [14:20:23] and the slaves also have a replica of some/all s* shards [14:20:47] I think they also hold some analytics-only schemas [14:21:34] and they replicate m4 with a custom replication strategy right? 
[14:21:47] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy [14:22:42] that is not that relevant, as downtime is downtime :-) [14:22:49] but ye [14:22:50] s [14:23:18] ahhaha sure [14:24:09] (03PS1) 10Ema: cache_upload VTC tests [puppet] - 10https://gerrit.wikimedia.org/r/299543 (https://phabricator.wikimedia.org/T128188) [14:28:46] (03PS1) 10Andrew Bogott: Add labvirt1012, 1013 to the nova scheduling pool. [puppet] - 10https://gerrit.wikimedia.org/r/299546 [14:30:36] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [14:31:26] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [14:35:27] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [14:38:29] cmjohnson1, jynus - the downtime is fine but it would be super great to give a bit more heads up to people if possible (maybe a couple of days?). [14:38:51] if not we can proceed tomorrow [14:39:33] cmjohnson1, Thursday? [14:40:12] Anybody know why I'm getting SQL query errors recently? [14:40:35] It's to a certain IP address also. [14:40:48] (Not sure if I can post it here) [14:40:53] jynus that works [14:41:34] jynus/elukey 1400 UTC need 5 mins (if that) [14:41:41] I recently got one when going to Special:Block [14:41:44] Bsadowski1, you can post it if it starts with 10. [14:41:56] yeah [14:42:08] It does [14:42:21] post the error here [14:42:30] "Error: 2013 Lost connection to MySQL server during query (10.64.32.25)" [14:42:47] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [14:43:04] that is db1055 [14:43:15] Ah :) [14:43:55] Bsadowski1, what did you request? URL? 
[14:44:20] I got it when trying to block someone on the English Wikipedia [14:44:30] https://en.wikipedia.org/wiki/Special:Block/The_Unblockable_Wonder [14:44:37] It was that one. [14:44:48] It took forever for the page to load as well. [14:47:18] It could be a particular ip-blocking-related query [14:48:25] I will check the logs (or someone else can; I cannot right now), but the better option would be to write all this (URL, error) into a ticket on Phabricator for a proper investigation [14:48:41] do you have an account there, Bsadowski1 ? [14:49:19] !log mobileapps deploying dfe5f11f5 [14:49:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:49:36] RECOVERY - Router interfaces on pfw-eqiad is OK: OK: host 208.80.154.218, interfaces up: 108, down: 0, dormant: 0, excluded: 1, unused: 0 [15:00:05] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T1500). [15:00:05] Urbanecm: A patch you scheduled for Morning SWAT(Max 8 patches) is about to be deployed. Please be available during the process. 
[15:00:15] Around [15:00:46] (03PS3) 10Giuseppe Lavagetto: Provide fontconfig configuration which forces antialiasing [puppet] - 10https://gerrit.wikimedia.org/r/299131 (https://phabricator.wikimedia.org/T139543) (owner: 10Muehlenhoff) [15:01:04] I can SWAT today [15:02:25] (03PS2) 10Thcipriani: Enable global abuse filters on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299111 (https://phabricator.wikimedia.org/T140395) (owner: 10Urbanecm) [15:02:37] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299111 (https://phabricator.wikimedia.org/T140395) (owner: 10Urbanecm) [15:03:17] (03Merged) 10jenkins-bot: Enable global abuse filters on ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299111 (https://phabricator.wikimedia.org/T140395) (owner: 10Urbanecm) [15:03:42] (03CR) 10Giuseppe Lavagetto: [C: 032] Provide fontconfig configuration which forces antialiasing [puppet] - 10https://gerrit.wikimedia.org/r/299131 (https://phabricator.wikimedia.org/T139543) (owner: 10Muehlenhoff) [15:04:22] Urbanecm: change is live on mw1099, test with debug header please [15:04:57] thcipriani: What is debug header? And how can I test it? I'm not an admin... [15:05:44] Urbanecm: no need to be an admin, it's a browser plugin that will let you test a change on a canary server before we deploy everywhere see docs here: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [15:06:07] Going to install it... [15:06:14] thank you :) [15:08:12] Installed, I can see global rules only in Special:AbuseFilter so probably it works. [15:08:45] When I set the plugin to another backend I can't see "global rules only". [15:08:57] Urbanecm: ack, thanks for checking, will continue to deploy everywhere. [15:09:02] Thanks [15:09:13] !log gallium upgrading Zuul: zuul_2.1.0-151-g30a433b-wmf3precise1 zuul_2.1.0-151-g30a433b-wmf4precise1_amd64.deb . 
To support layout validation when multiple connections are used [15:09:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:10:26] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:299111|Enable global abuse filters on ptwiki (T140395)]] (duration: 00m 38s) [15:10:27] T140395: Enable Global AbuseFilters on ptwiki - https://phabricator.wikimedia.org/T140395 [15:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:10:32] ^ Urbanecm check live please [15:10:53] thcipriani, working. [15:11:03] Urbanecm: cool, thanks for checking! [15:11:09] You're welcome. [15:11:50] Thanks for deploying thcipriani! [15:12:07] you're welcome :) [15:16:28] gwicke: would you mind if I made some changes on https://gerrit.wikimedia.org/r/#/c/292505/ to address godog 's review? I'd like to try to get this integrated in the next scap version. I fixed up all the pep8 stuff last Wednesday. [15:19:41] 06Operations, 05Prometheus-metrics-monitoring: deploy prometheus node_exporter for host monitoring - https://phabricator.wikimedia.org/T140646#2471175 (10fgiunchedi) [15:20:52] (03PS2) 10Dzahn: phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) [15:21:20] !log oblivian@palladium conftool action : set/pooled=no; selector: name=mw1261.* [15:21:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:21:51] (03CR) 10Dzahn: [C: 032] phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) (owner: 10Dzahn) [15:23:39] !log oblivian@palladium conftool action : set/pooled=yes; selector: name=mw1261.* [15:23:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:29:09] (03PS1) 10Filippo Giunchedi: site: add node_exporter for prometheus machines [puppet] - 
10https://gerrit.wikimedia.org/r/299558 (https://phabricator.wikimedia.org/T140646) [15:30:07] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2471213 (10Joe) So, by combining elukey's patch and the original patch for not... [15:34:53] (03PS1) 10Mobrovac: Change-Prop: Reduce ORES precaching concurrency to 10 [puppet] - 10https://gerrit.wikimedia.org/r/299559 [15:36:01] (03CR) 10Ladsgroup: [C: 031] Change-Prop: Reduce ORES precaching concurrency to 10 [puppet] - 10https://gerrit.wikimedia.org/r/299559 (owner: 10Mobrovac) [15:42:04] (03CR) 10Faidon Liambotis: [C: 032] Change-Prop: Reduce ORES precaching concurrency to 10 [puppet] - 10https://gerrit.wikimedia.org/r/299559 (owner: 10Mobrovac) [15:42:06] (03CR) 10GWicke: Logstash_checker script for canary deploys (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/292505 (https://phabricator.wikimedia.org/T110068) (owner: 10GWicke) [15:43:31] thcipriani: go ahead wrt changes [15:43:45] gwicke: will do, thanks! [15:44:03] it would be great to move this script out of puppet as well, to the same repo as checker.py [15:44:39] I left some comments, explaining most of the inline remarks [15:46:48] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471268 (10Joe) I merged the patch, and it was correctly applied on all imagescalers that run Debian jes... [15:47:19] ahh, I see that. 
[15:47:38] <_joe_> gwicke: I would like to port it to the service-checker repo [15:47:50] Bsadowski1, I have filed https://phabricator.wikimedia.org/T140650 [15:48:30] it could be a software change, or something on that particular database server, as it only fails on db1055 [15:48:32] joe: that would be great; should we merge it to puppet first & move to service-checker then, or the other way around? [15:48:53] I will investigate further [15:52:25] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2471305 (10bd808) [15:53:59] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2471307 (10chasemp) >>! In T140422#2470486, @yuvipanda wrote: > Should get a new key setup for labs root, and probably good to rotate all keys anyway.... [15:56:34] (03PS1) 10Yuvipanda: tools: Check all servers for LDAP check [puppet] - 10https://gerrit.wikimedia.org/r/299561 [15:57:07] (03PS2) 10Yuvipanda: tools: Check all servers for LDAP check [puppet] - 10https://gerrit.wikimedia.org/r/299561 [15:59:52] (03CR) 10Andrew Bogott: [C: 031] tools: Check all servers for LDAP check [puppet] - 10https://gerrit.wikimedia.org/r/299561 (owner: 10Yuvipanda) [16:01:49] (03PS2) 10Andrew Bogott: Add labvirt1012, 1013 to the nova scheduling pool. 
[puppet] - 10https://gerrit.wikimedia.org/r/299546 [16:01:53] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [16:02:42] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [16:06:08] 06Operations, 10Traffic, 13Patch-For-Review: Raise cache frontend memory sizes significantly - https://phabricator.wikimedia.org/T135384#2471349 (10BBlack) 05Open>03Resolved a:03BBlack [16:07:52] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:09:16] 06Operations, 06Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 2 others: Logstash elasticsearch mapping does not allow err.code to be a string - https://phabricator.wikimedia.org/T137400#2471360 (10bd808) 05Open>03Resolved a:03bd808 See {rOPUPd5b95160faef2cf60c57c12aa2178f979d82b922} [16:09:47] (03PS3) 10Dzahn: phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) [16:15:43] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [16:22:00] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2471377 (10bd808) [16:23:03] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2318624 (10bd808) [16:26:10] (03CR) 10Andrew Bogott: [C: 032] Add labvirt1012, 1013 to the nova scheduling pool. 
[puppet] - 10https://gerrit.wikimedia.org/r/299546 (owner: 10Andrew Bogott) [16:27:45] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:28:32] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [16:29:33] 06Operations, 07Performance: Query for Special:Contributions?contribs=newbie&dir=prev is slow on enwiki - https://phabricator.wikimedia.org/T140537#2471398 (10matmarex) >>! In T140537#2470666, @jcrespo wrote: > What I can infer from the queries is that it checks for users within a range of rev_user, and that t... [16:30:38] (03PS4) 10Dzahn: phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) [16:35:04] (03CR) 10Chad: "Yes, I know what it's for, but why would we want/need it?" [puppet] - 10https://gerrit.wikimedia.org/r/299182 (owner: 10Paladox) [16:37:01] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2471423 (10Dzahn) - Order a yubikey (via OIT?) and set it up ? 
[16:39:39] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471428 (10kaldari) Still looks broken to me :( [16:40:53] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2471431 (10Dzahn) [16:45:26] (03PS1) 10Yuvipanda: Protect servers behind tools/labs proxies from httpoxy [puppet] - 10https://gerrit.wikimedia.org/r/299567 [16:46:14] 06Operations, 10ops-codfw, 10ops-eqiad: ship 7 ex4200s from codfw to eqiad - https://phabricator.wikimedia.org/T140655#2471460 (10RobH) [16:50:19] (03CR) 10BBlack: [C: 031] Protect servers behind tools/labs proxies from httpoxy [puppet] - 10https://gerrit.wikimedia.org/r/299567 (owner: 10Yuvipanda) [16:50:50] 06Operations, 10ops-codfw, 10ops-eqiad: ship 7 ex4200s from codfw to eqiad - https://phabricator.wikimedia.org/T140655#2471460 (10faidon) As I mentioned in the ops meeting, we're upgrading eqiad's row D this quarter. We already have plans for its 3x EX4550s switches that we are going to salvage, but no plans... 
[16:50:58] (03PS2) 10Yuvipanda: Protect servers behind tools/labs proxies from httpoxy [puppet] - 10https://gerrit.wikimedia.org/r/299567 [16:51:08] (03PS3) 10Yuvipanda: tools: Check all servers for LDAP check [puppet] - 10https://gerrit.wikimedia.org/r/299561 [16:51:18] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Check all servers for LDAP check [puppet] - 10https://gerrit.wikimedia.org/r/299561 (owner: 10Yuvipanda) [16:51:32] (03PS3) 10Yuvipanda: Protect servers behind tools/labs proxies from httpoxy [puppet] - 10https://gerrit.wikimedia.org/r/299567 [16:51:59] (03CR) 10Yuvipanda: [C: 032 V: 032] Protect servers behind tools/labs proxies from httpoxy [puppet] - 10https://gerrit.wikimedia.org/r/299567 (owner: 10Yuvipanda) [16:52:23] (03PS1) 10BBlack: Remove inbound Proxy header on all caches [puppet] - 10https://gerrit.wikimedia.org/r/299568 [16:59:06] 06Operations, 06Release-Engineering-Team, 10Traffic, 05Security: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2471564 (10demon) [16:59:28] (03PS2) 10Chad: Remove inbound Proxy header on all caches [puppet] - 10https://gerrit.wikimedia.org/r/299568 (https://phabricator.wikimedia.org/T140658) (owner: 10BBlack) [16:59:46] (03CR) 10Chad: "See T140658." [puppet] - 10https://gerrit.wikimedia.org/r/299567 (owner: 10Yuvipanda) [17:00:05] gehel: Respected human, time to deploy Weekly Wikidata query service deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T1700). Please do the needful. [17:00:05] Smalyshev: A patch you scheduled for Weekly Wikidata query service deployment window is about to be deployed. Please be available during the process. 
[17:00:05] (03CR) 10BBlack: [C: 032 V: 032] Remove inbound Proxy header on all caches [puppet] - 10https://gerrit.wikimedia.org/r/299568 (https://phabricator.wikimedia.org/T140658) (owner: 10BBlack) [17:00:28] (03PS1) 10Chad: Set $wgHTTPProxy globally instead of relying on getenv() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299571 (https://phabricator.wikimedia.org/T140658) [17:00:29] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2464178 (10mark) Madhu is approved for Ops access. Yes, she should also get a Yubikey. :) [17:01:08] SMalyshev: I'm updating wdq-beta first... [17:01:22] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:01:32] gehel: cool [17:03:44] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [17:04:25] SMalyshev: wdqs-beta looks good to me, moving to prod unless you have specific tests to do [17:04:39] no, all good [17:08:49] !log updated wdqs to latest version, new blazegraph version, restart of wdqs-updater [17:08:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:09:39] (03PS2) 10Chad: Set $wgHTTPProxy globally instead of relying on getenv() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299571 (https://phabricator.wikimedia.org/T140658) [17:10:26] SMalyshev: wdqs looks good to me, let me know if you see anything strange... 
[17:10:49] (03CR) 10Dzahn: [C: 031] madhu: transition to ops [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) (owner: 10Rush) [17:11:06] looks good [17:15:50] 06Operations, 10Ops-Access-Requests: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471648 (10Neil_P._Quinn_WMF) [17:17:20] (03PS2) 10Rush: madhu: transition to ops [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) [17:17:51] (03PS5) 10Dzahn: phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) [17:17:55] Hi, one of the cswiki users reported that he can't read notifications on cswiki. He must go to another wiki, open the notifications, then go back to cswiki, and then everything works normally. He provided a screenshot, see http://urbanecm.8u.cz/wikipedia/notification.jpg [17:18:03] (03CR) 10Dzahn: [V: 032] phabricator: re-enable community metrics mail [puppet] - 10https://gerrit.wikimedia.org/r/299093 (https://phabricator.wikimedia.org/T139950) (owner: 10Dzahn) [17:18:13] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471684 (10Menner) What is the result of fc-match? ``` FC_DEBUG=1024 fc-match "Times" ``` Look here f... [17:18:14] !log mobileapps deploying debb3f6 [17:18:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:18:24] bearND: mdholloway: ^^ [17:18:32] This strip was there for a few minutes and nothing changed. [17:18:41] Should I file a ticket in phab? 
[17:19:55] 06Operations, 10Ops-Access-Requests: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471688 (10Neil_P._Quinn_WMF) @HJiang-WMF, you'll need to sign the [server access responsibilities document](https://phabricator.wikimedia.org/L3) and post your production-only... [17:20:23] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471694 (10Neil_P._Quinn_WMF) [17:21:45] (03PS1) 10BBlack: Unset inbound Proxy header - various misc services [puppet] - 10https://gerrit.wikimedia.org/r/299577 (https://phabricator.wikimedia.org/T140658) [17:22:53] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471714 (10HJiang-WMF) I already signed the Acknowledgment of Wikimedia Server Access Responsibilities. [17:23:33] 06Operations, 10Ops-Access-Requests, 06Editing-Analysis: Requesting access to research groups for Helen Jiang - https://phabricator.wikimedia.org/T140659#2471720 (10HJiang-WMF) As instructed by https://wikitech.wikimedia.org/wiki/Production_shell_access, I will need explicit bastiononly access, and my prefer... 
[17:23:54] (03CR) 1020after4: [C: 031] Set $wgHTTPProxy globally instead of relying on getenv() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299571 (https://phabricator.wikimedia.org/T140658) (owner: 10Chad) [17:24:22] (03CR) 10Chad: [C: 031] Unset inbound Proxy header - various misc services [puppet] - 10https://gerrit.wikimedia.org/r/299577 (https://phabricator.wikimedia.org/T140658) (owner: 10BBlack) [17:25:37] (03CR) 10BBlack: [C: 032] Unset inbound Proxy header - various misc services [puppet] - 10https://gerrit.wikimedia.org/r/299577 (https://phabricator.wikimedia.org/T140658) (owner: 10BBlack) [17:26:13] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-Addshore: MediaWiki deployment shell access request for addshore - https://phabricator.wikimedia.org/T140276#2458869 (10greg) Approve. [17:26:23] (03CR) 10Greg Grossmeier: [C: 031] admin: add addshore to deployers [puppet] - 10https://gerrit.wikimedia.org/r/299032 (https://phabricator.wikimedia.org/T140276) (owner: 10Dzahn) [17:26:38] (03CR) 10Chad: [C: 032] Set $wgHTTPProxy globally instead of relying on getenv() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299571 (https://phabricator.wikimedia.org/T140658) (owner: 10Chad) [17:27:17] (03Merged) 10jenkins-bot: Set $wgHTTPProxy globally instead of relying on getenv() [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299571 (https://phabricator.wikimedia.org/T140658) (owner: 10Chad) [17:27:26] (03PS1) 10MaxSem: maps: move postgres user creation from grants.sql [puppet] - 10https://gerrit.wikimedia.org/r/299579 [17:28:37] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: globally set $wgHTTPProxy (duration: 00m 26s) [17:28:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:30:27] (03PS2) 10Filippo Giunchedi: Add zfilipin to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/298792 (https://phabricator.wikimedia.org/T140264) (owner: 10Thcipriani) [17:32:37] (03CR) 
10Filippo Giunchedi: [C: 032 V: 032] Add zfilipin to deployment group [puppet] - 10https://gerrit.wikimedia.org/r/298792 (https://phabricator.wikimedia.org/T140264) (owner: 10Thcipriani) [17:34:11] (03CR) 10Paladox: "Well I'm not really sure, just thought It would be useful for users using that function." [puppet] - 10https://gerrit.wikimedia.org/r/299182 (owner: 10Paladox) [17:35:23] 06Operations, 10Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Brentjoseph (bcohn) - https://phabricator.wikimedia.org/T140449#2465141 (10Jgreen) @Brentjoseph there's a poorly explained step in the sign up process, which is that you also ne... [17:36:32] 06Operations, 10Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Jksamra - https://phabricator.wikimedia.org/T140445#2471795 (10Jgreen) @Jksamra there's a poorly explained step in the sign up process, which is that you also need a wikitech/la... [17:37:11] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471798 (10Joe) What is particularly baffling to me is that @MoritzMuehlenhoff has reported success in g... [17:37:16] 06Operations, 10Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Mpany - https://phabricator.wikimedia.org/T140399#2471799 (10Jgreen) @Mpany there's a poorly explained step in the sign up process, which is that you also need a wikitech/labs... 
[17:37:26] (03PS3) 10Filippo Giunchedi: admin: add addshore to deployers [puppet] - 10https://gerrit.wikimedia.org/r/299032 (https://phabricator.wikimedia.org/T140276) (owner: 10Dzahn) [17:38:13] 06Operations, 10ops-codfw, 10ops-eqiad: ship 7 ex4200s from codfw to eqiad - https://phabricator.wikimedia.org/T140655#2471802 (10RobH) Yep, but we only use a single EX4200 in all of codfw. So I had @papaul keep 2 other spares there, and send the rest on. There is no other planned expansion for ex4200s tha... [17:39:25] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-zeljkofilipin: MediaWiki deployment shell access request for zfilipin - https://phabricator.wikimedia.org/T140264#2471832 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi merged, access should be enabled shortly [17:39:25] (03CR) 10Filippo Giunchedi: [C: 032 V: 032] admin: add addshore to deployers [puppet] - 10https://gerrit.wikimedia.org/r/299032 (https://phabricator.wikimedia.org/T140276) (owner: 10Dzahn) [17:40:32] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-Addshore: MediaWiki deployment shell access request for addshore - https://phabricator.wikimedia.org/T140276#2471853 (10fgiunchedi) 05Open>03Resolved a:03fgiunchedi merged, access should be enabled shortly [17:42:53] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471856 (10Menner) One more thing. Did you reboot? Otherwise try: ``` fc-cache ``` Additionally you m... [17:51:11] (03CR) 10Chad: [C: 04-1] "It's not really useful unless it's being used in a hook like post-receive, the act of signing a push itself doesn't do anything special..." 
[puppet] - 10https://gerrit.wikimedia.org/r/299182 (owner: 10Paladox) [18:02:58] (03Abandoned) 10Paladox: Enable gpg keys in gerrit [puppet] - 10https://gerrit.wikimedia.org/r/299182 (owner: 10Paladox) [18:04:54] (03CR) 10Chad: [C: 04-1] "We don't need to configure those two values (file and roottree), we can just put them in as-is. Gerrit ignores config values it doesn't us" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [18:07:01] (03CR) 10Paladox: Add missing roottree, file configs to gerrit.config.erb (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [18:10:38] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2471934 (10Joe) @Menner I already ran fc-cache; in fact the no-bitmap configuration ``` eureka [18:13:08] (03PS18) 10Paladox: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 [18:13:24] (03PS19) 10Paladox: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 [18:15:27] (03CR) 10Dzahn: [C: 031] "looks alright, in ops meeting it was said that analytics should ack it" [puppet] - 10https://gerrit.wikimedia.org/r/298928 (https://phabricator.wikimedia.org/T140342) (owner: 10Addshore) [18:18:46] (03CR) 10Dzahn: [C: 031] madhu: transition to ops [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) (owner: 10Rush) [18:19:37] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2471938 (10Dzahn) - added to ops-private - added to root@ mail [18:19:49] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - 
https://phabricator.wikimedia.org/T140422#2471939 (10Dzahn) [18:21:15] (03PS1) 10Giuseppe Lavagetto: mediawiki::packages::fonts: reject bitmap fonts on jessie [puppet] - 10https://gerrit.wikimedia.org/r/299586 (https://phabricator.wikimedia.org/T139543) [18:22:52] (03PS3) 10Rush: WIP for a key change before merge -- madhu: transition to ops [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) [18:23:21] 06Operations, 06Discovery, 06Maps, 07Epic: Epic: switch Maps to production status - https://phabricator.wikimedia.org/T133744#2471947 (10MaxSem) [18:23:39] (03CR) 10Ori.livneh: [C: 031] "Haven't tested, but LGTM overall." [puppet] - 10https://gerrit.wikimedia.org/r/299586 (https://phabricator.wikimedia.org/T139543) (owner: 10Giuseppe Lavagetto) [18:24:26] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki::packages::fonts: reject bitmap fonts on jessie [puppet] - 10https://gerrit.wikimedia.org/r/299586 (https://phabricator.wikimedia.org/T139543) (owner: 10Giuseppe Lavagetto) [18:25:25] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2471952 (10chasemp) I asked madhu to change her user key along with the role/access change before https://gerrit.wikimedia.org/r/#/c/299078/. Seems app... 
[18:25:38] <_joe_> do NOT merge puppet, please [18:25:51] <_joe_> I have a followup commit or I'd need to revert [18:27:35] (03PS1) 10Giuseppe Lavagetto: mediawiki: use puppet-provided fontconfig everywhere [puppet] - 10https://gerrit.wikimedia.org/r/299587 (https://phabricator.wikimedia.org/T139543) [18:29:36] (03PS2) 10Giuseppe Lavagetto: mediawiki: use puppet-provided fontconfig everywhere [puppet] - 10https://gerrit.wikimedia.org/r/299587 (https://phabricator.wikimedia.org/T139543) [18:30:44] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: use puppet-provided fontconfig everywhere [puppet] - 10https://gerrit.wikimedia.org/r/299587 (https://phabricator.wikimedia.org/T139543) (owner: 10Giuseppe Lavagetto) [18:31:25] (03PS1) 10Andrew Bogott: Include nova mysql password in novaenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/299590 (https://phabricator.wikimedia.org/T139272) [18:33:14] (03CR) 10jenkins-bot: [V: 04-1] Include nova mysql password in novaenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/299590 (https://phabricator.wikimedia.org/T139272) (owner: 10Andrew Bogott) [18:33:48] (03PS1) 10RobH: ssl renewal: eventdonations.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/299591 [18:35:22] (03CR) 10jenkins-bot: [V: 04-1] ssl renewal: eventdonations.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/299591 (owner: 10RobH) [18:35:51] (03PS1) 10Dzahn: zuul: switch over from old to new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) [18:36:14] so my patch fails for a bunch of indentations on a file i didnt touch... [18:36:20] odd.. /modules/mediawiki/manifests/multimedia.pp:13 WARNING indentation of => is not properly aligned (arrow_alignment) [18:36:43] Did someone recently change that file and force it though or something? [18:36:58] _joe_: ? 
[18:37:07] (03CR) 10jenkins-bot: [V: 04-1] zuul: switch over from old to new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [18:37:14] It seems that the arrow indentation is causing failures on future patchsets [18:37:27] i get the same problem [18:38:08] can fix [18:38:11] _joe_: Did you want to fix this or should someone else do so? [18:38:23] mutante: go for it, if its jsut indentation [18:38:56] ok [18:39:06] <_joe_> robh: yeah I will fix it sorry [18:39:18] (03PS1) 10Dzahn: multimedia: fix lint warnigns, indentation [puppet] - 10https://gerrit.wikimedia.org/r/299593 [18:39:22] <_joe_> I was on a hurry in order to avoid all mw hosts to fail [18:39:23] <_joe_> sorry [18:39:25] (03PS2) 10Andrew Bogott: Include nova mysql password in novaenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/299590 (https://phabricator.wikimedia.org/T139272) [18:39:25] no worries [18:39:27] (03PS1) 10Andrew Bogott: Line up some arrows to make puppet-lint happy [puppet] - 10https://gerrit.wikimedia.org/r/299594 [18:39:41] i think daniel is fixing right now [18:39:53] <_joe_> yes [18:40:02] (03CR) 10RobH: [C: 031] multimedia: fix lint warnigns, indentation [puppet] - 10https://gerrit.wikimedia.org/r/299593 (owner: 10Dzahn) [18:40:04] (03PS2) 10Dzahn: multimedia: fix lint warnings, indentation [puppet] - 10https://gerrit.wikimedia.org/r/299593 [18:40:35] i guess i'll make my commit message make more sense since it'll rebase (was fate due to my not detailed commit message) [18:40:49] <_joe_> jenkins is being awfully slow [18:40:57] yea [18:41:01] (03CR) 10Giuseppe Lavagetto: [C: 031] multimedia: fix lint warnings, indentation [puppet] - 10https://gerrit.wikimedia.org/r/299593 (owner: 10Dzahn) [18:42:19] (03CR) 10Andrew Bogott: [C: 032] Line up some arrows to make puppet-lint happy [puppet] - 10https://gerrit.wikimedia.org/r/299594 (owner: 10Andrew Bogott) [18:43:11] (03PS1) 10Madhuvishy: Update prod ssh keys 
for user madhuvishy [puppet] - 10https://gerrit.wikimedia.org/r/299595 [18:43:21] ah, sorry mutante, I stepped on your patch [18:43:32] (03CR) 10Chad: [C: 031] "This looks fine now. We can go ahead with this." [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [18:43:44] (03Abandoned) 10Dzahn: multimedia: fix lint warnings, indentation [puppet] - 10https://gerrit.wikimedia.org/r/299593 (owner: 10Dzahn) [18:43:46] heh, np [18:43:59] (03PS2) 10Rush: Update prod ssh keys for user madhuvishy [puppet] - 10https://gerrit.wikimedia.org/r/299595 (owner: 10Madhuvishy) [18:44:01] (03PS2) 10RobH: ssl renewal: eventdonations.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/299591 [18:44:12] (03PS3) 10RobH: ssl renewal: eventdonations.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/299591 [18:44:24] let the race begin [18:44:25] (03CR) 10Chad: "Is there a reason we can't just use gerrit.wikimedia.org here?" [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [18:44:31] (03PS2) 10Dzahn: zuul: switch over from old to new gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) [18:44:47] damn, someone is ahead of me [18:44:52] 595 [18:45:04] chasemp beat me in the rebase race. [18:45:14] rebase all the things! [18:45:19] :) [18:45:22] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2472018 (10Joe) Ok; I think my last two commits actually fixed this - @Menner @kaldari can you confirm a... [18:46:50] mutante: so, https://gerrit.wikimedia.org/r/298688 . that was never deployed, was it? 
[18:47:00] (03CR) 10RobH: [C: 032] ssl renewal: eventdonations.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/299591 (owner: 10RobH) [18:47:13] mutante: i don't know if i should just wait, or try figuring out why it's not working [18:47:18] (03PS1) 10Paladox: Add css to turn repo links into blue again [puppet] - 10https://gerrit.wikimedia.org/r/299596 [18:47:47] (03PS2) 10Paladox: Add css to turn repo links into blue again [puppet] - 10https://gerrit.wikimedia.org/r/299596 [18:47:59] MatmaRex: no, it is deployed [18:48:07] MatmaRex: the question is why it doesnt work [18:48:48] (03CR) 1020after4: [C: 031] Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [18:49:06] mutante: right… i expact i should see some JS code on https://gerrit-new.wikimedia.org/r/gerrit_ui/undefined.cache.js , but it's still 404 [18:49:27] (or possibly https://gerrit.wikimedia.org/r/gerrit_ui/undefined.cache.js , i don't know if that applies to the gerrit-new domain) [18:49:38] (03PS4) 10Paladox: Add some colors to the site table on changes [puppet] - 10https://gerrit.wikimedia.org/r/299447 [18:49:46] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [18:50:10] MatmaRex: the same config line is on old and new gerrit server [18:50:14] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [18:50:18] MatmaRex: so yes, can be tested on gerrit-new [18:50:47] MatmaRex: maybe it's cached in varnish [18:50:52] MatmaRex it is same here https://gerrit.wikimedia.org/r/gerrit_ui/undefined.cache.js [18:50:55] 404 error [18:51:09] (03PS1) 10Madhuvishy: Add ssh key for user Madhuvishy to root-authorized-keys [labs/private] - 10https://gerrit.wikimedia.org/r/299597 [18:51:16] nevermind that , it's not [18:52:26] mutante: could it be conflicting with the ProxyPass rule? 
apache configs are really not something i'm experienced with, heh [18:52:53] mutante: i also noticed now, the dots in the path should probably be escaped, but that shouldn't cause the rule not work [18:53:04] does this work? [18:53:05] RewriteRule ^/tools/hooks/commit-msg$ https://gerrit.wikimedia.org/r/tools/hooks/commit-msg [18:53:13] the one right above it [18:53:23] it does afaik [18:53:24] https://gerrit.wikimedia.org/tools/hooks/commit-msg [18:53:30] that serves a file [18:53:47] well, it does a 302 redirect, actually. but it works [18:53:50] yea, and also on gerrit-new [18:54:13] hmm [18:55:18] mutante it seems MatmaRex forgot to add http:// [18:55:25] Should be [18:55:35] hmm [18:55:45] relative paths don't work? [18:55:55] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:56:01] RewriteRule ^/r/gerrit_ui/undefined.cache.js$ https://<%= @host %>/r/gerrit_ui/D39174379837CB12534D3B279AEAC59F.cach [18:56:01] e.js [18:56:08] MatmaRex ^^ [18:56:15] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [18:56:44] hmm. sounds worth trying. i really don't know enough about this to tell whether than will work :D [18:56:49] that* [18:57:09] ok, so on gerrit-new there is currently an unrelated issue [18:57:13] !log demon@tin Synchronized wmf-config/: Pruning 1.27.0-wmf.N ExtensionMessages files (duration: 00m 34s) [18:57:14] mutante: want me to submit a patch for that? ^ [18:57:15] (03PS1) 10Paladox: [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 [18:57:15] in the apache config. due to testing [18:57:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:57:23] oh, paladox laready did. 
thanks [18:57:23] i tried testing it but ran into that thing [18:57:27] Yep :) [18:57:49] Syntax error on line 54 [18:58:03] (03PS3) 10Rush: Update prod ssh keys for user madhuvishy [puppet] - 10https://gerrit.wikimedia.org/r/299595 (owner: 10Madhuvishy) [18:58:26] I added <%= @host %> so that it will work with gerrit-new and gerrit. [18:58:28] :) [18:58:39] (03PS2) 10Paladox: [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 [18:58:45] (03CR) 10Andrew Bogott: [C: 031] Add ssh key for user Madhuvishy to root-authorized-keys [labs/private] - 10https://gerrit.wikimedia.org/r/299597 (owner: 10Madhuvishy) [18:59:00] (03CR) 10Bartosz Dziewoński: [C: 031] [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 (owner: 10Paladox) [19:00:16] (03PS2) 10Gehel: maps: move postgres user creation from grants.sql [puppet] - 10https://gerrit.wikimedia.org/r/299579 (owner: 10MaxSem) [19:00:18] i'll get back to it after the other issue [19:00:30] Ok, thanks. 
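Note on the RewriteRule discussion above: MatmaRex pointed out that the dots in the pattern are unescaped and that this should not, by itself, stop the rule from matching. A minimal Python sketch of that point follows. It assumes Python's `re` module treats this simple pattern the same way as the regex engine behind Apache's mod_rewrite; the second test path is made up for illustration.

```python
import re

# Pattern as quoted in the channel (dots unescaped) and an escaped variant.
LOOSE = re.compile(r"^/r/gerrit_ui/undefined.cache.js$")
STRICT = re.compile(r"^/r/gerrit_ui/undefined\.cache\.js$")


def matches(pattern, path):
    """Return True if the compiled pattern matches the request path."""
    return pattern.search(path) is not None


paths = [
    "/r/gerrit_ui/undefined.cache.js",   # the real path from the log
    "/r/gerrit_ui/undefinedXcacheXjs",   # hypothetical path, for contrast only
]

# For each path: (matches unescaped pattern, matches escaped pattern)
results = {p: (matches(LOOSE, p), matches(STRICT, p)) for p in paths}
for path, (loose, strict) in results.items():
    print(path, loose, strict)
```

Both patterns accept the real path, so the unescaped dots only make the rule looser, not broken. That is consistent with the channel's conclusion that the cause had to be elsewhere.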
[19:00:35] jouncebot: next [19:00:35] In 0 hour(s) and 59 minute(s): Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T2000) [19:02:38] (03CR) 10Gehel: [C: 032] maps: move postgres user creation from grants.sql [puppet] - 10https://gerrit.wikimedia.org/r/299579 (owner: 10MaxSem) [19:03:15] PROBLEM - HTTPS on lead is CRITICAL: SSL CRITICAL - failed to connect or SSL handshake:IO::Socket::SSL: connect: Connection refused [19:03:45] (03PS4) 10Rush: Update prod ssh keys for user madhuvishy [puppet] - 10https://gerrit.wikimedia.org/r/299595 (owner: 10Madhuvishy) [19:04:11] !log starting elasticsearch upgrade for logstash (T136001) [19:04:12] T136001: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001 [19:04:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:05:33] (03PS3) 10Dzahn: zuul: use generic service name for gerrit server [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) [19:05:47] !log demon@tin Synchronized private/: remove obsolete wikitech config file (duration: 00m 32s) [19:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:06:33] (03CR) 10Dzahn: "good point. i guess not (but we have multiple IPs on the interface). amended to use service name." 
[puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [19:06:50] andrewbogott: ^^^ Pruned the obsolete version ^^^ [19:07:01] (wikitech still working fine, so I was right, it was unused) [19:07:16] cool, thanks for the cleanup [19:07:16] (thanks for clarifying for me) [19:07:20] np [19:08:54] (03CR) 10Rush: [C: 032] Update prod ssh keys for user madhuvishy [puppet] - 10https://gerrit.wikimedia.org/r/299595 (owner: 10Madhuvishy) [19:09:19] (03PS2) 10Gehel: Import kibana .deb to apt repository [puppet] - 10https://gerrit.wikimedia.org/r/296477 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [19:10:33] (03PS4) 10Rush: madhu: transition to ops for admin.yaml [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) [19:10:38] (03PS3) 10Andrew Bogott: Include nova mysql password in novaenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/299590 (https://phabricator.wikimedia.org/T139272) [19:10:40] (03PS1) 10Andrew Bogott: cold-migrate password reform: use novaenv.sh for vitals wherever possible. 
[puppet] - 10https://gerrit.wikimedia.org/r/299602 (https://phabricator.wikimedia.org/T139272) [19:10:53] (03CR) 10Gehel: [C: 032] Import kibana .deb to apt repository [puppet] - 10https://gerrit.wikimedia.org/r/296477 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [19:11:30] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472207 (10EBernhardson) [19:12:25] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2318624 (10EBernhardson) [19:13:02] (03PS5) 10Rush: madhu: transition to ops for admin.yaml [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) [19:13:14] RECOVERY - HTTPS on lead is OK: SSL OK - Certificate gerrit-new.wikimedia.org valid until 2016-10-10 23:47:00 +0000 (expires in 84 days) [19:15:27] (03PS1) 10Chad: RequestHeader unset doesn't like line-ending comments [puppet] - 10https://gerrit.wikimedia.org/r/299603 [19:15:47] bblack, mutante ^^^ [19:18:13] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2472246 (10Menner) Looks good from here. The only thing I'm curious now is that Times is not substitued... 
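Note on the 19:15:27 patch ("RequestHeader unset doesn't like line-ending comments"): Apache httpd only treats `#` as a comment marker at the start of a line, so a comment placed after a directive becomes part of its arguments. A rough Python sketch of a pre-merge check is below. The function name and the sample config are illustrative assumptions, not the contents of the actual patch, and the check is naive: a `#` inside a quoted argument would be flagged as well.

```python
def trailing_comment_lines(config_text):
    """Return (line_number, line) pairs where a directive is followed by ' #'."""
    hits = []
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and whole-line comments are fine
        if " #" in line:
            hits.append((number, line))
    return hits


# Hypothetical sample; only line 2 carries a trailing comment.
SAMPLE = """\
# whole-line comment, allowed
RequestHeader unset Proxy early # hypothetical trailing comment
RequestHeader unset Proxy early
"""

found = trailing_comment_lines(SAMPLE)
print(found)
```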
[19:18:15] (03CR) 10Rush: [C: 032] madhu: transition to ops for admin.yaml [puppet] - 10https://gerrit.wikimedia.org/r/299078 (https://phabricator.wikimedia.org/T140422) (owner: 10Rush) [19:18:33] (03PS2) 10Madhuvishy: Add ssh key for user Madhuvishy to root-authorized-keys [labs/private] - 10https://gerrit.wikimedia.org/r/299597 (https://phabricator.wikimedia.org/T140422) [19:19:57] (03CR) 10Rush: [C: 032] Add ssh key for user Madhuvishy to root-authorized-keys [labs/private] - 10https://gerrit.wikimedia.org/r/299597 (https://phabricator.wikimedia.org/T140422) (owner: 10Madhuvishy) [19:20:06] (03CR) 10Rush: [V: 032] Add ssh key for user Madhuvishy to root-authorized-keys [labs/private] - 10https://gerrit.wikimedia.org/r/299597 (https://phabricator.wikimedia.org/T140422) (owner: 10Madhuvishy) [19:20:38] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472255 (10chasemp) also: https://gerrit.wikimedia.org/r/#/c/299595 [19:21:08] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472259 (10chasemp) [19:22:13] (03PS2) 10Gehel: Configuration changes for es 2.x on logstash1001-1006 [puppet] - 10https://gerrit.wikimedia.org/r/296475 (owner: 10EBernhardson) [19:23:54] PROBLEM - puppet last run on mw1274 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:24] PROBLEM - puppet last run on mw2242 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:25] (03PS2) 10Rush: icinga: let Madhu run commands from webui [puppet] - 10https://gerrit.wikimedia.org/r/299196 (https://phabricator.wikimedia.org/T140422) (owner: 10Dzahn) [19:24:25] PROBLEM - puppet last run on mw1278 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:27] (03PS2) 10Rush: nagios_common: add Madhu to sms (ops paging) group [puppet] - 10https://gerrit.wikimedia.org/r/299198 
(https://phabricator.wikimedia.org/T140422) (owner: 10Dzahn) [19:24:33] (03CR) 10Gehel: [C: 032] Configuration changes for es 2.x on logstash1001-1006 [puppet] - 10https://gerrit.wikimedia.org/r/296475 (owner: 10EBernhardson) [19:24:44] PROBLEM - puppet last run on mw1138 is CRITICAL: CRITICAL: Puppet has 1 failures [19:24:53] mutante: are https://gerrit.wikimedia.org/r/#/c/299198/1 and https://gerrit.wikimedia.org/r/#/c/299196/ ready to roll? [19:24:57] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472326 (10Gehel) [19:25:05] PROBLEM - puppet last run on mw1295 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:06] PROBLEM - puppet last run on mw1296 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:15] PROBLEM - puppet last run on mw2133 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:25] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:25] PROBLEM - puppet last run on mw2237 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:34] PROBLEM - puppet last run on mw2107 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:34] PROBLEM - puppet last run on mw2231 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:43] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472333 (10madhuvishy) As for getting a Yubikey - I requested a new one and OIT has place an order. 
[19:25:46] PROBLEM - puppet last run on mw1132 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:46] PROBLEM - puppet last run on mw1245 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:55] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:55] PROBLEM - puppet last run on snapshot1005 is CRITICAL: CRITICAL: Puppet has 1 failures [19:25:56] PROBLEM - puppet last run on mw1165 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:04] PROBLEM - puppet last run on mw2202 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:05] PROBLEM - puppet last run on mw2094 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:06] PROBLEM - puppet last run on mw1178 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:11] something up there [19:26:14] PROBLEM - puppet last run on mw1218 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:15] PROBLEM - puppet last run on mw1275 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:24] PROBLEM - puppet last run on mw2150 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:27] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472335 (10EBernhardson) [19:26:31] looking [19:26:44] PROBLEM - puppet last run on mw1198 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:44] PROBLEM - puppet last run on mw2106 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:49] chasemp: kind of.. 
one needs a phone number to work but doesnt hurt ( i think) [19:26:55] PROBLEM - puppet last run on mw1116 is CRITICAL: CRITICAL: Puppet has 1 failures [19:26:55] chasemp: i'll look while you are on that [19:26:56] PROBLEM - puppet last run on mw1240 is CRITICAL: CRITICAL: Puppet has 1 failures [19:27:05] Notice: /Stage[main]/Admin/Admin::Groupmembers[deployment]/Exec[deployment_ensure_members]/returns: gpasswd: user 'jzerebecki' does not exis [19:27:12] ^ is what's causing all the puppetfail above [19:27:25] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Puppet has 1 failures [19:27:25] weird ok... [19:27:35] PROBLEM - puppet last run on mw1233 is CRITICAL: CRITICAL: Puppet has 1 failures [19:27:45] PROBLEM - puppet last run on mw2186 is CRITICAL: CRITICAL: Puppet has 1 failures [19:27:48] * gehel was worried his unrelated puppet change was breaking things, but it does not seem to be the case [19:27:55] PROBLEM - puppet last run on mw1145 is CRITICAL: CRITICAL: Puppet has 1 failures [19:27:55] PROBLEM - puppet last run on mw1022 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:05] PROBLEM - puppet last run on mw2209 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:09] mutante: did you merge anything related to 'gpasswd: user 'jzerebecki' does not exist'? 
[19:28:22] no, i did not [19:28:25] PROBLEM - puppet last run on mw2144 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:25] PROBLEM - puppet last run on mw2193 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:25] PROBLEM - puppet last run on mw1270 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:26] PROBLEM - puppet last run on mw2219 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:35] PROBLEM - puppet last run on mw1134 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:45] PROBLEM - puppet last run on mw2091 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:45] PROBLEM - puppet last run on mw2068 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:45] PROBLEM - puppet last run on mw1305 is CRITICAL: CRITICAL: Puppet has 1 failures [19:28:55] PROBLEM - puppet last run on mw1141 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:04] PROBLEM - puppet last run on mw1200 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:11] (03PS2) 10BBlack: RequestHeader unset doesn't like line-ending comments [puppet] - 10https://gerrit.wikimedia.org/r/299603 (https://phabricator.wikimedia.org/T140658) (owner: 10Chad) [19:29:14] PROBLEM - puppet last run on mw1262 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:14] PROBLEM - puppet last run on mw2167 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:14] PROBLEM - puppet last run on mw2157 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:34] PROBLEM - puppet last run on mw2234 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:45] PROBLEM - puppet last run on mw2248 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:45] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:51] (03PS1) 10Rush: admin: reremove jzerebecki reference [puppet] - 10https://gerrit.wikimedia.org/r/299604 [19:29:52] chasemp: https://gerrit.wikimedia.org/r/#/c/299078/5/modules/admin/data/data.yaml line 57 seems to add jan to the list [19:29:53] yeah one of my merge conflicts must have been bad 
[19:29:55] PROBLEM - puppet last run on mw1185 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:55] PROBLEM - puppet last run on mw1219 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:55] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:55] PROBLEM - puppet last run on mw1139 is CRITICAL: CRITICAL: Puppet has 1 failures [19:29:55] PROBLEM - puppet last run on mw1284 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:03] I predict a missing comma [19:30:05] PROBLEM - puppet last run on mw2076 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:05] (03CR) 10BBlack: [C: 032 V: 032] "(I fixed one tiny trailing whitespace issue on the way)" [puppet] - 10https://gerrit.wikimedia.org/r/299603 (https://phabricator.wikimedia.org/T140658) (owner: 10Chad) [19:30:13] mutante: https://gerrit.wikimedia.org/r/#/c/299604/ [19:30:14] PROBLEM - puppet last run on mw2118 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:24] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:25] PROBLEM - puppet last run on mw2141 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:35] PROBLEM - puppet last run on mw1276 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:41] chasemp: was he removed? 
[19:30:44] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:44] PROBLEM - puppet last run on mw1215 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:44] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:44] PROBLEM - puppet last run on mw2217 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:54] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:56] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:08] no it's just a bad rebase [19:31:11] looks at git log [19:31:14] PROBLEM - puppet last run on mw1231 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:14] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:14] PROBLEM - puppet last run on mw1260 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:25] PROBLEM - puppet last run on mw2073 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:25] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:25] PROBLEM - puppet last run on mw2208 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:26] jz was removed since the patch was first made, then it was rebased on the removal and kept jz on that one line [19:31:31] I constantly get disconnected from the etherpad. Is it just me, or a known issue? 
[19:31:34] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:35] PROBLEM - puppet last run on mw2250 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:36] PROBLEM - puppet last run on mw1203 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:48] (03CR) 10Dzahn: [C: 031] "got removed in d4d759ec734233c5b6b290" [puppet] - 10https://gerrit.wikimedia.org/r/299604 (owner: 10Rush) [19:31:54] PROBLEM - puppet last run on mw2145 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:54] PROBLEM - puppet last run on mw2129 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:54] PROBLEM - puppet last run on mw2069 is CRITICAL: CRITICAL: Puppet has 1 failures [19:31:57] bblack is right [19:32:05] (03PS2) 10Rush: admin: reremove jzerebecki reference [puppet] - 10https://gerrit.wikimedia.org/r/299604 [19:32:14] PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:15] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:15] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:24] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:25] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:26] PROBLEM - puppet last run on snapshot1006 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:31] I'm pushing it through to stop this noise [19:32:34] (03CR) 10Rush: [C: 032 V: 032] admin: reremove jzerebecki reference [puppet] - 10https://gerrit.wikimedia.org/r/299604 (owner: 10Rush) [19:32:46] PROBLEM - puppet last run on mw2095 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:54] PROBLEM - puppet last run on mw1289 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:55] PROBLEM - puppet last run on mw2146 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:55] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Puppet has 1 failures [19:32:58] 06Operations, 
06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2472383 (10kaldari) Yay! Looks great to me! Is there a way we can automatically regenerate all the SVG t... [19:33:04] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:04] PROBLEM - puppet last run on mw1259 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:05] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:14] PROBLEM - puppet last run on mw2138 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:15] PROBLEM - puppet last run on mw1157 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:15] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:16] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:16] PROBLEM - puppet last run on mw1184 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:16] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:25] PROBLEM - puppet last run on mw1302 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:35] PROBLEM - puppet last run on mw2136 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:35] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:35] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:35] PROBLEM - puppet last run on mw2132 is CRITICAL: CRITICAL: Puppet has 1 failures [19:33:36] PROBLEM - puppet last run on mw2166 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:04] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:15] PROBLEM - puppet last run on mw1199 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:15] PROBLEM - puppet last run on mw1258 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:24] PROBLEM - puppet last run on 
mw1303 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:24] PROBLEM - puppet last run on mw2198 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:25] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:34] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:34] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:34] PROBLEM - puppet last run on mw1288 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:35] PROBLEM - puppet last run on mw2061 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:35] PROBLEM - puppet last run on mw2232 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:35] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:36] PROBLEM - puppet last run on bast4001 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:36] PROBLEM - puppet last run on mw1234 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:36] PROBLEM - puppet last run on mw2088 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:36] PROBLEM - puppet last run on mw2162 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:44] PROBLEM - puppet last run on mw1287 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:45] PROBLEM - puppet last run on mw2235 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:46] PROBLEM - puppet last run on mw2191 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:46] PROBLEM - puppet last run on mw1261 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:48] okay, definitely there is an issue, I'll bother you later :D [19:34:55] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:56] PROBLEM - puppet last run on mw2120 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:04] PROBLEM - puppet last run on mw1246 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:14] PROBLEM - puppet last run on mw1191 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:14] PROBLEM - puppet last 
run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:14] Amir1: the above is unrelated to etherpad, I think [19:35:33] (03PS1) 10Chad: Gerrit: Add myself to contact groups for when https goes boom [puppet] - 10https://gerrit.wikimedia.org/r/299605 [19:35:34] PROBLEM - puppet last run on snapshot1004 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:35] PROBLEM - puppet last run on mw1212 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:35] yeah, but it shows you're busy. [19:35:35] (03PS1) 10Chad: Gerrit: Add icinga check for Gerrit SSH access [puppet] - 10https://gerrit.wikimedia.org/r/299606 [19:35:40] we used etherpad just fine with a bunch of people for an hour during the ops meeting earlier today, a couple of hours ago. that's the last I know about it [19:35:44] RECOVERY - puppet last run on mw2237 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [19:35:45] PROBLEM - puppet last run on mw2125 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:54] PROBLEM - puppet last run on mw1252 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:54] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:55] PROBLEM - puppet last run on mw1156 is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:56] PROBLEM - puppet last run on mw1192 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:14] PROBLEM - puppet last run on mw1264 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:23] !log shutting down logstash and elasticsearch on logstash1001-03 [19:36:24] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:27] bblack: mutante thanks things will straighten out now [19:36:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:36:34] PROBLEM - puppet last run on mw1257 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:34] PROBLEM - puppet last run on mw1163 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:34] PROBLEM - 
puppet last run on mw1183 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:35] PROBLEM - puppet last run on mw2072 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:35] yeah, we used it several hours ago for a meeting but right now it was annoying [19:36:38] 06Operations, 06Commons, 10media-storage, 13Patch-For-Review, 07User-notice: Some fonts not anti-aliasing in SVG thumbnails after upgrade of scaling servers - https://phabricator.wikimedia.org/T139543#2472430 (10kaldari) Also, I'm happy that Times is not substituted by DejaVu Serif. Hopefully it's being... [19:36:45] PROBLEM - puppet last run on mw1286 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:46] PROBLEM - puppet last run on mw1291 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:46] PROBLEM - puppet last run on mw1239 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:46] RECOVERY - puppet last run on mw1261 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:36:46] PROBLEM - puppet last run on mw2214 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:56] PROBLEM - puppet last run on mw1023 is CRITICAL: CRITICAL: Puppet has 1 failures [19:36:56] PROBLEM - puppet last run on mw2181 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:10] maybe it gets fixed by itself :D [19:37:15] PROBLEM - puppet last run on mw1244 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:15] PROBLEM - puppet last run on mw1248 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:34] PROBLEM - puppet last run on mw2210 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:34] PROBLEM - puppet last run on mw2226 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:35] PROBLEM - puppet last run on mw1306 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:35] PROBLEM - puppet last run on mw2155 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:45] PROBLEM - puppet last run on mw2205 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:45] PROBLEM - puppet last run on mw2239 is CRITICAL: 
CRITICAL: Puppet has 1 failures [19:37:46] PROBLEM - puppet last run on mw2169 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:55] PROBLEM - puppet last run on mw2122 is CRITICAL: CRITICAL: Puppet has 1 failures [19:37:55] PROBLEM - puppet last run on mw1301 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:04] PROBLEM - puppet last run on mw1186 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:04] PROBLEM - puppet last run on mw2215 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:05] PROBLEM - puppet last run on mw2071 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:16] PROBLEM - puppet last run on mw2074 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:26] PROBLEM - puppet last run on mw1229 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:26] PROBLEM - puppet last run on mw1223 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:44] PROBLEM - puppet last run on mw1167 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:44] PROBLEM - puppet last run on mw1148 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:45] PROBLEM - puppet last run on mw2159 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:45] PROBLEM - puppet last run on mw1152 is CRITICAL: CRITICAL: Puppet has 1 failures [19:38:46] !log Dropped logstash-2016.07.04 through logstash-2016.07.14 indices for backing Elasticsearch upgrade [19:38:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:39:03] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472437 (10EBernhardson) [19:39:04] PROBLEM - puppet last run on mw2187 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:05] PROBLEM - puppet last run on mw2108 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:24] PROBLEM - puppet last run on mw1209 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:25] PROBLEM - puppet last run on snapshot1003 
is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:29] (03CR) 10Krinkle: [C: 031] Set the DBTransaction log [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299095 (owner: 10Aaron Schulz) [19:39:34] PROBLEM - puppet last run on mw2180 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:35] PROBLEM - puppet last run on mw2175 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:36] PROBLEM - puppet last run on mw2121 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:41] (03PS2) 10Krinkle: Enable debug logging for DBTransaction [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299095 (owner: 10Aaron Schulz) [19:39:46] PROBLEM - puppet last run on mw2124 is CRITICAL: CRITICAL: Puppet has 1 failures [19:39:55] PROBLEM - puppet last run on mw2204 is CRITICAL: CRITICAL: Puppet has 1 failures [19:40:35] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: decommission aluminium, replace it with frqueue1002 - https://phabricator.wikimedia.org/T140676#2472440 (10Jgreen) [19:40:37] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472454 (10bd808) [19:41:05] PROBLEM - logstash process on logstash1003 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash [19:41:24] PROBLEM - logstash process on logstash1002 is CRITICAL: PROCS CRITICAL: 0 processes with UID = 998 (logstash), command name java, args logstash [19:41:29] gehel: can you silence those ^ ? 
[19:42:03] bd808: on it [19:42:25] PROBLEM - ElasticSearch health check for shards on logstash1003 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.113:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.48.113, port=9200): Max retries exceeded with url: /_cluster/health (Caused by class socket.error: [Errno 111] Connection refused) [19:42:45] PROBLEM - ElasticSearch health check for shards on logstash1002 is CRITICAL: CRITICAL - elasticsearch http://10.64.32.137:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.32.137, port=9200): Max retries exceeded with url: /_cluster/health (Caused by class socket.error: [Errno 111] Connection refused) [19:43:05] RECOVERY - puppet last run on mw2108 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:43:14] elasticsearch is intentionally shut down, those critical problems are not [19:43:51] we forgot to schedule icinga downtime in our checklist :/ [19:44:12] (03CR) 10Chad: "Is this still needed? We can add exempt.dblist but the syntax in group2 and group1 has changed since January." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/265189 (owner: 10Dduvall) [19:44:25] (03Abandoned) 10Paladox: Phabricator: allow liberal close in Differential [puppet] - 10https://gerrit.wikimedia.org/r/281069 (https://phabricator.wikimedia.org/T131623) (owner: 10Paladox) [19:46:53] (03Abandoned) 10Dduvall: Establish `group2` and `exempt` dblists [mediawiki-config] - 10https://gerrit.wikimedia.org/r/265189 (owner: 10Dduvall) [19:46:54] RECOVERY - ElasticSearch health check for shards on logstash1002 is OK: OK - elasticsearch status production-logstash-eqiad: status: green, number_of_nodes: 4, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 11, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards: 33, initializing_shards: 0, number_of_data_nodes: 3, delayed_unassigned_sha [19:47:12] !log shutdown elasticsearch on logstash1004-6 [19:47:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:47:25] RECOVERY - logstash process on logstash1002 is OK: PROCS OK: 1 process with UID = 998 (logstash), command name java, args logstash [19:47:31] (03PS4) 10Gehel: Import elasticsearch 2.x into our APT repository [puppet] - 10https://gerrit.wikimedia.org/r/283466 (https://phabricator.wikimedia.org/T132376) [19:48:20] ebernhardson: disable puppet on each too until the new deb is installed? [19:49:16] bd808: hmm, shouldn't be necessary? [19:49:24] RECOVERY - puppet last run on mw1295 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [19:49:34] that recovery above was puppet restarting es on 1002 [19:49:45] !log ytterbium - fixing Apache config, graceful [19:49:46] wasn't it? 
[19:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:50:14] RECOVERY - puppet last run on mw1274 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:50:44] RECOVERY - puppet last run on mw1275 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:50:46] RECOVERY - puppet last run on mw2150 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [19:50:55] RECOVERY - puppet last run on mw2242 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:50:55] RECOVERY - puppet last run on mw1278 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:51:05] RECOVERY - puppet last run on mw1138 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:51:14] RECOVERY - logstash process on logstash1003 is OK: PROCS OK: 1 process with UID = 998 (logstash), command name java, args logstash [19:51:34] RECOVERY - puppet last run on mw1296 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:51:36] RECOVERY - puppet last run on mw2133 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:51:44] bd808: oh, yes now i get what you mean. 
disabling [19:51:45] RECOVERY - puppet last run on mw1181 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:51:55] RECOVERY - puppet last run on mw2107 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:04] RECOVERY - puppet last run on mw2231 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:52:08] (03PS5) 10Gehel: Import elasticsearch 2.x into our APT repository [puppet] - 10https://gerrit.wikimedia.org/r/283466 (https://phabricator.wikimedia.org/T132376) [19:52:14] RECOVERY - puppet last run on mw1132 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:15] RECOVERY - puppet last run on mw1245 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:16] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:16] RECOVERY - puppet last run on snapshot1005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:16] RECOVERY - puppet last run on mw1165 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:34] RECOVERY - puppet last run on mw2202 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:35] RECOVERY - puppet last run on mw1178 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:52:35] RECOVERY - puppet last run on mw2094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:36] RECOVERY - puppet last run on mw1218 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:52:41] !log disabling puppet on logstash.* nodes for elasticsearch upgrade [19:52:43] (03PS3) 10Dzahn: [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 (owner: 10Paladox) [19:52:45] RECOVERY - puppet last run on mw2144 is OK: OK: Puppet is 
currently enabled, last run 37 seconds ago with 0 failures [19:52:46] RECOVERY - puppet last run on mw1270 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [19:52:46] RECOVERY - puppet last run on mw2193 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [19:52:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:52:55] RECOVERY - puppet last run on mw2219 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [19:52:55] RECOVERY - puppet last run on mw1134 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [19:53:05] RECOVERY - puppet last run on mw1198 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [19:53:05] RECOVERY - puppet last run on mw2106 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:53:06] RECOVERY - puppet last run on mw2068 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [19:53:16] RECOVERY - puppet last run on mw1116 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [19:53:16] RECOVERY - puppet last run on mw1141 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [19:53:24] RECOVERY - puppet last run on mw1200 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:53:24] RECOVERY - puppet last run on mw1240 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:53:34] RECOVERY - puppet last run on mw1262 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [19:53:35] RECOVERY - puppet last run on mw2167 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [19:53:45] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:53:55] RECOVERY - puppet last run on mw1233 is OK: OK: 
Puppet is currently enabled, last run 1 minute ago with 0 failures [19:54:03] !log re-stopping logstash on logstash1001-3 [19:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:54:13] (03CR) 10Gehel: [C: 032] Import elasticsearch 2.x into our APT repository [puppet] - 10https://gerrit.wikimedia.org/r/283466 (https://phabricator.wikimedia.org/T132376) (owner: 10Gehel) [19:54:14] RECOVERY - puppet last run on mw2186 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:54:14] RECOVERY - puppet last run on mw1185 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:54:14] RECOVERY - puppet last run on mw1219 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [19:54:15] RECOVERY - puppet last run on mw1139 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:54:16] RECOVERY - puppet last run on mw1145 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:54:16] RECOVERY - puppet last run on mw1022 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [19:54:34] RECOVERY - puppet last run on mw2209 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [19:55:05] RECOVERY - puppet last run on mw1174 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [19:55:05] RECOVERY - puppet last run on mw2091 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:55:05] RECOVERY - puppet last run on mw1305 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:55:14] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [19:55:24] RECOVERY - puppet last run on mw1231 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:55:25] RECOVERY - puppet last run on mw1260 is OK: OK: 
Puppet is currently enabled, last run 20 seconds ago with 0 failures [19:55:44] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [19:55:44] RECOVERY - puppet last run on mw2157 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [19:55:46] RECOVERY - puppet last run on mw1203 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures [19:55:55] RECOVERY - puppet last run on mw2234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:55:56] RECOVERY - puppet last run on mw2145 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:56:04] RECOVERY - puppet last run on mw2069 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:15] RECOVERY - puppet last run on mw2248 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:56:16] RECOVERY - puppet last run on mw1120 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [19:56:16] RECOVERY - puppet last run on mw1284 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [19:56:25] RECOVERY - puppet last run on mw2076 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:34] RECOVERY - puppet last run on mw2118 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:34] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [19:56:35] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures [19:56:44] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [19:56:45] RECOVERY - puppet last run on mw2141 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:54] RECOVERY - 
puppet last run on mw1276 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:55] RECOVERY - puppet last run on mw1226 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:55] RECOVERY - puppet last run on mw1215 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:56:55] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:57:04] RECOVERY - puppet last run on mw2217 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [19:57:15] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [19:57:25] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:57:26] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:57:26] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [19:57:37] RECOVERY - puppet last run on mw2073 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [19:57:44] RECOVERY - puppet last run on mw2208 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:57:44] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [19:57:44] RECOVERY - puppet last run on mw1177 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [19:57:45] RECOVERY - puppet last run on mw2136 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [19:57:45] RECOVERY - puppet last run on mw2250 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:04] RECOVERY - puppet last run on mw2129 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 
0 failures [19:58:15] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:16] RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:24] (03PS4) 10Dzahn: [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 (owner: 10Paladox) [19:58:25] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:26] RECOVERY - puppet last run on mw1199 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:26] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:29] (03PS1) 10Jdlrobson: Wikidata description config cleanup [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299615 (https://phabricator.wikimedia.org/T140600) [19:58:31] (03CR) 10Dzahn: [C: 032] [gerrit] Fix RewriteRule for undefined.cache.js [puppet] - 10https://gerrit.wikimedia.org/r/299598 (owner: 10Paladox) [19:58:33] (03CR) 10Hashar: [C: 04-1] "There must be a reason, gotta dig in the git log history! 
One sure thing is that each DNS entries have different IP:" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [19:58:44] RECOVERY - puppet last run on snapshot1006 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:44] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [19:58:44] RECOVERY - puppet last run on mw1172 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:58:45] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [19:58:45] RECOVERY - puppet last run on mw2061 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [19:59:04] RECOVERY - puppet last run on mw2095 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:05] RECOVERY - puppet last run on mw1289 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:05] RECOVERY - puppet last run on mw2146 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:05] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:14] RECOVERY - puppet last run on mw2120 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:15] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:25] RECOVERY - puppet last run on mw2138 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [19:59:25] RECOVERY - puppet last run on mw1191 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures [19:59:25] RECOVERY - puppet last run on mw1202 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:59:25] RECOVERY - 
puppet last run on mw1157 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [19:59:26] RECOVERY - puppet last run on mw1195 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:26] RECOVERY - puppet last run on mw1184 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [19:59:35] RECOVERY - puppet last run on mw1302 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [19:59:44] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:59:45] RECOVERY - puppet last run on mw2132 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [19:59:46] RECOVERY - puppet last run on mw2166 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:00:05] gwicke, cscott, arlolra, subbu, bearND, and mdholloway: Dear anthropoid, the time has come. Please deploy Services – Parsoid / OCG / Citoid / Mobileapps / … (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T2000). 
[20:00:05] RECOVERY - puppet last run on mw1133 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [20:00:06] RECOVERY - puppet last run on mw1156 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:00:16] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [20:00:24] (03PS1) 10Jdlrobson: Lazy load images+references on Russian Wiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299619 (https://phabricator.wikimedia.org/T140197) [20:00:25] RECOVERY - puppet last run on mw1264 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [20:00:35] RECOVERY - puppet last run on mw1258 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [20:00:44] RECOVERY - puppet last run on mw1303 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [20:00:45] RECOVERY - puppet last run on mw2198 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:00:45] RECOVERY - puppet last run on mw1168 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [20:00:54] RECOVERY - puppet last run on mw1288 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:00:55] RECOVERY - puppet last run on mw2072 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [20:00:55] RECOVERY - puppet last run on mw2232 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:00:55] RECOVERY - puppet last run on mw1234 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:04] RECOVERY - puppet last run on mw2088 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:04] RECOVERY - puppet last run on mw2162 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [20:01:04] RECOVERY - puppet last run 
on mw1287 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [20:01:05] RECOVERY - puppet last run on bast4001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:01:05] RECOVERY - puppet last run on mw1286 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:05] RECOVERY - puppet last run on mw2235 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:14] RECOVERY - puppet last run on mw2191 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures [20:01:15] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:24] RECOVERY - puppet last run on mw1246 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:01:25] RECOVERY - puppet last run on mw2181 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [20:01:25] RECOVERY - puppet last run on mw1259 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:01:33] (03CR) 10Bartosz Dziewoński: Lazy load images+references on Russian Wiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299619 (https://phabricator.wikimedia.org/T140197) (owner: 10Jdlrobson) [20:01:36] RECOVERY - puppet last run on mw1244 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:01:55] RECOVERY - puppet last run on snapshot1004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:02:18] RECOVERY - puppet last run on mw1212 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [20:02:19] RECOVERY - puppet last run on mw2226 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [20:02:20] RECOVERY - puppet last run on mw1023 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [20:02:24] (03CR) 
10Jforrester: Wikidata description config cleanup (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/299615 (https://phabricator.wikimedia.org/T140600) (owner: 10Jdlrobson) [20:02:31] !log re-shutdown elasticsearch on logstash1001-6 [20:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:02:39] RECOVERY - puppet last run on mw1252 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:02:49] RECOVERY - puppet last run on mw1163 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:02:49] RECOVERY - puppet last run on mw1248 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [20:03:00] RECOVERY - puppet last run on mw1148 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [20:03:00] RECOVERY - puppet last run on mw1183 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:03:00] RECOVERY - puppet last run on mw1239 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:03:00] RECOVERY - puppet last run on mw2074 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [20:03:18] RECOVERY - puppet last run on mw2210 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [20:03:19] RECOVERY - puppet last run on mw2155 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:03:20] RECOVERY - puppet last run on mw1229 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [20:03:20] RECOVERY - puppet last run on mw1291 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:03:30] RECOVERY - puppet last run on mw2214 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:03:48] RECOVERY - puppet last run on mw1192 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures 
[20:03:49] RECOVERY - puppet last run on mw2187 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:03:49] RECOVERY - puppet last run on mw2204 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [20:04:00] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:04:00] RECOVERY - puppet last run on mw1257 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:04:00] RECOVERY - puppet last run on mw1301 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:04:19] RECOVERY - puppet last run on mw1209 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [20:04:39] RECOVERY - puppet last run on snapshot1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:04:39] RECOVERY - puppet last run on mw2071 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [20:04:39] RECOVERY - puppet last run on mw2205 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:04:48] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [20:04:49] RECOVERY - puppet last run on mw1186 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:04:50] RECOVERY - puppet last run on mw2124 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [20:04:58] RECOVERY - puppet last run on mw1306 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:04:59] RECOVERY - puppet last run on mw2175 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:05:08] RECOVERY - puppet last run on mw1223 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [20:05:08] RECOVERY - puppet last run on mw2159 is OK: OK: Puppet is currently enabled, 
last run 1 minute ago with 0 failures [20:05:19] RECOVERY - puppet last run on mw1167 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures [20:05:19] RECOVERY - puppet last run on mw2121 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:05:20] RECOVERY - puppet last run on mw2239 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:05:20] RECOVERY - puppet last run on mw2125 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [20:05:20] RECOVERY - puppet last run on mw2122 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:05:28] RECOVERY - puppet last run on mw2215 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [20:05:38] RECOVERY - puppet last run on mw2169 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [20:05:45] MatmaRex: try the redirect again now [20:06:48] RECOVERY - puppet last run on mw2180 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:06:50] mutante: yay, seems to work! and gerrit-new loads! [20:06:52] works on gerrit-new and doesn't on gerrit-old [20:07:01] and that is expected as paladox said [20:07:03] ok, cool [20:07:08] mutante: right, old gerrit doesn't have the file we're redirecting to, but that's okay [20:07:15] ok:) [20:07:21] paladox: thanks ^ [20:07:31] Your welcome :) [20:07:36] yay [20:08:32] :) [20:08:43] I guess firefox users can now test gerrit-new :) [20:08:54] ostriches looks like firefox will work now ^^ [20:08:55] paladox: Did you see my comment on https://gerrit.wikimedia.org/r/#/c/299447/3/modules/gerrit/templates/gerrit.config.erb? [20:09:02] Oh nope [20:09:05] i will check now [20:09:14] (03CR) 10Dzahn: "we should probably add a phone number to the contact before this." 
[puppet] - 10https://gerrit.wikimedia.org/r/299198 (https://phabricator.wikimedia.org/T140422) (owner: 10Dzahn) [20:09:16] Also, iirc, even/odd rows had poor behavior on 2.8 [20:09:45] ostriches, oh, how do i get it so we get the table for the reviewer bit [20:09:53] Also i doint see a comment at https://gerrit.wikimedia.org/r/#/c/299447/3/modules/gerrit/templates/gerrit.config.erb [20:10:05] Lol, I didn't post it! [20:10:06] Whoops [20:10:10] Oh [20:10:12] "These should go in the [theme] section on line 131." [20:10:23] Ok. [20:10:31] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472568 (10EBernhardson) [20:10:37] (03CR) 10Hashar: "I did dig the git log history and the original puppet change had gerrit_server pointing to manganese for whatever reason." [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [20:10:37] Im not sure how to get it so it shows only the table [20:11:01] You know for example [20:11:04] viewing https://gerrit.wikimedia.org/r/#/c/299447/ [20:11:08] Anyone know if there are any special issues with enabling the ShortURL extension on a new wiki? [20:11:23] legoktm, ^ [20:11:27] on the left where it shows the author and change id including other information [20:11:28] (In this case, urwiki, per T138507.) [20:11:28] T138507: Enable ShortUrl on Urdu Wikipedia - https://phabricator.wikimedia.org/T138507 [20:11:29] ostriches ^^ [20:12:23] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472574 (10chasemp) [20:12:35] Oh wait let me test removing that part [20:13:09] James_F: mywiki is better than urwiki [20:13:40] ostriches: Ho. Ho. Ho. 
[20:14:00] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472581 (10EBernhardson) [20:14:47] There's a bug in gerrit that prevents it from starting without doing bin/gerrit.sh start. [20:14:56] I think this may affect gerrit 2.12.3 [20:15:04] since before in gerrit 2.12.2 it worked. [20:15:08] ostriches ^^ [20:15:14] and gerrit 2.8 starts for me. [20:15:25] Huh? [20:15:33] What doesn't work? [20:15:42] Gerrit, when I do bin/gerrit.sh start [20:15:51] it just sits there until it times out [20:16:13] You just said it doesn't work without bin/gerrit.sh. [20:16:21] But then you said it doesn't work when you do. [20:16:22] I'm confused. [20:16:25] The workaround I found is to do bin/gerrit.sh run then sit there until it works, which takes a few mins [20:16:35] That sounds like a local issue. [20:16:39] Maybe [20:16:45] But gerrit 2.8 starts for me [20:17:05] Well we're not using 2.12.3 yet anyway and it works for me :) [20:17:11] Oh :) [20:17:38] (03CR) 10Chad: "Also wondering if there's a way we can do sms without adding me to the global sms spamlist." 
[puppet] - 10https://gerrit.wikimedia.org/r/299605 (owner: 10Chad) [20:18:00] ostriches I've been testing multiple zuul connections and it works; you just have to have a different pipeline for one connection and the other one for the other connection [20:18:14] haven't managed to get one pipeline working with two connections yet [20:18:19] !log installed elasticsearch-2.3.3 to logstash1001-6 [20:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:19:35] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472619 (10chasemp) [20:21:24] ostriches it seems the comment at https://gerrit.wikimedia.org/r/#/c/299447/ has still not been published [20:21:25] ? [20:21:29] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: puppet fail [20:23:14] 06Operations, 10Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for Mpany - https://phabricator.wikimedia.org/T140399#2472630 (10Mpany) @Jgreen: thanks for explaining! wikitech username: Mpany shell account name: mpany Do you need anything... [20:27:33] (03PS3) 10Dzahn: icinga: let Madhu run commands from webui [puppet] - 10https://gerrit.wikimedia.org/r/299196 (https://phabricator.wikimedia.org/T140422) [20:27:35] paladox: I didn't bother because I told you here :) [20:27:40] (03CR) 10Dzahn: [C: 032] icinga: let Madhu run commands from webui [puppet] - 10https://gerrit.wikimedia.org/r/299196 (https://phabricator.wikimedia.org/T140422) (owner: 10Dzahn) [20:27:44] Ok [20:27:45] :) [20:28:35] ostriches I removed it and am testing on my test instance [20:28:40] Does this look better http://gerrit-test.wmflabs.org/gerrit/#/dashboard/self [20:32:07] Better than https://gerrit-new.wikimedia.org/r/#/dashboard/self? 
I see no difference [20:32:27] Oh [20:32:45] ostriches here http://gerrit-test.wmflabs.org/gerrit/#/admin/projects/ [20:33:00] You should notice the difference between that and http://gerrit-new.wmflabs.org/gerrit/#/admin/projects/ [20:33:25] I only notice that gerrit-new lacks any links at all to diffusion :) [20:33:33] Yep [20:33:41] also includes a lighter blue for the table [20:33:52] http://gerrit-test.wmflabs.org/gerrit/#/c/11/ [20:33:55] ostriches ^^ [20:34:06] What's wrong with the existing grey header? [20:34:20] https://gerrit-new.wikimedia.org/r/#/c/11/ [20:34:24] Nothing [20:34:32] Then we don't need to change it. [20:34:35] Ok. [20:34:57] Making things "look nice" where they're not actually broken isn't a goal right now. That can wait until after the migration. [20:35:03] Ok [20:36:06] (03PS5) 10Paladox: Add some colors to the site table on changes [puppet] - 10https://gerrit.wikimedia.org/r/299447 [20:38:50] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472744 (10Dzahn) [20:40:45] (03PS1) 10EBernhardson: Apply es2.x config for logstash1001-3 as well [puppet] - 10https://gerrit.wikimedia.org/r/299660 [20:44:18] (03PS2) 10EBernhardson: Apply es2.x config for logstash1001-3 as well [puppet] - 10https://gerrit.wikimedia.org/r/299660 [20:44:25] ostriches i will upload the amended commit in a sec, im testing it. [20:44:26] :) [20:47:27] (03PS4) 10Andrew Bogott: Include nova mysql password in novaenv.sh [puppet] - 10https://gerrit.wikimedia.org/r/299590 (https://phabricator.wikimedia.org/T139272) [20:47:29] (03PS2) 10Andrew Bogott: cold-migrate password reform: use novaenv.sh for vitals wherever possible. [puppet] - 10https://gerrit.wikimedia.org/r/299602 (https://phabricator.wikimedia.org/T139272) [20:47:31] (03PS1) 10Andrew Bogott: cold-migrate: activate/deactivate base image as needed. 
[puppet] - 10https://gerrit.wikimedia.org/r/299661 (https://phabricator.wikimedia.org/T139272) [20:49:12] (03PS3) 10EBernhardson: Apply es2.x config for logstash1001-3 as well [puppet] - 10https://gerrit.wikimedia.org/r/299660 [20:49:33] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [20:50:02] ostriches yay it works, it is now back to grey. im uploading the patch now [20:50:03] :) [20:50:03] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [20:50:47] (03PS6) 10Paladox: Add some colors to the site table on changes [puppet] - 10https://gerrit.wikimedia.org/r/299447 [20:50:48] ostriches ^^ [20:52:33] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472852 (10chasemp) [20:53:05] (03CR) 10Gehel: "Puppet compiler looks good: https://puppet-compiler.wmflabs.org/3369/" [puppet] - 10https://gerrit.wikimedia.org/r/299660 (owner: 10EBernhardson) [20:53:21] (03PS3) 10Dzahn: nagios_common: add Madhu to sms (ops paging) group [puppet] - 10https://gerrit.wikimedia.org/r/299198 (https://phabricator.wikimedia.org/T140422) [20:54:16] (03PS4) 10Dzahn: nagios_common: add Madhu to sms (ops paging) group [puppet] - 10https://gerrit.wikimedia.org/r/299198 (https://phabricator.wikimedia.org/T140422) [20:57:09] 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2472873 (10Danny_B) For the record: It was not generated today as well. [20:57:16] (03CR) 10Dzahn: [C: 032] "added the contact in private repo, added phone number, added notification commands, set timezone to PST ..." 
[puppet] - 10https://gerrit.wikimedia.org/r/299198 (https://phabricator.wikimedia.org/T140422) (owner: 10Dzahn) [20:58:06] 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2472876 (10Danny_B) And none of those previous manual attempts mentioned above generated that. Cf. {T85183} and {rOPUP1cc6c30e95ae} [20:59:14] (03PS2) 10Chad: sitelist: update link to sitelist documentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296756 (owner: 10BryanDavis) [21:00:04] dapatrick and bawolff: Respected human, time to deploy Security (non-urgent) deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T2100). Please do the needful. [21:00:41] (03PS4) 10Gehel: Apply es2.x config for logstash1001-3 as well [puppet] - 10https://gerrit.wikimedia.org/r/299660 (owner: 10EBernhardson) [21:00:45] (03Abandoned) 10Chad: sitelist: update link to sitelist documentation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296756 (owner: 10BryanDavis) [21:01:29] (03PS1) 10Dzahn: phabricator: re-enable project-changes email [puppet] - 10https://gerrit.wikimedia.org/r/299663 (https://phabricator.wikimedia.org/T139950) [21:01:48] 06Operations, 10Ops-Access-Requests, 06Labs, 13Patch-For-Review: madhuvishy is moving to operations on 7/18/16 - https://phabricator.wikimedia.org/T140422#2472883 (10Dzahn) [21:02:48] (03PS2) 10Dzahn: phabricator: re-enable project-changes email [puppet] - 10https://gerrit.wikimedia.org/r/299663 (https://phabricator.wikimedia.org/T139950) [21:03:38] (03PS3) 10Chad: Whitelist a bunch of RSS feeds for Fundraising Tech to play with [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298399 (owner: 10Awight) [21:04:52] (03CR) 10Chad: [C: 032] Whitelist a bunch of RSS feeds for Fundraising Tech to play with [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298399 (owner: 10Awight) [21:05:31] (03Merged) 10jenkins-bot: Whitelist a 
bunch of RSS feeds for Fundraising Tech to play with [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298399 (owner: 10Awight) [21:05:45] 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2472888 (10Dzahn) p:05High>03Normal [21:05:57] (03CR) 10Gehel: [C: 032] Apply es2.x config for logstash1001-3 as well [puppet] - 10https://gerrit.wikimedia.org/r/299660 (owner: 10EBernhardson) [21:06:05] (03CR) 10Danny B.: [C: 031] phabricator: re-enable project-changes email [puppet] - 10https://gerrit.wikimedia.org/r/299663 (https://phabricator.wikimedia.org/T139950) (owner: 10Dzahn) [21:06:07] (03CR) 10Dzahn: [C: 032] phabricator: re-enable project-changes email [puppet] - 10https://gerrit.wikimedia.org/r/299663 (https://phabricator.wikimedia.org/T139950) (owner: 10Dzahn) [21:06:16] (03PS3) 10Dzahn: phabricator: re-enable project-changes email [puppet] - 10https://gerrit.wikimedia.org/r/299663 (https://phabricator.wikimedia.org/T139950) [21:07:04] !log demon@tin Synchronized wmf-config/InitialiseSettings.php: whitelist some rss feeds for mw.org (duration: 00m 43s) [21:07:14] awight: Did that one for you ^ [21:07:42] (03CR) 10Paladox: "@Hashar all you have to do for the git repos is delete them, they will be git cloned again as they are tested :)." [puppet] - 10https://gerrit.wikimedia.org/r/299592 (https://phabricator.wikimedia.org/T125018) (owner: 10Dzahn) [21:08:55] PROBLEM - Check correctness of the icinga configuration on neon is CRITICAL: Icinga configuration contains errors [21:09:05] ostriches: ty! 
[21:09:09] yw [21:09:20] (trying to clear my backlog today :)) [21:09:22] I'll probably bug you for the revert in a few weeks :( [21:09:41] !log changed replica count for logstash-2016.06-(01|02|03|15|16|17) indices to 0 to make room for recovering todays index [21:09:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:09:54] bd808 mentioned another code quality tool, and I prefer its results: scrutinizer-ci [21:10:00] 07Puppet, 10ORES, 06Revision-Scoring-As-A-Service, 07Easy, 13Patch-For-Review: Puppet fails on new web node - https://phabricator.wikimedia.org/T140265#2472922 (10Ladsgroup) 05Open>03Resolved [21:10:37] (03Abandoned) 10Chad: Beta: remove old staging node defs and do it for beta instead [puppet] - 10https://gerrit.wikimedia.org/r/296809 (owner: 10Chad) [21:12:05] RECOVERY - ElasticSearch health check for shards on logstash1003 is OK: OK - elasticsearch status production-logstash-eqiad: status: yellow, number_of_nodes: 6, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 11, task_max_waiting_in_queue_millis: 0, cluster_name: production-logstash-eqiad, relocating_shards: 0, active_shards_percent_as_number: 90.4761904762, acti [21:12:10] (03CR) 10Paladox: [C: 031] Gerrit: Add icinga check for Gerrit SSH access [puppet] - 10https://gerrit.wikimedia.org/r/299606 (owner: 10Chad) [21:14:04] (03CR) 10Paladox: [C: 031] Gerrit: Add myself to contact groups for when https goes boom [puppet] - 10https://gerrit.wikimedia.org/r/299605 (owner: 10Chad) [21:15:47] (03PS2) 10Awight: Delist Special:CodeReview [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298544 (https://phabricator.wikimedia.org/T116948) [21:16:18] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2472952 (10EBernhardson) [21:16:53] RECOVERY - 
puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [21:19:39] (03PS4) 10Chad: Gerrit: Remove SSH public key and last user of it [puppet] - 10https://gerrit.wikimedia.org/r/298052 [21:20:08] 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2472959 (10Dzahn) i also ran the command manually and it got send to the mailing ist now [21:20:22] 06Operations, 10Phabricator, 13Patch-For-Review: Phabricator weekly report not generated (or at least sent) - https://phabricator.wikimedia.org/T139950#2472960 (10Dzahn) 05Open>03Resolved a:03Dzahn [21:21:05] (03PS1) 10Yuvipanda: Revert "labs: Don't have shinken do basic instance checks" [puppet] - 10https://gerrit.wikimedia.org/r/299664 [21:24:44] (03PS2) 10Yuvipanda: Revert "labs: Don't have shinken do basic instance checks" [puppet] - 10https://gerrit.wikimedia.org/r/299664 [21:26:52] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "labs: Don't have shinken do basic instance checks" [puppet] - 10https://gerrit.wikimedia.org/r/299664 (owner: 10Yuvipanda) [21:32:59] (03PS10) 10Gehel: Update kibana module for kibana 4 [puppet] - 10https://gerrit.wikimedia.org/r/296279 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [21:34:06] icinga config issue is me adding the new contact for madhu [21:34:12] !log changed replica count for logstash-2016.06-(01|02|03|15|16|17) indices back to 2 [21:34:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:35:14] (03CR) 10Gehel: [C: 032] Update kibana module for kibana 4 [puppet] - 10https://gerrit.wikimedia.org/r/296279 (https://phabricator.wikimedia.org/T129138) (owner: 10EBernhardson) [21:37:53] ostriches ive tested https://gerrit.wikimedia.org/r/#/c/299447/ now and it keeps the grey color and fixes the links so they are blue. 
[21:38:24] (03PS20) 10Dzahn: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [21:38:28] !log demon@tin Started scap: security, se-curity [21:38:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:40:15] (03CR) 10Chad: "I'm curious if we should do some apache rewrites for these URLs (if possible) so they can end up at https://phabricator.wikimedia.org/rSV" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298544 (https://phabricator.wikimedia.org/T116948) (owner: 10Awight) [21:41:04] (03CR) 10Awight: "That's a great idea, mind mentioning on the task?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/298544 (https://phabricator.wikimedia.org/T116948) (owner: 10Awight) [21:41:50] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2473085 (10Gehel) [21:46:40] !log demon@tin Finished scap: security, se-curity (duration: 08m 12s) [21:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:47:47] (03PS1) 10EBernhardson: Reduce kibana logging levels [puppet] - 10https://gerrit.wikimedia.org/r/299671 [21:49:37] (03PS1) 10Thcipriani: Use hiera for udp2log-mw logrotate count [puppet] - 10https://gerrit.wikimedia.org/r/299672 (https://phabricator.wikimedia.org/T140313) [21:52:14] !log brought up kibana4 on logstash.wikimedia.org [21:52:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:53:26] RECOVERY - Check correctness of the icinga configuration on neon is OK: Icinga configuration is correct [21:53:32] preeety [21:53:34] +t [21:58:40] (03CR) 10Gehel: [C: 032 V: 032] "Trivial enough" [puppet] - 10https://gerrit.wikimedia.org/r/299671 (owner: 10EBernhardson) [22:01:32] !log Deployed patch for T115333 to wmf.10 [22:01:37] Logged the 
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:04:11] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2473155 (10EBernhardson) [22:07:47] !log elasticsearch / kibana upgrade done [22:07:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:18:12] !log Deployed patch for T136402 on php-1.28.0-wmf.10 [22:18:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:22:02] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2473262 (10EBernhardson) [22:24:14] (03PS21) 10Dzahn: Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [22:28:19] (03PS3) 10Chad: WIP: Gerrit: Swap lead to point at production data [puppet] - 10https://gerrit.wikimedia.org/r/298673 [22:28:21] (03PS1) 10Chad: Gerrit: Go ahead and ensure lets_encrypt everywhere other than ytterbium [puppet] - 10https://gerrit.wikimedia.org/r/299678 [22:30:12] (03CR) 10Dzahn: [C: 032] Add missing roottree, file configs to gerrit.config.erb [puppet] - 10https://gerrit.wikimedia.org/r/298710 (owner: 10Paladox) [22:32:00] gerrit will restart for config change. 
[22:32:22] !log gerrit restarting for config change 298710 [22:32:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:33:00] gerrit is back [22:33:33] guessing it is gerrit-new turn now [22:33:39] and thanks [22:34:28] mutante ^^ [22:34:34] :) [22:34:55] !log Deployed patch for T132926 to wmf.10 [22:34:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:35:00] paladox: yes [22:35:11] thanks :) [22:35:33] !log gerrit-new restarting for config change 298710 [22:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:35:48] paladox: you can test [22:35:54] ok [22:36:12] yep thanks, it works [22:36:14] :) [22:36:19] mutante ^^ [22:36:24] nice! [22:36:48] 06Operations, 06Discovery, 10Wikimedia-Logstash, 03Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2473356 (10EBernhardson) [22:37:13] yep [22:37:18] PROBLEM - puppet last run on graphite1001 is CRITICAL: CRITICAL: Puppet has 1 failures [22:38:48] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [22:40:02] !log lead: disabled puppet for a bit to test some CSS tweaks live. 
[22:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:40:28] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[22:43:58] !log Deployed patches for T133147 to wmf.10
[22:44:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:46:28] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:46:47] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:48:44] !log lead: puppet turned back on
[22:48:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:49:22] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2473390 (EBernhardson)
[22:50:00] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2318624 (EBernhardson)
[22:51:34] !log restarted grrrit-wm
[22:51:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:53:37] (PS2) Dzahn: Gerrit: Add icinga check for Gerrit SSH access [puppet] - https://gerrit.wikimedia.org/r/299606 (owner: Chad)
[22:53:43] (PS4) Chad: Remove deprecated Fundraising thermometer config [mediawiki-config] - https://gerrit.wikimedia.org/r/233900 (owner: Awight)
[22:55:11] (CR) Chad: [C: 2] Remove deprecated Fundraising thermometer config [mediawiki-config] - https://gerrit.wikimedia.org/r/233900 (owner: Awight)
[22:55:44] (PS2) Jforrester: Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673
(https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[22:55:49] (Merged) jenkins-bot: Remove deprecated Fundraising thermometer config [mediawiki-config] - https://gerrit.wikimedia.org/r/233900 (owner: Awight)
[22:57:21] !log demon@tin Synchronized wmf-config/CommonSettings.php: rm deprecated fundraising config (duration: 00m 26s)
[22:57:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:58:16] !log deploy fix for T129738 to php-1.28.0-wmf.10
[22:58:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:58:46] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2318624 (EBernhardson)
[22:59:16] Operations, Discovery, Wikimedia-Logstash, Discovery-Search-Sprint, and 2 others: [EPIC] Upgrade elasticsearch cluster supporting logging to 2.3 - https://phabricator.wikimedia.org/T136001#2318624 (EBernhardson)
[23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160718T2300).
[23:00:04] MaxSem, jgirault, and James_F: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:24] * MaxSem waves
[23:00:38] * James_F too.
[23:01:43] ahem, guess I'll deploy
[23:01:56] (CR) Dzahn: [C: 2] Gerrit: Add icinga check for Gerrit SSH access [puppet] - https://gerrit.wikimedia.org/r/299606 (owner: Chad)
[23:04:45] Thank you, Max.
[23:04:51] RECOVERY - puppet last run on graphite1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[23:05:34] (PS1) Chad: Nitpick: Point to Phab tasks directly instead of BZ redirs [puppet] - https://gerrit.wikimedia.org/r/299683
[23:06:33] (CR) Paladox: [C: 1] Nitpick: Point to Phab tasks directly instead of BZ redirs [puppet] - https://gerrit.wikimedia.org/r/299683 (owner: Chad)
[23:06:36] (CR) MaxSem: [C: 2] Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[23:08:15] (PS3) MaxSem: Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[23:08:53] (CR) MaxSem: Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[23:09:00] (CR) MaxSem: [C: 2] Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[23:09:24] (PS5) Dzahn: Gerrit: Remove SSH public key and last user of it [puppet] - https://gerrit.wikimedia.org/r/298052 (owner: Chad)
[23:09:30] (PS8) Paladox: Add some colors to the site table on changes [puppet] - https://gerrit.wikimedia.org/r/299447
[23:09:55] (Merged) jenkins-bot: Change wmgVisualEditorAvailableNamespaces keys to canonical names instead of indexes [mediawiki-config] - https://gerrit.wikimedia.org/r/296673 (https://phabricator.wikimedia.org/T138999) (owner: Alex Monk)
[23:10:35] James_F, pulled on mw1099
[23:11:32] MaxSem: Both of them,
just the config one, just the MediaViewer one?
[23:11:40] config
[23:11:42] MaxSem: If the config, LGTM.
[23:11:44] Kk.
[23:11:59] (CR) Dzahn: [C: 2] Gerrit: Remove SSH public key and last user of it [puppet] - https://gerrit.wikimedia.org/r/298052 (owner: Chad)
[23:14:10] James_F, now mmv too
[23:14:31] (PS1) Chad: Gerrit: Follow-up I8455189c, isn't needed now [puppet] - https://gerrit.wikimedia.org/r/299684
[23:15:08] (CR) Dzahn: [C: 2] Gerrit: Follow-up I8455189c, isn't needed now [puppet] - https://gerrit.wikimedia.org/r/299684 (owner: Chad)
[23:15:27] (CR) Dzahn: [V: 2] Gerrit: Follow-up I8455189c, isn't needed now [puppet] - https://gerrit.wikimedia.org/r/299684 (owner: Chad)
[23:15:37] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/296673/ (duration: 00m 30s)
[23:15:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:16:32] MaxSem: Yeah, LGTM.
[23:17:18] PROBLEM - puppet last run on ytterbium is CRITICAL: CRITICAL: puppet fail
[23:17:38] PROBLEM - puppet last run on lead is CRITICAL: CRITICAL: puppet fail
[23:17:49] !log maxsem@tin Synchronized php-1.28.0-wmf.10/extensions/Kartographer/: https://gerrit.wikimedia.org/r/#/c/299560/ (duration: 00m 27s)
[23:17:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:18:58] !log maxsem@tin Synchronized php-1.28.0-wmf.10/extensions/MultimediaViewer/: https://gerrit.wikimedia.org/r/#/c/299560/ (duration: 00m 26s)
[23:19:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:19:26] James_F, ^
[23:19:32] * James_F re-checks.
[23:20:34] MaxSem: LGTM.
[23:21:32] (PS1) Chad: Gerrit: Follow-up I8455189c, use monitoring::service not nrpe::monitor_service [puppet] - https://gerrit.wikimedia.org/r/299687
[23:22:00] (CR) Dzahn: [C: 2] Gerrit: Follow-up I8455189c, use monitoring::service not nrpe::monitor_service [puppet] - https://gerrit.wikimedia.org/r/299687 (owner: Chad)
[23:22:50] (CR) Dzahn: [V: 2] Gerrit: Follow-up I8455189c, use monitoring::service not nrpe::monitor_service [puppet] - https://gerrit.wikimedia.org/r/299687 (owner: Chad)
[23:22:53] MaxSem: Do you want to do any of the clean-up config ones? If not I'm happy to punt them to tomorrow. :_)
[23:23:34] why delay something till tomorrow when you can delay it to the day after that?
[23:23:39] Or that. :-D
[23:23:56] Operations, Project-Admins, Traffic, domains: create a project for tasks related to WMF domain names - https://phabricator.wikimedia.org/T87465#2473520 (Danny_B)
[23:24:15] (PS5) MaxSem: Cleanup: Move never-altered UseDismissableSiteNotice into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292619 (owner: Jforrester)
[23:24:24] (PS1) Chad: Gerrit: monitoring conflicts with system ssh [puppet] - https://gerrit.wikimedia.org/r/299688
[23:24:38] (CR) MaxSem: [C: 2] Cleanup: Move never-altered UseDismissableSiteNotice into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292619 (owner: Jforrester)
[23:25:28] (PS1) Paladox: [gerrit] Use HEAD instead of master for branch [puppet] - https://gerrit.wikimedia.org/r/299689
[23:25:31] (Merged) jenkins-bot: Cleanup: Move never-altered UseDismissableSiteNotice into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292619 (owner: Jforrester)
[23:25:41] (CR) MaxSem: Cleanup: Move never-altered UseAbuseFilter into CommonSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/292620 (owner: Jforrester)
[23:25:51] (CR) Dzahn: [C: 2 V: 2] Gerrit:
monitoring conflicts with system ssh [puppet] - https://gerrit.wikimedia.org/r/299688 (owner: Chad)
[23:26:06] (PS2) Paladox: [gerrit] Use HEAD instead of master for branch [puppet] - https://gerrit.wikimedia.org/r/299689
[23:27:17] (CR) Jforrester: Cleanup: Move never-altered UseAbuseFilter into CommonSettings (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/292620 (owner: Jforrester)
[23:27:27] RECOVERY - puppet last run on ytterbium is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[23:27:48] RECOVERY - puppet last run on lead is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[23:28:13] (PS4) Jforrester: Cleanup: Move never-altered UseAbuseFilter into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292620
[23:28:29] (PS5) Jforrester: Cleanup: Move never-altered UseAbuseFilter into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292620
[23:30:20] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/292619/ part 1 (duration: 00m 26s)
[23:30:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:31:26] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/292619/ part 2 (duration: 00m 24s)
[23:31:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:34:03] (CR) MaxSem: [C: 2] Cleanup: Move never-altered UseAbuseFilter into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292620 (owner: Jforrester)
[23:34:39] (Merged) jenkins-bot: Cleanup: Move never-altered UseAbuseFilter into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292620 (owner: Jforrester)
[23:38:23] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/292620/ part 1 (duration: 00m 26s)
[23:38:28] Logged the
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:39:17] !log maxsem@tin Synchronized wmf-config/: https://gerrit.wikimedia.org/r/#/c/292620/ part 2 (duration: 00m 27s)
[23:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:42:04] (PS3) MaxSem: Cleanup: Move never-altered UseLocalisationUpdate into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292621 (owner: Jforrester)
[23:43:03] (CR) MaxSem: [C: 2] Cleanup: Move never-altered UseLocalisationUpdate into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292621 (owner: Jforrester)
[23:43:38] (Merged) jenkins-bot: Cleanup: Move never-altered UseLocalisationUpdate into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292621 (owner: Jforrester)
[23:45:11] (PS3) Jforrester: Cleanup: Move never-altered CommonsMetadata* into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292622
[23:46:32] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/292621/ (duration: 00m 31s)
[23:46:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:47:48] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/292621/ (duration: 00m 29s)
[23:47:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:48:02] Operations, Flow, MediaWiki-Redirects, Collab-Team-2016-Apr-Jun-Q4, and 2 others: Flow notification links on mobile point to desktop - https://phabricator.wikimedia.org/T107108#2473577 (Etonkovidova) Testing *production** for the scenarios described in the ticket: |** Notification** |**Mobile p...
[23:49:41] (CR) Jforrester: [C: -1] "Roan thinks that there might be a maintenance script that needs to be run when enabling this on new wikis. Anyone know?"
[mediawiki-config] - https://gerrit.wikimedia.org/r/298344 (https://phabricator.wikimedia.org/T138507) (owner: Jforrester)
[23:49:49] (CR) MaxSem: [C: 2] Cleanup: Move never-altered CommonsMetadata* into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292622 (owner: Jforrester)
[23:49:57] YuviPanda: ---^^
[23:50:36] (Merged) jenkins-bot: Cleanup: Move never-altered CommonsMetadata* into CommonSettings [mediawiki-config] - https://gerrit.wikimedia.org/r/292622 (owner: Jforrester)
[23:51:28] James_F, any ideas how to test it^^^?
[23:52:55] MaxSem: It'd be a pain. You'd have to view a Commons file remotely, upload a new version with different meta-data, then view the Commons file again and yet not see the change.
[23:53:00] MaxSem: It's almost certainly fine.™
[23:53:07] yawwwn :P
[23:54:04] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/292622/ (duration: 00m 25s)
[23:54:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:54:56] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/292622/ (duration: 00m 24s)
[23:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:55:31] Nothing seems to have broken,
[23:55:50] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/292622/ (duration: 00m 25s)
[23:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:56:06] IT ONLY SEEMS SO. YOU CAN'T POSSIBLY CLAIM IT WITHOUT ANY DATA
[23:56:11] Yup.
[23:57:05] anyway, I've already forgotten about git pull once which means I'm tired. calling it an end
[23:57:14] OK. :-)
[23:57:23] (PS2) Dzahn: Nitpick: Point to Phab tasks directly instead of BZ redirs [puppet] - https://gerrit.wikimedia.org/r/299683 (owner: Chad)
[23:57:27] Just two left. I'll move them to the morning.
[23:58:06] (CR) Dzahn: [C: 2] Nitpick: Point to Phab tasks directly instead of BZ redirs [puppet] - https://gerrit.wikimedia.org/r/299683 (owner: Chad)