[00:18:27] Ops-Access-Requests, RESTBase, operations: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1042670 (GWicke) Let me respond to the point on puppet & config changes: > "disable puppet" — that's a bad idea. Puppet being disabled is a problem; we should never do that as a matter...
[00:32:33] operations: Varnish GeoIP is broken for HTTPS+IPv6 traffic - https://phabricator.wikimedia.org/T89688#1042675 (faidon) NEW a:BBlack
[01:00:38] Analytics, operations, Analytics-Kanban: Upgrade Analytics Cluster to Trusty, and then to CDH 5.3 - https://phabricator.wikimedia.org/T1200#1042711 (Ottomata) Alright! Some oozie jobs are busy backfilling the time that they were offline, but everything is looking good. I'm going to wait a day or two make su...
[01:01:00] (PS1) Faidon Liambotis: varnish: fix GeoIP's get_relevant_ip function [puppet] - https://gerrit.wikimedia.org/r/190964
[01:01:10] bblack, ori: ^
[01:02:02] (CR) Faidon Liambotis: "Live on cp1008." [puppet] - https://gerrit.wikimedia.org/r/190964 (owner: Faidon Liambotis)
[01:04:38] (PS2) Faidon Liambotis: varnish: fix GeoIP's get_relevant_ip function [puppet] - https://gerrit.wikimedia.org/r/190964
[01:08:48] (PS1) Andrew Bogott: Add a stupid sleep during first boot. [puppet] - https://gerrit.wikimedia.org/r/190966
[01:09:51] (PS2) Andrew Bogott: Add a stupid sleep during first boot. [puppet] - https://gerrit.wikimedia.org/r/190966
[01:11:22] (CR) Andrew Bogott: [C: 2] Add a stupid sleep during first boot. [puppet] - https://gerrit.wikimedia.org/r/190966 (owner: Andrew Bogott)
[01:18:45] Ops-Access-Requests, RESTBase, operations: Access to restbase / cassandra cluster - https://phabricator.wikimedia.org/T89366#1042728 (GWicke) > I think Servisor needs a much wider vetting (perhaps an RFC?) This sounds like its on the track to be a core service used not only by RESTbase, but all services. Th...
[01:44:50] (CR) Ori.livneh: "Are hexadecimal digits A-F guaranteed to be lowercase?" [puppet] - https://gerrit.wikimedia.org/r/190964 (owner: Faidon Liambotis)
[01:50:26] (CR) Ori.livneh: [C: 2] Temporarily log message key lookups on four app servers [mediawiki-config] - https://gerrit.wikimedia.org/r/190777 (https://phabricator.wikimedia.org/T65416) (owner: Ori.livneh)
[01:50:37] (Merged) jenkins-bot: Temporarily log message key lookups on four app servers [mediawiki-config] - https://gerrit.wikimedia.org/r/190777 (https://phabricator.wikimedia.org/T65416) (owner: Ori.livneh)
[01:51:24] !log ori Synchronized wmf-config: Ie91add33f: Temporarily log message key lookups on four app servers (duration: 00m 05s)
[01:51:29] Logged the message, Master
[02:15:59] !log l10nupdate Synchronized php-1.25wmf16/cache/l10n: (no message) (duration: 00m 03s)
[02:16:06] Logged the message, Master
[02:17:06] !log LocalisationUpdate completed (1.25wmf16) at 2015-02-17 02:16:02+00:00
[02:17:13] Logged the message, Master
[02:18:17] (PS1) Ori.livneh: Revert "Temporarily log message key lookups on four app servers" [mediawiki-config] - https://gerrit.wikimedia.org/r/190969
[02:19:34] (PS2) Ori.livneh: Revert "Temporarily log message key lookups on four app servers" [mediawiki-config] - https://gerrit.wikimedia.org/r/190969
[02:19:40] (CR) Ori.livneh: [C: 2] Revert "Temporarily log message key lookups on four app servers" [mediawiki-config] - https://gerrit.wikimedia.org/r/190969 (owner: Ori.livneh)
[02:19:45] (Merged) jenkins-bot: Revert "Temporarily log message key lookups on four app servers" [mediawiki-config] - https://gerrit.wikimedia.org/r/190969 (owner: Ori.livneh)
[02:21:30] !log ori Synchronized wmf-config: Revert Ie91add33f: Temporarily log message key lookups on four app servers (duration: 00m 05s)
[02:21:34] Logged the message, Master
[02:26:56] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 885.549095637
[02:30:11] !log l10nupdate Synchronized php-1.25wmf17/cache/l10n: (no message) (duration: 00m 01s)
[02:30:19] Logged the message, Master
[02:31:18] !log LocalisationUpdate completed (1.25wmf17) at 2015-02-17 02:30:14+00:00
[02:31:23] Logged the message, Master
[02:40:42] anyone know where old hhvm.logs go when they're rotated?
[02:46:19] Hi bblack, yt? quick question: do you know where I can get *old* hhvm.logs? (say, from last Thursday or so?)
[02:48:17] apergos: ? ^
[02:50:35] * AndyRussG looks around for someone to ping...
[02:52:13] ottomata: ? ^
[02:59:32] AndyRussG: On fluorine they go in /a/mw-log/archive
[03:00:26] bd808: gotcha! Thanks so much :)
[03:02:37] bd808: mmm can I throw you another quick question? It's: whom should I bug for advice on writing a quick varnish rule?
[03:03:56] AndyRussG: b.black and or.i are the two folks I'd turn to (at least for code review)
[03:04:23] also don't kid yourself that their is an "easy" varnish change
[03:04:57] our vcl is a complicated beast that I expect to achieve sentience any day now
[03:08:48] bd808: cool, thanks! :)
[03:09:02] Yeah it is a beast, indeed 8p
[03:54:33] (PS1) Springle: repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/190976
[03:55:08] (CR) Springle: [C: 2] repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/190976 (owner: Springle)
[03:55:13] (Merged) jenkins-bot: repool db1065 [mediawiki-config] - https://gerrit.wikimedia.org/r/190976 (owner: Springle)
[03:56:12] !log springle Synchronized wmf-config/db-eqiad.php: repool db1065, warm up (duration: 00m 05s)
[03:56:15] Logged the message, Master
[04:15:57] PROBLEM - puppet last run on amssq31 is CRITICAL: CRITICAL: puppet fail
[04:35:17] RECOVERY - puppet last run on amssq31 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[04:43:44] (PS1) Tim Landscheidt: Tools: Make crontab host configurable [puppet] - https://gerrit.wikimedia.org/r/190977 (https://phabricator.wikimedia.org/T87387)
[04:43:56] (PS1) Tim Landscheidt: Tools: Don't forward crontab to tools-submit for system users [puppet] - https://gerrit.wikimedia.org/r/190978 (https://phabricator.wikimedia.org/T87527)
[04:44:09] (PS2) Tim Landscheidt: Tools: Properly puppetize crontab replacement [puppet] - https://gerrit.wikimedia.org/r/186627 (https://phabricator.wikimedia.org/T86445)
[05:16:48] !log LocalisationUpdate ResourceLoader cache refresh completed at Tue Feb 17 05:15:45 UTC 2015 (duration 15m 44s)
[05:16:54] Logged the message, Master
[06:22:38] ^d: you around?
[06:28:28] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:46] PROBLEM - puppet last run on db1042 is CRITICAL: CRITICAL: Puppet has 1 failures [06:28:47] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures [06:29:17] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:27] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 3 failures [06:29:57] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:17] PROBLEM - puppet last run on cp4008 is CRITICAL: CRITICAL: Puppet has 2 failures [06:33:10] ah no, he wouldn't be, it was a holiday.. ugh [06:45:38] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures [06:45:56] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:45:57] RECOVERY - puppet last run on mw1119 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [06:46:16] RECOVERY - puppet last run on mw1025 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [06:46:37] RECOVERY - puppet last run on mw1061 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [06:46:47] RECOVERY - puppet last run on db1042 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [06:47:27] RECOVERY - puppet last run on cp4008 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures [06:48:07] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [06:51:56] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 0 below the confidence bounds [07:14:49] (03CR) 10Nikerabbit: [C: 031] "Did anyone from parsoid comment?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [07:20:26] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [07:26:24] (03PS2) 10Giuseppe Lavagetto: contint: hhvm-dev on Trusty slaves [puppet] - 10https://gerrit.wikimedia.org/r/190946 (https://phabricator.wikimedia.org/T89649) (owner: 10Hashar) [07:26:31] (03CR) 10Giuseppe Lavagetto: [C: 032] contint: hhvm-dev on Trusty slaves [puppet] - 10https://gerrit.wikimedia.org/r/190946 (https://phabricator.wikimedia.org/T89649) (owner: 10Hashar) [07:33:38] (03CR) 10Giuseppe Lavagetto: "Thanks Antoine <3" [puppet] - 10https://gerrit.wikimedia.org/r/190946 (https://phabricator.wikimedia.org/T89649) (owner: 10Hashar) [08:24:00] 3operations: Our custom php packages need to create some conf.d links - https://phabricator.wikimedia.org/T89157#1043019 (10Joe) [08:29:56] 3operations, Beta-Cluster: Make www-data the web-serving user (is currently apache) - https://phabricator.wikimedia.org/T78076#1043023 (10Joe) 5stalled>3Open a:5yuvipanda>3Joe [08:32:01] <_joe_> !log depooling mw1019-28 for T78076 [08:32:09] Logged the message, Master [08:34:47] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected [08:43:04] (03PS1) 10Giuseppe Lavagetto: mediawiki: move hosts mw1019-1028 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190986 [08:44:37] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move hosts mw1019-1028 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190986 (owner: 10Giuseppe Lavagetto) [08:45:56] (03PS1) 10Giuseppe Lavagetto: mediawiki: fixup for I076f50cefe777daf6620cbf80ed0b180dbaeb23e [puppet] - 10https://gerrit.wikimedia.org/r/190987 [08:46:18] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: fixup for I076f50cefe777daf6620cbf80ed0b180dbaeb23e [puppet] - 10https://gerrit.wikimedia.org/r/190987 (owner: 10Giuseppe Lavagetto) [08:46:40] (03CR) 
10Giuseppe Lavagetto: [V: 032] mediawiki: fixup for I076f50cefe777daf6620cbf80ed0b180dbaeb23e [puppet] - 10https://gerrit.wikimedia.org/r/190987 (owner: 10Giuseppe Lavagetto) [08:55:11] (03PS1) 10Giuseppe Lavagetto: mediawiki: move mw1019 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190988 [08:55:42] greetings [08:55:53] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] mediawiki: move mw1019 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190988 (owner: 10Giuseppe Lavagetto) [08:56:49] <_joe_> godog: ciao [08:57:03] hey _joe_ [08:57:45] <_joe_> don't worry, it's very dumb and simple things I'm committing. I just seem unable to write a regexp this morning. The continuous sound of hammering from upstairs may be part of the problem though :) [09:03:27] PROBLEM - HHVM processes on mw1020 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm [09:03:36] PROBLEM - Apache HTTP on mw1020 is CRITICAL: Connection refused [09:03:47] PROBLEM - HHVM rendering on mw1020 is CRITICAL: Connection refused [09:03:50] <_joe_> still me [09:04:36] RECOVERY - HHVM processes on mw1020 is OK: PROCS OK: 1 process with command name hhvm [09:04:56] RECOVERY - HHVM rendering on mw1020 is OK: HTTP OK: HTTP/1.1 200 OK - 66749 bytes in 1.120 second response time [09:05:46] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.070 second response time [09:06:02] <_joe_> I wasn't fast enough this time [09:06:03] !log rolling restart of elastic1023 -> elastic1031 [09:06:10] Logged the message, Master [09:06:18] <_joe_> oh I see you're enjoying yourself too :) [09:06:47] (03PS1) 10KartikMistry: WIP: Beta: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190989 [09:07:57] PROBLEM - HHVM rendering on mw1027 is CRITICAL: Connection refused [09:07:57] PROBLEM - HHVM rendering on mw1026 is CRITICAL: Connection refused [09:07:57] PROBLEM - Apache HTTP on mw1028 is CRITICAL: Connection refused [09:09:06] 
RECOVERY - HHVM rendering on mw1026 is OK: HTTP OK: HTTP/1.1 200 OK - 66749 bytes in 0.554 second response time [09:09:06] RECOVERY - HHVM rendering on mw1027 is OK: HTTP OK: HTTP/1.1 200 OK - 66750 bytes in 1.598 second response time [09:09:07] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.493 second response time [09:12:57] PROBLEM - Apache HTTP on mw1022 is CRITICAL: Connection refused [09:12:57] PROBLEM - Apache HTTP on mw1021 is CRITICAL: Connection refused [09:13:44] (03PS1) 10KartikMistry: WIP: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190990 [09:14:07] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.078 second response time [09:14:16] RECOVERY - Apache HTTP on mw1022 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 8.405 second response time [09:18:31] <_joe_> !log repooling mw1019-28 [09:18:38] Logged the message, Master [09:18:59] _joe_: upstrair or upstream ;) [09:19:18] see typos I'm making :/ [09:20:23] (03PS2) 10Filippo Giunchedi: RESTBase: add Icinga check_procs check for Cassandra [puppet] - 10https://gerrit.wikimedia.org/r/190688 (owner: 10Ori.livneh) [09:21:25] <_joe_> kart_: upstaris, sigh [09:22:33] 3Release-Engineering, operations: /usr/local/bin/deploy2graphite broken on tin due to nc command syntax - https://phabricator.wikimedia.org/T1387#1043064 (10fgiunchedi) a:3fgiunchedi ping? @ori @bd808 there was a question in https://gerrit.wikimedia.org/r/#/c/183568/ re: the history of this file move which I'm... 
[09:22:59] (03PS1) 10Giuseppe Lavagetto: mediawiki: move mw1029-1039 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190991 [09:23:01] (03PS1) 10Giuseppe Lavagetto: mediawiki: move mw1040-49 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190992 [09:23:03] (03PS1) 10Giuseppe Lavagetto: mediawiki: move all servers in the appserver pool to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190993 [09:23:05] (03PS1) 10Giuseppe Lavagetto: mediawiki: move api canaries to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190994 [09:23:07] (03PS1) 10Giuseppe Lavagetto: mediawiki: move api appservers to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190995 [09:24:43] <_joe_> !log depooling mw1029-1039 [09:24:49] Logged the message, Master [09:25:14] hi! can anybody here update the grrrit bot? https://wikitech.wikimedia.org/wiki/Grrrit-wm#Deploying , we got a new change merged on friday https://gerrit.wikimedia.org/r/#/c/190677/ [09:25:30] or maybe point me to who i could ping for this? [09:25:44] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move mw1029-1039 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190991 (owner: 10Giuseppe Lavagetto) [09:26:45] joakino: valhallasw`cloud can, when back, I think [09:27:15] hoo: which timezone is he/she in? [09:28:00] I don't know [09:28:07] oki thx! [09:29:47] PROBLEM - HHVM processes on mw1038 is CRITICAL: PROCS CRITICAL: 0 processes with command name hhvm [09:30:07] PROBLEM - HHVM rendering on mw1038 is CRITICAL: Connection refused [09:30:07] PROBLEM - Apache HTTP on mw1038 is CRITICAL: Connection refused [09:30:37] 3operations: Rolling restart for Elasticsearch to pick up new version of wikimedia-extra plugin - https://phabricator.wikimedia.org/T86602#1043076 (10fgiunchedi) update: yesterday half of the cluster got restarted and I'm restarting the remaining machines. From what I can tell `es-tool restart-fast` takes anywhe... 
[09:30:46] <_joe_> these ^ will happen from time to time during the day, but I'm handling them [09:30:47] RECOVERY - HHVM processes on mw1038 is OK: PROCS OK: 1 process with command name hhvm [09:31:07] RECOVERY - Apache HTTP on mw1038 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.430 second response time [09:31:07] RECOVERY - HHVM rendering on mw1038 is OK: HTTP OK: HTTP/1.1 200 OK - 66750 bytes in 3.938 second response time [09:37:36] 3Services, operations, RESTBase: setup internal LVS for restbase eqiad servers - https://phabricator.wikimedia.org/T89636#1043086 (10fgiunchedi) [09:38:00] 3operations: Our custom php packages need to create some conf.d links - https://phabricator.wikimedia.org/T89157#1043087 (10Joe) 5stalled>3Open [09:39:55] <_joe_> !log repooling mw1029-1039 [09:40:02] Logged the message, Master [09:41:16] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move mw1040-49 to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190992 (owner: 10Giuseppe Lavagetto) [09:45:28] (03PS2) 10KartikMistry: WIP: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190990 [09:47:55] (03PS2) 10KartikMistry: WIP: Beta: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190989 [09:48:31] (03PS1) 10Alexandros Kosiaris: beta: Add url_downloader_ip in hiera [puppet] - 10https://gerrit.wikimedia.org/r/191001 [09:51:17] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [09:51:17] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). [09:52:18] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [09:52:18] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[09:56:27] (03CR) 10Alexandros Kosiaris: [C: 032] beta: Add url_downloader_ip in hiera [puppet] - 10https://gerrit.wikimedia.org/r/191001 (owner: 10Alexandros Kosiaris) [09:56:56] PROBLEM - Apache HTTP on mw1043 is CRITICAL: Connection refused [09:56:56] PROBLEM - HHVM rendering on mw1043 is CRITICAL: Connection refused [09:57:57] RECOVERY - Apache HTTP on mw1043 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.064 second response time [09:57:57] RECOVERY - HHVM rendering on mw1043 is OK: HTTP OK: HTTP/1.1 200 OK - 66749 bytes in 0.171 second response time [10:01:51] (03PS3) 10KartikMistry: Beta: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190989 [10:02:25] akosiaris: can you review this ^ breaking changes :) [10:05:13] akosiaris: dependencies are done. [10:05:56] !log testing cassandra-metrics on xenon [10:06:02] Logged the message, Master [10:07:48] 3OTRS, operations: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1043145 (10Steinsplitter) >>! In T87217#1041222, @tommorris wrote: > If one is using tethered mobile broadband from Three (a UK mobile operator) using an Android handset to provide a personal hotspot, every sligh... [10:10:26] kart_: so the config file format changes ? [10:10:54] what's with the source/target stuff ? Kind of duplicated in the mt section [10:10:58] (03PS2) 10Giuseppe Lavagetto: mediawiki: move all servers in the appserver pool to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190993 [10:12:27] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [500.0] [10:12:41] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move all servers in the appserver pool to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190993 (owner: 10Giuseppe Lavagetto) [10:12:59] akosiaris: yep. 
[10:13:21] akosiaris: those with mt has separate list
[10:13:56] kart_: so when you want to add a new language pair, you have to update source, target and mt ?
[10:14:17] akosiaris: depends, if there is MT or not.
[10:14:57] MT is language-pair, in earlier config, it was confusing to read.
[10:15:24] btw, this happens time to time: https://phabricator.wikimedia.org/P299
[10:15:31] akosiaris: some permission issue?
[10:15:33] ^^
[10:15:36] paste
[10:15:50] so there can be language pairs without MT ?
[10:15:53] as in dictionary ?
[10:15:56] or smt else ?
[10:16:00] akosiaris: yes.
[10:16:21] There can be language without mt, with mt, with dictionary only, without anything :)
[10:16:39] without anything ?
[10:16:49] akosiaris: I mean mt/dictionary.
[10:17:00] but only source, target entries
[10:17:19] There has been request to use CX without MT, as it makes easier to create articles.
[10:17:23] yes.
[10:17:32] source/target needed.
[10:17:52] CX makes it easier to create articles even without MT ?
[10:18:00] yes.
[10:18:03] I did not see that coming
[10:18:10] how come ?
[10:18:12] akosiaris: Try that.
[10:18:47] akosiaris: 1. Side by side editor. 2. Easy adaption of links, references, images, tables. 3. Quick edit tools.
[10:19:04] Specially link and references part has been useful.
[10:19:07] so, the things that VE should be doing
[10:19:16] PROBLEM - puppet last run on restbase1005 is CRITICAL: CRITICAL: Puppet has 1 failures
[10:19:19] sounds funny, to say the least
[10:20:11] akosiaris: No. It isn't VE or try to be.
[10:20:20] never said that
[10:20:38] akosiaris: We don't provide full editing tools. Just article from one language to another.
[10:20:41] OTRS, operations: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1043167 (Aschmidt) Please not that we have tried this out in
[10:20:46] I get that
[10:20:56] akosiaris: VE integration is big future plan :)
[10:21:14] but you got to admit the features you just listed are part of VE as well
[10:21:23] which is fine by me, just funny
[10:21:26] akosiaris: yes.
[10:21:27] PROBLEM - Apache HTTP on mw1053 is CRITICAL: Connection refused
[10:21:27] PROBLEM - HHVM rendering on mw1053 is CRITICAL: Connection refused
[10:21:50] operations, RESTBase: Detailed cassandra monitoring - https://phabricator.wikimedia.org/T78514#1043168 (fgiunchedi) tried this on xenon with the suggested config, there are an additional 325 metrics, namely: ``` cassandra.xenon.jvm.daemon_thread_count cassandra.xenon.jvm.fd_usage cassandra.xenon.jvm.gc.Concu...
[10:22:23] OTRS, operations: Make OTRS sessions IP-address-agnostic - https://phabricator.wikimedia.org/T87217#1043169 (Aschmidt) Please note that we have tried this out in https://phabricator.wikimedia.org/T88224 and there are many DSL connections you just cannot work in OTRS from. Mine is one. You please should see...
[10:22:27] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.051 second response time
[10:22:36] RECOVERY - HHVM rendering on mw1053 is OK: HTTP OK: HTTP/1.1 200 OK - 66749 bytes in 0.193 second response time
[10:23:35] kart_: ok so source/target is mt+dictionary+languages with neither mt nor dictionary
[10:23:36] right ?
[10:24:00] yes
[10:24:10] <_joe_> !log converting all appservers to www-data
[10:24:15] Logged the message, Master
[10:25:23] so, is it me or it would be better a better approach ( in a future version ) to have only mt/dictionary/non-translation and compute source/target on configuration reload ?
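A minimal sketch of the idea raised at 10:25:23: keep only the mt / dictionary / non-translation pair lists and derive the source and target lists when the configuration is loaded. The registry layout, the section name `plain`, and the function name `derive_languages` are hypothetical, chosen for illustration; this is not cxserver's actual config.js format or code.

```python
# Hypothetical registry: each section maps a source language to a list of
# target languages. This layout is an assumption, not cxserver's real format.
registry = {
    "mt": {"es": ["ca", "pt"]},                   # pairs with machine translation
    "dictionary": {"es": ["ca"], "en": ["es"]},   # pairs with a dictionary
    "plain": {"en": ["fr"]},                      # pairs with neither
}


def derive_languages(registry):
    """Compute the source and target language lists from the pair sections."""
    sources, targets = set(), set()
    for pairs in registry.values():
        for source, pair_targets in pairs.items():
            sources.add(source)
            targets.update(pair_targets)
    # Sort so the derived lists are stable across reloads.
    return sorted(sources), sorted(targets)


source, target = derive_languages(registry)
print("source:", source)
print("target:", target)
```

With this shape, adding a language pair would mean touching one section only; the source and target lists could never drift out of sync with it.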
[10:25:53] (PS4) KartikMistry: Beta: Use compact registry format for cxserver [puppet] - https://gerrit.wikimedia.org/r/190989
[10:27:02] akosiaris: doable.
[10:27:16] akosiaris: if you've time, can you file task? :)
[10:27:32] (hope that I've not get it wrong, so better you :))
[10:27:46] I'll be back after much needed
[10:29:07] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0 xe-0/0/2: down - Transit: ! NTT (service ID 234631) {#1061} [10Gbps DF]
[10:33:02] hmm
[10:35:27] PROBLEM - Apache HTTP on mw1063 is CRITICAL: Connection refused
[10:35:56] <_joe_> that's me
[10:36:05] <_joe_> will recover in a few
[10:36:16] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[10:36:36] RECOVERY - Apache HTTP on mw1063 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.060 second response time
[10:40:07] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[10:41:38] PROBLEM - Apache HTTP on mw1095 is CRITICAL: Connection refused
[10:42:46] RECOVERY - Apache HTTP on mw1095 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.048 second response time
[10:45:19] (CR) Alexandros Kosiaris: [C: 2] Beta: Use compact registry format for cxserver (1 comment) [puppet] - https://gerrit.wikimedia.org/r/190989 (owner: KartikMistry)
[10:48:59] joakino: CET :-) let me take a look
[10:49:12] we should add a 'timezone' field to phabricator
[10:49:33] thanks valhallasw`cloud !
[10:49:41] yea that would be useful
[10:52:47] PROBLEM - HHVM rendering on mw1081 is CRITICAL: Connection refused
[10:53:17] grrrit-wm1...
[10:53:19] >_<
[10:53:47] RECOVERY - HHVM rendering on mw1081 is OK: HTTP OK: HTTP/1.1 200 OK - 66749 bytes in 0.158 second response time
[10:54:45] it works! thx valhallasw`cloud!
[11:02:27] PROBLEM - Apache HTTP on mw1111 is CRITICAL: Connection refused [11:03:36] RECOVERY - Apache HTTP on mw1111 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.059 second response time [11:07:26] akosiaris: thanks! [11:07:37] PROBLEM - HHVM rendering on mw1105 is CRITICAL: Connection refused [11:08:46] RECOVERY - HHVM rendering on mw1105 is OK: HTTP OK: HTTP/1.1 200 OK - 66736 bytes in 0.177 second response time [11:10:50] (03PS1) 10Alexandros Kosiaris: url_downloader: Vary squid config base on OS version [puppet] - 10https://gerrit.wikimedia.org/r/191015 [11:11:40] (03CR) 10jenkins-bot: [V: 04-1] url_downloader: Vary squid config base on OS version [puppet] - 10https://gerrit.wikimedia.org/r/191015 (owner: 10Alexandros Kosiaris) [11:17:47] PROBLEM - Apache HTTP on mw1151 is CRITICAL: Connection refused [11:18:57] RECOVERY - Apache HTTP on mw1151 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.065 second response time [11:19:41] (03PS2) 10Giuseppe Lavagetto: mediawiki: move api canaries to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190994 [11:20:17] (03CR) 10Alexandros Kosiaris: [C: 032] restbase: allocate LVS internal service ip [dns] - 10https://gerrit.wikimedia.org/r/190784 (https://phabricator.wikimedia.org/T89636) (owner: 10Filippo Giunchedi) [11:24:09] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move api canaries to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190994 (owner: 10Giuseppe Lavagetto) [11:24:49] <_joe_> !log rolling transition of api appservers to www-data beginning as well [11:24:54] Logged the message, Master [11:25:26] (03PS1) 10KartikMistry: Beta: Fix common.yaml format [puppet] - 10https://gerrit.wikimedia.org/r/191021 [11:25:36] akosiaris: quick fix ^^ [11:26:23] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: Fix common.yaml format [puppet] - 10https://gerrit.wikimedia.org/r/191021 (owner: 10KartikMistry) [11:29:11] kart_: akosiaris: the commas must be removed as well [11:29:41] 
(03PS2) 10Alexandros Kosiaris: url_downloader: Vary squid config base on OS version [puppet] - 10https://gerrit.wikimedia.org/r/191015 [11:30:03] Nikerabbit: all commas, right? [11:30:42] Nikerabbit: https://phabricator.wikimedia.org/P300 - Please check. [11:31:09] from the lists [11:31:17] PROBLEM - Apache HTTP on mw1114 is CRITICAL: Connection refused [11:31:26] PROBLEM - HHVM rendering on mw1114 is CRITICAL: Connection refused [11:31:47] another Q is whether the target languages in jsonDict should be lists as well [11:31:47] Nikerabbit: P300 is okay, I mean? [11:32:06] Nikerabbit: no, it isn't. [11:32:27] RECOVERY - Apache HTTP on mw1114 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.059 second response time [11:32:28] RECOVERY - HHVM rendering on mw1114 is OK: HTTP OK: HTTP/1.1 200 OK - 66753 bytes in 0.535 second response time [11:33:21] Nikerabbit: should be? [11:34:07] kart_: I think they should be, otherwise it would be impossible to have es->ca and es->fr dictionaries at the same time... but I didn't check the code [11:34:35] Nikerabbit: meanwhile, lets fix yaml thing. Can you comment on P300? [11:35:00] 3Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1043296 (10Krinkle) [11:35:50] (03PS2) 10Giuseppe Lavagetto: mediawiki: move api appservers to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190995 [11:36:07] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: move api appservers to www-data [puppet] - 10https://gerrit.wikimedia.org/r/190995 (owner: 10Giuseppe Lavagetto) [11:36:33] kart_: let's ignore the dict for now, I think Santhosh made a mistake [11:36:43] kart_: P300 is okay [11:36:58] Nikerabbit: yes. Lets fix it tomorrow then. [11:37:03] or tonight :) [11:40:30] (03PS1) 10KartikMistry: Beta: More correction to common.yaml formatting [puppet] - 10https://gerrit.wikimedia.org/r/191026 [11:40:36] akosiaris: one more. 
Sorry about this now ^ [11:41:56] (03CR) 10Alexandros Kosiaris: [C: 032] Beta: More correction to common.yaml formatting [puppet] - 10https://gerrit.wikimedia.org/r/191026 (owner: 10KartikMistry) [11:42:26] PROBLEM - Apache HTTP on mw1146 is CRITICAL: Connection refused [11:42:27] PROBLEM - HHVM rendering on mw1147 is CRITICAL: Connection refused [11:42:27] PROBLEM - HHVM rendering on mw1148 is CRITICAL: Connection refused [11:42:29] akosiaris: thanks. [11:42:42] (03CR) 10Alexandros Kosiaris: [C: 031] cassandra: deprecate cassandra::defaults class [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/190813 (https://phabricator.wikimedia.org/T76149) (owner: 10Filippo Giunchedi) [11:42:52] (03PS3) 10KartikMistry: WIP: Use compact registry format for cxserver [puppet] - 10https://gerrit.wikimedia.org/r/190990 [11:43:27] RECOVERY - Apache HTTP on mw1146 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.067 second response time [11:43:27] RECOVERY - HHVM rendering on mw1147 is OK: HTTP OK: HTTP/1.1 200 OK - 66754 bytes in 0.528 second response time [11:43:36] RECOVERY - HHVM rendering on mw1148 is OK: HTTP OK: HTTP/1.1 200 OK - 66746 bytes in 0.518 second response time [11:43:48] akosiaris: we should merge 190990 after I deploy cxserver in production. 
I'll poke you then :) [11:44:14] 3operations: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1043313 (10faidon) p:5Normal>3High [11:44:39] kart_: give it a -1 then to block it [11:45:07] PROBLEM - Apache HTTP on mw1145 is CRITICAL: Connection refused [11:45:30] <_joe_> it takes a wee bit longer on APIs [11:46:07] RECOVERY - Apache HTTP on mw1145 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.070 second response time [11:47:46] PROBLEM - Apache HTTP on mw1141 is CRITICAL: Connection refused [11:47:46] PROBLEM - HHVM rendering on mw1141 is CRITICAL: Connection refused [11:47:46] PROBLEM - HHVM rendering on mw1142 is CRITICAL: Connection refused [11:47:47] PROBLEM - HHVM rendering on mw1140 is CRITICAL: Connection refused [11:48:05] akosiaris: okay [11:48:20] akosiaris: can you check why config.js still showing old format? [11:48:45] deployment-cxserver03:/srv/deployment/cxserver/deploy/src/config.js [11:48:47] RECOVERY - Apache HTTP on mw1141 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.102 second response time [11:48:47] RECOVERY - HHVM rendering on mw1141 is OK: HTTP OK: HTTP/1.1 200 OK - 66754 bytes in 0.272 second response time [11:48:47] RECOVERY - HHVM rendering on mw1142 is OK: HTTP OK: HTTP/1.1 200 OK - 66746 bytes in 0.270 second response time [11:48:54] akosiaris: ^^ [11:48:56] RECOVERY - HHVM rendering on mw1140 is OK: HTTP OK: HTTP/1.1 200 OK - 66746 bytes in 0.560 second response time [11:49:41] (03CR) 10Alexandros Kosiaris: [C: 04-1] "Two comments, one really minor, the other important. After that, we should be fine" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) (owner: 10Filippo Giunchedi) [11:50:24] (03CR) 10KartikMistry: [C: 04-1] "Not to be deployed unless we update cxserver in Production." 
[puppet] - 10https://gerrit.wikimedia.org/r/190990 (owner: 10KartikMistry) [11:54:11] kart_: deployment-salt is updated one per hour IIRC, so that explains it [11:54:29] akosiaris: ah :) [11:55:05] then I should go for cycling and check again. [11:57:07] PROBLEM - Apache HTTP on mw1129 is CRITICAL: Connection refused [11:58:16] RECOVERY - Apache HTTP on mw1129 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.111 second response time [11:59:37] PROBLEM - HHVM rendering on mw1127 is CRITICAL: Connection refused [12:00:47] RECOVERY - HHVM rendering on mw1127 is OK: HTTP OK: HTTP/1.1 200 OK - 67281 bytes in 0.431 second response time [12:02:07] PROBLEM - Apache HTTP on mw1123 is CRITICAL: Connection refused [12:03:07] RECOVERY - Apache HTTP on mw1123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.896 second response time [12:03:57] (03CR) 10Alexandros Kosiaris: [C: 04-1] RESTBase: add Icinga check_procs check for Cassandra (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190688 (owner: 10Ori.livneh) [12:05:03] this week is not mine :-) [12:05:29] I bet you'd like that [12:05:29] :P [12:05:39] you have nothing to do anyway right [12:05:46] exactly!!! [12:06:30] akosiaris: We've dangling symlink to config.js in /srv/deployment/cxserver/cxserver [12:06:40] akosiaris: can you look at that? :) [12:06:57] yeah, but not right now, finishing up some other stuff [12:07:11] akosiaris: should it point to /srv/deployment/cxserver/deploy/src/config.js? [12:07:21] IIRC no [12:07:32] it is actually being deployed as a file by puppet [12:07:39] I don't know what that symlink got in there [12:07:42] Right. [12:07:43] perhaps the deploy ? [12:07:50] s/what/how/ [12:08:19] /srv/deployment/cxserver/deploy/src/config.js is the file puppet is managing... but I don't understand why it is not updating to match the merged changes [12:09:32] yes. I refreshed puppet on cxserver too. 
[12:12:52] (03PS3) 10Alexandros Kosiaris: url_downloader: Vary squid config base on OS version [puppet] - 10https://gerrit.wikimedia.org/r/191015 [12:16:02] (03PS4) 10Filippo Giunchedi: restbase: add internal LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) [12:17:28] (03CR) 10Filippo Giunchedi: restbase: add internal LVS configuration (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) (owner: 10Filippo Giunchedi) [12:19:10] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "LGTM, feel free to merge. Don't forget to update pybal config and restart it" [puppet] - 10https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) (owner: 10Filippo Giunchedi) [12:19:57] godog: ^ this should be good. I am off to lunch [12:20:03] (03PS3) 10Filippo Giunchedi: RESTBase: add Icinga check_procs check for Cassandra [puppet] - 10https://gerrit.wikimedia.org/r/190688 (owner: 10Ori.livneh) [12:20:11] akosiaris: ack! me too shortly, thanks! [12:21:28] akosiaris: are you planning to reprovision chromium? [12:21:35] or is this a beta-targeted change? [12:21:39] akosiaris: ah did you mean that changes to puppet can take up to one hour to appear on deployment-salt – and that running puppet on a instance pulls from deployment-salt? [12:21:39] (just curious :) [12:24:44] paravoid: beta targeted change [12:24:52] Nikerabbit: yup [12:25:27] (03CR) 10Alexandros Kosiaris: [C: 032] RESTBase: add Icinga check_procs check for Cassandra [puppet] - 10https://gerrit.wikimedia.org/r/190688 (owner: 10Ori.livneh) [12:25:39] akosiaris: ok got it now, and config was updated too [12:26:08] Nikerabbit: as I said. up to 1 hour :-) [12:26:22] we should have this lowered though [12:26:38] although beta is not supposed to be used as a testing ground [12:27:41] lemme clarify. 
As a "get the latest and greatest new hot change"-testing ground [12:28:22] <_joe_> well in puppet terms, it is [12:28:51] <_joe_> or, where should we apply changes before than in production? [12:29:08] I guess in theory we could do a self-hosted puppet master on our own instances and play with that, but that would probably take more effort [12:29:13] <_joe_> we should not leave changes on beta alone for a long timespan [12:29:33] <_joe_> Nikerabbit: when doing disruptive changes that is a sensible idea though [12:29:47] _joe_: actually in puppet terms, beta gets changes after production [12:29:57] due to the lag introduced by the cronjob [12:30:10] <_joe_> akosiaris: well it depends. That's true if you don't cherry-pick [12:30:21] so 99% of commits [12:30:46] <_joe_> akosiaris: yes, but those are uninteresting I guess [12:31:03] <_joe_> those are the ones people don't feel they should test [12:31:31] well testing is obviously fine. And beta is there for testing [12:31:37] but the idea is integration testing [12:31:50] not testing while development goes on [12:32:11] (03PS1) 10Giuseppe Lavagetto: mediawiki: convert imagescalers to use www-data [puppet] - 10https://gerrit.wikimedia.org/r/191030 [12:32:29] which reminds me of the entire staging/beta/deployment-prep naming problems and use cases and all [12:35:06] in our team "beta" (usually) ~ deployment-prep [12:35:10] but it's confusing [12:41:34] (03CR) 10Florianschmidtwelzow: [C: 04-1] Allow a full text search button on Commons whenever possible (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/186916 (https://phabricator.wikimedia.org/T19471) (owner: 10Nemo bis) [12:42:19] (03CR) 10Giuseppe Lavagetto: [C: 032] mediawiki: convert imagescalers to use www-data [puppet] - 10https://gerrit.wikimedia.org/r/191030 (owner: 10Giuseppe Lavagetto) [12:51:04] PROBLEM - Apache HTTP on mw1154 is CRITICAL: Connection refused [12:53:14] RECOVERY - Apache HTTP on mw1154 is OK: HTTP OK: HTTP/1.1 301 Moved 
Permanently - 400 bytes in 0.117 second response time [13:00:33] 3operations: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1043429 (10Nemo_bis) Does this include importing CodeReview comments into Audit? Can all viewvc URLs be safely redirected? Can the SVN imports be clearly separated from non-archived repos?... [13:01:54] PROBLEM - Apache HTTP on mw1159 is CRITICAL: Connection refused [13:04:04] RECOVERY - Apache HTTP on mw1159 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.184 second response time [13:04:44] PROBLEM - Apache HTTP on mw1160 is CRITICAL: Connection refused [13:06:24] PROBLEM - Apache HTTP on mw1158 is CRITICAL: Connection refused [13:06:54] RECOVERY - Apache HTTP on mw1160 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.177 second response time [13:09:44] RECOVERY - Apache HTTP on mw1158 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.095 second response time [13:11:53] PROBLEM - Apache HTTP on mw1157 is CRITICAL: Connection refused [13:14:03] RECOVERY - Apache HTTP on mw1157 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.110 second response time [13:16:03] PROBLEM - Apache HTTP on mw1156 is CRITICAL: Connection refused [13:19:15] RECOVERY - Apache HTTP on mw1156 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.112 second response time [13:21:07] PROBLEM - Apache HTTP on mw1155 is CRITICAL: Connection refused [13:24:17] RECOVERY - Apache HTTP on mw1155 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 400 bytes in 0.103 second response time [13:29:49] (03PS4) 10Alexandros Kosiaris: url_downloader: Vary squid config base on OS version [puppet] - 10https://gerrit.wikimedia.org/r/191015 [13:33:57] PROBLEM - MariaDB disk space on db2011 is CRITICAL: DISK CRITICAL - free space: /srv 86651 MB (5% inode=99%): [13:52:22] 3Datasets-General-or-Unknown, operations: dumps.wikimedia.org seems super-slow right now - 
https://phabricator.wikimedia.org/T45647#1043473 (10ArielGlenn) It's an lvm on hardware raid 6 so files should be spread about across disks. We don't have multiple filesystems/partitions so there's no shuffling around of... [13:57:10] (03PS1) 10Giuseppe Lavagetto: jobrunner: use www-data as the web user [puppet] - 10https://gerrit.wikimedia.org/r/191039 [13:59:08] (03CR) 10Giuseppe Lavagetto: [C: 032] jobrunner: use www-data as the web user [puppet] - 10https://gerrit.wikimedia.org/r/191039 (owner: 10Giuseppe Lavagetto) [14:01:59] <_joe_> !log updating the jobrunners to use www-data [14:02:04] Logged the message, Master [14:02:36] (03PS24) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [14:04:55] (03PS25) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [14:12:44] 3operations, Beta-Cluster: Make www-data the web-serving user (is currently apache) - https://phabricator.wikimedia.org/T78076#1043515 (10Joe) Production is being finally converted to www-data today; we will need to deploy the scap fix before we deploy today. I'll ping @bd808 [14:13:02] (03PS5) 10Filippo Giunchedi: restbase: add internal LVS configuration [puppet] - 10https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) [14:13:09] _joe_: \o/ [14:13:10] nice work! 
[14:13:24] <_joe_> paravoid: we're almost done [14:13:37] <_joe_> and hell didn't freeze, apparently [14:13:39] (CR) Filippo Giunchedi: [C: 2 V: 2] restbase: add internal LVS configuration [puppet] - https://gerrit.wikimedia.org/r/190786 (https://phabricator.wikimedia.org/T89636) (owner: Filippo Giunchedi) [14:14:17] <_joe_> well, it took me just 9 months, but now our web setup is almost-100% following the debian standards [14:14:22] haha [14:14:38] * _joe_ still remembers in shock the apache configs in /usr/local/apache/somethingelse [14:15:13] indeed, very nice! [14:15:25] (CR) Alexandros Kosiaris: [C: 2] url_downloader: Vary squid config base on OS version [puppet] - https://gerrit.wikimedia.org/r/191015 (owner: Alexandros Kosiaris) [14:35:05] Did I hear 'Debian' standards! [14:35:10] _joe_: nice! [14:41:33] PROBLEM - Host restbase.svc.eqiad.wmnet is DOWN: CRITICAL - Network Unreachable (10.2.2.17) [14:41:44] that's me, didn't silence in time of course [14:41:59] sorry about the page [14:42:07] godog: :-) [14:42:09] no worries [14:43:24] I'm not sure I've ever successfully catched it in time after first deploy [14:43:34] s/deploy/puppet runs/ [14:46:37] (PS26) KartikMistry: cxserver: Add Yandex support [puppet] - https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [14:49:49] (PS1) Giuseppe Lavagetto: tin: use www-data for mediawiki as well [puppet] - https://gerrit.wikimedia.org/r/191047 [14:49:51] (PS1) Giuseppe Lavagetto: terbium: use www-data for mediawiki [puppet] - https://gerrit.wikimedia.org/r/191048 [14:49:53] (PS1) Giuseppe Lavagetto: videoscaler: use www-data [puppet] - https://gerrit.wikimedia.org/r/191049 [14:52:12] (CR) Giuseppe Lavagetto: [C: 2] tin: use www-data for mediawiki as well [puppet] - https://gerrit.wikimedia.org/r/191047 (owner: Giuseppe Lavagetto) [14:53:42] safe to restart pybal on lvs1006 (?)
[14:59:23] <_joe_> godog: I guess so [14:59:50] <_joe_> godog: tell me when you did it so that I can check something [14:59:56] sure [15:00:13] !log restart pybal on lvs1006 [15:00:15] _joe_: ^ [15:00:16] Logged the message, Master [15:00:40] I'll restart on lvs1003 shortly [15:01:23] <_joe_> ok [15:02:13] RECOVERY - Host restbase.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 6.01 ms [15:03:10] godog: \o/ [15:04:09] (PS27) KartikMistry: cxserver: Add Yandex support [puppet] - https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [15:06:03] akosiaris: \o/ [15:06:09] !log restart pybal on lvs1003 [15:06:13] Logged the message, Master [15:06:42] <_joe_> mh now I'm left with terbium [15:06:54] <_joe_> which is going to be the hard one to migrate [15:07:11] (PS28) KartikMistry: cxserver: Add Yandex support [puppet] - https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [15:07:39] (CR) Giuseppe Lavagetto: [C: 2] terbium: use www-data for mediawiki [puppet] - https://gerrit.wikimedia.org/r/191048 (owner: Giuseppe Lavagetto) [15:10:13] PROBLEM - MariaDB disk space on db2011 is CRITICAL: DISK CRITICAL - free space: /srv 86999 MB (5% inode=99%): [15:10:37] Ops-Access-Requests, Services, operations, Citoid: Give mvolz access to sha machine i.e. http://citoid.wikimedia.org/ - https://phabricator.wikimedia.org/T89057#1043614 (Mvolz) Thanks @Dzahn Who do you need to +2 this?
[15:13:05] <_joe_> !log disabled manually all crons in the 'apache' crontab on terbium [15:13:13] Logged the message, Master [15:16:53] PROBLEM - Disk space on dataset1001 is CRITICAL: DISK CRITICAL - free space: /data 1521023 MB (3% inode=99%): [15:20:26] Oh, shit, I'm the only SWATter with patches in [15:20:30] And it's a big SWAT list [15:21:58] Three of the (so far) seven patches dealing with fallout from AaronS :P [15:23:53] PROBLEM - ElasticSearch health check for shards on elastic1030 is CRITICAL: CRITICAL - elasticsearch http://10.64.48.54:9200/_cluster/health error while fetching: Request timed out. [15:24:53] RECOVERY - ElasticSearch health check for shards on elastic1030 is OK: OK - elasticsearch status production-search-eqiad: status: yellow, number_of_nodes: 31, unassigned_shards: 168, timed_out: False, active_primary_shards: 2135, cluster_name: production-search-eqiad, relocating_shards: 15, active_shards: 6218, initializing_shards: 33, number_of_data_nodes: 31 [15:27:34] that was me, bad timing with restart and icinga [15:32:52] <_joe_> marktraceur: are you swatting now? [15:33:24] No, 30 minutes or so [15:33:32] But I will do it (sigh) when the time comes [15:34:41] (03PS29) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [15:35:40] _joe_: got 2 min for https://gerrit.wikimedia.org/r/#/c/190813/ ? [15:35:54] <_joe_> godog: not now sorry [15:36:00] <_joe_> maybe in 15? 20? [15:36:09] sure [15:36:23] PROBLEM - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1457 bytes in 0.181 second response time [15:36:46] <_joe_> mmmh this may be due to terbium changing cron user? [15:36:52] <_joe_> ^^ [15:37:37] (03CR) 10Faidon Liambotis: "Yes, since these are coming from socket addresses and inet_ntop always outputs in lowercase (to my knowledge)." 
[puppet] - 10https://gerrit.wikimedia.org/r/190964 (owner: 10Faidon Liambotis) [15:40:05] (03PS1) 10Alexandros Kosiaris: Reuse parsoid varnish for restbase [puppet] - 10https://gerrit.wikimedia.org/r/191061 (https://phabricator.wikimedia.org/T78194) [15:46:56] (03PS30) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [15:46:56] <_joe_> marktraceur: we'll need to deploy a new version of scap before you can swat I guess [15:47:03] <_joe_> marktraceur: so we'll have to wait for bd808|BUFFER to get here [15:47:54] Oh dear. [15:47:57] OK, no rush [15:48:14] We have two hours before someone else is deploying [15:48:37] <_joe_> I can do it if we're in a rush [15:48:49] <_joe_> but I'd prefer bryan to do that [15:50:04] No, no rush [15:50:09] Just let me know when you're set to go [15:51:37] (03CR) 10Mobrovac: "I believe in T78194 we settled for 'rest.wikimedia.org' (not 'restbase.'), see in-line comment." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/191061 (https://phabricator.wikimedia.org/T78194) (owner: 10Alexandros Kosiaris) [15:51:51] <_joe_> gee gerrit is slooow [15:53:56] new scap? [15:54:53] Oh good, bd808 is here. [15:54:56] Oh is it time for the apache -> www-data change? [15:55:44] anybody can deploy scap. 
It's in /srv/deployment/scap/scap on tin and deployed with trebuchet [15:55:52] (03PS1) 10Filippo Giunchedi: add extra_classpath argument [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/191067 (https://phabricator.wikimedia.org/T78514) [15:56:10] <_joe_> bd808: yes I just wanted your ack basically [15:56:23] <_joe_> so I +2'd the change [15:56:36] (03PS1) 10Giuseppe Lavagetto: maintenance: fixup for using www-data everywhere [puppet] - 10https://gerrit.wikimedia.org/r/191068 [15:56:40] <_joe_> bd808: yes it is time :) [15:57:37] (03CR) 10Giuseppe Lavagetto: [C: 032] maintenance: fixup for using www-data everywhere [puppet] - 10https://gerrit.wikimedia.org/r/191068 (owner: 10Giuseppe Lavagetto) [15:58:01] coolio. we'll needt o remove yuvi from the review to shed his -2 [15:58:02] hmm, RT seems down [15:59:08] <_joe_> ok [15:59:58] grumble. trebuchet lock held by Reedy from 2015-01-08 in scap dir [16:00:04] manybubbles, anomie, ^d, marktraceur: Respected human, time to deploy Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150217T1600). Please do the needful. [16:00:19] bd808: want me to remove it? [16:00:32] that would be swell [16:01:07] git deploy abort might do it or rm /srv/deployment/scap/scap/.git/deploy/deploy [16:01:42] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [16:02:56] <_joe_> deploy deploy [16:03:01] reedy@tin:/srv/deployment/scap/scap$ git deploy abort [16:03:01] There is no deployment to abort. [16:03:09] _joe_: scap scap deploy deploy [16:03:15] yeah, just nuke the lock file then [16:03:20] sudo deploy [16:03:34] nuked [16:03:58] git deploy start: Failed to write lock file [16:04:04] * bd808 looks for reasons [16:04:29] drwxr-sr-x 2 reedy wikidev 4096 Feb 17 16:03 . [16:04:31] no gw [16:04:32] ffs [16:04:57] fixed too [16:05:16] Failed to write the start tag. 
[16:05:19] grrrrrrrrr [16:05:58] (03CR) 10BBlack: "I think you're correct in practice on current Linux hosts I've looked at, but neither the standard nor the Linux man page require it to be" [puppet] - 10https://gerrit.wikimedia.org/r/190964 (owner: 10Faidon Liambotis) [16:06:12] (03PS1) 10Ottomata: Create new user joal - Joseph Allemandou [puppet] - 10https://gerrit.wikimedia.org/r/191069 (https://phabricator.wikimedia.org/T89357) [16:06:22] RECOVERY - check if wikidata.org dispatch lag is higher than 2 minutes on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1447 bytes in 0.194 second response time [16:06:23] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:33] PROBLEM - puppet last run on mw1174 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:42] PROBLEM - puppet last run on mw1153 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:43] PROBLEM - puppet last run on mw1026 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:52] PROBLEM - puppet last run on mw1150 is CRITICAL: CRITICAL: Puppet has 1 failures [16:06:59] _joe_: can you do a `chmod -R g+w /srv/deployment/scap/scap/.git` for me on tin? We've got some messed up file perms yet again [16:07:03] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:03] PROBLEM - puppet last run on mw1082 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:04] PROBLEM - puppet last run on mw1046 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:13] PROBLEM - puppet last run on mw1164 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:13] PROBLEM - puppet last run on mw1117 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:13] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:17] <_joe_> wha? 
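The lock-file and start-tag failures in this exchange trace back to entries under the scap checkout's `.git` directory that were not group-writable (the `no gw` listing, then the `chmod -R g+w` request). A minimal Python sketch of a pre-flight check follows, assuming that group write on every file and directory is what the deploy step needs. The function name `missing_group_write` is hypothetical and is not part of Trebuchet, scap, or git-deploy.

```python
import os
import stat


def missing_group_write(root):
    """Return sorted paths under root (root included) lacking the group-write bit.

    Hypothetical helper for illustration only. Symlinks are skipped because
    their own mode bits do not control access to the target.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue
            if not mode & stat.S_IWGRP:
                missing.append(path)
    return sorted(missing)


if __name__ == "__main__":
    import sys

    # Usage: python check_gw.py /path/to/repo/.git
    for path in missing_group_write(sys.argv[1]):
        print(path)
```

An empty result would suggest the permissions are not the blocker; any listed path is a candidate for the `chmod -R g+w` fix requested above.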
[16:07:19] :( [16:07:23] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:23] PROBLEM - puppet last run on mw1009 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:23] PROBLEM - puppet last run on mw1088 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:23] PROBLEM - puppet last run on mw1060 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:23] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:26] <_joe_> I may have screwed something [16:07:32] PROBLEM - puppet last run on mw1235 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:32] PROBLEM - puppet last run on mw1100 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:33] PROBLEM - puppet last run on mw1173 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:42] PROBLEM - puppet last run on mw1242 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:43] PROBLEM - puppet last run on mw1226 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:43] PROBLEM - puppet last run on mw1025 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:43] PROBLEM - puppet last run on mw1003 is CRITICAL: CRITICAL: Puppet has 1 failures [16:07:44] dirty roots with their root super powers ;) [16:07:53] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:00] <_joe_> Error: /Stage[main]/Mediawiki::Scap/Package[scap]/ensure: change from a78ddec6f38e266a9e26cf6119b10ae3ffb785bc to latest failed: Could not get latest version: 404 Not Found [16:08:03] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:03] PROBLEM - puppet last run on mw1120 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:10] <_joe_> bd808: ok I'll do that [16:08:12] PROBLEM - puppet last run on mw1008 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:12] PROBLEM - puppet last run on mw1068 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:13] PROBLEM - puppet last run on mw1217 is CRITICAL: CRITICAL: Puppet has 1 
failures [16:08:19] paravoid, bblack [16:08:21] re, https://gerrit.wikimedia.org/r/#/c/190964/ [16:08:23] PROBLEM - puppet last run on mw1061 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:23] PROBLEM - puppet last run on mw1119 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:24] PROBLEM - puppet last run on mw1069 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1099 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1042 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1118 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:25] PROBLEM - puppet last run on mw1177 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:29] i would really like if we could have a canonical way of doing this [16:08:32] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:32] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:32] PROBLEM - puppet last run on mw1123 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:32] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Puppet has 1 failures [16:08:33] that error may be from the rm I had Reedy do [16:08:34] in pseudo code maybe? [16:08:37] documented? [16:08:39] <_joe_> bd808: done [16:08:41] it is needed in lots of places [16:08:48] and it would be best if we all did it the same way [16:08:56] * Deployment started. 
[16:09:00] \o/ [16:09:02] <_joe_> ok [16:09:03] PROBLEM - puppet last run on mw1129 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:05] that message ought to have {{PLURAL:$1|failure|failures}} ;) [16:09:12] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:13] PROBLEM - puppet last run on mw1172 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:13] PROBLEM - puppet last run on mw1114 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:17] Nikerabbit: heh [16:09:23] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:24] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:33] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:33] PROBLEM - puppet last run on mw1206 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:33] PROBLEM - puppet last run on mw1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:36] (03PS31) 10KartikMistry: cxserver: Add Yandex support [puppet] - 10https://gerrit.wikimedia.org/r/186538 (https://phabricator.wikimedia.org/T88512) [16:09:52] PROBLEM - puppet last run on mw1011 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:52] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Puppet has 1 failures [16:09:53] PROBLEM - puppet last run on mw1054 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:02] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:03] PROBLEM - puppet last run on mw1039 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:03] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:03] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:03] PROBLEM - puppet last run on mw1195 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:13] PROBLEM - puppet last run on mw1076 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:23] PROBLEM - puppet last run on mw1055 
is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:32] PROBLEM - puppet last run on tin is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:36] Who's SWATing? [16:10:41] seems stuck at "266/273 minions completed fetch" [16:10:43] PROBLEM - puppet last run on mw1149 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:44] PROBLEM - puppet last run on snapshot1002 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:52] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:53] PROBLEM - puppet last run on mw1208 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:53] PROBLEM - puppet last run on mw1044 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:53] PROBLEM - puppet last run on mw1125 is CRITICAL: CRITICAL: Puppet has 1 failures [16:10:56] (03PS2) 10Ottomata: Create new user joal - Joseph Allemandou [puppet] - 10https://gerrit.wikimedia.org/r/191069 (https://phabricator.wikimedia.org/T89357) [16:11:03] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:13] PROBLEM - puppet last run on osmium is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:16] <_joe_> grr puppet failures are soo annoying [16:11:23] PROBLEM - puppet last run on mw1181 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:23] PROBLEM - puppet last run on mw1227 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:23] PROBLEM - puppet last run on mw1190 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:23] PROBLEM - puppet last run on mw1247 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:23] PROBLEM - puppet last run on mw1202 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:24] PROBLEM - puppet last run on mw1133 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:26] hahah [16:11:32] PROBLEM - puppet last run on mw1084 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:32] PROBLEM - puppet last run on mw1168 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:32] PROBLEM - puppet last run on mw1238 is CRITICAL: 
CRITICAL: Puppet has 1 failures [16:11:33] PROBLEM - puppet last run on mw1056 is CRITICAL: CRITICAL: Puppet has 1 failures [16:11:33] _joe_, did you know that puppet has 1 failures? [16:11:42] <_joe_> ottomata: I kinda guessed [16:11:44] hahaha [16:11:48] <_joe_> paravoid: thanks [16:11:59] <_joe_> btw, it's recovering now [16:12:04] bd808: Uhm... shall I consider SWAT canceled? [16:12:10] sure, that means yet another flood of messages [16:12:10] puppet can haz 1 failurez [16:12:12] <_joe_> marktraceur: you have my go to swat [16:12:24] <_joe_> paravoid: yes :/ [16:12:40] (03CR) 10Ottomata: [C: 032] Create new user joal - Joseph Allemandou [puppet] - 10https://gerrit.wikimedia.org/r/191069 (https://phabricator.wikimedia.org/T89357) (owner: 10Ottomata) [16:12:41] Yay [16:12:45] <_joe_> this is bound to happen whenever scap is being deployed with any issue btw [16:12:46] !log hosts failing to fetch for scap trebuchet deploy: fenari.wikimedia.org, mw1062.eqiad.wmnet, nickel.wikimedia.org, searchidx1001.eqiad.wmnet, mw1222.eqiad.wmnet, virt0.wikimedia.org, mw1159.eqiad.wmnet [16:13:16] <_joe_> bd808: the mw hosts will recover eventually [16:13:42] !log updated scap to 54a2713 (www-data user) [16:13:46] <_joe_> fenari... searchidx and virt0 should be torched [16:13:46] Logged the message, Master [16:14:05] there is a redis list that they have to be taken out of [16:14:14] trebuchet has some weird state shit [16:14:41] marktraceur: want to do the honors of some no-op test? [16:14:58] hoo: marktraceur will be swatting once we get scap working [16:15:03] bd808: What, just touch a file and scap? [16:15:12] bd808: I see, ok [16:15:17] yeah or even just sync-file some random file [16:15:22] Oh sure. 
[16:16:02] !log marktraceur Synchronized wmf-config/logging.php: No-op test for bd808 (duration: 00m 05s) [16:16:06] Logged the message, Master [16:16:09] {{done}} [16:16:25] _joe_: If some of those hosts I logged as failing the sync are decommed, we need to do this to make trebuchet forget about them -- https://wikitech.wikimedia.org/wiki/Trebuchet#Removing_minions_from_redis [16:16:37] OK... kart_, you're up first [16:16:41] Ping for testing [16:16:45] <_joe_> fo'realz? [16:16:53] marktraceur: here [16:16:59] Also hoo is here, so is bd808, and obviously myself [16:17:05] <_joe_> bd808: "trebuchet has some weird state shit" [16:17:05] So we're set to go! [16:17:42] _joe_: once salt tells it about a minion it adds it to a hash in redis and nothing automatically cleans that [16:17:47] (03CR) 10MarkTraceur: [C: 032] CX: Publishing to Main namespace for idwiki and ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190450 (https://phabricator.wikimedia.org/T89450) (owner: 10KartikMistry) [16:17:59] (03Merged) 10jenkins-bot: CX: Publishing to Main namespace for idwiki and ptwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190450 (https://phabricator.wikimedia.org/T89450) (owner: 10KartikMistry) [16:18:05] <_joe_> bd808: so no trebuchet remove shit command [16:18:06] <_joe_> meh [16:18:15] (03PS1) 10Ottomata: Joal also needs access to bastions [puppet] - 10https://gerrit.wikimedia.org/r/191071 (https://phabricator.wikimedia.org/T89357) [16:18:23] (03PS2) 10Ottomata: Joal also needs access to bastions [puppet] - 10https://gerrit.wikimedia.org/r/191071 (https://phabricator.wikimedia.org/T89357) [16:18:29] (03PS2) 10Giuseppe Lavagetto: videoscaler: use www-data [puppet] - 10https://gerrit.wikimedia.org/r/191049 [16:18:56] _joe_: it uses the hash to decide if all the hosts it should have heard from are responding. 
A guard against salt async responses as far as I understand it [16:19:12] !log marktraceur Synchronized wmf-config/InitialiseSettings.php: [SWAT] [config] Enable Main namespace publishing for idwiki, ptwiki (duration: 00m 06s) [16:19:13] "In the future this should be available via git deploy commands. " [16:19:20] kart_: Test that one, I'll start on the next one [16:19:20] Logged the message, Master [16:19:37] marktraceur: doing. [16:19:39] (03CR) 10MarkTraceur: [C: 032] CX: Update wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190218 (owner: 10KartikMistry) [16:19:46] (03Merged) 10jenkins-bot: CX: Update wgContentTranslationSiteTemplates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190218 (owner: 10KartikMistry) [16:19:58] (03CR) 10Giuseppe Lavagetto: [C: 032] videoscaler: use www-data [puppet] - 10https://gerrit.wikimedia.org/r/191049 (owner: 10Giuseppe Lavagetto) [16:19:59] marktraceur: I confess that I don't think I can actually test my patch. Needs someone to try and create a poll. [16:20:04] OK [16:20:11] Deskana may have rights but I'm not sure [16:20:18] bd808: We'll just make sure it doesn't kill the site and wait for someone else with rights to test 'er [16:20:24] *nod* [16:21:39] 3operations: Varnish GeoIP is broken for HTTPS+IPv6 traffic - https://phabricator.wikimedia.org/T89688#1043716 (10Ottomata) Hm, so this is an aside, but: I would love it if we could figure out a way to standardize this algorithm across the organization. Analytics has at least one place where client ip extractio... [16:21:54] bd808, marktraceur: I don't, but I can prod James A to try to create a poll. 
[16:21:57] KK [16:22:01] (03PS3) 10Ottomata: Joal also needs access to bastions [puppet] - 10https://gerrit.wikimedia.org/r/191071 (https://phabricator.wikimedia.org/T89357) [16:22:07] (03CR) 10Ottomata: [C: 032 V: 032] Joal also needs access to bastions [puppet] - 10https://gerrit.wikimedia.org/r/191071 (https://phabricator.wikimedia.org/T89357) (owner: 10Ottomata) [16:22:29] In the hopes that that can happen, I will shunt that patch to the end of the list [16:23:24] marktraceur: Can James test it now or does he need to wait until you're done? [16:23:30] He needs to wait [16:23:33] marktraceur: Admittedly the chances of him being awake at 8:30am are minimal. ^_^ [16:23:43] Yeah, I figure that's not going to happen [16:23:54] marktraceur: Good stuff. Can you ping me when you want me to reach out to him? [16:23:57] But if he joins us I can start doing it whenever [16:24:06] Deskana: Ping him now, when he comes in we can start [16:24:19] kart_: Any luck? [16:24:39] marktraceur: still testing. give a minute. [16:26:15] marktraceur: tested. [16:26:24] kart_: Presumably it worked? :) [16:26:47] yes. Found bug though. [16:26:56] Not really related to this. go ahead. [16:26:58] :) [16:27:04] OK, your other patch now [16:27:11] !log marktraceur Synchronized wmf-config/CommonSettings.php: [SWAT] [config] Update wgContentTranslationSiteTemplates (duration: 00m 05s) [16:27:14] kart_: Test! [16:27:17] Logged the message, Master [16:27:19] hoo: You're up next [16:27:39] _joe_: So I'm not getting DNS resolve on these trebuchet targets: fenari.wikimedia.org virt0.wikimedia.org nickel.wikimedia.org searchidx1001.eqiad.wmnet [16:27:55] does that sound right for removing from the scap/scap list? 
[16:27:56] <_joe_> nickel may have changed names [16:28:08] marktraceur: Yay [16:28:13] (03CR) 10MarkTraceur: [C: 032] Adjust wgPropertySuggesterMinProbability, update property id blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190477 (owner: 10Hoo man) [16:28:16] <_joe_> bd808: yes but I need a few minutes [16:28:20] (03Merged) 10jenkins-bot: Adjust wgPropertySuggesterMinProbability, update property id blacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190477 (owner: 10Hoo man) [16:28:23] * _joe_ is converting the videoscalers [16:28:27] Sure, no worries. [16:29:37] 3operations: Evaluation Virtualization management platforms for a private VPS cluster - https://phabricator.wikimedia.org/T87251#1043735 (10akosiaris) 5Open>3Resolved The evaluation has been completed. The winner was Ganeti, the logic behind the entire evaluation is depicted in a Google Spreadsheet sent to t... [16:29:37] 3operations: Introduce Virtualization in our infrastructure - https://phabricator.wikimedia.org/T87258#1043737 (10akosiaris) [16:29:40] hoo: Once kart_ says his last patch is good we'll fire a sync-file for you; be ready to test it, we have a lot to do today [16:29:58] marktraceur: ack [16:30:27] 3Labs, ops-eqiad, operations: virt1002 broken disk? - https://phabricator.wikimedia.org/T88923#1043740 (10coren) [16:30:32] marktraceur: thanks! [16:30:37] kart_: It's working? [16:30:45] 3Labs, ops-eqiad, operations: virt1002 broken disk? - https://phabricator.wikimedia.org/T88923#1043744 (10coren) p:5High>3Unbreak! [16:31:42] !log mw1159.eqiad.wmnet has ancient scap version (2014-10-09) [16:31:46] Logged the message, Master [16:33:03] OK, kart_, I'm going to assume your "ack" was an unclear verification of the patch because I'm getting antsy [16:33:10] !log marktraceur Synchronized wmf-config/Wikibase.php: [SWAT] [config] Adjust , update property id blacklist (duration: 00m 05s) [16:33:11] hoo: Test! 
[16:33:16] Logged the message, Master [16:34:03] marktraceur: Seems to work [16:34:14] K, next one [16:34:19] (03CR) 10MarkTraceur: [C: 032] Set $wgCentralAuthPreventUnattached = true; on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190634 (https://phabricator.wikimedia.org/T70069) (owner: 10Hoo man) [16:34:25] (03Merged) 10jenkins-bot: Set $wgCentralAuthPreventUnattached = true; on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190634 (https://phabricator.wikimedia.org/T70069) (owner: 10Hoo man) [16:34:50] !log mw1062.eqiad.wmnet not accepting ssh login by bd808 (key refused) [16:34:56] Logged the message, Master [16:35:06] !log marktraceur Synchronized wmf-config/: [SWAT] [config] Set = true; on all wikis (duration: 00m 06s) [16:35:09] hoo: Test! [16:35:12] Logged the message, Master [16:35:23] <_joe_> bd808: mw1062 is down for now [16:35:27] Only extension updates left ;_; [16:35:55] 3operations: Decommission svn.wikimedia.org server (import SVN into Phabricator) - https://phabricator.wikimedia.org/T86655#1043768 (10Chad) >>! In T86655#1043429, @Nemo_bis wrote: > Does this include importing CodeReview comments into Audit? No. But worth discussing. > Can all viewvc URLs be safely redirected... [16:35:57] _joe_: works for me. Just logging what I could find in the trebuchet failures [16:35:59] <_joe_> bd808: I'm back! www-data conversion is allegedly done [16:36:11] marktraceur: " Cannot create account: The requested username would conflict with another username on another wiki. " [16:36:16] that's pretty cool [16:36:17] So: *check* [16:36:26] Great [16:36:43] Next is gi11es [16:37:26] Gotta wait for Jenkins. [16:37:35] So how 'bout them Packers [16:37:41] <_joe_> bd808: mmmh looking at mw1159 [16:37:53] <_joe_> it seems like something got corrupted there [16:37:55] marktraceur: ack = yes. Sorry for being unclear :/ [16:38:13] was figuring out how to delete test page. 
[16:38:25] kart_: No problemo, I figured it out :) [16:38:42] _joe_: *nod* `salt-call deploy.fetch 'scap/scap'` sometimes gives better error messages on what's broken [16:38:57] My unpublished protocol spec reserves "ack" for indicating receipt of messages, but the committee will revisit that. :P [16:38:58] * bd808 can't do that in prod but has in beta [16:39:19] marktraceur: ack is for search! [16:39:22] <_joe_> root@mw1159:/srv/deployment/scap/scap# git remote update [16:39:22] <_joe_> Fetching origin [16:39:22] <_joe_> error: Unable to find 9140a922bfdc5d72aa621658233afbe7fa9cf56b under http://tin.eqiad.wmnet/scap/scap/.git [16:40:12] plz not to be breaking the deploy tools in the middle of deploy :) [16:40:31] <_joe_> marktraceur: will not! [16:40:46] (03PS5) 10KartikMistry: WIP: Give apertium-admins access to kartik [puppet] - 10https://gerrit.wikimedia.org/r/189915 [16:41:03] <_joe_> marktraceur: well technically scap is broken there [16:41:12] So...don't run scap? [16:41:17] I doubt I need to, but still [16:45:00] !log marktraceur Synchronized php-1.25wmf16/extensions/MultimediaViewer/resources/mmv/ui/: [SWAT] [wmf16] Media Viewer share/embed fix (duration: 00m 05s) [16:45:03] Logged the message, Master [16:45:03] gi11es: Test! [16:45:11] marktraceur: testing... [16:45:29] bd808, _joe_, one sync error, "extra arguments found" [16:46:41] !log marktraceur Synchronized php-1.25wmf17/extensions/MultimediaViewer/resources/mmv/ui/: [SWAT] [wmf17] Media Viewer share/embed fix (duration: 00m 07s) [16:46:42] gi11es: And on wmf17 when you get a chance [16:46:44] Logged the message, Master [16:47:43] marktraceur: I see the fix on '16, not on '17 yet [16:47:46] Hrm [16:47:57] there's usually a small delay, nothing to worry about [16:48:02] I'll check again in a couple of minutes [16:48:04] Yeah, probably fine [16:48:16] On to the core fixes for StatusValue... [16:48:41] marktraceur: fix confirmed on '17 [16:48:47] Sweet [16:48:58] Deskana, any news of Jamesofur? 
[16:49:11] Nope [16:49:31] Oh, I got an email [16:49:34] He's not coming [16:49:41] We'll just make sure things don't die, it'll be fine [16:49:44] Oh, yeah [16:49:47] Yep [16:49:52] There are no ongoing elections right now [16:50:07] So even if it doesn't fix the problem... it's broken right now anyway, so that's not much worse :P [16:50:07] <_joe_> bd808: I should've fixed mw1159 in a bad way [16:51:55] "hurt me so good" [16:51:59] <_joe_> !log upgrading testwiki to use www-data, may cause a brief downtime [16:52:05] Logged the message, Master [16:52:42] RIP testwiki [16:52:49] <_joe_> funny how testwiki presents some peculiar difficulties sometimes, being one-of-a-kind [16:53:08] It's OK, test2wiki, I know _joe_ loves you too [16:53:22] <_joe_> paravoid: we may want to re-enable icinga checks [16:53:28] I think test2wiki is "normal" [16:53:32] <_joe_> s/checks/messages/ [16:53:38] But testwikidatawiki is the unloved child [16:53:53] We need a testtestwikitest [16:54:09] wikiwikitesttest (there's a song in this...) [16:54:09] <_joe_> hey nerds, it's already done :P [16:54:18] Ooh, Jenkins got around to merging my patch [16:54:18] * James_F grins. [16:55:19] !log marktraceur Synchronized php-1.25wmf17/includes/filerepo/FileRepo.php: [SWAT] [wmf17] Make sure Commons uploading is still working later today (duration: 00m 05s) [16:55:23] Logged the message, Master [16:56:02] odd log message [16:56:14] greg-g: Howzat? [16:56:24] marktraceur: I mean, shouldn't you anyways? [16:56:43] Shouldn't I what? [16:58:25] marktraceur: make sure it's still working [16:58:47] greg-g: Yes, but AaronS's StatusValue patch broke both uploading and file deletion [16:58:58] I just fixed the first one, second one coming soon [16:59:37] (03CR) 10GWicke: [C: 031] "+1 from me. The Parsoid spec is very stable, and the API & spec are covered by thousands of tests. 
There is little need to test against th" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/190776 (https://phabricator.wikimedia.org/T89558) (owner: 10KartikMistry) [17:00:52] I'm going over the allotted time, but nobody else is deploying for a while [17:01:01] I'll be done in 15 minutes or so [17:03:02] 3Analytics-Cluster, operations: Audit analytics cluster alerts and recipients - https://phabricator.wikimedia.org/T89730#1043869 (10Ottomata) 3NEW a:3Ottomata [17:04:58] !log marktraceur Synchronized php-1.25wmf17/includes/Status.php: [SWAT] [wmf17] Make sure Commons file deletion is still working later today (duration: 00m 06s) [17:05:03] Logged the message, Master [17:05:16] !log marktraceur Synchronized php-1.25wmf17/tests/phpunit/includes/StatusTest.php: [SWAT] [wmf17] Make sure Commons file deletion is still working later today (duration: 00m 06s) [17:05:21] Logged the message, Master [17:05:29] Working! [17:05:34] OK, SecurePoll finally [17:06:10] Afternoon [17:06:22] Anything been reported with Wikisource being slow? [17:06:27] You're the first, Qcoder00 [17:06:34] which language? [17:06:36] English [17:06:38] 3operations, Beta-Cluster: File upload area resorts to 0777 permissions to for uploaded content - https://phabricator.wikimedia.org/T75206#1043885 (10Joe) [17:06:40] 3operations, Beta-Cluster: Make www-data the web-serving user (is currently apache) - https://phabricator.wikimedia.org/T78076#1043884 (10Joe) 5Open>3Resolved [17:06:49] Probably localised for me [17:06:53] No error spike [17:06:57] Some other Non WMF sites have been slow for a bit [17:07:04] Assuming localised.. 
[17:07:07] * Qcoder00 out [17:07:15] lol [17:07:26] too bad, I wanted to debug this :) [17:07:30] 3operations, Beta-Cluster: Minimize differences between beta and production (Tracking) - https://phabricator.wikimedia.org/T87220#1043889 (10Joe) [17:08:06] <_joe_> paravoid: icinga-wm should get back in the room [17:08:16] 3Ops-Access-Requests, operations: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1043890 (10RobH) 5Open>3stalled So, it turns out we didn't have an operations meeting this past Monday, due to being a holiday in the USA. We currently do NOT have an approval method for sudo a... [17:08:19] _joe_: already done, see above [17:08:27] <_joe_> oh ok [17:08:36] <_joe_> I expected one alert to recover [17:08:50] <_joe_> but it was in warning, not critical :) [17:09:01] 3Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1043900 (10chasemp) 5Open>3Resolved a:3chasemp Marking this closed then as the description matches what I know to be reality. 
[17:09:53] PROBLEM - puppet last run on mw1019 is CRITICAL: CRITICAL: Puppet last ran 7 hours ago [17:10:33] PROBLEM - puppet last run on xenon is CRITICAL: CRITICAL: Puppet last ran 7 hours ago [17:11:02] RECOVERY - puppet last run on mw1019 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures [17:11:03] PROBLEM - puppet last run on mw1018 is CRITICAL: CRITICAL: Puppet last ran 7 hours ago [17:11:18] <_joe_> here they are [17:13:13] RECOVERY - puppet last run on mw1018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:14:06] !log marktraceur Synchronized php-1.25wmf17/extensions/SecurePoll/includes/ballots/Ballot.php: [SWAT] [wmf17] Backport SecurePoll_BallotStatus fix (duration: 00m 05s) [17:14:13] Logged the message, Master [17:14:22] Nothing obvious broken [17:15:21] bd808: Want to stare at the site for a minute and make sure it's not broken? Or Deskana? [17:15:31] (mw.org or testwikis, no other site has wmf17 yet) [17:15:53] * Deskana stares [17:16:06] Thankee [17:16:18] And with that, unless I have forgotten anyone, I declare SWAT closed. [17:16:23] You've been a wonderful audience [17:16:27] Tip your release managers [17:16:44] Verified not dead. [17:17:01] (03PS4) 10Rush: phabricator using mysql fulltext T89274, tweaked for mariadb/aria [puppet] - 10https://gerrit.wikimedia.org/r/190775 (owner: 10Springle) [17:17:15] (03CR) 10Rush: [C: 032] "looks good to me man. thanks for doing this" [puppet] - 10https://gerrit.wikimedia.org/r/190775 (owner: 10Springle) [17:22:18] 3Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1043976 (10faidon) Was the idea of calling it a "software security bug" rather than MediaWiki rejected? It's not clear to me. 
[17:24:12] 3Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1043981 (10chasemp) >>! In T76564#1043976, @faidon wrote: > Was the idea of calling it a "software security bug" rather than MediaWiki rejected? It's not clear... [17:24:43] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [17:25:41] !log powering down virt1005, waiting a few seconds, power on [17:25:47] Logged the message, Master [17:26:15] 3operations, Phabricator: Mysql search issues flagged by Phabricator setup - https://phabricator.wikimedia.org/T89274#1043996 (10chasemp) @springle I think the puppet looks good. When is good for you to knock this out? I'll try to sync up with you on irc also. Thanks again [17:26:25] Is labs offline? [17:26:51] (tools, to be more specific) [17:27:43] PROBLEM - Host virt1005 is DOWN: PING CRITICAL - Packet loss = 100% [17:28:47] sjoerddebruin: part of it. A good place to start is in #wikimedia-labs [17:29:03] 3OTRS, operations, Project-Creators: Project Proposal: Label style projects for common operations tools - https://phabricator.wikimedia.org/T1147#1044003 (10chasemp) 5Open>3Resolved a:3chasemp >>! In T1147#1035535, @Springle wrote: > +1 to a DBA or Databases tag. > > Although given our future includes va... [17:29:13] Mhm, I thought this was the channel for such things. 
[17:34:33] RECOVERY - Host virt1005 is UP: PING OK - Packet loss = 0%, RTA = 1.79 ms [17:34:53] RECOVERY - RAID on virt1005 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0 [17:35:03] RECOVERY - Disk space on virt1005 is OK: DISK OK [17:35:19] 3Scrum-of-Scrums, operations, Services, RESTBase: RESTbase deployment - https://phabricator.wikimedia.org/T1228#1044019 (10GWicke) [17:40:37] 3Project-Creators, operations, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1044033 (10chasemp) 5Open>3Resolved a:3chasemp Here is my take away: - Everyone, especially me, needs to be explicit and forward with project creation - I talked to the "owner" of the #inte... [17:41:08] (03PS1) 10Gage: hiera: update for hadoop namenode migration [puppet] - 10https://gerrit.wikimedia.org/r/191078 [17:43:45] (03PS1) 10Andrew Bogott: Add virt1012 to the nova-compute pool. Remove virt1005. [puppet] - 10https://gerrit.wikimedia.org/r/191079 [17:46:08] (03CR) 10Hashar: "You are welcome _joe_ :-]" [puppet] - 10https://gerrit.wikimedia.org/r/190946 (https://phabricator.wikimedia.org/T89649) (owner: 10Hashar) [17:46:39] (03CR) 10Ottomata: [C: 032] hiera: update for hadoop namenode migration [puppet] - 10https://gerrit.wikimedia.org/r/191078 (owner: 10Gage) [17:47:02] thanks jgage [17:51:31] PROBLEM - RAID on virt1005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:51:43] (03CR) 10Andrew Bogott: [C: 032] Add virt1012 to the nova-compute pool. Remove virt1005. [puppet] - 10https://gerrit.wikimedia.org/r/191079 (owner: 10Andrew Bogott) [17:52:28] * greg-g just looked at the swat deploy window for this morning [17:52:30] RECOVERY - RAID on virt1005 is OK: OK: Active: 16, Working: 16, Failed: 0, Spare: 0 [17:52:31] 'twas a big one [17:53:30] 3Project-Creators, operations, Phabricator: Create projects for Ops goals - https://phabricator.wikimedia.org/T87262#1044051 (10Krenair) Thank you @chasemp. 
[17:55:12] (03CR) 10Alexandros Kosiaris: [C: 04-1] "I 'd prefer it if we avoided this. While being able to restart apertium in production is obviously useful I 'd rather an ops person does i" [puppet] - 10https://gerrit.wikimedia.org/r/189915 (owner: 10KartikMistry) [17:56:11] PROBLEM - puppet last run on virt1012 is CRITICAL: CRITICAL: Puppet has 7 failures [18:00:04] maxsem, kaldari: Respected human, time to deploy Mobile Web (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150217T1800). Please do the needful. [18:00:37] greg-g: and we did a scap deploy seconds before :) [18:01:49] RECOVERY - puppet last run on virt1012 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [18:01:56] bd808: gutsy [18:02:39] !log adding virt1012 to the nova virt pool [18:02:48] Logged the message, Master [18:10:21] (03CR) 10Gage: "one minor comment inline: speed hack by removing conditional" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [18:16:37] 3Triagers, Project-Creators, Phabricator, operations: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1044083 (10JAnstee_WMF) I need to be given permission to create new projects. Please add me to project creators group. 
[18:17:39] (03PS1) 10Ori.livneh: Correct StatsFormatString so it emits valid statsd data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191087 [18:19:03] (03PS2) 10Ori.livneh: Correct StatsFormatString so it emits valid statsd data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191087 (https://phabricator.wikimedia.org/T85641) [18:19:14] (03CR) 10Ori.livneh: [C: 032] Correct StatsFormatString so it emits valid statsd data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191087 (https://phabricator.wikimedia.org/T85641) (owner: 10Ori.livneh) [18:19:24] (03Merged) 10jenkins-bot: Correct StatsFormatString so it emits valid statsd data [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191087 (https://phabricator.wikimedia.org/T85641) (owner: 10Ori.livneh) [18:21:00] !log ori Synchronized wmf-config/CommonSettings.php: Icd6766440: Correct StatsFormatString so it emits valid statsd data (duration: 00m 07s) [18:21:06] Logged the message, Master [18:21:56] twentyafterfour: yurikR will be doing a ZeroBanner deploy right now, it might take about 40-50 minutes (he has to scap with some new messages/translations) [18:22:29] where "now" == "in 5 minutes" (he has to get ready [18:22:38] E_UNBALANCEDPARENS [18:22:40] (03PS1) 10Ori.livneh: Set $wgUDPProfilerPort to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191089 [18:22:41] and needs a go-ahead from us [18:22:48] waits for it, I mean [18:22:52] (03CR) 10Ori.livneh: [C: 032] Set $wgUDPProfilerPort to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191089 (owner: 10Ori.livneh) [18:22:57] (03Merged) 10jenkins-bot: Set $wgUDPProfilerPort to 8125 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191089 (owner: 10Ori.livneh) [18:23:00] paravoid: /me nods, I assumed he had coordinated with you [18:23:16] he has, he's standing by [18:23:18] bblack: ^ :) [18:23:20] cool [18:23:24] godspeed :) [18:23:29] !log ori Synchronized wmf-config/CommonSettings.php: Ie5879ec6a: Set 
$wgUDPProfilerPort to 8125 (duration: 00m 07s) [18:23:35] Logged the message, Master [18:24:12] yurikR, greg-g:do you want if we do it at 11:00 PST, i.e. in 35'? [18:24:35] paravoid, yep [18:24:41] do you mind* [18:24:51] 3operations: Varnish GeoIP is broken for HTTPS+IPv6 traffic - https://phabricator.wikimedia.org/T89688#1044106 (10BBlack) Well, we get different versions of XFF at different layers, and there are probably different considerations for whether we trust XFF's that came from the outside world, too. But the basics a... [18:25:01] (03PS1) 10Alexandros Kosiaris: Add IOPS to diskstat.py gmond plugin [puppet] - 10https://gerrit.wikimedia.org/r/191090 [18:27:14] 3operations: Varnish GeoIP is broken for HTTPS+IPv6 traffic - https://phabricator.wikimedia.org/T89688#1044111 (10Ottomata) > we could/should probably use everywhere to set our own custom internal header(s) to communicate the real client IP to MediaWiki and/or Analytics as appropriate. That would be amazingly u... [18:30:53] ori: that doesn't work [18:30:55] 2015-02-17 18:30:23+0000 [-] Bad line: 'mw.lag_cache_miss_expired:1|m\nmw.mobile.view.cache-miss:1|m\nmw.pcache_hit:1|m\nmw.image_cache_hit:2|m\n' [18:30:58] 2015-02-17 18:30:23+0000 [-] Bad line: 'mw.lag_cache_miss_expired:1|m\nmw.image_cache_hit:3|m' [18:31:01] 2015-02-17 18:30:23+0000 [-] Bad line: 'enwiki - 1 0.000000 0.000000 0.000141 0.000000 Setup.php-extensions-GlobalCssJsHooks::onExtensionFunctions\nenwiki - 1 0.000000 0.000000 0.000021 0.000000 Setup.php-extensions-efCentralNoticeSetup\nenwiki - 1 0.000000 0.000000 0.000085 0.000000 Setup.php-extensions-VisualEditorHooks::onSetup\nenwiki - 1 0.000000 0.000000 0.000050 0.000000 Setup.php-extensions-GuidedTourHooks::onSetup\nenwiki - 1 0. 
[18:31:08] 2015-02-17 18:30:23+0000 [-] Bad line: 'mw.rl-minify-css-cache-hits:2|m\nmw.rl-minify-js-cache-hits:1|m' [18:31:34] godog: ack, fixing [18:32:05] I guess a cc on the code review would have been nice too [18:32:39] sorry, you're right [18:32:52] I don't want to be right, I want to be CC'ed :) [18:33:52] anyways let's see how txstatsd will do [18:36:19] paravoid yurikR : it's fine with me, can you please let me know when you are finished so I can start the mediawiki 'train' deployment [18:36:28] 3Ops-Access-Requests, operations: Requesting sudo for hafnium for nuria - https://phabricator.wikimedia.org/T88988#1044169 (10Nuria) I can certainly wait another week w/o problems. [18:36:34] greg-g: ^ [18:36:39] twentyafterfour, ok [18:36:51] * greg-g nods [18:39:03] (03PS1) 10Ori.livneh: Enable forceprofile on mw1017 only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191097 [18:40:21] 3Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1044188 (10Jalexander) 3NEW [18:40:21] (03CR) 10MaxSem: [C: 04-1] Enable gather extension on en beta labs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/189863 (owner: 10Robmoen) [18:41:11] 3operations: Rolling restart for Elasticsearch to pick up new version of wikimedia-extra plugin - https://phabricator.wikimedia.org/T86602#1044198 (10fgiunchedi) completed after a couple of false starts, next restarts can be mostly unattended I think ``` elastic1001.eqiad.wmnet: elastic+ 19165 758 41.6 8265016... 
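The "Bad line" entries above show several metrics arriving in one payload separated by newlines, plus profiler text that is not statsd at all. A rough sketch of separating the two, assuming the common `name:value|type` line shape; `parse_datagram` is a hypothetical helper, not part of txstatsd, and the log does not say which part of the payload txstatsd actually rejected.

```python
def parse_datagram(payload):
    """Split a newline-separated payload into (name, value, type) tuples.

    Lines that do not look like name:value|type are returned separately.
    """
    good, bad = [], []
    for line in payload.splitlines():
        line = line.strip()
        if not line:
            continue
        name, colon, rest = line.partition(":")
        value, bar, kind = rest.partition("|")
        try:
            parsed = (name, float(value), kind)
        except ValueError:
            # No numeric value where one is expected, e.g. profiler text.
            bad.append(line)
            continue
        if colon and bar and name and kind:
            good.append(parsed)
        else:
            bad.append(line)
    return good, bad


metrics, rejected = parse_datagram(
    "mw.pcache_hit:1|m\n"
    "mw.image_cache_hit:2|m\n"
    "enwiki - 1 0.000000 Setup.php-extensions-Foo::onSetup\n"
)
```

Here the two `mw.*` lines parse and the profiler-style line lands in `rejected`, which suggests why pointing two different emitters at the same UDP port produces noise on the receiving end.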
[18:41:28] 3operations: Rolling restart for Elasticsearch to pick up new version of wikimedia-extra plugin - https://phabricator.wikimedia.org/T86602#1044199 (10fgiunchedi) 5stalled>3Resolved [18:41:39] !log yurik Started scap: (no message) [18:41:42] Logged the message, Master [18:42:34] (03Abandoned) 10Ori.livneh: Enable forceprofile on mw1017 only [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191097 (owner: 10Ori.livneh) [18:43:54] (03Abandoned) 10Ejegg: Special:RecordImpression now sampled client-side [puppet] - 10https://gerrit.wikimedia.org/r/188395 (https://phabricator.wikimedia.org/T45250) (owner: 10Ejegg) [18:52:05] !log cold-migrating all instances from virt1005 to virt1012 [18:52:09] Logged the message, Master [18:58:26] (03PS1) 10Ori.livneh: Set UDP section profiler to write to port 3811 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191101 [19:00:03] (03PS2) 10Ori.livneh: Set UDP section profiler to write to port 3811 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191101 [19:00:04] twentyafterfour, greg-g: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150217T1900). Please do the needful. [19:00:19] godog: ^ [19:00:51] (03CR) 10Gage: "Looks like this task is relevant to the labstore issue Chase pointed out:" [puppet] - 10https://gerrit.wikimedia.org/r/160628 (owner: 10Matanya) [19:01:38] gerrit: too "smart" about linking phab tickets [19:02:21] jgage: heh. 
There's a bug about that in phab somewhere [19:03:36] hm, i don't see it [19:03:54] (03CR) 10Filippo Giunchedi: [C: 032] Set UDP section profiler to write to port 3811 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191101 (owner: 10Ori.livneh) [19:03:56] (03PS3) 10AndyRussG: DO NOT DEPLOY BEFORE https://gerrit.wikimedia.org/r/#/c/182074/ Ugly URLs to override mobile redirect for CentralNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182078 [19:03:59] (03Merged) 10jenkins-bot: Set UDP section profiler to write to port 3811 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191101 (owner: 10Ori.livneh) [19:04:11] ori: LGTM, I have to go now tho, I shouldn't have +2'd that perhaps [19:04:39] godog: I'll babysit it [19:05:08] ack, thanks [19:05:11] (03CR) 10AndyRussG: "Rebass'd" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182078 (owner: 10AndyRussG) [19:06:05] (03PS4) 10AndyRussG: Ugly URLs to override mobile redirect for CentralNotice [mediawiki-config] - 10https://gerrit.wikimedia.org/r/182078 [19:08:03] (03PS1) 10Dzahn: add language domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191104 [19:09:10] 3hardware-requests, Wikimedia-Logstash, operations: purchase 3 additional logstash nodes - https://phabricator.wikimedia.org/T89402#1044306 (10RobH) I've gotten the first round of quotes back and updated the RT ticket. Unfortunately, all the various spec discussions resulted in my making a mistake and not going... 
[19:09:55] @seen MZMcBride [19:09:55] mutante: I have never seen MZMcBride [19:10:04] @current_nick Gloria [19:13:47] Fiona [19:13:49] mutante: Fiona [19:14:17] thanks guys [19:15:19] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667 [19:15:19] (03PS1) 10Ori.livneh: Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191107 [19:15:21] !log yurik scap failed: OSError [Errno 2] No such file or directory: '/var/lock/scap' (duration: 33m 42s) [19:15:22] (03CR) 10jenkins-bot: [V: 04-1] Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191107 (owner: 10Ori.livneh) [19:15:26] Logged the message, Master [19:16:05] 3Ops-Access-Requests, operations: Requesting access to analytics-privatedata-users for jamesur - https://phabricator.wikimedia.org/T89739#1044357 (10Philippe-WMF) Approved for CA. [19:16:54] /msg nickserv info ;) [19:18:25] greg-g, ... :( [19:18:56] twentyafterfour, my scap failed for some reason, but it synced i18n, so all seems ok [19:20:28] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [19:20:58] shouldn't scap create /var/lock/scap when it doesn't already exist? bd808: is this a scap bug that I should patch? [19:21:19] twentyafterfour: It doesn't? [19:21:32] "scap failed: OSError [Errno 2] No such file or directory: '/var/lock/scap'" [19:22:08] https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/utils.py#L246 [19:22:19] w+ should create if not there [19:23:12] Or was this bit what failed? https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/utils.py#L256 [19:23:20] deleting the lock file at the end [19:23:50] bd: it was at the end [19:23:56] bd808: % [19:23:57] bd808, where should i copy/paste python log? 
[19:24:16] yurikR: phab paste? [19:24:17] for scap [19:25:03] twentyafterfour, yurikR: sounds like a bug in this commit -- https://gerrit.wikimedia.org/r/#/c/183560/ [19:25:18] bd808, https://phabricator.wikimedia.org/P307 [19:25:55] yurikR: got it. you're all good. the scap worked [19:26:05] (03PS1) 10Dzahn: add project domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191109 [19:26:26] not at all sure how there could be a race for that lockfile cleanup [19:26:29] (03CR) 10jenkins-bot: [V: 04-1] add project domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191109 (owner: 10Dzahn) [19:30:03] bd808: so something deleted the lock before scap got a chance to? [19:30:17] yeah I don't get it [19:30:19] PROBLEM - puppet last run on amssq56 is CRITICAL: CRITICAL: Puppet has 1 failures [19:30:20] that's what it looks like [19:30:30] I guess it's harmless [19:30:45] well no so much if it logs failures to SAL [19:30:58] so maybe at least a try/catch is needed [19:31:05] yeah [19:31:18] PROBLEM - puppet last run on amssq51 is CRITICAL: CRITICAL: Puppet has 1 failures [19:34:01] (03PS1) 10Ottomata: Paremeterize hive.support.concurrency [puppet/cdh] - 10https://gerrit.wikimedia.org/r/191114 [19:34:13] (03PS2) 10Ottomata: Paremeterize hive.support.concurrency [puppet/cdh] - 10https://gerrit.wikimedia.org/r/191114 [19:34:53] (03CR) 10Ottomata: [C: 032] Paremeterize hive.support.concurrency [puppet/cdh] - 10https://gerrit.wikimedia.org/r/191114 (owner: 10Ottomata) [19:35:16] (03CR) 10AndyRussG: "In theory we're ready for this to be merged and deployed. It'd be nice to watch it a bit on the beta cluster first, though...." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/182078 (owner: 10AndyRussG) [19:36:39] RECOVERY - puppet last run on amssq51 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:36:49] (03PS3) 10BryanDavis: logstash: Support MediaWiki logs via Syslog [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) [19:37:39] RECOVERY - puppet last run on amssq56 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [19:38:07] (03PS1) 10Ottomata: Set support_concurrency to false for hive [puppet] - 10https://gerrit.wikimedia.org/r/191116 [19:38:14] (03CR) 10BryanDavis: logstash: Support MediaWiki logs via Syslog (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [19:41:19] (03CR) 10Ottomata: [C: 032] Set support_concurrency to false for hive [puppet] - 10https://gerrit.wikimedia.org/r/191116 (owner: 10Ottomata) [19:42:28] yurikR: you're all finished with scap? I'm about to begin deployment if you are finished. 
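The lock cleanup failure discussed above (the lock file was already gone when scap tried to delete it, so a run that had otherwise finished was logged as failed) is the kind of thing the suggested try/catch would cover. A minimal sketch of that idea, not scap's actual code: `lock` is a hypothetical context manager, POSIX semantics are assumed, and the demo uses a temporary path rather than /var/lock/scap.

```python
import contextlib
import errno
import os
import tempfile


@contextlib.contextmanager
def lock(path):
    """Hold an exclusive-create lock file; tolerate it vanishing before cleanup."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    try:
        yield
    finally:
        os.close(fd)
        try:
            os.unlink(path)
        except OSError as e:
            # A missing lock file at cleanup time is harmless; anything else is not.
            if e.errno != errno.ENOENT:
                raise


lock_path = os.path.join(tempfile.mkdtemp(), "scap.lock")
with lock(lock_path):
    os.unlink(lock_path)  # simulate something else removing the lock early
survived = True
```

Only ENOENT is swallowed, so a permissions problem or similar would still surface as an error instead of being hidden.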
[19:43:01] twentyafterfour, it failed a while ago, but it is all good [19:43:32] yurikR: the failure was just a bogus error message [19:44:20] (failed to clean up the lock but it completed everything else) [19:44:28] cool, thx [19:44:36] but yes, totally done [19:45:20] (03PS2) 10Phuedx: Adding original language of this work campaign for WikiGrok [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188731 (owner: 10Kaldari) [19:45:58] (03CR) 10Phuedx: Adding original language of this work campaign for WikiGrok (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/188731 (owner: 10Kaldari) [19:49:04] (03CR) 10Gage: [C: 032] logstash: Support MediaWiki logs via Syslog [puppet] - 10https://gerrit.wikimedia.org/r/190231 (https://phabricator.wikimedia.org/T88870) (owner: 10BryanDavis) [19:50:30] jgage: Don't forget about the pre-requisite patch at https://gerrit.wikimedia.org/r/#/c/179759 [19:51:05] ah thanks [19:51:44] (03CR) 10Gage: [C: 032] logstash: parse json encoded hhvm fatal errors [puppet] - 10https://gerrit.wikimedia.org/r/179759 (owner: 10BryanDavis) [19:51:57] (03PS2) 10Dzahn: add project domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191109 [19:52:18] (03CR) 10jenkins-bot: [V: 04-1] add project domains in .wiki TLD [dns] - 10https://gerrit.wikimedia.org/r/191109 (owner: 10Dzahn) [19:52:41] bd808: ok, both are merged now [19:56:27] (03PS2) 10Ori.livneh: Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191107 [19:56:49] (03CR) 10Ori.livneh: [C: 032] Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191107 (owner: 10Ori.livneh) [19:56:53] (03Merged) 10jenkins-bot: Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/191107 (owner: 
Ori.livneh) [19:57:40] !log ori Synchronized wmf-config/StartProfiler.php: I6fbd48e6b: Revert "Revert "Revert "Use ProfilerSectionOnly to handle DB/filebackend entries and the like""" (duration: 00m 05s) [19:57:45] Logged the message, Master [20:03:43] what should I do with unstaged changes to /wmf-config/CommonSettings.php in mediawiki-staging? [20:03:51] + $wgUDPProfilerPort = '3812'; [20:04:01] +$wgStatsFormatString = 'mw.%3$s:%2$s|m' . "\r"; [20:05:11] stash them? [20:07:16] also, anyone care if we have `tig` installed on tin? it's a handy curses-based git UI which I'm addicted to and it would be helpful to me when viewing git status during deployments [20:12:08] according to policy we don't manually install packages on prod servers, so if you want tig it should be installed via puppet imo [20:14:19] that's fine, puppet or otherwise. I don't think I'm authorized to sudo anyway that's why I was asking in here [20:16:23] twentyafterfour: modules/base/manifests/standard-packages.pp [20:18:31] that would install it everywhere, no? [20:18:59] i haven't used tig before but at first glance it looks ok - one binary with normal privs, minimal dependencies [20:19:14] deps are already installed on tin [20:21:09] PROBLEM - MariaDB disk space on db2011 is CRITICAL: DISK CRITICAL - free space: /srv 86988 MB (5% inode=99%): [20:21:42] ok one more time: what should I do with the unstaged changes to commonsettings in mediawiki-staging? I stashed them for now but I don't want my deploy to break things, not sure why the changes weren't committed [20:22:15] I guess I'll deploy without those changes ... [20:23:26] twentyafterfour, undeployed changes? revert unless it looks like shit will break if you do [20:24:32] MaxSem: it looks like shit will break [20:24:40] o_0 [20:24:42] + $wgUDPProfilerPort = '3812'; [20:24:51] I assume that was an intentional change [20:25:02] then yell at whomever did that change [20:25:16] how can I tell? 
it was uncommitted [20:25:22] (CR) Bmansurov: [C: 1] Adding original language of this work campaign for WikiGrok [mediawiki-config] - https://gerrit.wikimedia.org/r/188731 (owner: Kaldari) [20:25:28] mmm, legoktm or bd808|LUNCH ? [20:25:39] both are lunching right now [20:26:25] is it deployed? [20:26:40] if not, I guess reverting it wouldn't hurt [20:27:11] * MaxSem goes to luch too [20:27:58] I don't know how to tell if it's deployed [20:28:08] It's getting reverted, I'm running out of deployment window [20:29:45] Wikimedia-General-or-Unknown, operations: DMARC: Users cannot send emails via a wiki's [[Special:EmailUser]] - https://phabricator.wikimedia.org/T66795#1044685 (csteipp) >>! In T66795#1033712, @Jalexander wrote: > I'd really like to get this moving if possible, I'm starting to get more and more complaints bot... [20:30:08] (PS1) AndyRussG: (Labs only) Ugly URLs to override mobile redirect for CentralNotice [mediawiki-config] - https://gerrit.wikimedia.org/r/191132 [20:32:24] (PS5) AndyRussG: Ugly URLs to override mobile redirect for CentralNotice [mediawiki-config] - https://gerrit.wikimedia.org/r/182078 [20:32:34] (PS3) Dzahn: add project domains in .wiki TLD [dns] - https://gerrit.wikimedia.org/r/191109 (https://phabricator.wikimedia.org/T88873) [20:32:54] (PS2) Dzahn: add language domains in .wiki TLD [dns] - https://gerrit.wikimedia.org/r/191104 (https://phabricator.wikimedia.org/T88873) [20:32:59] (CR) jenkins-bot: [V: -1] add project domains in .wiki TLD [dns] - https://gerrit.wikimedia.org/r/191109 (https://phabricator.wikimedia.org/T88873) (owner: Dzahn) [20:33:15] (CR) MZMcBride: "I remember Brion saying that we should avoid this top-level domain scam. 
A donation of domains is nice, but we're not required to accept i" [dns] - https://gerrit.wikimedia.org/r/191104 (https://phabricator.wikimedia.org/T88873) (owner: Dzahn) [20:34:31] (CR) AndyRussG: "Separated out beta labs change (https://gerrit.wikimedia.org/r/#/c/191132/) and rebased on that. So we'll not deploy this until the other " [mediawiki-config] - https://gerrit.wikimedia.org/r/182078 (owner: AndyRussG) [20:35:13] (CR) MZMcBride: "I remember Brion saying that we should avoid this top-level domain scam. A donation of domains is nice, but we're not required to accept i" [dns] - https://gerrit.wikimedia.org/r/191109 (https://phabricator.wikimedia.org/T88873) (owner: Dzahn) [20:36:05] Ops-Access-Requests, operations: Requesting sudo access to vanadium for mforns - https://phabricator.wikimedia.org/T89471#1044696 (mforns) Hi, sorry for the late response (long weekend), Yes, I meant root on vanadium. And it seems I need it on ALL commands, to be able to deploy eventlogging, look at logs, st... [20:36:22] twentyafterfour: I think "$wgUDPProfilerPort = '3812';" may have been ori [20:36:51] yes. is it in the way? you can revert or deploy, either won't matter. 
[20:37:25] ori: ok thanks, I stashed it [20:37:47] (PS1) 20after4: Group1 wikis to 1.25wmf17 [mediawiki-config] - https://gerrit.wikimedia.org/r/191134 [20:38:34] (CR) 20after4: [C: 2] Group1 wikis to 1.25wmf17 [mediawiki-config] - https://gerrit.wikimedia.org/r/191134 (owner: 20after4) [20:38:39] (Merged) jenkins-bot: Group1 wikis to 1.25wmf17 [mediawiki-config] - https://gerrit.wikimedia.org/r/191134 (owner: 20after4) [20:41:36] !log twentyafterfour rebuilt wikiversions.cdb and synchronized wikiversions files: group1 to $VERSION [20:41:42] Logged the message, Master [20:41:45] ugh [20:42:03] $VERSION=1.25wmf17 [20:44:08] fatalmonitor is showing lots of slow query logs like: [10000ms] at runtime/ext_mysql: slow query: SELECT MASTER_POS_WAIT('db1038-bin.001162', 882938399, 10) [20:44:22] mysql overloaded? [20:44:53] operations: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044720 (Dzahn) I moved the docroot of that around recently to make it consistent with other docroots on zirconium. It might be related, checking. I had copied it with rsync thou... [20:46:18] Triagers, Project-Creators, Phabricator, operations: Broaden the group of users that can create projects in Phabricator - https://phabricator.wikimedia.org/T706#1044723 (Qgil) @JAnstee_WMF done! Please remember to follow https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects [20:46:58] (CR) Dzahn: "woohoo, thanks Alex, that was nice. i'll make them smaller in the future" [puppet] - https://gerrit.wikimedia.org/r/189898 (owner: Dzahn) [20:48:47] (PS2) BBlack: varnish+jessie filesystem stuff [puppet] - https://gerrit.wikimedia.org/r/190610 [20:48:48] (PS1) BBlack: Add normalize_path to mobile vcl_recv [puppet] - https://gerrit.wikimedia.org/r/191136 [20:53:28] MaxSem what does ^ mean? 
I'll be around at 4 [20:54:07] JonKatz, it eans "read above" [20:54:19] MaxSem cool, thanks! [20:56:08] operations: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044737 (Dzahn) yea, no. the changed i was referring to was https://gerrit.wikimedia.org/r/#/c/190036/ but that doesn't seem related. i'm also getting the password question when... [20:56:10] twentyafterfour, looks like a db overload - poke springle [21:00:36] operations: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044755 (Dzahn) It seems like this project has been deleted on Gerrit. I don't know who or why deleted it though. [21:13:43] (PS2) MaxSem: Enable gather extension on en beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/189863 (owner: Robmoen) [21:13:59] (CR) Faidon Liambotis: [C: 1] "Sounds good." [puppet] - https://gerrit.wikimedia.org/r/191136 (owner: BBlack) [21:16:47] (PS3) Faidon Liambotis: varnish: fix GeoIP's get_relevant_ip function [puppet] - https://gerrit.wikimedia.org/r/190964 [21:16:58] bblack: ^ [21:17:18] operations: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044799 (Krenair) It's accessible via https://github.com/wikimedia/wikimedia-TransparencyReport still [21:17:59] (CR) Andrew Bogott: [C: 2] On debian, ensure that idmapd is running on labs instances. [puppet] - https://gerrit.wikimedia.org/r/190948 (owner: Andrew Bogott) [21:18:14] paravoid: you got rid of the len part? 
[21:18:27] yes because it's a noop [21:18:32] it happens later anyway [21:18:51] ok [21:19:09] (PS3) MaxSem: Enable gather extension on en beta labs [mediawiki-config] - https://gerrit.wikimedia.org/r/189863 (owner: Robmoen) [21:19:15] (CR) BBlack: [C: 1] varnish: fix GeoIP's get_relevant_ip function [puppet] - https://gerrit.wikimedia.org/r/190964 (owner: Faidon Liambotis) [21:25:15] operations, Wikimedia-Git-or-Gerrit: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044832 (hashar) The project is no more in Gerrit: `gerrit ls-projects -m ransparency` yields nothing. #Wikimedia-Git-or-Gerrit watchers might know. [21:25:56] operations, Wikimedia-Git-or-Gerrit: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044834 (hashar) + @Chad our Gerrit guru :-] [21:28:09] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [21:42:22] !log Set email for dewiki account "Ar-ras" to the email of the commons account with the same name [21:42:29] Logged the message, Master [21:45:37] Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1044865 (Qgil) I also think that "Software security bug" is clearer than "MediaWiki security bug". (Sorry for not replying before; I updated the description... [21:46:32] is graphite still working? 
[21:46:59] i'm getting 0-numbers for ruwiki for the past two hours even though it seems to work fine [21:47:32] bblack, greg-g ^ [21:48:28] Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1044875 (Qgil) [21:49:10] paravoid, ^ [21:49:28] i meant ^^ [21:50:25] (PS1) Gerardduenas: Set $wgAllowMicrodataAttributes to true at hewikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/191192 (https://phabricator.wikimedia.org/T89655) [21:52:00] ottomata, around? [21:52:53] yuphiya [21:55:16] yurikR: hiya [21:55:49] ottomata, hi, do you know if graphite works ok? [21:55:59] afaik [21:56:01] what'sup? [21:56:07] operations, Wikimedia-Git-or-Gerrit: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044899 (Dzahn) i put a request on the wiki page to have it (re-)created but with a warning message that it existed before and linking back here https:/... [21:56:33] gerrit is really dogging for me on pull [21:59:17] not sure why but gerrit is running super slow today [22:00:04] it took forever to push each time I've tried today [22:00:11] I ran into an internal server error earlier [22:00:17] repositories have gone missing [22:00:32] twentyafterfour, yeah I think something is wrong with gerrit [22:01:25] and gerrit is maintained by ^demon? [22:01:39] or qchris also knows a bit [22:01:55] those two are the duo of light in term of gerrit :) [22:02:00] it's, uh.... um [22:02:10] something along those lines. [22:06:23] (PS1) Rush: phab update security option text [puppet] - https://gerrit.wikimedia.org/r/191195 [22:09:16] twentyafterfour: Browsing through the logs, I could not find anything special. Looks like it's "just" the usual gerrit slowness around this time of the day. [22:09:56] qchris: do you know if it logs project creations and deletions? 
[22:11:07] mutante: it doesn't (or rather: /me is not aware of logs of those) [22:11:49] qchris: ok, thanks, i thought so too [22:12:25] qchris: i made a request to have a project re-created that disappeared.. on the wiki page [22:13:01] * qchris looks. [22:13:01] but i dont know if it's bug or admin powers that removed it in the first place [22:13:26] That removal was requested. [22:13:34] qchris: it's also T89640 [22:13:49] may i ask why? it is still deployed on our servers [22:13:57] and also copied to github [22:14:11] Yup it is. [22:14:18] Legal knows about it. [22:14:38] It's especially weird to have a "TransparencyReport" half hidden. [22:15:57] (CR) Rush: [C: 2] phab update security option text [puppet] - https://gerrit.wikimedia.org/r/191195 (owner: Rush) [22:16:13] Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1044931 (chasemp) [22:17:00] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [22:19:50] Phabricator, operations: The options of the Security dropdown in Phabricator need to be clear and documented - https://phabricator.wikimedia.org/T76564#1044934 (chasemp) >>! In T76564#1044865, @Qgil wrote: > I also think that "Software security bug" is clearer than "MediaWiki security bug". 
> {F687} [22:30:43] (PS1) QChris: Disable cloning of TransparencyReport until the repo is public again [puppet] - https://gerrit.wikimedia.org/r/191198 (https://phabricator.wikimedia.org/T89640) [22:30:55] (Abandoned) MaxSem: Kill old (skins|live)-1.5 stuff [puppet] - https://gerrit.wikimedia.org/r/162768 (owner: MaxSem) [22:31:41] (CR) Dzahn: [C: 2] Disable cloning of TransparencyReport until the repo is public again [puppet] - https://gerrit.wikimedia.org/r/191198 (https://phabricator.wikimedia.org/T89640) (owner: QChris) [22:33:09] RECOVERY - puppet last run on zirconium is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [22:35:59] operations, Wikimedia-Git-or-Gerrit: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044975 (Krenair) a:Dzahn>QChris That says: > The repo is currently not publicly clonable by private request. > Hence, turning cloning off for now,... [22:37:08] !log deploy fixes for T85850, T88310, T85855 [22:37:16] Logged the message, Master [22:38:20] operations, Wikimedia-Git-or-Gerrit: Puppet failing on zirconium due to inability to git pull Transparency Report - https://phabricator.wikimedia.org/T89640#1044980 (QChris) The repo was requested to be hidden (for the time being) in private mails. @Prtksxna: I let you speak to why. Cloning is turned off fo... [22:43:05] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1044995 (Krenair) [22:47:33] operations, Wikimedia-Git-or-Gerrit: TransparencyReport repository master in Gerrit silently made private - https://phabricator.wikimedia.org/T89640#1045010 (Dzahn) The puppet run on zirconium is fixed, because cloning is temp. disabled now. 
[22:52:15] ops-codfw, operations: take a look at fdb2001 (in fundraising rack) and see whether it actually has a bad hdd - https://phabricator.wikimedia.org/T89407#1045029 (Papaul) Replacement drive will be on site tomorrow. [23:12:11] gwicke: Your labs instance ‘api’ was built with the ‘(testing)’ debian image, and is hence forever cursed. May I delete it? [23:12:37] andrewbogott: yes, haven't really set that up yet anyway [23:12:45] is there a prod jessie image now? [23:13:19] gwicke: yep, I’m committed to long-term support for the new image. [23:13:24] So, go ahead and rebuild with that. [23:13:44] andrewbogott: thx, will do [23:13:58] (and yay again for jessie ;) [23:14:30] twentyafterfour: FYI, there are now "supported" jessie images for labs, we should probably use those for staging cluster ^^ [23:18:47] gwicke: are mem01 and mem02 yours as well? [23:18:51] Looks like _joe_ made them [23:19:10] don't think so [23:19:23] ok, will ask him instead [23:33:45] greg-g: sounds good [23:37:06] (PS2) Dzahn: create shell account for Tyler Cipriani [puppet] - https://gerrit.wikimedia.org/r/190408 (https://phabricator.wikimedia.org/T89378) [23:37:51] (CR) jenkins-bot: [V: -1] create shell account for Tyler Cipriani [puppet] - https://gerrit.wikimedia.org/r/190408 (https://phabricator.wikimedia.org/T89378) (owner: Dzahn) [23:38:39] (PS3) Dzahn: create shell account for Tyler Cipriani [puppet] - https://gerrit.wikimedia.org/r/190408 (https://phabricator.wikimedia.org/T89378) [23:42:49] (CR) RobH: [C: 2] create shell account for Tyler Cipriani [puppet] - https://gerrit.wikimedia.org/r/190408 (https://phabricator.wikimedia.org/T89378) (owner: Dzahn) [23:44:50] Ops-Access-Requests, operations: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1045158 (RobH) Open>Resolved a:RobH So submitted on wednesday, thursday (1), friday (2), monday (holiday), tuesday (3). 
I'm merging your request (since @dz... [23:45:02] thcipriani: ^ [23:45:33] ugh, unrelated puppet issue on gallium [23:45:47] since we are on PST i count the end of the PST day as the third day (i didnt do this when i was on east coast) [23:47:35] Ops-Access-Requests, operations: Give Tyler Cipriani shell access (with access to CI systems as well) - https://phabricator.wikimedia.org/T89378#1045167 (Dzahn) @thcipriani Notice: /Stage[main]/Admin/Admin::Hashuser[thcipriani]/Admin::User[thcipriani]/File[/home/thcipriani/.ssh/authorized_keys]/ensure: crea... [23:48:19] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Puppet has 2 failures [23:50:38] operations, Phabricator, Wikimedia-Bugzilla: Create a static HTML version of Bugzilla - https://phabricator.wikimedia.org/T85140#1045169 (JohnLewis) >>! In T85140#1038257, @jayvdb wrote: > * attachments yet to be done, and "Show Obsolete" JavaScript still broken (reported earlier: T85140#987797) After lookin... [23:51:42] !log apt-get upgrade on gallium [23:51:48] Logged the message, Master [23:52:55] (CR) Ejegg: [C: 2] (Labs only) Ugly URLs to override mobile redirect for CentralNotice [mediawiki-config] - https://gerrit.wikimedia.org/r/191132 (owner: AndyRussG) [23:53:39] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:59:58] I can do the swat today as it's mostly my stuff anyway:P