[00:03:07] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158947 (Dzahn) confirmed send_nsca is installed on boron and can send packets over to neon: @boron:/etc/cron.d# /usr/sbin/send_nsca -H neon.wikimedia.org <--> @neon:~# tcpdump port 5667 | grep boron also:...
[00:07:29] bd808: meow?
[00:08:54] * AaronS had is box hangung in auto-shutdown since "vagrant up" was still running
[00:10:07] hey. Krenair was dangling some memcached failures in front of me that I thought were a new/increasing problem but that may have been a false alarm.
[00:10:29] sorry
[00:10:33] it's not coming up very often
[00:10:34] * bd808 was multitasking and trying to pass the problem on nerd snipe style
[00:14:30] gtg
[00:14:49] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158953 (Dzahn) confirmed it's trusty's version of send_nsca. i could use the one from precise and it worked: echo -e "boron\tcheck_disk\t0\ttest" | /tmp/send_nsca -H neon.wikimedia.org -c /etc/send_nsca.cfg [...
[00:18:27] operations: boron passive checks aren't being collected - https://phabricator.wikimedia.org/T89983#1158954 (Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?search_string=boron works again. re-enabled notifications. but it's a hack for now. so don't close yet.
[00:20:34] !log Attached local accounts to "Advance", per request: enwiki, commonswiki, metawiki, nlwiktionary and nlwikinews
[00:20:41] Logged the message, Master
[00:22:29] operations: reinstall OCG servers - https://phabricator.wikimedia.org/T84723#1158963 (Dzahn) a:Dzahn>None could somebody help me here and take a look?
[00:22:49] operations, Patch-For-Review: remove ganglia(old), replace with ganglia_new - https://phabricator.wikimedia.org/T93776#1158971 (Dzahn) p:Triage>Normal
[00:46:22] (PS1) Gage: ipsec-global: add /bin to path [puppet] - https://gerrit.wikimedia.org/r/200276
[00:50:43] tricksy conditional :-
[00:54:13] (PS5) BryanDavis: proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[00:55:56] (CR) BryanDavis: [C: 1] "More refactoring.Tested on me dev server and it still works without enabling the filter. Not sure where we can test the filter outside of " [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[02:11:35] (CR) 20after4: [C: 1] proxies: allow filtering by datacenter [tools/scap] - https://gerrit.wikimedia.org/r/200130 (owner: Giuseppe Lavagetto)
[02:23:02] !log l10nupdate Synchronized php-1.25wmf22/cache/l10n: (no message) (duration: 07m 03s)
[02:23:19] Logged the message, Master
[02:27:36] !log LocalisationUpdate completed (1.25wmf22) at 2015-03-28 02:26:33+00:00
[02:27:42] Logged the message, Master
[02:48:06] !log l10nupdate Synchronized php-1.25wmf23/cache/l10n: (no message) (duration: 06m 50s)
[02:48:16] Logged the message, Master
[02:52:48] !log LocalisationUpdate completed (1.25wmf23) at 2015-03-28 02:51:45+00:00
[02:52:54] Logged the message, Master
[03:00:40] operations, HTTPS, HTTPS-by-default: Force all Wikimedia cluster traffic to be over SSL for all users (logged-in and anon) - https://phabricator.wikimedia.org/T49832#1159087 (Tony_Tan_98) That's great to hear! Thanks.
[03:15:57] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[03:27:38] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[03:45:27] (CR) 20after4: [C: 1] beta: Fix ::beta::autoupdater to work again [puppet] - https://gerrit.wikimedia.org/r/200248 (https://phabricator.wikimedia.org/T94261) (owner: BryanDavis)
[04:19:07] (PS1) BryanDavis: monolog: MWLoggerMonologSamplingHandler -> Monolog\Handler\SamplingHandler [mediawiki-config] - https://gerrit.wikimedia.org/r/200286
[04:32:32] (Abandoned) 20after4: fix puppet error due to missing parent directory [puppet] - https://gerrit.wikimedia.org/r/198461 (owner: 20after4)
[04:34:50] (CR) 20after4: "ok the reason I asked is that I'd +2 this but I don't want to break things when it gets merged." [mediawiki-config] - https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: Reedy)
[04:35:27] (CR) 20after4: "actually this cannot merge according to gerrit" [mediawiki-config] - https://gerrit.wikimedia.org/r/188388 (https://phabricator.wikimedia.org/T75905) (owner: Reedy)
[04:36:44] (CR) 20after4: "@nikerabbit: yeah I suppose it could" [mediawiki-config] - https://gerrit.wikimedia.org/r/169716 (https://bugzilla.wikimedia.org/67154) (owner: Reedy)
[04:39:13] (Abandoned) 20after4: Observe the remote IP reported by X_FORWARDED_FOR header from proxy server [puppet] - https://gerrit.wikimedia.org/r/184837 (https://phabricator.wikimedia.org/T840) (owner: 20after4)
[04:57:16] (PS3) BryanDavis: logstash: Ship logs via syslog udp datagrams [mediawiki-config] - https://gerrit.wikimedia.org/r/191259 (https://phabricator.wikimedia.org/T88732)
[04:57:31] (CR) BryanDavis: "Rebased on I780b4fd02cb16b111dda33fe37c773f62c7c930f" [mediawiki-config] - https://gerrit.wikimedia.org/r/191259 (https://phabricator.wikimedia.org/T88732)
(owner: BryanDavis)
[05:01:57] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[05:09:58] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: puppet fail
[05:11:47] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[05:15:21] (PS4) BryanDavis: logstash: Ship logs via syslog udp datagrams [mediawiki-config] - https://gerrit.wikimedia.org/r/191259 (https://phabricator.wikimedia.org/T88732)
[05:23:51] (CR) BryanDavis: "I have made several changes based on conversations I had with Ori:" [mediawiki-config] - https://gerrit.wikimedia.org/r/191259 (https://phabricator.wikimedia.org/T88732) (owner: BryanDavis)
[05:28:07] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[05:32:05] operations, MediaWiki-extensions-Sentry, Multimedia, hardware-requests, Multimedia-Sprint-2015-03-25: Procure hardware for Sentry - placeholder (not a live request) - https://phabricator.wikimedia.org/T93138#1159149 (RobH) a:Tgr>RobH
[06:29:46] PROBLEM - puppet last run on cp1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:47] PROBLEM - puppet last run on db1051 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:47] PROBLEM - puppet last run on amssq34 is CRITICAL: CRITICAL: puppet fail
[06:30:57] PROBLEM - puppet last run on virt1006 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:48] PROBLEM - puppet last run on mw2104 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:28] PROBLEM - puppet last run on mw2093 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:28] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:56] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:09] PROBLEM - puppet last run on mw2017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:36:26] PROBLEM - puppet last run on
mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:43:46] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 203, down: 1, dormant: 0, excluded: 0, unused: 0; xe-4/2/0: down - Core: cr2-codfw:xe-5/2/1 (Telia, IC-307236) (#3658) [10Gbps wave]
[06:45:27] RECOVERY - puppet last run on mw2104 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[06:45:47] RECOVERY - puppet last run on virt1006 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[06:45:57] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:45:57] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[06:46:17] RECOVERY - puppet last run on cp1056 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:17] RECOVERY - puppet last run on db1051 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:47:07] RECOVERY - puppet last run on mw2093 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:07] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:47:47] RECOVERY - puppet last run on mw2017 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[06:47:57] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:48:57] RECOVERY - puppet last run on amssq34 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:55:47] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[07:28:27] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 205, down: 0, dormant: 0, excluded: 0, unused: 0
[07:44:38] PROBLEM - very high load
average likely xfs on ms-be1009 is CRITICAL: CRITICAL - load average: 275.08, 172.97, 82.44
[07:53:02] !log LocalisationUpdate ResourceLoader cache refresh completed at Sat Mar 28 07:51:56 UTC 2015 (duration 51m 55s)
[07:53:12] Logged the message, Master
[07:56:58] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0]
[08:08:37] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[08:41:14] (CR) Glaisher: Restore unregistered editing on mobile sites (staggered) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/198691 (https://phabricator.wikimedia.org/T93210) (owner: Nemo bis)
[08:45:11] (CR) Glaisher: Restore unregistered editing on mobile sites (staggered) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/198691 (https://phabricator.wikimedia.org/T93210) (owner: Nemo bis)
[09:06:37] PROBLEM - swift-container-auditor on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:47] PROBLEM - swift-object-replicator on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:47] PROBLEM - Disk space on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:48] PROBLEM - dhclient process on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:57] PROBLEM - swift-object-server on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:57] PROBLEM - swift-account-reaper on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:06:57] PROBLEM - salt-minion processes on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:07] PROBLEM - swift-container-updater on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:17] PROBLEM - swift-account-auditor on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:18] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:27] PROBLEM - DPKG on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:36] PROBLEM - swift-object-updater on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:37] PROBLEM - swift-account-replicator on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:47] PROBLEM - swift-object-auditor on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:57] PROBLEM - swift-account-server on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:07:57] PROBLEM - configured eth on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:08:07] PROBLEM - swift-container-replicator on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:08:08] PROBLEM - swift-container-server on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:08:16] PROBLEM - RAID on ms-be1009 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[09:25:32] <_joe_> mh ms-be1009 again
[10:15:21] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1159219 (Aklapper) For future reference: https://www.mediawiki.org/wiki/How_to_report_a_bug
[10:17:37] PROBLEM - puppet last run on virt1011 is CRITICAL: CRITICAL: Puppet has 3 failures
[10:34:16] RECOVERY - puppet last run on virt1011 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[10:45:32] !log powercycle ms-be1009
[10:45:37] Logged the message, Master
[10:48:07] PROBLEM - Host ms-be1009 is DOWN: PING CRITICAL - Packet loss = 100%
[10:48:36] RECOVERY - DPKG on ms-be1009 is OK: All packages OK
[10:48:37] RECOVERY - swift-object-auditor on ms-be1009 is OK: PROCS OK: 3 processes with regex args ^/usr/bin/python /usr/bin/swift-object-auditor
[10:48:37] RECOVERY - swift-object-updater on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-updater
[10:48:37] RECOVERY - swift-account-replicator on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-replicator
[10:48:47] RECOVERY - Host ms-be1009 is UP: PING OK - Packet loss = 0%, RTA = 2.66 ms
[10:49:06] RECOVERY - configured eth on ms-be1009 is OK: NRPE: Unable to read output
[10:49:06] RECOVERY - swift-account-server on ms-be1009 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server
[10:49:17] RECOVERY - swift-container-replicator on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator
[10:49:27] RECOVERY - swift-container-server on ms-be1009 is OK: PROCS OK: 25 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server
[10:49:27] RECOVERY - RAID on ms-be1009 is OK: OK: optimal, 14 logical, 14 physical
[10:49:27] RECOVERY - Disk space on ms-be1009 is OK: DISK OK
[10:49:27] RECOVERY - swift-object-replicator on ms-be1009 is OK: PROCS OK: 1 process with
regex args ^/usr/bin/python /usr/bin/swift-object-replicator
[10:49:27] RECOVERY - swift-container-auditor on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-auditor
[10:49:37] RECOVERY - swift-object-server on ms-be1009 is OK: PROCS OK: 101 processes with regex args ^/usr/bin/python /usr/bin/swift-object-server
[10:49:37] RECOVERY - dhclient process on ms-be1009 is OK: PROCS OK: 0 processes with command name dhclient
[10:49:47] RECOVERY - swift-container-updater on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-updater
[10:49:47] RECOVERY - swift-account-reaper on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-reaper
[10:49:47] RECOVERY - salt-minion processes on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[10:49:47] RECOVERY - very high load average likely xfs on ms-be1009 is OK: OK - load average: 18.98, 7.88, 2.90
[10:50:06] RECOVERY - swift-account-auditor on ms-be1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-account-auditor
[10:58:19] Blocked-on-Operations, operations, Phabricator, Patch-For-Review: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#1159226 (mmodell) stalled>Resolved
[10:58:47] operations, Phabricator: have any task put into ops-access-requests automatically generate an ops-access-review task - https://phabricator.wikimedia.org/T87467#991959 (mmodell)
[14:48:28] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:56:47] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.060 second response time
[14:57:41] !log graceful’d apache2 on virt1000
[14:58:05] morebots, you there?
[14:58:11] hm
[14:58:54] I am a logbot running on tools-exec-10.
[14:58:54] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[14:58:54] To log a message, type !log .
[15:01:06] !log graceful’d apache2 on virt1000
[15:01:45] grrrrr
[15:02:23] morebots, what gives?
[15:02:37] andrewbogott: restart morebots perhaps? (despite its response anyway)
[15:02:44] I just did
[15:02:56] I am a logbot running on tools-exec-13.
[15:02:56] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[15:02:56] To log a message, type !log .
[15:02:58] Oh didn't see that
[15:03:27] PROBLEM - puppet last run on lvs4001 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:03:46] PROBLEM - puppetmaster https on virt1000 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:05:51] andrewbogott: fyi, logins to Wikitech are timing out for me which may be why it fails
[15:06:04] yep
[15:06:31] JohnFLewis: is that better?
[15:06:41] !log graceful’d apache2 on virt1000
[15:06:46] !log and restarted keystone on virt1000
[15:06:48] Logged the message, Master
[15:06:52] Yeah
[15:06:53] Logged the message, Master
[15:06:57] RECOVERY - puppetmaster https on virt1000 is OK: HTTP OK: Status line output matched 400 - 335 bytes in 0.066 second response time
[15:07:05] so virt1000 oom’d over night :(
[15:08:10] Seems like a regular thing with virt1000 :/ lets hope the new virt orders process quickly anyway
[15:09:45] new virt hosts won’t help with virt1000. But https://phabricator.wikimedia.org/T90627 might
[15:10:30] I thought the new virts was going to include a virt1000 replacement as well though?
[15:11:09] operations, Labs: OOM on virt1000 - https://phabricator.wikimedia.org/T88256#1159347 (Andrew) This happened again last night. Something must be running amok and gobbling memory.
[15:12:04] JohnFLewis: not necessarily. It has plenty of memory already, something is running wild and eating it all
[15:12:20] Right.
[15:12:27] maybe keystone has a leak.
[15:20:07] RECOVERY - puppet last run on lvs4001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:07:38] PROBLEM - puppet last run on amssq32 is CRITICAL: CRITICAL: puppet fail
[17:24:27] RECOVERY - puppet last run on amssq32 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[20:05:22] operations, Labs, Monitoring, Patch-For-Review: Setup alarms for labstore* to check for network saturation - https://phabricator.wikimedia.org/T92629#1159525 (yuvipanda)
[20:18:46] mutante, in the admin data, is 'real name' a valid key?
[20:18:53] isn't it supposed to be just 'realname'?
[20:18:56] I noticed one user has it
[20:19:59] Coren: did a bunch of CR for your patches :)
[20:20:49] Krenair: comment => $uinfo['realname'],
[20:21:01] That's the only reference to any key with such a name
[20:25:08] am talking about https://git.wikimedia.org/blob/operations%2Fpuppet.git/production/modules%2Fadmin%2Fdata%2Fdata.yaml hoo
[20:25:45] or you mean, that's the only place it's used?
[20:27:54] yes
[20:40:59] matanya: around?
[21:17:12] Krenair: https://meta.wikimedia.org/wiki/System_administrators yay :) Thanks
[21:25:39] hoo, had to write a script to generate that from the yaml
[21:26:12] and another script to get the parsoid output of the existing page and take note of the existing data
[21:26:31] but it's now less ridiculous
[21:26:35] and includes all ops+deployment
[22:27:26] PROBLEM - puppet last run on mw2163 is CRITICAL: CRITICAL: puppet fail
[22:45:28] RECOVERY - puppet last run on mw2163 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[22:53:03] operations, Continuous-Integration, Release-Engineering, Graphite, Upstream: Let us customize Zuul metrics reported to statsd - https://phabricator.wikimedia.org/T1369#1159684 (hashar) a:hashar>None
[22:59:53] Ops-Access-Requests, operations: Checkuser and Sysop on Wikitech for Jalexander - https://phabricator.wikimedia.org/T94319#1159697 (Jalexander) NEW a:hoo
[23:02:44] Ops-Access-Requests, operations: Checkuser and Sysop on Wikitech for Jalexander - https://phabricator.wikimedia.org/T94319#1159706 (Jalexander) (For the record I know that closing accounts on wikitech is much more complicated given the shell connection)
[23:04:14] !log Gave sysop and checkuser to Jalexander@labswiki via shell from silver after doing it via meta failed. ([[phab:T94319|T94319]])
[23:04:23] Logged the message, Master
[23:05:21] Ops-Access-Requests, operations: Checkuser and Sysop on Wikitech for Jalexander - https://phabricator.wikimedia.org/T94319#1159707 (hoo) Open>Resolved Done.
For reference: It failed on meta: ``` (Cannot access the database: Can't connect to MySQL server on '208.80.154.136' (4) (208.80.154.136)) ```
[23:05:50] operations, Wikimedia-Site-requests: Checkuser and Sysop on Wikitech for Jalexander - https://phabricator.wikimedia.org/T94319#1159709 (hoo)
[23:11:19] an ex-coworker showed me this lil "unicode/wikipedia mashup" demo he made -- displays a grid of randomly selected unicode glyphs, and displays wikipedia article content when you click on them: https://tranquil-forest-1441.herokuapp.com , https://github.com/siznax/charpoy
[23:12:26] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1159711 (SlayerFanatic1999) Close this report.
[23:30:11] operations: upload.wikimedia.org not loading 3/27/2015 - https://phabricator.wikimedia.org/T94269#1159720 (Krenair) Yeah, I did already.
[23:35:54] I'm not sure what's normally supposed to be there, but https://git.wikimedia.org/ is currently "Internal error"
[23:40:45] hm, thanks. i tried restarting apache on antimony, but that hasn't solved the problem.
[23:41:04] jgage: Restart gitblit itself
[23:42:09] hmm ok
[23:42:46] well now we get a different error message :)
[23:43:01] yes, that you can't even get to gitblit :p
[23:43:04] that's gitblit restarting, I think
[23:45:57] here we go :)
[23:46:23] nice
[23:46:24] thanks folks
[23:46:35] it sure took its time starting up, but.. java
[23:47:10] i'll open a ticket to create a monitor to catch this
[23:47:56] +1
[23:54:23] operations, Monitoring: Monitor https://git.wikimedia.org/ - https://phabricator.wikimedia.org/T94320#1159721 (Gage) NEW
[23:56:57] hm i wish i'd checked the http response on that "internal error" message
[23:57:11] guessing it was 200 because we already have an http monitor for that url
[23:57:29] operations, Monitoring: Improve monitoring of https://git.wikimedia.org/ - https://phabricator.wikimedia.org/T94320#1159731 (Gage)
[23:58:31] I thought we had auto-restarting for gitblit?
[23:59:42] https://gerrit.wikimedia.org/r/#/c/188480/ abandoned
[23:59:43] the upstart config says respawn, but that would only be triggered if it exited. this time it seemed to have hung.
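
Annotation on the gitblit exchange above: the guess in the log is that the existing HTTP monitor stayed green because the "Internal error" page was served with status 200, and that upstart's respawn never fired because the process hung instead of exiting. A minimal sketch of a check that looks at the body as well as the status might look like the following. It is not the monitor that was actually written for T94320; the function names, the marker string and the 10 second timeout are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Assumed marker: the log only says the page showed "Internal error".
ERROR_MARKERS = ("Internal error",)


def classify_response(status, body, markers=ERROR_MARKERS):
    """Return (state, reason) for an already-fetched page.

    A 200 alone is not treated as healthy: the body must also be free
    of known error text.
    """
    if status != 200:
        return "CRITICAL", "unexpected HTTP status %d" % status
    for marker in markers:
        if marker in body:
            return "CRITICAL", "status 200 but body contains %r" % marker
    return "OK", "status 200 and no error marker in body"


def check_url(url, timeout=10):
    """Fetch url and classify it; a hung backend surfaces as a timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify_response(resp.status, body)
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError rather than returning normally.
        return "CRITICAL", "unexpected HTTP status %d" % err.code
    except (urllib.error.URLError, OSError) as err:
        # Connection refused, DNS failure, or socket timeout.
        return "CRITICAL", "request failed: %s" % err
```

Calling `check_url("https://git.wikimedia.org/")` would then report CRITICAL both for an error page served with 200 and for a request that times out, which are the two cases a status-only check and a respawn-on-exit rule would miss.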