Common problems and their resolutions

andrzej maczuga edited this page Apr 14, 2016 · 19 revisions

This document is a work in progress! It should get filled in fairly fast though.

Platform deployment problems

Spring application deployment times out

Problem

A Spring application times out during startup with no indication of a problem. The last log message displayed is similar to the following:

2016-01-20T11:33:53.55+0000 [App/0] OUT 2016-01-20 11:33:53.554 INFO 29 --- [ost-startStop-1] o.s.b.c.e.ServletRegistrationBean : Mapping servlet: 'dispatcherServlet' to [/]

This happens more frequently on OpenStack instances with a low number of compute nodes.

Resolution

Adding entropy to the hypervisor OS (i.e. the one running the OpenStack compute processes) solves the problem. While adding a hardware random number generator is preferable, the following solution also works:

sudo aptitude install rng-tools -y
sudo rngd -r /dev/urandom -o /dev/random

Please note that the above lowers the cryptographic strength of the keys generated by the application, so it is not recommended on production systems.
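
To check whether the hypervisor is actually short on entropy before applying the workaround, the kernel's counter can be read from /proc/sys/kernel/random/entropy_avail. The sketch below is illustrative only: the function name entropy_status and the default threshold of 1000 are assumptions, not values taken from the platform.

```shell
# Classify an entropy reading as "low" or "ok".
# $1: value read from entropy_avail
# $2: optional threshold (assumed default: 1000, pick your own)
entropy_status() {
  if [ "$1" -lt "${2:-1000}" ]; then
    echo "low"
  else
    echo "ok"
  fi
}

# Report the current state when the counter is readable (Linux only).
if [ -r /proc/sys/kernel/random/entropy_avail ]; then
  entropy_status "$(cat /proc/sys/kernel/random/entropy_avail)"
fi
```

Running it before and after starting rngd should show whether the workaround had any effect.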

Platform operational problems

Error listing app instance numbers

Problem

When listing apps with cf a, app instance numbers show up as ?/1, for example:

user-management           started           ?/1         512M     1G     user-management.example.com
cdh-broker                started           ?/1         128M     1G     cdh-broker.example.com
hdfs-broker               started           ?/1         1G       1G     hdfs-broker.example.com
ipython-broker            started           ?/1         256M     1G     ipython-broker.example.com
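
The symptom can be detected from a script by filtering the cf a output on the instances column. This is a minimal sketch that assumes the column layout shown above (name, state, instances, ...); the function name apps_with_unknown_instances is hypothetical.

```shell
# Print the names of apps whose running-instance count is unknown ("?/N").
# Reads `cf a` output on stdin; the third column holds the instance numbers.
apps_with_unknown_instances() {
  awk '$3 ~ /^\?\// { print $1 }'
}

# Usage:
#   cf a | apps_with_unknown_instances
```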

Resolution

The resolution can be found here, under the Recovering from HM9000 Failure section. You can additionally stop all hm9000 processes beforehand and start them in the following order: etcd1 -> hm1 -> etcd2 -> hm2.

Error parsing JSON

Problem

Listing services with cf service-access fails when using the cf CLI client:

Error parsing JSON: invalid syntax

Resolution

This is a bug in cf-cli 6.12.3. Downgrade to 6.12. Users of trustedanalytics/cloudfoundry-mkappstack need to set cfbinver in appstack.mk to 6.12.0.
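
A quick way to tell whether a workstation has the affected client is to inspect the output of cf --version. The sketch below assumes that output looks like "cf version 6.12.3-<build info>"; the function name is_affected_cf_version is hypothetical.

```shell
# Succeed when the given version string names cf-cli 6.12.3.
# $1: output of `cf --version` (assumed form: "cf version X.Y.Z-<build>")
is_affected_cf_version() {
  case "$1" in
    *" 6.12.3"|*" 6.12.3-"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the locally installed client, if there is one.
if command -v cf >/dev/null 2>&1; then
  if is_affected_cf_version "$(cf --version)"; then
    echo "cf-cli 6.12.3 detected, downgrade recommended"
  fi
fi
```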

Cannot access manually spawned CentOS instance on OpenStack

Problem

When trying to SSH into a CentOS instance that was manually created with the TAP-provided image, you get:

Permission denied (publickey).

Resolution

You HAVE TO select the Configuration Drive option in the Advanced tab when creating an instance. The cloud-init scripts use the configuration drive to get instance data such as authorized keys. More info is available here: http://docs.openstack.org/user-guide/cli_config_drive.html
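
When the instance is created from the command line instead of the dashboard, the same option is passed to nova boot as --config-drive true. The sketch below only prints the command so it can be reviewed first; the function name and all argument values (image, flavor, key pair, instance name) are placeholders, not names from a TAP deployment.

```shell
# Print a nova boot command that enables the configuration drive.
# $1: image, $2: flavor, $3: key pair name, $4: instance name
boot_with_config_drive() {
  echo nova boot --config-drive true --image "$1" --flavor "$2" --key-name "$3" "$4"
}

# Placeholder values; remove the echo in the function to actually run the command.
boot_with_config_drive tap-centos-image m1.medium my-keypair my-instance
```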

Cannot create GearPump service instance

Problem

No GearPump service instances can be created.

Explanation

gearpump-broker uses yarn-broker to obtain Hadoop configuration files. A misconfigured yarn-broker prevents gearpump-broker from working. If you see:

java.lang.ClassNotFoundException: Class org.trustedanalytics.zookeeper.mapping.ZookeeperMapping not found 

in the gearpump-broker logs, you have hit this problem.

Resolution

A newer version of yarn-broker fixes the issue. If you cannot use that version, apply the following workaround.

gearpump-broker fetches the configuration files, unpacks them and stores them in a dedicated folder. If a config file is already present in that folder, it is not overwritten. The workaround is to obtain core-site.xml (it contains the problematic configuration section), correct it, put it into the gearpump-broker jar and push the app again.

After gearpump-broker has tried to create a GearPump instance, the configs are in /app/yarn-conf/. You can download core-site.xml locally like this:

cf files gearpump-broker /app/yarn-conf/core-site.xml > core-site.xml

Find and remove the following section from the file:

<property>
   <name>hadoop.security.group.mapping</name>
   <value>org.trustedanalytics.zookeeper.mapping.ZookeeperMapping</value>
</property>
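
The removal can also be scripted. This is a minimal sketch, assuming the file is laid out as above with the opening and closing property tags on separate lines from other properties; the function name strip_group_mapping is hypothetical, and the result should be reviewed before it is packaged.

```shell
# Drop the <property> block that defines hadoop.security.group.mapping.
# Reads core-site.xml on stdin and writes the filtered file to stdout.
strip_group_mapping() {
  awk '
    # Start buffering when a property block opens.
    !inblock && /<property>/ { inblock = 1; buf = ""; drop = 0 }
    inblock {
      buf = (buf == "" ? $0 : buf "\n" $0)
      if ($0 ~ /<name>hadoop\.security\.group\.mapping<\/name>/) drop = 1
      # At the end of the block, emit it unless it was marked for removal.
      if ($0 ~ /<\/property>/) { if (!drop) print buf; inblock = 0 }
      next
    }
    { print }
  '
}

# Usage:
#   strip_group_mapping < core-site.xml > core-site.fixed.xml
```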

Now put the file into the /yarn-conf folder of the gearpump-broker jar. Push the application (don't overwrite environment variables) and the problem should be fixed.
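
One possible way to place the file inside the jar is to stage it under a yarn-conf directory and add it with the JDK's jar tool. The sketch below is an assumption about the workflow, not the project's documented procedure; the function name stage_core_site and the jar file name gearpump-broker.jar are placeholders.

```shell
# Copy the corrected file into <staging dir>/yarn-conf/core-site.xml.
# $1: corrected core-site.xml, $2: staging directory
stage_core_site() {
  mkdir -p "$2/yarn-conf" && cp "$1" "$2/yarn-conf/core-site.xml"
}

# Example usage (jar name is a placeholder):
#   stage_core_site core-site.xml staging
#   (cd staging && jar uf ../gearpump-broker.jar yarn-conf/core-site.xml)
```

Running jar tf on the jar afterwards should list yarn-conf/core-site.xml if the update succeeded.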

End-user problems

Error dialing loggregator

Problem

A user can't tail the logs of an application using cf logs app and gets the following error:

Error dialing loggregator server: Get https://loggregator.X.X.X.X.xip.io:443/recent?app=APPID: x509: certificate is valid for , not loggregator.X.X.X.X.xip.io.

Resolution

The api should be targeted with the --skip-ssl-validation flag, for example cf api api.X.X.X.X.xip.io --skip-ssl-validation.

Explanation

The root cause of this problem is an invalid or self-signed certificate for the domain the environment uses. This is common for testing instances using the xip.io domain.
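
To confirm which certificate the endpoint actually serves, it can be inspected with openssl. This is a minimal sketch; the function name show_cert_summary is hypothetical and the host in the usage line is the placeholder from the error message above.

```shell
# Print subject, issuer and validity dates of the certificate served by a host.
# $1: host name, $2: optional port (default: 443)
show_cert_summary() {
  openssl s_client -connect "$1:${2:-443}" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Usage (replace the host with your environment's domain):
#   show_cert_summary loggregator.X.X.X.X.xip.io
```

A subject that does not cover the loggregator host name, or an issuer equal to the subject, would be consistent with the explanation above.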

Can't access dashboard for applications started via Marketplace

Problem

A user can't access the dashboard for applications started via Marketplace. HTTP code 500 is returned to the browser.

Resolution

To fix this issue:

  1. Log in to the cdh-launcher instance

  2. From cdh-launcher, log in to nginx-instance

  3. On nginx-instance, edit the nginx.conf file via:

sudo vim  /etc/nginx/nginx.conf
  4. Change:
    proxy_buffering off;
    proxy_connect_timeout   180;
    proxy_send_timeout      180;
    proxy_read_timeout      900;

to:

    proxy_buffering off;
    proxy_connect_timeout   180;
    proxy_buffer_size        8k;
    proxy_send_timeout      180;
    proxy_read_timeout      900;
  5. Restart the nginx service via:
sudo service nginx restart
  6. Log out

  7. Log in to the bastion host and go to the workspace/deployments/docker-services-boshworkspace/ directory:

  cd workspace/deployments/docker-services-boshworkspace/

If you are using AWS, run:

  bosh deployment docker-aws-vpc

Otherwise, if you are using OpenStack, run:

  bosh deployment docker-openstack
  8. Log in to the docker bosh VM via (same command for AWS and OpenStack):
  bosh ssh

When prompted with Enter password (use it to sudo on remote host): invent a password and remember it. You will need it later.

  9. In the docker bosh VM, edit the /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/config/initializers/omniauth.rb file:
   sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/config/initializers/omniauth.rb

When sudo asks for a password, provide the password from step 8.

and change

  DASHBOARD_CLIENT_PROC = lambda do |env|
    request = Rack::Request.new(env)
    service = Catalog.find_service_by_guid(request.session[:service_guid])
    env['omniauth.strategy'].options[:client_id] = service.dashboard_client['id']
    env['omniauth.strategy'].options[:client_secret] = service.dashboard_client['secret']
    env['omniauth.strategy'].options[:auth_server_url] = Configuration.auth_server_url
    env['omniauth.strategy'].options[:token_server_url] = Configuration.token_server_url
    env['omniauth.strategy'].options[:scope] = %w(cloud_controller_service_permissions.read openid)
  end

to

  DASHBOARD_CLIENT_PROC = lambda do |env|
    request = Rack::Request.new(env)
    service = Catalog.find_service_by_guid(request.session[:service_guid])
    env['omniauth.strategy'].options[:client_id] = service.dashboard_client['id']
    env['omniauth.strategy'].options[:client_secret] = service.dashboard_client['secret']
    env['omniauth.strategy'].options[:auth_server_url] = Configuration.auth_server_url
    env['omniauth.strategy'].options[:token_server_url] = Configuration.token_server_url
    env['omniauth.strategy'].options[:scope] = %w(cloud_controller_service_permissions.read openid)
    env['omniauth.strategy'].options[:skip_ssl_validation] = Settings.skip_ssl_validation
  end
  10. Edit /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/lib/uaa_session.rb via:
  sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/lib/uaa_session.rb

and change:

      client = CF::UAA::TokenIssuer.new(
        Configuration.auth_server_url,
        service.dashboard_client['id'],
        service.dashboard_client['secret'],
        token_target: Configuration.token_server_url,
      )

to:

      client = CF::UAA::TokenIssuer.new(
        Configuration.auth_server_url,
        service.dashboard_client['id'],
        service.dashboard_client['secret'],
        token_target: Configuration.token_server_url,
        skip_ssl_validation: Settings.skip_ssl_validation,
      )
  11. Edit /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/app/controllers/manage/instances_controller.rb via:
  sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/app/controllers/manage/instances_controller.rb

and change:

    def ensure_all_necessary_scopes_are_approved
      token_hash = CF::UAA::TokenCoder.decode(@uaa_session.access_token, verify: false)
      return true if has_necessary_scopes?(token_hash)

      if need_to_retry?
        session[:has_retried] = 'true'
        redirect_to '/manage/auth/cloudfoundry'
        return false
      else
        session[:has_retried] = 'false'
        render 'errors/approvals_error'
        return false
      end
    end

to:

    def ensure_all_necessary_scopes_are_approved
      begin
        token_hash = CF::UAA::TokenCoder.decode(@uaa_session.access_token, verify: false)
        return true if has_necessary_scopes?(token_hash)
      rescue
        # A decode failure is swallowed here so the retry logic below still runs.
        need_to_retry = true
      end

      if need_to_retry?
        session[:has_retried] = 'true'
        redirect_to '/manage/auth/cloudfoundry'
        return false
      else
        session[:has_retried] = 'false'
        render 'errors/approvals_error'
        return false
      end
    end
  12. Restart cf-containers-broker via:
   sudo /var/vcap/bosh/bin/monit restart cf-containers-broker
  13. After 30 seconds, check that cf-containers-broker started correctly:
   sudo /var/vcap/bosh/bin/monit summary

You should see something like:

The Monit daemon 5.2.4 uptime: 19h 59m

Process 'docker'                    running
Process 'cf-containers-broker'      running
Process 'cf-containers-broker-route-registrar' running
  14. Log out
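
The final check of the monit summary output can be scripted instead of read by eye. This is a minimal sketch that assumes the output format shown above (Process 'name' followed by the status); the function name not_running is hypothetical, and any status other than running is treated as a problem.

```shell
# Print the names of monit-managed processes whose status is not "running".
# Reads `monit summary` output on stdin.
not_running() {
  awk '
    /^Process / {
      name = $2
      gsub(/\047/, "", name)                 # strip the quotes around the name
      status = $0
      sub(/^Process +[^ ]+ +/, "", status)   # keep only the status text
      sub(/[ \t]+$/, "", status)
      if (status != "running") print name
    }
  '
}

# Usage on the docker bosh VM:
#   sudo /var/vcap/bosh/bin/monit summary | not_running
```

Empty output means every listed process reported running.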