Common problems and their resolutions
This document is a work in progress! It should get filled in fairly fast though.
Problem
A Spring application times out during startup with no indication of a problem. The last log message displayed is similar to the following:
2016-01-20T11:33:53.55+0000 [App/0] OUT 2016-01-20 11:33:53.554 INFO 29 --- [ost-startStop-1] o.s.b.c.e.ServletRegistrationBean : Mapping servlet: 'dispatcherServlet' to [/]
This happens more frequently on OpenStack environments with a low number of compute nodes.
Resolution
Adding entropy to the hypervisor OS (i.e. the one running the OpenStack compute processes) solves the problem. While adding a hardware random number generator is preferable, the following solution also works:
sudo aptitude install rng-tools -y
sudo rngd -r /dev/urandom -o /dev/random
Please note that the above lowers the cryptographic strength of the keys generated by the application, so it is not recommended on production systems.
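Before applying the workaround, it can help to confirm that the hypervisor really is short of entropy. The sketch below reads the kernel's entropy estimate from /proc/sys/kernel/random/entropy_avail. The function name and the threshold of 1000 bits are assumptions chosen for illustration, not values from this guide, and recent kernels report a fixed value in that file, so the check is only meaningful on older systems.

```shell
# Report whether the kernel entropy estimate is below a threshold.
# $1: minimum acceptable entropy in bits (hypothetical default: 1000)
# $2: file to read (defaults to the Linux entropy estimate)
check_entropy() {
  threshold="${1:-1000}"
  source_file="${2:-/proc/sys/kernel/random/entropy_avail}"
  avail="$(cat "$source_file")" || return 2
  if [ "$avail" -lt "$threshold" ]; then
    echo "low entropy: $avail bits available"
    return 1
  fi
  echo "entropy ok: $avail bits available"
}
```

Running check_entropy on the hypervisor before and after starting rngd should show whether the estimate has risen.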
Problem
When listing apps with cf a, app instance numbers show up as ?/1, for example:
user-management started ?/1 512M 1G user-management.example.com
cdh-broker started ?/1 128M 1G cdh-broker.example.com
hdfs-broker started ?/1 1G 1G hdfs-broker.example.com
ipython-broker started ?/1 256M 1G ipython-broker.example.com
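On a deployment with many apps, the affected ones can be picked out of the listing automatically. This is a minimal sketch, assuming the instance count is the third whitespace-separated column as in the example above; the function name is hypothetical.

```shell
# Print the names of apps whose running-instance count is unknown ("?/N").
# Reads `cf a` output on stdin; assumes the instance column is the third field.
list_unknown_instances() {
  awk '$3 ~ /^\?\// { print $1 }'
}
```

It would typically be used as cf a | list_unknown_instances; an empty result means no app is reporting an unknown instance count.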
Resolution
The resolution can be found here, under the Recovering from HM9000 Failure section. You can additionally stop all hm9000 processes beforehand and start them in the following order: etcd1 -> hm1 -> etcd2 -> hm2.
Problem
Can't list services with cf service-access when using the cf cli client:
Error parsing JSON: invalid syntax
Resolution
This is a bug in cf-cli 6.12.3. Downgrade to 6.12. Users of trustedanalytics/cloudfoundry-mkappstack need to set cfbinver in appstack.mk to 6.12.0.
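To check whether a workstation has the affected client installed, the version string can be tested in a script. This is a sketch only: the function name is hypothetical, and it assumes cf --version prints something of the form "cf version X.Y.Z-...".

```shell
# Succeed (exit 0) if the given version string names the affected cf-cli release.
# $1: output of `cf --version`; the "cf version X.Y.Z-..." shape is an assumption.
is_affected_cf_cli() {
  case "$1" in
    *" 6.12.3"|*" 6.12.3-"*|*" 6.12.3+"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_affected_cf_cli "$(cf --version)" && echo "downgrade cf-cli" prints a reminder only on affected machines.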
Problem
When trying to SSH into a CentOS instance that was manually created with the TAP-provided image, you get:
Permission denied (publickey).
Resolution
You must select the Configuration Drive option in the Advanced tab when creating an instance. The cloud-init scripts use the configuration drive to get instance data such as authorized keys. More info is available here: http://docs.openstack.org/user-guide/cli_config_drive.html
Problem
No GearPump service instances can be created.
Explanation
gearpump-broker uses yarn-broker to obtain Hadoop configuration files. When yarn-broker is misconfigured, gearpump-broker cannot work. If you see:
java.lang.ClassNotFoundException: Class org.trustedanalytics.zookeeper.mapping.ZookeeperMapping not found
in the gearpump-broker logs, you have hit this problem.
Resolution
A newer version of yarn-broker fixes the issue. If you cannot use that version, use the following quick workaround.
gearpump-broker fetches the configuration files, unpacks them and stores them in a dedicated folder. However, if a configuration file is already present in that folder, it is not overwritten. The fix is to obtain core-site.xml (which contains the problematic configuration section), correct it, put it into the gearpump-broker jar and push the app again.
After gearpump-broker has tried to create a GearPump instance, the configuration files are in /app/yarn-conf/. You can download core-site.xml to your local machine like this:
cf files gearpump-broker /app/yarn-conf/core-site.xml > core-site.xml
Find and remove the following section from the file:
<property>
<name>hadoop.security.group.mapping</name>
<value>org.trustedanalytics.zookeeper.mapping.ZookeeperMapping</value>
</property>
Now put the file into the /yarn-conf folder of the gearpump-broker jar. Push the application (don't overwrite environment variables) and the problem should be fixed.
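Removing the section by hand is error-prone when the file is long, so the edit can also be scripted. The following is a minimal sketch, assuming each <property> and </property> tag sits on its own line as in the excerpt above; the function name is hypothetical, and files formatted differently would need a real XML tool.

```shell
# Print core-site.xml without the hadoop.security.group.mapping property.
# $1: path to the downloaded core-site.xml
# Assumes <property> and </property> each appear on their own line.
strip_group_mapping() {
  awk '
    /<property>/ { buf = $0; inblock = 1; next }   # start buffering a property block
    inblock {
      buf = buf "\n" $0
      if ($0 ~ /<\/property>/) {
        # Emit the buffered block unless it is the problematic mapping.
        if (buf !~ /<name>hadoop\.security\.group\.mapping<\/name>/) print buf
        inblock = 0
      }
      next
    }
    { print }                                      # everything outside property blocks
  ' "$1"
}
```

For example, strip_group_mapping core-site.xml > core-site.fixed.xml writes a cleaned copy that can be reviewed before it replaces the original.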
Problem
User can't tail logs of an application using cf logs app and gets the following error:
Error dialing loggregator server: Get https://loggregator.X.X.X.X.xip.io:443/recent?app=APPID: x509: certificate is valid for , not loggregator.X.X.X.X.xip.io.
Resolution
The API should be targeted with the --skip-ssl-validation flag, for example cf api api.X.X.X.X.xip.io --skip-ssl-validation.
Explanation
The root cause of this problem is an invalid or self-signed certificate for the domain the environment uses. This is common for testing instances using the xip.io domain.
Problem
User can't access the dashboard for applications started via Marketplace. HTTP code 500 is returned to the browser.
Resolution
To fix this issue:
- Log in to the cdh-launcher instance.
- From cdh-launcher, log in to nginx-instance.
- On nginx-instance, edit the nginx.conf file:
sudo vim /etc/nginx/nginx.conf
- Change:
proxy_buffering off;
proxy_connect_timeout 180;
proxy_send_timeout 180;
proxy_read_timeout 900;
to:
proxy_buffering off;
proxy_connect_timeout 180;
proxy_buffer_size 8k;
proxy_send_timeout 180;
proxy_read_timeout 900;
- Restart the nginx service:
sudo service nginx restart
- Log out.
- Log in to the bastion host and go to the workspace/deployments/docker-services-boshworkspace/ directory:
cd workspace/deployments/docker-services-boshworkspace/
If you are using AWS, run:
bosh deployment docker-aws-vpc
If you are using OpenStack, run:
bosh deployment docker-openstack
- Log in to the docker bosh VM (same command for AWS and OpenStack):
bosh ssh
When asked "Enter password (use it to sudo on remote host):", choose any password and remember it. You will need it later.
- On the docker bosh VM, edit the /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/config/initializers/omniauth.rb file:
sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/config/initializers/omniauth.rb
When sudo asks for a password, provide the one you chose when running bosh ssh. Then change:
DASHBOARD_CLIENT_PROC = lambda do |env|
  request = Rack::Request.new(env)
  service = Catalog.find_service_by_guid(request.session[:service_guid])
  env['omniauth.strategy'].options[:client_id] = service.dashboard_client['id']
  env['omniauth.strategy'].options[:client_secret] = service.dashboard_client['secret']
  env['omniauth.strategy'].options[:auth_server_url] = Configuration.auth_server_url
  env['omniauth.strategy'].options[:token_server_url] = Configuration.token_server_url
  env['omniauth.strategy'].options[:scope] = %w(cloud_controller_service_permissions.read openid)
end
to:
DASHBOARD_CLIENT_PROC = lambda do |env|
  request = Rack::Request.new(env)
  service = Catalog.find_service_by_guid(request.session[:service_guid])
  env['omniauth.strategy'].options[:client_id] = service.dashboard_client['id']
  env['omniauth.strategy'].options[:client_secret] = service.dashboard_client['secret']
  env['omniauth.strategy'].options[:auth_server_url] = Configuration.auth_server_url
  env['omniauth.strategy'].options[:token_server_url] = Configuration.token_server_url
  env['omniauth.strategy'].options[:scope] = %w(cloud_controller_service_permissions.read openid)
  env['omniauth.strategy'].options[:skip_ssl_validation] = Settings.skip_ssl_validation
end
- Edit /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/lib/uaa_session.rb:
sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/lib/uaa_session.rb
and change:
client = CF::UAA::TokenIssuer.new(
  Configuration.auth_server_url,
  service.dashboard_client['id'],
  service.dashboard_client['secret'],
  token_target: Configuration.token_server_url,
)
to:
client = CF::UAA::TokenIssuer.new(
  Configuration.auth_server_url,
  service.dashboard_client['id'],
  service.dashboard_client['secret'],
  token_target: Configuration.token_server_url,
  skip_ssl_validation: Settings.skip_ssl_validation,
)
- Edit /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/app/controllers/manage/instances_controller.rb:
sudo vim /var/vcap/jobs/cf-containers-broker/packages/cf-containers-broker/app/controllers/manage/instances_controller.rb
and change:
def ensure_all_necessary_scopes_are_approved
  token_hash = CF::UAA::TokenCoder.decode(@uaa_session.access_token, verify: false)
  return true if has_necessary_scopes?(token_hash)
  if need_to_retry?
    session[:has_retried] = 'true'
    redirect_to '/manage/auth/cloudfoundry'
    return false
  else
    session[:has_retried] = 'false'
    render 'errors/approvals_error'
    return false
  end
end
to:
def ensure_all_necessary_scopes_are_approved
  begin
    token_hash = CF::UAA::TokenCoder.decode(@uaa_session.access_token, verify: false)
    return true if has_necessary_scopes?(token_hash)
  rescue
    need_to_retry = true
  end
  if need_to_retry?
    session[:has_retried] = 'true'
    redirect_to '/manage/auth/cloudfoundry'
    return false
  else
    session[:has_retried] = 'false'
    render 'errors/approvals_error'
    return false
  end
end
- Restart cf-containers-broker:
sudo /var/vcap/bosh/bin/monit restart cf-containers-broker
- After 30 seconds, check that cf-containers-broker started correctly:
sudo /var/vcap/bosh/bin/monit summary
You should see something like:
The Monit daemon 5.2.4 uptime: 19h 59m
Process 'docker' running
Process 'cf-containers-broker' running
Process 'cf-containers-broker-route-registrar' running
- Log out.
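The final check in the steps above can be scripted rather than read by eye. This is a minimal sketch, assuming the summary format shown above, where each process line reads Process 'name' running; the function name is hypothetical.

```shell
# Succeed only if every "Process" line in `monit summary` output reports "running".
# Reads the summary on stdin and prints any process line in another state.
all_processes_running() {
  awk '/^Process / && !(NF == 3 && $3 == "running") { print; bad = 1 }
       END { exit bad }'
}
```

On the docker bosh VM it could be run as sudo /var/vcap/bosh/bin/monit summary | all_processes_running; a non-zero exit status together with the printed lines points at the processes that still need attention.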