[Debian-ha-maintainers] corosync-blackbox qb_rb_chunk_read failed

Richard B Winters rik at mmogp.com
Thu Jul 23 06:37:26 BST 2015


On 07/22/2015 08:47 AM, braun wrote:
> I want to use corosync/pacemaker in production. Is this a serious error?
> 
> 
> corosync-blackbox ends with error:
> 
> [debug] shm size:8392704; real_size:8392704; rb->word_size:2098176
> [debug] read total of: 8392724
> ERROR: qb_rb_chunk_read failed: Connection timed out
> [trace] ENTERING qb_rb_close()
> [debug] Free'ing ringbuffer: /dev/shm/qb-create_from_file-header
> 
> root@willi:~/errorblackbox# df -kl /dev/shm
> Filesystem     1K-blocks    Used Available  Use% Mounted on
> tmpfs           16462300   14448  16447852    1% /dev/shm
> 
> 
> I followed the wiki on
> https://wiki.debian.org/Debian-HA/ClustersFromScratch and use the
> following packages:
> 
> corosync:
>   Installed: 2.3.4-1
>   Candidate: 2.3.4-1
>   Version table:
>  *** 2.3.4-1 0
>         500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>         100 /var/lib/dpkg/status
>      1.4.6-1.1 0
>         500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
>         500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
> pacemaker:
>   Installed: 1.1.12-1
>   Candidate: 1.1.12-1
>   Version table:
>  *** 1.1.12-1 0
>         500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>         100 /var/lib/dpkg/status
> fence-agents:
>   Installed: 4.0.18-1
>   Candidate: 4.0.18-1
>   Version table:
>  *** 4.0.18-1 0
>         500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>         100 /var/lib/dpkg/status
>      4.0.17-2 0
>         500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>      3.1.5-2 0
>         500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
> libqb0:
>   Installed: 0.17.1-4
>   Candidate: 0.17.1-4
>   Version table:
>  *** 0.17.1-4 0
>         500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>         100 /var/lib/dpkg/status
>      0.17.1-1 0
>         500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>      0.11.1-2 0
>         500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
> libqb-dev:
>   Installed: 0.17.1-4
>   Candidate: 0.17.1-4
>   Version table:
>  *** 0.17.1-4 0
>         500 http://ftp.de.debian.org/debian/ testing/main amd64 Packages
>         100 /var/lib/dpkg/status
>      0.17.1-1 0
>         500 http://ppa.mmogp.com/apt/debian/ jessie/main amd64 Packages
>      0.11.1-2 0
>         500 http://debian.uni-duisburg-essen.de/debian/ jessie/main amd64 Packages
> 
> corosync.conf (modified from wheezy, where I did not try corosync-blackbox):
> 
> totem {
>     version: 2
> 
>         ip_version: ipv4
> 
>     # How long before declaring a token lost (ms)
>     token: 4000
> 
>     # How many token retransmits before forming a new configuration
>     token_retransmits_before_loss_const: 10
> 
>     # How long to wait for join messages in the membership protocol (ms)
>     join: 1100
> 
>     # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
>     consensus: 3600
> 
>     # how long to wait before checking for a partition when no multicast
>     merge: 1000
> 
>     # Turn off the virtual synchrony filter
>     vsftype: none
> 
>     # Number of messages that may be sent by one processor on receipt of the token
>     max_messages: 100
> 
>     # Limit generated nodeids to 31-bits (positive signed integers)
>     clear_node_high_bit: yes
> 
>     crypto_cipher: aes256
>     crypto_hash: sha1
> 
> ###    # Disable encryption
> ###     secauth: off
> 
>     # How many threads to use for encryption/decryption
>     ### we have 4 CPU cores C.B.
>      threads: 4
> 
>     # Optionally assign a fixed node id (integer)
>     ### taken from the IP address; we do not actually need it with IPv4 C.B.
> #    nodeid: 22
> 
>     # This specifies the mode of redundant ring, which may be none, active, or passive.
>      rrp_mode: none
> 
> ### used ucast for the interfaces, since there are only 2 hosts and it is simpler than mcast
> ### but somehow it does not work
>      interface {
>         # The following values need to be set based on your environment
>         # for diskmirror
>         ringnumber: 0
>         bindnetaddr: 192.168.63.0
>         broadcast: yes
>         ##mcastaddr: 230.168.63.1
>         #mcastport: 5405
>         mcastport: 5405
>         }
>         transport: udpu
> }
> 
> nodelist {
>               node {
>         ring0_addr: 192.168.63.22
>         nodeid: 22
>                 }
>               node {
>         ring0_addr: 192.168.63.24
>         nodeid: 24
>                 }
> }
> 
> 
> quorum {
>         provider: corosync_votequorum
>         two_node: 1
>         expected_votes: 2
> }
> 
> 
> 
> logging {
>         fileline: off
>         to_stderr: yes
>         to_logfile: yes
>         to_syslog: yes
>     logfile: /var/log/corosync.log
>     syslog_facility: daemon
>         debug: off
>         timestamp: on
>         logger_subsys {
>                 subsys: QUORUM
>                 debug: off
>         }
> }
> 
> qb    {
>     ipc_type: shm
>     }
> 
> corosync.log:
> 
> Jul 22 13:59:32 [1431] willi corosync info    [MAIN  ] Corosync built-in features: dbus testagents rdma watchdog augeas systemd upstart xmlconf qdevices snmp pie relro bindnow
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] Initializing transport (UDP/IP Unicast).
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] The network interface [192.168.63.24] is now up.
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync configuration map access [0]
> Jul 22 13:59:32 [1431] willi corosync info    [QB    ] server name: cmap
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync configuration service [1]
> Jul 22 13:59:32 [1431] willi corosync info    [QB    ] server name: cfg
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
> Jul 22 13:59:32 [1431] willi corosync info    [QB    ] server name: cpg
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync profile loading service [4]
> Jul 22 13:59:32 [1431] willi corosync info    [WD    ] Watchdog is now been tickled by corosync.
> Jul 22 13:59:32 [1431] willi corosync info    [WD    ] no resources configured.
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync watchdog service [7]
> Jul 22 13:59:32 [1431] willi corosync notice  [QUORUM] Using quorum provider corosync_votequorum
> Jul 22 13:59:32 [1431] willi corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
> Jul 22 13:59:32 [1431] willi corosync info    [QB    ] server name: votequorum
> Jul 22 13:59:32 [1431] willi corosync notice  [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
> Jul 22 13:59:32 [1431] willi corosync info    [QB    ] server name: quorum
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] adding new UDPU member {192.168.63.22}
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] adding new UDPU member {192.168.63.24}
> Jul 22 13:59:32 [1431] willi corosync notice  [TOTEM ] A new membership (192.168.63.24:30668) was formed. Members joined: 24
> Jul 22 13:59:32 [1431] willi corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jul 22 13:59:32 [1431] willi corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jul 22 13:59:32 [1431] willi corosync notice  [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
> Jul 22 13:59:32 [1431] willi corosync notice  [QUORUM] Members[1]: 24
> Jul 22 13:59:32 [1431] willi corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
> Jul 22 13:59:40 [1431] willi corosync notice  [TOTEM ] A new membership (192.168.63.22:30672) was formed. Members joined: 22
> Jul 22 13:59:40 [1431] willi corosync notice  [QUORUM] This node is within the primary component and will provide service.
> Jul 22 13:59:40 [1431] willi corosync notice  [QUORUM] Members[2]: 22 24
> Jul 22 13:59:40 [1431] willi corosync notice  [MAIN  ] Completed service synchronization, ready to provide service.
> root@willi:~/errorblackbox#
> 

Hello,

First, please allow me to give you a proper warning regarding
ppa.mmogp.com: it is _not_ a good candidate for production use. It is a
freely available PPA and is _unofficial_. Please use this repository at
your own risk.

In Debian Jessie+, the Pacemaker/Corosync stack is not yet ready for
production, and for that we apologize. It will become officially
available for Debian in time, but we are still testing and preparing
all of the packages and documentation.

I did some searching regarding your error and came across this Red Hat
issue that seems to be related. Perhaps it can be of some help to you:

https://bugzilla.redhat.com/show_bug.cgi?id=1114852

The next time you get this error, please give us not only the full
blackbox output, but also the corosync, pacemaker, libqb, and syslog
logs. That would help us to better identify the issue, and perhaps to
offer a solution.
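
As a rough sketch of how that material could be gathered in one go, the
shell function below copies whichever of the given log files are
readable into a single directory and saves the output of
corosync-blackbox next to them. The function name collect_diag and the
output file name blackbox.txt are hypothetical, made up for
illustration; only corosync-blackbox itself and /var/log/corosync.log
(from your logging section) come from this thread.

```shell
# Sketch only: gather cluster diagnostics into one directory.
# collect_diag and blackbox.txt are hypothetical names, not part of any package.
# Usage: collect_diag OUTDIR LOGFILE [LOGFILE...]
collect_diag() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1

    # Copy every readable log file; report the ones that are skipped.
    for f in "$@"; do
        if [ -r "$f" ]; then
            cp "$f" "$outdir/"
        else
            echo "skipping unreadable file: $f" >&2
        fi
    done

    # Save the full blackbox output, errors included, if the tool is installed.
    if command -v corosync-blackbox >/dev/null 2>&1; then
        corosync-blackbox >"$outdir/blackbox.txt" 2>&1 || true
    fi
}
```

It could be called as, for example,
"collect_diag /tmp/ha-report /var/log/corosync.log /var/log/syslog",
adding the pacemaker and libqb logs from wherever your system writes
them. Those locations are assumptions to check locally.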

If the error is related/similar to that mentioned in the link above - a
good point to note is that it seems to be attempting to read
uninitialized memory:


> [trace] ENTERING qb_rb_close()
> [debug] Free'ing ringbuffer: /dev/shm/qb-create_from_file-header

It should be resolved; it is a serious error.


You could also try using a configuration generated by PCS or CRMSH,
start from that basic variant, and build upon it. There are major
differences in the APIs and facilities of the stack between Wheezy and
Jessie.

You can also seek help on OFTC in the #debian-ha channel, as well as
freenode.net's #clusterlabs channel.


I hope that helps to provide you with some direction.


Best,




-- 
Rik (devrikx)
