Bug #659
OML server segfault
| Status: | New | Start date: | 24/01/2012 | |
|---|---|---|---|---|
| Priority: | Normal | Due date: | ||
| Assignee: | - | % Done: | 0% |
|
| Category: | server | |||
| Target version: | - |
Description
Here's an email from Pieter Becue:
I am trying to run all the basic OMF functionality tests, to ensure we have a correct OMF/OML setup.
However, when I try to run test03.rb (from repository/test/exp) it gives me this output :
I am using oml2-apps-2.5.0 and oml server 2.6.1
pbecue@ExpController:/usr/share/omf-expctl-5.3/repository/test/exp$ omf exec test03.rb
INFO NodeHandler: OMF Experiment Controller 5.3 (git 4c44bbe)
INFO NodeHandler: Slice ID: default_slice (default)
INFO NodeHandler: Experiment ID: default_slice-2012-01-23t15.00.09+01.00
INFO NodeHandler: Message authentication is disabled
INFO NodeHandler: Web interface available at: http://10.11.31.22:4000
INFO Experiment: load system:exp:stdlib
INFO property.resetDelay: value = 210 (Fixnum)
INFO property.resetTries: value = 1 (Fixnum)
INFO Experiment: load system:exp:eventlib
INFO Experiment: load test03.rb
INFO property.res1: value = "omf.ibbt.open.node34" (String)
INFO property.res2: value = "omf.ibbt.open.node35" (String)
INFO property.mode1: value = "adhoc" (String)
INFO property.mode2: value = "adhoc" (String)
INFO property.wifi: value = "g" (String)
INFO property.channel: value = "6" (String)
INFO Topology: Loading topology 'omf.ibbt.open.node34'.
INFO Topology: Loading topology 'omf.ibbt.open.node35'.
INFO Experiment: Switching ON resources which are OFF
INFO ALL_UP_AND_INSTALLED: Event triggered. Starting the associated tasks.
INFO exp: Request from Experiment Script: Wait for 10s....
INFO omf.ibbt.open.node34: Device 'net/w0' reported Not-Associated
INFO omf.ibbt.open.node35: Device 'net/w0' reported Not-Associated
INFO exp: Request from Experiment Script: Wait for 10s....
ERROR NodeHandler:
ERROR NodeHandler: The resource 'omf.ibbt.open.node35' reports that an error occured
ERROR NodeHandler: while running the application 'test:app:otr2#1'
ERROR NodeHandler: The error message is '# ERROR Socket(sock): the remote peer closed the connection'
ERROR NodeHandler:
ERROR NodeHandler: The resource 'omf.ibbt.open.node35' reports that an error occured
ERROR NodeHandler: while running the application 'test:app:otr2#1'
ERROR NodeHandler: The error message is ' 32: Broken pipe'
ERROR NodeHandler:
ERROR NodeHandler: The resource 'omf.ibbt.open.node34' reports that an error occured
ERROR NodeHandler: while running the application 'test:app:otg2#1'
ERROR NodeHandler: The error message is '# ERROR Socket(sock): the remote peer closed the connection'
ERROR NodeHandler:
ERROR NodeHandler: The resource 'omf.ibbt.open.node34' reports that an error occured
ERROR NodeHandler: while running the application 'test:app:otg2#1'
ERROR NodeHandler: The error message is ' 32: Broken pipe'
INFO exp: TEST - Running ifconfig on Sender
INFO exp: Request from Experiment Script: Wait for 15s....
INFO exp: TEST - Running ifconfig on Receiver
INFO exp: Request from Experiment Script: Wait for 15s....
INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
INFO NodeHandler:
INFO NodeHandler: Shutting down experiment, please wait...
INFO NodeHandler:
INFO run: Experiment default_slice-2012-01-23t15.00.09+01.00 finished after 1:0
And this on the side of the OML server :
INFO OML2 Server V2.6.1
INFO OML Protocol V3
INFO Copyright 2007-2011 NICTA
DEBUG Socket(server): Socket 3 successfully created
DEBUG Socket(server): Socket bound to port: 3003
INFO Database backend: 'sqlite'
INFO Creating SQLite3 databases in /home/pbecue/usr/var/oml2
INFO 'server-client:4': new client connected
DEBUG3 Eventloop: received 12 octets
DEBUG 'server-client:4': C_HEADER: received data, 12
DEBUG 'server-client:4': Meta <protocol>:<3>
DEBUG 'server-client:4': buffer repacked to 12 bytes (end of this pass)
DEBUG3 Eventloop: received 440 octets
DEBUG 'server-client:4': C_HEADER: received data, 440
DEBUG 'server-client:4': Meta <experiment-id>:<default_slice-2012-01-23t15.00.09+01.00>
INFO New experiment 'default_slice-2012-01-23t15.00.09+01.00'
INFO Opening SQLite database for experiment 'default_slice-2012-01-23t15.00.09+01.00', path is '/home/pbecue/usr/var/oml2/default_slice-2012-01-23t15.00.09+01.00.sq3'
DEBUG prepare to exec BEGIN TRANSACTION;
DEBUG Database table list seems empty; need to create tables.
DEBUG Got table list with 0 tables in it
DEBUG prepare to exec CREATE TABLE _senders (name TEXT PRIMARY KEY, id INTEGER UNIQUE);
DEBUG prepare to exec CREATE TABLE _experiment_metadata (key TEXT PRIMARY KEY, value TEXT);
DEBUG prepare to exec INSERT INTO "_experiment_metadata" ("key", "value") VALUES ('start_time', '1327334534');
INFO Experiment 'default_slice-2012-01-23t15.00.09+01.00': Set DB start-time = 1327334534
DEBUG 'server-client:4': Meta <start_time>:<1327360150>
DEBUG 'server-client:4': Meta <sender-id>:<Receiver>
DEBUG prepare to exec INSERT INTO "_senders" ("name", "id") VALUES ('Receiver', '1');
DEBUG 'server-client:4': Meta <app-name>:<>
DEBUG 'server-client:4': Meta <>:<1 _udp_in ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double src_host:string src_port_avg:double src_port_min:double src_port_max:double>
WARN 'server-client:4': Unknown meta info '' (1 _udp_in ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double src_host:string src_port_avg:double src_port_min:double src_port_max:double) ignored
DEBUG 'server-client:4': Meta <content>:<binary>
DEBUG 'server-client:4': remaining bytes not enough to process a measurement
DEBUG 'server-client:4': buffer repacked to 0 bytes (end of this pass)
INFO 'server-client:8': new client connected
DEBUG3 Eventloop: received 12 octets
DEBUG 'server-client:8': C_HEADER: received data, 12
DEBUG 'server-client:8': Meta <protocol>:<3>
DEBUG 'server-client:8': buffer repacked to 12 bytes (end of this pass)
DEBUG3 Eventloop: received 439 octets
DEBUG 'server-client:8': C_HEADER: received data, 439
DEBUG 'server-client:8': Meta <experiment-id>:<default_slice-2012-01-23t15.00.09+01.00>
INFO Experiment 'default_slice-2012-01-23t15.00.09+01.00': Database 'default_slice-2012-01-23t15.00.09+01.00' already open (1 clients)
DEBUG 'server-client:8': Meta <start_time>:<1322871785>
DEBUG 'server-client:8': Meta <sender-id>:<Sender>
DEBUG prepare to exec INSERT INTO "_senders" ("name", "id") VALUES ('Sender', '2');
DEBUG 'server-client:8': Meta <app-name>:<>
DEBUG 'server-client:8': Meta <>:<1 _udp_out ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double dst_host:string dst_port_avg:double dst_port_min:double dst_port_max:double>
WARN 'server-client:8': Unknown meta info '' (1 _udp_out ts_avg:double ts_min:double ts_max:double flow_id_avg:double flow_id_min:double flow_id_max:double seq_no_avg:double seq_no_min:double seq_no_max:double pkt_length_avg:double pkt_length_min:double pkt_length_max:double dst_host:string dst_port_avg:double dst_port_min:double dst_port_max:double) ignored
DEBUG 'server-client:8': Meta <content>:<binary>
DEBUG 'server-client:8': remaining bytes not enough to process a measurement
DEBUG 'server-client:8': buffer repacked to 0 bytes (end of this pass)
DEBUG3 Eventloop: received 121 octets
DEBUG 'server-client:8': C_BINARY_DATA: received data, 121
Segmentation fault
The EC log files are attached.
Any idea what might be causing the segmentation fault ? I saw some bug reports in the repository about segmentation faults with OML, but I think those were all fixed in the versions I am using ?
Thanks in advance !
Best regards,
Pieter
History
Updated by Olivier Mehani 4 months ago
Maybe upgrading oml-apps could help?
Updated by Olivier Mehani 3 months ago
On Fri, Jan 27, 2012 at 09:05:19PM +1100, Pieter Becue wrote:
I re-ran the experiment with liboml-2.6.1 and unfortunately got the
same results (when running test03.rb) :
Updated by Olivier Mehani 3 months ago
Running the experiment on norbit gives the following, but seems to work.
$ omf-5.3 exec /usr/share/omf-expctl-5.3/repository/test/exp/test03.rb -- --res1 omf.nicta.node1 --res2 omf.nicta.node2 INFO NodeHandler: OMF Experiment Controller 5.3 (git 552e614) INFO NodeHandler: Slice ID: default_slice (default) INFO NodeHandler: Experiment ID: default_slice-2012-02-10t11.39.42+11.00 INFO NodeHandler: Message authentication is disabled WARN server: TCPServer Error: Address already in use - bind(2) WARN server: TCPServer Error: Address already in use - bind(2) INFO NodeHandler: Web interface available at: http://10.0.0.200:4000 INFO Experiment: load system:exp:stdlib INFO property.resetDelay: value = 210 (Fixnum) INFO property.resetTries: value = 1 (Fixnum) INFO Experiment: load system:exp:eventlib INFO Experiment: load /usr/share/omf-expctl-5.3/repository/test/exp/test03.rb INFO property.res1: value = "omf.nicta.node1" (String) INFO property.res2: value = "omf.nicta.node2" (String) INFO property.mode1: value = "adhoc" (String) INFO property.mode2: value = "adhoc" (String) INFO property.wifi: value = "g" (String) INFO property.channel: value = "6" (String) INFO Topology: Loading topology 'omf.nicta.node1'. INFO Topology: Loading topology 'omf.nicta.node2'. INFO Experiment: Switching ON resources which are OFF INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: omf.nicta.node2,omf.nicta.node1) INFO ALL_UP_AND_INSTALLED: Event triggered. Starting the associated tasks. INFO exp: Request from Experiment Script: Wait for 10s.... INFO omf.nicta.node2: Device 'net/w0' reported Not-Associated INFO omf.nicta.node1: Device 'net/w0' reported Not-Associated INFO exp: Request from Experiment Script: Wait for 10s.... ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node1' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otg2#1' ERROR NodeHandler: The error message is 'WARNING: using custom log functions is now deprecated;' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node2' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otr2#1' ERROR NodeHandler: The error message is 'WARNING: using custom log functions is now deprecated;' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node2' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otr2#1' ERROR NodeHandler: The error message is 'WARNING: this feature will be removed in OML v2.7.0' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node1' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otg2#1' ERROR NodeHandler: The error message is 'WARNING: this feature will be removed in OML v2.7.0' INFO omf.nicta.node2: Device 'net/w0' reported 36:8D:7E:1E:A1:1D INFO omf.nicta.node1: Device 'net/w0' reported 36:8D:7E:1E:A1:1D INFO exp: TEST - Running ifconfig on Sender INFO exp: Request from Experiment Script: Wait for 15s.... INFO exp: TEST - Running ifconfig on Receiver INFO exp: Request from Experiment Script: Wait for 15s.... INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks. INFO NodeHandler: INFO NodeHandler: Shutting down experiment, please wait... INFO NodeHandler: INFO run: Experiment default_slice-2012-02-10t11.39.42+11.00 finished after 2:15
I reckon the TCPServer Error should be ignored. The important bit seems to be the messages saying
ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node1' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otg2#1' ERROR NodeHandler: The error message is 'WARNING: using custom log functions is now deprecated;' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node2' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otr2#1' ERROR NodeHandler: The error message is 'WARNING: using custom log functions is now deprecated;' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node2' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otr2#1' ERROR NodeHandler: The error message is 'WARNING: this feature will be removed in OML v2.7.0' ERROR NodeHandler: ERROR NodeHandler: The resource 'omf.nicta.node1' reports that an error occured ERROR NodeHandler: while running the application 'test:app:otg2#1' ERROR NodeHandler: The error message is 'WARNING: this feature will be removed in OML v2.7.0'
where Pieter's instance segfaults.
Pieter: would you mind giving us the output of dpkg -l \*oml\* on your experiment controller and your nodes? What image are you using on the nodes?
Updated by Pieter Becue 3 months ago
The output of dpkg -l \*oml\* :
root@*ExpController*:~# dpkg -l \*oml\*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu3 OML: The Orbit Measurement Library
ii liboml2-dev 2.6.1-ubuntu3 OML measurement library headers
un oml-server <none> (no description available)
On resource:
root@*node34*:~# dpkg -l \*oml\*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Cfg-files/Unpacked/Failed-cfg/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu3 OML: The Orbit Measurement Library
ii liboml2-dev 2.6.1-ubuntu3 OML measurement library headers
un oml-server <none> (no description available)
ii oml2-otg2 2.4.0-ubuntu1 Orbit Traffic Generator
/usr/bin/otg2 on the resource refers to oml2-apps-2.5.0/otg/oml2-otg
Updated by Olivier Mehani 3 months ago
Hi Pieter,
Where is your OML server (the one that segfaults) located?
On our EC, I get the following output (truncated):
$ dpkg -l \*oml\* Desired=Unknown/Install/Remove/Purge/Hold | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad) ||/ Name Version Description +++-===========================-===========================-====================================================================== rc liboml 2.3.9-ubuntu2 OML: The Orbit Measurement Library un liboml-dev <none> (no description available) ii liboml2-0 2.6.1-ubuntu3 OML: The Orbit Measurement Library ii liboml2-dev 2.6.1-ubuntu3 OML measurement library headers (...) ii oml2-server 2.6.1-ubuntu3 OML measurement server
Could you give us the output of that same command on the machine running the OML server? It doesn't seem to be the EC, in your case.
Updated by Pieter Becue 3 months ago
Yes idd, our OML server runs on the AM (Our AM and EC run on different servers). Here's the output of dpkg -l \*oml\* :
||/ Name Version Description
+++-==============-==============-============================================
un liboml <none> (no description available)
un liboml-dev <none> (no description available)
ii liboml2-0 2.6.1-ubuntu1 OML: The Orbit Measurement Library
un liboml2-dev <none> (no description available)
un oml-server <none> (no description available)
ii oml2-server 2.6.1-ubuntu3 OML measurement server