Monkey stumbles upon warpdrive while coding

"I thought it would take 1000 monkeys 1000 years" says ape.

Location: Sydney, New South Wales, Australia

Back from the US, and loving the beaches. Lucky enough to be working in science still.

Friday, February 25, 2005

Dynamic loading library problem

So, we're in:
/home/cjoneill/StGermain/StGermain/FE/testApps
And when I compile I get the error (again):
Undefined                       first referenced
 symbol                             in file
dlsym                               /home/cjoneill/StGermain/build/lib/libStGermain.a(Plugins.o) (symbol belongs to implicit dependency /usr/lib/libdl.so.1)

There are a few other dl symbols as well; these control the dynamic loading. The offending library is /home/cjoneill/StGermain/build/lib/libStGermain.a, a static archive, so it doesn't pull in libdl by itself.
And the workaround is to add more library flags after it on the Sun link line:
/home/cjoneill/StGermain/build/lib/libStGermain.a -lm -lnsl -lsocket -lposix4 -ldl
And it builds!
Nb: these go in the configure script as:
EXPORT_DYNAMIC_LFLAGS
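
One way to keep these extra libraries out of builds on other systems is to pick them by operating system. What follows is only a minimal sketch in POSIX shell: the function name extra_link_libs is hypothetical and not part of the StGermain configure script, the SunOS list is simply the one that worked above, and the fallback branch is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of StGermain's configure script):
# print the extra libraries to append to the link line for a given OS.
# The OS name defaults to whatever `uname -s` reports.
extra_link_libs() {
    os="${1:-$(uname -s)}"
    case "$os" in
        SunOS)
            # The set that got libStGermain.a linking on the Suns;
            # -ldl supplies dlsym and the other dl* symbols.
            echo "-lm -lnsl -lsocket -lposix4 -ldl"
            ;;
        *)
            # Assumption: other systems only need the maths and dl libraries.
            echo "-lm -ldl"
            ;;
    esac
}

# Usage: the result goes after the static archive on the link line.
EXPORT_DYNAMIC_LFLAGS="$(extra_link_libs SunOS)"
echo "$EXPORT_DYNAMIC_LFLAGS"
```

The libraries have to come after libStGermain.a on the command line, because the linker resolves an archive's undefined symbols from whatever follows it.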

Shared libraries on the suns

Ok, so I was compiling shared libraries on the Sun using the -shared flag:
gcc blahblah -shared blah

(Nb: this is in the SO_LFLAGS variable in the StGermain configure script).

And I was getting these:

ld: fatal: relocations remain against allocatable but non-writable sections

It turns out that on the Suns the -shared flag is actually like passing -G -dy -z text to the linker, and for this I don't think you want the -z text part.
So there were 2 options: use -shared -mimpure-text, which leaves off the -z text part (as far as I can figure), OR just use -G, which is what I did. (Note: -mimpure-text is apparently a Sun-only gcc option.)
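
The choice of flag could be captured in a small helper so the same configure logic works on more than one system. This is only a sketch in POSIX shell: shared_lib_flag is a made-up name, and the SunOS branch just mirrors what worked here.

```shell
#!/bin/sh
# Hypothetical helper: choose the flag used to link a shared library.
shared_lib_flag() {
    os="${1:-$(uname -s)}"
    case "$os" in
        SunOS) echo "-G" ;;        # what worked on the Suns; no -z text
        *)     echo "-shared" ;;   # assumption: the usual gcc spelling elsewhere
    esac
}

# Usage: this would feed the SO_LFLAGS variable in the configure script.
SO_LFLAGS="$(shared_lib_flag SunOS)"
echo "gcc blahblah $SO_LFLAGS blah"
```

This error usually points at object files that were not compiled as position-independent code, so building them with -fPIC may be worth trying as well.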

Next speedbump:
ld: fatal: Symbol referencing errors.
(in: /home/cjoneill/StGermain/StGermain/FE/testApps)

Solved mpi libraries problem

Ok, so I fixed this.
The problem was linking to the MPI libs, i.e.:
gcc blahblah -L/.../mpich/lib -lmpich
This was all I thought it needed. It came up with the ld problem:
fatal: Symbol referencing error
Undefined first referenced symbol in file
The symbol was sched_yield.

This error generally means that something has been declared (say, in a .h file) but the linker can't find a definition for it in any of the object files or libraries on the link line.

Turns out that on the Suns sched_yield lives in a separate library, so the gcc command line needs an extra flag: -lrt (on older Solaris releases this is -lposix4). Sooo, I included this in the library flags so the gcc line looks like this:
gcc blahblah -L/.../mpich/lib -lmpich -lrt
And hey presto. Got past the tests.
Bombed in 'StGermain/Base/src'
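
A quicker way to track down this kind of error is to ask nm which library actually defines the missing symbol. The sketch below assumes the POSIX `nm -P` output format (name, type, then value and size on each line). The name defines_symbol is hypothetical, and the library path in the usage comment is a guess that will differ between systems.

```shell
#!/bin/sh
# Hypothetical helper: read `nm -P` output on stdin and succeed if the
# named symbol is defined there (any type other than U = undefined).
defines_symbol() {
    awk -v sym="$1" '
        $1 == sym && $2 != "U" { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Demo on a made-up line in `nm -P` format:
printf 'sched_yield T 0000000000001000 0000000000000010\n' |
    defines_symbol sched_yield && echo "sched_yield is defined here"

# Usage against a real library (path is an assumption; adjust as needed):
#   nm -P /usr/lib/librt.so | defines_symbol sched_yield && echo "need -lrt"
```

Running the same check against libmpich.a should report sched_yield as undefined (type U), which matches what ld complained about.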

Thursday, February 24, 2005

Troubles with mpi libraries

I've run into a tree at testMemory. It wasn't linking, so I played around with it manually:

gcc -pipe -Wall -g -DDEBUG -O0 -o /home/cjoneill/StGermain/build/tests/testMemory1 /home/cjoneill/StGermain/build/tmp/test-libStGermainBaseFoundation/JournalWrappers.o -I/home/cjoneill/StGermain/build/include -I/home/cjoneill/StGermain/build/include/StGermain -I/opt1/site-sparc-sol8/mpich/include testMemory1.c /home/cjoneill/StGermain/build/lib/libStGermainBaseFoundation.a -L/opt1/site-sparc-sol8/mpich/lib -lmpich -lm

So this looks ok. But I'm getting the:

Undefined first referenced
symbol in file

This usually means that a symbol is declared somewhere (in a .h file) but the linker found no definition for it (from a .c file or a library). This sounds to me like I haven't included something, but I can't imagine what.
Funny thing is, I can compile it with mpicc, using the option -cc=gcc.
(some of the options are a little funky without specifying gcc).
There is only one warning:
gcc: /home/cjoneill/StGermain/build/lib/libStGermainBaseFoundation.a: linker input file unused because linking not done
(why was linking not done? Sounds dodgy). Yet it compiles something that runs.

Ape found lost in jungle of St Germain

Last time I installed StGermain-Snark-Underworld I completely reinstalled my operating system (from Red Hat to Gentoo, since Red Hat went to the dark side and started charging money).
I also had one Mr Turnbull helping me which made life a whole lot easier.

So now I'm in a foreign country (Texas, USA) on a Sun Solaris system trying to compile the whole she-bang again.

So first: I changed everything during configure to GNU; this includes gcc, g++, g77, etc. The Sun compilers suck (at least for this). This also includes using gmake instead of make (SunOS make also sucks: it doesn't get past the first line of the makefile).

The configure is very hands-on. I had to add an extra operating system (SunOS) everywhere, and also, for PETSc, a new system architecture called solaris-2.8 or something.

First big ld problem is a Sun thing: it concerns the rpath. Now in theory the rpath flags:
'-Xlinker -rpath -Xlinker ${LIB_DIR}'
should work, right? Not on Solaris! Found this fix on the web:
'-Wl,-rpath,${LIB_DIR}'
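
In theory the two spellings hand the linker the same arguments: gcc splits a -Wl, option at its commas and passes each piece on, so there must be no spaces after the commas. A tiny sketch of that splitting in POSIX shell follows. The name wl_to_linker_args is hypothetical and purely for illustration, and the LIB_DIR value is just an example.

```shell
#!/bin/sh
# Hypothetical illustration of how gcc treats -Wl,opt1,opt2:
# drop the -Wl, prefix and print one linker argument per line.
wl_to_linker_args() {
    printf '%s\n' "${1#-Wl,}" | tr ',' '\n'
}

# Example value only; use whatever LIB_DIR the configure script sets.
LIB_DIR=/home/cjoneill/StGermain/build/lib
wl_to_linker_args "-Wl,-rpath,${LIB_DIR}"
```

The two lines it prints are the same pair of arguments that -Xlinker -rpath -Xlinker ${LIB_DIR} would pass along, which is why it is odd that only one form worked.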

Next ld problem: linking to the mpi_python libraries. I was about to start headbutting the computer but Pat suggested deleting libpython & mpipython from the def_sub list, i.e.
1. Edit the StGermain/compatibility/Makefile.def file
2. remove "libpython mpipython" from the def_sub list.

So this swept that problem under the carpet for now.

Next is where I'm stuck now. MPI LIBRARIES!

Here's the error:

/StGermain/StGermain/Base/Foundation/tests> gmake
/usr/local/bin/gcc -pipe -Wall -g -DDEBUG -O0 -o /home/cjoneill/StGermain/build/tests/testMemory0 /home/cjoneill/StGermain/build/tmp/test-libStGermainBaseFoundation/JournalWrappers.o -I/home/cjoneill/StGermain/build/include -I/home/cjoneill/StGermain/build/include/StGermain -I/opt1/site-sparc-sol8/mpich/include -I/opt/antelope/4.6/include/libxml2 testMemory0.c /home/cjoneill/StGermain/build/lib/libStGermainBaseFoundation.a -L/opt1/site-sparc-sol8/mpich/lib -lmpich -lpmpich -lm -L/opt/antelope/4.6/lib -R/opt/antelope/4.6/lib -lxml2 -L/opt/antelope/4.6/lib -R/opt/antelope/4.6/lib -lz -lpthread -lm -lsocket -lnsl
Undefined                       first referenced
 symbol                             in file
sched_yield                         /opt1/site-sparc-sol8/mpich/lib/libmpich.a(p4_tsr.o)
ld: fatal: Symbol referencing errors.

I went into the offending directory so I wouldn't have to compile everything. So: the problem is linking with libmpich, which is definitely in /opt1/site-sparc-sol8/mpich.
I tried setting MPI_DIR='/opt1/site-sparc-sol8/mpich'; no luck.
I also noticed in many forums that many linux->sun problems are solved by including the flags:
-lsocket -lnsl
manually. But these are already included so it shouldn't be this.
HELP!