sheet3
This commit is contained in:
parent
56614805cf
commit
2195a9db0a
51 changed files with 13038 additions and 0 deletions
462
sheet3/1/stream.f
Normal file
462
sheet3/1/stream.f
Normal file
|
|
@ -0,0 +1,462 @@
|
|||
*=======================================================================
|
||||
* Program: STREAM
|
||||
* Programmer: John D. McCalpin
|
||||
* RCS Revision: $Id: stream.f,v 5.6 2005/10/04 00:20:48 mccalpin Exp mccalpin $
|
||||
*-----------------------------------------------------------------------
|
||||
* Copyright 1991-2003: John D. McCalpin
|
||||
*-----------------------------------------------------------------------
|
||||
* License:
|
||||
* 1. You are free to use this program and/or to redistribute
|
||||
* this program.
|
||||
* 2. You are free to modify this program for your own use,
|
||||
* including commercial use, subject to the publication
|
||||
* restrictions in item 3.
|
||||
* 3. You are free to publish results obtained from running this
|
||||
* program, or from works that you derive from this program,
|
||||
* with the following limitations:
|
||||
* 3a. In order to be referred to as "STREAM benchmark results",
|
||||
* published results must be in conformance to the STREAM
|
||||
* Run Rules, (briefly reviewed below) published at
|
||||
* http://www.cs.virginia.edu/stream/ref.html
|
||||
* and incorporated herein by reference.
|
||||
* As the copyright holder, John McCalpin retains the
|
||||
* right to determine conformity with the Run Rules.
|
||||
* 3b. Results based on modified source code or on runs not in
|
||||
* accordance with the STREAM Run Rules must be clearly
|
||||
* labelled whenever they are published. Examples of
|
||||
* proper labelling include:
|
||||
* "tuned STREAM benchmark results"
|
||||
* "based on a variant of the STREAM benchmark code"
|
||||
* Other comparable, clear and reasonable labelling is
|
||||
* acceptable.
|
||||
* 3c. Submission of results to the STREAM benchmark web site
|
||||
* is encouraged, but not required.
|
||||
* 4. Use of this program or creation of derived works based on this
|
||||
* program constitutes acceptance of these licensing restrictions.
|
||||
* 5. Absolutely no warranty is expressed or implied.
|
||||
*-----------------------------------------------------------------------
|
||||
* This program measures sustained memory transfer rates in MB/s for
|
||||
* simple computational kernels coded in FORTRAN.
|
||||
*
|
||||
* The intent is to demonstrate the extent to which ordinary user
|
||||
* code can exploit the main memory bandwidth of the system under
|
||||
* test.
|
||||
*=======================================================================
|
||||
* The STREAM web page is at:
|
||||
* http://www.streambench.org
|
||||
*
|
||||
* Most of the content is currently hosted at:
|
||||
* http://www.cs.virginia.edu/stream/
|
||||
*
|
||||
* BRIEF INSTRUCTIONS:
|
||||
* 0) See http://www.cs.virginia.edu/stream/ref.html for details
|
||||
* 1) STREAM requires a timing function called mysecond().
|
||||
* Several examples are provided in this directory.
|
||||
* "CPU" timers are only allowed for uniprocessor runs.
|
||||
* "Wall-clock" timers are required for all multiprocessor runs.
|
||||
* 2) The STREAM array sizes must be set to size the test.
|
||||
* The value "N" must be chosen so that each of the three
|
||||
* arrays is at least 4x larger than the sum of all the last-
|
||||
* level caches used in the run, or 1 million elements, which-
|
||||
* ever is larger.
|
||||
* ------------------------------------------------------------
|
||||
* Note that you are free to use any array length and offset
|
||||
* that makes each array 4x larger than the last-level cache.
|
||||
* The intent is to determine the *best* sustainable bandwidth
|
||||
* available with this simple coding. Of course, lower values
|
||||
* are usually fairly easy to obtain on cached machines, but
|
||||
* by keeping the test to the *best* results, the answers are
|
||||
* easier to interpret.
|
||||
* You may put the arrays in common or not, at your discretion.
|
||||
* There is a commented-out COMMON statement below.
|
||||
* Fortran90 "allocatable" arrays are fine, too.
|
||||
* ------------------------------------------------------------
|
||||
* 3) Compile the code with full optimization. Many compilers
|
||||
* generate unreasonably bad code before the optimizer tightens
|
||||
* things up. If the results are unreasonably good, on the
|
||||
* other hand, the optimizer might be too smart for me
|
||||
* Please let me know if this happens.
|
||||
* 4) Mail the results to mccalpin@cs.virginia.edu
|
||||
* Be sure to include:
|
||||
* a) computer hardware model number and software revision
|
||||
* b) the compiler flags
|
||||
* c) all of the output from the test case.
|
||||
* Please let me know if you do not want your name posted along
|
||||
* with the submitted results.
|
||||
* 5) See the web page for more comments about the run rules and
|
||||
* about interpretation of the results.
|
||||
*
|
||||
* Thanks,
|
||||
* Dr. Bandwidth
|
||||
*=========================================================================
|
||||
*
|
||||
PROGRAM stream
|
||||
* IMPLICIT NONE
|
||||
C .. Parameters ..
|
||||
INTEGER n,offset,ndim,ntimes
|
||||
PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
|
||||
C ..
|
||||
C .. Local Scalars ..
|
||||
DOUBLE PRECISION scalar,t
|
||||
INTEGER j,k,nbpw,quantum
|
||||
C ..
|
||||
C .. Local Arrays ..
|
||||
DOUBLE PRECISION maxtime(4),mintime(4),avgtime(4),
|
||||
$ times(4,ntimes)
|
||||
INTEGER bytes(4)
|
||||
CHARACTER label(4)*11
|
||||
C ..
|
||||
C .. External Functions ..
|
||||
DOUBLE PRECISION mysecond
|
||||
INTEGER checktick,realsize
|
||||
EXTERNAL mysecond,checktick,realsize
|
||||
!$ INTEGER omp_get_num_threads
|
||||
!$ EXTERNAL omp_get_num_threads
|
||||
C ..
|
||||
C .. Intrinsic Functions ..
|
||||
C
|
||||
INTRINSIC dble,max,min,nint,sqrt
|
||||
C ..
|
||||
C .. Arrays in Common ..
|
||||
DOUBLE PRECISION a(ndim),b(ndim),c(ndim)
|
||||
C ..
|
||||
C .. Common blocks ..
|
||||
* COMMON a,b,c
|
||||
C ..
|
||||
C .. Data statements ..
|
||||
DATA avgtime/4*0.0D0/,mintime/4*1.0D+36/,maxtime/4*0.0D0/
|
||||
DATA label/'Copy: ','Scale: ','Add: ',
|
||||
$ 'Triad: '/
|
||||
DATA bytes/2,2,3,3/
|
||||
C ..
|
||||
|
||||
* --- SETUP --- determine precision and check timing ---
|
||||
|
||||
nbpw = realsize()
|
||||
|
||||
PRINT *,'----------------------------------------------'
|
||||
PRINT *,'STREAM Version $Revision: 5.6 $'
|
||||
PRINT *,'----------------------------------------------'
|
||||
WRITE (*,FMT=9010) 'Array size = ',n
|
||||
WRITE (*,FMT=9010) 'Offset = ',offset
|
||||
WRITE (*,FMT=9020) 'The total memory requirement is ',
|
||||
$ 3*nbpw*n/ (1024*1024),' MB'
|
||||
WRITE (*,FMT=9030) 'You are running each test ',ntimes,' times'
|
||||
WRITE (*,FMT=9030) '--'
|
||||
WRITE (*,FMT=9030) 'The *best* time for each test is used'
|
||||
WRITE (*,FMT=9030) '*EXCLUDING* the first and last iterations'
|
||||
|
||||
!$OMP PARALLEL
|
||||
!$OMP MASTER
|
||||
PRINT *,'----------------------------------------------'
|
||||
!$ PRINT *,'Number of Threads = ',OMP_GET_NUM_THREADS()
|
||||
!$OMP END MASTER
|
||||
!$OMP END PARALLEL
|
||||
|
||||
PRINT *,'----------------------------------------------'
|
||||
!$OMP PARALLEL
|
||||
PRINT *,'Printing one line per active thread....'
|
||||
!$OMP END PARALLEL
|
||||
|
||||
!$OMP PARALLEL DO
|
||||
DO 10 j = 1,n
|
||||
a(j) = 2.0d0
|
||||
b(j) = 0.5D0
|
||||
c(j) = 0.0D0
|
||||
10 CONTINUE
|
||||
t = mysecond()
|
||||
!$OMP PARALLEL DO
|
||||
DO 20 j = 1,n
|
||||
a(j) = 0.5d0*a(j)
|
||||
20 CONTINUE
|
||||
t = mysecond() - t
|
||||
PRINT *,'----------------------------------------------------'
|
||||
quantum = checktick()
|
||||
WRITE (*,FMT=9000)
|
||||
$ 'Your clock granularity/precision appears to be ',quantum,
|
||||
$ ' microseconds'
|
||||
PRINT *,'----------------------------------------------------'
|
||||
|
||||
* --- MAIN LOOP --- repeat test cases NTIMES times ---
|
||||
scalar = 0.5d0*a(1)
|
||||
DO 70 k = 1,ntimes
|
||||
|
||||
t = mysecond()
|
||||
a(1) = a(1) + t
|
||||
!$OMP PARALLEL DO
|
||||
DO 30 j = 1,n
|
||||
c(j) = a(j)
|
||||
30 CONTINUE
|
||||
t = mysecond() - t
|
||||
c(n) = c(n) + t
|
||||
times(1,k) = t
|
||||
|
||||
t = mysecond()
|
||||
c(1) = c(1) + t
|
||||
!$OMP PARALLEL DO
|
||||
DO 40 j = 1,n
|
||||
b(j) = scalar*c(j)
|
||||
40 CONTINUE
|
||||
t = mysecond() - t
|
||||
b(n) = b(n) + t
|
||||
times(2,k) = t
|
||||
|
||||
t = mysecond()
|
||||
a(1) = a(1) + t
|
||||
!$OMP PARALLEL DO
|
||||
DO 50 j = 1,n
|
||||
c(j) = a(j) + b(j)
|
||||
50 CONTINUE
|
||||
t = mysecond() - t
|
||||
c(n) = c(n) + t
|
||||
times(3,k) = t
|
||||
|
||||
t = mysecond()
|
||||
b(1) = b(1) + t
|
||||
!$OMP PARALLEL DO
|
||||
DO 60 j = 1,n
|
||||
a(j) = b(j) + scalar*c(j)
|
||||
60 CONTINUE
|
||||
t = mysecond() - t
|
||||
a(n) = a(n) + t
|
||||
times(4,k) = t
|
||||
70 CONTINUE
|
||||
|
||||
* --- SUMMARY ---
|
||||
DO 90 k = 2,ntimes
|
||||
DO 80 j = 1,4
|
||||
avgtime(j) = avgtime(j) + times(j,k)
|
||||
mintime(j) = min(mintime(j),times(j,k))
|
||||
maxtime(j) = max(maxtime(j),times(j,k))
|
||||
80 CONTINUE
|
||||
90 CONTINUE
|
||||
WRITE (*,FMT=9040)
|
||||
DO 100 j = 1,4
|
||||
avgtime(j) = avgtime(j)/dble(ntimes-1)
|
||||
WRITE (*,FMT=9050) label(j),n*bytes(j)*nbpw/mintime(j)/1.0D6,
|
||||
$ avgtime(j),mintime(j),maxtime(j)
|
||||
100 CONTINUE
|
||||
PRINT *,'----------------------------------------------------'
|
||||
CALL checksums (a,b,c,n,ntimes)
|
||||
PRINT *,'----------------------------------------------------'
|
||||
|
||||
9000 FORMAT (1x,a,i6,a)
|
||||
9010 FORMAT (1x,a,i10)
|
||||
9020 FORMAT (1x,a,i4,a)
|
||||
9030 FORMAT (1x,a,i3,a,a)
|
||||
9040 FORMAT ('Function',5x,'Rate (MB/s) Avg time Min time Max time'
|
||||
$ )
|
||||
9050 FORMAT (a,4 (f10.4,2x))
|
||||
END
|
||||
|
||||
*-------------------------------------
|
||||
* INTEGER FUNCTION dblesize()
|
||||
*
|
||||
* A semi-portable way to determine the precision of DOUBLE PRECISION
|
||||
* in Fortran.
|
||||
* Here used to guess how many bytes of storage a DOUBLE PRECISION
|
||||
* number occupies.
|
||||
*
|
||||
INTEGER FUNCTION realsize()
|
||||
* IMPLICIT NONE
|
||||
|
||||
C .. Local Scalars ..
|
||||
DOUBLE PRECISION result,test
|
||||
INTEGER j,ndigits
|
||||
C ..
|
||||
C .. Local Arrays ..
|
||||
DOUBLE PRECISION ref(30)
|
||||
C ..
|
||||
C .. External Subroutines ..
|
||||
EXTERNAL confuse
|
||||
C ..
|
||||
C .. Intrinsic Functions ..
|
||||
INTRINSIC abs,acos,log10,sqrt
|
||||
C ..
|
||||
|
||||
C Test #1 - compare single(1.0d0+delta) to 1.0d0
|
||||
|
||||
10 DO 20 j = 1,30
|
||||
ref(j) = 1.0d0 + 10.0d0** (-j)
|
||||
20 CONTINUE
|
||||
|
||||
DO 30 j = 1,30
|
||||
test = ref(j)
|
||||
ndigits = j
|
||||
CALL confuse(test,result)
|
||||
IF (test.EQ.1.0D0) THEN
|
||||
GO TO 40
|
||||
END IF
|
||||
30 CONTINUE
|
||||
GO TO 50
|
||||
|
||||
40 WRITE (*,FMT='(a)')
|
||||
$ '----------------------------------------------'
|
||||
WRITE (*,FMT='(1x,a,i2,a)') 'Double precision appears to have ',
|
||||
$ ndigits,' digits of accuracy'
|
||||
IF (ndigits.LE.8) THEN
|
||||
realsize = 4
|
||||
ELSE
|
||||
realsize = 8
|
||||
END IF
|
||||
WRITE (*,FMT='(1x,a,i1,a)') 'Assuming ',realsize,
|
||||
$ ' bytes per DOUBLE PRECISION word'
|
||||
WRITE (*,FMT='(a)')
|
||||
$ '----------------------------------------------'
|
||||
RETURN
|
||||
|
||||
50 PRINT *,'Hmmmm. I am unable to determine the size.'
|
||||
PRINT *,'Please enter the number of Bytes per DOUBLE PRECISION',
|
||||
$ ' number : '
|
||||
READ (*,FMT=*) realsize
|
||||
IF (realsize.NE.4 .AND. realsize.NE.8) THEN
|
||||
PRINT *,'Your answer ',realsize,' does not make sense.'
|
||||
PRINT *,'Try again.'
|
||||
PRINT *,'Please enter the number of Bytes per ',
|
||||
$ 'DOUBLE PRECISION number : '
|
||||
READ (*,FMT=*) realsize
|
||||
END IF
|
||||
PRINT *,'You have manually entered a size of ',realsize,
|
||||
$ ' bytes per DOUBLE PRECISION number'
|
||||
WRITE (*,FMT='(a)')
|
||||
$ '----------------------------------------------'
|
||||
END
|
||||
|
||||
SUBROUTINE confuse(q,r)
|
||||
* IMPLICIT NONE
|
||||
C .. Scalar Arguments ..
|
||||
DOUBLE PRECISION q,r
|
||||
C ..
|
||||
C .. Intrinsic Functions ..
|
||||
INTRINSIC cos
|
||||
C ..
|
||||
r = cos(q)
|
||||
RETURN
|
||||
END
|
||||
|
||||
* A semi-portable way to determine the clock granularity
|
||||
* Adapted from a code by John Henning of Digital Equipment Corporation
|
||||
*
|
||||
INTEGER FUNCTION checktick()
|
||||
* IMPLICIT NONE
|
||||
|
||||
C .. Parameters ..
|
||||
INTEGER n
|
||||
PARAMETER (n=20)
|
||||
C ..
|
||||
C .. Local Scalars ..
|
||||
DOUBLE PRECISION t1,t2
|
||||
INTEGER i,j,jmin
|
||||
C ..
|
||||
C .. Local Arrays ..
|
||||
DOUBLE PRECISION timesfound(n)
|
||||
C ..
|
||||
C .. External Functions ..
|
||||
DOUBLE PRECISION mysecond
|
||||
EXTERNAL mysecond
|
||||
C ..
|
||||
C .. Intrinsic Functions ..
|
||||
INTRINSIC max,min,nint
|
||||
C ..
|
||||
i = 0
|
||||
|
||||
10 t2 = mysecond()
|
||||
IF (t2.EQ.t1) GO TO 10
|
||||
|
||||
t1 = t2
|
||||
i = i + 1
|
||||
timesfound(i) = t1
|
||||
IF (i.LT.n) GO TO 10
|
||||
|
||||
jmin = 1000000
|
||||
DO 20 i = 2,n
|
||||
j = nint((timesfound(i)-timesfound(i-1))*1d6)
|
||||
jmin = min(jmin,max(j,0))
|
||||
20 CONTINUE
|
||||
|
||||
IF (jmin.GT.0) THEN
|
||||
checktick = jmin
|
||||
ELSE
|
||||
PRINT *,'Your clock granularity appears to be less ',
|
||||
$ 'than one microsecond'
|
||||
checktick = 1
|
||||
END IF
|
||||
RETURN
|
||||
|
||||
* PRINT 14, timesfound(1)*1d6
|
||||
* DO 20 i=2,n
|
||||
* PRINT 14, timesfound(i)*1d6,
|
||||
* & nint((timesfound(i)-timesfound(i-1))*1d6)
|
||||
* 14 FORMAT (1X, F18.4, 1X, i8)
|
||||
* 20 CONTINUE
|
||||
|
||||
END
|
||||
|
||||
|
||||
|
||||
|
||||
SUBROUTINE checksums(a,b,c,n,ntimes)
|
||||
* IMPLICIT NONE
|
||||
C ..
|
||||
C .. Arguments ..
|
||||
DOUBLE PRECISION a(*),b(*),c(*)
|
||||
INTEGER n,ntimes
|
||||
C ..
|
||||
C .. Local Scalars ..
|
||||
DOUBLE PRECISION aa,bb,cc,scalar,suma,sumb,sumc,epsilon
|
||||
INTEGER k
|
||||
C ..
|
||||
|
||||
C Repeat the main loop, but with scalars only.
|
||||
C This is done to check the sum & make sure all
|
||||
C iterations have been executed correctly.
|
||||
|
||||
aa = 2.0D0
|
||||
bb = 0.5D0
|
||||
cc = 0.0D0
|
||||
aa = 0.5D0*aa
|
||||
scalar = 0.5d0*aa
|
||||
DO k = 1,ntimes
|
||||
cc = aa
|
||||
bb = scalar*cc
|
||||
cc = aa + bb
|
||||
aa = bb + scalar*cc
|
||||
END DO
|
||||
aa = aa*DBLE(n-2)
|
||||
bb = bb*DBLE(n-2)
|
||||
cc = cc*DBLE(n-2)
|
||||
|
||||
C Now sum up the arrays, excluding the first and last
|
||||
C elements, which are modified using the timing results
|
||||
C to confuse aggressive optimizers.
|
||||
|
||||
suma = 0.0d0
|
||||
sumb = 0.0d0
|
||||
sumc = 0.0d0
|
||||
!$OMP PARALLEL DO REDUCTION(+:suma,sumb,sumc)
|
||||
DO 110 j = 2,n-1
|
||||
suma = suma + a(j)
|
||||
sumb = sumb + b(j)
|
||||
sumc = sumc + c(j)
|
||||
110 CONTINUE
|
||||
|
||||
epsilon = 1.D-6
|
||||
|
||||
IF (ABS(suma-aa)/suma .GT. epsilon) THEN
|
||||
PRINT *,'Failed Validation on array a()'
|
||||
PRINT *,'Target Sum of a is = ',aa
|
||||
PRINT *,'Computed Sum of a is = ',suma
|
||||
ELSEIF (ABS(sumb-bb)/sumb .GT. epsilon) THEN
|
||||
PRINT *,'Failed Validation on array b()'
|
||||
PRINT *,'Target Sum of b is = ',bb
|
||||
PRINT *,'Computed Sum of b is = ',sumb
|
||||
ELSEIF (ABS(sumc-cc)/sumc .GT. epsilon) THEN
|
||||
PRINT *,'Failed Validation on array c()'
|
||||
PRINT *,'Target Sum of c is = ',cc
|
||||
PRINT *,'Computed Sum of c is = ',sumc
|
||||
ELSE
|
||||
PRINT *,'Solution Validates!'
|
||||
ENDIF
|
||||
|
||||
END
|
||||
|
||||
Loading…
Add table
Add a link
Reference in a new issue