Commit Graph

1030 Commits

Author SHA1 Message Date
Alexey Bataev
25d3de8a0a [OPENMP][NVPTX]Reduce number of barriers in reductions.
After the fix for the syncthreads we don't need to generate extra
barriers for the parallel reductions.

llvm-svn: 350530
2019-01-07 15:45:09 +00:00
Joel E. Denny
bae586fb0a [OpenMP] Refactor const restriction for linear
As discussed in D56113, this patch refactors the implementation of the
const restriction for linear to reuse a function introduced by D56113.
A side effect is that, if a variable has mutable members, this
diagnostic is now skipped, and the diagnostic for the variable not
being an integer or pointer is reported instead.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D56299

llvm-svn: 350441
2019-01-04 22:12:13 +00:00
Joel E. Denny
d2649292ef [OpenMP] Refactor const restriction for reductions
As discussed in D56113, this patch refactors the implementation of the
const restriction for reductions to reuse a function introduced by
D56113.  A side effect is that diagnostics sometimes now say
"variable" instead of "list item" when a list item is a variable.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D56298

llvm-svn: 350440
2019-01-04 22:11:56 +00:00
Joel E. Denny
e6234d1429 [OpenMP] Replace predetermined shared for const variable
The following appears in OpenMP 3.1 sec. 2.9.1.1 as a predetermined
data-sharing attribute:

> Variables with const-qualified type having no mutable member are
> shared.

It does not appear in OpenmP 4.0, 4.5, or 5.0.  This patch removes the
implementation of that attribute when the requested OpenMP version is
greater than 3.1.

One effect of that removal is that `default(none)` affects const
variables without mutable members.

Also, without this patch, if a const variable without mutable members
was explicitly lastprivate or private, it was an error because it was
predetermined shared.  Now, clang instead complains that it's const
without mutable fields, which is a more intelligible diagnostic.  That
should be fine for all of the above versions because they all have
something like the following, which is quoted from OpenMP 5.0
sec. 2.19.3:

> A variable that is privatized must not have a const-qualified type
> unless it is of class type with a mutable member. This restriction does
> not apply to the firstprivate clause.

reduction and linear clauses already have separate checks for const
variables.  Future patches will merge the implementations.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D56113

llvm-svn: 350439
2019-01-04 22:11:31 +00:00
Alexey Bataev
8e009036c9 [OPENMP][NVPTX]Use new functions from the runtime library.
Updated codegen to use the new functions from the runtime library.

llvm-svn: 350415
2019-01-04 17:25:09 +00:00
Alexey Bataev
a3924b517e [OPENMP][NVPTX]Use __kmpc_barrier_simple_spmd(nullptr, 0) instead of
nvvm_barrier0.

Use runtime functions instead of the direct call to the nvvm intrinsics.
It allows to prevent some dangerous LLVM optimizations, that breaks the
code for the NVPTX target.

llvm-svn: 350328
2019-01-03 16:25:35 +00:00
Patrick Lyster
e13b1e3299 [OpenMP] Added support for explicit mapping of classes using 'this' pointer. Differential revision: https://reviews.llvm.org/D55982
llvm-svn: 350252
2019-01-02 19:28:48 +00:00
Alexey Bataev
db43f0696e [OPENMP]Fix processing of the clauses on target combined directives.
For constants with the predefined data-sharing clauses we may had
troubles with the target combined directives. It may cause compiler
crash in some corner cases.

llvm-svn: 350127
2018-12-28 17:27:32 +00:00
Nico Weber
170a55b893 Pass a concrete triple for two OpenMP tests that depend on TLS
Not all %itanium_abi_triple values support TLS. Makes
OpenMP/declare_reduction_codegen.cpp, OpenMP/parallel_copyin_codegen.cpp for
%itanium_abi_triples without TLS support.

Alternatively we could pass -fnoopenmp-use-tls and tweak some of the CHECK
lines, but this seems a bit simpler.

Fixes PR40156.

Differential Revision: https://reviews.llvm.org/D56086

llvm-svn: 350067
2018-12-26 16:06:26 +00:00
Michael Kruse
0535137e4a [CodeGen] Generate llvm.loop.parallel_accesses instead of llvm.mem.parallel_loop_access metadata.
Instead of generating llvm.mem.parallel_loop_access metadata, generate
llvm.access.group on instructions and llvm.loop.parallel_accesses on
loops. There is one access group per generated loop.

This is clang part of D52116/r349725.

Differential Revision: https://reviews.llvm.org/D52117

llvm-svn: 349823
2018-12-20 21:24:54 +00:00
Alexey Bataev
ce90181751 [OPENMP]Mark the loop as started when initialized.
Need to mark the loop as started when the initialization statement is
found. It is required to prevent possible incorrect loop iteraton
variable detection during template instantiation and fix the compiler
crash during the codegen.

llvm-svn: 349657
2018-12-19 18:16:37 +00:00
Joel E. Denny
0fdf5a9acc [OpenMP] Fix data sharing analysis in nested clause
Without this patch, clang doesn't complain that X needs explicit data
sharing attributes in the following:

```
 #pragma omp target teams default(none)
 {
   #pragma omp parallel num_threads(X)
     ;
 }
```

However, clang does produce that complaint after the braces are
removed.  With this patch, clang complains in both cases.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D55861

llvm-svn: 349635
2018-12-19 15:59:47 +00:00
Kelvin Li
ef57943e3f [OPENMP] parsing and sema support for 'close' map-type-modifier
A map clause with the close map-type-modifier is a hint to 
prefer that the variables are mapped using a copy into faster 
memory.

Patch by Ahsan Saghir (saghir)

Differential Revision: https://reviews.llvm.org/D55719

llvm-svn: 349551
2018-12-18 22:18:41 +00:00
Alexey Bataev
6a1b06bcd4 [OPENMP][NVPTX]Emit shared memory buffer for reduction as 128 bytes
buffer.

Seems to me, nvlink has a bug with the proper support of the weakly
linked symbols. It does not allow to define several shared memory buffer
with the different sizes even with the weak linkage. Instead we always
use 128 bytes buffer to prevent nvlink from the error message emission.

llvm-svn: 349540
2018-12-18 21:01:42 +00:00
Alexey Bataev
29d47fcb30 [OPENMP][NVPTX]Added extra sync point to the inter-warp copy function.
The parallel reduction operation requires an extra synchronization point
in the inter-warp copy function to avoid divergence.

llvm-svn: 349525
2018-12-18 19:20:15 +00:00
Alexey Bataev
ae51b96f99 [OPENMP][NVPTX]Improved interwarp copy function.
Inlined runtime with the current implementation of the interwarp copy
function leads to the undefined behavior because of the not quite
correct implementation of the barriers. Start using generic
__kmpc_barier function instead of the custom made barriers.

llvm-svn: 349192
2018-12-14 21:00:58 +00:00
Alexey Bataev
6393eb7ec6 [OPENMP][NVPTX] Fix globalization of the mapped array sections.
If the array section is based on pointer and this sections is mapped in
target region + then it is used in the inner parallel region, it also
must be globalized as the pointer itself is passed by value, not by
reference.

llvm-svn: 348492
2018-12-06 15:35:13 +00:00
Alexey Bataev
2c1ff9dda7 [OPENMP][NVPTX]Fixed emission of the critical region.
Critical regions in NVPTX are the constructs, which, generally speaking,
are not supported by the NVPTX target. Instead we're using special
technique to handle the critical regions. Currently they are supported
only within the loop and all the threads in the loop must execute the
same critical region.
Inside of this special regions the regions still must be emitted as
critical, to avoid possible data races between the teams +
synchronization must use __kmpc_barrier functions.

llvm-svn: 348272
2018-12-04 15:25:01 +00:00
Alexey Bataev
c3028cac24 [OPENMP][NVPTX]Mark __kmpc_barrier functions as convergent.
__kmpc_barrier runtime functions must be marked as convergent to prevent
some dangerous optimizations. Also, for NVPTX target all barriers must
be emitted as simple barriers.

llvm-svn: 348271
2018-12-04 15:03:25 +00:00
Aaron Ballman
4b5b0c0025 Move AST tests into their own test directory; NFC.
This moves everything primarily testing the functionality of -ast-dump and -ast-print into their own directory, rather than leaving the tests spread around the testing directory.

llvm-svn: 348017
2018-11-30 18:43:02 +00:00
Alexey Bataev
3ce5d827fc [OPENMP][NVPTX]Call get __kmpc_global_thread_num in worker after
initialization.

Function __kmpc_global_thread_num  should be called only after
initialization, not earlier.

llvm-svn: 347919
2018-11-29 21:21:32 +00:00
Gheorghe-Teodor Bercea
2b40470c61 [OpenMP] Add a new version of the SPMD deinit kernel function
Summary: This patch adds a new runtime for the SPMD deinit kernel function which replaces the previous function. The new function takes as argument the flag which signals whether the runtime is required or not. This enables the compiler to optimize out the part of the deinit function which are not needed.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D54970

llvm-svn: 347915
2018-11-29 20:53:49 +00:00
Alexey Bataev
719713ab7d [OPENMP]Fix emission of the target regions in virtual functions.
Fixed emission of the target regions found in the virtual functions.
Previously we may end up with the situation when those regions could be
skipped.

llvm-svn: 347793
2018-11-28 19:00:07 +00:00
Alexey Bataev
a116602475 [OPENMP][NVPTX]Basic support for reductions across the teams.
Added basic codegen support for the reductions across the teams.

llvm-svn: 347715
2018-11-27 21:24:54 +00:00
Alexey Bataev
e8ad4b7124 [OPENMP][NVPTX]Emit default locations with the correct Exec|Runtime
modes.

If the region is inside target|teams|distribute region, we can emit the
locations with the correct info for execution mode and runtime mode.
Patch adds this ability to the NVPTX codegen to help the optimizer to
produce better code.

llvm-svn: 347583
2018-11-26 18:37:09 +00:00
Alexey Bataev
ceeaa48052 [OPENMP][NVPTX]Emit default locations as constant with undefined mode.
For the NVPTX target default locations should be emitted as constants +
additional info must be emitted in the reserved_2 field of the ident_t
structure. The 1st bit controls the execution mode and the 2nd bit
controls use of the lightweight runtime. The combination of the bits for
Non-SPMD mode + lightweight runtime represents special undefined mode,
used outside of the target regions for orphaned directives or functions.
Should allow and additional optimization inside of the target regions.

llvm-svn: 347425
2018-11-21 21:04:34 +00:00
Alexey Bataev
92b33652f6 [OPENMP]Fix handling of the LCVs in loop-based directives.
Loop-control variables with the default data-sharing attributes should
not be captured in the OpenMP region as they are private by default.
Also, default attributes should be emitted for such variables in the
inner OpenMP regions for the correct data sharing during codegen.

llvm-svn: 347409
2018-11-21 19:41:10 +00:00
Kelvin Li
efbe4afbda [OPENMP] Support relational-op != (not-equal) as one of the canonical
forms of random access iterator
    
In OpenMP 4.5, only 4 relational operators are supported: <, <=, >, 
and >=.  This work is to enable support for relational operator 
!= (not-equal) as one of the canonical forms.

Patch by Anh Tuyen Tran
    
Differential Revision: https://reviews.llvm.org/D54441

llvm-svn: 347405
2018-11-21 19:10:48 +00:00
Joel E. Denny
435c739fbe [OpenMP] Update CHECK-DAG usage in target_parallel_codegen.cpp
This patch adjusts a test not to depend on deprecated FileCheck
behavior that permits overlapping matches within a block of CHECK-DAG
directives.  Thus, this patch also removes uses of FileCheck's
-allow-deprecated-dag-overlap command-line option.

There were two issues in this test:

1. There were sets of patterns for store instructions in which a
pattern X could match a superset of a pattern Y.  While X appeared
before Y, Y's intended match appeared before X's intended match.  The
result was that X matched Y's intended match.  Under the old
overlapping behavior, Y also matched Y's intended match.  Under the
new non-overlapping behavior, Y had nothing left to match.  This patch
fixes this by gathering these sets in one place and putting the most
specific patterns (Y) before the more general patterns (X).

2. The CHECK-DAG patterns involving the variables CBPADDR3 and
CBPADDR4 were the same, but there was only one match in the text, so
CBPADDR4 patterns had nothing to match under the new non-overlapping
behavior.  Moreover, a preceding related series of directives had
variables (SADDR0, BPADDR0, etc.) numbered only 0 through 4, but this
series had variables numbered 0 through 5.  Assuming CBPADDR4's
directives were not intended, this patch removes them.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D54765

llvm-svn: 347351
2018-11-20 22:05:23 +00:00
Joel E. Denny
d586bc6db9 [OpenMP] Update CHECK-DAG usage in for_codegen.cpp
This patch adjusts a test not to depend on deprecated FileCheck
behavior that permits overlapping matches within a block of CHECK-DAG
directives.  Thus, this patch also removes uses of FileCheck's
-allow-deprecated-dag-overlap command-line option.

Specifically, the FileCheck variables DBG_LOC_START, DBG_LOC_END, and
DBG_LOC_CANCEL were all set to the same value.  As a result, three
TERM_DEBUG-DAG patterns, one for each variable, all matched the same
text under the old overlapping behavior.  Under the new
non-overlapping behavior, that's not permitted.  This patch's solution
is to replace these variables with one variable and replace these
patterns with one pattern.

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D54764

llvm-svn: 347350
2018-11-20 22:04:45 +00:00
Patrick Lyster
8f7f586e53 [OpenMP] Check target architecture supports unified shared memory for requires directive. Differential Review: https://reviews.llvm.org/D54493
llvm-svn: 347214
2018-11-19 15:09:33 +00:00
Alexey Bataev
d1840e5383 [OPENMP]Fix PR39694: do not capture this in non-this region.
If lambda is used inside of the OpenMP region and captures `this`, we
should recapture it in the OpenMP region also. But we should do this
only if the OpenMP region is used in the context of the same class, just
like the lambda.

llvm-svn: 347096
2018-11-16 21:13:33 +00:00
Alexey Bataev
f2f39be9ed [OPENMP][NVPTX]Emit correct reduction code for teams/parallel
reductions.

Fixed previously committed code for the reduction support in
teams/parallel constructs taking into account new design of the NVPTX
support in the compiler. Teams reduction are not fully functional yet,
it is going to be fixed in the following patches.

llvm-svn: 347081
2018-11-16 19:38:21 +00:00
Alexey Bataev
8bcc69c054 [OPENMP][NVPTX]Extend number of constructs executed in SPMD mode.
If the statements between target|teams|distribute directives does not
require execution in master thread, like constant expressions, null
statements, simple declarations, etc., such construct can be xecuted in
SPMD mode.

llvm-svn: 346551
2018-11-09 20:03:19 +00:00
Alexey Bataev
09c9eea78f [OPENMP][NVPTX]Allow to use shared memory for the
target|teams|distribute variables.

If the total size of the variables, declared in target|teams|distribute
regions, is less than the maximal size of shared memory available, the
buffer is allocated in the shared memory.

llvm-svn: 346507
2018-11-09 16:18:04 +00:00
Alexey Bataev
969dbc02c2 [OPENMP]Make lambda mapping follow reqs for PTR_AND_OBJ mapping.
The base pointer for the lambda mapping must point to the lambda capture
placement and pointer must point to the captured variable itself. Patch
fixes this problem.

llvm-svn: 346408
2018-11-08 15:47:39 +00:00
Alexey Bataev
2a6f3f5fa2 [OPENMP]Fix handling of the globals during compilation for the device.
Fixed lookup for the target regions in unused virtual functions + fixed
processing of the global variables not marked as declare target but
emitted during debug info emission.

llvm-svn: 346343
2018-11-07 19:11:14 +00:00
Alexey Bataev
1fc1f8e819 [OPENMP][NVPTX]Use __kmpc_data_sharing_coalesced_push_stack function.
Coalesced memory access requires use of the new function
`__kmpc_data_sharing_coalesced_push_stack` instead of the
`__kmpc_data_sharing_push_stack`.

llvm-svn: 345991
2018-11-02 16:08:31 +00:00
Alexey Bataev
2dc07d0cf3 [OPENMP]Change the mapping type for lambda captures.
The previously used combination `PTR_AND_OBJ | PRIVATE` could be used for mapping of some data in Fortran. Changed it to `PTR_AND_OBJ | LITERAL`.

llvm-svn: 345982
2018-11-02 15:25:06 +00:00
Alexey Bataev
e40901806f [OPENMP][NVPTX]Improve emission of the globalized variables for
target/teams/distribute regions.

Target/teams/distribute regions exist for all the time the kernel is
executed. Thus, if the variable is declared in their context and then
escape it, we can allocate global memory statically instead of
allocating it dynamically.
Patch captures all the globalized variables in target/teams/distribute
contexts, merges them into the records, one per each target region.
Those records are then joined into the union, one per compilation unit
(to save the global memory). Those units are organized into
2 x dimensional arrays, where the first dimension is
the number of blocks per SM and the second one is the number of SMs.
Runtime functions manage this global memory space between the executing
teams.

llvm-svn: 345978
2018-11-02 14:54:07 +00:00
Patrick Lyster
7a2a27c4a4 Add support for 'atomic_default_mem_order' clause on 'requires' directive. Also renamed test files relating to 'requires'. Differntial review: https://reviews.llvm.org/D53513
llvm-svn: 345967
2018-11-02 12:18:11 +00:00
Alexey Bataev
6070542296 [OPENMP] Support for mapping of the lambdas in target regions.
Added support for mapping of lambdas in the target regions. It scans all
the captures by reference in the lambda, implicitly maps those variables
in the target region and then later reinstate the addresses of
references in lambda to the correct addresses of the captured|privatized
variables.

llvm-svn: 345609
2018-10-30 15:50:12 +00:00
Alexey Bataev
f07946e101 [OPENMP]Fix PR39372: Does not complain about loop bound variable not
being shared.

According to the standard, the variables with unspecified data-sharing
attributes in presence of `default(none)` clause must be reported to
users. Compiler did not generate error reports for the variables used in
other OpenMP regions. Patch fixes this.

llvm-svn: 345533
2018-10-29 20:17:42 +00:00
Gheorghe-Teodor Bercea
9d6341ff51 [OpenMP] Fix condition.
Summary: Iteration variable must be strictly less than the number of iterations. This fixes a bug introduced by previous patch D53448.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53827

llvm-svn: 345527
2018-10-29 19:44:25 +00:00
Gheorghe-Teodor Bercea
e92567601b [OpenMP][NVPTX] Use single loops when generating code for distribute parallel for
Summary: This patch adds a new code generation path for bound sharing directives containing distribute parallel for. The new code generation scheme applies to chunked schedules on distribute and parallel for directives. The scheme simplifies the code that is being generated by eliminating the need for an outer for loop over chunks for both distribute and parallel for directives. In the case of distribute it applies to any sized chunk while in the parallel for case it only applies when chunk size is 1.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53448

llvm-svn: 345509
2018-10-29 15:45:47 +00:00
Gheorghe-Teodor Bercea
669dbde7a5 [OpenMP][NVPTX] Enable default scheduling for parallel for in non-SPMD cases.
Summary: This patch enables the choosing of the default schedule for parallel for loops even in non-SPMD cases.

Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: jholewinski, guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53443

llvm-svn: 345507
2018-10-29 15:23:23 +00:00
Alexey Bataev
6ab5bb115a [OPENMP] Do not capture private loop counters.
If the loop counter is not declared in the context of the loop and it is
private, such loop counters should not be captured in the outlined
regions.

llvm-svn: 345505
2018-10-29 15:01:58 +00:00
Gheorghe-Teodor Bercea
a6cb25676e [NFC][OpenMP] Add new test for parallel for code generation.
Summary:
This is a simple test of the parallel for code generation. It will be used to showcase the change introduced by patch D53443.


Reviewers: ABataev, caomhin

Reviewed By: ABataev

Subscribers: guansong, cfe-commits

Differential Revision: https://reviews.llvm.org/D53772

llvm-svn: 345417
2018-10-26 18:59:52 +00:00
Alexey Bataev
8fc7b5f922 [OPENMP]Fix PR39422: variables are not firstprivatized in task context.
According to the OpenMP standard, In a task generating construct, if no
default clause is present, a variable for which the data-sharing
attribute is not determined by the rules above is firstprivatized.
Compiler tries to implement this, but if the variable is not directly
used in the task context, this variable may not be firstprivatized.
Patch fixes this problem.

llvm-svn: 345277
2018-10-25 15:35:27 +00:00
Alexey Bataev
ac6e4de714 Do not always request an implicit taskgroup region inside the kmpc_taskloop function
Summary:
For the following code:
```
    int i;
    #pragma omp taskloop
    for (i = 0; i < 100; ++i)
    {}

    #pragma omp taskloop nogroup
    for (i = 0; i < 100; ++i)
    {}
```

Clang emits the following LLVM IR:

```
 ...
  call void @__kmpc_taskgroup(%struct.ident_t* @0, i32 %0)
  %2 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates*)* @.omp_task_entry. to i32 (i32, i8*)*))
  ...
  call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %2, i32 1, i64* %8, i64* %9, i64 %13, i32 0, i32 0, i64 0, i8* null)
  call void @__kmpc_end_taskgroup(%struct.ident_t* @0, i32 %0)

  ...
  %15 = call i8* @__kmpc_omp_task_alloc(%struct.ident_t* @0, i32 %0, i32 1, i64 80, i64 8, i32 (i32, i8*)* bitcast (i32 (i32, %struct.kmp_task_t_with_privates.1*)* @.omp_task_entry..2 to i32 (i32, i8*)*))
  ...
  call void @__kmpc_taskloop(%struct.ident_t* @0, i32 %0, i8* %15, i32 1, i64* %21, i64* %22, i64 %26, i32 0, i32 0, i64 0, i8* null)

```

The first set of instructions corresponds to the first taskloop construct. It is important to note that the implicit taskgroup region associated with the taskloop construct has been materialized in our IR:  the `__kmpc_taskloop` occurs inside a taskgroup region. Note also that this taskgroup region does not exist in our second taskloop because we are using the `nogroup` clause.

The issue here is the 4th argument of the kmpc_taskloop call, starting from the end,  is always a zero. Checking the LLVM OpenMP RT implementation, we see that this argument corresponds to the nogroup parameter:

```
void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
                     kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup,
                     int sched, kmp_uint64 grainsize, void *task_dup);
```

So basically we always tell to the RT to do another taskgroup region. For the first taskloop, this means that we create two taskgroup regions. For the second example, it means that despite the fact we had a nogroup clause we are going to have a taskgroup region, so we unnecessary wait until all descendant tasks have been executed.

Reviewers: ABataev

Reviewed By: ABataev

Subscribers: rogfer01, cfe-commits

Differential Revision: https://reviews.llvm.org/D53636

llvm-svn: 345180
2018-10-24 19:06:37 +00:00