Today I’ll discuss one of the ways of maintaining Procedural Parallelism: Parallel Pipelined Functions.
Before I start to discuss Parallel Pipelined Functions, I’d like to briefly mention our habit of doing procedural processes and their effects.
Usually in the procedures or functions we code, we respectively set a data, then transform this data into lines and then insert the data we transformed into a table or update it considering our needs. The pseudo code example below, exemplifies this situation.
Create Procedure myProcedure() as begin for record in (select …..) loop — Transformation — update or delete record from table end loop; end;
Usually, while we want to have a performance gain whilst doing parallel processes on db, we are not able to get to the points we wish in means of performance because we cannot benefit from the parallelism enough in the points where functions or procedures are used.
Now, let’s think about with the run style I exemplified above, what we can parallelize and how much we can change the performance by it. Will we be able to get the performance we desire with this run style?
–> Firstly, in order to parallelize the SQL inside the Cursor, the first thing we try to parallelize is done so by giving the parallel hint to the Select phrase. However, as it does not parallelize the whole function it will not multiply our performance in this case.
–> Another point is, as we are in a circulation and as at any moment there will be only one line in our hands, the insert or the update we want to parallelize seems to be not possible to parallelize. Because the number of entries we influence will be a single line while in circulation and the series will run.
If we consider that every day the data we will process is going to increase, it’s clear that we will not be able to gain the performance level we desire with the improvements I discussed above. As parallelizing steps cannot pull the performance to desired levels we have to find a way to parallelize the whole function. One of the ways to do that is using the Parallel Pipelined Functions.
Now, let me explain to you how the Parallel Pipelined Functions work with the code example below.
Firstly, we need to do some changes in our processing style. For that, initially storing the data we are interested in to somewhere without transforming it, then applying other processes will not only be a performance-increasing factor but also it enables us to run our functions more efficiently.
Now if we analyze our code example, the first thing we will pay attention to is the phrase PARALLEL_ENABLE which will parallelize our function and the partition definition we made there. The partition definition we made separates the cursor according to our definition and enables function to run in separate independent parallel sessions. If we “partition” the incoming data, we can conclude that we can also run the function parallel. The partition process in here can be done in 3 different ways. As in here, if we use the phrase BY ANY, we leave the partition process to Oracle’s control. With this choice, Oracle executes the data separation process without overlapping. The other methods we can apply are the Hash and Range methods. It is possible to partition the data with these methods; However as the action is in our control we have to pay attention for parts not to overlap.
Now if we look at our “begin end” block, we can see that it sends the function to where it’s invoked with PIPE ROW phrase without initiating any transformation nor operation in FOR. This situation is advantageous in many ways. If we look at those:
a) It send the parts where invoked without waiting for all of the data inside FOR to end thus enables them to process faster.
b) For example, if we think of a case we are in a circulation where a select with 1 million entries processed line by line, not waiting for all the entries to return and sending every line to process as they end will grant us speed and reduce the memory usage.
We gain extra speed by converting our PIPE ROW phrase and run style to Producer-Consumer model. If we did not use Producer-Consumer model, we would have lost extra time by carrying the data we pass from one process to the other process, to mid-tables. The main reason we carry our data to mid-tables is the fact that the memory usage increases excessively because of the data staying on it. From the figures below, the difference between parallel pipeline run and serial run without pipe is understood more clearly. (T1,T2,T3: stands for Transformation processes. Stage1, Stage2: Show the mid-tables where the results returning from functions are stored.)
As can be understood from the figures, as the parallel and pipelined runs do not require mid-tables they do not cost time for writing to mid-table and reading. This creates an extra profit.
Now let’s look at how we are going to use the function we had written above with the code below.
The applications we code using Parallel Pipelined Functions creates fairly good results in means of memory usage, speed and performance. With this method we can improve and increase the performance of our applications which are reduced to serial from parallel because of the procedural requirements.