Tutorial :How to improve performance by processing database results in parallel?



Question:

I have a .net application which runs in the region of 20 to 30 SQL queries and processes the results 1 at a time. I have been trying to increase performance by doing some work in parallel.

2 of the queries take 75% of the time, purely because of the amount of data they return. My initial experiments have been to try to split these queries into 4 buckets using ntile and process each datareader in parallel. If anything this takes a lot longer, I think because of the extra work involved using NTILE + querying the DB 4 times instead of 1.

Can anyone suggest other techniques to try or am I just wasting my time here? The code below is part of a utility class which allows me to queue up the functions which process the reader. So using my NTILE experiment I queue up 4 tasks each processing 1/4 of the data (where ntile =1, 2, 3, 4) and call Execute to run them in parallel.

foreach (var keyValuePair in m_Tasks)              {                  var sql = keyValuePair.Key;                  var task = keyValuePair.Value;                    var conn = new OracleConnection(ConnectionString);                  conn.BeginOpen(o=> {                      conn.EndOpen(o);                      var cmd = conn.CreateCommand();                      cmd.CommandText = sql;                        cmd.BeginExecuteReader(a =>                      {                          var reader = cmd.EndExecuteReader(a);                          DateTime endIO = DateTime.Now;                          Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + "  IO took: " + (endIO - startTime) + " ended at " + endIO);                            DateTime taskStart = DateTime.Now;                          task(reader);                          DateTime endTAsk = DateTime.Now;                          Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + " TAsk took: " + (endTAsk - taskStart) + " ended at " + endTAsk);                          reader.Close();                          conn.Close();                            if (Interlocked.Decrement(ref numTasks) == 0)                          {                              finishedEvent.Set();                          }                        }, null);                    },                  null                        );                  }                finishedEvent.WaitOne();              DateTime endExecute = DateTime.Now;              Console.WriteLine(TaskName + " " + Thread.CurrentThread.ManagedThreadId + " EXECUTE took: " + (endExecute - startTime) + " ended at " + endExecute);            }  

Thanks for any help.


Solution:1

I think you're right that the cost of doing the NTILE is outweighing the saving of the parallelism.

You need to use something that will split the query sets into clearly separated sets.

If your queries are returning less than 15% of the total data (approximately) then breaking down the tables on an index (either an indexed field, or functional index) is probably your best starting point.

Example : Presuming your data has a numeric pseudo-key on each row, create a functional index on MOD(Id,4) - this would give you an Index based version of your NTILE approach. (I don't think you can have a functional index on an NTILE).

This specific approach is probably counter-productive - you would be getting data from the same blocks in different threads, so potentially increasing I/O (depends on memory).

The way that Oracle parallel query tends to do it - provided you want to process over 15% of the data in the table - is to simply break the table into N physical chunks (using the rowid) and then run N 'full scans' on those chunks.

I'm not sure if you can replicate this approach from the front-end. Splitting on a key id adds in the cost of going through the index to each row.

What you probably want is something that splits the table by something other than the key, or if you split on key, split it by ranges rather than the NTILE approach.


Solution:2

I use OracleCommand.Fetchsize to improve perfomance on large Queries.

cmd.FetchSize = &H100000  '1Mb  Dim Rdr = cmd.ExecuteReader  

Some time ago, I use Async Readers for get Blob Data. But to use Async Reader you need maintain an array with each async Result an loop until last Reader ends.

   Public Shared Function FromBlob(ByVal Id As String, ByVal Rv As String, ByVal cn As OracleConnection) As Proyecto       Dim n As Integer, Prj As Proyecto = Nothing       Dim Bf(2)() As Byte, arrAr(2) As IAsyncResult 'Para proceso asíncrono         Dim Cmd As New OracleCommand( _           "Select rv,fecha,Datos From Proyectos Where Id=:Id and Rv in (:Rv,'Av','Est')", cn)       Cmd.BindByName = True       Cmd.Parameters.Add("Id", OracleDbType.Varchar2, Id, ParameterDirection.Input)       Cmd.Parameters.Add("Rv", OracleDbType.Varchar2, Rv, ParameterDirection.Input)       If Rv Is Nothing Then Prj = Proyecto.Actprj       Try          Using Rdr As OracleDataReader = Cmd.ExecuteReader              Do Until Rdr.Read = False                  Dim rv1 As String = Rdr.GetString(0)                  Select Case rv1                      Case "Av" : n = 1   'Avance TND                      Case "Est" : n = 2  'Datos Seguimiento Estudio Seguridad                      Case Else : n = 0                  End Select                  If Rdr.IsDBNull(2) = False Then                     Dim Blob As OracleBlob = Rdr.GetOracleBlob(2)                     Dim Buffer(CInt(Blob.Length)) As Byte                     Bf(n) = Buffer                     arrAr(n) = Blob.BeginRead(Buffer, 0, Buffer.Length, Nothing, Blob)                  End If              Loop              If Bf(0) Is Nothing AndAlso Prj Is Nothing Then _                 MessageBox.Show("Fallo al cargar proyecto") : Return Nothing              For n = 0 To Bf.Length - 1                  Dim ar As IAsyncResult = arrAr(n)                  If ar IsNot Nothing AndAlso ar.AsyncWaitHandle.WaitOne() Then                     Dim blob As OracleBlob = DirectCast(ar.AsyncState, OracleBlob)                     blob.EndRead(ar)                     blob.Dispose()                     If ar.IsCompleted Then                        Using rd As New BinReader(New MemoryStream(Bf(n)))                            If n = 0 Then                               Prj = New Proyecto(rd, False)                            Else                               Dim entry = Proyecto.Entry.FromLob(rd), Index = Prj.IndexOf(entry)                               If Index < 0 Then Prj.Add(entry) Else Prj(Index) = entry                            End If                        End Using                     End If                  End If              Next          End Using          Catch ex As Exception              MessageBox.Show(ex.Message)       End Try       Return Prj    End Function  


Solution:3

You can use Ref Cursor with Oracle to execute some Sql with one OracleCommand:

  Dim cmd As New OracleCommand("Begin " _    & "Open :1 for Select T.CODTRA,SIM,JLA CAL,SUP,RESP,SERV,SubStr(Aparato,1,3) SIS,PERS,(nvl(DUR,0) * 60) as Dur,t.DESTRA,g.DesTra Destrae,OBS from " & TraRec & " T, Trarec_Gee g where T.codtra <> 'RV' and T.Codtra=G.Codtra(+);" _    & "Open :2 for Select Red,descr from Redes;" _    & "Open :3 for Select * from Tr_Redes;" _    & "Open :4 for Select CODTRA,T_COND,COND,DEMORA * 60 as DEMORA from " & TrCondic _    & ";end;", cn)      For n = 0 To 3 : cmd.Parameters.Add(Nothing, OracleDbType.RefCursor, ParameterDirection.Output) : Next    Dim da As New OracleDataAdapter(cmd)    da.Fill(0, 0, ds.Tnd, ds.Redes, ds.TrRedes, ds.TrCondic)  

Note: Da.Fill(0, 0, T1, T2 ...) is a Oracle especific function to retrieve many tables on a single statement.


Solution:4

Ultimately it has turned out to be an IO bound problem. I've been able to achieve perf improvements by doing the IO asynchronously. NTILE on ROWID does what I wanted but so far it hasn't helped because the problem is IO bound.


Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »