Tutorial :sql query - select duplicates within a 12 hour period



Question:

if i have data as follows

A | 01/01/2008 00:00:00

B | 01/01/2008 01:00:00

A | 01/01/2008 11:59:00

C | 02/01/2008 00:00:00

D | 02/01/2008 01:00:00

D | 02/01/2008 20:00:00

I want to only select the records whose identifiers (A, B, C or D) have occured twice within a 12 hour period. In this example above this would only be 'A'

Can anyone help please (this is for an Oracle data base)

Thanks

M


Solution:1

 Select Distinct A.Identifer      From Table A      Join Table B --  EDIT to eliminate self Joins (to same row)          On A.PrimKey <> B.PrimaryKey             And A.Identifer = B.Identifer                            -- EDIT to fix case where 2 at same time             And A.OccurTime >=  B.OccurTime              And A.OccurTime < B.OccurTime + .5  

and to implement question asked in comment, (Ignoring records which are on different days)

-- for SQL Server,

   Select Distinct A.Identifer        From Table A        Join Table B           On A.PrimKey <> B.PrimaryKey             And A.Identifer = B.Identifer             -- EDIT to fix case where 2 at same time             And A.OccurTime >= B.OccurTime               And A.OccurTime < B.OccurTime + .5      Where DateDiff(day, A.OccurTime, B.OccurTime) = 0  

-- or for oracle...

 Select Distinct A.Identifer        From Table A        Join Table B           On A.PrimKey <> B.PrimaryKey             And A.Identifer = B.Identifer             -- EDIT to fix case where 2 at same time             And A.OccurTime >= B.OccurTime               And A.OccurTime < B.OccurTime + .5   Where Trunc(A.OccurTime) = Trunc(B.OccurTime)  


Solution:2

Select        A.Id  From        YourTable A   Where        A.YourDateTime Between :StartDateTime and :EndDateTime   Group By        A.Id  Having        COUNT(A.Id) = 2  


Solution:3

I haven't checked William's query but I would seriously consider using what he has over every other. Analytics are da bomb. Anytime you find yourself joining a table back to itself is virtually guaranteed to be an opportunity to use analytics and will out perform the query with one table referenced twice every time.

You'll be amazed how much faster the analytic solution will be.


Solution:4

SELECT identifier    FROM table_name outer   WHERE EXISTS( SELECT 1                   FROM table_name inner                  WHERE inner.identifier  = outer.identifier                    AND inner.date_column BETWEEN outer.date_column AND outer.date_column + interval '12' hour                    AND inner.rowid != outer.rowid )  


Solution:5

I'm not 100% sure of your requirements, however this might give you some ideas about how to do what you need. For example you said exactly 2; what if there are 3 occurances? etc.

create table t (ident varchar2(16), occurance timestamp);    insert into t (ident, occurance) values ('a', to_date('20080101000000', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('b', to_date('20080101010000', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('a', to_date('20080101115900', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('c', to_date('20080102000000', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('d', to_date('20080102010000', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('d', to_date('20080102200000', 'yyyymmddhh24miss'));    insert into t (ident, occurance) values ('d', to_date('20080103020000', 'yyyymmddhh24miss'));    select ident, occurance  from  (  select ident, occurance,      lag(occurance) over (partition by ident order by occurance) previous,       lead(occurance) over (partition by ident order by occurance) next   from t  )  where       ((occurance-previous<interval'12:00' hour to minute and extract(day from occurance) = extract(day from previous))      or (next-occurance<interval'12:00' hour to minute and extract(day from occurance) = extract(day from next)))  /  


Solution:6

SELECT namecol FROM tbl A  WHERE EXISTS (    SELECT 1 from tbl B    WHERE b.namecol = a.namecol    AND b.timestamp > a.timestamp    AND b.timestamp - 0.5 <= a.timestamp )  

Note:If u also have question or solution just comment us below or mail us on toontricks1994@gmail.com
Previous
Next Post »