RTL Register-Based Memory Implementations


Purpose:
This application note shows how to build and test a high-speed register-based SRAM or FIFO from RTL code. When the memory requirement is small, the design can be synthesized to an Actel family without embedded SRAM, such as the XL or Act3 families. The note covers the following three scenarios:

1)  Single-port SRAM
2)  Dual-port SRAM
3)  FIFO

Each scenario gives the RTL code, a Verilog testbench, and the synthesis results for area and speed. In addition, the FIFO scenario gives the VHDL RTL code.


1)  Register-based Single-port SRAM

This is only a viable solution if the SRAM is relatively small; how small depends on the design. To quickly determine whether the approach has a chance of working, use the following formula.

Total available registers >= user registers + SRAM bits + 0.6*(SRAM bits) for decode logic
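
As a quick illustration of the formula, the following simulation-only sketch evaluates it for a given width and depth. The module name sram_budget and the parameters user_regs and avail_regs are hypothetical and not part of the original code; the value 200 for avail_regs is only a placeholder and should be replaced with the capacity of the actual target device.

```verilog
`timescale 1 ns/100 ps
// Hypothetical helper (not part of the original note): evaluates the
// register budget formula for a register-based SRAM.
module sram_budget;

parameter width      = 8;	// bits per word
parameter depth      = 8;	// number of words
parameter user_regs  = 0;	// assumed: registers used by the rest of the design
parameter avail_regs = 200;	// assumed placeholder for the device capacity

integer sram_bits, decode_est, total_est;

initial
begin
	sram_bits  = width * depth;
	// 0.6 * sram_bits, rounded up, using integer arithmetic only
	decode_est = (6 * sram_bits + 9) / 10;
	total_est  = user_regs + sram_bits + decode_est;
	$display("SRAM bits = %0d, decode estimate = %0d, total = %0d",
	         sram_bits, decode_est, total_est);
	if (total_est <= avail_regs)
		$display("Fits the assumed budget of %0d", avail_regs);
	else
		$display("Exceeds the assumed budget of %0d", avail_regs);
end

endmodule
```

With the defaults (an 8 x 8 SRAM and no other user registers) the estimate is 64 + 39 = 103.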

Verilog RTL CODE:
The source code uses parameters so that it is easy to customize. To change the width or depth, modify the parameters listed in the code; addr must be wide enough to address depth words. The code assumes a rising clock edge (and, in the FIFO, an active-low asynchronous clear). Modify the always blocks if that is not the case.

`timescale 1 ns/100 ps
//########################################################
//# Behavioral single-port SRAM description :
//#    Active High write enable (WE)
//#    Rising clock edge (Clock)
//#######################################################

module reg_sram(Data, Q, Clock, WE, Address);

parameter width = 8;
parameter depth = 8;
parameter addr = 3;

input Clock, WE;
input [addr-1:0] Address;
input [width-1:0] Data;
output [width-1:0] Q;
wire [width-1:0] Q;
reg [width-1:0] mem_data [depth-1:0];

always @(posedge Clock)
	if(WE)
                mem_data[Address] = #1 Data;


assign Q = mem_data[Address];

endmodule
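
Instead of editing the parameters in the source, they can also be overridden per instance. The sketch below is one possible way to do this, assuming reg_sram above is compiled together with it; the wrapper name sram16x4 and the chosen sizes are hypothetical.

```verilog
// Hypothetical wrapper: a 16-word x 4-bit instance of reg_sram.
// Assumes the reg_sram module listed above is compiled with this file.
module sram16x4(Data, Q, Clock, WE, Address);

input Clock, WE;
input [3:0] Address;
input [3:0] Data;
output [3:0] Q;

// Positional override follows the declaration order in reg_sram:
// width = 4, depth = 16, addr = 4
reg_sram #(4, 16, 4) u0(.Data(Data), .Q(Q), .Clock(Clock), .WE(WE),
		.Address(Address));

endmodule
```

The positional form was chosen because it matches the Verilog style used in this note; a defparam statement would work as well.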

Verilog RTL Synthesis Results:
The above RTL synthesized to 66 registers and a total of 102 logic modules, which used 51% of a 1415A. It uses 22 I/Os and runs at 220 MHz in the -2 speed grade.

Verilog RTL Simulation:
Below is the self-checking Verilog testbench. It relies on the default parameter values given below.

`timescale 1 ns/100 ps
module test;

parameter width = 8;    // bus width
parameter addr_bits = 3;     // # of addr lines
parameter numvecs = 21; // actual number of vectors
parameter Clockper = 1000; // 1000 ns period (time unit is 1 ns)

reg [width-1:0] Data;
reg [addr_bits-1:0] Address;
reg Clock, WE;
reg [width-1:0] data_in  [0:numvecs-1];
reg [width-1:0] data_out [0:numvecs-1];

wire [width-1:0] Q;

integer i, j, numerrors;

reg_sram u0(.Data(Data), .Q(Q), .Clock(Clock), .WE(WE), .Address(Address));

initial
begin
        // sequential test patterns entered at neg edge Clock

	data_in[0]=8'h00; data_out[0]=8'hxx;
	data_in[1]=8'h01; data_out[1]=8'h01;
	data_in[2]=8'h02; data_out[2]=8'h02;
	data_in[3]=8'h04; data_out[3]=8'h04;
	data_in[4]=8'h08; data_out[4]=8'h08; 
	data_in[5]=8'h10; data_out[5]=8'h10; 
	data_in[6]=8'h20; data_out[6]=8'h20; 
	data_in[7]=8'h40; data_out[7]=8'h40; 
	data_in[8]=8'h80; data_out[8]=8'h80; 
	data_in[9]=8'h07; data_out[9]=8'h01; 
	data_in[10]=8'h08; data_out[10]=8'h02;
	data_in[11]=8'h09; data_out[11]=8'h04;
	data_in[12]=8'h10; data_out[12]=8'h08;
	data_in[13]=8'h11; data_out[13]=8'h10;
	data_in[14]=8'h12; data_out[14]=8'h20;
	data_in[15]=8'h13; data_out[15]=8'h40;
	data_in[16]=8'h14; data_out[16]=8'h80;
	data_in[17]=8'haa; data_out[17]=8'haa;
	data_in[18]=8'h55; data_out[18]=8'haa;
	data_in[19]=8'h55; data_out[19]=8'h55;
	data_in[20]=8'haa; data_out[20]=8'h55;

end

initial
begin
	Clock = 0;
	WE = 0;
	Address = 0;
	Data = 0;
	numerrors = 0;
end

always	#(Clockper / 2)  Clock = ~Clock;

initial
begin
	#2450 WE = 1;
	#8000 WE = 0;
	#8000 WE = 1;
	#1000 WE = 0;
	#1000 WE = 1;
	#1000 WE = 0;
end

initial
begin
	#1450;
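	// width doubles as the word count here (width == depth == 8);
	// j = 8 wraps to address 0 on the 3-bit Address bus
	for (j = 0; j <= width; j = j + 1)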
		#1000 Address = j;
	for (j = 1; j <= width; j = j + 1)
		#1000 Address = j;
	Address = 0;
end
	
initial
begin
	$display("\nBeginning Simulation..."); 

	//skip first rising edge
	for (i = 0; i <= numvecs-1; i = i + 1)
	begin
		@(negedge Clock);
		// apply test pattern at neg edge
		Data = data_in[i];

		@(posedge Clock) 
		#450; // 450 ns later
		// check result at posedge + 450 ns
		$display("Pattern#%d time%d: WE=%b; Address=%h; Data=%h; Expected Q=%h; Actual Q=%h", i, $stime, WE, Address, Data, data_out[i], Q);

		if ( Q !== data_out[i] )
			begin
			$display("  ** Error");
			numerrors = numerrors + 1;
			end
	end
	if (numerrors == 0)
		$display("Good!  End of Good Simulation.");
	else
		if (numerrors > 1)
			$display(
			  "%0d ERRORS!  End of Faulty Simulation.",numerrors);
		else
			$display(
			 "1 ERROR!  End of Faulty Simulation."); 
	
	#1000 $finish; // finish 1000 ns later
end

endmodule


RTL Simulation Results:
The gate-level and RTL simulation results should be identical and should match the report below:

Beginning Simulation...
Pattern#0 time1950: WE=0; Address=0; Data=00; Expected Q=xx; Actual Q=xx
Pattern#1 time2950: WE=1; Address=0; Data=01; Expected Q=01; Actual Q=01
Pattern#2 time3950: WE=1; Address=1; Data=02; Expected Q=02; Actual Q=02
Pattern#3 time4950: WE=1; Address=2; Data=04; Expected Q=04; Actual Q=04
Pattern#4 time5950: WE=1; Address=3; Data=08; Expected Q=08; Actual Q=08
Pattern#5 time6950: WE=1; Address=4; Data=10; Expected Q=10; Actual Q=10
Pattern#6 time7950: WE=1; Address=5; Data=20; Expected Q=20; Actual Q=20
Pattern#7 time8950: WE=1; Address=6; Data=40; Expected Q=40; Actual Q=40
Pattern#8 time9950: WE=1; Address=7; Data=80; Expected Q=80; Actual Q=80
Pattern#9 time10950: WE=0; Address=0; Data=07; Expected Q=01; Actual Q=01
Pattern#10 time11950: WE=0; Address=1; Data=08; Expected Q=02; Actual Q=02
Pattern#11 time12950: WE=0; Address=2; Data=09; Expected Q=04; Actual Q=04
Pattern#12 time13950: WE=0; Address=3; Data=10; Expected Q=08; Actual Q=08
Pattern#13 time14950: WE=0; Address=4; Data=11; Expected Q=10; Actual Q=10
Pattern#14 time15950: WE=0; Address=5; Data=12; Expected Q=20; Actual Q=20
Pattern#15 time16950: WE=0; Address=6; Data=13; Expected Q=40; Actual Q=40
Pattern#16 time17950: WE=0; Address=7; Data=14; Expected Q=80; Actual Q=80
Pattern#17 time18950: WE=1; Address=0; Data=aa; Expected Q=aa; Actual Q=aa
Pattern#18 time19950: WE=0; Address=0; Data=55; Expected Q=aa; Actual Q=aa
Pattern#19 time20950: WE=1; Address=0; Data=55; Expected Q=55; Actual Q=55
Pattern#20 time21950: WE=0; Address=0; Data=aa; Expected Q=55; Actual Q=55
Good!  End of Good Simulation.
L111 "reg_sram.vt": $finish at simulation time 229500
729 simulation events + 12571 accelerated events + 82600 timing check events
CPU time: 0.7 secs to compile + 4.1 secs to link + 0.2 secs in simulation


2)  Register-based Dual-Port SRAM

Verilog RTL CODE:
This code was designed to imitate the behavior of the Actel DX family dual-port SRAM.

`timescale 1 ns/100 ps
//########################################################
//# Behavioral dual-port SRAM description :
//#    Active High write enable (WE)
//#    Active High read enable (RE)
//#    Rising clock edge (Clock)
//#######################################################


module reg_dpram(Data, Q, Clock, WE, RE, WAddress, RAddress);

parameter width = 8;
parameter depth = 8;
parameter addr = 3;

input Clock, WE, RE;
input [addr-1:0] WAddress, RAddress;
input [width-1:0] Data;
output [width-1:0] Q;
reg [width-1:0] Q;
reg [width-1:0] mem_data [depth-1:0];

// #########################################################
// # Write Functional Section
// #########################################################
always @(posedge Clock)
begin
	if(WE)
                mem_data[WAddress] = #1 Data;
end

//#########################################################
//# Read Functional Section
//#########################################################
always @(posedge Clock)
begin
	if(RE)
		Q = #1 mem_data[RAddress];
end

endmodule
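
The following short testbench is a sketch of what happens when the same address is read and written on the same clock edge. It assumes reg_dpram above is compiled together with it, with its default parameters; the module name rw_same_addr and the data values are hypothetical. Because the listing above samples mem_data at the clock edge and updates it 1 ns later, the read is expected to return the old word.

```verilog
`timescale 1 ns/100 ps
// Hypothetical check (not from the original note): read and write the
// same address of reg_dpram on the same clock edge.
module rw_same_addr;

reg [7:0] Data;
reg [2:0] WAddress, RAddress;
reg Clock, WE, RE;
wire [7:0] Q;

reg_dpram u0(.Data(Data), .Q(Q), .Clock(Clock), .WE(WE),
		.RE(RE), .WAddress(WAddress), .RAddress(RAddress));

always #500 Clock = ~Clock;

initial
begin
	Clock = 0; WE = 0; RE = 0;
	WAddress = 3; RAddress = 3; Data = 8'ha5;

	// next rising edge writes a5 to address 3
	@(negedge Clock) WE = 1;
	// next rising edge writes 5a and reads address 3 at the same time
	@(negedge Clock) Data = 8'h5a; RE = 1;
	// next rising edge only reads address 3
	@(negedge Clock) WE = 0;
	$display("Q after simultaneous read/write = %h", Q);
	@(negedge Clock)
	$display("Q after following read          = %h", Q);
	$finish;
end

endmodule
```

With the listing as written, the first message should report a5 (the old word) and the second 5a. This depends on the #1 intra-assignment delays in reg_dpram; if they are removed while the blocking assignments are kept, the result becomes dependent on simulator event order.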

Verilog RTL Synthesis Results:
The above RTL synthesized to 72 registers and a total of 98 logic modules, which used 49% of a 1415A. It uses 26 I/Os and runs at 125 MHz in the -2 speed grade.

Verilog RTL Simulation:
The self-checking Verilog testbench is similar to the previous one.

`timescale 1 ns/100 ps
module test;

parameter width = 8;    // bus width
parameter addr = 3;     // # of addr lines
parameter numvecs = 20; // actual number of vectors
parameter Clockper = 1000; // 1000 ns period (time unit is 1 ns)

reg [width-1:0] Data;
reg [addr-1:0] WAddress, RAddress;
reg Clock, WE, RE;
reg [width-1:0] data_in  [0:numvecs-1];
reg [width-1:0] data_out [0:numvecs-1];

wire [width-1:0] Q;

integer i, j, k, numerrors;

reg_dpram u0(.Data(Data), .Q(Q), .Clock(Clock), .WE(WE), 
		.RE(RE), .WAddress(WAddress), .RAddress(RAddress));

initial
begin
        // sequential test patterns entered at neg edge Clock

	data_in[0]=8'h00; data_out[0]=8'hxx;
	data_in[1]=8'h01; data_out[1]=8'hxx;
	data_in[2]=8'h02; data_out[2]=8'hxx;
	data_in[3]=8'h04; data_out[3]=8'hxx;
	data_in[4]=8'h08; data_out[4]=8'hxx; 
	data_in[5]=8'h10; data_out[5]=8'hxx; 
	data_in[6]=8'h20; data_out[6]=8'hxx; 
	data_in[7]=8'h40; data_out[7]=8'hxx; 
	data_in[8]=8'h80; data_out[8]=8'hxx; 
	data_in[9]=8'h07; data_out[9]=8'h01; 
	data_in[10]=8'h08; data_out[10]=8'h02;
	data_in[11]=8'h09; data_out[11]=8'h04;
	data_in[12]=8'h10; data_out[12]=8'h08;
	data_in[13]=8'h11; data_out[13]=8'h10;
	data_in[14]=8'h12; data_out[14]=8'h20;
	data_in[15]=8'h13; data_out[15]=8'h40;
	data_in[16]=8'h14; data_out[16]=8'h80;
	data_in[17]=8'haa; data_out[17]=8'h80;
	data_in[18]=8'h55; data_out[18]=8'haa;
	data_in[19]=8'haa; data_out[19]=8'h55;
end

initial
begin
	Clock = 0;
	WE = 0;
	RE = 0;
	WAddress = 0;
	RAddress = 0;
	Data = 0;
	numerrors = 0;
end

always	#(Clockper / 2)  Clock = ~Clock;

initial
begin
	#2450 WE = 1;
	#8000 WE = 0;
	      RE = 1;
	#8000 RE = 0;
	      WE = 1;
	#1000 RE = 1;

end

initial
begin
	#1450;
	for (j = 0; j <= width; j = j + 1)
		#1000 WAddress = j;
	WAddress = 0;
end

initial
begin
	#9450;
	for (k = 0; k <= width; k = k + 1)
		#1000 RAddress = k;
	RAddress = 0;
end
	
initial
begin
	$display("\nBeginning Simulation..."); 

	//skip first rising edge
	for (i = 0; i <= numvecs-1; i = i + 1)
	begin
		@(negedge Clock);
		// apply test pattern at neg edge
		Data = data_in[i];

		@(posedge Clock) 
		#450; // 450 ns later
		// check result at posedge + 450 ns
		$display("Pattern#%d time%d: WE=%b; Waddr=%h; RE=%b; Raddr=%h; Data=%h; Expected Q=%h; Actual Q=%h", i, $stime, WE, WAddress, RE, RAddress, Data, data_out[i], Q);

		if ( Q !== data_out[i] )
			begin
			$display("  ** Error");
			numerrors = numerrors + 1;
			end
	end
	if (numerrors == 0)
	   
		$display("Good!  End of Good Simulation.");
	else
		if (numerrors > 1)
	      
			$display(
			  "%0d ERRORS!  End of Faulty Simulation.",numerrors);
		else
			$display(
			 "1 ERROR!  End of Faulty Simulation."); 
	
	#1000 $finish; // finish 1000 ns later
end
endmodule


3)  Register-based FIFO

To quickly determine whether a register-based FIFO has a chance of fitting, use the following formula.

Total available registers >= user registers + FIFO bits + 1.2*(FIFO bits) for FIFO control logic

Verilog RTL CODE:
This code was designed to imitate the behavior of the Actel DX family dual-port SRAM-based FIFO.

`timescale 1 ns/100 ps
//########################################################
//# Behavioral description of FIFO with :
//#    Active High write enable (WE)
//#    Active High read enable (RE)
//#    Active Low asynchronous clear (Aclr)
//#    Rising clock edge (Clock)
//#    Active High Full Flag
//#    Active Low Empty Flag
//#######################################################

module reg_fifo(Data, Q, Aclr, Clock, WE, RE, FF, EF);

parameter width = 8;
parameter depth = 8;
parameter addr = 3;

input Clock, WE, RE, Aclr;
input [width-1:0] Data;
output FF, EF;			//Full & Empty Flags
output [width-1:0] Q;
reg [width-1:0] Q;
reg [width-1:0] mem_data [depth-1:0];
reg [addr-1:0] WAddress, RAddress;
reg FF, EF;

// #########################################################
// # Write Functional Section
// #########################################################
// WRITE_ADDR_POINTER
always@(posedge Clock or negedge Aclr)
begin
	if(!Aclr) 
		WAddress = #2 0;
	else if (WE)
		WAddress = #2 WAddress + 1;
end

// WRITE_REG
// Nonblocking assignment, so a read of the same location on the same
// clock edge returns the old word regardless of simulator event order
always @(posedge Clock)
begin
	if(WE)
                mem_data[WAddress] <= Data;
end

//#########################################################
//# Read Functional Section
//#########################################################
// READ_ADDR_POINTER
always@(posedge Clock or negedge Aclr)
begin
	if(!Aclr) 
		RAddress = #1 0;
	else if (RE)
		RAddress = #1 RAddress + 1;
end

// READ_REG
always @(posedge Clock)
begin
	if(RE)
		Q <= mem_data[RAddress];
end

//#########################################################
//# Full Flag Functional Section : Active high
//#########################################################
always@(posedge Clock or negedge Aclr)
begin
        if(!Aclr)
                FF = #1 1'b0;
        else if ( (WE & !RE) && ( (WAddress == RAddress-1) || 
		( (WAddress == depth-1) && (RAddress == 1'b0) ) ) )
		FF = #1 1'b1;
	else
		FF = #1 1'b0;
end

//#########################################################
//# Empty Flag Functional Section : Active low
//#########################################################
always@(posedge Clock or negedge Aclr)
begin
        if(!Aclr)
                EF = #1 1'b0;
        else if ( (!WE & RE) && ( (WAddress == RAddress+1) || 
		( (RAddress == depth-1) && (WAddress == 1'b0) ) ) )
		EF = #1 1'b0;
	else
		EF = #1 1'b1;
end

endmodule
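
The second term in each flag condition above handles pointer wraparound. One reason it is needed, offered here as a reading of the code rather than something the note states, is that the unsized literal in RAddress-1 makes the subtraction wider than the 3-bit pointers, so the result does not wrap to 7 when RAddress is 0. The self-contained sketch below illustrates this; the module name wrap_compare is hypothetical and the pointer values assume depth = 8.

```verilog
// Hypothetical demonstration, not part of reg_fifo
module wrap_compare;

reg [2:0] WAddress, RAddress;

initial
begin
	WAddress = 7;	// write pointer at depth-1
	RAddress = 0;	// read pointer at 0

	// The unsized literal 1 is at least 32 bits wide, so RAddress-1 is
	// evaluated at that width and does not wrap around to 7
	$display("WAddress == RAddress-1   : %b", WAddress == RAddress-1);

	// The explicit wraparound term used in the flag logic catches this case
	$display("explicit wraparound term : %b",
	         (WAddress == 7) && (RAddress == 1'b0));
end

endmodule
```

The first message should report 0 and the second 1, which is why the flag conditions test the depth-1 and 0 pointer values explicitly.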

VHDL RTL CODE:
-- ************************************************* 
-- Behavioral description of dual-port FIFO with :
--     Active High write enable (WE) 
--     Active High read enable (RE)
--     Active Low asynchronous clear (Aclr) 
--     Rising clock edge (Clock)
--     Active High Full Flag
--     Active Low Empty Flag
-- *************************************************
     
library ieee;
use ieee.std_logic_1164.all;
use IEEE.std_logic_arith.all;
     
entity reg_fifo is
     
  generic (width    : integer:=8;
           depth    : integer:=8;
           addr     : integer:=3);
     
  port (Data   : in std_logic_vector(width-1 downto 0);
        Q      : out std_logic_vector(width-1 downto 0); 
        Aclr   : in std_logic;
        Clock : in std_logic;
        WE     : in std_logic;
        RE     : in std_logic;
        FF    : out std_logic;
        EF    : out std_logic);
     
end reg_fifo;
     
library ieee;
use ieee.std_logic_1164.all;
use IEEE.std_logic_arith.all;
use IEEE.std_logic_unsigned.all;
     
architecture behavioral of reg_fifo is

  type MEM is array(0 to depth-1) of std_logic_vector(width-1 downto 0); 
  signal ramTmp : MEM;
  signal WAddress : std_logic_vector(addr-1 downto 0);
  signal RAddress : std_logic_vector(addr-1 downto 0);
  signal words : std_logic_vector(addr-1 downto 0);
     
begin

  -- Highest address; the pointers wrap to zero after reaching it
  words <= conv_std_logic_vector (depth-1,addr);
     
  -- ######################################################### 
  -- # Write Functional Section
  -- #########################################################
     
  WRITE_POINTER : process (Aclr,Clock) 
  begin
    if (Aclr = '0') then
      WAddress <= (others => '0');
    elsif (Clock'event and Clock = '1') then
      if (WE = '1') then
        if (WAddress = words) then
          WAddress <= (others => '0');
        else
          WAddress <= WAddress + '1';
        end if;
      end if;
    end if;
  end process;
     
  WRITE_RAM : process (Clock)
  begin
    if (Clock'event and Clock = '1') then
      if (WE = '1') then
        ramTmp(conv_integer (WAddress)) <= Data;
      end if;
    end if;
  end process;
     
  -- ######################################################### 
  -- # Read Functional Section
  -- #########################################################
     
  READ_POINTER : process (Aclr,Clock) 
  begin
    if (Aclr = '0') then
      RAddress <= (others => '0');
    elsif (Clock'event and Clock = '1') then
      if (RE = '1') then
        if (RAddress = words) then
          RAddress <= (others => '0');
        else
          RAddress <= RAddress + '1';
        end if;
      end if;
    end if;
  end process;
     
  READ_RAM : process (Clock)
  begin
    if (Clock'event and Clock = '1') then
      if (RE = '1') then
        Q <= ramTmp(conv_integer(RAddress));
      end if;
    end if;
  end process;
     
  -- ######################################################### 
  -- # Full Flag Functional Section : Active high
  -- #########################################################
     
  FFLAG : process (Aclr,Clock)
  begin
    if (Aclr = '0') then
      FF <= '0';
    elsif (Clock'event and Clock = '1') then
      if (WE = '1' and RE = '0') then
        if ((WAddress = RAddress-1) or
             ((WAddress = depth-1) and (RAddress = 0))) then
          FF <= '1';
        end if;
      else
	FF <= '0';
      end if;
    end if;
  end process;
     
  -- ######################################################### 
  -- # Empty Flag Functional Section : Active low
  -- #########################################################
     
  EFLAG : process (Aclr,Clock)
  begin
    if (Aclr = '0') then
      EF <= '0';
    elsif (Clock'event and Clock = '1') then
      if (RE = '1' and WE = '0') then
        if ((WAddress = RAddress+1) or
             ((RAddress = depth-1) and (WAddress = 0))) then
          EF <= '0';
        end if;
      else
	EF <= '1';
      end if;
    end if;
  end process;

end behavioral;

RTL Synthesis Results:
The above RTL synthesized to 86 registers and a total of 155 logic modules, which used 78% of a 1415A. It uses 23 I/Os and runs at 45 MHz in the -2 speed grade. Performance could be improved by instantiating an Actgen counter instead of synthesizing the counters.

RTL Simulation:
Below is the self-checking Verilog testbench. It was used to verify the Verilog RTL code and the gate-level results from both the VHDL and the Verilog synthesis.

`timescale 1 ns/100 ps
module test;

parameter numvecs = 25; // actual number of vectors
parameter width = 8;     // data bit width
parameter Clockper = 1000; // 1000 ns period (time unit is 1 ns)

reg [width-1:0] Data;
reg Aclr, Clock, WE, RE;
reg [width-1:0] data_in  [0:numvecs-1];    // in vector matrix
reg [width-1:0] data_out [0:numvecs-1];    // out vector matrix

wire [width-1:0] Q;
wire FF, EF;

reg_fifo u0(.Data(Data), .Q(Q), .Aclr(Aclr), .Clock(Clock), .WE(WE), 
		.RE(RE), .FF(FF), .EF(EF));

integer i;
integer numerrors;

initial
begin
        // sequential test patterns entered at neg edge Clock

        data_in[0] =8'hff; data_out[0] = 8'hxx;
        data_in[1] =8'h00; data_out[1] = 8'hff;
        data_in[2] =8'h00; data_out[2] = 8'hff;
	data_in[3] =8'h01; data_out[3] = 8'hff;
	data_in[4] =8'h02; data_out[4] = 8'hff; 
	data_in[5] =8'h03; data_out[5] = 8'hff; 
	data_in[6] =8'h04; data_out[6] = 8'hff; 
	data_in[7] =8'h05; data_out[7] = 8'hff; 
	data_in[8] =8'h06; data_out[8] = 8'hff; 
	data_in[9] =8'h07; data_out[9] = 8'hff; 
	data_in[10] =8'h08; data_out[10] = 8'h00;
	data_in[11] =8'h09; data_out[11] = 8'h01;
	data_in[12] =8'h10; data_out[12] = 8'h02;
	data_in[13] =8'h11; data_out[13] = 8'h03;
	data_in[14] =8'h12; data_out[14] = 8'h04;
	data_in[15] =8'h13; data_out[15] = 8'h05;
	data_in[16] =8'h14; data_out[16] = 8'h06;
	data_in[17] =8'hff; data_out[17] = 8'h07;
	data_in[18] =8'hff; data_out[18] = 8'h07;
	data_in[19] =8'haa; data_out[19] = 8'hff;
	data_in[20] =8'h55; data_out[20] = 8'haa;
	data_in[21] =8'haa; data_out[21] = 8'h55;
	data_in[22] =8'h00; data_out[22] = 8'haa;
	data_in[23] =8'hff; data_out[23] = 8'h00;
	data_in[24] =8'haa; data_out[24] = 8'hff;
	
end

initial
begin
	Aclr = 0;
	Clock = 0;
	WE = 0;
	RE = 0;
	Data = 0;
end

always
begin
	#(Clockper / 2)  Clock = ~Clock;
end

initial #3450 Aclr = 1;

initial
begin
	#1450 WE = 1;
	#1000 WE = 0;
	#1000 WE = 1;
	#8000 WE = 0;
	#8000 WE = 1;
	#6000 WE = 0;
end

initial
begin
	#2450 RE = 1;
	#1000 RE = 0;
	#8000 RE = 1;
	#8000 RE = 0;
	#1000 RE = 1;
end

initial
begin
	numerrors = 0;
	$display("\nBeginning Simulation..."); 

	//skip first rising edge
	for (i = 0; i <= numvecs-1; i = i + 1)
	begin
		@(negedge Clock);
		// apply test pattern at neg edge
		Data = data_in[i];

		@(posedge Clock) 
		#450; // 450 ns later
		// check result at posedge + 450 ns
		$display("Pattern#%d time%d: Aclr=%b; WE=%b; RE=%b; Data=%h; FF=%b; EF=%b; Expected Q=%h; Actual Q=%h", i, $stime, Aclr, WE, RE, Data, FF, EF, data_out[i], Q);

		if ( Q !== data_out[i] )
			begin
			$display("  ** Error");
			numerrors = numerrors + 1;
			end
	end
	if (numerrors == 0)
	   
		$display("Good!  End of Good Simulation.");
	else
		if (numerrors > 1)
	      
			$display(
			  "%0d ERRORS!  End of Faulty Simulation.",numerrors);
		else
			$display(
			 "1 ERROR!  End of Faulty Simulation."); 
	
	#1000 $finish; // finish 1000 ns later
end

endmodule