Updating a file random access in
Alternatively, you could block-gzip the reformatted data and keep a record of how many blocks are in the file and how many reads are in each block (since the filesize will no longer reflect the number of records in the file).
output.
grabix index output.
# retrieve the 5th record (1-based) in log(n) time
# requires some math to convert indices: (4*4+1, 4*4+4) = (17, 20)
grabix grab output.17 20
# Count the number of lines for part two of this question
export N_LINES=$(gzip -dc output.| wc -l)

output.
tabix -s 1 -b 2 -e 2 output.
# now retrieve the 5th record (1-based) in log(n) time
tabix output.dummy:5-5
# This command will retrieve the 5th record and convert it back into FASTQ format
tabix output.dummy:5-5 | perl -pe 's/^dummy\t\d+\t//' | tr '\t' '\n'
# Count the number of records for part two of this question
export N_RECORDS=$(gzip -dc output.| wc -l)

# random_
import os
import random

n_records = int(os.environ["N_LINES"]) // 4
rand_record_start = random.randrange(0, n_records) * 4 + 1
rand_record_end = rand_record_start + 3
os.system("grabix grab output.{} {}".format(rand_record_start, rand_record_end))

# random_
import os
import random

n_records = int(os.environ["N_RECORDS"])
rand_record_index = random.randrange(0, n_records) + 1
# super ugly, but works...
One of the most thorough treatments of this question (or a similar question: grabbing a random subset of reads) was given by Jared Simpson in a blog post a few years ago.
If you just want to grab a single random read, Jared's benchmarks suggest that seeking to a random position in the file and then retrieving the next complete read should be the most performant option.
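The seek-then-resynchronise idea could be sketched roughly as follows. This is not Jared's code, just a minimal illustration that assumes an uncompressed, non-empty FASTQ file with exactly four unwrapped lines per record; the function name random_fastq_read is made up for this example.

```python
import os
import random


def random_fastq_read(path, rng=random):
    """Return the four lines (as bytes) of a read found after a random offset.

    Assumption: plain 4-line FASTQ, no line wrapping, file is not empty.
    """
    size = os.path.getsize(path)
    wrapped = False
    with open(path, "rb") as handle:
        handle.seek(rng.randrange(size))
        handle.readline()  # discard the line we landed in (probably partial)
        while True:
            pos = handle.tell()
            header = handle.readline()
            if not header:
                # Reached the end of the file: wrap around to the first record.
                if wrapped:
                    raise ValueError("no complete FASTQ record found")
                wrapped = True
                handle.seek(0)
                continue
            if not header.startswith(b"@"):
                continue
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            # A quality line may also start with '@'; a real header is
            # always followed two lines later by the '+' separator.
            if plus.startswith(b"+") and qual:
                return [header, seq, plus, qual]
            # False header: go back and skip just that one line.
            handle.seek(pos)
            handle.readline()
```

Note that this does not sample uniformly when read lengths vary, since a read that follows a long record is more likely to be hit.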
In front-office trading applications and FIX engines, you can use a random access file to store FIX sequence numbers or all open orders.
This is handy when you recover from a crash and need to rebuild your in-memory cache to the state it was in just before the crash.
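As a rough illustration of the idea, the sketch below keeps two sequence numbers in a fixed-size binary record that is overwritten in place and read back on restart. The class name SequenceStore, the file layout (two unsigned 64-bit integers) and the initial values are assumptions made for this example, not part of any FIX engine's API.

```python
import os
import struct

# Hypothetical layout: incoming and outgoing sequence numbers, big-endian.
_RECORD = struct.Struct(">QQ")


class SequenceStore:
    """Persist two counters at offset 0 of a small binary file."""

    def __init__(self, path):
        # Open for update without truncating; create the file if missing.
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._file = open(path, mode)
        if os.fstat(self._file.fileno()).st_size < _RECORD.size:
            self.save(1, 1)  # assumed starting values for a fresh session

    def save(self, incoming, outgoing):
        # Seek to the fixed offset and overwrite the record in place.
        self._file.seek(0)
        self._file.write(_RECORD.pack(incoming, outgoing))
        self._file.flush()
        os.fsync(self._file.fileno())  # push the update to disk

    def load(self):
        self._file.seek(0)
        return _RECORD.unpack(self._file.read(_RECORD.size))

    def close(self):
        self._file.close()
```

After a crash, reopening the same path and calling load() returns the last pair that was saved, which is the starting point for rebuilding the in-memory state.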
os.system("tabix output.dummy:{0}-{0} | perl -pe 's/^dummy\t\d+\t//' | tr '\t' '\n'".format(rand_record_index)) passes the whole command line to a system shell, so it is vulnerable to shell injection if any part of the string comes from untrusted input.
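One way to avoid the shell altogether is to pass tabix an argument list via subprocess and do the perl/tr post-processing in Python. The following is a sketch under assumptions: the helper names are invented for this example, tabix_path stands in for the bgzipped, tabix-indexed file (its real name is not given above), and tabix is assumed to be on the PATH.

```python
import re
import subprocess


def tabix_region(index):
    """Build the region string for one record; int() rejects non-numeric input."""
    return "dummy:{0}-{0}".format(int(index))


def row_to_fastq(row):
    """Drop the leading 'dummy<TAB>index<TAB>' columns and turn tabs into newlines."""
    return re.sub(r"^dummy\t\d+\t", "", row).replace("\t", "\n")


def fetch_record(tabix_path, index):
    """Run tabix without a shell and return the record as FASTQ text."""
    # Arguments are passed as a list, so nothing is interpreted by a shell.
    result = subprocess.run(
        ["tabix", tabix_path, tabix_region(index)],
        check=True,
        capture_output=True,
        text=True,
    )
    return "".join(
        row_to_fastq(line) for line in result.stdout.splitlines(keepends=True)
    )
```

For example, row_to_fastq("dummy\t5\t@r5\tACGT\t+\tIIII\n") gives the four FASTQ lines of that record, each on its own line.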