August 14, 2018

Important QnA on Database Management Systems (DBMS) - Comprehensive

Question) Draw an ER diagram that captures the following information:
The Prescriptions-R-X chain of pharmacies has offered to give you a free lifetime supply of medicine if you design its database. Given the rising cost of health care, you agree. Here is the information that you gather:
Patients are recognized by an SSN, and their names, addresses, and ages must be logged.
Doctors are identified by an SSN. For each doctor, the name, specialty, and years of experience must be recorded.
Each pharmaceutical company is identified by name and has a phone number.
For each drug, the trade name and formula must be recorded. Each drug is sold by a given pharmaceutical company, and the trade name identifies a drug uniquely among the products of that company. If a pharmaceutical company is deleted, you need not keep track of its products any longer.
Each pharmacy has a name, address, and phone number. Every patient has a primary physician. Every doctor has at least one patient. Each pharmacy sells several drugs and has a price for each. A drug could be sold at several pharmacies, and the price could vary from one pharmacy to another.
Doctors prescribe drugs for patients. A doctor could prescribe one or more drugs for several patients, and a patient could obtain prescriptions from several doctors. Each prescription has a date and a quantity associated with it. You can assume that, if a doctor prescribes the same drug for the same patient more than once, only the last such prescription needs to be stored.
Pharmaceutical companies have long-term contracts with pharmacies. A pharmaceutical company can contract with several pharmacies, and a pharmacy can contract with several pharmaceutical companies. For each contract, you have to store a start date, an end date, and the text of the contract.
Pharmacies appoint a supervisor for each contract. There must always be a supervisor for each contract, but the contract supervisor can change over the lifetime of the contract.
Key points for ER diagram:
  • Entities correctly identified.
  • Attributes correctly identified.
  • Primary keys correctly identified.
  • Relationships and cardinality correctly identified.

Question) Consider the following relational schema and translate the following SQL queries into expressions of the relational algebra.
• Student (snum, sname, major, level, age) 
• Class (name, meets_at, room, fid) 
• Enrolled (snum, cname) 
• Faculty (fid, fname, deptid) 
                      SELECT S.sname
                      FROM Student S
                      WHERE S.snum NOT IN (SELECT E.snum
                                           FROM Enrolled E)

                      SELECT C.name
                      FROM Class C
                      WHERE C.room = 'R128'
                         OR C.name IN (SELECT E.cname
                                       FROM Enrolled E
                                       GROUP BY E.cname
                                       HAVING COUNT(*) >= 5)
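A possible relational-algebra translation of the first query (students enrolled in no class); this is the usual textbook-style answer, assuming set semantics:

```latex
\pi_{sname}(\text{Student}) \;-\; \pi_{sname}\big(\text{Student} \bowtie_{\text{Student.snum} = \text{Enrolled.snum}} \text{Enrolled}\big)
```

Strictly, this compares names rather than snums, so it assumes distinct student names; a safer form first computes the enrolled snums and subtracts them from all snums before joining back for the names. The second query involves counting (HAVING COUNT(*) >= 5), which classical relational algebra cannot express without an aggregation extension.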

EmpMaster (eid: number, ename: varchar, sal: number, age: number, deptid: number)   
Dept(deptid: number, budgetAmt: number, floor: number, mgreid: number)
Salaries range from Rs.10,000 to Rs.100,000, ages vary from 20 to 80, each department has about five employees on average, there are 10 floors, and budgets vary from Rs.10,000 to Rs.100,000. You can assume a uniform distribution of values.
Query: Print ename, age, and sal for all employees.
Query: Find the deptids of departments that are on the 10th floor and have a budget of less than Rs.25,000.
a) For each of the following queries, which index would you choose to speed up the query?
b) If your database system does not consider index-only plans (i.e., data records are always retrieved even if enough information is available in the index entry), how would your answer change?
Explain briefly.

a) For the first query, we can build an unclustered hash index on the (ename, age, sal) fields of EmpMaster, since the query could then be answered with an index-only scan. If our system does not consider index-only plans, then we should not build an index for this query: since the query requires us to access all the EmpMaster records anyway, an index will not help, and we should access the records with a file scan.

b) For the second query, we can create a clustered B+ tree index on the (floor, budget) fields of Dept, since the records would then be ordered on these fields. When performing this query, the first record with floor = 10 is fetched, and the remaining records with floor = 10 are read in ascending order of budget. This plan, which is the best for this query, is not an index-only plan, so the answer does not change if the system ignores index-only plans.

Question) Modern disk drives store more sectors on the outer tracks than the inner tracks.
As the rotation speed is constant, the sequential data transfer rate is also higher on the outer tracks. The seek time and rotational delay are unchanged. Given this information, explain good strategies for placing files with the following kinds of access patterns:
a. Frequent, random accesses to a small file (e.g., catalog relations).
b. Sequential scans of a large file (e.g., selection from a relation with no index).
c. Random accesses to a large file via an index (e.g., selection from a relation via the index).
d. Sequential scans of a small file.

a. Place the file in the middle tracks. Sequential speed is not an issue because the file is small, and the expected seek time is minimized by placing the file in the center.
b. Place the file in the outer tracks. Sequential speed is most important and outer tracks maximize it. 
c. Place the file and the index on the inner tracks. The DBMS will alternately access pages of the index and pages of the file, so the two should reside close together to reduce seek times. Placing them on the inner tracks also saves the faster (outer) tracks for other files that are accessed sequentially.
d. Place small files in the inner half of the disk. A scan of a small file is effectively random I/O, because the cost is dominated by the initial seek to the beginning of the file.

Question) What are checkpoints and why are they important? List the actions taken by the recovery manager during a checkpoint.

Answer) A checkpoint is a type of entry in the log. A checkpoint record is written to the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified. As a result, all transactions that have their [commit, T] entries in the log before a checkpoint entry do not need to have their WRITE operations redone in case of a system crash, since all their updates will have been recorded in the database on disk during checkpointing. The recovery manager of a DBMS must decide at what intervals to take a checkpoint. The interval may be measured in time (say, every m minutes) or in the number t of committed transactions since the last checkpoint, where the values of m and t are system parameters.
A checkpoint taken by the recovery manager consists of the following actions:
 1. Suspend execution of transactions temporarily. 
2. Force-write all main memory buffers that have been modified to disk.
3. Write a [checkpoint] record to the log, and force-write the log to disk. 
4. Resume executing transactions.
As a consequence of Step 2, a checkpoint record in the log may also include additional information, such as a list of active transaction ids, and the locations (addresses) of the first and most recent (last) records in the log for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.

Question) Consider an empty B+-tree with n=4, insert the following record keys in the specified order: 
80, 25, 69, 14, 58, 3, 47, 91, 36, 81. 

Question) Consider the following partial schedule S involving two transactions T1 and T2. Only the read and write operations are shown; the read of data item P is denoted read(P) and the write of data item P is denoted write(P). What would be the result if transaction T1 fails immediately after time instance 9?
Schedule S is non-recoverable and cannot ensure transaction atomicity, because T2 reads the value of 'A' written by T1, and T2 commits before T1.

Question) Consider two transactions:
                               T1: BEGIN A=A+100, B=B-100 END
                               T2: BEGIN A=1.06*A, B=1.06*B END
• 1st TXN transfers $100 from B’s account to A’s
• 2nd TXN credits both accounts with 6% interest.
Assume that initially A and B each have $1000. What are the legal outcomes of running T1 and T2?

Answer) There is no guarantee that T1 will execute before T2 or vice versa if both are submitted together. However, the net effect must be equivalent to these two transactions running serially in some order.
Legal outcomes: A=1166, B=954 or A=1160, B=960
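The two serializable outcomes can be checked directly by running the transactions in both serial orders (a minimal sketch; the function names are ours, not from the text):

```python
def t1(a, b):
    # T1: transfer $100 from B's account to A's
    return a + 100, b - 100

def t2(a, b):
    # T2: credit both accounts with 6% interest
    return a * 1.06, b * 1.06

# Serial order T1 then T2: (1000+100)*1.06, (1000-100)*1.06
a1, b1 = t2(*t1(1000, 1000))
# Serial order T2 then T1: 1000*1.06+100, 1000*1.06-100
a2, b2 = t1(*t2(1000, 1000))
print(round(a1), round(b1))  # 1166 954
print(round(a2), round(b2))  # 1160 960
```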

Question) Suppose you are given a relation R with four attributes ABCD. For each of the following sets of FDs, assume those are the only dependencies that hold for R.
(a) Identify the candidate key(s) for R. 
(b) Identify the best normal form that R fulfills (1NF, 2NF, 3NF, or BCNF).
(c) If R is not in BCNF, decompose it into a set of BCNF relations that preserve the dependencies. 
 Perform the above tasks for the following set of functional Dependencies:         
1. C → D, C → A, B → C
2. B → C, D → A
3. ABC → D, D → A
4. A → B, BC → D, A → C
5. AB → C, AB → D, C → A, D → B

1. (a) Candidate keys: B  
   (b) R is in 2NF but not 3NF.
   (c) C → D and C → A both cause violations of BCNF. One way to obtain a lossless-join, dependency-preserving decomposition is to decompose R into AC, BC, and CD.

2. (a) Candidate keys: BD  
    (b) R is in 1NF but not 2NF.
    (c) Both B → C and D → A cause BCNF violations. The decomposition AD, BC, BD (achieved by first decomposing to AD, BCD) is in BCNF, lossless-join, and dependency-preserving.

3. (a) Candidate keys: ABC, BCD  
    (b) R is in 3NF and not BCNF.
    (c) ABCD is not in BCNF because D → A holds and D is not a key. However, if we decompose R into AD, BCD, we cannot preserve the dependency ABC → D. Hence there is no dependency-preserving BCNF decomposition.

4. (a) Candidate keys: A
    (b) R is in 2NF but not 3NF (because of the FD BC → D).
    (c) BC → D violates BCNF since BC does not contain a key. Therefore we decompose R into BCD and ABC.

5. (a) Candidate keys: AB, BC, CD, AD
    (b) R is in 3NF but not BCNF (because of the FD C → A). 
    (c) C → A and D → B both cause violations. Hence we decompose into AC and BCD, but this does not preserve AB → C and AB → D, and BCD is still not in BCNF because D → B holds. Therefore we decompose further into AC, BD, and CD. However, when we try to restore the lost functional dependencies by adding ABC and ABD, these relations are not in BCNF. Therefore there is no dependency-preserving BCNF decomposition.
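The candidate keys above can be verified mechanically with an attribute-closure computation (a small sketch; the helper name `closure` is ours):

```python
def closure(attrs, fds):
    """Closure of an attribute set under a list of FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is contained in the closure, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Set 1: C -> D, C -> A, B -> C  =>  B+ = ABCD, so B is a candidate key.
fds1 = [("C", "D"), ("C", "A"), ("B", "C")]
print(sorted(closure("B", fds1)))  # ['A', 'B', 'C', 'D']

# Set 5: AB -> C, AB -> D, C -> A, D -> B  =>  CD+ = ABCD, so CD is also a key.
fds5 = [("AB", "C"), ("AB", "D"), ("C", "A"), ("D", "B")]
print(sorted(closure("CD", fds5)))  # ['A', 'B', 'C', 'D']
```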

August 13, 2018

Important QnA On Operating System

Question )Deadlock prevention algorithms prevent deadlocks by restraining how requests can be made. A different method for avoiding deadlocks is to require additional information about how resources are to be requested.
For example, consider the following snapshot.                

a. How many resources are there of type (A, B, C)?
b. What are the contents of the Need matrix?
c. Is this system currently in a safe state?  Justify your answer.
d. If a request from P3 arrives for (1, 0, 0), can the request be safely granted immediately?
a. Total resources = (3, 14, 11), i.e., the sum of the allocated plus available resources of each type.
b. Need = Max - Allocation.
 P0 0 0 0
 P1 0 7 5
 P2 1 0 0
 P3 0 0 2
c. Yes, the system is in a safe state, because the processes can be executed to completion in the order P0, P2, P1, P3.
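The safety test in part (c) can be sketched with the Banker's algorithm. Since the question's full snapshot is not reproduced above, the Allocation and Available values below are placeholders of ours; only the Need matrix comes from the answer:

```python
def is_safe(available, allocation, need):
    """Banker's algorithm safety test: return a safe sequence of process
    indices, or None if the state is unsafe."""
    n = len(allocation)
    work = list(available)
    finished = [False] * n
    sequence = []
    while len(sequence) < n:
        progressed = False
        for i in range(n):
            if not finished[i] and all(need[i][j] <= work[j] for j in range(len(work))):
                # Pretend process i runs to completion and releases its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                sequence.append(i)
                progressed = True
        if not progressed:
            return None  # no process can proceed: unsafe state
    return sequence

# Need matrix from the answer above; Allocation and Available are hypothetical.
need = [[0, 0, 0], [0, 7, 5], [1, 0, 0], [0, 0, 2]]
allocation = [[0, 1, 0], [2, 0, 0], [0, 7, 3], [1, 1, 1]]
available = [1, 0, 2]
print(is_safe(available, allocation, need))  # a safe sequence, e.g. [0, 2, 3, 1]
```

Any sequence returned proves the state is safe; the particular order depends on the scan order, so it need not match P0, P2, P1, P3 exactly.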

Question) Consider a system in which memory contains the following hole sizes, in memory address order:
 20K, 10K, 18K, 9K, 12K and 15K.          
Which hole is taken for successive segment request of
a. 6 KB
b. 10 KB
c. 12 KB
d. 9 KB
for the First fit, Best Fit, and Worst Fit.  Clearly show the memory allocation using a diagram. Which algorithm makes the most efficient use of memory?
       First fit: (a) 20K (b) remainder of 20K (c) 18K (d) 10K           
       Best fit: (a) 9K (b) 10K (c) 12K (d) 15K 
       Worst fit: (a) 20K (b) 18K (c) 15K (d) remainder of 20K 
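The three placement strategies can be simulated directly (a sketch; the function name is ours). Each request reduces the chosen hole, so later requests may land in a remainder, as in the answers above:

```python
def allocate(holes, requests, strategy):
    """Simulate contiguous allocation; returns the size of the hole chosen
    for each request (before it is reduced by the allocation)."""
    holes = list(holes)  # work on a copy; holes stay in memory order
    placed = []
    for req in requests:
        candidates = [i for i, h in enumerate(holes) if h >= req]
        if strategy == "first":
            i = candidates[0]                              # first hole that fits
        elif strategy == "best":
            i = min(candidates, key=lambda k: holes[k])    # smallest hole that fits
        else:
            i = max(candidates, key=lambda k: holes[k])    # worst fit: largest hole
        placed.append(holes[i])
        holes[i] -= req  # the leftover remains as a smaller hole
    return placed

holes = [20, 10, 18, 9, 12, 15]   # KB, in memory order
requests = [6, 10, 12, 9]         # KB
for s in ("first", "best", "worst"):
    print(s, allocate(holes, requests, s))
```

For first fit this prints [20, 14, 18, 10]: the 10 KB request lands in the 14 KB remainder of the 20 KB hole, matching the answer above.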

Question) Suppose two processes enter the ready queue with the following properties: Process 1 has a total of 8 units of work to perform, but after every 2 units of work, it must perform 1 unit of I/O (so the minimum completion time of this process is 12 units). Assume that there is no work to be done following the last I/O operation. Process 2 has a total of 20 units of work to perform. This process arrives just behind P1. Show the resulting schedule for the shortest-job-first (preemptive) and round-robin algorithms. Assume a time slice of 4 units for RR. What is the completion time of each process under each algorithm? 
Shortest-job-first (preemptive) algorithm:
Completion time of process P1 = 12 units
Completion time of process P2 = 28 units
Round-robin (RR) algorithm:
Completion time of process P1 = 20 units
Completion time of process P2 = 28 units

Question) Data totaling 700 GB is to be stored on disks of 200 GB each. 
a. How many such disks are required if the following RAID structures are implemented? Justify your answers with explanation and diagrams.
i. RAID  1
ii. RAID  0 
iii. RAID  6 
iv. RAID  5
b. If the system implements RAID 4, how many I/O operations are required for a write operation? 
a) (i) 8 disks   (ii) 4 disks   (iii) 4 + 2 = 6 disks   (iv) 4 + 1 = 5 disks
(with explanation and diagrams)

b) 4 I/O operations: read the old data block, read the old parity block, then write the new data block and the new parity block.
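The disk counts in part (a) follow from ceil(700 / 200) = 4 data disks plus each level's redundancy overhead. A small sketch (the function name is ours, and this is the simplified whole-disk model used in the answer):

```python
import math

def raid_disks(data_gb, disk_gb, level):
    """Disks needed to store data_gb at a given RAID level (simplified model)."""
    d = math.ceil(data_gb / disk_gb)  # data disks
    if level == 0:
        return d          # striping only, no redundancy
    if level == 1:
        return 2 * d      # mirroring: one copy per data disk
    if level == 5:
        return d + 1      # one disk's worth of distributed parity
    if level == 6:
        return d + 2      # two disks' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")

for lvl in (1, 0, 6, 5):
    print(f"RAID {lvl}: {raid_disks(700, 200, lvl)} disks")
```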

Question) Consider a system with a 90% hit ratio, 40 nanoseconds to search the associative registers, and 600 nanoseconds to access memory. Find the time to access a page:       
a. When the page number is in associative memory.
b. When the page number is not in associative memory.
c. What is the effective memory access time? 
(a) 40 nanoseconds are needed to find the frame number in the associative memory, plus 600 nanoseconds to read the desired word from memory.
Time = 40 + 600 = 640 nanoseconds

(b) When the page is not in associative memory:
Time = 40 + 600 + 600 = 1240 nanoseconds
One extra memory access is required to read the page table from memory.

(c) Effective memory access time = hit ratio × (time when the page number is in associative memory) + miss ratio × (time when it is not):
Effective memory access time = 0.9 × 640 + 0.1 × 1240 = 700 nanoseconds
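The same calculation as a small, self-contained sketch (the function name is ours):

```python
def effective_access_time(hit_ratio, tlb_ns, mem_ns):
    """EMAT for a paged system with a TLB (associative registers):
    a hit costs one memory access, a miss costs an extra page-table access."""
    hit_time = tlb_ns + mem_ns        # 40 + 600 = 640 ns
    miss_time = tlb_ns + 2 * mem_ns   # 40 + 600 + 600 = 1240 ns
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

print(effective_access_time(0.9, 40, 600))  # 700 ns (up to float rounding)
```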

Question) Consider a physical memory of size 2^64 KB which is managed by the buddy scheme. Suppose a set of five processes, namely P1, P2, P3, P4 and P5, demand memory of size 2028 KB, 112 KB, 288 KB, 598 KB and 25 KB respectively. What would be the size of the memory blocks allocated to these five processes? Is it a fair allocation? Justify.
The buddy system allocates contiguous blocks of memory whose sizes are powers of 2. Therefore, if a process requests memory that does not exactly fit an appropriately sized unit, the next higher power of 2 is allocated. In our case, the allocated block sizes are as follows:
Process P1 requests 2028 KB; the nearest power of 2 that can satisfy the request is 2048 KB (2^11 KB), so the buddy scheme allocates a contiguous memory block of size 2^11 KB. Similarly, process P2 gets a block of size 2^7 KB (128 KB), process P3 gets 2^9 KB (512 KB), process P4 gets 2^10 KB (1024 KB), and P5 gets 2^5 KB (32 KB).
Justification: The serious drawback of this approach is that it is inefficient in terms of memory utilization, because every request is rounded up to a power of 2. This results in substantial internal fragmentation: on average the scheme can allocate almost double the requested memory, so a large fraction of the allocated space may go unused (here, for example, P4's 598 KB request receives a 1024 KB block). However, the distinct advantage of this scheme is that deallocation is fast, and coalescing released blocks into larger free contiguous blocks can be carried out much faster than in competing schemes.
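The rounding rule above can be sketched in a few lines (the function name is ours):

```python
def buddy_block(request_kb):
    """Smallest power-of-two block size (in KB) that satisfies the request."""
    size = 1
    while size < request_kb:
        size *= 2
    return size

for name, req in [("P1", 2028), ("P2", 112), ("P3", 288), ("P4", 598), ("P5", 25)]:
    block = buddy_block(req)
    # Internal fragmentation = allocated block minus what was actually requested.
    print(f"{name}: requests {req} KB -> gets {block} KB (wastes {block - req} KB)")
```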

Question) Linux’s security model is closely related to typical UNIX security mechanisms. Linux performs access control by assigning objects a protection mask that specifies which access modes are to be granted to processes. Give any four examples of file protection modes with allowed file accesses.
Answer) Any four with an explanation 
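The answer is left open above; as an illustration (our own example, not from the text), four common protection masks and the accesses they grant can be decoded with Python's `stat` module:

```python
import stat

# Four common protection modes and the file accesses they allow:
#   0o644  rw-r--r--  owner read/write; group and others read-only (typical data file)
#   0o755  rwxr-xr-x  owner full access; group/others read and execute (program, directory)
#   0o600  rw-------  owner read/write only (private file, e.g. a key file)
#   0o444  r--r--r--  read-only for everyone
for mode in (0o644, 0o755, 0o600, 0o444):
    # S_IFREG marks a regular file so filemode() prints '-' as the type character.
    print(oct(mode), stat.filemode(stat.S_IFREG | mode))
```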
Question) Consider a logical address space of eight pages of 1024 bytes each, mapped onto a physical memory of 32 frames.   
a. How many bits are there in the logical address? Identify in the logical address the bits used as an offset and the bits used as the page number.
b. How many bits are there in the physical address? Identify in the physical address the bits used as an offset and the bits used as the frame number.
(a) Logical address: 13 bits (8 pages × 1024 bytes/page = 8192 bytes, which requires 2^13 addresses). Bits 0 to 9 are the offset (10 bits index into a 1024 = 2^10 byte page); the remaining 3 bits identify the page number (2^3 = 8 pages).
(b) Physical address: 15 bits (32 frames × 1024 bytes/frame = 32768 bytes, which requires 2^15 addresses). Bits 0 to 9 are the offset (10 bits into a 2^10 byte frame); the remaining 5 bits identify the frame number (2^5 = 32 frames).
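The bit counts and the page/offset split can be checked with a short sketch (names are ours):

```python
PAGE_SIZE = 1024   # bytes -> 10-bit offset
NUM_PAGES = 8      # -> 3-bit page number, 13-bit logical address
NUM_FRAMES = 32    # -> 5-bit frame number, 15-bit physical address

def split_logical(addr):
    """Split a logical address into (page number, offset)."""
    return addr // PAGE_SIZE, addr % PAGE_SIZE

# Number of bits needed to address every byte of each space.
logical_bits = (NUM_PAGES * PAGE_SIZE - 1).bit_length()
physical_bits = (NUM_FRAMES * PAGE_SIZE - 1).bit_length()
print(logical_bits, physical_bits)  # 13 15
print(split_logical(3000))          # (2, 952): page 2, offset 952
```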

Question) What are the differences between user-level threads and kernel-level threads?
(1) User-level threads are invisible to the kernel, whereas the kernel is aware of kernel threads. 
(2) On systems with either M:1 or M:N mapping, user threads are managed and scheduled by the thread library, while the kernel schedules kernel threads. 
(3) Kernel threads need not be associated with a process, whereas every user thread belongs to a specific process. Kernel threads are generally more expensive to maintain than user threads, as each must be represented by a kernel data structure. 

Question) Describe the actions taken by a kernel to context-switch between kernel-level threads.
Context switching between kernel threads typically requires saving the values of the CPU registers of the thread being switched out and restoring the CPU registers of the new thread being scheduled.

Question) In demand paging, a victim frame is selected using the page replacement algorithm if there is no free frame in memory. In general, the page replacement algorithm with the lowest page-fault rate is considered the best. Consider the following page-reference string in a 3-frame system: 
1  2  3  2  1  5  2  1  6  2  5  6  3 1  3  6  1  2   4   3         
a. How many page faults would occur with FIFO page replacement? 
b. How many page faults would occur with LRU page replacement?                  
c. How many page faults would occur with OPT page replacement?  
d. Check for Belady's anomaly considering 4 frames for FIFO.       
       FIFO – 14 page faults       
       LRU – 11 page faults     
       OPTIMAL – 9 page faults
       Checking for Belady's anomaly: with 4 frames, FIFO incurs 9 page faults, fewer than the 14 faults with 3 frames, so this reference string does not exhibit Belady's anomaly for FIFO.
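The FIFO and LRU counts, and the 4-frame FIFO check, can be verified with a short simulation (a sketch; the function names are ours):

```python
from collections import OrderedDict

def fifo_faults(refs, frames):
    queue, faults = [], 0
    for p in refs:
        if p not in queue:
            faults += 1
            if len(queue) == frames:
                queue.pop(0)               # evict the page loaded earliest
            queue.append(p)
    return faults

def lru_faults(refs, frames):
    cache, faults = OrderedDict(), 0
    for p in refs:
        if p in cache:
            cache.move_to_end(p)           # mark as most recently used
        else:
            faults += 1
            if len(cache) == frames:
                cache.popitem(last=False)  # evict the least recently used page
            cache[p] = True
    return faults

refs = [1, 2, 3, 2, 1, 5, 2, 1, 6, 2, 5, 6, 3, 1, 3, 6, 1, 2, 4, 3]
print(fifo_faults(refs, 3), lru_faults(refs, 3))  # 14 11
print(fifo_faults(refs, 4))                       # 9 faults with 4 frames
```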

Question) Given the following queue 95, 180, 34, 119, 11, 123, 62, 64, with the read-write head initially at track 50 and the last track being 199, apply the following algorithms and find the total head movement. Which algorithm results in the least head movement, and hence the fastest seek time? 
a.First Come-First Serve (FCFS)
b.Shortest Seek Time First (SSTF)
c.Elevator (SCAN) 
d.Circular SCAN (C-SCAN)
(a) FCFS - The head services the requests in arrival order: from 50 to 95 to 180 and so on. From 50 to 95 it moves 45 tracks. In this example the total head movement is 644 tracks (45 + 85 + 146 + 85 + 108 + 112 + 61 + 2). 

(b) SSTF - Each request is serviced according to the shortest seek distance from the current head position. Starting at 50, the nearest request is 62, not 34, since it is only 12 tracks to 62 but 16 tracks to 34. The process continues until all requests are serviced. For instance, the next move is from 62 to 64, not to 34, since they are only 2 tracks apart rather than 28 the other way. 
50, 62, 64, 34, 11, 95, 119, 123, 180
Total Head movements - 236 tracks

(c) SCAN - This approach works like an elevator. The head sweeps toward the nearest end and, when it hits the bottom, reverses and sweeps up, servicing the requests it passed on the way down. A request that arrives just behind the head will not be serviced until the head comes back in that direction.
50, 34, 11, 0, 62, 64, 95, 119, 123, 180 
Total Head movements - 230 tracks

(d) C-SCAN - Circular SCAN works like the elevator to some extent. It begins its sweep toward the nearest end and continues to the end of the disk. Once it hits the bottom (or top) it jumps to the other end and services requests moving in the same direction. Keep in mind that the big jump does not count as head movement. 
50, 34, 11, 0, 180, 123, 119, 95, 64, 62 
Total Head movements - 187 tracks

(e) C-LOOK - This is an enhanced version of C-SCAN. The sweep does not go past the last request in the direction the head is moving. It too jumps to the other end, but only as far as the furthest request, not all the way to the end of the disk. 
C-LOOK total head movements - 157 tracks.
50, 34, 11, 180, 123, 119, 95, 64, 62 
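As a check, the FCFS and SSTF totals can be recomputed directly (a sketch; function names are ours — note the FCFS sum for this queue works out to 644 tracks):

```python
def fcfs_movement(start, queue):
    """Total head movement servicing requests strictly in arrival order."""
    total, pos = 0, start
    for track in queue:
        total += abs(track - pos)
        pos = track
    return total

def sstf_movement(start, queue):
    """Total head movement always servicing the nearest pending request next."""
    pending, total, pos = list(queue), 0, start
    while pending:
        track = min(pending, key=lambda t: abs(t - pos))
        total += abs(track - pos)
        pos = track
        pending.remove(track)
    return total

queue = [95, 180, 34, 119, 11, 123, 62, 64]
print(fcfs_movement(50, queue))  # 644
print(sstf_movement(50, queue))  # 236
```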

August 04, 2018

OOAD - Case Studies

Case Studies for Exam Preparation:

Give two examples of information about a problem domain that can be captured in UML Activity Diagrams, and two ways in which these diagrams can be useful for Requirements Analysis.

Examples of info represented:
  1. Activity diagrams show how tasks depend on one another in a business process. For example, they can show that some tasks can be carried out in parallel, while other tasks have to be synchronized so that one doesn’t start until the tasks on which it depends are complete.
  2. Activity diagrams show the flow of activity between different business units, using swimlanes. In other words, they show how different business units must work together to carry out some process.
Activity diagrams are useful in requirements analysis:
  1. To understand the existing business processes, showing how an organization currently solves the problem.
  2. For sketching proposed solutions to the stakeholders to show how a proposed solution would change the way in which the organization works.
Draw a UML Class Diagram representing the following elements from the problem domain for a hockey league. A hockey league is made up of at least four hockey teams.  Each hockey team is composed of six to twelve players, and one player captains the team. A team has a name and a record.  Players have a number and a position.  Hockey teams play games against each other.  Each game has a score and a location.  Teams are sometimes led by a coach. A coach has a level of accreditation and a number of years of experience, and can coach multiple teams.  Coaches and players are people, and people have names and addresses. Draw a class diagram for this information, and be sure to label all associations with appropriate multiplicities.


Notes: Captain could alternatively be represented as a second, named association between player and team. Assumptions: each player only plays on one team, each captain only captains one team, each team only plays in one league.

Do you know that it costs a lot of money to get a 'Certified Java Programmer' certificate? It can cost thousands of euros. Suppose we are to create a browser-based training system to help people prepare for the certification exam. A user can request a quiz from the system. The system selects a collection of questions from its database and composes them into a quiz. It grades the user's answers and provides hints if the user asks for them.
Additionally, we have professors who provide questions and hints, and examiners who must approve questions to make sure they are not too trivial and make sense. Draw a use case diagram for this system, and work out some of your use cases. As we have no real stakeholders here, you may fill in any details you think are sensible for this case.


We’ll assume a multiple-choice quiz.
Use case: Make quiz.
Primary actor: User
Secondary actors: –
Pre-condition: The system has at least 10 questions.
Main flow:
1. The use-case is activated when the user requests it.
2. The user specifies the difficulty level.
3. The system chooses 10 questions, and proposes them as a quiz to the user.
4. The system starts a timer.
5. For every question:
   5a. The user selects an answer, or skips. [Extension point]
6. When the user has finished the quiz, or the timer runs out, the quiz is finished.
Use case: Provide hint
Primary actor: User
Secondary actors: –
Pre-condition: The user requests a hint.
Post-condition: –
Main flow:
1. The system provides a hint. The verbosity of the hint is determined by the difficulty level set previously by the user.
2. Return to Make quiz's main flow.

Suppose we want to develop software for an alarm clock. The clock shows the time of day. Using buttons, the user can set the hours and minutes fields independently, and choose between 12- and 24-hour display. It is possible to set one or two alarms. When an alarm fires, it sounds some noise. The user can turn it off, or choose to 'snooze'. If the user does not respond at all, the alarm turns itself off after 2 minutes. 'Snoozing' means the sound is turned off, but the alarm rings again after a few minutes' delay. This 'snooze time' can be pre-set.
Identify the top-level functional requirement for the clock, and model it with a use case diagram.

Use case: Snooze.
Primary actor: User
Secondary actors: –
Pre-condition: An alarm is firing.
Post-condition: –
Main flow:
1. The use-case is activated when the user hits the snooze button.
2. The alarm is turned off.
3. Wait for snooze time.
4. Include use case ’Make sound’

Use case: Make sound
Primary actor: System
Secondary actors: –
Pre-condition: –
Main flow:
1. The use case starts when it is called.
2. It simply makes some noise.

Draw a UML state diagram corresponding to the computer keyboard state machine. Also draw an extended state diagram to model the behavior of a keyboard that depends on the number of characters typed on it so far: after, say, 1,000 keystrokes, the keyboard breaks down and enters the final state. (Assume the normal functioning of the keyboard of a personal computer that you use regularly.)


state diagram representing the computer keyboard state machine

The Gang of Four says that the intent of the Adapter pattern is to "convert the interface of a class into another interface clients expect. Adapter lets classes work together that couldn't otherwise because of incompatible interfaces."
What does this imply? Give an example.

This means that I have a class that needs to interact with another class through a certain set of method calls. When the interface of that other class does not offer those method calls, the Adapter provides a new interface that performs the translation. For instance, consider a reporting application that needs to pull data from two different database systems. My application wants to use a "GetDate" method to fetch data from the database, but the database systems do not provide that through their APIs. Therefore I write an Adapter that exposes the GetDate method in its interface and is responsible for pulling the data appropriately.
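The GetDate example above can be sketched as follows; the class names and the `run_query`/`GetDate` methods are illustrative, not a real database API:

```python
class LegacySqlApi:
    """Pre-existing class whose interface the client cannot use directly."""
    def run_query(self, sql):
        # Canned result standing in for a real database call.
        return [("2018-08-14",)]

class ReportSource:
    """The interface the reporting client expects to program against."""
    def GetDate(self):
        raise NotImplementedError

class SqlAdapter(ReportSource):
    """Adapter: translates the expected GetDate() call into run_query()."""
    def __init__(self, api):
        self.api = api
    def GetDate(self):
        rows = self.api.run_query("SELECT report_date FROM reports")
        return rows[0][0]

# The client works purely through the ReportSource interface.
print(SqlAdapter(LegacySqlApi()).GetDate())  # 2018-08-14
```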

The Façade pattern and the Adapter pattern may seem similar. What is the essential difference between the two? Explain with an example.

In both cases there is a pre-existing class (or classes) with the required functionality, and in both cases I build an intermediate object whose interface my system uses and which is responsible for mapping calls onto the pre-existing class: both Façade and Adapter are wrappers. The essential difference is intent: an Adapter is used when the client already has a predefined interface that it expects to use (and when I want to exploit polymorphism), whereas a Façade defines a new, simplified interface to an entire subsystem.

Data Warehouse Reference - QnA

How can you apply the data to the warehouse? What are the modes?
Data may be applied in the following four different modes: load, append, destructive merge, and constructive merge. Let us examine the effect of applying data in each of these four modes:

Load: If the target table to be loaded already exists and data exists in the table, the load process wipes out the existing data and applies the data from the incoming file. If the table is already empty before loading, the load process simply applies the data from the incoming file.

Append: You may think of the append as an extension of the load. If data already exists in the table, the append process unconditionally adds the incoming data, preserving the existing data in the target table. When an incoming record is a duplicate of an existing record, you may define how to handle it: the incoming record may be allowed in as a duplicate, or it may be rejected during the append process.

Destructive Merge: In this mode, you apply the incoming data to the target data. If the primary key of an incoming record matches the key of an existing record, update the matching target record. If the incoming record is a new record without a match with any existing record, add the incoming record to the target table.

Constructive Merge: This mode is slightly different from the destructive merge. If the primary key of an incoming record matches with the key of an existing record, leave the existing record, add the incoming record, and mark the added record as superseding the old record.

Let's say that the data warehouse for Big_University consists of four dimensions: student, course, semester, and instructor, and two measures: count and avg_grade. At the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), avg_grade stores the student's actual course grade. At higher conceptual levels, avg_grade stores the average grade for the given combination. Draw a snowflake schema diagram.

Based on current trends in technology, newer information systems need to be designed. Explain the points to be taken care of with respect to traditional operational systems and the newer informational systems that need to be built.
The essential reason for the inability to provide strategic information is that we have been trying all along to provide it from the operational systems. These operational systems, such as order processing, inventory control, claims processing, and billing, are not designed or intended to deliver strategic information. If we need the ability to provide strategic information, we must get it from altogether different types of systems. Only specially designed decision-support systems, or informational systems, can deliver strategic information.
  We find that in order to provide strategic information we need to build informational systems that are different from the operational systems we have been building to run the basic business. It will be worthless to continue to dip into the operational systems for strategic information as we have been doing in the past. As companies face fiercer competition and businesses become more complex, continuing the past practices will only lead to disaster.
  • Watching the wheels of business turn
  • Show me the top-selling products
  • Show me the problem regions
  • Tell me why (drill down)
  • Let me see other data (drill across)
  • Show the highest margins
  • Alert me when a district sells below target
We need to design and build informational systems
  • That serve different purposes
  • Whose scopes are different
  • Whose data content is different
  • Where the data usage patterns are different
  • Where the data access types are different
A 2-D view pulled out from the data cube:

Product ID | Location ID | Number Sold

Represent the above in 3-D format, focusing mainly on product ID and sales:

Product ID | Location ID | Total Sold
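One way to read the two tables: the 2-D view maps (product, location) to units sold, and the 3-D form adds a further axis. A minimal Python sketch, assuming time (quarters) as the invented third dimension and invented sample figures:

```python
# The 3-D cube is keyed by (product_id, location_id, quarter) -> units sold.
cube = {}

def record_sale(product_id, location_id, quarter, qty):
    key = (product_id, location_id, quarter)
    cube[key] = cube.get(key, 0) + qty

record_sale("P1", "L1", "Q1", 10)
record_sale("P1", "L1", "Q2", 5)
record_sale("P1", "L2", "Q1", 7)

# A 2-D view like the first table is recovered by fixing one dimension:
slice_q1 = {(p, l): q for (p, l, t), q in cube.items() if t == "Q1"}

# "Total Sold" per product collapses the other two dimensions:
total_p1 = sum(q for (p, _, _), q in cube.items() if p == "P1")
```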

What is an OLAP cube?

An OLAP data cube is a multidimensional representation of data, built from facts and dimensions. It combines information according to its relationships, and it can hold anywhere from zero to many dimensions, each representing a specific aspect of the data.
There are five basic operations to perform on these data cubes:
  1. Slicing
  2. Dicing
  3. Roll-up (also called drill-up)
  4. Drill-down
  5. Pivoting
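The operations listed above can be sketched on a tiny cube held as a Python dict (all product names and figures are invented):

```python
# A small cube keyed by (product, city, year) -> sales.
cube = {
    ("pen",  "Pune",   2017): 100, ("pen",  "Pune",   2018): 120,
    ("pen",  "Mumbai", 2017):  80, ("book", "Pune",   2017):  50,
    ("book", "Mumbai", 2018):  60,
}

# Slice: fix one dimension (year = 2017), leaving a 2-D view.
slice_2017 = {(p, c): v for (p, c, y), v in cube.items() if y == 2017}

# Dice: select a sub-cube by restricting two or more dimensions.
dice = {k: v for k, v in cube.items()
        if k[0] == "pen" and k[1] in ("Pune", "Mumbai")}

# Roll-up (drill-up): aggregate a dimension away, e.g. city -> totals per (product, year).
rollup = {}
for (p, c, y), v in cube.items():
    rollup[(p, y)] = rollup.get((p, y), 0) + v

# Pivot: reorient the axes, e.g. swap product and city.
pivot = {(c, p, y): v for (p, c, y), v in cube.items()}

# Drill-down is the inverse of roll-up: it needs the finer-grained data,
# which is exactly what `cube` holds relative to `rollup`.
```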
Why is dimensional normalization not required?
Dimensional normalization (snowflaking) removes redundant attributes from de-normalized dimension tables by splitting them into sub-dimensions. In dimensional models, however, dimensions are deliberately kept de-normalized, so snowflaking is generally avoided for the following reasons:
  • The data structure becomes more complex, and performance can degrade, because the sub-dimension tables and their relationships must be created, joined, and maintained.
  • Query performance suffers when values must be collected from multiple dimension tables, which analytical and operational reports do constantly.
  • The disk space saved by normalizing dimension tables is negligible compared with the fact data, so it does not justify the added complexity.
What are the steps involved in the dimensional modeling process?
Dimensional modeling follows a four-step design process:

(a) Choose the Business Process: The four-step design method begins by selecting the business process to model, which makes the dimensional model more systematic in representation and easier to explain. Notations such as Business Process Modelling Notation (BPMN) or the Unified Modelling Language (UML) can be used to describe the process.

(b) Declare the Grain: After choosing the business process, the grain of the model is declared. The grain states exactly what a single row of the fact table represents and keeps the design focused at that level of detail.

(c) Identify the Dimensions: In this phase, the dimensions of the model are identified. Dimensions are defined at the grain declared in the previous step, and they act as the foundation of the fact table, supplying the context in which the fact data is collected.

(d) Identify the Facts: Defining the dimensions provides a way to create the table in which the fact data is stored. The facts are the numeric measures captured at the declared grain.
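The four steps can be illustrated with a hypothetical retail example, sketched as SQLite DDL from Python (the business process, tables, and columns are all invented for illustration):

```python
import sqlite3

# Step (a) business process: daily retail sales.
# Step (b) grain: one fact row per product, per store, per day.
# Step (c) dimensions implied by that grain: date, product, store.
# Step (d) facts: numeric measures at that grain (quantity, amount).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);
""")
```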

Consider a data warehouse, where the fact data is calculated to be 36GB of data per year, and 4 years’ worth of data are to be kept online. The data is to be partitioned by month and four concurrent queries are to be allowed.
Compute the partition size, Temporary Space and Space Required for this scenario. 
Partition size: P = 36 GB per year / 12 months = 3 GB
Temporary space, with n = 4 concurrent queries: T = (2n + 1)P = [(2 × 4) + 1] × 3 = 27 GB
Total fact data: F = 36 GB × 4 years = 144 GB
Space required = 3.5F + T = (3.5 × 144) + 27 = 531 GB
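The arithmetic above can be checked with a few lines of Python (the 3.5F + T sizing rule is taken as given from the worked answer):

```python
# Space estimate: 36 GB of fact data per year, monthly partitions,
# 4 years kept online, 4 concurrent queries.
yearly_fact_gb = 36
years_online = 4
n_queries = 4                       # concurrent queries

P = yearly_fact_gb / 12             # partition size (monthly), in GB
T = (2 * n_queries + 1) * P         # temporary space, in GB
F = yearly_fact_gb * years_online   # total fact data, in GB
space_required = 3.5 * F + T        # overall space estimate, in GB

print(P, T, F, space_required)      # 3.0 27.0 144 531.0
```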

Discuss the merits and demerits of using views from the perspective of security of data warehouse.
Views are an easy option for defining security initially, but they cause challenges later.
Some of the common restrictions that may apply to the handling of views are:
  •     restricted data manipulation language (DML) operations,
  •     lost query optimization paths,
  •     restrictions on parallel processing of view projections.
The use of views to enforce security will impose a maintenance overhead. In particular, if views are used to enforce restricted access to data tables and aggregations, then as those tables and aggregations change, the views may also have to change.
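As a concrete sketch of view-based restriction, using SQLite from Python (the employee table, its rows, and the view are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT, salary REAL);
INSERT INTO employee VALUES (1, 'A. Rao', 'Sales', 50000), (2, 'B. Sen', 'HR', 60000);
-- Analysts query the view, which hides the salary column and
-- restricts the rows to a single department.
CREATE VIEW sales_staff AS
    SELECT emp_id, name, dept FROM employee WHERE dept = 'Sales';
""")
rows = conn.execute("SELECT * FROM sales_staff").fetchall()
```

If the base table later gains a column, a rename, or a new restriction, the view definition must be updated too, which is the maintenance overhead described.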
For the following statements, indicate True or False with proper justification:

A.    It is a good practice to drop the indexes before the initial load.
True. Index entry creation during mass loads can be too time-consuming, so drop the indexes prior to the loads to make the loads go quicker. You may rebuild or regenerate the indexes when the loads are complete.

B.    The choice of index type depends on cardinality.
True. A bitmap index is suitable only for low-cardinality data; for high-cardinality columns, a B-tree index is the usual choice.

C.    The importance of metadata is the same for data warehouse and an operational system.
False. In an operational system, users get information through predefined screens and reports, so metadata plays a smaller role. In a data warehouse, users seek information through ad hoc queries, so metadata is essential for finding and interpreting the data.

D.    Backing up the data warehouse is not necessary because you can recover data from the source systems.
False. Information in a data warehouse is accumulated over long periods and elaborately preprocessed; it cannot simply be re-derived from the source systems, so backups are necessary.

E.    MPP is a shared-memory parallel hardware configuration.
False. MPP is a shared-nothing hardware architecture.