ICT & BIT Resources

This page provides ICT lessons and resources.

eCommerce

BIT Year 3 – Semester 6 – Detailed Syllabi  IT6304
IT6304  e-Business Applications (Optional)

INTRODUCTION

This is one of the optional courses designed for Semester 6 of the Bachelor of
Information Technology degree program. It provides a sound understanding of the
applications and technologies in e-Business.
CREDITS: 04
LEARNING OUTCOMES
After successful completion of this course, the students will be able to:
•  Describe the concepts of e-Business, its business applications, marketing on
the web, new revenue models and the latest payment mechanisms, and legal
issues related to B2C (Business to Consumer) and B2B (Business to
Business) applications
•  Acquire the skills needed to work in any e-Business environment and to
make strategic business decisions related to e-Business
•  Recognize ethical and professional issues in an e-Business environment
•  Prepare themselves to work in an e-Business environment in the global
market
•  Enhance their ability to take a company through the e-Business
transformation process

MINOR MODIFICATIONS
When minor modifications are made to this syllabus, they will be reflected in the
Virtual Learning Environment (VLE), and the latest version can be downloaded from
the relevant course page of the VLE. Please submit your suggestions and comments
through the VLE: http://vle.bit.lk

ONLINE LEARNING MATERIALS AND ACTIVITIES
If you are a registered student of the BIT degree program, you can access all
learning materials and this syllabus in the VLE: http://vle.bit.lk. Participating
in the learning activities given in the VLE is very important for learning this subject.

FINAL EXAMINATION
The final exam of the course will be held at the end of the semester. Learning
activities and tutorial exercises are very important in this course, as they will
help students prepare for the final semester exam. The final exam is a
two-hour written paper with four compulsory questions.

OUTLINE OF SYLLABUS
Topic  Hours
1- Introduction to e-Business       03
2- eMarketplaces and Revenue Models          04
3- The Commercial Use of The Internet and The World Wide Web          03
4- e-Business Applications         03
5- Business Strategies for eCommerce         07
6- Revenue Models for e-Business on the Web         04
7- Marketing on The Web         04
8- B2C Interactions and B2B Collaborations         04
9- Online Auctions, Virtual Communities and Evolving Concepts         08
10- e-Business Transformation         06
11- Sri Lankan Context for Electronic Commerce         04
Lectures         50
Practical and Tutorials         10
Total for the subject         60
* Students are expected to have a total of 10 hours of practical to strengthen their
knowledge of these sections
** Students are expected to have shallow and up-to-date knowledge of these
sections by self-study
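As a quick consistency check on the outline above, the following Python sketch sums the topic hours listed in the table and compares them with the stated lecture and subject totals. The variable names are illustrative choices and not part of the syllabus; the hour values are copied from the outline table only.

```python
# Topic hours exactly as listed in the "Outline of Syllabus" table.
topic_hours = {
    "Introduction to e-Business": 3,
    "eMarketplaces and Revenue Models": 4,
    "The Commercial Use of The Internet and The World Wide Web": 3,
    "e-Business Applications": 3,
    "Business Strategies for eCommerce": 7,
    "Revenue Models for e-Business on the Web": 4,
    "Marketing on The Web": 4,
    "B2C Interactions and B2B Collaborations": 4,
    "Online Auctions, Virtual Communities and Evolving Concepts": 8,
    "e-Business Transformation": 6,
    "Sri Lankan Context for Electronic Commerce": 4,
}

# Stated in the outline: 10 hours of practical and tutorial work.
practical_hours = 10

lecture_hours = sum(topic_hours.values())
total_hours = lecture_hours + practical_hours

print(f"Lecture hours: {lecture_hours}")
print(f"Total hours for the subject: {total_hours}")
```

The sum of the listed topic hours matches the lecture total given in the outline, and adding the practical hours gives the stated total for the subject.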
REQUIRED MATERIALS
Main Reading

Ref 1: Schneider, Gary P., 2012, E-Commerce: Strategy, Technology, and
Implementation, 9th Edition, first India edition, ISBN-13: 978-81-315-1623-2

Ref 2: http://en.wikipedia.org/wiki/Cloud_computing (Last accessed : 1/03/2012)
Ref 3: Arunatileka, S. & Ginige, A. (2003). Seven Es in eTransformation, in
Proceedings of the International Association for the Development of Information
Society (IADIS) International Conference– e-Society, Lisbon, Portugal.
Ref 4: http://www.srilanka.lk (Last accessed : 1/03/2012)
Supplementary Reading
Ref 5: The E-business (R)Evolution: Living and Working in An Interconnected World, by
Daniel Amor (2001), ISBN 0-13-067039-1, Prentice Hall

Ref 6: Electronic  commerce: A managerial perspective, by Turban E, Lee J, King D &
Chung H.M. (2000) N.J.: Prentice Hall.

DETAILED SYLLABUS:
Section 1 : Introduction to e-Business (3 hrs)

Instructional Objectives
•  Identify  the basic concepts of e-Business
•  Describe the advantages of e-Business
•  Identify the parties involved in e-Business
•  Describe how to get the e-Business services
•  Relate e-Business success stories
Material /Sub Topics
1.1.   Introduction to e-Business [Ref 1: pg 16-17]
1.2.  Classifications of e-Business (B2C, B2B, C2C, B2G,….) [Ref 1: pg 18-20]
1.3.  Advantages and disadvantages of e-Business [Ref 1: pg 30-32]
1.4.  The e-Business Environment [Ref 1: pg 27-30]
1.5.  Customer business interaction in e-Business [Ref : Teacher’s note]
1.6.  e-Business success stories [Ref 1: pg 57-61, Ref : Teacher’s note]

Section 2 : eMarketplaces and Revenue Models (4 hrs)

Instructional Objectives
Material /Sub Topics
2.1  eMarketplaces [Ref 1: pg 268-272]
2.1.1  Private eMarketplaces
2.1.2  Public eMarketplaces
2.1.3  Consortia
2.2 Revenue Models [Ref 1: pg 147-172]
2.2.1  Web Catalog
2.2.2  Books/Music/Videos
2.2.3  Goods
2.2.4  Services
2.2.5  Digital content
2.2.6  Academic content
2.2.7  Web Portals
2.2.8  Classified Advertising
2.2.9  Subscription Models
2.2.10  Fee-for-Transaction Model
2.2.11  Advertising Models

Section 3 : The Commercial Use of The Internet and The World Wide Web (3 hrs)

Instructional Objectives

•  Identify traditional business models and new business models
•  Describe the technologies enabling new business models
Material /Sub Topics
3.1.  Direct – to – customer interaction [Ref : Teacher’s note]
3.2.  Mass customization [Ref : Teacher’s note]
3.3.  Open business models [Ref : Teacher’s note]
3.4.  Virtual organization [Ref : Teacher’s note]

Section 4 : e-Business Applications (3 hrs)

Instructional Objectives
•  Describe driving forces for change to e-Business
•  Identify technological advancements
•  Describe the traditional and new value chain
•  Identify the new strategic changes in e-Business
Material /Sub Topics
4.1.  The business environment [Ref : Teacher’s note]
4.2.  Driving forces for change [Ref : Teacher’s note]
4.2.1.  Technical forces
4.2.2.  Business driven forces
4.2.3.  External forces
4.2.4.  Internal forces
4.3.  Customer disruption [Ref : Teacher’s note]
4.4.  Product disruption [Ref : Teacher’s note]
4.5.  Price disruption [Ref : Teacher’s note]
4.6.  Intelligent agents [Ref : Teacher’s note]

Section 5 : Business Strategies for eCommerce (8 hrs)

Instructional Objectives
•  Describe business processes
•  Identify the impact of ICT on internal / external business processes
•  Describe the e-Business roadmap
Material /Sub Topics
5.1.  Internal business processes [Ref : Teacher’s note]
5.2.  External business processes [Ref : Teacher’s note]
5.3.  e-Business roadmap [Ref : Teacher’s note]
5.4.  e-Business strategy development [Ref : Teacher’s note]

Section 6 : Revenue Models for e-Business on The Web (4 hrs)

Instructional Objectives
•  Describe new e-Business models
•  Identify the benefits of each model to customer and business organization
Material /Sub Topics
6.1  Direct-to-customer model [Ref : Teacher’s note]
6.2  Supply chain model [Ref : Teacher’s note]
6.3  Full service provider model [Ref : Teacher’s note]
6.4  Revenue sharing model [Ref : Teacher’s note]
6.5  Digital value hub [Ref : Teacher’s note]
6.6  Global trade platform [Ref : Teacher’s note]

Section 7 : Marketing on The Web (4 hrs)

Instructional Objectives
•  Describe web marketing strategies
•  Describe market Segmentation
Material /Sub Topics
7.1  Product based marketing strategies [Ref 1: pg 195-196]
7.2  Customer based marketing strategies [Ref 1: pg 197]
7.3  Market segmentation [Ref 1: pg 198-208]
7.4  Online and offline marketing [Ref : Teacher’s note]
Section 8 : B2C Interactions and B2B Collaborations (4 hrs)

Instructional Objective
•  Describe Collaborative Strategies on the web
Material /Sub Topics
8.1  Collaborative strategies and their importance in e-Business [Ref : Teacher’s
note]
8.1.1  Collaborative strategies when the Threat of New Entrants is high
8.1.2  Collaborative strategies when Rivalry among competitors is high
8.1.3  Collaborative strategies when the Bargaining Power of Suppliers is
high
8.1.4  Collaborative strategies when the Bargaining Power of Buyers is high
8.1.5  Collaborative strategies when the Threat of Substitutes is high

Section 9 : Online Auctions, Virtual Communities and Evolving Concepts (6 hrs)

Instructional Objectives
•  Describe auction basics
•  Define web auction strategies
•  Describe virtual community and portal strategies
•  Discuss Advantages and Disadvantages of Social Networking
•  Describe Cloud Computing
•  Describe Customer Relationship Management, Supply Chain Management
and Knowledge Management
Material /Sub Topics
9.1 Auction basics [Ref 1: pg 292-295]
9.2 Web auction strategies [Ref 1: pg 295-304]
9.3 Virtual Community and Portal Strategies [Ref 1: pg 280-282, 288-289]
9.4 Social Networking [Ref 1: pg 282-288]
9.4.1  Introduction to Social Networking
9.4.2  Advantages and Disadvantages of Social Networking
9.4.3  Social Networking as a Marketing Tool for e-Business
9.4.4  Examples of Social Networks
9.5 Cloud Computing [Ref 2]
9.5.1  Characteristics of Clouds
9.5.2  Deployment Models in Cloud Computing
9.6 Customer Relationship Management (CRM)  [Ref 1: pg 220-222]
9.7 Supply chain Management (SCM) [Ref 1: pg 262-268]
9.8 Knowledge Management (KM) [Ref 1: pg 249]

Section 10 : e-Business Transformation (8 hrs)

Instructional Objective
•  Describe 7Es in e-Transformation
Material /Sub Topics
10.1 Stage 1: Environmental Analysis [Ref : Teacher’s note, Ref 3]
10.2 Stage 2: e-Business Goals/ Strategies [Ref : Teacher’s note, Ref 3]
10.3 Stage 3: eReadiness (Internal/External) [Ref : Teacher’s note, Ref 3]
10.4 Stage 4: eTransformation Roadmap [Ref : Teacher’s note, Ref 3]
10.5 Stage 5: eTransformation Methodology [Ref : Teacher’s note, Ref 3]
10.6 Stage 6: eSystems [Ref : Teacher’s note, Ref 3]
10.7 Stage 7: Evolution Change Management [Ref : Teacher’s note, Ref 3]

Section 11 : Sri Lankan Context for Electronic Commerce (4 hrs)

Instructional Objectives
•  Identify Sri Lanka’s e-Readiness
•  Describe e-Business environment in Sri Lanka
•  Identify the web-based systems in Sri Lanka
Material /Sub Topics
11.1 e-Readiness of Sri Lanka
11.2  e-Business environment in Sri Lanka [Ref 4]
11.3  Web based systems

PLATFORM
•    Windows or Linux

© 2008, University of Colombo School of Computing
1.1. Introduction to e-Business
What is eBusiness?
• Doing business electronically by completing business processes over open
networks, thereby substituting information for physical business processes
• Marketing, buying, selling, delivering, servicing, and paying for products,
services and information across (non-proprietary) networks linking an
enterprise and its prospects, customers, agents, suppliers, competitors, etc.
What e-Business should not be!
• E-Commerce
• New IT Solutions
• ePayment
• Procedural Changes
• ERP
ENIAC – 1946
The world’s first general-purpose electronic digital computer was developed by
Army Ordnance to compute World War II firing tables.
Enabling Technologies: Computing Power
Storage Capacity and Bandwidth
• Electrical and Electromagnetic
• Optical
• × 10 to 10000
The Internet and the World Wide Web
Web Sites as at February 2004
Human Computer Interface
> dir
Intelligent Processing
Word of mouth
• Very limited in access
• Information is enriched and customised
Books
• Much wider access
• Information was passive
Computers
• Global access
• Information can be enriched and customised
Technological Breakthroughs
Time to reach 50 million people:
• Telephone: 75 years
• Radio: 35 years
• Television: 13 years
• The Web: 4 years
Plummeting Transaction Cost
The Implications
• Within the last few decades we have seen a massive change in the way we can
communicate and share information.
• Most tasks we do today, whether in manufacturing, education, healthcare,
entertainment or business, involve processing information in some form or
another.
• Thus we are about to see significant changes in the way we do things.
Internet Banking
1.2. Types of e-Business
Types of eBusinesses
• B2C – Business to Consumer
• B2B – Business to Business
• B2G – Business to Government
• C2C – Consumer to Consumer
• C2B – Consumer to Business
• G2G – Government to Government
• G2C, G2B, C2G – Govt. to ……….
B2C
C2B
Reverse auction: customers set the price
B2B
The goals: to increase worker productivity, improve customer service, and create
a competitive business advantage for customers
C2C
B2G
1.3. Benefits & Advantages of e-Business
To E or not to E–That is the question?
• Competitive Edge
• Global Accessibility
• Culture and Leadership
• Channel Conflict
• Shortage of skills
• Lack of ICT Infrastructure
• Physical Customer interaction
The Disruptive Technology which changes the Strategic Thinking
Old Rule | Disruptive Technology | New Rule
Best contact with buyer is personal contact | Internet, World Wide Web | Best contact with buyer is effective contact
Field staff need offices to get/give information | Wireless/Mobile Computers/Internet | Field staff connectivity wherever they are
Managers make all decisions | Decision Support Tools | Decision making is part of everyone’s job
Businesses must choose centralisation or not | Telecommunications Networks | Businesses benefit from centralisation and decentralisation
Only experts can perform complex tasks | Expert Systems | A generalist can do the work of an expert
Information in one place at one time | Shared Databases | Information appears in multiple places simultaneously
Some Benefits of eBusiness
Benefits to the Organization:
• Increase market share
• Increase quality of service
• Reach new customers
• Gain advantage over competitors
• Improve on technology
• Reduce unnecessary costs
• Advertise globally
• Improve on customer relationship
• Accept online ordering
• Accept online payment
• Cut down on delays
• Improve on supply chain management
• Improve on financial management
• Eliminate bottlenecks
• Improve on efficiency
• Proactive decision-making
1.4. The e-Business Environment
The Business Environment
External: Market, Customers, Dealers, Govt. Regulatory Agencies, Outsourced
Companies, Interest groups, Competitors, Suppliers, Financial Institutions,
Business Partners
Internal: Shared Values, Strategies, Staff, Skills, Systems, Style, Structure
1.5. Customer Business Interaction in e-Business
ICT is changing many things! (Athula Ginige, UWS)
• Computing Power
• Storage Capacity and Bandwidth: Electrical and Electromagnetic, Optical, × 10 to 10000
• The Internet and the World Wide Web
• Plummeting Transaction Cost
eTransformation:
• Business → eBusiness
• Learning → eLearning
• Commerce → eCommerce
• Government → eGovernment
Customer – Business Interaction
[Diagram comparing Past and Present: Customers, Information Officers and
Business Information]
Changing Business Models: Virtual Organisations
[Diagram: Business A, Business B and Business C, each with its Information,
linked through a Virtual Organisation to the Customers]
1.6. e-Business Success Stories
Why is learning eBusiness important to you?
eBusiness Success Stories and Pioneers
• Dell Computer Corporation (www.dell.com)
  Sales increased from $7M/day (1997) to $40M/day (2000)
• Ernst & Young (www.ey.com)
  An 85,000-person global consulting firm operating in 32 countries; introduced
  Ernie, an online business consultant
• Cisco (www.cisco.com)
  Has developed a very profitable and innovative business model
• Amazon (www.amazon.com)
  The largest bookstore in the world: virtual, with no stores
Actual and Estimated Online Sales
Year    B2C Sales    B2B Sales
2000    50           600
2001    70           730
2002    80           900
2003    100          1600
2004    130          2800
2005    150          4100
2006    190          5300
2007    240          6800
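To make the trend in these figures easier to read, the Python sketch below computes year-over-year growth and the compound annual growth rate (CAGR) for both columns. The slide does not state the unit of the sales figures, so the numbers are treated as unitless; the function and variable names are illustrative assumptions, not part of the course material.

```python
# Sales figures copied from the slide; the unit is not stated in the source.
sales = {
    2000: {"b2c": 50, "b2b": 600},
    2001: {"b2c": 70, "b2b": 730},
    2002: {"b2c": 80, "b2b": 900},
    2003: {"b2c": 100, "b2b": 1600},
    2004: {"b2c": 130, "b2b": 2800},
    2005: {"b2c": 150, "b2b": 4100},
    2006: {"b2c": 190, "b2b": 5300},
    2007: {"b2c": 240, "b2b": 6800},
}


def yoy_growth(data, key):
    """Return {year: fractional growth over the previous year} for one column."""
    years = sorted(data)
    return {
        year: data[year][key] / data[prev][key] - 1
        for prev, year in zip(years, years[1:])
    }


def cagr(first_value, last_value, periods):
    """Compound annual growth rate over the given number of yearly periods."""
    return (last_value / first_value) ** (1 / periods) - 1


first_year, last_year = min(sales), max(sales)
periods = last_year - first_year

b2c_cagr = cagr(sales[first_year]["b2c"], sales[last_year]["b2c"], periods)
b2b_cagr = cagr(sales[first_year]["b2b"], sales[last_year]["b2b"], periods)

for year, growth in yoy_growth(sales, "b2b").items():
    print(f"{year}: B2B growth over previous year = {growth:.1%}")
print(f"B2C CAGR {first_year}-{last_year}: {b2c_cagr:.1%}")
print(f"B2B CAGR {first_year}-{last_year}: {b2b_cagr:.1%}")
```

On these figures, B2B sales grow considerably faster than B2C sales over the period, which is the point the slide appears to illustrate.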
10. 7Es in e-Transformation
E Region
[Diagram: from the Traditional chain (Supplier, Manufacturer, Distributor,
Customer) to a Digital Value Hub (Suppliers T1, Suppliers T2, Manufacturers,
Distributors, Customers) in Phase 1, Phase 2 and Phase 3]
• B2B – B2C
• CRM – SCM
• Reengineering
• IT enabling of internal and external processes
• eTransformation
How to start eTransformation?
“What becomes obvious is that the first thing you have to do, before you
understand e-business priorities, is to understand business priorities! Which,
by the way, makes sense because in the end there is no ‘e-business.’ The ‘e’ is
only temporary; it will go away. It will all be ‘business.’ Therefore, the right
place to start your e-business initiative is where the most leverage is within
the context of your business………”
Prof. Mohan Sawhney, Professor of Electronic Commerce and Technology at
Northwestern University’s Kellogg Graduate School of Management
The Seven E’s In eTransformation
– A Strategic eTransformation Model Developed by UWS
1. Environmental Analysis
2. eTransformation Goals/Strategies
3. eReadiness (Internal/External)
4. eTransformation Roadmap
5. eTransformation Methodology
6. eSystems (ICT/Business, Maintenance)
7. Evolution – Change Management
The Seven E’s In eTransformation
– A Strategic eTransformation Model
1. Environmental Analysis
2. eBusiness Goals/Strategies
3. eReadiness (Internal/External)
4. eTransformation Roadmap
5. eTransformation Methodology
6. eSystems (ICT/Business Maintenance)
7. Evolution – Change Management
Annotations on the diagram:
• Global IT/Business Trends, SWOT Analysis, Industry Analysis (Porter’s Forces)
• Business Strategies, eBusiness Models
• eReadiness in terms of Funding, Infrastructure, Applications, Web presence, Skills, etc.
• The Evolutionary eTransformation methodology
• IT Policies, Security, Support, Maintenance mechanisms
• Structure, Strategy, Shared Values, Style, Staff, Skills, Systems
7E’s eTransformation model
The 7E’s in eTransformation (Arunatileka et al. 2003), developed by UWS
• A strategic eTransformation model
• Successfully used by some SMEs in the Western Sydney region
• Consists of 7 very important aspects:
1. Environmental Analysis
2. eBusiness Goals/Strategies
3. eReadiness
4. eTransformation Roadmap
5. eTransformation Methodology
6. eSystems
7. Evolution – Change Management
The Business Environment
Forces: IT Driven Forces, Business Driven Forces
External: Market, Customers, Dealers, Govt. Regulatory Agencies, Outsourced
Companies, Interest groups, Competitors, Suppliers, Financial Institutions,
Business Partners
Internal: The Company
Application of Michael Porter’s Five Forces Model to the Industry
• Bargaining Power of Suppliers
• Bargaining Power of Buyers
• Threat of New Entrants
• Threat of Substitutes
• Rivalry Among Competitors
Ratings shown on the diagram: V. High, High, High, V. High, Low
Application of Michael Porter’s Five Forces Model
to the Sri Lankan Garment Industry
• Bargaining Power of Suppliers
• Bargaining Power of Buyers
• Threat of New Entrants
• Threat of Substitutes
• Rivalry Among Competitors
Other labels on the diagram: Quick Response Systems, Non-Quota Era
Ratings shown on the diagram: V. High, High, V. High, V. High, Low
SWOT Analysis
Strengths
• The industry knowledge of the CEO
• Manufacturing flexibility
• Company culture: best practices
• Innovation and creativity
• Customer base: client pedigree
• Industry reputation
Weaknesses
• Over-reliance on the CEO
• Size of business: small
• Not using the Web for any purpose
• Manual quality systems
• Lack of marketing strategies
• IT is not used as a strategic tool
Opportunities
• Possibility of acquisition
• New product/market development
• Develop products for niche markets
• Alliance with giants in plastics
• Web as a strong marketing tool
• eBusiness opportunities
Threats
• Raw material price increases
• Aging technology
• Market intelligence
• Legislations: food/recycling
• Market: large competitors
• No direct link to end-user
The Outcomes of the Garment Industry Survey
[Chart: Departments/Divisions in Enterprises, by percentage (0% to 100%).
Divisions: Finance, Merchandi, Production, HRD, Workstudy, Industr., Operations,
CAD/CAM, Marketing, Sales, IT, Sample, Stores, Qlty. Asur.]
No. of Computers in Factory
• Less than 5: 37%
• 5 to 15: 37%
• 16 to 50: 10%
• Over 50: 16%
Existence of a Network in the Enterprise
• None: 58%
• LAN: 26%
• WAN: 16%
[Chart: Shortage of IT Staff Skills, by percentage (0% to 100%). Skills:
Multimedia, Trouble Shooting, Dat. Com., IT Training, Networking, G. Ind. Skills,
S/W Dev., E-Commerce Dev.]
Critical Success Factors which give the
Competitive Advantage in the Garment Sector
• Effective Business Processes
• State-of-the-Art Technology
Goals, Directions, Strategies
and Competitive Advantage
• Be the Cost Leader (Cost/Price)
• Differentiate (Features/Quality)
Apply eBusiness Model
[Diagram: Supply Chain Model. Supplier1, Supplier2, Supplier3, Manufacturer,
Dealers, Retailers, Customer, with flows of money ($) and product (P)]
[Diagram: Full Service Provider Model. Service Provider1, Service Provider2,
Service Provider3 (cargo, airline tickets, hotel bookings), Full Service
Provider, Customers, Retailers, Buyers]
eBusiness Models
Direct-to-Customer Model
[Diagram, with flow of product (P) and flow of money ($).
Traditional Business Model: Computer Manufacturer (e.g. IBM, HP), Distributor,
Dealer, Customer.
New Business Model: DELL, Customer.]
• Can sell at lower prices
• Build to customer order
• Receive payment earlier
• Speed up new product release cycles
• Use customer data to provide customized value added service
• Proactive decision making
Supply Chain Model (Horizontal Marketplace Model)
[Diagram: Supplier1, Supplier2, Supplier3, Manufacturer, Dealers, Retailers,
Customer, with flows of money ($), product (P) and information]
• Virtual Value Chain
• Information flow across the supply chain
• All parties have a strong electronic bond and backend systems
• Some companies do/don’t own any part of the value chain
• They have access to information about all parties, from supplier/manufacturer
to the customer
Full-Service Provider Model (Vertical Marketplace Model)
[Diagram: Service Provider1, Service Provider2, Service Provider3 (airline
tickets, hotel bookings, transportation of goods), Full Service Provider,
Customers, Retailers, Buyers, with information flow]
• Has to know a lot about the customer
• Provides own or third party products
• Offers a wide range of products
• Offers different channels: Internet, face-to-face, phone, etc.
• Sells its own products + commission for third party products
• Some charge customers a service fee
The Suitability of the E-Transformation
Approach to the Garment Companies
• Big Picture – Business-IT alignment
• Smaller incremental changes
• Change is constant – Changes in Requirements
• Flexibility is the key – Responsive, adaptable systems
• Automation or Optimisation – SCM, ERP, MRP
• Strong Back End Systems to Support the Web based Front End Systems
• E-business is Business!
eReadiness (Internal/External)
Internal:
• Business processes – Well defined processes
• Applications & Infrastructure
• Web presence – Existence/usage
• Skills – Level of IT skills of the employees
• Executive management – Commitment/Support
• External connectivity – Channels
• Future directions – Plans for expansions
External:
• Customers, Suppliers, Potential users
The Company’s Position and Path
in the eTransformation Roadmap
[Diagram: IT Sophistication for Internal Business and External Business.
Internal Processes (B2E): Effective Individual, Effective Team, Effective
Organisation. External Processes (B2C): Basic Website, Interactive Site,
eCommerce Site. Further labels: New Processes, Convergence.]
Convergence
[Diagram: Corporate Data Repository connected to Production, Finance, Human
Resources, Purchasing, Marketing, Interactive Web site, Links to Suppliers and
Links to Distributors]
Internal eTransformation Methodology
Stage 4 : eTransformation Roadmap
• Business Process Modeling
• Business Process Re-engineering
• Enhancing Process with IT
• Change Management
21
© 2008, University of Colombo School of Computing
eTransformation Methodology
Understanding the Global Change
Responding to Change
Implementing the Change
Business Goals
Business Strategy
Business Processes (Workflow, Technology, People)
Business Process Re-engineering
Enhanced Using IT
Change Management
Case Study – Garment Manufacturing Company
Business Process Modelling
Business Processes
1. Customer Order Processing
2. Raw Material Purchasing
3. Pre-Production Planning
4. Resource Allocations
5. Production Process
6. Quality Assurance
7. Despatch Finished Goods
8. Sample Preparation
9. Inventory Control
10. Human Resource Development
11. Freight Forwarders Handling
12. Accounts Receivable
13. Accounts Payable
14. Staff Recruitment
15. Payroll
16. Merchandising
17. E-Business Systems Handling
18. Strategic Decision Making
19. Marketing
Business Process Re-engineering
Customer
Request
Customer
Satisfaction
Management
Processes
Core Processes
Support Processes
Understanding the Business
Processes
Process
People
Technology
Functionality
Information
Interface Competence
Linkage
Interconnection
Co-operation
Responsibility
Guidelines
Stage 4: eTransformation Roadmap
External eTransformation Methodology
• Identify Purpose and Information Requirements
• Design and Develop
• Management and Maintenance
• Online and Offline Promotion
Goals, Directions, Strategies
and  Competitive Advantage
Differentiate
CRM Tool
Pre-Purchase
Marketing
Tool
Context
Content
Community
Customisation
Communication
Connection
Commerce
• Innovative designing capabilities
• Total solution provider
• Rapid response and commitment to client success
• Strategic integration with partners/clients
• Commitment to quality
• Clean room conditions in factory
eSystems (Policies, Support,  Maintenance)
Management  Controls :  Standards,
guidelines to users, Procedures, Manuals
Security Measures : To deal with common
threats (sabotage,  hacking,  privacy,  etc.)  and
contingency planning and disaster recovery
IT  Maintenance  and  Support:  (Support
for  ICT  infrastructure,  upgrading,  backing
up,  maintenance,  troubleshooting,  Support
by the ISP and Vendors)
7S Model for Change Management
Strategy: Business strategy, strategic alliances, marketing, product and service development, sales and channel distribution, business systems and processes
Structure: Formal/informal communication channels, organisational structure (hierarchical? network?)
Systems: Business processes, methods, procedures and controls
Style: Behaviour of key managers and the way they relate to employees
Staff: People employed, their positions, levels, numbers and adequacy
Skills: Skills, aptitude, educating and training needs of the staff
Shared Values
7S Model
1. Strategy – This defines key actions and capabilities
along the major dimensions of marketing, product and
service development, sales and channel distribution,
business systems and processes, and management of
alliances and partnerships in order to achieve
organisational goals.
2. Structure – The way the organisation’s units relate to
each other and the chain of command and formal/informal
communication channels
3. Systems –The information flow which requires
capabilities in both information technology and in
organisational processes, methods, procedures and
controls.
4. Style – The behaviour of the key managers and the way they
relate to employees in order to achieve the organizational goals
5. Staff – The types of people employed in the organization,
their positions, levels and numbers.
6. Skills – The skills and aptitude for developing customer
relationships, service and sales. For staff to develop appropriate
new skills, a learning environment is required.
7. Shared Values – The guiding concepts, values and
aspirations, often unwritten, which direct all the personnel in
the organization in the same direction.
Application of the 7E Model to an SME in the Manufacturing Sector
Company Profile
A family-owned plastic moulding company established in the 1970s
Catering to the consumer foods and pharmaceutical market
Turnover – 1.5M per annum
Staff – 9 full-time employees
Architecture for the Company
CBEADS Server
Database (MySQL)
Scripts (PERL)
Intranet Web Pages
Web Content Management
Virtual Corporate Data Repository
Intranet (B2E)
Public Web Site (B2C)
Extranet (B2B)
3.1. Direct-to-customer Interaction
Dealing Directly with Customers
Cases in Point
Description
Bypassing of traditional sales
and distribution channels to
reach consumers directly
Validity of the Traditional Business Models in Today’s Context
                 Traditional Business Models    New Business Models
Assets           Tangible / Physical            Intangible / Virtual
Markets          Local / Geographical           Global / No boundaries
Finance          Slow / Difficult               Faster / Easier (24/7)
Communication    Chained / Closed               Networked / Open
Distribution     Middleman                      Direct (DELL)
Production       Mass                           Personalised (DELL)
Example: DELL Computers
• Built on a vision of customer responsive order fulfillment.
• Payment is received at the time of the order.
• Uses a direct Sales approach with no middleman.
• Organization practices a proactive, not reactive, approach.
• Dell finds and hires the right professionals.
• Retained all efficient processes & operations and outsourced the rest to suppliers and distributors.
Origins and Founder
• Michael Dell, born in February 1965, is the
chairman and chief executive officer of Dell,
the company he founded in 1984 with
$1,000 and an unprecedented idea – to sell
computer systems directly to customers.
• Mr. Dell became the youngest CEO of a
company ever to earn a ranking on the
Fortune 500 and is now the longest-tenured
CEO in the computer industry.
• Mr. Dell has been honored many times for
his visionary leadership, including in 2003
being named one of the top-ten most
powerful people in business by Fortune
magazine, the fourth most respected world
leader by the Financial Times and the best
CEO in the IT hardware industry by
Institutional Investor magazine.
• In 2001, he was named chief executive of
the year by Chief Executive magazine.
Dell Web Site
• Gives customers the
ability to custom order &
price various sizes &
configurations of PCs
online
• Receives money before
product is shipped
• All customer service is
done via the web
helping to cut costs
• Customers can track
shipments
3.2. Mass Communication
Mattel Launches Limited Direct
Play
Mattel avoids channel conflict by
creating proprietary “Build Your Own
Barbie” product for direct sales…
Mattel.com
…leaving traditional
doll product line to
established retailers
Barbie.com toysrus.com
3.3. Open Business Models
Traditional Value Chain
Supplier → Manufacturer → Distributor → Reseller → Customer
Supply Channel
Distribution Channel
Amazon.com Web site
• Website offers online
viewing of preface,
table of contents &
back cover
• Easy ordering and
fast shipping to
anywhere in the
world.
• Has diversified to
online auctions and
other lucrative
household items
Example: Amazon.com
• Started in 1995 with 2 employees in a rundown warehouse
in Seattle.
• Grew revenue in 3 years to more than $600 million.
• Beat two giants, Barnes & Noble and Borders Books and Music.
• Payment is received before product is shipped.
• Has created an e-retail infrastructure that meets even needs for hard-to-find, relatively unpopular and out-of-print titles.
• Provides third-party content such as author interviews, pre-release information and reviews, a valuable part of the book purchase process.
3.4. Virtual Organization
Towards Virtual Collaborations…
Traditional
Phase 1
• Reengineering
• IT enabling of internal and external processes
• eTransformation
Phase 2
Supplier Manufacturer Distributor Customer
Digital Value Hub
Phase 3
Suppliers T1 Manufacturers Distributors
Customers
Suppliers T2
4.1. The Business Environment
Technological Breakthroughs
Time to reach 50 million people
Telephone: 75 years
Radio: 35 years
Television: 13 years
The Web: 4 years
Plummeting Transaction Cost
4.2. Driving Forces for Change
Driving Forces for Change
• Why do organisations need to change the
way they do business?
• What are the driving forces to change?
Driving Forces for Change
• IT Driven Forces
• Business Driven Forces
• External Forces
• Internal Forces
IT Driven Forces
•Internet & WWW
•Communication Explosion
•Technological advancement
•Information Revolution
•Virtual Connectivity
•No geographical boundaries, etc.
Business Driven Forces
•Bargaining power of buyers
•Bargaining power of suppliers
•Market Changes
•Strong Competition
•Adopting New Strategies
•Diversifying into new products, etc.
External Forces
•Government Regulations
•Pressure from business partners
•Pressure from Interest groups
•Market changes, etc.
Internal Forces
•Adopting New Strategies
•Changes in business processes
•Changes in management
•Changes in staff/structure
•Changes in value systems, etc.
4.4. e-Market Places
Market Makers:
eMarketplaces – Auctions
Puts buyer in control
Establishing a “pure
market price”
Sell-side auction
Buy-side auction
eBay
First C2C marketplace
1 million+ auctions/day
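The difference between the two auction types can be sketched in a few lines. This is only an illustration under simple assumptions: in a sell-side auction one seller takes bids from many buyers and the highest bid wins, while in a buy-side auction one buyer takes offers from many sellers and the lowest offer wins. The function names, participant names and amounts are hypothetical and do not come from the course material.

```python
def sell_side_winner(bids):
    """Sell-side auction: one seller, many buyers; the highest bid wins."""
    return max(bids.items(), key=lambda item: item[1])


def buy_side_winner(offers):
    """Buy-side auction: one buyer, many sellers; the lowest offer wins."""
    return min(offers.items(), key=lambda item: item[1])


# Hypothetical sample data for illustration only.
bids = {"buyer_a": 120.0, "buyer_b": 135.0, "buyer_c": 128.0}
offers = {"supplier_x": 980.0, "supplier_y": 940.0, "supplier_z": 1010.0}

print("Sell-side winner:", sell_side_winner(bids))
print("Buy-side winner:", buy_side_winner(offers))
```

Real marketplaces add rules such as reserve prices, bid increments and closing times, which this sketch leaves out.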
4.5. Product Disruption
Product Disruption:
Examples of Digitization Across Industries
Product Substitution
Industry          Traditional Format                        Digital Format
Recorded music    LP records, tapes                         CDs, MP3
Journalism        Newspaper, magazine, television, radio    Web site
Product Digitization:
Examples of Digitization Across Industries
Service Substitution
Industry           Traditional Format    Digital Format
Banking            Cash, check           Smart card, web banking and payment systems
Photo-finishing    Film to paper         Digital to paper, film to digital
4.6. Price Disruption
Price Disruption:
Intelligent Agents
Consumer
Alibris
PriceSCAN
New Pricing Models
Accompany’s Buy-Cycle
• Web display updates in real time
• Current number of committed buyers
• Time remaining in cycle
• Current savings per buyer
• Pricing schedule showing the price decline as the number of buyers increases
• Network marketing feature sends e-mail from buyer to friends, encouraging participation
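A group-buying price schedule of this kind can be sketched as a tiered lookup. The tier thresholds and prices below are invented for illustration and are not taken from the slide; the only assumption carried over is that the unit price steps down as more buyers commit during the cycle.

```python
# Hypothetical price tiers: (minimum committed buyers, unit price).
PRICE_TIERS = [(1, 100.0), (10, 90.0), (25, 80.0), (50, 70.0)]


def unit_price(committed_buyers, tiers=PRICE_TIERS):
    """Return the unit price for the current number of committed buyers."""
    if committed_buyers < 1:
        raise ValueError("at least one committed buyer is required")
    price = tiers[0][1]
    for minimum, tier_price in tiers:
        # Tiers are listed in ascending order; keep the last tier reached.
        if committed_buyers >= minimum:
            price = tier_price
    return price


def savings_per_buyer(committed_buyers, tiers=PRICE_TIERS):
    """Current savings per buyer relative to the starting price."""
    return tiers[0][1] - unit_price(committed_buyers, tiers)


for buyers in (1, 12, 30, 60):
    print(buyers, unit_price(buyers), savings_per_buyer(buyers))
```

The "current savings per buyer" figure on the web display corresponds to `savings_per_buyer` here, recomputed each time a new buyer commits.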
Destroying pricing models
4.7. Intelligent Agents
5.1. Internal Business Processes
Interactions with external World
Business
organisation
Customers
Other
Businesses
Government
Authorities
Suppliers
• Information
• Material
• Finance
Trust Relationships
Security
Convenience
Quality of Service
Business Processes
• External
Business
Processes
– Marketing
– Distribution
• Internal
Business
Processes
– Production
– Payroll
Business
organisation
Employees
Business
Processes
Organisation Structure
• Information
• Material
• Finance
Nature of Interactions
Business
Processes
Employees
Business
Organisation
External
Entities
Information
Material
Finance
Trust Relationships
Security
Convenience
Quality of Service
Impact of Information Technology
Business
Processes
Employees
Business
Organisation
External
Entities
Technology
Technology
Information
Material
Finance
From Atoms to Bits
From place to space
From Brick to click
From Real to Virtual
Trust Relationships
Security
Convenience
Level of Service
Internal Processes
Business
Processes
Employees
Business
Organisation
External
Entities
Technology
Technology
Effective
Individual
Effective
Team
Effective
Enterprise
Productivity
Tools
Communication
Infrastructure
Enterprise
Wide
Applications
Organisational Efficiency
5.2. External Business Processes
External Processes
Organisational Efficiency
Basic
Web Site
Interactive
Site
eCommerce
Site
Marketing
Information
Two way
Communication
Payment and
follow up
Business
Processes
Employees
Business
Organisation
External
Entities
Technology
Technology
5.3. e-Business Roadmap
Convergence and New Processes
Organisational Efficiency
Effective
Individual
Effective
Team
Effective
Enterprise
Basic
Web Site
Interactive
Site
eCommerce
Site
Convergence
New
Processes
SCM
CRM
KM
Convergence
Corporate
Data
Repository
Production
Finance
Human
Resources
Links to
Distributors
Interactive
Web site
Links to
Suppliers
Purchasing
Marketing
E-Business Road Map
[Figure: roadmap along the Process Sophistication axis. Internal Processes: Effective Individual, Effective Team, Effective Organisation. External Processes: Basic Website, Interactive Site, E-Commerce Site. Both paths lead to Convergence and then New Processes.]
Basic Web Site
[Roadmap position, External Processes: Basic Website]
The organization has its own domain name and a ‘brochureware’ type of website hosted with an ISP.
The website contains company information, a static e-catalogue, e-mailing lists, answers to FAQs and e-messages to the masses.
Interactive Site
[Roadmap position, External Processes: Basic Website, Interactive Site]
Dynamic web site providing a two-way flow of information. Answers structured queries. Existence of a news forum, chat area and feedback forms. Uses own domain name, database and scripting languages.
Requires a web server and a high-speed dedicated connection to the Internet, with strong back-end systems and security measures.
eCommerce Site
[Roadmap position, External Processes: Basic Website, Interactive Site, E-Commerce Site]
The organization should have a secure web server to facilitate financial transactions, or a link to a payment gateway to process online payments.
Supporting back-end systems, international security standards and business contingency planning need to be in place.
Trust relationships and security are major issues.
Effective Individual
[Roadmap position, Internal Processes: Effective Individual]
Individuals using computers and standalone productivity software such as accounting packages, payroll software, inventory control software, spreadsheets and word processors. They may also be connected to the Internet and using e-mail.
Getting users to own the processes
Effective Team
[Roadmap position, Internal Processes: Effective Individual, Effective Team]
A computer network is used in functional units such as Accounting and Production. People work in teams using networked applications, e-mail and intranet capabilities to enhance team productivity.
Existence of a LAN, shared I/O devices, drive space, databases, etc.
Effective Organisation
[Roadmap position, Internal Processes: Effective Individual, Effective Team, Effective Organisation]
All computers in the organization are networked, and the databases and information systems are interlinked. Enterprise-wide applications are used for purchasing, manufacturing, sales, accounting, etc. Information is integrated and shared across the enterprise.
Existence of an ERP, VPN and intranet. Strict security and password protection.
Convergence
The organization has achieved integration of all the information it needs to support all business processes and to interact with its business partners.
[Figure: Convergence. A Corporate Data Repository linked with Production, Finance, Human Resources, Purchasing, Marketing, Links to Distributors, Links to Suppliers and the Interactive Web site.]
New Processes
[Roadmap position: Convergence, New Processes]
Such an organization can handle new processes such as SCM (Supply Chain Management), CRM (Customer Relationship Management), KM (Knowledge Management), etc.
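The roadmap stages described on the preceding slides can be modelled as two ordered paths. The sketch below is a study aid only: the stage names come from the slides, but the function name and the rule that Convergence follows once both paths are complete are assumptions made for illustration.

```python
# Stage names as they appear on the roadmap slides.
INTERNAL_PATH = ["Effective Individual", "Effective Team", "Effective Organisation"]
EXTERNAL_PATH = ["Basic Website", "Interactive Site", "eCommerce Site"]


def next_steps(internal_stage, external_stage):
    """Suggest the next stage on each path for a company's current position.

    Assumption: Convergence becomes the next step only when both the
    internal and the external path have reached their final stage.
    """
    i = INTERNAL_PATH.index(internal_stage)
    e = EXTERNAL_PATH.index(external_stage)
    internal_done = i == len(INTERNAL_PATH) - 1
    external_done = e == len(EXTERNAL_PATH) - 1
    if internal_done and external_done:
        return ["Convergence"]
    steps = []
    if not internal_done:
        steps.append(INTERNAL_PATH[i + 1])
    if not external_done:
        steps.append(EXTERNAL_PATH[e + 1])
    return steps


print(next_steps("Effective Team", "Basic Website"))
print(next_steps("Effective Organisation", "eCommerce Site"))
```

A company that uses networked teams internally but only has a brochureware site would therefore be pointed towards Effective Organisation and Interactive Site as its next moves.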
The Company’s Position and Path in the eTransformation Roadmap
[Figure: eTransformation roadmap along the IT Sophistication axis. Internal Processes (Internal Business, B2E): Effective Individual, Effective Team, Effective Organisation. External Processes (External Business, B2C): Basic Website, Interactive Site, eCommerce Site. Both paths lead to Convergence and then New Processes.]
5.4. e-Business Strategy Development
E-business development
step-by-step
Planning Implementation Operation
Regional Training Workshop for Enterprise Support Agencies to Promote E-business for
SMEs in the Greater Mekong Subregion (GMS), 26-28 June 2006, Bangkok
E-business development step-by-step
What are the steps to success?
Planning
STEP 1 – e-Business strategy
STEP 2 – Implementation plan
Implementation
STEP 3 – Implementation
Operation
STEP 4 – Operation
E-Business Strategy
Fundamental questions…
• What type of e-business?
• Why do you want to develop that type of e-business?
• How to develop e-business?
…and answers.
• E-business strategy
• Implementation plan
E-Business strategy
Family
Inequalities
Environment
$$$
Now
Competition
E-Business strategy
Environment
$$$
Now
Future
E-Business strategy
Propose your future situation
• Should be specific (time and figures)
• Profit oriented
“Increase revenue by 30% in 2 years”
“Increase market share by 50% in 1 year”
• Cost oriented
“Reduce cost by 20% in 1 year”
• Client oriented
“Increase client satisfaction by 50% in 2 years”
Efficiency
Improve customer services
E-Business strategy
Assess your current situation
• Internal factors
Strengths
Weaknesses
• External factors
Opportunities
Threats
E-Business strategy
To conduct the SWOT Analysis  you should consider:
• What is your business sector?
• Who are the customers?
• What are the current practices of selling and buying?
• Who are the major competitors? (How intense is the
competition?)
• What e-strategies are used, by whom?
• What are the major opportunities and threats?
• What are the existing and potential partnerships for
developing e-Business?
E-Business strategy
SWOT Analysis
• Internal factors
Strengths
Original product
Popular product
High quality
Weaknesses
Lack of IT expertise
No WEB presence
• External factors
Opportunities
External market
New trend
B2B market places
Threats
Competitors
SWOT Diagram
INTERNAL FACTORS
Strengths (S): Original product; Popular product; High quality
Weaknesses (W): Lack of IT expertise; No WEB presence
EXTERNAL FACTORS
Opportunities (O): External market; New trend; B2B market places
Threats (T): Competitors
SO Strategies: Generate strategies here that use strengths to take advantage of opportunities
WO Strategies: Generate strategies here that take advantage of opportunities by overcoming weaknesses
ST Strategies: Generate strategies here that use strengths to avoid threats
WT Strategies: Generate strategies here that minimize weaknesses and avoid threats
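One way to work through the four strategy cells is to pair every internal factor with every external factor and treat each pair as a brainstorming prompt. The sketch below does this for the example factors on the slide; the dictionary layout and the function name are assumptions for illustration, and the pairs are prompts for discussion, not finished strategies.

```python
from itertools import product

# Example factors taken from the SWOT slide.
swot = {
    "strengths": ["Original product", "Popular product", "High quality"],
    "weaknesses": ["Lack of IT expertise", "No WEB presence"],
    "opportunities": ["External market", "New trend", "B2B market places"],
    "threats": ["Competitors"],
}

# Each strategy cell combines one internal list with one external list.
CELLS = {
    "SO": ("strengths", "opportunities"),
    "WO": ("weaknesses", "opportunities"),
    "ST": ("strengths", "threats"),
    "WT": ("weaknesses", "threats"),
}


def strategy_prompts(factors):
    """Return every internal/external pairing for each strategy cell."""
    return {
        cell: list(product(factors[internal], factors[external]))
        for cell, (internal, external) in CELLS.items()
    }


prompts = strategy_prompts(swot)
for cell, pairs in prompts.items():
    print(cell, len(pairs), "pairings, e.g.", pairs[0])
```

For example, the WO pairing of "No WEB presence" with "B2B market places" points towards joining a B2B portal, which matches the strategy chosen later in this section.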
Issues in e-Business Strategy
To be a first mover or a follower?
• Advantages
– Chance to capture large markets
– Establishing a brand name
– Exclusive strategic alliances
• Disadvantages
– Cost of developing the initiative is usually very high
– Chance of failure is high
– System may be obsolete compared to second-wave arrivals
– No support services are available at the beginning
Issues in e-Business Strategy
• E-Business Awareness / Owner
commitment
• Senior managers tend to:
– Know the whole spectrum of business
– Possess knowledge and authority to lead the e-business adoption
Issues in e-Business Strategy
Should you join an e-Business Portal?
– Several benefits
– Costs and limitations
– E-Marketing
– sell-side and buy-side infrastructure
– Which Portal to join?
E-Business strategy
1. B2B e-business
2. Target external market
3. Develop own website and join a B2B Portal
4. Use ISP to host website
E-Business strategy
How to know if the strategy will achieve the
proposed result?
“Increase revenue by 30% in 2 years”
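One simple way to check progress against a target such as this is to convert it into a yearly growth rate and compare actual figures with the implied milestone. The sketch below assumes steady compound growth, which the slide does not state; the function names and revenue figures are hypothetical.

```python
def required_annual_growth(total_increase, years):
    """Compound annual growth rate needed to reach the total increase."""
    return (1 + total_increase) ** (1 / years) - 1


def on_track(baseline, actual, total_increase, years, elapsed_years):
    """Compare actual revenue with the milestone implied by steady growth."""
    rate = required_annual_growth(total_increase, years)
    expected = baseline * (1 + rate) ** elapsed_years
    return actual >= expected


# Target from the slide: increase revenue by 30% in 2 years.
rate = required_annual_growth(0.30, 2)
print(f"Required growth per year: {rate:.1%}")

# Hypothetical figures: baseline revenue 1000, checked after one year.
print(on_track(1000.0, 1150.0, 0.30, 2, 1))
print(on_track(1000.0, 1100.0, 0.30, 2, 1))
```

Under this assumption, a 30% increase over 2 years needs roughly 14% growth each year rather than 15%, because the second year's growth compounds on the first.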
The Business Environment
EXTERNAL: Market, Customers, Dealers, Govt. Regulatory Agencies, Outsourced Companies, Interest groups, Competitors, Suppliers, Financial Institutions, Business Partners
INTERNAL: Shared Values, Strategies, Staff, Skills, Systems, Style, Structure
Analyzing the External Environment Using Porter’s Forces
[Figure: Porter’s five forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors. Ratings shown on the slide: V. High, High, High, V. High, Low (the assignment of ratings to individual forces is not recoverable).]
Michael Porter’s Five Forces
Rivalry Among Competitors: Strength and aggressiveness of the competitors in the industry sector
Threat of New Entrants: Attractiveness of the industry sector and the height of its entry and exit barriers
Bargaining Power of Suppliers: Number of suppliers in the industry sector (concentrated or organised power)
Threat of Substitutes: Power of substitute products eating into the market share
Bargaining Power of Buyers: Buyers become powerful when switching costs are low, products are undifferentiated and price sensitive
Bargaining Power of Suppliers
• Suppliers have the most power when:
• The inputs you require are available only from a small
number of suppliers.
• The inputs you require are unique, making it costly to
switch suppliers.
• Your input purchases don’t represent a significant
portion of the supplier’s business.
• Suppliers can sell directly to your customers,
bypassing the need for your business.
• It is difficult for you to switch to another supplier.
• You do not have a full understanding of your supplier’s market.
Reducing the Bargaining Power
of Suppliers
• Reduce inventory costs by providing just-in-time
deliveries
• Enhance the value of goods and services supplied by
making effective use of information about customer
needs and preferences
• Speed the adoption of new technologies
• Form a buying group of small producers to buy as
one large-volume customer.
• Integrate backward and produce your own
inputs by purchasing one of your key suppliers or
doing the production yourself.
Bargaining Power of Buyers
• Buyers have the most power over you when:
• They are large and purchase much of your output.
• Many small customers act as a group, which creates force.
• Your industry has many small companies supplying the product and
buyers are few and large.
• The products represent a relatively large expense for your customers.
• Customers have access to and are able to evaluate market information.
• Your product is not unique and can be purchased from other suppliers.
• Customers could possibly make your product themselves.
• Customers can easily, and with little cost, switch to another product.
Reducing the Bargaining Power
of Buyers
• Increase their loyalty to your business through partnerships or
loyalty programs.
• Sell directly to consumers.
• Increase the inherent or perceived value of a product by adding
features or branding.
• Select customers who have little knowledge of the market and
less power; this can enhance your profitability.
Threat of New Entrants
• The threat of new entrants is greatest when:
• Processes are not protected by regulations or patents.
• Customers have little brand loyalty.
• Start-up costs are low for new businesses entering the
industry.
• The products provided are not unique.
• New entrants can easily liquidate their inventory and assets
if the venture fails.
• Switching costs are low.
• The production process is easily learned.
• Access to inputs is easy.
• Access to customers is easy
• Economies of scale are minimal.
Reducing the Threat of
New Entrants
• Enhance your marketing/brand image.
• Utilize patents.
• Create alliances with associated products.
• Demonstrate your ability and desire to
retaliate against potential entrants.
• Set a product price that deters entry.
Threat of Substitutes
• Substitutes are a greater threat when:
• Your product doesn’t offer any real
benefit compared to other products.
• It is easy for customers to switch.
• Customers have little loyalty.
Reducing the Threat
of Substitutes
• Stay closely in tune with customer preferences.
• Differentiate your product by branding.
• Use collective advertising for an industry.
• Offer value-added products that take your
products to a different market.
Rivalry Among Competitors
• The most intense rivalries occur when:
• One firm or a small number of firms have incentive to
try and become the market leader.
• The market is growing slowly or shrinking.
• There are high fixed costs of production
• Products are perishable and need to be sold quickly.
• Products are not unique, or are homogeneous:
undifferentiated products (commodities).
• Customers can easily switch between products.
• There are high costs for exiting the business
Reducing the Threat
of Rivals
• To minimize price competition:
• Distinguish your product from your
competitors’ by innovating or improving
features.
• Focus on a unique segment of the market.
• Distribute your product through a novel channel.
• Try to form stronger relationships.
• Build customer loyalty.
Business Goals & Strategies
• Survival – Companies in deep trouble
that need re-engineering; they have no
choice.
• Sustainability – Not yet in trouble, but
have the foresight to see trouble
coming. They need to be proactive.
• Growth – Ambitious and aggressive.
Re-engineering is an opportunity to
further their lead over the competitors.
Critical Success Factors which give the
Competitive Advantage in the Garment Sector
Effective
Business
Processes
State-of-the-Art
Technology
Virtual Collaborations
Direct-to-Customer Model
[Figure: P = flow of product, $ = flow of money]
Traditional Business Model: Computer Manufacturer (e.g. IBM, HP) → Distributor → Dealer → Customer
New Business Model: DELL → Customer
• Can sell at lower prices
• Build to customer order
• Receive payment earlier
• Speed up new product release cycles
• Use customer data to provide customized value-added service
• Proactive decision making
Supply Chain Model (Horizontal Marketplace Model)
[Figure: Supplier1, Supplier2, Supplier3 → Manufacturer → Dealers → Retailers → Customer; $ = money, P = product, with information flow across the chain]
• Virtual Value Chain
• Information flow across the supply chain
• All parties have a strong electronic bond and backend systems
• Some companies do/don’t own any part of the value chain
• They have access to information about all parties, from supplier/manufacturer to the customer
Full-Service Provider Model (Vertical Marketplace Model)
[Figure: Service Provider1, Service Provider2 and Service Provider3 connected through the Full Service Provider to Customers, Retailers and Buyers, with information flow. Examples: airline tickets, hotel bookings, transportation of goods]
• Has to know a lot about the customer
• Provides own or third-party products
• Offers a wide range of products
• Offers different channels: Internet, face-to-face, phone, etc.
• Sells its own products and earns commission on third-party products
• Some charge customers a service fee
Revenue sharing eB Model
[Figure: Company 1, Company 2, Company 3 and Company 4 connected through an eHub or ePortal to market/promote products collectively, order together to get economies of scale, and deal with customers to get larger projects; information flow, $ = money, P = product]
• The sellers get together through a Portal
• They market/promote products collectively to a larger market segment
• Sellers can work on larger projects/orders as they work collectively
• Collective bulk orders give them bargaining power over suppliers
• Resources as well as profits are shared among companies
Digital Value Hub – eRegion
[Figure: Suppliers T2, Suppliers T1, Manufacturers, Distributors, Customers]
• Strong B2B partnerships and collaborations between nodes
in the supply chain
• The industry competitors’ willingness to work together
• Trust relationships among the competitors in an industry
• A strong force against foreign competition
Extended Enterprise
Professor Jim Browne, CIMRU, NUI, Galway.
[Figure: T2 Suppliers, T1 Suppliers, Manufacturing, Customers; Project Management Issues; Design Issues (Co-Design, Customer Driven Design); Supply Chain Management; Customer Order Fulfillment]
A Global Trade Platform for SMEs
Cost-Benefit and Risk Analysis
• Revenue model
– Properly planned revenue model is a critical success
factor
– Revenues from sales depend on customer acquisition
cost and advertisement
– Must be figured into the analysis
• Costs
– Implementation and operation costs
• Recover the investment
– Should be able to recover the investment in up to 3
years
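The three-year recovery guideline above can be checked with a simple payback calculation. The following is a minimal sketch, not a method prescribed by the course: the function name, the sample figures and the assumption that each year's net benefit arrives evenly through the year are all hypothetical.

```python
from typing import Optional, Sequence


def payback_period_years(investment: float,
                         yearly_net_benefits: Sequence[float]) -> Optional[float]:
    """Return the years needed to recover the investment, or None if never.

    yearly_net_benefits holds revenue minus operating cost for each year.
    Assumes cash arrives evenly within a year (illustrative assumption).
    """
    if investment <= 0:
        return 0.0
    remaining = investment
    for year, benefit in enumerate(yearly_net_benefits, start=1):
        if benefit > 0 and benefit >= remaining:
            # Recovered part-way through this year.
            return (year - 1) + remaining / benefit
        remaining -= benefit
    return None


# Hypothetical project: 100,000 invested, net benefits growing each year.
period = payback_period_years(100_000, [20_000, 40_000, 60_000])
print(period is not None and period <= 3)  # within the 3-year guideline?
```

A result of None, or a period longer than three years, suggests the revenue model or the cost estimates need to be revisited before committing to the project.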
E-Business Strategy
• Outputs:
– Where you want to be in the future
– Why e-business
– What type of e-business (Estimated Scope)
– Business case
• Estimated Time
• Estimated Cost
What are Business Strategies ?
• Product Differentiation (Value-added)
• Strategic Alliances
• Product Bundling
• Horizontal Integration
• Marketing
• Pricing Strategies
• Customer Relationship Mgt. (CRM)
• Expand Product Line
Threat of New Entrants
[Figure: Michael Porter’s Five Forces (Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors) mapped to Business Strategies]
Strengthen the barriers of entry
Value-Added Product Differentiation / Product Bundling / CRM (Customer Relationship Mgt) / Strategic Alliances / Cost Leadership
Rivalry Among Existing Firms
[Figure: Michael Porter’s Five Forces (Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors) mapped to Business Strategies]
Value-Added Product Differentiation / Strategic Alliances / Product Bundling / Horizontal Integration / Marketing / Price Discrimination Strategies / Pricing Strategies / Targeting Niche Markets / Customer Relationship Management (CRM) / Expand Product Line
Bargaining Power of Suppliers
[Figure: Michael Porter’s Five Forces (Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors) mapped to Business Strategies]
Reduce Suppliers’ Monopoly or Strength of Suppliers
Product Differentiation / Backward Integration / Supply Chain Mgt (SCM) / Strategic Alliances / ePortal (for bulk ordering)
Bargaining Power of Buyers
[Figure: Michael Porter’s Five Forces (Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors) mapped to Business Strategies]
Value-Added Product Differentiation / Forward Integration / Marketing / Product Bundling / Product Development / Strategic Alliances / Customer Relationship Mgt (CRM) / Cost Leadership / Pricing Strategies / Expand Product Line
Threat of Substitutes
[Figure: Michael Porter’s Five Forces (Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors) mapped to Business Strategies]
Deal with the threat before it is too big to handle (do not avoid, ignore or under-estimate the threat)
Product Diversification / Market Diversification / Product Bundling / Strategic Alliances / Pricing Strategies
Business Goals/Strategies
Force: Business Strategies
Bargaining Power of Buyers: Product (Value-Added) Differentiation / Forward Integration / Marketing / Product Bundling / Product Development / Strategic Alliances / Customer Relationship Mgt (CRM) / Cost Leadership / Pricing Strategies / Expand Product Line
Bargaining Power of Suppliers: Product Differentiation / Backward Integration / Supply Chain Mgt (SCM) / Strategic Alliances / ePortal (for bulk ordering)
Threat of Substitutes: Product Diversification / Market Diversification / Product Bundling / Strategic Alliances / Pricing Strategies
Rivalry among Existing Firms: Product (Value-Added) Differentiation / Strategic Alliances / Product Bundling / Horizontal Integration / Marketing / Price Discrimination Strategies / Pricing Strategies / Targeting Niche Markets / Customer Relationship Management (CRM) / Expand Product Line
Threat of New Entrants: Product Differentiation / Product Bundling / Customer Relationship Mgt (CRM) / Strategic Alliances / Cost Leadership
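The force-to-strategy summary above can be treated as a lookup table. The sketch below encodes it as a Python dictionary so the strategies for a given force, or the strategies common to all five forces, can be queried. The variable and function names are hypothetical, and strategy names are normalized slightly (for example, "Product (Value-Added) Differentiation" is stored as "Product Differentiation").

```python
from typing import Dict, List, Set

# Strategies per force, transcribed from the summary slide.
STRATEGIES_BY_FORCE: Dict[str, List[str]] = {
    "Bargaining Power of Buyers": [
        "Product Differentiation", "Forward Integration", "Marketing",
        "Product Bundling", "Product Development", "Strategic Alliances",
        "CRM", "Cost Leadership", "Pricing Strategies", "Expand Product Line",
    ],
    "Bargaining Power of Suppliers": [
        "Product Differentiation", "Backward Integration", "SCM",
        "Strategic Alliances", "ePortal (bulk ordering)",
    ],
    "Threat of Substitutes": [
        "Product Diversification", "Market Diversification",
        "Product Bundling", "Strategic Alliances", "Pricing Strategies",
    ],
    "Rivalry among Existing Firms": [
        "Product Differentiation", "Strategic Alliances", "Product Bundling",
        "Horizontal Integration", "Marketing", "Price Discrimination",
        "Pricing Strategies", "Targeting Niche Markets", "CRM",
        "Expand Product Line",
    ],
    "Threat of New Entrants": [
        "Product Differentiation", "Product Bundling", "CRM",
        "Strategic Alliances", "Cost Leadership",
    ],
}


def strategies_for(force: str) -> List[str]:
    """Return the strategies listed against one force."""
    return STRATEGIES_BY_FORCE[force]


def common_strategies() -> Set[str]:
    """Return the strategies that appear under every force."""
    return set.intersection(*(set(s) for s in STRATEGIES_BY_FORCE.values()))


print(strategies_for("Threat of Substitutes"))
print(common_strategies())
```

In this encoding only Strategic Alliances appears under all five forces, which fits the later emphasis on collaborative strategies.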
6.1. Direct-to-customer model
Direct-to-Customer Model
[Figure: P = flow of product, $ = flow of money; Suppliers also appear in the figure]
Traditional Business Model: Computer Manufacturer (e.g. IBM, HP) → Distributor → Dealer → Customer
New Business Model: Manufacturer → Customer
• Can sell at lower prices
• Build to customer order
• Receive payment earlier
• Speed up new product release cycles
• Use customer data to provide customized value-added service
• Proactive decision making
6.2. Supply chain model
Supply Chain Model
Supplier1
Supplier3
Supplier2
Manufacturer Dealers Retailers Customers
$ P Information flow
• Virtual Value Chain
• Information flow across the supply chain
• All parties have a strong electronic bond and backend systems
• Some companies do/don’t own any part of the value chain
• They have access to information about all parties, from supplier/manufacturer
to the customer
6.3. Full service provider model
Full-Service Provider Model
[Figure: Service Provider1, Service Provider2 and Service Provider3 connected through the Full Service Provider to Customers, Retailers and Buyers, with information flow. Examples: airline tickets, hotel bookings, car rentals]
• Has to know a lot about the customer
• Provides own or third-party products
• Offers a wide range of products
• Offers different channels: Internet, face-to-face, phone, etc.
• Sells its own products and earns commission on third-party products
• Some charge customers a service fee
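The revenue sources named above (own product sales, commission on third-party products, and an optional service fee) can be combined in a simple calculation. This is a hedged sketch: the function, its parameters and the sample figures are hypothetical and only illustrate how the three streams add up.

```python
def full_service_revenue(own_sales: float,
                         third_party_sales: float,
                         commission_rate: float,
                         service_fee: float = 0.0,
                         fee_paying_customers: int = 0) -> float:
    """Total revenue of a full-service provider from its three streams.

    commission_rate is a fraction of third-party sales (0.10 means 10%).
    service_fee is charged per customer; not every provider charges one.
    """
    commission = third_party_sales * commission_rate
    fees = service_fee * fee_paying_customers
    return own_sales + commission + fees


# Hypothetical travel portal: own packages, plus 10% commission on
# airline and hotel bookings, plus a small fee from 1,000 customers.
total = full_service_revenue(50_000, 200_000, 0.10, 5.0, 1_000)
print(f"Total revenue: {total:,.2f}")
```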
6.4. Revenue sharing model
Revenue sharing eB Model
[Figure: Company 1, Company 2, Company 3 and Company 4 connected through an eHub or ePortal to market/promote products collectively, order together to get economies of scale, and deal with customers to get larger projects; information flow, $ = money, P = product]
• The sellers get together through a Portal
• They market/promote products collectively to a larger market segment
• Sellers can work on larger projects/orders as they work collectively
• Collective bulk orders give them bargaining power over suppliers
• Resources as well as profits are shared among companies
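The slide does not say how profits are divided among the companies in the portal. As one possible illustration, the sketch below splits a shared profit in proportion to each company's contribution to a joint order. The proportional rule, the function name and the figures are assumptions, not part of the course material.

```python
from typing import Dict


def share_profit(total_profit: float,
                 contributions: Dict[str, float]) -> Dict[str, float]:
    """Split total_profit in proportion to each company's contribution.

    contributions maps company name to its share of the work or resources
    (any consistent unit). Proportional sharing is an assumed rule.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("total contribution must be positive")
    return {name: total_profit * c / total for name, c in contributions.items()}


# Hypothetical joint project handled through the ePortal.
shares = share_profit(100_000, {
    "Company 1": 40, "Company 2": 30, "Company 3": 20, "Company 4": 10,
})
for company, amount in shares.items():
    print(f"{company}: {amount:,.0f}")
```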
6.5. Digital value hub
Digital Value Hub – eRegion
Suppliers T1 Manufacturers Distributors
Customers
Suppliers T2
•Strong B2B partnerships and collaborations between nodes
in the supply chain
•The industry competitors’ willingness to work together
•Trust relationships among the competitors in an industry
•A strong force against foreign competition
7.4. Online and Offline Marketing
Importance of Marketing
• The aim of Marketing is to know and
understand the customer so well that the
product or service fits him and sells itself.
• Peter Drucker – A leading Management
Theorist
• Extracted from “Marketing” by Philip
Kotler
The Traditional Marketing Mix
Product: Product variety, Quality, Design, Features, Brand name, Packaging, Services, Warranties, Returns
Price: List price, Discounts, Allowances, Payment period, Credit terms
Promotion: Sales promotion, Advertising, Public relations, Direct marketing, Sales force
Place: Channels, Coverage, Locations, Inventory, Transport
Marketing in the New Economy
Intelligent
shopping
Agents
The New Marketing Practices
[Figure: Company → Web Presence → Customer]
Company: Pure Click, Brick & Click
Web Presence: Context, Content, Community, Customisation, Communication, Connection, Commerce
Customer: Hybrid Consumers, Cyber-consumers, Traditional Consumers
The New Marketing Strategies
Online Promotion: Strategic alliances, Email, Newsletters, Search engines, Banner adverts, Viral marketing, Analyse site traffic, Database marketing
Offline Promotion: Business cards, Industry magazines, Media advertisements, Newspapers, Newsletters, Brochures, Banners, Sponsorships, etc.
The eMarketing Challenge
• Encourage customer loyalty by offering
incentives
• Reduce first-time purchase risk – address
security concerns
• Increase repeat buying – Increase trust
• Provide multiple mechanisms for
accepting payment
• Add value to the sales channel by having
latest information
Seven Cs in Web design
• Context – Layout and design (downloads quickly,
simple and easy to understand and use)
• Content – Information, pictures, sound, links, offers
• Community – How the site enables user-to-user
communication
• Customisation – Site’s ability to tailor itself to
different users or allow users to personalise the site
• Communication – How the site enables site-to-user
and user-to-site (two-way) communication
• Connection – Degree of links with other sites
• Commerce – Site’s capabilities to enable
commercial transactions
8.1. Collaborative strategies on the web
Boeing 7E7 Project – A Case Study
Who is Building Boeing 7E7?
Mitsubishi,
Japan
Wichita,
Kansas
Frederickson,
Tacoma
Vought, Fuji,
Kawasaki in
Japan
Australia
Canada
Italy
Collaborating to Win
Professor Jim Browne, CIMRU, NUI, Galway.
[Figure: T2 Suppliers, T1 Suppliers, Manufacturing, Customers; Project Management Issues; Design Issues (Co-Design, Customer Driven Design); Supply Chain Management; Customer Order Fulfillment]
Goals, Directions, Strategies
and Competitive Advantage
• Be the Cost Leader (Cost/Price)
• Differentiate (Features/Quality)
Apply eBusiness Model (the figure shows the Supply Chain Model: Supplier1, Supplier2, Supplier3, Manufacturer, Dealers, Retailers, Customer; $ = money, P = product)
1. Environmental Analysis
2. eBusiness Goals/Strategies
3. eReadiness (Internal/External)
4. eTransformation Roadmap
5. eTransformation Methodology
6. eSystems (ICT/Business Maintenance)
7. Evolution – Change Management
Linking Industry Forces, Business
Strategies and eBusiness Models
[Figure: Michael Porter’s Five Forces with ratings, linked to business strategies and eBusiness models]
Forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors
Ratings shown in the figure: V. High, Medium, Low, High, Low
Business Strategies: Product Differentiation (Value-added) / Strategic Alliances / Product Bundling / Horizontal Integration / Marketing / Pricing Strategies / Customer Relationship Management (CRM) / Expand Product Line
eBusiness Models: Full-Service Provider Model (Vertical Marketplace Model), Supply Chain Model (Horizontal Marketplace Model), Direct-to-Customer Model
Threat of New Entrants
[Figure: three linked columns: Michael Porter’s Five Forces, Business Strategies, eBusiness Models]
Forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors
Business Strategies (strengthen the barriers of entry): Value-Added Product Differentiation / Product Bundling / CRM (Customer Relationship Mgt) / Strategic Alliances / Cost Leadership
eBusiness Models: Full-Service Provider Model, Direct-to-Customer Model, Revenue sharing eB Model, Digital Value Hub – eRegion
© 2008, University of Colombo School of Computing
Rivalry Among Existing Firms
[Diagram: Porter’s five forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of New Entrants, Threat of Substitutes, Rivalry Among Competitors]
Business Strategies: Value-Added (VA) Product Differentiation / Strategic Alliances / Product Bundling / Horizontal Integration / Marketing / Price Discrimination Strategies / Pricing Strategies / Targeting Niche Markets / Customer Relationship Management (CRM) / Expand Product Line
eBusiness Models: Full-Service Provider, Direct-to-Customer, Supply Chain, eAuction, Revenue Sharing (eHub or ePortal)
Full-Service Provider Model
[Diagram: Vertical Marketplace Model. Service Provider 1, Service Provider 2 and Service Provider 3 (e.g. airline tickets, hotel bookings, transportation of goods) connect through the Full Service Provider to Customers, Retailers and Buyers, with information flowing between them.]
• Has to know a lot about the customer
• Provides own or third-party products
• Offers a wide range of products
• Offers different channels: Internet, face-to-face, phone, etc.
• Sells its own products and earns commission on third-party products
• Some charge customers a service fee
Supply Chain Model
[Diagram: Horizontal Marketplace Model. Supplier 1, Supplier 2 and Supplier 3 → Manufacturer → Dealers → Retailers → Customer, with money ($), product (P) and information flowing along the chain.]
• Virtual value chain
• Information flows across the supply chain
• All parties have a strong electronic bond and backend systems
• Some companies own parts of the value chain, while others own none
• They have access to information about all parties, from supplier/manufacturer to the customer
eAuction Model
[Diagram: Seller 1 and Seller 2 reach Buyer 1 and Buyer 2 through the eAuctioneer (intermediary), showing the flow of product (P) and the flow of money ($).]
• The seller pays a % of the price
• The intermediary does not take responsibility for the sale/payment
• Money goes from the prospective seller to the eAuctioneer, from buyers to the actual seller, and from the successful seller to the eAuctioneer
• They advertise and use allies to build traffic
• The eAuctioneer owns the customer relationship and data, but not the transaction
• Lowers operating costs and risks
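The fee arrangement in the eAuction model (the buyer pays the seller, and the successful seller pays the eAuctioneer a percentage of the price) can be shown with a short Python sketch. The 5% rate, the sale price and the function name are assumptions made for illustration; the slides do not state an actual rate.

```python
from decimal import Decimal


def auction_settlement(sale_price, commission_rate):
    """Split a completed sale between the seller and the eAuctioneer.

    The buyer pays the seller directly; the seller then pays the
    eAuctioneer a percentage of the sale price (hypothetical rate).
    """
    commission = (sale_price * commission_rate).quantize(Decimal("0.01"))
    return {
        "buyer_pays_seller": sale_price,
        "seller_pays_eauctioneer": commission,
        "seller_keeps": sale_price - commission,
    }


result = auction_settlement(Decimal("200.00"), Decimal("0.05"))
print(result)
```

`Decimal` is used rather than floating point so that the currency amounts are rounded to whole cents.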
Bargaining Power of Suppliers
[Diagram: Porter’s five forces: Threat of New Entrants, Bargaining Power of Buyers, Bargaining Power of Suppliers, Threat of Substitutes, Rivalry Among Competitors]
Reduce Suppliers’ Monopoly or Strength of Suppliers
Business Strategies: Product Differentiation / Backward Integration / Supply Chain Mgt (SCM) / Strategic Alliances / ePortal (for bulk ordering)
eBusiness Models: Revenue Sharing (eHub or ePortal), Supply Chain, Digital Value Hub (eRegion)
Bargaining Power of Buyers
[Diagram: Porter’s five forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of Substitutes, Threat of New Entrants, Rivalry Among Competitors]
Business Strategies: Value-Added (VA) Product Differentiation / Forward Integration / Marketing / Product Bundling / Product Development / Strategic Alliances / Customer Relationship Mgt (CRM) / Cost Leadership / Pricing Strategies / Expand Product Line
eBusiness Models: Full-Service Provider, Direct-to-Customer, eAuction, Revenue Sharing (eHub or ePortal)
Threat of Substitutes
[Diagram: Porter’s five forces: Bargaining Power of Suppliers, Bargaining Power of Buyers, Threat of Substitutes, Rivalry Among Competitors]
Deal with the threat before it is too big to handle (do not avoid, ignore or under-estimate the threat)
Business Strategies: Product Diversification / Market Diversification / Product Bundling / Strategic Alliances / Pricing Strategies
eBusiness Models: Full-Service Provider, Revenue Sharing (eHub or ePortal), Digital Value Hub (eRegion)
eBusiness Goals/Strategies

Force: Bargaining Power of Buyers
Business Strategies: Product (Value-Added) Differentiation / Forward Integration / Marketing / Product Bundling / Product Development / Strategic Alliances / Customer Relationship Mgt (CRM) / Cost Leadership / Pricing Strategies / Expand Product Line
eBusiness Models: Direct-to-Customer, Full Service Provider, ePortal / eAuctioneer

Force: Bargaining Power of Suppliers
Business Strategies: Product Differentiation / Backward Integration / Supply Chain Mgt (SCM) / Strategic Alliances / ePortal (for bulk ordering)
eBusiness Models: Supply Chain Model, ePortal / eRegion

Force: Threat of Substitutes
Business Strategies: Product Diversification / Market Diversification / Product Bundling / Strategic Alliances / Pricing Strategies
eBusiness Models: Full Service Provider, ePortal / eRegion

Force: Rivalry among Existing Firms
Business Strategies: Product (Value-Added) Differentiation / Strategic Alliances / Product Bundling / Horizontal Integration / Marketing / Price Discrimination Strategies / Pricing Strategies / Targeting Niche Markets / Customer Relationship Management (CRM) / Expand Product Line
eBusiness Models: eAuction Model, Full Service Provider, ePortal, Supply Chain Model, Direct-to-Customer Model

Force: Threat of New Entrants
Business Strategies: Product Differentiation / Product Bundling / Customer Relationship Mgt (CRM) / Strategic Alliances / Cost Leadership
eBusiness Models: Direct-to-Customer, Full Service Provider, ePortal / eRegion
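The force-to-model mapping on this slide can also be held as a simple lookup structure. The Python sketch below is illustrative only: the dictionary name and the helper function are assumptions, and the model names are shortened forms of those listed on the slide.

```python
# Force -> eBusiness models, in the order the slide lists the forces.
MODELS_BY_FORCE = {
    "Bargaining Power of Buyers": [
        "Direct-to-Customer", "Full Service Provider", "ePortal", "eAuction"],
    "Bargaining Power of Suppliers": [
        "Supply Chain", "ePortal", "eRegion"],
    "Threat of Substitutes": [
        "Full Service Provider", "ePortal", "eRegion"],
    "Rivalry among Existing Firms": [
        "eAuction", "Full Service Provider", "ePortal", "Supply Chain",
        "Direct-to-Customer"],
    "Threat of New Entrants": [
        "Direct-to-Customer", "Full Service Provider", "ePortal", "eRegion"],
}


def forces_addressed_by(model):
    """Return the forces for which the slide lists the given model."""
    return [force for force, models in MODELS_BY_FORCE.items() if model in models]


print(forces_addressed_by("Supply Chain"))
print(forces_addressed_by("ePortal"))
```

Reading the mapping in this direction shows, for example, that the ePortal model is listed against all five forces.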
Conclusion
• eBusiness transformation needs proper business strategies and models to gain competitive advantage
• Attention must be paid to building trust relationships with strategic partners as well as customers

How To Connect

This beautifully illustrated full-color guide is the ideal introduction to the world of on-line services. The book reveals the hidden secrets to buying your first modem, connecting to an on-line system, managing download files, and everything in-between.

This section contains Chapters 1, 2, and 4 from “How To Connect.” To order the full book, call 1-800-688-0448, extension 199.

Chapter 1

Worlds of Possibilities

-Who’s Out There?

-All the News

-Reference Desk

-Money Matters

-The Electronic Mall

-Travel & Leisure

-The Mail’s Here

-Entertainment

-Software Availability

-The “Virtual Community”

Connecting. The idea seems simple enough. People linking their computers to other computers in order to communicate with one another. They exchange information, share data, and talk about ideas. But just what kind of information do they exchange? What data can they share? Just what do they talk about?

You might be surprised. Your personal computer, modem, and communications software open worlds of possibilities: libraries of information, shopping opportunities of all kinds, personal financial services, travel information, games, sports, weather, and much, much more-all available at the other end of the phone line.

Who’s Out There?

So what exactly does on line mean? On line is any place that computer users go via modems and telephone lines to meet with one another. On line can be a commercial service, such as CompuServe, Prodigy, or America Online, where many members connect to a central system to gather information and exchange ideas. On line can be a local bulletin board system-a single computer into which users call to read messages or copy software programs to their computers at home or work. On line can be an electronic mail service, such as MCI Mail or AT&T Mail, that individuals use to send messages to other service users. (You’ll learn much more about all these specific kinds of on-line connections in later chapters. For now, just open your mind to all the wonderful possibilities that are available when you use a modem to connect to another computer.) Sometimes, the terms going on line and connecting are used interchangeably because both refer to essentially the same thing: any kind of electronic connection between computers.

Who goes on line? Literally millions of people just like you. On-line services and bulletin board systems are virtual cities teeming with activity. The same people you might meet on a city street can be found on line talking about the same things they talk about on street corners. And the errands they do at the post office, the bank, the local shops, the newsstand, and the library-they’re doing them on line, too.

There are computer enthusiasts talking about the latest technologies, copying software programs from the on-line network to their own PCs, and swapping war stories about hardware upgrades. And there are lawyers, too, checking into electronic law libraries to find legal rulings that support their cases. There are investors buying and selling stocks, business travelers rearranging airline tickets, and collectors bartering everything from vintage baseball cards to antique automobiles.

Children talk with other kids about homework, fads, television, and music. They compete with one another in electronic games. They find information for school projects, and, more often than not, they learn something new.

Shoppers come on line to get the skinny on what car dealers really pay for a particular car. They scan electronic databases to find out who’s offering the best price on the laser printer of their choice. Sometimes they just get snagged by a really great deal and buy products-while they’re on line.

There are whole communities of fence-post chatters talking about politics, sports, current affairs, and the arts. If you have an interest, a social cause, or a point of view, there’s someone on line ready to talk to you. In fact, every day people are meeting one another and making new friends. And every once in a while, couples fall in love, meet, and get married.

All the News

If you’re a news junkie, on line is the place for up-to-the-minute information. The Associated Press provides hourly news updates through CompuServe and other on-line services. The San Jose Mercury News and The Chicago Tribune post each morning’s edition on America Online. But on-line services don’t just wait for news to come across the “wire.” They keep on top of the latest stories themselves. During the 1992 presidential election, for example, Prodigy updated voter tallies as each state reported its results. Services also poll members and report the ebb and flow of popular opinion on a variety of news issues.

Of course, there’s much more than front-page news on on-line services. You can get weather reports for any location across the country or around the world-including satellite and radar maps. Sports fans tune into on-line services to gather the scores from all the games, get the scoop on who’s been traded where, and find out who’s on and off the injured reserve list. Hollywood watchers get the low-down on all the entertainment news, from the latest television deals to movie reviews and soap opera summaries. From time to time, the stars even come out on-line to talk to their fans.

Reference Desk

Have you ever wondered about…anything? Well, you can probably find the answer on line. When was the Magna Carta signed? How fast does the bullet train go? How many movies did Charlie Chaplin make? What should I pay for a new Ford Taurus? What can I do to treat poison ivy? Should we send our child to Princeton? And where will we get the money to pay for it?

If you can imagine a question, there is an on-line source to answer it. Most popular reference works-and many obscure ones-are now available in electronic form, enabling you to dial in, search a database, and get the information you need. There are dozens of databases that have abstracts and full articles from literally hundreds of thousands of magazines, journals, and newsletters. Government information-from census data to agency regulations-is all available on line. You can check government tax regulations (do you have to pay social security taxes for your nanny?), research the demographic profiles of cities and towns across the country (where’s the best place to locate a new frozen yogurt factory?), and stay current on pending government contracts (maybe your paving business can bid on that new highway job).

Consumer Reports, Books in Print, Marquis Who’s Who, Magill’s Survey of Cinema, Peterson’s College Database, a half-dozen encyclopedias, dictionaries, and thesauruses, even phone books-you can use them all on line.

Money Matters

It’s hard to be a smart investor these days without going on line. In fact, most personal finance and investment software has built-in links to on-line services. There’s no need to wait for the morning paper to find out how your stock did in the day’s market. On line, you can get instant stock price quotes throughout the trading day. Thinking about investing in a new company? On-line services provide the latest news from Dow Jones, financial and company reports, ratings from Standard & Poor’s, and much more to help you make wise investment decisions. When you’re ready to put your money into the company, you can buy the stock on line, too. A number of brokerage firms, including Quick & Reilly, Dreyfus, Fidelity Investments, and Charles Schwab, make their services available electronically.

Popular personal finance programs, including Managing Your Money and Quicken, let you dial into on-line services and update your stock portfolio automatically. Another program, Fidelity Online Express, lets you buy and sell mutual funds through the Fidelity discount brokerage network. Reality Technology’s innovative Smart Investor software and service give investors all the tools they need to evaluate, buy, track, and sell investments, all on line.

If you only dream about making your million in the stock market, on-line services let you make a game of it. CompuServe, for example, lets members compete with one another as they build imaginary portfolios. The player who makes the most money wins.

But on-line money management isn’t just about investing. Some banks, such as Citibank in New York, let you handle basic bank transactions by dialing into the bank’s computer and accessing your account. And a variety of services let you pay bills without ever writing a check. You simply send an electronic payment notice to the Checkfree or BillPay USA services, for example, and they transfer money from your account to pay the bills you specify-from major credit cards to the local drugstore.

The Electronic Mall

Imagine if every mail order catalog in existence arrived on your doorstep. That’s just about what it’s like to shop on line. Brooks Brothers, Lands’ End, JC Penney, Books On Tape, Barnes & Noble, Columbia House music club, Hammacher Schlemmer, JDR Microdevices, Omaha Steaks, the Metropolitan Museum of Art-these are just a sampling of the organizations selling their wares on line. Need to send flowers, a fruit basket, or maybe some specialty coffees or foods? Dozens of companies ready to serve you are just a modem connection away.

Although there are plenty of things to buy on line, there are also ways to be smarter about what you buy. Consumer Reports, for example, provides its product-test results electronically. You can find out how much you should spend on a new car, including dealer prices for every option and tips on negotiating the deal. If you’re in the market for computer products, Ziff Buyer’s Market and PC Catalog are two services that provide a complete listing of direct marketing sources for hardware and software. In seconds, you can find out who’s got the best price on hundreds of products. And the Boston Computer Exchange links buyers and sellers of used computer equipment.

If your idea of shopping is a lazy Saturday afternoon cruising yard sales, you’ll find something on line, too. On-line classifieds are a giant swap meet where you can buy or sell just about anything-without ever taking the car out of the garage.

Travel and Leisure

Planning a trip? Make an on-line service the first stop on your itinerary. From airline tickets to hotel reservations, you can handle all your travel arrangements yourself, without having to “hold for the next available agent.” You can book flights, reserve rental cars, even sign up for an exotic cruise. Many prominent travel services are available to on-line users, but one of the most useful is the EAASY SABRE reservation system that enables you to search for, compare, book, and purchase airline tickets, reserve rental cars, and schedule hotel accommodations. You’ll also find other handy traveler’s resources, including the Official Airline Guide Electronic Travel Services, which features up-to-date listings of more than 200,000 flights and a bevy of travel-related information, including frequent-flier program rules, toll-free airline, hotel, and car rental phone numbers, and much more.

You might expect to find Zagat’s Restaurant Survey on line-and you will. And you likely won’t be surprised to find recommendations from other on-line users on the best bed-and-breakfasts, out-of-the-way restaurants, interesting sites, and all the rest. But you might be surprised to learn that U.S. State Department travel and visa advisories are on line and updated as international situations change.

The Mail’s Here

It won’t be surprising if the twenty-nine cent stamp winds up on the endangered species list. More and more people are taking advantage of fast, immediate electronic mail to get their messages across. Anyone who signs up for a major on-line service automatically gets a “mailbox” to receive electronic mail, commonly referred to as e-mail. There are even services, such as MCI Mail and AT&T Mail, that do nothing but act as an electronic post office, delivering e-mail to their members.

An electronic message costs about the same to send as a paper letter, but with added benefits. You can be sure that your message is instantly delivered to the recipient’s electronic mailbox. You can even request a “return receipt” that notifies you when the recipient has opened your message. If you need to send a mass-mailing, you can write one message and send it to dozens of mailboxes, without copying and addressing all those envelopes. And when you want to reply to a message, the system automatically addresses the envelope; you just fill in your response.

E-mail is a great way for friends and family to stay in touch, too. One message, sent to every family member, for example, can launch a discussion of the type usually shared around the Thanksgiving table. And e-mail lets you quickly speak what’s on your mind. Do you have a gripe for the president? Drop him an electronic message. In fact, according to one news report, the White House gets up to 5,000 e-mail messages a week.

Entertainment

Just because you are home alone doesn’t mean you have to play by yourself. Imagine, instead, going into combat, descending a dark cave, or rocketing into space-by dialing into an on-line service to match wits with other members in on-line interactive games. GEnie provides a host of multiplayer games. Prodigy’s Baseball Manager is a season-long venture that pits you against other members in a rotisserie baseball league. Rotisserie baseball uses the performance of real-life baseball players across the major leagues to chart the performance of the game-player’s dream team. You draft and trade players, arrange the lineups, and play games with other on-line league members. How your team performs in a given game depends on how the real-life ball players did in that night’s outing at the ballpark. Just about every on-line service hosts lots of games, including trivia contests of every flavor and stripe with chances to win real prizes.

On-line services offer a host of downloadable games, too. You can copy these game programs from the on-line service to your own computer and play them after you’ve disconnected from the on-line service. Games are among the most frequently downloaded programs, and you can find them in a number of categories including sports, arcades, education, role playing, strategy, and many more.

But games aren’t the only entertainment you’ll find on line. Many on-line services have complete movie, theatre, book, and arts reviews to let you know what entertainment is worth your pursuit and what isn’t. On-line service users get together in special discussion areas to share their opinions on the latest in pop culture. And oftentimes authors, television actors, and movie stars come on line to talk with their fans.

Software Availability

Picture a room filled with floppy disks. There are word processing and spreadsheet programs. There are programs that teach kids to read and spell and multiply and subtract. And there are drawing programs and desktop publishing packages, and thousands of ready-made graphics to use with them. In one corner of the room there’s nothing but utilities to back up your hard drive, fine-tune your system, and make your PC easier and more fun to use. There’s a stack of programs that make music, and mountains of databases to manage everything from business client lists to baseball card collections. The middle of the room is piled high with games, games, and more games.

You’ll need a very large imagination to picture just how big this room needs to be to hold all the software available at the other end of a modem connection. Most of the programs are shareware, which you can copy to your PC and try before paying for them. Others are absolutely free. And they’re all just a phone call away.

The “Virtual Community”

You’re never lonely when you’re on line; there are always lots of people to talk to, and talk to, and talk to. But what is everyone talking about? Food, politics, religion, fitness, literature, hobbies, art, music, their pets, and their pet peeves.

On-line services are “virtual communities” where people come together to share ideas, tell their secrets, and criticize and compliment just about everything. In computer speak, “virtual” is any electronic experience that mimics real life. You may have heard the term “virtual reality” used to describe computer-simulated experiences that seem to the user to be like the real thing. A virtual community is a group of people who come together and interact, even though they come from across the country or around the globe.

On-line services, then, become a forum for meeting people who share your interests and ideas. No matter what your profession or passion, there’s a group already talking about it. You can talk about fly-fishing with someone in Wisconsin, even though you live in Alabama. Or you can share your recipe for barbecue sauce with a “neighbor” in Maine. No matter that you live in Texas. Because when you’re on line, location doesn’t matter; you’re still all in the same “place.”

Catching the latest news, doing your banking, shopping, playing games, talking to people around the globe: it all sounds great, but isn’t it expensive? The honest answer is yes and no. Connections do cost money. Even local bulletin board services must pay for the hardware and software, and perhaps even personnel, to keep the service up and running. Larger on-line services must keep a network of larger computers up and running day in and day out so that you can dial in to get the information you want, when you want it. They buy the rights to make information such as encyclopedias, news, sports scores, stock quotes, weather reports, airline schedules, and much more available to their members on line. They need a staff of programmers to keep the computers running and editors to make sure you always get the most current information. And their biggest costs, by far, are telecommunications costs that enable you to make a local call to access their services, even when their service is located across the country.

Ultimately, you help pay for these costs. Depending on what you are connecting to, the costs will vary. Sometimes, the charge is as little as a local phone call to a bulletin board service that you can use as much as you’d like at no additional cost. Other times, the bulletin board operator will ask you to pay a small registration fee to support the ongoing operation of the board. Most on-line services charge a membership fee (typically less than $10 a month) that gives you access to many, if not all, of the service’s features.

In addition to this membership fee, you may be charged for the amount of time you use all or some parts of the service. These charges, often referred to as connect time rates, typically range from about $4 per hour to as much as $20 or more, often depending on what time of day you call the service and what modem speed you use. The theory here is that it is cheaper for on-line services to cover their telecommunications costs during nonpeak evening and weekend hours. Faster modems allow members to get much more information in a shorter time on line, so you pay a premium for that extra information. Incidentally, because a faster modem sends so much more information to your computer in the same amount of time, it ultimately costs less to use one, even if you pay a bit more for the connect time, because you are connected to the service for significantly fewer minutes.

Most on-line services have special service plans that include several hours of connect time in the monthly membership fee. As a result, you can get a lot of use from a service such as America Online or Prodigy for less than $10 a month.
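
The arithmetic behind connect time charges can be sketched in a few lines of Python. The hourly rates, the file size, and the function name below are illustrative assumptions, not prices from any particular service, and the sketch assumes the line runs at the modem's full rated speed with no overhead.

```python
def download_cost(file_bits, modem_bps, rate_per_hour):
    """Return (minutes on line, connect charge in dollars) for one download."""
    seconds = file_bits / modem_bps          # time to move the file at full rated speed
    hours = seconds / 3600
    return seconds / 60, hours * rate_per_hour

# Hypothetical example: a 1,440,000-bit file (180,000 bytes).
FILE_BITS = 1_440_000

slow_minutes, slow_cost = download_cost(FILE_BITS, 2_400, 6.00)    # assumed $6/hour at 2,400 bps
fast_minutes, fast_cost = download_cost(FILE_BITS, 9_600, 12.00)   # assumed $12/hour at 9,600 bps

print(f"2,400 bps: {slow_minutes:.1f} min, ${slow_cost:.2f}")
print(f"9,600 bps: {fast_minutes:.1f} min, ${fast_cost:.2f}")
```

Under these assumed rates, the faster modem pays twice as much per hour but stays on line a quarter as long, so the same download costs half as much.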

Chapter 2

The Culture of Connecting

-A Language All Its Own

-Minding Your Manners On Line

-What’s an Emoticon?

-Your On-line Rights

If you’ve ever traveled in foreign countries, you know that customs vary greatly from place to place. On-line communications is a world all its own, and it, too, has its own set of customs and courtesies. Just as it is important to know the local customs when you travel, knowing the etiquette of on-line communications will make you a more welcome member of the on-line community.

A Language All Its Own

Perhaps the first word in good manners is to learn the few key words and phrases that will help you understand your new environment and get along better with the people in the on-line community. On-line communications has a language all its own. There are dozens of technical terms (baud, parity, protocol, and handshake, for example) that you may hear from time to time. Throughout this book, I’ll define those terms in context. Rest assured, though, that you’ll not have to develop a whole new “techie” vocabulary in order to gain all the advantages of on-line communications.

But there’s much more to on-line language than dull communications jargon. The on-line community has its own unique conversational words and phrases, and these are the words you’ll really want to know. Furthermore, each on-line environment has its own dialect. For example, conversations might be called messages, notes, or threads, depending on the service you are using. But there are a number of words and phrases that are constant across all services. Here are a few that you’ll need to know to be part of the “in” on-line crowd.

>bulletin board system also BBS n: an on-line electronic service on which users can exchange messages and data or program files, most often for public consumption by all other members. Some services, such as Prodigy, refer to the places where members write messages to one another as bulletin boards.

>chat n: an on-line dialog in which two or more people participate in live electronic conversation, also known as a conference or CB. v: the act of participating in a live electronic conversation.

>download v: the act of copying a file from a bulletin board or on-line service to your PC. n: an electronic file that has been copied from a bulletin board or on-line service to your PC. Also referred to as DL, usually in the text of an on-line message.

>electronic mail also e-mail n: a private message sent from one person to another via an on-line information network. v: to send a message.

>flame n: a message or group of messages that are hostile, angry, or outrageous (did you see that flame about Microsoft’s new support policies?). v: to post a disparaging or controversial message. flamer n: one who regularly posts angry messages.

>forum n: originally from CompuServe, an area on an on-line service devoted to a single topic (the politics forum).

>forward v: to pass along to someone else an electronic mail message previously sent to you.

>log on v: to connect to an on-line service by dialing the service and entering name and password.

>lurker n: one who reads and “listens in” on discussions without participating or contributing in the discussion. lurk v: the act of reading messages without posting responses.

>on-line service n: a commercial venture that provides information via an electronic connection (for example, America Online, CompuServe, Delphi, GEnie, Prodigy).

>post v: to place a message on a bulletin board or in a forum for public reading.

>sysop n: a person who monitors on-line conversations to be sure they stay on track and aboveboard. Some services refer to this person as the board manager or guide.

>thread n: a series of messages or conversations that follow a single thought or topic; one continuous discussion.

>upload v: to send a file from your PC to a bulletin board or on-line service. n: the file that has been sent to the service.

With this list, you’re well on your way to developing a whole new vocabulary. You’ll learn several other new words as you make your way through this book, and even more as you explore on-line services. In fact, many terms will be specific to the on-line services you choose to use. As you come across new words or familiar words used in strange new ways, it is absolutely okay to ask other on-line service users just what they mean. Remember, they were new to on-line services once, too.

Minding Your Manners On Line

For all their advantages, on-line communications have one major drawback: They’re totally dependent on the words you type to communicate your meaning. The gestures, eye movements, intonation, volume, smiles, and grimaces that help people understand the meaning and innuendo of your spoken communication are absent from your on-line messages. The on-line community has developed its own code to add this subtle layer of meaning to on-line messages. In polite on-line conversation, for example, you use upper- and lowercase characters, just as you would in a business letter or office memo. The mix of upper- and lowercase is easier to read on line. But how do you raise your voice in an on-line message? Simple: TYPE IN ALL CAPS! The recipient will know you’re yelling. But never type in all capital letters unless you really do want to yell out your point. Messages in all capital letters are very hard to read and considered quite rude in on-line circles.

And what if you want to add emphasis to just a word or phrase, but you don’t exactly want to yell it? On-line messaging doesn’t support italic or bold text. Place an asterisk at either end of the word for *just the right* emphasis.

Perhaps the most important courtesy is KISS: Keep it short and simple. On most on-line services, brevity is not only the soul of wit, but also the key to more effective communications and lower connect time costs. When you write a letter to a friend on paper, you have the whole page to convey your ideas. With just one glance, your friend can take in the scope of your message, easily skip to the middle or end, and very quickly grasp the essence of your words. Not so on line. When you write an electronic message, you have a few lines transmitted one screen at a time. The recipient of your message must read your message sequentially; there’s no skipping to the end to find key ideas buried there. As the communicator, you need to make your point quickly. Otherwise, your on-line pen pal may give up on your message before you’ve had your say. That’s why short and sweet is the rule of thumb on line. Try to keep your messages to one or two screens. “Four score and seven years ago, our forefathers brought forth upon this continent a new nation” may sound good in a speech and it may read well on paper, but on line, “This country was founded 87 yrs. ago” plays much better.

To help keep messages brief, the on-line community has come up with hundreds, perhaps thousands, of shortcuts to say in a few letters what would otherwise take a dozen or more characters. The point of these acronyms is to make on-line messages brief, using the fewest characters to communicate the most information. Many of these acronyms will already be familiar to you; they are commonly used in everyday communications: ASAP (as soon as possible) and FYI (for your information). Many acronyms are specific to certain situations. The military, for example, has its own abbreviations, such as SNAFU (situation normal, all “fouled” up). Star Trek fans have theirs: SFS (search for Spock) and TFF (the final frontier). As you participate in bulletin board discussions, you’ll come to learn the acronyms unique to various special interests. And just as when you encounter unfamiliar new terms, when you don’t know the meaning of an acronym, just ask.

These shortened messages could fill an entire dictionary. Let’s get started with a few dozen of the most frequently used TLAs (three letter acronyms). IMHO (in my humble opinion), the list covers the acronyms you’re most likely to see on line. BTW (by the way), the creative uses of many TLAs will have you LOL (laughing out loud). You’ll notice that many TLAs have grown beyond three letters. TLA generally includes acronyms of any length, but in the interest of precision, some on-line users have adopted MLA to refer to multiletter acronyms. See the figure A TLA Dictionary for more examples of TLAs.

In addition to cryptic TLAs that pepper electronic conversations, on-line communicators have dozens of phonetic contractions, or words that are spelled like they sound. Many of these contractions, such as “tho,” and “thru,” are standard fare. Others are a bit more creative. For example, you may be asked, “how RU,” and you could answer, “fine, thx.” So if you see a strange letter combination and you aren’t sure what it might mean, fall back on the advice of our grade school teachers: sound it out. You may just find that you do know what’s being said.

What’s an Emoticon?

For many on-line communicators, simple bracketed comments just aren’t expressive enough. They needed something else to convey the spirit in which messages were intended. So these folks came up with whole galleries full of emoticons, strings of type characters that say what they mean. The simplest emoticon is the smiley :) composed of a colon and a closed parenthesis. The smiley gave way to the frown :( and then dozens and dozens of variations started rolling in. Today there are hundreds of symbols to communicate a wide range of emotions.

When you see one of these character strings on line, the best way to make sense of it is to tilt your head to the left and look at it sideways. If you observe a colon, a hyphen, and an end parenthesis, you’ll be face to face with the standard smiley. See for yourself:

:-)

The use of an emoticon may also depend on the context. A message using :-# means “I wear braces” in a discussion about dental work, but means “I’m censoring my remarks” in a discussion about the performance of your employer.

You use emoticons just as you use gestures in a conversation. For example: My great, great Uncle Louie died. (:-... Then his lawyer called and told me I inherited a million bucks! :-D Not that I won’t miss Uncle Louie, of course :-(

Your On-line Rights

All this talk of showing emotion begs the question of just how much emotion you can show. Just how far can you go on-line? Each on-line service has its own guidelines for good taste. Many are family-oriented services that prefer the Seven Dirty Words stay off their messaging systems. Other on-line services provide special adults-only forums to discuss issues such as human sexuality (in all its various forms). And there are many free-for-all bulletin boards where anything goes. To know what conversation topics are appropriate on the services you choose to use, check out the membership agreements carefully. These will outline exactly what goes and what doesn’t.

If you do step out of line, unintentionally or otherwise, nearly all on-line services and bulletin boards retain the right to remove your message from public view. When this happens, a sysop generally informs you of your mistake and asks you to repost your note in more appropriate form. This practice has been perceived by some as censorship. To be sure, there are arguments on all sides of this question. Some people say that because members of on-line services agree to certain rules, it is reasonable for the services to scan messages for violations of the rules. Others believe that on-line services and bulletin boards violate a member’s First Amendment rights when any message is removed from public view. Tangled up in these arguments are the on-line service’s legal liabilities. Is the service responsible if one member slanders another in an on-line message? If members use the service to publicly conduct illegal business, is the service an accomplice to the crime? Can a service be subpoenaed to turn over records of on-line conversations to the courts? Lawyers and judges are wrangling over these issues and have yet to come up with complete answers.

One thing does seem to be clear: Personal electronic messages sent and received on public electronic mail and on-line services are private, much like paper mail sent through the U.S. Postal Service is private. On-line services, contrary to rumors that flare up from time to time, aren’t reading the contents of members’ messages. Within the confines of electronic mail, you can say to anyone else anything you want. These messages are for your eyes and those of your recipient only.

Chapter 4

What You Need to Get Connected

How Modems Work

What to Look for in a Modem

Understanding Your Phone Line

Communications Software

Getting connected is a lot easier than most people think. In fact, it requires only a PC, modem, phone line, and communications software. Each of these plays an important and necessary role in getting on line. The PC and communications software work together as the command station for the whole operation. The modem is the go-between for the PC and the telephone system, and the phone line is the pathway across which information travels. Without one, the others are absolutely useless in communicating information on line. So just how do they all work? And how do they work together?

How Modems Work

Have you ever been in a foreign country where you didn’t speak the language? Wonderful conversations happen all around you. A young woman enthusiastically recommends a great neighborhood bistro. A shopper describes the terrific bargains at an out-of-the-way boutique. Two friends laugh at a shared joke. A stockbroker whispers a tip on a sure investment. And you don’t understand any of it. You need a translator.

That’s how it is for computers and telephones. They speak different languages and can’t understand one another without a translator. A modem is that translator. Computers understand only digital information: electronic signals that represent one of two positions, on or off. It’s the combination of these on-off signals that represents information to a computer. Telephones, on the other hand, understand only analog tones: continuous sound waves that move over wires or through the air. The trouble comes because digital and analog machines (PCs and phones, in this case) can’t understand one another directly because they speak different languages. Without a translator the computer can’t understand information sent over telephone lines.

The word modem stands for MOdulator/DEModulator. When you send information from your PC, the modem converts (modulates) digital signals coming from your PC to analog signals that can be sent over telephone lines. When you receive information, the modem converts (demodulates) the analog tones from the phone line back into digital signals the computer can understand.
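
The round trip from bits to tones and back can be sketched in Python. This is a toy model, not how any particular modem is built: it assumes one tone per bit, and the two frequencies are illustrative values chosen for the sketch. Real modems use more elaborate signaling.

```python
# Two illustrative tone frequencies in hertz (assumed values for this sketch).
TONE_FOR_BIT = {0: 1070, 1: 1270}
BIT_FOR_TONE = {tone: bit for bit, tone in TONE_FOR_BIT.items()}

def modulate(bits):
    """Digital to analog: turn each on/off bit into the tone that represents it."""
    return [TONE_FOR_BIT[bit] for bit in bits]

def demodulate(tones):
    """Analog to digital: turn each received tone back into a bit."""
    return [BIT_FOR_TONE[tone] for tone in tones]

message_bits = [0, 1, 0, 0, 1, 0, 0, 0]      # the letter "H" in 8-bit ASCII
tones_on_the_line = modulate(message_bits)
received_bits = demodulate(tones_on_the_line)
print(tones_on_the_line)
print(received_bits == message_bits)
```

The sending side calls `modulate`, the receiving side calls `demodulate`, and the bits that arrive match the bits that were sent.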

Though all this sounds complicated (and there is a bit of technical wizardry inside a modem), the modem itself is a pretty simple device. Its sole purpose is to act as the go-between for your computer and the phone, making sure the two can understand one another.

What to Look for in a Modem

Modems come in several flavors, but your choice will be determined by three things: compatibility, design, and speed.

Compatibility is the easiest to deal with of these three issues. You need to choose a modem that is compatible with your computer, and that’s a cinch because nearly a hundred percent of all modems will work with both IBM-compatible and Macintosh computers. In most cases, you’ll simply need to ask your computer salesperson for the appropriate style of connector cable depending on whether you’re using a PC or a Mac.

You may also hear something about AT compatibility in modems. In the early days of personal computers, Hayes Microcomputer Products developed a set of commands that let software tell modems what to do. These commands are called the AT command set. Because Hayes sold so many modems that used the AT command set, and because communications software makers rapidly took advantage of the command set to make their programs more useful, many more modem makers began selling what they called Hayes-compatible, or AT-compatible, modems. Today, it is virtually impossible to buy a PC modem that is not Hayes-compatible. I mention it here only because you may come across the term, and this nugget of information will keep you out of the dark.
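
To give a feel for what these commands look like, here is a small Python sketch that recognizes a handful of well-known Hayes commands. The commands themselves are standard; the `describe` helper, its wording, and the phone number are hypothetical and exist only for illustration.

```python
# A few widely used commands from the Hayes AT command set.
AT_COMMANDS = {
    "ATZ": "reset the modem to its stored settings",
    "ATDT": "dial the number that follows using touch tones",
    "ATDP": "dial the number that follows using pulses",
    "ATA": "answer an incoming call",
    "ATH": "hang up",
}

def describe(command_line):
    """Return a plain-English description of one AT command line."""
    # Try longer prefixes first so "ATDT" is not mistaken for a shorter command.
    for prefix in sorted(AT_COMMANDS, key=len, reverse=True):
        if command_line.upper().startswith(prefix):
            argument = command_line[len(prefix):].strip()
            description = AT_COMMANDS[prefix]
            return f"{description}: {argument}" if argument else description
    return "unknown command"

print(describe("ATDT 555-0123"))
print(describe("ATH"))
```

Communications software sends strings like these to the modem for you, which is why you rarely need to type them yourself.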

With compatibility out of the way, you can focus on the more important distinctions. Modems are designed in one of two ways: They are either internal (installed inside your computer) or external (attached to the outside of your PC). Internal modems take the form of a circuit board that fits inside your computer. An internal modem requires an expansion slot, which is simply a connection between any circuit board and the PC’s main processing system. Expansion slots are used for a variety of purposes: storage devices, a mouse, a scanner, sound systems, and so on. Each of these things can require an expansion slot. If all the slots have been used for another purpose, you’ll need to use an external modem rather than an internal one. External modems are usually a flat box about the size of a book that houses the circuit board and connects to your PC with a cable. Both types of modems do the same job. The type you choose depends largely on whether you have available space inside your PC.

Incidentally, more and more PCs come with an internal modem preinstalled. If your computer included a modem when you bought it, the dirty work of installing the modem has been done for you, and you can skip right past this section of the chapter.

Once you’ve determined whether your modem will be internal or external, you’ll need to know a bit about its speed. A modem’s speed determines how quickly it can send and receive data. Essentially, the faster the modem, the faster it can translate the PC’s digital signals into analog signals and stuff them down the phone line. Modem speeds are rated by the bits (or individual pieces of data) that the modem can send to the phone line in one second. This rating is called bits per second, or bps. Modem speed is also measured in kilobits per second, or kbps.

Note: The term bits per second is sometimes confused with the term baud. Baud rate has to do with the telephone line itself and refers to the number of signal changes that can take place in a second.

While modems come in a variety of speeds, the most common speeds for personal computers are 2,400 bps, 9,600 bps, and 14.4 kbps (usually called “fourteen dot four”). Obviously, the faster the modem, the more quickly information is sent from and received by your PC, and the less time you spend waiting for information to arrive. Fast modems, such as 9,600 bps modems, can talk to slower modems by slowing the speed at which they send and receive information. But under normal circumstances no modem can talk faster than its rated speed. For example, a 2,400 bps modem can talk to a 9,600 bps modem because the 9,600 bps modem will slow down for it. But a 2,400 bps modem can’t send data faster than 2,400 bits of information in one second. For a 9,600 bps modem to work at its fastest, it must be talking to another 9,600 bps modem or to one rated even faster. However, what should be a simple truth with computers-a modem can’t send information faster than its rated speed-is not always strictly true. A modem can send more data per second than its rating by compressing the information. Using special techniques, the modem can squeeze information that might take 100 bits in a normal form to just 70 bits in compressed form, for example. As a result, a modem rated at 14.4 kbps and designed to send 14.4 kilobits of data per second might actually be able to send 30 kilobits worth of data per second by compressing the information into fewer bits.
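
Both rules in this discussion, that two modems talk at the slower one's speed and that compression raises the amount of data delivered per second, come down to simple arithmetic. The Python sketch below uses hypothetical function names and assumes the compression ratio stays constant for the whole transfer.

```python
def connection_speed(modem_a_bps, modem_b_bps):
    """Two modems talk at the speed of the slower one."""
    return min(modem_a_bps, modem_b_bps)

def effective_bps(rated_bps, original_bits, compressed_bits):
    """Data delivered per second when every original_bits shrink to compressed_bits."""
    return rated_bps * original_bits / compressed_bits

print(connection_speed(2_400, 9_600))            # the faster modem slows down
print(round(effective_bps(14_400, 100, 70)))     # 100 bits squeezed into 70
print(round(effective_bps(14_400, 100, 48)))     # heavier compression (assumed ratio)
```

With the 100-to-70 squeeze, a 14.4 kbps modem delivers roughly 20.6 kilobits of data per second. Reaching 30 kilobits per second takes a ratio of about 100 to 48, so the higher figure depends on how compressible the data is.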

The ability to send information this quickly depends a lot on the quality of the telephone line. Sometimes when you make a telephone call, the connection is crystal clear. Other times, you hear a lot of static or clicking on the line. You can overlook this so-called line noise when you’re talking to someone on the phone, but a modem can’t. It tries to interpret everything it “hears” coming across the line. When data is being sent slowly, say at 2,400 bps, the modem has an easier time differentiating data from noise. At faster speeds, say at 9,600 bps, the task of separating noise from data becomes much harder. As a result, the modem may make a mistake and interpret line noise as data. Fortunately, most modems today are self-correcting, using something called an error correction protocol. Error correction protocols can be quite complex. Suffice it to say that the modems on either end of the connection compare notes on what one has sent with what the other has received. If their notes don’t match, the sending modem will resend the data in question.
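
The "compare notes and resend" idea can be sketched in Python. This is a deliberately simplified model, not any real error correction protocol: the checksum is a plain sum of bytes (real protocols use stronger checks such as cyclic redundancy checks), and the noisy phone line is simulated by a hypothetical function that garbles only the first transmission.

```python
def checksum(block):
    """A deliberately simple check value: the sum of the bytes, modulo 256."""
    return sum(block) % 256

def send_with_retries(block, line, max_tries=5):
    """Send a block over `line` until the receiver's checksum matches the sender's."""
    expected = checksum(block)
    for attempt in range(1, max_tries + 1):
        received = line(block)                 # the line may corrupt the data
        if checksum(received) == expected:     # the two modems "compare notes"
            return received, attempt
    raise IOError("line too noisy: giving up")

def noisy_once():
    """Build a fake phone line that garbles only the first transmission."""
    state = {"calls": 0}
    def line(block):
        state["calls"] += 1
        if state["calls"] == 1:
            return bytes([block[0] ^ 0x01]) + block[1:]   # flip one bit
        return block
    return line

data, tries = send_with_retries(b"HELLO", noisy_once())
print(data, tries)
```

The first attempt arrives with one bit flipped, the check values disagree, and the block goes out a second time, so the data is delivered intact on the second try.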

Fax Modems

As you shop for a modem, you may hear the term fax modem or data/fax modem. The words mean the same thing: a modem that can handle both data and fax transmissions. While a fax modem looks no different from a standard data-only modem, it has the ability to communicate with office fax machines and other fax modems. Even if you don’t need fax capabilities right now, you might want to consider buying a fax modem rather than a data-only modem, just in case. Typically, a fax modem costs only a little more than a standard data modem, and often the fax software you need to send and/or receive faxes with your computer is included in the price.

There are several types of fax modems, but the differences are pretty basic. First, some fax modems can only send faxes; they can’t receive them. If you want to receive incoming faxes with your computer, you should be sure the fax modem has send and receive capabilities, and these days most fax modems do. Secondly, be sure your fax modem is at least Group 3 compatible, or better yet, Group 4 compatible. Group 3 and Group 4 are fax standards and determine the speed at which a fax is sent, among other things. Group 3 and Group 4 fax modems are able to talk to one another. It’s just that a Group 4 fax modem will communicate more quickly with another Group 4 fax modem or fax machine than it will with a Group 3-compatible device. Finally, you may want to look for a CAS-compatible modem. CAS, or Communications Application Specification, was developed by Intel and defines how most software applications communicate with fax software. Your fax modem doesn’t have to be CAS-compatible, but you may find it a big convenience because CAS will allow you to fax a document without leaving the application in which you created it.

To use a fax modem, however, you must have special fax communications software. This software allows you to send almost any document created by a word processor, graphics program, spreadsheet, or other application to a fax machine anywhere in the world. Generally, the fax software intercepts selected documents heading to the printer and saves them in a file as a graphic image of the document as it would have looked on the printed page. Once the file is captured, you direct the fax software to prepare a cover page, append the document image file, dial a fax number, and send the document. At the receiving end, the cover page and document look just like an incoming call from another fax machine. In fact, the document typically looks better than if you had printed it and fed it into a fax machine because often graphic clarity is lost in the translations from the electronic file to the fax printout.

Just as with ordinary data modems, fax modems also come in external and internal models. Fax modems are installed in the same way as data-only modems.

Understanding Your Phone Line

What would you possibly need to know about phone lines? After all, there’s just an outlet in the wall into which you plug your phone, right? It’s almost that simple. That outlet in your wall leads to an intricate web of phone systems connecting with one another, much like an on-ramp leads to a highway that connects to many other roads and highways. That may seem obvious enough, but like highways, not all phone systems are the same. The first thing (and likely the last) you need to know about phone lines is what kind of phone system you have.

Essentially, there are two kinds of phone systems: touch tone and pulse. The majority of phone systems in the United States are touch tone, which means that each number is represented by a different pitch or tone. Pulse phone systems send a series of electronic pulses to represent a number (five pulses for the number five, for example) rather than sending a tone for each number.

Your modem will work with either touch-tone or pulse systems, but you’ll need to know which type you have. To make this determination, pick up your phone and press the number 5. If you hear a musical note, you have a touch-tone phone line. If you hear five clicks, you have a pulse system.
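
The difference between the two systems can be sketched in Python. In the standard touch-tone keypad, each key actually sounds a pair of tones, one for its row and one for its column, which the ear hears as a single musical note. The function names below are hypothetical, and the rule that zero is sent as ten pulses reflects the usual convention.

```python
# Standard touch-tone (DTMF) keypad: each key sounds one row tone plus one column tone.
ROW_HZ = [697, 770, 852, 941]
COLUMN_HZ = [1209, 1336, 1477]
KEYPAD = ["123", "456", "789", "*0#"]

def touch_tone(key):
    """Return the (row, column) tone pair in hertz for a keypad key."""
    for row, keys in enumerate(KEYPAD):
        if key in keys:
            return ROW_HZ[row], COLUMN_HZ[keys.index(key)]
    raise ValueError(f"not a keypad key: {key!r}")

def pulse_count(digit):
    """Return how many clicks a pulse system sends for a digit (0 sends ten)."""
    return 10 if digit == "0" else int(digit)

print(touch_tone("5"))
print(pulse_count("5"))
```

Pressing 5 on a touch-tone line produces the 770 Hz and 1336 Hz tones together, while a pulse line produces five clicks.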

The second thing you’ll need to know about your phone line is whether you have call waiting. Call waiting is a convenience in voice calls that alerts you to incoming calls. But it’s a major inconvenience when it interrupts, and cuts off, your modem connection. You can avoid these interruptions by suspending call waiting during your modem communications sessions. Just add the digits 1170 to the beginning of the number you want the modem to dial. Then incoming calls get a standard busy signal and you have a connection free of unwanted interruptions. Once you hang up your modem, the call waiting feature is resumed for subsequent calls.
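
Putting the phone-line details together, the string a communications program sends to the modem might be built like the Python sketch below. The `ATDT` and `ATDP` dial commands and the comma pause are standard Hayes conventions; the function name and the phone number are hypothetical, and the prefix is left as a parameter because the digits that suspend call waiting can differ from one phone company to another.

```python
def dial_command(number, tone=True, suspend_call_waiting=False, prefix="1170"):
    """Build a Hayes-style dial command for the modem."""
    command = "ATDT" if tone else "ATDP"       # T = touch tone, P = pulse
    if suspend_call_waiting:
        command += prefix + ","                # a comma asks the modem to pause briefly
    return command + number

print(dial_command("5550123", suspend_call_waiting=True))
print(dial_command("5550123", tone=False))
```

In practice you usually type the prefix once into the dialing settings of your communications software, and it builds a command along these lines each time it dials.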

Communications Software

If modems are the translators between PCs and phone lines, and phone lines are the highways across which information is sent, then communications software is the glue that holds it all together. Communications software tells the modem what telephone number to dial, what sort of computer to expect at the other end, and how to talk to it. Communications software is also your window into the on-line connection.

Communications programs handle basic functions such as dialing a phone number, sending text, and receiving text and other information. But communications software can have other, more advanced features, too. Typically, communications software includes a telephone directory where you enter the information and phone number of the on-line services and bulletin boards you use. The software may also have a mini word processor that you can use to compose messages. When you are connected to a service or bulletin board, the communications software acts as the go-between, telling you what information the on-line connection is requesting and relaying your commands back to the service. Some communications packages even have a scripting language that lets you write out and store instructions so that your software can automatically call on-line services without your intervention.
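
A script of this kind is essentially a list of "wait for this prompt, then send that reply" steps. The Python sketch below shows the idea only; it is not the scripting language of any actual communications package, and the prompts, user ID, password, and stand-in service are all made up for illustration.

```python
# A hypothetical log-on script: wait for each prompt, then send the reply.
LOGON_SCRIPT = [
    ("User ID:", "JSMITH"),
    ("Password:", "secret"),
]

def run_script(script, service):
    """Play a script against `service`, a function that returns the next prompt."""
    sent = []
    for expected_prompt, reply in script:
        prompt = service(sent)
        if expected_prompt not in prompt:
            raise RuntimeError(f"expected {expected_prompt!r}, got {prompt!r}")
        sent.append(reply)
    return sent

def fake_service(sent_so_far):
    """Stand-in for a real on-line service, used only to exercise the script."""
    prompts = ["Welcome! User ID:", "Password:"]
    return prompts[len(sent_so_far)]

print(run_script(LOGON_SCRIPT, fake_service))
```

Stopping with an error when an unexpected prompt appears keeps the script from typing your password into the wrong place.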

Note: If your computer came equipped with a modem already installed in it, it probably also came with communications software. If your PC runs the Windows graphical interface, you have a simple communications program in the Windows Terminal application.


Many on-line services, such as the Prodigy Information Service and America Online, have their own communications software that you’ll need in order to use these services. Other on-line services, such as GEnie and CompuServe, can be used with any communications package. Still, there are several programs, such as WinCIM and TAPCIS on the PC, and Navigator for the Mac, that are designed to work exclusively with CompuServe. These programs make getting around the large and complex information service much easier and much more cost-effective.

Communications capabilities are also built into many other kinds of software. For example, the personal finance programs you read about in Chapter 1 include communications software for dialing into bill-paying systems, such as Checkfree, or into on-line services to gather stock quotes. Address-book programs might include a communications component that instructs the PC to dial a phone number for you.

Glossary

Combining the insight of columnist John Dvorak with the latest lab research from PC Magazine, this annual offering shows how it all stacks up, from PCs to all major peripherals, including CD-ROM drives, modems, monitors, video cards, input devices, and printers.

The 500-word glossary in our almanac is taken from this book. To order the full book, call 1-800-688-0448, extension 199.

This special electronic edition of PC/Computing contains 10 chapters from 2001 Windows Tips. Chapter 11 shows a selection of other books and software you can order from Ziff-Davis Press and The Software Toolworks.


adapter Also known as an add-on card, controller, expansion card, or I/O card. Adapters are installed in expansion slots to enhance the processing power of the computer or to communicate with other devices. Examples include asynchronous communication adapters, floppy-disk controllers, and expanded-memory cards.

address A unique memory location permitting reading or writing of data to/from that location. Network interface cards and CPUs often use shared addresses in RAM to move data between programs.

analog-to-digital converter (ADC) A device that converts analog input signals to digital output signals used to represent the amplitude of the original signal.

application software A computer program designed to help people perform a certain type of work. An application can manipulate text, numbers, graphics, or a combination of elements. Some application packages focus on a single task and offer greater computer power while others, called integrated software, offer less power but include several applications, such as word processing, spreadsheet, and database programs. An application may also be referred to as software, program, instructions, or task. See also software

areal density The amount of data that can be stored in a given area of a disk, hard or floppy.

ASCII (American Standard Code for Information Interchange) The data alphabet used in the IBM PC to determine the composition of the 7-bit string of 0s and 1s that represents each character (alphabetic, numeric, or special). It is a standard way to transmit characters.
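
To make the 7-bit string concrete, here is a minimal Python sketch that prints the pattern of 0s and 1s for each character. The function name ascii_bits is hypothetical; it relies only on Python's built-in ord and format.

```python
def ascii_bits(text):
    """Return the 7-bit ASCII pattern for each character in text."""
    patterns = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Codes above 127 do not fit in 7 bits and are not ASCII.
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        patterns.append(format(code, "07b"))
    return patterns


print(ascii_bits("PC"))  # ['1010000', '1000011']
```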

asynchronous communication (ASYNC) A type of serial communication by which data is passed between devices. “Asynchronous” means that the timing of each character transmitted is independent of other characters.

average access time The time (in milliseconds) that a disk drive takes to find the right track in response to a request (the seek time), plus the time it takes to get to the right place on the track (the latency).
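
The sum described above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the sample seek time and spindle speed are made-up inputs, and average latency is approximated as half of one rotation.

```python
def average_access_ms(seek_ms, rpm):
    """Average access time = average seek time + average rotational latency.

    Assumption: on average the wanted data is half a rotation away,
    so latency is taken as half of one full rotation.
    """
    ms_per_rotation = 60_000 / rpm   # 60,000 milliseconds in a minute
    latency_ms = ms_per_rotation / 2
    return seek_ms + latency_ms


# Example: a 20 ms seek time on a drive spinning at 3,600 rpm.
print(round(average_access_ms(seek_ms=20, rpm=3600), 2))  # 28.33
```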

back up To make a copy of a file, group of files, or the entire contents of a hard disk.

baud rate A measure of the actual rate of symbols transmitted per second, which may represent more than one bit. A given baud rate may have more than one bps (bits per second) rate. Baud rate is often used interchangeably with bps, although this is technically incorrect.
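
A short Python sketch shows why one baud rate can correspond to several bps rates: the bps figure is the symbol rate multiplied by the number of bits each symbol carries. The function name and the sample figures are illustrative assumptions.

```python
def bits_per_second(baud, bits_per_symbol):
    """bps = symbols per second (baud) x bits carried by each symbol."""
    return baud * bits_per_symbol


# The same 2,400-baud signal at 1, 2, and 4 bits per symbol.
for bits in (1, 2, 4):
    print(bits, "bit(s) per symbol:", bits_per_second(2400, bits), "bps")
```

Only when each symbol carries exactly one bit are the baud and bps figures the same number.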

binary A numbering system with two digits, 0 and 1, used by computers to store and process information.

BIOS (basic input/output system) A collection of primitive computer routines (stored in ROM in a PC) that control peripherals such as the video display, disk drives, and keyboard.

bisynchronous (BISYNC) Computer communications in which both sides simultaneously transmit and receive data.

bit A binary digit: the smallest piece of information that can be recognized and processed by a computer. A bit is either 0 or 1. Bits can form larger units of information called nibbles (4 bits), bytes (8 bits), and words (usually 16 bits). See also data bit
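
The relationship between words, bytes, and nibbles can be shown with a little bit shifting. This Python sketch assumes a 16-bit word, as in the definition above; the function name split_word is hypothetical.

```python
def split_word(word):
    """Split a 16-bit word into 2 bytes and 4 nibbles, high part first."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    byte_values = [(word >> 8) & 0xFF, word & 0xFF]
    nibble_values = [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]
    return byte_values, nibble_values


byte_values, nibble_values = split_word(0xA53C)
print([format(b, "08b") for b in byte_values])    # two 8-bit bytes
print([format(n, "04b") for n in nibble_values])  # four 4-bit nibbles
```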

bits per second (bps) The number of data bits sent per second between two modems. Used as a measure of the rate at which digital information is handled, manipulated, or transmitted. Similar, but not identical, to baud rate.

buffer An area of RAM (usually 512 bytes plus another 16 for overhead) in which DOS stores data temporarily. See also frame buffer

bus A group of wires used to carry a set of related signals or information within a computer from one device to another.

byte A sequence of adjacent binary digits that the computer considers a unit. A byte consists of 8 bits.

cache An amount of RAM set aside to hold data that is expected to be accessed again. The second access, which finds the data in RAM, is very fast. (Pronounced like “cash.”)

CGA IBM’s first color graphics standard, capable of 320 by 200 resolution at four colors (or gray shades on laptops), or 640 by 200 at two colors (black and white). CGA-only laptops are behind the times.

chip An integral part of the PC. These are very tiny, square or rectangular slivers of material (usually silicon) with electrical components built in. Some of the chips in a computer aid in memory, but the most important chip is the microprocessor. This is the “8088”, “286”, “386”, or “486” that is referred to when talking about a specific machine’s features.

clone An IBM PC/XT- or AT-compatible computer made by another manufacturer.

cluster A hard-disk term that refers to a group of sectors, the smallest storage unit recognized by DOS. On most modern hard disks, four 512-byte sectors make up a cluster, and one or more clusters make up a track.

CMOS (complementary metal oxide semiconductor) chip A type of memory chip that retains its data when power is turned off as long as it retains a trickle of power from a battery.

coding The act of programming a computer; specifically, generating source code in the language of the programmer’s choice. The most popular languages used by programmers are Pascal, C, and C++.

COM Communications port or serial port used by modems, mice, and some printers. DOS assigns these ports as COM1, COM2, and sometimes COM3 and COM4. DOS also lets you refer to the first communications port as AUX. Note: Some programs count communications ports starting with 0, so “Port 0” or “Communications Port 0” would be COM1, and “Port 1” would be COM2.
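
The off-by-one numbering mentioned in the note is easy to get wrong, so a tiny helper can make the conversion explicit. This Python sketch is illustrative only; the function name is hypothetical and it assumes the four ports (COM1 through COM4) named above.

```python
def com_name(zero_based_port):
    """Convert a program's zero-based port number to the DOS COM name."""
    if not 0 <= zero_based_port <= 3:
        raise ValueError("expected a port number from 0 to 3")
    return f"COM{zero_based_port + 1}"


print(com_name(0))  # COM1
print(com_name(1))  # COM2
```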

communications parameters Settings that define how your communications software will handle incoming data and transmit outgoing data. Parameters include bits per second, parity, data bits, and stop bits.
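
These parameters also determine how many characters actually get through each second, because every character sent asynchronously is framed by a start bit and one or more stop bits. The Python sketch below works through that arithmetic; the function name and sample settings are assumptions for illustration.

```python
def chars_per_second(bps, data_bits=8, parity=False, stop_bits=1):
    """Estimate character throughput for an asynchronous connection.

    Each character costs 1 start bit + data bits + optional parity bit
    + stop bits on the line.
    """
    bits_per_char = 1 + data_bits + (1 if parity else 0) + stop_bits
    return bps / bits_per_char


# 2,400 bps with 8 data bits, no parity, 1 stop bit: 10 bits per character.
print(chars_per_second(2400))                            # 240.0
# 7 data bits with a parity bit and 1 stop bit is also 10 bits per character.
print(chars_per_second(2400, data_bits=7, parity=True))  # 240.0
```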

convergence A video term that describes the way in which the three beams that generate the three color dots (red, green, blue) should meet. When all three dots are excited at the same time and their relative distance is perfect, the result is pure white. Deviation from this harmony (due to an incorrect relationship of the beams to each other) results in poor convergence. This causes white pixels to show bits of color and can decrease image sharpness and resolution.

CPU (central processing unit) The functional “brain” of a computer; the element that does the actual adding and subtracting of 0s and 1s and the manipulation and moving of data that is essential to computing.

database A file consisting of a number of records or tables, each of which is constructed of fields (in column format) of a particular type, together with a collection of operations that facilitate searching, sorting, recombination, and similar acts.

data bits The bits sent by a modem. These bits make up characters and don’t include the bits that make up the communications parameters. See also bit

device Any piece of computer hardware.

device-level interface An interface that uses an external controller to connect the disk drives to the PC. Among its other functions, the controller converts the serial stream of data read from the drive into parallel data for the host computer’s bus. ST506 and ESDI are device-level interfaces.

digital-to-analog converter (DAC) A circuit that accepts digital input signals and converts them to analog output signals. Sometimes called DAC chips, they are used in VGA video cards, for example.

directory A list of file names and locations of files on a disk.

disk A circular metal platter or mylar diskette with magnetic material on both sides that stores programs and data. Disks are rotated continuously so that read/write heads mounted on movable or fixed arms can read or write programs or data to and from the disk. See also floppy disk, hard disk

disk cache A portion of a computer’s RAM set aside for temporarily holding information read from a disk. The disk cache does not hold entire files as does a RAM disk, but information that has either been recently requested from a disk or has previously been written to a disk.

disk defragmenter Defragmentation is the rewriting of all the parts of a file on contiguous sectors. When files on a hard disk drive are being updated, the information tends to be written all over the disk, causing delays in file retrieval. Defragmentation reverses this process, and is often achieved with special defragmentation programs that provide up to 75 percent improvement in the speed of disk access and retrieval.

disk drive The motor that actually rotates the disk, plus the read/write heads and associated mechanisms, usually in a mountable housing. Sometimes used synonymously to mean the entire disk subsystem.

disk format Refers to the method in which data is organized and stored on a floppy or hard disk.

diskette See floppy disk

DOS (disk operating system) A set of programs that control the communications between components of the computer. Examples of DOS functions are: displaying characters on the screen, reading and writing to a disk, printing, and accepting commands from the keyboard. DOS is a widely used operating system on IBM-compatible personal computers (PCs).

dot matrix printer A type of printer technology using a print head with pins to poke out arrays of dots that form text and graphics.

dot pitch A color monitor characteristic; specifically, the distance between the holes in the shadow mask. It indirectly describes how far apart the individual dots are on screen. The smaller the dot pitch, the finer the image’s “grain.” Some color monitors, such as the Sony Trinitron, use a slot mask (also known as an aperture grille) that is perforated by strips, not holes, in the shadow mask. In this case, the dots are arranged in a linear fashion, and their density is called striped dot pitch. (Monochrome monitors do not use a shadow mask and therefore do not have a dot pitch.)

download To receive information from another modem and computer over the telephone lines. It is the opposite of upload.

DRAM (dynamic random-access memory) The most commonly used type of memory, found on video boards as well as on PC system boards. DRAM is usually slower than VRAM (video random-access memory), since it has only a single access pathway.

drive array A storage system composed of several hard disks. Data is divided among the different drives for greater speed and higher reliability.

DSDD (double-sided, double-density) On PCs and laptops, DSDD means 720K 3 1/2-inch diskettes or 360K 5 1/4-inch diskettes.

DSHD (double-sided, high-density) On PCs and laptops, DSHD means 1.44MB 3 1/2-inch diskettes or 1.2MB 5 1/4-inch diskettes.

EISA (Extended Industry Standard Architecture) Primarily a desktop specification for high-performance computers. Competes with IBM’s Micro Channel architecture (MCA). EISA computers can use existing PC, XT, and AT add-in cards; MCA computers can’t. See also Micro Channel architecture

E-mail (electronic mail) The exchange of messages via a bulletin board or on-line service. One user leaves the message on the service “addressed” to another user. The other user later connects to the same service and can read the message and reply to it.

expanded memory Memory that can be used by some DOS software to access more than the normal 640K (technically, more than 1Mb). 80386, 80386SX, and 80486 computers can create expanded memory readily by using an EMS (expanded memory specification) driver provided with DOS, through Microsoft Windows, or through a memory manager such as Quarterdeck QEMM or Qualitas 386 To The Max. To use expanded memory, a program must be EMS-aware or run under an environment such as Microsoft Windows. 8088- and 80286-based computers often need special hardware to run expanded memory. See also memory

extended memory Memory above 1Mb in 80286 and higher computers. Can be used for RAM disks, disk caches, or Microsoft Windows, but requires the processor to operate in a special mode (protected mode or virtual real mode). With a special driver, you can use extended memory to create expanded memory. See also memory, RAM, ROM

file A collection of related records treated as a unit. In a computer system, a file can exist on magnetic tape, disk, or as an accumulation of information in system memory. A file can contain data, programs, or both.

floppy disk A removable, rotating, flexible magnetic storage disk. Floppy disks come in a variety of sizes, but 3 1/2-inch and 5 1/4-inch are the most popular. Storage capacity is usually between 360K and 1.44MB. Also called flexible disk or diskette. See also disk, hard disk

floppy drive A disk drive designed to read and write data to a floppy disk for transfer to and from a computer.

format A DOS command that records the physical organization of tracks and sectors on a disk.

frame buffer A large section of memory used to store an image to be displayed on-screen as well as parts of the image that lie outside the limits of the display. See also buffer

GCR (group coded recording) A hard-disk term for a storage process where bits are packaged as groups, with each group assigned to and stored under a particular code. Used by RLL drives.

graphics coprocessor Similar to a math coprocessor in concept, a programmable chip that can speed video performance by carrying out graphics processing independently of the microprocessor. Graphics coprocessors can speed up performance in two ways: by taking over tasks the main processor would lose time performing and by optimizing for graphics. Video adapter cards with graphics coprocessors are expensive compared to those without them, but they speed up graphics operations considerably. Among the coprocessor’s common abilities are drawing graphics primitives and converting vectors to bitmaps.

handshaking A modem term that describes the initial exchange between modems. It’s like “are you there?” with the response “I am here.”

hard disk A mass storage device that transfers data between the computer’s memory and the disk storage media. Hard disks are nonremovable, rotating, rigid, magnetic storage disks. There are some types of hard disk with removable rigid media in the form of disk packs. See also disk

hardware The physical components of a computer.

head actuator In a disk drive, the mechanism that moves the read/write head radially across the surface of the platter of the disk drive.

high-speed modem A modem operating at speeds from 9,600 to 19,200 bits per second.

host system In telecommunications, the system that you have called up and to which you are connected, such as a BBS (bulletin board system) or an on-line service such as CompuServe.

Hz (Hertz) A unit of frequency equal to one cycle per second. This unit used to be called cycles per second.

IDE (integrated drive electronics) A disk drive with its own controller electronics built in to save space and money. Many laptops use IDE drives.

instructions See application software

Intel A major manufacturer of integrated circuits used in computers. Intel makes the 8086 family of microprocessors and its derivatives: the 8088, 80286, 80386SX and DX, and 80486SX and DX. These are the chips used in the IBM PC family of computers and all the computers discussed in this book.

integrated circuit (IC) A tiny complex of electronic components and their connections that is produced in or on a slice of material (such as silicon). A single IC can hold many electronic elements. Also called a chip.

interlaced and noninterlaced scanning Two monitor schemes with which to paint an image on the screen. Interlaced scanning takes two passes, painting every other line on the first pass and filling in the rest of the lines on the second pass. Noninterlaced scanning paints all the lines in one pass and then paints an entirely new frame. Noninterlaced scanning is preferable because it reduces screen flicker, but it’s more expensive.

interleaving A hard-disk term that describes a method of arranging disk sectors to compensate for relatively slow computers. Interleaving spreads sectors apart instead of arranging them consecutively. For example, 3:1 interleaving means your system reads one out of every three sectors on one rotation, so reading a whole track takes three rotations. The time required for the extra spins lets the system catch up with the disk drive, which might otherwise outrun the system’s ability to read the data. Thanks to track buffering and the speed of today’s PCs, interleaving is obsolete. Look for “1:1 interleaving,” which indicates a noninterleaved drive.
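
To see what spreading sectors apart looks like, the following Python sketch lays out the logical sectors of one track for a given interleave. It is a simplified model, not any particular controller's method: sectors are numbered from 0 for convenience, the function name is hypothetical, and a slot that is already occupied is skipped in favor of the next free one.

```python
def interleave_layout(sectors_per_track, interleave):
    """Return the logical sector number stored in each physical slot.

    With an N:1 interleave, consecutive logical sectors are placed N
    physical slots apart; if that slot is taken, the next free slot
    is used instead.
    """
    layout = [None] * sectors_per_track
    pos = 0
    for logical in range(sectors_per_track):
        while layout[pos] is not None:
            pos = (pos + 1) % sectors_per_track
        layout[pos] = logical
        pos = (pos + interleave) % sectors_per_track
    return layout


print(interleave_layout(17, 1))  # 1:1, sectors in consecutive order
print(interleave_layout(17, 3))  # 3:1, consecutive sectors three slots apart
```

In the 3:1 layout, logical sectors 0, 1, and 2 land in physical slots 0, 3, and 6, which gives a slow system two sectors' worth of time to get ready for the next read.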

I/O (input/output) Input is the data flowing into your computer. Output is the data flowing out. I/O can refer to the parallel and serial ports, keyboard, video display, and hard and floppy disks.

interrupt request (IRQ) A request for attention and service made to the CPU. The keyboard and the serial and parallel ports all have interrupts. Setting two peripherals to the same IRQ is a cause of hair pulling among desktop PC users; laptops don’t suffer the problem as badly because they have few, if any, add-on products that need interrupts set.

ISA (Industry Standard Architecture) Computers using the same bus structure and add-in cards as the IBM PC, XT, and AT. Also called classic bus. It comes in an 8-bit and 16-bit version. Most references to ISA mean the 16-bit version. Many machines claiming ISA compatibility will have both 8- and 16-bit connectors on the motherboard.

kilobyte (KB) 1,024 bytes. Sometimes abbreviated as k (lowercase), K-byte, K, or KB for kilobyte and Kb for kilobit (1,024 bits). When in doubt about whether an abbreviation refers to kilobytes or kilobits, it’s probably kilobytes, with these exceptions: the speed of a modem (as in 2.4 kilobits per second) and the transfer rate of a floppy disk (as in 500 kilobits per second).

local area network (LAN) A small- to moderate-size network in which communications are usually confined to a relatively small area, such as a single building or campus.

logical drive A drive that has been created by the disk operating system (DOS). This is done either at the preference of the user or because the DOS version does not allow a formatted capacity in excess of 32MB. A user with a 100MB hard disk will want to use more than 32MB, so a program will tell DOS there are a bunch of “logical” drives that add up to 100MB. DOS 5.0 eliminates this need.
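
The 100MB example can be worked through in a few lines of Python. This is a sketch of the arithmetic only; the function name is hypothetical, and real partitioning tools let the user choose sizes rather than always filling each drive to the limit.

```python
def split_into_logical_drives(disk_mb, limit_mb=32):
    """Divide a disk into logical drives no larger than limit_mb each."""
    sizes = []
    remaining = disk_mb
    while remaining > 0:
        size = min(limit_mb, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(split_into_logical_drives(100))  # [32, 32, 32, 4]
```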

log on or log off The process of connecting or disconnecting your computer to another system by modem.

MB See megabyte

mega One million, but with computers it typically means 1,048,576 (1,024 times 1,024).

megabyte (MB) 1,048,576 bytes (1,024 times 1,024). Used to describe the total capacity of a hard or floppy disk or the total amount of RAM. Sometimes abbreviated as Mb, M, MB, or meg for megabyte; and Mb, M-bit, or Mbit for megabit. When in doubt, it’s probably megabyte, not megabit, with these exceptions: the capacity of a single memory chip (a 1-megabit chip; you need eight chips plus an optional ninth parity-checking chip to get 1 megabyte of memory), the throughput of a network (4 megabits per second), and the transfer speed of a hard disk (5 megabits per second).
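
The memory-chip exception is a good place to check the arithmetic: eight 1-megabit chips side by side supply the 8 bits of each byte, giving 1 megabyte. The Python sketch below assumes chips that are one bit wide, as in the example above; the names are hypothetical.

```python
KILOBYTE = 1024
MEGABYTE = 1024 * KILOBYTE  # 1,048,576 bytes


def chips_needed(megabytes, chip_megabits=1, parity=False):
    """Count one-bit-wide memory chips needed for a given amount of RAM.

    Eight chips store one bit each of every byte; an optional ninth
    chip per bank holds the parity bit.
    """
    if megabytes % chip_megabits != 0:
        raise ValueError("memory size must be a multiple of the chip size")
    banks = megabytes // chip_megabits
    chips_per_bank = 9 if parity else 8
    return banks * chips_per_bank


print(MEGABYTE)                       # 1048576
print(chips_needed(1))                # 8
print(chips_needed(1, parity=True))   # 9
```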

megahertz (MHz) One million cycles per second, typically used in reference to a computer’s clock rate. Both the clock rate and the processor type (80286, 80386, etc.) determine the power and speed of a computer.

memory A device that stores data in a computer. Internal memories are very fast and are either read/write random-access memory (RAM) or read-only memory (ROM). Bulk storage devices are either fixed disk, floppy disk, tape, or optical memories; these hold large amounts of data, but are slower to access than internal memories. See also expanded memory, extended memory, RAM, ROM

MHz See megahertz

Micro Channel architecture (MCA) The basis for the IBM Micro Channel bus, used in high-end models of IBM’s PS/2 series of personal computers. See also EISA

microprocessor An integrated circuit (IC) that communicates, controls, and executes machine language instructions.

microsecond 1/1,000,000 (one-millionth) of a second.

millisecond (ms) 1/1,000 (one-thousandth) of a second. Hard disks are rated in milliseconds. Modern laptop hard disks have access times of 20 to 40 milliseconds, meaning they can find the average piece of data in 1/50 to 1/25 of a second. Older hard disks were about 100 milliseconds. Higher numbers mean slower performance.

modem A combination of the words modulate and demodulate. A device that allows a computer to communicate with another computer over telephone lines.

multimedia The presentation of information on a computer using sound, graphics, animation, video, and text.

nanosecond 1/1,000,000,000 (one-billionth) of a second. Memory chips are rated in nanoseconds, typically 80 to 150 nanoseconds. Higher numbers indicate slower chips.

NetWare A popular series of network operating systems and related products made by Novell.

network A continuing connection between two or more computers that facilitates sharing files and resources.

online/offline When connected to another computer via modem and telephone lines, a modem is said to be online. When disconnected, it is offline.

operating system (OS) A set of programs residing in ROM and/or on disk that controls communications between components of the computer and the programs run by the computer. MS-DOS is an operating system.

OS/2 (Operating System/2) An operating system developed by IBM and Microsoft for use with Intel’s microprocessors. Unlike its predecessor, DOS, OS/2 is a multitasking operating system. This means many programs can run at the same time.

OS/2 Extended Edition IBM’s proprietary version of OS/2; it includes built-in communications and database-management facilities.

parallel port A port that transmits or receives 8 bits (1 byte) of data at a time between the computer and external devices. Mainly used by printers. LPT1 is a parallel port, for example.

PCL (printer command language) Usually refers to Hewlett-Packard laser printers. Most H-P compatibles support PCL 4. H-P’s newest printers (the III series) use PCL 5, which includes scalable fonts and monochrome support for HPGL.

peripheral A device that performs a function and is external to the system board. Peripherals include displays, disk drives, and printers.

pixel A pixel is the smallest information building block of an on-screen image. On a color monitor screen, each pixel is made of one or more triads (red, green, and blue). Resolution is usually expressed in terms of the number of pixels that fit within the width and height of a complete on-screen image. In VGA, the resolution is 640 by 480 pixels; in SuperVGA, it is 800 by 600 pixels.

platter The actual disk inside a hard-disk drive; it carries the magnetic recording material. All but the thinnest disk drives have multiple platters, most of which have two sides that can be used for data storage. (On multiple-platter drives, one side of one platter is usually reserved for storing control information.)

port The channel or interface between the microprocessor and peripheral devices.

program See application software

programming language Any artificial language that can be used to define a sequence of instructions that can ultimately be processed and executed by the computer.

PROM (programmable read-only memory) A (usually) permanent memory chip programmed after manufacture (unlike a ROM chip). EPROMs (erasable PROMs) and EEPROMs (electrically erasable PROMs) can be erased and reprogrammed several times.

protocol Rules governing communications, including flow control (start-stop), error detection or correction, and parameters (data bits, stop bits, parity). If they use the same protocols, products from different vendors can communicate.

RAM (random-access memory) Also known as read-write memory; the memory used to execute application programs. See also memory

RAM disk Also called a VDISK (virtual disk). A portion of RAM that can be used in place of a hard or floppy disk for frequently accessed files. A RAM disk is dangerous for storing data because the contents are lost if the computer crashes or if power is turned off. Most users with extra RAM use it for a disk cache rather than as a RAM disk. See also memory

read/write head The part of the hard disk that writes data to or reads data from a platter. It functions like a coiled wire that reacts to a changing magnetic field by producing a minute current that can be detected and amplified by the electronics of the disk drive.

refresh rate See vertical frequency

RGB (red, green, blue) The triad, the three colors that make up one pixel of a color monitor. See also triad

RLL (run length limited) A hard-disk method of encoding information magnetically that uses a scheme (GCR) to store blocks of data instead of single bits of data. It allows greater storage densities and higher transfer speeds than the other method in use (MFM).

ROM (read-only memory) The memory chip(s) that permanently store computer information and instructions. Your computer’s BIOS (basic input/output system) information is stored in a ROM chip. Some laptops even have the operating system (DOS) in ROM.

RS-232C An electrical standard for the interconnection of equipment established by the Electronic Industries Association; the same as the CCITT code V.24. RS-232C is used for serial ports.

SCSI (small computer system interface) A system-level interface designed for general-purpose applications that allows up to seven devices to be connected to a single host adapter. It uses an 8-bit parallel connection that produces a maximum transfer rate of 5MB per second. The term is pronounced “scuzzy.”

sector The basic storage unit on a hard disk. On most modern hard disks, sectors are 512 bytes each, four sectors make up a cluster, and there are 17 to 34 sectors in a track, although newer drives may have a different number of sectors.
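
The figures in this entry multiply out to cluster, track, and whole-disk capacities. The Python sketch below does that arithmetic; the function names are hypothetical, and the drive geometry in the example (615 cylinders, 4 heads, 17 sectors per track) is an illustrative assumption rather than a figure from this glossary.

```python
SECTOR_BYTES = 512        # bytes per sector, as given above
SECTORS_PER_CLUSTER = 4   # sectors per cluster, as given above


def cluster_bytes():
    return SECTOR_BYTES * SECTORS_PER_CLUSTER


def track_bytes(sectors_per_track):
    return SECTOR_BYTES * sectors_per_track


def disk_bytes(cylinders, heads, sectors_per_track):
    """Capacity = cylinders x heads (surfaces) x bytes per track."""
    return cylinders * heads * track_bytes(sectors_per_track)


print(cluster_bytes())          # 2048
print(track_bytes(17))          # 8704
print(disk_bytes(615, 4, 17))   # 21411840, roughly a 20MB drive
```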

serial port The “male” connector (usually DB-9 or DB-25) on the back of your computer. It sends out data one bit at a time. It is used by modems and, in years past, for daisy-wheel and other printers. The other port on your computer is the parallel port, which is a “female” connector. It is used for printers, backup systems, and mini-networking (LANs). See also COM.

shadow mask Inside the color monitor just behind the screen, it is drilled with small holes, each of which corresponds to a triad. The shadow mask helps guide the electron beams so that each beam hits only one phosphor dot in the triad.

shell A piece of software providing direct communication between the user and the operating system. The main inner part of the system, called the kernel, is enclosed by the shell program, as in a nut.

slot mask Also known as an aperture grille, it serves the same function as the shadow mask on a monitor.

spindle One part of a hard disk, around which the platters rotate.

software Programming tools such as languages, assemblers, and compilers; control programs such as operating systems; or application programs such as electronic spreadsheets and word processors. Software instructs the computer to perform tasks. See also application software

spreadsheet An application commonly used for budgets, forecasting and other finance-related tasks. Data and formulas to calculate those data are entered into ledger-like forms (spreadsheets or worksheets) for analysis, tracking, planning, and evaluation of impacts on economic strategy.

synchronous communication Fixed-rate serial communication, eliminating the need for transmitting inefficient start-stop information. PC-to-mainframe communication may be synchronous; most PC-to-PC communication is asynchronous. Most laptop modems are asynchronous only. If you’re not sure whether you need a synchronous-asynchronous modem, you probably don’t.

system-level interface A connection between the hard disk and its host system that puts control and data-separation functions on the drive itself (and not on the external controller). SCSI and IDE are system-level interfaces.

telecommunication Using your computer to communicate with another computer via telephone lines and your modem.

track The circular path traced across the spinning surface of a disk platter by the read/write head inside the hard-disk drive. The track consists of one or more clusters.

track buffer Memory sometimes built into disk-drive electronics, sufficient to store the contents of one full track. This allows the drive to read the entire track quickly, in one rotation, then slowly send the information to your CPU. It eliminates the need for interleaving and can speed up drive operation.

transfer rate The speed at which a disk drive can transfer information between its platters and your CPU. The transfer rate is typically measured in megabytes per second, megabits per second, or megahertz.

transmission speed See baud rate

triad Three phosphor-filled dots (one red, one green, one blue) arranged in a triangular fashion within a monitor. Each of the three electron guns is dedicated to one of these colors. As the guns scan the screen, each active triad produces a single color, which is determined by the combination of excited color dots and by how active each dot is. See also RGB

utility program A program designed to perform maintenance work on a system or on system components, e.g. a storage backup program, a disk and file recovery program, or a resource editor.

V. The CCITT international communications standards, pronounced “vee-dot.” Various V. standards cover speed (modulation), error correction, data compression, and signaling characteristics.

vertical frequency This is also called the vertical refresh rate, or the vertical scan frequency. It is a monitor term that describes how many times per second an entire screenful of lines is drawn, from top to bottom. Monitors are designed for specific vertical and horizontal frequencies. Vertical frequency is a key factor in image flicker. Given a low enough vertical frequency (53 Hz, for example) nearly everyone will see a flicker because the screen isn’t rewritten quickly enough. A high vertical frequency (70 Hz on a 14-inch monitor) will eliminate the flicker for most people.

VGA IBM’s third (1987) and current mainstream graphics standard, capable of 640-by-480-pixel resolution at 16 colors or gray shades. SuperVGA (800 by 600) resolution is important on desktop PCs. A handful of laptops support SuperVGA when connected to an external monitor; they use regular VGA when driving the built-in display. Some laptop vendors use “text mode” VGA, which means the monitor displays only 400 pixels, not 480, vertically, and uses double-scan CGA (640 by 400) for graphics.
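
Resolution and color count together determine how much video memory one screen needs. The Python sketch below estimates that figure; it assumes each pixel is stored in the smallest whole number of bits that can represent the given number of colors, and the function name is hypothetical.

```python
def frame_bytes(width, height, colors):
    """Estimate the memory needed to hold one screen image."""
    bits_per_pixel = (colors - 1).bit_length()  # 16 colors -> 4 bits
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8                # round up to whole bytes


print(frame_bytes(640, 480, 16))  # VGA: 153600
print(frame_bytes(800, 600, 16))  # SuperVGA: 240000
print(frame_bytes(640, 200, 2))   # two-color CGA mode: 16000
```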

VRAM (video random-access memory) Special-purpose RAM with two data paths for access, rather than the one path in conventional RAM. The two paths let a VRAM board handle two functions at once: display refresh and processor access. VRAM doesn’t force the system to wait for one function to finish before starting the other, so it permits faster operation for the video subsystem.

wide area network (WAN) Usually a moderate to large network in which communications are conducted over the telephone lines using modems.

write protection Keeping a file or disk from being written over or deleted. 3 1/2-inch floppy disks use a sliding write-protect tab in the lower-left corner (diagonally across from the beveled corner of the disk) to keep the computer from writing to the disk. When the opening is hidden by the tab (no light passes), you can write to the disk; tab open, you can’t write. This can be confusing because it’s the exact opposite of how a 5 1/4-inch disk works. Most file management utilities allow you to write-protect individual files.

Any sufficiently advanced technology is indistinguishable from magic.

-Arthur C. Clarke

Sorcerers have their magic wands: powerful, potentially dangerous tools with a life of their own. Witches have their familiars: creatures disguised as household beasts that could, if they choose, wreak the witches’ havoc. Mystics have their golems: beings built of wood and tin brought to life to do their masters’ bidding.

We have our personal computers.

PCs, too, are powerful creations that often seem to have a life of their own. Usually they respond to a seemingly magic incantation typed at a C:> prompt or to a wave of a mouse by performing tasks we couldn’t imagine doing ourselves without some sort of preternatural help. But even as computers successfully carry out our commands, it’s often difficult to quell the feeling that there’s some wizardry at work here.

And then there are the times when our PCs, like malevolent spirits, rebel and open the gates of chaos onto our neatly ordered columns of numbers, our carefully wrought sentences, and our beautifully crafted graphics. When that happens, we’re often convinced that we are, indeed, playing with power not entirely under our control. We become sorcerers’ apprentices, whose every attempt to right things leads to deeper trouble.

Whether our personal computers are faithful servants or imps, most of us soon realize there’s much more going on inside those putty-colored boxes than we really understand. PCs are secretive. Open their tightly sealed cases and you’re confronted with poker-faced components. Few give any clues as to what they’re about. Most of them consist of sphinxlike microchips that offer no more information about themselves than some obscure code printed on their impenetrable surfaces. The maze of circuit tracings etched on the boards is fascinating, but meaningless, hieroglyphics. Some crucial parts, such as the hard drive and power supply, are sealed with printed omens about the dangers of peeking inside, omens that put to shame the warnings on a pharaoh’s tomb.

This book is based on two ideas. One is that the magic we understand is safer and more powerful than the magic we don’t. This is not a hands-on how-to book. Don’t look for any instructions for taking a screwdriver to this part or the other. But perhaps your knowing more about what’s going on inside all those stoic components makes them all a little less formidable when something does go awry. The second idea behind this book is that knowledge, in itself, is a worthwhile and enjoyable goal. This book is written to respond to your random musings about the goings-on inside that box that you sit in front of several hours a day. If this book puts your questions to rest or raises new ones it will have done its job.

At the same time, however, I’m trusting that knowing the secrets behind the magician’s legerdemain won’t spoil the show. This is a real danger. Mystery is often as compelling as knowledge. I’d hate to think that anything you read in this book takes away that sense of wonder you have when you manage to make your PC do some grand, new trick. I hope that, instead, this book makes you a more confident sorcerer.

BEFORE YOU BEGIN

This book has been written with a certain type of personal computer in mind: the IBM PC-compatible computer, usually powered by an Intel microprocessor and most often running the MS-DOS operating system. Many of the specifics in these explanations apply only to that class of computer and those components.

In more general terms, the explanations also may apply to Macintosh computers, Unix workstations, and even minicomputers and mainframes. But I’ve made no attempt to devise universal explanations of how computers work. To do so would, of necessity, detract from the understanding that comes from inspecting specific components.

Even so, there is so much variety even within the IBM/Intel/MS-DOS world of PCs that, at times, I’ve had to limit my explanations to particular instances or stretch the boundaries of a particular situation to make an explanation as generic as possible. If you spot anything that doesn’t seem quite right in this book, I pray that my liberties with the particulars are the only cause.

XMS (extended memory specification) Interface that lets DOS programs cooperatively use extended memory in 80286 and higher computers. One such driver is Microsoft’s HIMEM.SYS, which manages extended memory and HMA (high memory area), a 64K block just above 1MB.

Competency Levels | Suggested Learning Process | Learning Outcomes | Comments | Time Duration

M1: Introduction to Computers

1.0 What is a computer?

Suggested Learning Process
• Teacher initiates a friendly discussion about the computer.
• Teacher asks about computers:
1) What is a computer?
2) Where do we use computers?
3) Why do we use computers?
• Pupils answer:
1) A computer is an electronic machine.
2) In the bank, school, office, etc.
3) To type a document, to send e-mail, to search the Internet, to listen to music, etc.
• Teacher introduces the different parts of a computer and gives a brief description of each part. Ex: Monitor/VDU, system unit, keyboard, mouse, etc.

Activity 01
• Teacher provides one name tag to each group.
• Teacher asks them to show the object that matches the name tag given by the teacher.
• Teacher explains the shutdown procedure. Ex: Click on Button Go to Click
• Teacher lets them do it practically. While they are doing the practical, the teacher monitors the class and helps the students who find it difficult.
• Teacher describes and demonstrates:
1) How to switch on the computer
2) The booting procedure
• Teacher gives them a chance to do it practically.

Assessment
Teacher gives the following activity to each student.
Put them in the correct order – switch on
a) Switch on the UPS
b) Switch on the monitor
c) Switch on the CPU
d) Switch on the power supply
Put them in the correct order – switch off
a) Click on “Turn off your computer”
b) Go to “Shut down”
c) Click on the “Start” button
• Teacher uses the peer correction method to correct their activities.

Learning Outcomes
1) Defines what a computer is
2) Names the parts of a computer
3) Uses correct techniques to switch the computer on and off
4) Values the uses of a computer

Time Duration: 40 min

1.1 Basic parts of a computer

Suggested Learning Process
• Teacher gets the pupils to sit down separately in groups.
• Teacher provides picture cards (picture of a computer) to each group.
• Teacher gets each group to discuss and name 5 parts of the picture.
• Teacher pastes a picture of a computer on the white board.
• Teacher asks the pupils, “Will you look at the white board?”
• Teacher describes the parts of the computer with the help of this picture.
• Teacher asks the pupils to show the picture cards that the teacher distributed.
• Teacher gives the following activity cards to each group.
Complete the computer puzzle, using the clues given below.
a) K B
b) M
c) N R
d) P T
e) S W
f) H R
Clues –
a) An input device
b) The device where information is stored
c) A TV-like screen
d) Another output device
e) Programs in a computer
f) Parts of a computer that can be touched or felt
• While they are doing the activity, the teacher goes round the class and monitors the groups.
• Teacher introduces the different parts of a computer and gives a brief description of each part. Ex: Monitor/VDU, system unit, keyboard, mouse, etc.
• Teacher gets each group to name one part of a computer.
• Teacher writes their answers on the white board. Ex:
Input device: Keyboard, Mouse
Output device: Monitor, Printer
• Teacher explains input and output devices.
• Teacher provides papers to each group.
• Teacher gets the pupils to draw a picture of a computer and name it.
• Teacher asks each group to present their drawings.
• Teacher does the correction with the help of the pupils (teacher lets the pupils do the correction by exchanging drawings among themselves).

Assessment
Teacher gives the following activity cards to each student.
Match the following
a) Keyboard        Output device
b) Memory          Input device
c) Monitor         Like brain
d) Control unit    Programs in a computer
e) Software        Like traffic police
• Teacher does the correction by getting the pupils to exchange activity cards among themselves.
• Teacher briefly explains the whole lesson once again.

Learning Outcomes
1) Names the parts of a computer
2) Illustrates a computer system
3) Lists the computer input, output and storage devices
4) Explains the terms “Hardware” and “Software”

Time Duration: 40 min

1.2 Why we use computers?

Suggested Learning Process
• Teacher gets the pupils to form their own groups.
• Teacher provides papers to each group.
• Teacher gets each group to discuss and write five uses of a computer.
• Teacher collects their papers.
• Teacher demonstrates the uses of a computer (with the help of pictures).
• Teacher gives the pupils the following activity card.
Fill in the blanks
a) Computer can…………………………a picture.
b) Computer can………………………..your friend’s name and address.
c) Computer can………………………..music.
d) Computer can………………………..a letter.
e) Computer can………………………..spellings for you.
• Teacher does the peer correction.

Assessment
• Teacher explains the whole lesson again.
• Teacher gives the following activity cards.
Say true or false
a) Computer can design a house.
b) Computer can calculate.
c) Computer can be used to play chess.
d) Computer can arrange the books in your basket.
e) Computer can store your friends’ names and addresses.
f) Computer can eat your food.

Learning Outcomes
1) Lists uses of a computer
2) Finds places where computers are used
3) Discusses the value of using the computer

Time Duration: 40 min

1.3 Places where computers can be used
• Teacher initiates a friendly discussion on where computers are used.
• Uses a presentation to enhance the knowledge of
• Teacher asks them to discuss other places where computers are used.
• Teacher gets the students to write the places where computers are used.
• Discusses the value of using a computer in a place to do their work effectively and efficiently.
• Writes the places where computers are used.

Time Duration: 40 min

1.4 Things we can do using computers
• Gets the students to discuss the things that they can do using the computer.
• Discusses the things that can be done using the computer and the things that can be done manually as well.
• Discusses the differences between work done using the computer and work done manually.
• Gets the students to write the things that they can do using the computer.
• Inculcates the value of the computer in different fields.
• Writes the things that they can do using the computer.

Time Duration: 40 min

1.5 Introduction to keywords (Data, Process, Information)
• Puts the following words on the white board in order: Data, Process, and Information.
• Teacher elicits the given vocabulary items by taking instances from the students’ background.
• Gets the students to use the vocabulary items in the field of computing.
• Develops the vocabulary knowledge.
• Uses the given vocabulary items meaningfully.

Time Duration: 40 min

1.6 Introduce mouse and keyboard
1.6.1 Mouse – Mouse clicks
1.6.2 Keyboard – Delete/Backspace/Shift/Caps Lock keys
• Teacher demonstrates the mouse and shows how it works.
• Teacher asks students to come to the computer and click the mouse on the desktop.
• Gets the students’ response after this activity.
• Teacher shows how to move the mouse.
• Gets the students to move icons on the desktop.
• Teacher opens the MS Paint application and asks them to draw any picture they wish.
• Teacher shows the arrangement of the keys on the keyboard and initiates a small discussion on the function of the keyboard.
• Teacher uses a “Typing Tutor” program and lets the students practice.
• Develops the skills in using the mouse and keyboard.
• Uses the keyboard and mouse in the proper manner.
• Discusses the value of using the keyboard and mouse.

Time Duration: 40 min

1.7 Advantages of computer compared with human
• Teacher discusses the activities done by human beings.
• Teacher initiates a discussion on the things done by the computer.
• Teacher gets the students to discuss the advantages of the computer compared with the human.
• Teacher asks them to write the facts they have discussed in the class.
• Develops the value of the computer compared with the human.
• Writes the advantages of the computer compared with the human.

Time Duration: 20 min

1.8 Disadvantages of computer compared with human
• Teacher discusses the bad influence of computer usage on society.
• Teacher gets the students to discuss the modern trends and negative aspects of computer usage.
• Teacher gets the students to write the disadvantages of the computer compared with the human.
• Writes the disadvantages of the computer compared with the human.
• Discusses the value of taking remedial measures to overcome the problems caused by the bad use of computers.

Time Duration: 20 min

Teacher’s guide (Grade 6), Royal College, Colombo 7

Competency Levels

Suggested Learning Process Learning Outcomes Comments Time Duration
M3: Computer Hardware

  1. Introduction to Computers
  • Teacher asks about the physical components of a computer system
  • Teacher asks,

1)      Have you seen a computer?

2)      When you look at a computer, what do you see?

  • Teacher elicits the word “Tangible”.
  • Student identifies the hardware components of a computer.
  • Student answers the questions given by the teacher.
3.1 Hardware Components
3.1.1 Parts of a computer
3.1.2 Basic functional block diagram of a computer (CPU, I/O, Storage)
3.1.3 Motherboards (optional)

1)      Processors

2)      Memory

3)      I/O interfaces

  • Teacher pastes a picture of a computer on the white board.
  • Teacher describes the term “Hardware” with the help of this picture.
  • Teacher asks the students to get into groups and provides the following activity cards to each group.
  • Teacher assigns the following activity.

Underline the correct answer.

1)      Hardware refers to

a)      The outside of the CPU.

b)      Any parts of the CPU.

c)      The tools to repair the computer.

d)      Computer instructions or programs that tell the computer what to do.

2)      What is RAM?

a)      Removable archive memory.

b)      Real access memory.

c)      Random access memory.

3)      Name five devices of a computer.

Teacher draws a picture of a computer on the white board

  • Teacher asks pupils to name it.
  • Teacher describes what the input and output devices of a computer system are.
  • Teacher asks the following questions.

Choose the correct answer.

a)      An example of an input device is the

a) CPU b) Monitor c) Keyboard d) Printer

b)      An example of an output device is the

a) keyboard c) floppy drive

c)      Which of the following devices is used for both input and output?

a) Printer b) Floppy drive c) Keyboard d) Mouse

Teacher shows a picture of a computer and asks

  • Device name
  • Input or Output

Teacher asks pupils to draw a picture of a computer and name it.

Lists down Input & Output devices

  • Answers the questions given by the teacher.
3.3 Storage
3.3.1 Removable media (floppy disks)

  • Teacher gets the student to state the definition of a storage device
  • Gives examples of primary and secondary storage
  • Identifies the differences between main storage devices and auxiliary storage devices
  • Gives examples of main storage devices
  • Gives examples of auxiliary storage devices
  • Compares different types of storage devices
3.4 Computer Architecture
3.4.1 Number Systems
3.4.1.1 Introduction to Binary Number System
3.4.1.2 Conversion between decimal and binary (optional)

Teacher recapitulates the previous lesson: “What is the binary number system?” Teacher puts the following on the white board and asks the students to solve them.
Convert decimal to binary – 45
Convert binary to decimal – 1011
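
The two white board conversions (decimal 45 and binary 1011) can be checked with a short program. What follows is only a sketch for the teacher's reference: the class name BinaryConversion and its two methods are made up for this illustration and are not part of the guide, and the code assumes non-negative whole numbers.

```java
// Hypothetical helper class for checking the white board conversions.
class BinaryConversion {

    // Decimal to binary: divide by 2 repeatedly and collect the remainders.
    static String toBinary(int decimal) {
        if (decimal == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        int value = decimal;
        while (value > 0) {
            bits.insert(0, value % 2); // each remainder is the next bit, right to left
            value /= 2;
        }
        return bits.toString();
    }

    // Binary to decimal: double the running total and add each bit, left to right.
    static int toDecimal(String binary) {
        int total = 0;
        for (char bit : binary.toCharArray()) {
            total = total * 2 + (bit - '0');
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("45 in binary is " + toBinary(45));         // 101101
        System.out.println("1011 in decimal is " + toDecimal("1011")); // 11
    }
}
```

The same two steps (repeated division by 2, and doubling then adding) are the ones pupils would carry out by hand.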

  • Tells rules for binary addition.
  • Shows how to do binary additions.
  • Tells rules for binary subtraction.
  • Shows how to do binary subtractions.

Gives an activity (takes them to the white board)

Binary addition

1)      1010 + 0011  2) 0011 + 0101  3) 1101 + 0111

Binary subtraction

1)      1100 - 0010  2) 1101 - 0011  3) 1110 - 0010

  • Develops the basic knowledge in number systems.
  • Answers the questions given by the teacher.
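
The binary addition and subtraction exercises above can be worked column by column, exactly as on the white board. The sketch below is one possible way to check the answers. The class BinaryArithmetic and its methods are hypothetical names chosen for this example, and subtract assumes the first number is not smaller than the second.

```java
// Hypothetical helper class for checking the binary arithmetic activity.
class BinaryArithmetic {

    // Adds two binary strings from the rightmost column, carrying 1 when a column sums to 2 or 3.
    static String add(String a, String b) {
        StringBuilder result = new StringBuilder();
        int i = a.length() - 1;
        int j = b.length() - 1;
        int carry = 0;
        while (i >= 0 || j >= 0 || carry > 0) {
            int sum = carry;
            if (i >= 0) sum += a.charAt(i--) - '0';
            if (j >= 0) sum += b.charAt(j--) - '0';
            result.insert(0, sum % 2);
            carry = sum / 2;
        }
        return result.toString();
    }

    // Subtracts b from a (assumes a >= b), borrowing from the next column when needed.
    static String subtract(String a, String b) {
        StringBuilder result = new StringBuilder();
        int i = a.length() - 1;
        int j = b.length() - 1;
        int borrow = 0;
        while (i >= 0) {
            int diff = (a.charAt(i--) - '0') - borrow;
            if (j >= 0) diff -= b.charAt(j--) - '0';
            if (diff < 0) {
                diff += 2;   // borrow 2 from the next column
                borrow = 1;
            } else {
                borrow = 0;
            }
            result.insert(0, diff);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println("1010 + 0011 = " + add("1010", "0011"));      // 1101
        System.out.println("0011 + 0101 = " + add("0011", "0101"));      // 1000
        System.out.println("1101 + 0111 = " + add("1101", "0111"));      // 10100
        System.out.println("1100 - 0010 = " + subtract("1100", "0010")); // 1010
        System.out.println("1101 - 0011 = " + subtract("1101", "0011")); // 1010
        System.out.println("1110 - 0010 = " + subtract("1110", "0010")); // 1100
    }
}
```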

Introduction to Programming Languages
• First Generation of Computer Programs
• Sequence of machine instructions
• loaded into memory through a set of switches
• Second Generation Computer Programs
•  introduction of assembly languages
• enables the programmers to use mnemonic names for the machine instructions and symbolic names for memory locations.
• A translator called the assembler converts assembly language programs into machine code

• Third Generation Languages :
• High level languages such as BASIC, Fortran, Pascal, and C eliminate the close ties to the CPU’s machine instructions
• provide standard data types such as integers, floating point numbers, and characters etc.,
• Instructions are user friendly
• Compilers are available to translate these high- level language instructions to machine instructions.
The accepted programming style :
•  to organize related data items using programming constructs such as Pascal Records or C Structures
• then treat the resulting block of data as a single unit.
• After data structures are laid out,  the application is written as a collection of procedures that manipulate these structures.
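
As an illustration of this procedural style, here is a minimal sketch written in Java for consistency with the rest of these notes. The names StudentRecord and StudentProcedures are invented for the example. The first class holds only data, playing the role of a Pascal record or C structure, and the second is the collection of procedures that manipulate it.

```java
// Hypothetical record-like class: only data, no behaviour (like a C struct).
class StudentRecord {
    String name;
    int[] marks;
}

// The "collection of procedures" that manipulate the structure.
class StudentProcedures {

    // Builds a record and returns it as a single unit.
    static StudentRecord create(String name, int[] marks) {
        StudentRecord record = new StudentRecord();
        record.name = name;
        record.marks = marks;
        return record;
    }

    // Sums all the marks held in the record.
    static int total(StudentRecord record) {
        int sum = 0;
        for (int mark : record.marks) {
            sum += mark;
        }
        return sum;
    }

    // Average mark, or 0.0 when the record holds no marks.
    static double average(StudentRecord record) {
        if (record.marks.length == 0) {
            return 0.0;
        }
        return (double) total(record) / record.marks.length;
    }

    public static void main(String[] args) {
        StudentRecord record = create("Nimal", new int[] {60, 70, 80});
        System.out.println(record.name + ": total " + total(record)
                + ", average " + average(record));
    }
}
```

Note that nothing stops other code from changing the fields of StudentRecord directly.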

• With ever-increasing:
• hardware capabilities such as faster CPUs, Main Memory Space, Hard Disk space
• better graphics, and
• easier networking,
• Users have come to expect software to have greater functionality
• Window-based graphical user interface,
• transparent access to data stored in mini or mainframe computers,
• and the ability to work in a network environment etc

• Faced with this complexity,
• More programmers are starting to use Object Oriented Programming (OOP).
• OOP is a new way of organising code and data
• OOP promises increased control over the complexity of the software development process.
• The underlying concepts of OOP are
• Data Abstraction with Encapsulation
• Inheritance and
• Polymorphism
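
The three concepts listed above can be seen together in a small Java sketch. The classes Shape, Rectangle, Circle and OopDemo are hypothetical and serve only as an illustration. They are one possible arrangement, not a prescribed design.

```java
// Hypothetical classes, used only to illustrate the three concepts.
abstract class Shape {
    private final String name; // encapsulation: hidden state, read only through getName()

    protected Shape(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Data abstraction: callers ask for an area without knowing how it is computed.
    abstract double area();
}

// Inheritance: Rectangle reuses the name handling defined in Shape.
class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        super("rectangle");
        this.width = width;
        this.height = height;
    }

    @Override
    double area() {
        return width * height;
    }
}

class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

class OopDemo {
    // Polymorphism: the same call, shape.area(), runs different code for each object.
    static double totalArea(Shape[] shapes) {
        double total = 0.0;
        for (Shape shape : shapes) {
            total += shape.area();
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Rectangle(2, 3), new Circle(1) };
        for (Shape shape : shapes) {
            System.out.println(shape.getName() + " area = " + shape.area());
        }
        System.out.println("total area = " + totalArea(shapes));
    }
}
```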
Evolution of OOP Languages

• SIMULA 67 gave us the crucial object-oriented concepts of classes, dynamic objects, encapsulation and inheritance.
• Smalltalk is another OOP language and environment, released in 1980.
• C++: first version in 1983.
• Eiffel in 1988; it is not an extension of an existing procedural language’s syntax.

Introduction to Java
• Java was developed by James Gosling at Sun Microsystems in 1991.
• His original aim was to develop a low-cost, hardware-independent language based on C++.
• For technical reasons, that idea was dropped.
• A new programming language called Oak was developed, based on C++.
• Oak was developed by removing undesirable features of C++.
• Those features include:
– Multiple inheritance
– Automatic type conversions
– Use of pointers
– Memory management
• By 1994 the World Wide Web had emerged, and Oak was renamed Java.
• The Java language was successfully used to develop a web browser called WebRunner, and the Java/HotJava project commenced.
• In early 1995, HotJava, Java, the Java documentation and source code were made available over the web as an alpha version.
• In December 1995, beta version 2 of Java was released.
• On January 23, 1996, Java 1.0 was officially released and made available to download over the net.
• The latest version of the Java 2 SDK and its documentation can be downloaded at
– www.javasoft.com/
Running Java Programs
• Introduction to Java Development Kit (JDK)
– JDK provides core set of tools that are necessary to develop professional Java applications
– These tools are discussed in detail later
– JDK tools are also written in Java.
• Creating a Java Source File
– Any plain text editor or text editor capable of saving in ASCII format  can be used to create a Source file
– Examples are DOS EDIT, Notepad etc.
– Source File should be saved with a .java extension
• Compiling and Running the Source File
– First set the Java Environment
Setting the Path
In your AUTOEXEC.BAT file, set PATH and
CLASSPATH as follows:
PATH …………;C:\JDK1.3\BIN\
………… indicates any existing paths
SET CLASSPATH=C:\JDK1.3\lib\classes.zip;.;
• Compiling and Running the Source File contd..
– Assuming you saved your source file in myJavaPrg Directory;
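
As a concrete illustration of the source-file and compile steps above, here is a minimal sketch of a Java program. The class name HelloWorld and the file name HelloWorld.java are assumptions made for this example and do not come from the notes.

```java
// Save this file as HelloWorld.java (hypothetical file and class name).
class HelloWorld {

    // Kept in its own method so the text can be reused or checked separately.
    static String greeting() {
        return "Hello, World!";
    }

    // Entry point: the java launcher starts execution here.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```

With the JDK's BIN directory on the PATH, the file would typically be compiled with `javac HelloWorld.java` and run with `java HelloWorld` from the directory that holds it.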

Evolution
Abacus – calculating device (3000 BC)
Pascaline – mechanical adding machine (1642)

Babbage – analytical engine (1830s)
Ada – first programmer (1800s)
Punched cards – data storage (1800s)
Hollerith – tabulating machine (1890s)
Mark I – general purpose computer (1944)
ENIAC – electronic computer (1946)
UNIVAC – US Census Department (1951)
EDVAC – Stored Program Concept (1951)
Generation of Computers
Classification of Computers
Microprocessor chip
Floppy disk for data storage
Pocket Calculator
Apple II – first personal computer
IBM PC
Portable computers
Laser Printing and Desktop Publishing
Multimedia desktop computers
Home video computers
Video conferencing

Abacus (3000 BC)
An ancient calculating device
Still being used in China, Russia and the Far East

Pascaline (1642)
A desktop mechanical adding machine
Developed by Blaise Pascal

Analytical Engine (1830s)
This was invented by Charles Babbage who is known as “the father of computers”.
Designed to store one thousand 50 digit numbers for calculations and decisions.

Ada (1800s)
Probably the world’s first computer programmer.
Collaborated with Charles Babbage.

Punched Cards (1800s)
A card punched with holes in certain places so that a computer can read data coded from the combination of holes.
First used by Joseph Jacquard to automate his weaving factory.

Tabulating machine (1890s)
This was invented by Herman Hollerith to tabulate 1890 US census data.
It was electrically powered and used punched cards.

Mark I (1944)
This was invented in 1944 by Dr. Howard Aiken.
It was a programmable, general-purpose computer.
Built by IBM.

ENIAC (1946)
Electronic Numerical Integrator And Computer (ENIAC).
This was invented in 1946 by John Presper Eckert and John William Mauchly.
It was the first large-scale electronic digital computer.
It was a valve-based (vacuum tube) computer and is now regarded as a first-generation computer.

UNIVAC (1951)
Universal Automatic Computer (UNIVAC )
This was invented by John Presper Eckert and John William Mauchly and was used to tabulate the 1950 US census.
Processed both numerical and alphabetical calculations with ease

EDVAC (1951)
Electronic Discrete Variable Automatic Computer (EDVAC )
This was invented by Dr John von Neumann.
It emphasized the idea of a stored program, in contrast to a program supplied by an input device as required.

First Generation (1951-58)
Vacuum tubes for internal operations
Low-Level languages for programming (machine language)
Magnetic drums for primary memory.
Primary memory limited.
Heat and maintenance problems.
Punch cards for input and outputs.
Slow input and output
e.g. UNIVAC I, EDVAC

Second Generation (1958-64)
Transistors for internal operations.
Increased use of high level languages.
Magnetic cores for primary memory.
Increased memory capacity.
Binary coded data.
Increasing processing speed.
Magnetic tapes and disks for secondary storage
e.g. IBM 1620, UNIVAC 1108.

Third Generation (1965-70)
Integrated circuits (ICs) on silicon chips for internal operations.
Increased memory capacity.
Common use of minicomputers.
Emergence of software industry.
Reduction in size and cost.
Increase in speed and reliability.
e.g. Honeywell 6000 series

Fourth Generation (1971-today)
Large Scale Integration (LSI) and Very Large Scale Integration (VLSI) for internal operations.
Development of the microprocessor.
Introduction of microcomputers and supercomputers.
Increase in speed, power and storage capacity.
Parallel processing.
Artificial intelligence and expert systems.
Robotics
Increased use of Micro/Personal Computers.
e.g. Apple II, IBM PC, Micro Computers

Fifth Generation (1981-1990s)
A project to develop intelligent computers.
They are computers with artificial intelligence.
Symbolic manipulation and symbolic reasoning are required.

Classification of Computers
Mainframe Computers.
Minicomputers.
Microcomputers.
Special Purpose Computers.

Comparison of Computers
Computers have become
Faster (more powerful and more memory),
Cheaper (cost less) and
Smaller in size
with time.

Types of Modern Computers
Microcomputers
Workstations and Personal Computers
Minicomputers
Mainframe Computers
Parallel Processing Computers
Supercomputers

Microcomputers
Based on a microprocessor – single silicon chip CPU.
Enables the integration of sound, video, graphics, as well as text into software
‘Multi-media’ systems now available.
Appeared mid-to-late 70s.
E.g. Apple II, TRS-80, Sinclair Spectrum, Commodore PET
Peripherals connected via ISA, PCI, EISA, PCMCIA/PC Card, e.g. CD-ROM, sound card, speakers, microphone
Portable models now available
Small computers that can fit on a desktop or in a briefcase.
Two types
Personal Computers (PC)
Workstations

Personal Computer (PC)
Desktop or portable (laptop, notebook, palmtop).
Used in most organisations and at homes.
Commonly used for easy-to-use programs such as word processing, spreadsheets.

Workstations
More powerful and more expensive than a PC.
Often connected to large computer system.
Designed to work with large or complex applications.
Used by engineers and scientists.
e.g. drafting, engineering design, 3D-graphical models

Minicomputers
Scaled-down mainframe (refrigerator-size).
Designed to meet the computing needs of a department or small company. Typically 4-100 concurrent users.
Usually run without a special environment.
Can support a number of concurrent applications and often uses a time-sharing operating system that aims to keep the users busy.
Lower processing speed and data-storage capacity than mainframes; used by medium-sized companies for specific purposes.
Low-end mainframes and high-end microcomputers can overlap.

Mainframe Computers
A multi-user computer designed to meet the computing needs of a large organization.
Originally the term referred to the metal cabinet housing the CPU.
Generally refers to computers of the 1950s and 1960s.
A large number of dumb terminals were used for input/output, and a large number of peripherals were attached.
Can process a number of applications concurrently.
Used by large organisations to handle millions of transactions.
Usually housed in specially wired air-conditioned rooms.
Less powerful than supercomputers.

Supercomputers
Sophisticated, expensive computers, using state-of-the-art technology.
Provide processing speeds many times those of powerful workstations.
Particularly used in the simulation and modeling of complex systems.
e.g. weather, chemical processes, the US economy, motion of galaxies

Parallel Processing Computers
A computer with mainframe-level power that uses more than one processor (e.g. 8 – 256 processors)
Used to serve several transactions simultaneously (e.g. ATM)

Evolution of Microprocessors
Intel
8086, 80286, 80386, 80486, Pentium
Motorola
68000

Evolution of Intel Microprocessors
CPU        Year   Speed (MHz)        Size (Bytes)
8086       1978   4.77               2
8088       1979   4.77               1
80286      1982   10-20              2
80386DX    1985   16-33              4
80486DX    1989   25-33              4
Pentium    1993   66-200+            4
PII        1997   200+               4
PIII       1999   Up to GHz range    4

Computer Software
Consists of step-by-step instructions that tell the computer how to perform tasks.
Types of software
System Software
Application Software

System Software
Enables the application software to interact with the computer and helps the computer manage its internal resources.

Application software
Software that can perform useful work on general-purpose tasks, such as word processing, spreadsheets, and other automated applications.

Role of Systems Software
The user interacts with application software. System software enables the application software to interact with the computer and helps the computer to manage its internal resources.
Systems software interfaces between the user or application programs and the computer hardware
Systems software is of three basic types
Operating systems
Language translators
Utility programs

Operating systems
The main piece of system software in any computing system which controls the overall system.
Language translators
Translate a program written by a programmer in a programming language (e.g. Java) into machine language, so that the computer hardware can understand it.
Utility programs
Generally used to support, enhance or expand existing programs in a computer system

Operating Systems (OS)
Consist of the master system of programs that manage the basic operations of the computer.
Booting (starting) the computer operation
Load, execute, store and retrieve programs
Retrieve, process and store data
Monitor the resources in the computer
Different types and makes of computers have their own operating systems.
E.g.
Mainframes: IBM uses MVS
Minicomputers: IBM uses OS/400; others use UNIX, VMS
Microcomputers: IBM PCs use DOS, Windows, OS/2, Linux; Apple uses Macintosh OS

Personal Computer OS
Two types are used
Single user (Stand-alone)
dedicated to a single computer system working all by itself.
Multi-user (Network operating systems)
designed for two or more users to share various resources

Single User OS
In the beginning all Personal Computer Operating Systems were designed for individual use.
They supported the processing needs of a single software application executed by an individual user at a given time on one computer system.
Most popular stand-alone PC OS
MS-DOS
PC DOS
OS/2
Windows OS
Macintosh OS

MS-DOS
Usually referred to as DOS, it was the most common personal computer operating system until Windows OS appeared.
Developed by Microsoft
Now, integrated into Windows OS.
PC-DOS
Similar to MS-DOS, but developed specifically for the IBM PC.

Operating System/2 (OS/2)
Jointly developed by Microsoft and IBM for PS/2 line of personal computers.
It was the second generation of DOS.
Provided Multitasking and Graphical User Interface (GUI).
IBM has taken over all OS/2 development.

Windows OS
Began as a shell utility program for DOS.
Due to growing popularity of Macintosh OS, Microsoft began to develop Windows OS.
Now DOS is run through Windows.
Provides multi-tasking, graphics and multimedia capabilities.

Macintosh OS
Used icons and graphics instead of commands as in DOS and OS/2.
Was in use long before Windows OS.
Provides multi-tasking, graphics and multimedia capabilities.

Multi-User OS
A Network Operating System (NOS) operates separately but in addition to a stand-alone OS.
Coordinates activities among various computers and peripherals (disk drives, printers) connected in a network.
Main purpose is to enable people to share applications, files and printers.
One computer may be designated as the file server and the NOS may reside on it.
Files can be copied, shared and exchanged.

Servers
A server will provide various services to its users through a network.
File Server
Services users with data, application software, mass storage and other utilities.
Web Server
A system that hosts web sites and receives, manages and responds to requests for documents and files.
Mail Server
Used to maintain all incoming and outgoing mail of the users associated with the network
Print Server
Used to share printers among users of the Local Area Network

Multi-User Operating Systems
Operating systems that run on
Minicomputers,
Mainframe computers or
Network of personal computers
can be considered as Multi user operating systems.
E.g. Mainframe Computer OS
ACOS on NEC mainframes
MVS on IBM mainframes
E.g. Mini Computer Operating systems
OS/400 on IBM AS/400
UNIX on UNISYS 5000
E.g. Personal Computer Network OS (NOS)
NetWare
Apple Share
OS/2 Warp
Windows for Workgroups
Windows NT
Linux

NetWare
Created by Novell.
There are two versions, one for PC compatibles and the other for Macintosh.
Apple Share
NOS for Macintosh PC.
One designated as the file server.
All the devices are connected in the Apple Talk network.

Windows NT
Microsoft’s full network version of Windows.

Linux
A UNIX-like operating system, originally developed for PCs

Overview of DOS (Disk Operating System)

CONFIG.SYS file
A text file containing the environment parameter settings of the computer. These parameters are loaded into memory when the computer boots up.
E.g.
files=20
buffers=30
device=Himem.sys
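
The name=value layout of these settings can be illustrated with a short Python sketch. This is only an illustration of the file format: the function name parse_config_sys is hypothetical, and real DOS recognises many more directives than the three shown in the example above.

```python
def parse_config_sys(text):
    """Parse 'name=value' lines into a list of (name, value) pairs.

    A list is used rather than a dictionary because a directive such
    as DEVICE can appear more than once in a CONFIG.SYS file.
    """
    settings = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, value = line.split("=", 1)
        settings.append((name.strip().upper(), value.strip()))
    return settings


sample = "files=20\nbuffers=30\ndevice=Himem.sys"
for name, value in parse_config_sys(sample):
    print(name, "->", value)
```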

COMMAND.COM file
An executable file that is automatically loaded into the computer's memory when the computer boots up. A limited set of frequently used DOS commands such as DIR, DEL and COPY (resident commands) is stored in this COMMAND.COM file.

Memory Map
A portion of main computer memory (RAM) is used for the OS. Most operating systems use three different sections:
Supervisor
Resident commands
Transient commands.
Remaining RAM is available for the application programs.

Supervisor
Instructions which communicate with the user, cause input/output operations to occur and generally control the operations of the computer.
Resident commands
Frequently used utility programs (e.g. DIR) reside permanently in the RAM.
Transient commands
Reserved for less frequently used functions which are loaded from the disk as required.

File Management
Formatting disks, diskettes
Deleting files from a disk
Copying files
Moving Files
Renaming files
Executing files
Displaying file structure
Setting system date and time
Drive names

Frequently used DOS commands
DIR
Lists the files and subdirectories in a directory
COPY
Copies files to different locations (creates backup files)
CD
Changes the current directory
MKDIR
Makes a new directory
REN
Renames a file
DEL
Deletes a file
PATH
Specifies which directories DOS should search to find files to execute.
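
The idea behind PATH can be sketched in Python as a search through a list of directories. This is a simplified illustration, not how DOS is implemented: the function name find_command is hypothetical, the extension order (.COM, .EXE, .BAT) is an assumption about typical DOS behaviour, and the check of the current directory that DOS performs first is left out.

```python
import os


def find_command(name, path_dirs, extensions=(".COM", ".EXE", ".BAT")):
    """Return the first matching executable file in path_dirs, or None.

    Directories are tried in the order given, which is why the order
    of the entries in PATH matters.
    """
    for directory in path_dirs:
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


# Hypothetical directories; on a machine without them this prints None.
print(find_command("FORMAT", [r"C:\DOS", r"C:\WINDOWS"]))
```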

Other Utilities in DOS
A large number of DOS commands which are not frequently used (transient commands) are stored in the C:\DOS directory.
E.g. FORMAT, XCOPY, FDISK

Command Line Interface
Has the following distinguishing features:
Disk drive designation (e.g. C:)
A letter telling you which drive is in use.
Directory designation (e.g. WINDOWS)
Indicates which directory you are presently using. The root (\) has no directory name.
Prompt (e.g. >)
A character or a message that tells you that the computer system is ready to accept a command or input.
Cursor (e.g. _)
Usually a blinking rectangle or a blinking underline that tells you where the next keyboard character typed will appear on the screen.

Batch File
A text file with the extension .BAT, containing a sequence of DOS commands or executable program names.
E.g.
AUTOEXEC.BAT

TSR Routines
Terminate and Stay Resident (TSR) utilities do not release the memory they occupy after they finish executing. This allows a TSR program to be called and used while another program is running.
Normally, a program releases the RAM it used so that the next program can be loaded into it.

Graphical User Interface (GUI)
The most sophisticated, yet user-friendly, human-computer interface
GUI is pronounced “gooey”
Enables the use of
colour and graphics
icons
pull-down menus
pointing device

Windows Features
Mouse triggered icons for easy operation
Each task is backed by a rectangle shaped flexible window
Can open as many windows as the user desires, depending on the availability of memory
Menu bar and drop down menus for each window
Mouse operated Close, Minimize and Maximize buttons on each window
Folder and Document concept and small icon to indicate the document type
Availability of many colours on the desktop and windows
User friendly dialog boxes and error messages
Better graphical support using high resolution display settings
Entire control panel for changing settings and addition of new hardware or software
Rapid file or folder searching facilities
User management and sharing of resources
Online help facility

Advantages of GUI Operating Environment
Multitasking
Consistent application design and function
File compatibility
Cut and paste
Accessories

Evolution of Microsoft Windows
WINDOWS 3.1 (1992)
WINDOWS 95
WINDOWS 98
WINDOWS NT (1993)
WINDOWS 2000
WINDOWS XP
WINDOWS VISTA

Overview of WINDOWS 3.1
First enhanced multitasking operating system running on IBM compatible personal computers.
First graphical user interface and window based operating system.

Overview of WINDOWS 95
New Improved Interface
Taskbar & Start Button
Windows Explorer
Powerful browser
Long Filenames
Long file names are accepted
Improved Game and Multimedia support
Improved performance for playing video and sound files
Plug & Play Hardware Compatibility
Windows can recognize and configure your new hardware automatically
32 bit Preemptive Multitasking
Allows more than one program to run at a time
Microsoft Exchange
Facilitates electronic communications including E-mail and Faxes
Microsoft Network
Supports an online service for communicating with people worldwide using e-mail, bulletin boards and the Internet.

Overview of WINDOWS 98
Active Desktop
Allows any web page to be displayed on the desktop without opening web browser software.
Single Click Facility
Allows programs to be opened by simply clicking once on the icon.
Multiple Monitors
Allows up to nine monitors to be connected to a single computer to increase the workspace. Each monitor can be used to display different programs.
USB Support
Installing new hardware that uses the Universal Serial Bus (USB) standard allows the hardware to be used without restarting the computer.
Supports DVD, DIGITAL AUDIO, VRML
High quality digital movies and audio along with the web pages that use virtual reality features.
Microsoft WEB TV
Television broadcasts and TV program listings can be checked.
Faster and Reliable
FAT 32 file system allows you to store files more efficiently and save hard disk space.
Supports an online web site with answers to common questions and keeps your copy of Windows up to date.

Overview of WINDOWS NT
In 1988 Bill Gates commissioned the creation of a new operating system. The design goals for this operating system were portability, security, compliance and compatibility, scalability, extensibility, and ease of internationalization.
Portability
The system can run on different hardware platforms with minimal changes.
Security
Meets the C2-level security criteria of the United States National Security Agency.
Scalability
Provides symmetric multiprocessing (SMP)
Compliance and Compatibility
Provides POSIX (Portable operating system interface)-compliance. Runs existing windows applications, and supports open international standards.
Extensibility
Can be easily expanded on by writing to a well defined application programming interface (API).
Ease of Internationalisation
Can be easily ported to run in numerous different languages and writing systems, with minimal modifications to the software.

Overview of WINDOWS 2000
Built on NT Technology, the Windows2000 Platform delivers the business operating system for the next generation of PC computing.

With built-in Web and application services, Internet-standard security, and record-breaking performance at a low cost, it’s a better operating system for doing business on the Internet.
And it’s the best operating system for taking advantage of all the latest hardware, from the smallest mobile devices to laptops to the largest, most powerful servers for e-commerce.
It is available in four editions.
Windows 2000 Professional
Windows 2000 Server
Windows 2000 Advanced Server
Windows 2000 Datacenter Server
Windows 2000 Server
Windows 2000 Server is the entry-level version and is a perfect solution for file, print, intranet, and infrastructure servers.
Windows 2000 Advanced Server
Windows 2000 Advanced Server delivers enhanced reliability, availability, and scalability for running e-commerce and line-of-business applications.
Windows 2000 Datacenter Server
Windows 2000 Datacenter Server is the most powerful server operating system ever offered by Microsoft.
Datacenter Server is designed for enterprises that demand the highest levels of availability and scale.

Word Processing
A Word Processing software is designed to
write (create),
revise (edit),
format,
store and
print documents.

Creating a New Document
Start the word processing package.
Select New from the File menu
or click on the New icon on the standard toolbar.
This will create a new document.
The workplace is the screen which appears.
You can type any text in the workplace.

Items of a typical WP Screen
Title bar
Displays the program name and current file; Document1 means that the current file is a new document, that is not yet saved.
Menu bar
Lists the available pull-down menus, which contain commands.
Ruler
Displays the margin and tab settings.
Standard toolbar
Displays icons that execute commands used most frequently.
Formatting toolbar
Displays information and icons that pertain to text formatting options.
Drawing toolbar
Displays information and icons that pertain to drawing and colours.
Document Window (work space)
Area in which you create the document.
Status bar
Displays information about current location in the text.
Scroll bars
On the right (vertical) and across the bottom (horizontal) of the document window. Allow you to move through the text in the document.

Saving
Process of storing the document in a file and giving it a unique file name
Protect the work while writing and preserve the finished document
Important to save the work done frequently (e.g. after completing a paragraph)
The Autosave feature saves open files automatically, without a command from the user.

Saving a New document
Select Save from the File menu
or click the Save (disk) icon on the standard toolbar.
For a new document, or when you select Save As from the File menu, the Save As dialog box pops up.
Select the directory or drive where the file should be saved.
Type the name in which the file should be saved.
Then click on Save button.

Opening a Document
Select Open from File menu bar
or click the Open (Document) icon on standard toolbar.
Select the directory or drive from where the file should be opened.
Select the name of the file to be opened.
Then click on Open button.

Writing
Process of conveying information by typing words.
Word Wrap
Automatically pushes the text to the next line, allowing you to type continuously without pressing the carriage return (Enter) key at the end of each line of text.
Soft Copy
Don’t have to use paper to write, as the document is stored in RAM. The output produced by the video monitor (what is seen on the screen) is the soft copy.
Scrolling
Enables you to view successive portions of a document on the screen. You can scroll through the soft copy in memory using the scroll bars.
Reorganise
Don’t have to retype the entire document when you reorganise or make changes or corrections.
Hard Copy
All changes can be made in soft copy before printing on paper (hard copy).
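
Word wrap, described above, can be sketched as a simple greedy algorithm in Python. This is a minimal illustration assuming a fixed line width measured in characters; the function name word_wrap is hypothetical, and a real word processor measures the width of proportional fonts instead.

```python
def word_wrap(text, width):
    """Greedy word wrap: start a new line when the next word will not fit."""
    lines = []
    current = ""
    for word in text.split():
        if not current:
            current = word                      # first word on the line
        elif len(current) + 1 + len(word) <= width:
            current += " " + word               # word fits on this line
        else:
            lines.append(current)               # push the word to the next line
            current = word
    if current:
        lines.append(current)
    return lines


for line in word_wrap("the quick brown fox jumps", 10):
    print(line)
```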

Revising
The process of re-reading, changing, deleting and replacing text that has been written.
Changes are made on the soft copy.
A new draft or version is created when revising a document. This can be saved using a new file name or the existing name.
Adding new text.
Editing modes
Delete / undelete portions of the document.
Delete unwanted text
Undelete text that was accidentally deleted.
Moving or rearranging portions of the document.
Selecting text
Drag and drop
Copy and paste
Cut and Paste

Drag and Drop (Move Text)
Select the required text and move the mouse cursor to the new position and release the mouse.
Similar to Cut and Paste, but paste only once.

Copy and Paste
Allows you to repeat terms, symbols or text phrases.
Select the required text and
Copy the text by clicking the Copy icon on the standard toolbar (or click the Copy on Edit menu bar)
Move the mouse cursor to the new position and
Paste the text by clicking the Paste icon on the standard toolbar (or click the Paste on Edit menu bar)
Last two steps can be repeated.

Cut and Paste
Reorganise text for clearer meaning.
Select the required text and
Delete the text by clicking the Cut icon on the standard toolbar (or click the Cut on Edit menu bar)
Move the mouse cursor to the new position and
Paste the text by clicking the Paste icon on the standard toolbar (or click the Paste on Edit menu bar)
Last two steps can be repeated. Otherwise same as drag and drop.

Find / Replacing specific characters, words or phrases.
Find or Search
Find and Replace
Move to specific pages, footnotes and other specific locations (e.g. sections, table numbers)
Go To

Find or Search
Locate specific words or phrases. Has the ability to search through a document and find a particular section of text or ‘string’.
Find
Select Find from the Edit menu bar
Enter the text you want to search for in the dialog box
Click Find Next

Find and Replace
Make global changes to word or phrase (e.g. replace misspelled word).
Search through a document and find a particular section of text or ‘string’, and replace it with another.
Find and Replace
Select Replace from the Edit menu bar
Enter the text you want to search for in the find what box and replacement text in the replace with box of the dialog box
Click Find Next, Replace, or Replace All.
To cancel a search in progress, press ESC.
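
The Find and Replace operations above can be mimicked on a plain string in Python. This is only a sketch of the idea: the function names find_all and replace_all are hypothetical, and options such as matching case or whole words are ignored.

```python
def find_all(document, target):
    """Return the start position of every occurrence of target,
    much like pressing Find Next repeatedly."""
    if not target:
        return []
    positions = []
    start = document.find(target)
    while start != -1:
        positions.append(start)
        start = document.find(target, start + len(target))
    return positions


def replace_all(document, target, replacement):
    """Replace every occurrence of target, like a global replace."""
    return document.replace(target, replacement)


text = "teh cat sat on teh mat"
print(find_all(text, "teh"))
print(replace_all(text, "teh", "the"))
```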

Editing Modes
The Insert key is a toggle key that allows you to switch between the two modes: Insert and Overtype.
In Insert mode (normally the default), each character typed is inserted at the insertion point and any following characters are pushed to the right and down (word wrap).
In Overtype mode, each character typed replaces any existing characters at the insertion point.
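
The difference between the two modes can be shown with a small Python sketch that treats the document as a string. The function name type_char and its parameters are hypothetical, chosen only for this illustration.

```python
def type_char(text, position, char, overtype=False):
    """Return the text after typing char at the insertion point.

    Insert mode pushes the following characters to the right;
    Overtype mode replaces the character at the insertion point.
    """
    if overtype and position < len(text):
        return text[:position] + char + text[position + 1:]
    return text[:position] + char + text[position:]


print(type_char("cat", 1, "o"))                  # Insert mode
print(type_char("cat", 1, "o", overtype=True))   # Overtype mode
```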

Formatting
It is not enough just to type a document.
Must make that document look attractive.
Types of formatting
character / word
line / paragraph
page / document
Process of emphasising and arranging text.
Format words by underlining, boldfacing, italicising, type styles (font, size), colour.
Format lines / paragraphs by justification of text (left, right, center, full), bullets / numbering, tab spaces.
Format pages by setting margins, line spacing, layout, portrait / landscape, tables, graphics, borders, columns  and page breaks.
Format documents by setting page numbers, creating headers / footers, include date / time, insert worksheet.
When formatting text, there are three major things you usually change:
Typeface
Size of the font
Its attributes
These can be changed from the Formatting toolbar.
There are other, less important changes to the fonts which can be changed only through a dialog box.

Selecting Text
You can select text using either the mouse or the keyboard.
Additionally, you need to select text before you can Copy or Cut it to the Clipboard, and before you can use drag-and-drop editing.
You must select all the text you wish to format prior to applying character formatting.

Selecting Text Using The Mouse
Word has mouse shortcuts for selecting text.
To select a single word, double-click on the word.
To select a single sentence, hold down the Control (Ctrl) key and click anywhere in the sentence.
If you hold down the mouse key when you click and move the mouse cursor, you will select additional text one sentence at a time.
To select an entire paragraph, move the mouse pointer to the left margin, beside the paragraph to be selected, until it turns into an arrow. Then, double-click to select that paragraph.
If you hold down the mouse key on the second click, Word will go into paragraph-selection mode.
If you then move the cursor up or down, you will select additional text one paragraph at a time.
To select all the text between the current cursor position and a different position, move the mouse cursor to the new position, hold down the Shift key and click with the mouse.

Common Text Attributes
Boldface: Darker and heavier characters
Italic: Slanted characters
Underline: Draws a line under words
Typeface (font): Design of characters (e.g. Arial)
Type size: Varies the size of characters
Bullet: Inserts a symbol (•) in front of each item of a list
Subscript / Superscript: Characters appear below / above the line

Bulleted Lists
To distinguish a list of points from the rest of the text it is usual to highlight them using bullets or point numbers.
A bullet is a symbol at the start of each point as in the following illustration.
To add bullets or numbers to a list of sentences, select the items and click Bullets / Numbering.

Common Text Positioning
Justification: Aligns text against the left, right, center or both margins
Line spacing: Space between lines of text (single, double or triple spacing)
Margins: Blank space on either side of the text (e.g. 1”)
Page break: Determines the last line of text at the bottom of a page
Page numbering: Automatically inserts consecutive page numbers
Header / Footer: Information that appears at the top / bottom of the page. Usually repeated throughout a document (e.g. document title or chapter name / page number or file reference)
You can apply the formatting by using the
Formatting Toolbar.
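
The four kinds of justification can be illustrated in Python on a single line of text with a fixed width in characters. This is a rough sketch only: the function names justify_full and align are hypothetical, and a word processor works with proportional fonts and whole paragraphs rather than fixed-width lines.

```python
def justify_full(line, width):
    """Spread the extra spaces between words so the line touches both margins."""
    words = line.split()
    if len(words) < 2:
        return line.ljust(width)
    gaps = len(words) - 1
    total_spaces = width - sum(len(word) for word in words)
    if total_spaces < gaps:
        return " ".join(words)          # line is too long to justify
    base, extra = divmod(total_spaces, gaps)
    result = ""
    for index, word in enumerate(words[:-1]):
        # The leftmost gaps receive one extra space when the spaces do not divide evenly.
        result += word + " " * (base + (1 if index < extra else 0))
    return result + words[-1]


def align(line, width, mode):
    """Align a line of text: mode is 'left', 'right', 'center' or 'full'."""
    if mode == "left":
        return line.ljust(width)
    if mode == "right":
        return line.rjust(width)
    if mode == "center":
        return line.center(width)
    if mode == "full":
        return justify_full(line, width)
    raise ValueError("unknown mode: " + mode)


for mode in ("left", "right", "center", "full"):
    print("|" + align("a bb c", 8, mode) + "|")
```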

Indenting
Indenting refers to the space that is left between the margin and the beginning of the sentence.
Text with a hanging indent
Hanging indents are frequently used for bibliographic entries, glossary terms, resumes, and bulleted and numbered lists.
Text with a first-line indent
To alter the indenting for a paragraph
Select Paragraph from Format menu bar
Type in the value for the indentation in the paragraph dialog box.

Line Spacing
Line spacing refers to the space between the bottom of one line and the bottom of the next line.
Normally text is typed in ‘single spacing’ i.e. line spacing is one. Word processor automatically adjusts the line height to accommodate the size of the font you are using.

Page Formatting
Page formatting controls how the document is printed on the page.
The main focus of page formatting is the margins.
Select Page Setup from File menu bar.
Page Setup dialog box will appear.

Page Break
Page breaks are inserted to keep two pages separate from each other
To add a page break
Position the insertion point at the place where the page break is to occur and
Either Select Break from the Insert menu bar
or  Press <Ctrl> and Enter

Spelling Checker
Check spelling and other simple vocabulary or syntax errors (e.g. repeated words)
Checks both spelling and grammar.
If you want to check spelling only,
click Options on the Tools menu bar,
click the Spelling and Grammar tab,
clear the Check grammar with spelling check box,
and then click OK.
Click Spelling and Grammar on the Standard toolbar.
When possible spelling or grammatical errors are found, suggestions will be displayed.
Make necessary changes in the Spelling and Grammar dialog box.

Thesaurus
A word-finding program that suggests synonyms, antonyms and related and contrasting words.

Document Enhancement
Borders
Draw a line or border using a different style and thickness, such as a heavier weight.
Can add a border or line to
any or all sides of a table, a paragraph,
selected text in a document, any or all sides of each page in a document,
a drawing object including a text box, an AutoShape (ready made shapes), or a picture.

Shading
Fill in the background of a table, a paragraph, or selected text.
Drawing objects can be filled with solid or gradient (shaded) colours, a pattern, a texture, or a picture.

Select the table, paragraph or text for which you want to apply the border or shading.
Click Borders and Shading on the Format menu bar.
Apply borders using the Borders and Shading dialog box.

Headers and Footers
Header
A text or graphics that appears at the top of every page.
Footer
Appears at the bottom of every page.

Useful in long documents to indicate
e.g. the chapter or section title
a reference number or company logo
Click Header and Footer on the View menu bar.
Insert options (page no, date), add text or graphics to the header or footer using the Header and Footer dialog box.

Printing
Process of producing a hard copy of the document on paper
Provides a permanent record of the document
Allows you to read and review a draft of the document on paper
Print preview enables you to see what the document will look like when printed

Printing a Document
Select Print from File menu bar
or click the Print icon on standard toolbar.
Print dialog box pops up.
Select the printer (if not already set).
Select the page range.
Then click on OK button.

Some Printing Options
print more than one copy
print only specific pages
print only a selection of text
Print Preview
Allows you to view the document and the general layout of the page before printing

Closing a Document
Select Close from the File menu
The file which is currently open will be closed
Save the file before closing the document; otherwise the changes made will be abandoned.

Usual Sequence of Use
1. Start Word Processor
2. Create a New file
or  Open an existing document file.
3. Write (Type) text, add a picture or a chart
or  Revise an existing document.
4. Save the document onto the disk.
5. Print the document.
6. Exit Word Processor.

Hyphenate
You can use the hyphenation feature to give your documents a polished and professional look.
Helps eliminate gaps or “rivers of white” in justified text.
Also helps to maintain even line lengths in narrow columns.
To hyphenate text automatically
Select Language from the Tools menu, and then click Hyphenation.

Tables
Tables are an easy way to arrange and adjust columns of text and numbers, and are much more flexible than tabs.
A table can be inserted at any point in your text.
A table is made up of rows and columns of cells that you can fill with text and graphics
Can split or merge cells of a table
Can align numbers in columns and then sort and perform calculations on them
Can arrange text and graphics, such as side-by-side paragraphs in a resume
Can use shading to fill in the background
Can convert text to a Table
Can align text or orient the text vertically
Inserting a Table
Select Insert Table from Table menu bar and specify number of rows and columns.
or Click the icon to create a table and drag to select the number of rows and columns you want.
Tables…Sort
Select what you want to sort.
On the Table menu, click Sort (for a table)
Select the options for your sort.
Tables…Formula
An expression that can contain any combination of numbers, bookmarks that refer to numbers, fields resulting in numbers, and the available operators and functions. The expression can refer to values in a table and values returned by functions.

Operators used by = (Formula)
In an = (Formula) field, you can use any combination of values and the following mathematical and relational operators.

Operators:
+ (Addition), - (Subtraction), * (Multiplication), / (Division), % (Percentage), ^ (Powers and roots), = (Equal to), < (Less than), <= (Less than or equal to), > (Greater than), >= (Greater than or equal to), <> (Not equal to)

Functions used by = (Formula)
ABS(x): The positive value of a number or formula, regardless of its actual positive or negative value.
AND(x,y): The value 1 if the logical expressions x and y are both true, or the value 0 (zero) if either expression is false.
AVERAGE( ): The average of a list of values.
COUNT( ): The number of items in a list.
DEFINED(x): The value 1 (true) if the expression x is valid, or the value 0 (false) if the expression cannot be computed.
FALSE: 0 (zero).
IF(x,y,z): The result y if the conditional expression x is true, or the result z if the conditional expression is false. Note that y and z (usually 1 and 0 (zero)) can be either any numeric value or the words “True” and “False.”
INT(x): The numbers to the left of the decimal place in the value or formula x.
MIN( ): The smallest value in a list.
MAX( ): The largest value in a list.
MOD(x,y): The remainder that results from dividing the value x by the value y a whole number of times.
NOT(x): The value 0 (zero) (false) if the logical expression x is true, or the value 1 (true) if the expression is false.
OR(x,y): The value 1 (true) if either or both logical expressions x and y are true, or the value 0 (zero) (false) if both expressions are false.
PRODUCT( ): The result of multiplying a list of values. E.g., the function { = PRODUCT (1,3,7,9) } returns the value 189.
ROUND(x,y): The value of x rounded to the specified number of decimal places y; x can be either a number or the result of a formula.
SIGN(x): The value 1 if x is a positive value, or the value -1 if x is a negative value.
SUM( ): The sum of a list of values or formulas.
TRUE: 1.
The following functions can accept references to table cells as arguments:

AVERAGE(), COUNT(), MAX(), MIN(), PRODUCT(), and SUM().

E.g. COUNT(above)
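
The behaviour of a few of these functions can be sketched in Python. This is only an illustration of the definitions listed above, not how the word processor implements them; the Python function names simply mirror the field function names, and returning 0 from SIGN for a zero value is an assumption, since the notes do not cover that case.

```python
import math

def PRODUCT(*values):
    """Multiply a list of values together."""
    return math.prod(values)

def INT(x):
    """Keep only the digits to the left of the decimal place."""
    return math.trunc(x)

def MOD(x, y):
    """Remainder after dividing x by y a whole number of times."""
    return x - y * math.trunc(x / y)

def SIGN(x):
    """1 for a positive value, -1 for a negative value (0 for zero is an assumption)."""
    return (x > 0) - (x < 0)

def IF(condition, y, z):
    """Result y if the condition is true, otherwise result z."""
    return y if condition else z

def AND(x, y):
    return 1 if (x and y) else 0

def OR(x, y):
    return 1 if (x or y) else 0

def NOT(x):
    return 0 if x else 1

print(PRODUCT(1, 3, 7, 9))                 # 189, as in the PRODUCT example above
print(INT(7.9), MOD(17, 5), SIGN(-4))      # 7 2 -1
print(AND(3 > 2, 2 > 5), OR(3 > 2, 2 > 5), NOT(0))  # 0 1 1
print(IF(10 > 5, 1, 0))                    # 1
```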
Formula: Reference Cells
Table cells are referenced as A1, A2, B1, B2, and so on, with the letter representing a column and the number representing a row.

To reference cells in formulas, use a comma to separate references to individual cells and a colon to separate the first and last cells in a designated range
E.g. to average these cells, type
=average(b:b) or =average(b1:b3)
=average(a1:b2)
=average(a1:c2) or =average(1:1,2:2)
=average(a1,a3,c2)
There are two ways you can indicate an entire row or column.
If you use 1:1 to indicate a row and then add a column to the table, the calculation will include all the columns in that row;

if you use a1:c1 to indicate a row and then add a column to the table, the calculation will include only columns a, b, and c.
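
The reference rules above can be tried out with a small Python sketch. It treats a table as a list of rows and resolves references such as b:b, a1:b2 or a1,a3,c2 before averaging. The helper names (referenced_values, average) and the sample table are hypothetical, and the sketch assumes single-letter column names only.

```python
import re

def _corner(token, n_rows, n_cols, first):
    """Turn 'a1', 'b' or '2' into a zero-based (row, column) corner."""
    match = re.fullmatch(r"([a-z]?)(\d*)", token)
    if not token or match is None:
        raise ValueError(f"cannot parse cell reference: {token!r}")
    letter, digits = match.groups()
    # A missing row number means the whole column; a missing letter means the whole row.
    row = int(digits) - 1 if digits else (0 if first else n_rows - 1)
    col = ord(letter) - ord("a") if letter else (0 if first else n_cols - 1)
    return row, col

def referenced_values(reference, table):
    """Collect the values named by a comma-separated list of cells and ranges."""
    n_rows, n_cols = len(table), len(table[0])
    values = []
    for part in reference.lower().replace(" ", "").split(","):
        start, _, end = part.partition(":")
        r0, c0 = _corner(start, n_rows, n_cols, first=True)
        r1, c1 = _corner(end or start, n_rows, n_cols, first=False)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                values.append(table[r][c])
    return values

def average(reference, table):
    values = referenced_values(reference, table)
    return sum(values) / len(values)

# Hypothetical 3 x 3 table: columns a..c, rows 1..3.
table = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(average("b:b", table), average("b1:b3", table))        # 5.0 5.0
print(average("a1:b2", table))                               # 3.0
print(average("a1:c2", table), average("1:1,2:2", table))    # 3.5 3.5

# After adding a fourth column, 1:1 picks it up but a1:c1 does not.
wider = [[1, 2, 3, 10]]
print(average("1:1", wider), average("a1:c1", wider))        # 4.0 2.0
```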
Page Layout
General page layout can be changed by Selecting Page Setup from the File menu bar
Set the margins of the document
Page size
Page orientation

WYSIWYG
What You See Is What You Get (WYSIWYG)
Pronounced “wizzy-wig”
means that what you see on the screen (soft copy) is exactly what will be printed on paper (hard copy)

Mail Merge
Mail merge lets you insert individual names and addresses into a form letter and print a copy addressed to each individual.
Two documents are required for a mail merge
1.The main (master) document
This contains the standard text plus areas that are marked as ‘replaceable’ i.e. personal information can be slotted into them.
2.The data source:
This is a document containing the personal information, which is to be slotted into the standard letter. Each person’s information is in a separate paragraph.
Creating a Mail Merge
Use an existing letter as a form letter and open it
OR Create a new letter by
Selecting New from the File menu
Select a letter template
Select Mail Merge from the Tools menu
Mail Merge Helper dialog box pops up

Create a Master Document
To set up a mail merge, begin by clicking Create
Define the merge type, e.g. click Form Letters
Define the main document by clicking Active Window
The active document becomes the mail-merge main document
Next use or create a list of names and addresses
Create a Data Source
Click Get Data
Click Create Data Source

Set up the data records and save. Enter values into the fields.
Click Edit Data Source on the Mail Merge Helper. The Data Form dialog box will appear

Use a Data Source
To use an existing list of names and addresses in a Word document or in a worksheet, database or other list
Click Open Data Source and designate the data source
Click Edit Main Document to type the text you want to appear in every form letter

Inserting Merge Fields
In the main document, click where you want to insert a name, address, or other information that changes in each letter.
Select Mail Merge from the Tools menu bar
Click Insert Merge Field, and then click the field name that you want

Merging Documents
After inserting all of the merge fields and completing the main document,
click Mail Merge Helper on the Mail Merge toolbar and merge the documents
The merged document will be created as a separate Word document
Print or save it in the usual way
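
The idea behind a mail merge (one main document plus one data source giving one letter per record) can be sketched in a few lines of Python. This is only an illustration of the concept, not what the word processor does internally; the field names, the sample records and the mail_merge helper are all hypothetical.

```python
# Main document: standard text with replaceable merge fields in braces.
main_document = (
    "Dear {Title} {LastName},\n"
    "Your order will be delivered to your address in {City}.\n"
)

# Data source: one record of personal information per person.
data_source = [
    {"Title": "Ms", "LastName": "Perera", "City": "Colombo"},
    {"Title": "Mr", "LastName": "Silva", "City": "Kandy"},
]

def mail_merge(document, records):
    """Return one personalised copy of the document for each record."""
    return [document.format(**record) for record in records]

letters = mail_merge(main_document, data_source)
for letter in letters:
    print(letter)
```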

Wizards
A wizard is a feature that asks questions and then uses your answers to automatically lay out and format a document, such as a newsletter or a resume.

It takes you through the process step by step using dialog boxes to prompt you

Templates
A template is a predefined form to help you quickly create commonly used documents, such as letters, resumes, reports, fax forms and memos.
A template can be used to define not only standard text but also aspects such as the font, borders, page size and orientation.
Once a template has been created it can be recalled and used to produce the required document. This saves time and ensures consistency.
Word comes with many predefined templates and several Wizards to help you create documents such as letters and memos. To create most documents the ‘normal document’  template Normal.dot is used.
If a document is started using File-New then a dialog box containing the names of the templates appears. These are grouped by type, which you select by clicking on the appropriate tab.

Creating a Template
A template can store boilerplate text, custom toolbars, macros, shortcut keys, styles, and AutoText entries.
An easy way to create a template is by opening a document that contains the items you want to reuse (e.g. margin settings, page size and orientation, styles, and other formats) and saving it as a document template (*.dot).

Formatting Sections

Using section breaks, a document can be divided into sections, each with its own formatting elements such as the margins, page orientation, headers and footers, and sequence of page numbers

A section break appears as a double dotted line that contains the words “Section Break.”

Types of Section Breaks
Next page
Inserts a section break, breaks the page, and starts the new section on the next page
Continuous
Inserts a section break and starts the new section on the same page
Odd page or Even page
Inserts a section break and starts the new section on the next odd-numbered or even-numbered page

Working with Graphics
(WordArt)
You can add a special text effect to text by using the WordArt button on the Drawing toolbar.
Can create shadowed, skewed, rotated, and stretched text, as well as text that has been fitted to predefined shapes.
Special text effect is a drawing object and is not treated as text.

Working with Charts
Create a chart by clicking Object on the Insert menu and then specifying Microsoft Graph
Microsoft Graph displays a chart and its associated data in a table called a datasheet
The datasheet provides sample information that shows where to type your own row and column labels and data
Once you have created the chart, you can
enter your own data on the datasheet
import data from a text file or Lotus 1-2-3 file
import an Excel worksheet or chart,
copy data from another program
You can also create a chart from a table in Word

Chart Types
Several chart types can be selected, namely:
Column, Bar, Line, Pie, XY (scatter), Area, Doughnut, Radar, Surface, Bubble, Stock, Cone, Cylinder, Pyramid
Charts can have 3-D effect

Bar Charts
useful for representing growth, in time or value, of unrelated items
e.g. how much a sales territory has grown over consecutive years

Pie Charts
good for representing portions of a whole,
e.g. distribution of lunch sales as percentages
You can customise the colour and design of the chart, e.g. portion of the pie graph can be highlighted, the largest pie slice can be enlarged and separated, or a 3D image can be rotated

Line Charts
effective to show variations of data over a period of time, e.g. the number of influenza cases over the years

Inserting Pictures
Inserting from clipart
Place the insertion point where you want to place the picture in the document
Select Picture from the Insert menu bar, then select Clip Art on the cascading menu. The Insert Clipart dialog box appears
Select a particular clipart category and locate the clipart you want, click the image and select insert clip

Inserting from file
Place the insertion point where you want to place the picture in the document
Select Picture from the Insert menu bar, then select From File on the cascading menu. The Insert Picture dialog box appears
Locate the drive, folder and picture file, click the file to preview it, and click Insert

Similarly you could insert
Charts
WordArt styles
Shapes

Inserting Other Package Objects
You may copy objects from another package and paste them into the Word document OR
Select File from the Insert menu bar
Insert File dialog box appears
Select the directory and folder from which the file should be opened
Select the file then click on OK

Inserting Worksheet
You can insert the worksheet or chart as a linked object or embedded object.

A linked worksheet or chart is displayed in your document, but its information is stored in the original Microsoft Excel workbook

An embedded worksheet or chart stores its information directly in the Word document

Inserting MS Access Data
You can insert the contents of a Microsoft Access table or query into an existing Word document. To keep the data in your document up to date, you can create a link to the Microsoft Access data.

OR export Microsoft Access data (table, query, report, or other database object) to a Word document

Bookmarks
Identifies a specific place in the document.
Used to identify specific text or graphics or to mark a location that you use frequently.

Customised Toolbars
Toolbar buttons can be added, reorganised or removed to suit an individual.

Footnotes / Endnotes
Cite sources of research and are required whenever you prepare a proper research document.
Placed at the bottom of each page or end of the document (as endnotes).

Indexes and Tables of Contents
An index is created by marking the words you want to include.
A table of contents is created by formatting the headings into categories (1, 2, 3, and so on).
Used for long documents like a textbook.

Macros
A macro is a program consisting of recorded keystrokes and an application’s command language that, when run within the application, executes the keystrokes and commands to accomplish a task.
A macro can automate tedious tasks and can also automate a series of procedures.
E.g. taking pages from a document and faxing them
Macros are often written in a simple programming language.
E.g. Visual Basic
Computer viruses can infect a word processing document through this macro feature.
You could set the word processor to warn the user before invoking a macro and hence protect against viruses.

Introduction to Presentation Tools
What is a presentation ?
A Presentation is a Visual aid to a speaker who is explaining some matter to a large audience

Presentations can take the following forms
Paper
usually as reports, handouts
Transparencies
displayed with an Overhead Projector – OHP
Colour Slides
displayed with a Slide Projector
Screen displays
projected from the computer onto a multimedia projector

OHP slide
Transparencies are used
Still images
Black & White mostly
Slides must be changed manually
Cheap in cost
Poor Quality
Slide Projector
35mm Colour Positives are used
Per slide cost is very high
Text & Still images
Slide advancing is done mechanically
Quality is high compared to the OHP
Recording to a slide must be done using an expensive slide recorder
Computer based Presentation tools (Multimedia projector)
These are application software packages specially developed for preparing effective presentations
Full capabilities such as multimedia features of the computer can be applied in making this type of presentations
Slide is prepared on the computer screen
A Video projector can be used for projecting what we have on the computer screen
Computer based Presentation tools…
Presentation of this type is a Series of slides arranged in a desired order
Each slide can have both textual and Graphical information
Both textual and Graphical objects can be animated to make an effective presentation
Visual effects, such as the way a slide appears and is erased, are also possible
Examples for typical application software developed as Presentation tools
Harvard Graphics
Aldus Presentation
Corel Show
Microsoft PowerPoint  etc.

Presentation Graphics
Analytical Graphics
graphical forms that make numeric data easier to analyse (e.g. bar chart) than it is in the form of numbers (e.g. electronic spreadsheet)
Presentation Graphics
graphics used to communicate to others. They use analytical graphics, texturing patterns (speckled, solid), colour, 3D.
Computer graphics can be highly complicated, such as those used in special effects for movies (Jurassic Park)

Microsoft PowerPoint
Windows-based Presentation Tool
Comes with Microsoft Office
User friendly and Easy to learn
To run the software
Start / Programs / Microsoft PowerPoint on the Windows platform

Creation of slides
Two options:
Rapid Presentation design using wizards
Creating a Presentation using Blank Presentation Option

Rapid Presentation design using wizards
Ideal for an Absolute beginner as guided by the computer
Ready made standard presentation skeletons for easy and fast preparation
Good reference for experienced user
Go through this option before actually making your own presentation

Auto Content Wizard option
Presentation category selection
Presentation output selection
Filling Title Slide information
Modify contents of the given presentation

Creating a Presentation
Creating a Presentation using Blank Presentation Option
Good for experienced users
Layout of a slide can be selected
It is better to have a paper sketch of the presentation with you

Creating a Presentation using Blank Presentation Option
User friendly – On screen guidance is available
Appropriate text is entered in text areas
New slides can be inserted using Insert Menu
Duplicate slides too can be inserted
At any moment selected layout can be altered through Format menu- slide layout

Slides Views
Different Views of a Slide
Slide view
Outline view
Slide sorter view
Notes pages view
Slide show view
Outline view
Outline of all text in the presentation is given here
Outline is readily printable for checking purpose
Slide sorter view
Shows many slides on a single screen
Slides are sorted, and in particular rearranged, in this view using the drag and drop feature
Unnecessary slides can be deleted by selecting them and pressing the Delete key
Slide transition effects can easily be applied and monitored here
Notes Pages view
Relevant information to a slide can be introduced in notes pages using normal word processing
Notes are not visible to the audience
Speaker may keep printed version of these notes together with the corresponding slide
Slide Show view
This is the actual view of the slide to the audience
Switching to this view you can test its actual appearance
No modification is possible on this view

Graphical Objects
Clip Arts
Word Arts
Multimedia clips
Sound clips
Video clips
Still photographs
Auto Shapes

Clip Arts
Some ready made drawings in the package and can be inserted through insert menu.

Word Arts
Some ready made artistic lettering in the package and can be inserted through insert menu or using drawing tool bar

Multimedia Clips
Some ready made sound, video and photos in the package and can be inserted through insert menu

Auto Shapes
Some ready made shapes in the package and can be inserted through insert menu or Auto shapes button on the drawing tool bar

Selecting single object
Simply click on the object
Selecting many objects at the same time
Hold down the Shift key while clicking each further object, or use the cursor to draw a rectangle covering all the objects to be selected

Grouping selected objects
First select all objects to be grouped
Use Draw / Group on the drawing tool bar
Ungrouping selected objects
Select the grouped object by clicking on it
Use Draw / Ungroup on the drawing tool bar

Overlapped  objects
Changing the order of overlapping
Select the object to be brought up
Use Draw / Order to change the order

Aligning and Distributing objects
Align to middle, Centre, Left, Right etc.
Distributing vertically, Horizontally
Select the objects to be aligned or distributed
Use Draw / Align or Distribute

Free Rotation and Mirror image of objects (flipping)
Select the object
Use Draw / Rotate or Flip

Visual Effects
Transition Effects
The effect with which one slide is erased and the next appears, applied between two consecutive slides
Build Effects
Visual effects with which a textual or graphical object appears on a slide
Animation effects
Simple animation introduced to graphical objects

Transition Effects
These effects can be introduced on slide sorter view
Slide sorter tool bar is  activated on the slide sorter view
Slide advancement method, Transition effect and speed of effect

Build Effects

Animation Effects
These effects are introduced on the slide view
You have to place the objects at the destination after animation
Slide show menu / Custom animation will give necessary steps to be followed in animating objects on the slide

Action Buttons
These are introduced to break the sequence of the presentation
Clicking on pre programmed button on the slide will trigger a display of a slide out of the sequence
On the slide view you may introduce action buttons and it can be programmed by the context menu

Action Buttons ( Pre defined set )
Action Buttons (Programming a tiny auto shape)

Input Devices
Input devices are used to feed data and instructions to computer systems. They consist of a range of devices that take data and programs from the outside world, in a form that people can read or comprehend, and convert them to a form that the computer can manipulate.
The form of the input may be by means of
Keyboard, Pointing device
Writing & drawing input devices
Video, Text, voice input

Key board
It is similar to a normal typewriter keyboard, with a number of additional special keys.
Standard keys are used to enter words & numbers. Special keys called “function keys”, labelled F1, F2 …, are used to enter commands.
A numeric keypad that resembles an electronic calculator’s keypad.
Cursor-movement key
QWERTY keyboard
104-key enhanced keyboard

Used for data entry and to issue commands into the system.

Pointing device – Mouse
A mouse is a device that can be rolled on a desktop to direct a pointer (cursor) on the computer’s display screen. The cursor is the symbol on the screen that shows where data may be entered next or which command is to be activated.
Pointing devices commonly have two or three buttons that are used to issue commands to the computer.
Command Actions
Point – an act of moving the pointing device to an object on-screen.
Click – select the object on-screen.
Drag – holding down the pointing-device button while moving the selected object on-screen.
Pointing device – Trackball
A trackball performs like a stationary, upside-down mouse.
Most portable laptop computers use a built-in or clip-on trackball.

Pointing device – Joystick
A joystick is a small lever that can be moved in any direction to move an object on the screen.
Usually associated with playing computer games.

Graphics Input – Scanner
Image scanners or graphic scanners convert the printed or photographic image on paper into electronic signals and then into digital form. This digital information can then be stored in a computer & manipulated.
Text Input – Scanner
Text is scanned from the printed page into the computer, in an attempt to reduce errors in data entry while also speeding up the process. The accompanying software converts the scanned images into character codes and thus enables text processing.
Text Input – OCR
Optical Character Recognition (OCR)
An input device that can read and recognise the symbols of text (special printed characters) & convert them to the machine readable form.

Writing & Drawing Input Devices – Light Pen
The light pen is a light sensitive stylus, or pen like device, connected by a wire to the computer terminal.
The user brings the  pen to a desired point on the display screen and presses the pen button, which identifies that screen location to the computer.

Writing & Drawing Input Devices – Touch Screen
The touch screen is a video display screen that has been sensitized to receive input from the touch of a finger.

Writing & Drawing Input Devices – Digitizing Tablet
A digitizing tablet consists of a tablet connected by a wire to a stylus or puck.
A stylus is a pen like device with which the user “sketches” an image.
A puck is a copying device with which the user copies an image as it is moved over a desired path on a sketch.
More sophisticated stylus or pointing devices with high accuracy are used by designers, architects, artists, desktop publishers, map makers, etc.

Video input (Digital Camera)
As with sound, most film & video is generated and recorded in analog form, in which the signals vary continuously. Thus the signals that come from systems such as a VCR, videodisk or laser disk, or a camcorder must be converted to digital form through a special video capture card installed in the computer.

There are two types of video cards:
Frame grabber video card
can capture & digitize only a single frame at a time.
Full motion video card
can convert analog to digital signals at the rate of 30 frames per second, giving the effect of a continuously flowing motion picture.

Voice input (Voice Recognition)
Converts a person’s speech into digital code by comparing the electrical patterns produced by the speaker’s voice with a set of prerecorded patterns stored in the computer.
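
The comparison step described above can be pictured as finding the stored pattern closest to the incoming one. The Python sketch below is a deliberately simplified illustration: the three-number "patterns", the word list and the recognise function are hypothetical, and real voice recognition systems use far richer signal processing than this.

```python
def distance(pattern_a, pattern_b):
    """Sum of squared differences between two equal-length patterns."""
    return sum((a - b) ** 2 for a, b in zip(pattern_a, pattern_b))

def recognise(sample, stored_patterns):
    """Return the word whose prerecorded pattern is closest to the sample."""
    return min(stored_patterns, key=lambda word: distance(sample, stored_patterns[word]))

# Hypothetical prerecorded patterns stored in the computer.
stored_patterns = {
    "yes": [0.9, 0.1, 0.4],
    "no": [0.2, 0.8, 0.5],
}

spoken = [0.85, 0.2, 0.35]            # pattern produced by the speaker's voice
print(recognise(spoken, stored_patterns))   # yes
```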

Source Data Input – MICR
Magnetic Ink Character Recognition (MICR)
MICR characters, which are printed with ink containing magnetic particles, are read by MICR equipment to produce digitised signals.
Used by banks to read information such as the serial numbers printed in magnetic ink on the bottom of cheques.
MICR reader/sorter can process cheques and other documents at speeds of up to 2000 documents per minute.

Source Data Input – Magnetic strip
Used on the backs of credit cards and bank debit cards, and various other plastic cards.
Enables readers, such as automated teller machines (ATM) to read account information.

Source Data Input – OMR
Optical-Mark Recognition (OMR)
An input device that senses marks on a piece of paper, using a light beam, and converts them into electronic signals which are sent to the computer for processing.
Commonly used to mark questionnaires or school examination answer sheets, where the students use pencils to mark certain boxes on the answer sheets provided.
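
Once the marks have been sensed, marking an answer sheet is a simple comparison against an answer key. The Python sketch below illustrates that processing step only; the answer key, the sample sheet and the rule that a question with no mark or more than one mark scores nothing are assumptions made for the example.

```python
answer_key = ["B", "D", "A", "C"]

def mark_sheet(marked_boxes, key):
    """Count the questions where exactly one box is marked and it is the correct one."""
    score = 0
    for marks, correct in zip(marked_boxes, key):
        if len(marks) == 1 and correct in marks:
            score += 1
    return score

# Boxes sensed on one sheet: question 3 has two marks, question 4 has none.
sheet = [{"B"}, {"D"}, {"A", "C"}, set()]
print(mark_sheet(sheet, answer_key))   # 2
```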

Source Data Input – Bar codes
Commonly used by sales and stock people in retail stores and supermarkets.
Point-of-sale (POS) terminal scans the bar codes of the Universal Product Code (UPC) to register the price, which is programmed into the host computer, as well as to deduct the item from stock.
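
What happens on the host computer when a bar code is scanned can be sketched as a lookup followed by a stock update. The product code, price and stock figures below are hypothetical, and prices are held in cents to avoid rounding problems.

```python
# Hypothetical product file on the host computer, keyed by UPC.
catalogue = {
    "012345678905": {"name": "Milk 1 l", "price_cents": 120, "stock": 10},
}

def scan(upc, products):
    """Register the price of the scanned item and deduct one unit from stock."""
    item = products[upc]
    if item["stock"] <= 0:
        raise ValueError(f"{item['name']} is out of stock")
    item["stock"] -= 1
    return item["price_cents"]

price = scan("012345678905", catalogue)
print(price, catalogue["012345678905"]["stock"])   # 120 9
```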

Output devices
Translate information processed by the computer into a form that a human or another machine can recognize. The two principal kinds of output are hardcopy & softcopy.
Hardcopy refers to a printed output.
Softcopy refers to the information that is shown on the  display screen or is in  audio or video form.

There are several ways to produce output
Text output
Graphics output
Sound output
Video output
Text Output
is simply the alphanumeric characters that make up our language. Text output appearance ranges from typewritten to typeset quality.

Graphics Output
includes line drawings, maps, presentation business graphics, computer-aided design, computer painting, photographic reproduction.
Sound output
ranges from the message beeps produced by the computer system to the human voice to music and other forms of sound
Video output
Photographs (still images) or moving images such as television and videotaped material

Printer
A printer provides hard copy output on paper. The basic criteria for evaluating printers include:
Quality of the printed output.
Speed at which printed pages are produced.
Sound level during printing.
Cost of printing media (ribbons, cartridges).
Conservation of paper.

Impact Printers
Form characters or images by striking a mechanism such as a print hammer or wheel against an inked ribbon, leaving an image on paper. They are noisy and are now used less.
Non- Impact Printers
Form characters or images without making direct physical contact between printing mechanism and paper.

Impact printing was the first printing technology.
In the early days, typewriters were adapted as printers and produced the same high-quality output. Such printers cannot change fonts or print graphics or colours. Only the symbols available in the printing mechanism can be produced on the paper.
e.g. daisy wheel printer, drum and belt printers.
Followed by dot-matrix printers.
Dot-matrix output is produced by printers that use wires in the print head. These wires extend out in different patterns, pressing against the ribbon to print the characters on paper. As this mechanism allows printing to be controlled down to the level of individual dots on the paper, these printers can be used to produce both text and graphics.
Fast, but noisy. Wear out ribbons very quickly.

Non-Impact Printers –  Laser Printing
Provide high-quality non-impact printing. Output is created by directing a laser beam onto a drum to create an electrical charge that forms a pattern of letters or images.
As the drum rotates, it picks up black toner on the images and transfers them to paper. The heating process then fixes the toner particles permanently on the paper.
Excellent print quality and font selection.
Fast printing. E.g. 8-500 pages per minute
High quality graphics with colour. High resolution.
Medium level noise, but high cost.
The primary disadvantages are expensive maintenance and the high cost of toner cartridges.

Non-Impact Printers – Inkjet Printing
Inkjet printer transfers characters and images to paper by spraying a fine jet of ink.
Offers nearly the quality of laser printing, but not the speed. Low-cost alternative for high quality printing

Plotters
A plotter is a specialized output device designed to produce high-quality graphics in a variety of colours. Plotters are especially useful for creating maps and architectural drawings, although they may also produce less complicated charts and graphics.
Type of plotters
Pen plotter
Electrostatic plotter
Thermal plotter

Video Monitor
Provides soft copy output.
Comes in either monochrome or colour.
A monochrome display shows a single colour against a different coloured background, such as green on black, amber on black or white on black.
Colour display can show a variety of colours.

Video Displays
Resolution – describes the degree of detail in a video display.
The higher the resolution, the sharper and crisper the characters and images, approaching the quality of a film image.
Conventional television display is low resolution as we can see lines, jagged edges and graininess in the image.

Bit-mapped display offers extremely high-resolution. Bit map means that each dot on the screen, called a pixel (for picture element) is represented by one bit (a 1 or 0) by the computer. (monochrome)
Bit-mapped graphics is the colour version of a bit-mapped display. Each pixel is identified by a number (e.g. 1-256 on a 256-colour palette) indicating what colour that pixel should be.
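
The difference between one bit per pixel and a palette number per pixel shows up in how much memory one screen needs. The Python sketch below works this out; the 640 x 480 screen size is only an assumed example, and the function name is hypothetical.

```python
def display_memory_bytes(width, height, colours):
    """Bytes needed to hold one screen when each pixel stores a colour number."""
    # Bits needed to tell `colours` different values apart: 1 for 2, 8 for 256.
    bits_per_pixel = max(1, (colours - 1).bit_length())
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8   # round up to whole bytes

print(display_memory_bytes(640, 480, 2))     # 38400  (monochrome, 1 bit per pixel)
print(display_memory_bytes(640, 480, 256))   # 307200 (256-colour palette, 8 bits per pixel)
```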

The liquid crystal display (LCD) is a flat-screen display commonly used with portable computers.

Terminal
A monitor-keyboard combination.
Has no system unit of its own, but instead uses the facility from a central computer via a communication link.
Mainframes, minicomputers and workstation systems support multiple terminals.

Dumb terminal performs the simplest input and output operations but no processing.
e.g. A bank ATM
Smart (intelligent) terminal may have its own CPU or processing capabilities, as well as built-in disk for storage
e.g. Point-of-sale (POS) cash register

Virtual Reality
An artificial, three-dimensional reality created by the computer giving the real world-like feeling to the user.
Involves many human senses. Special gloves and stereoscopic eyewear are used.
e.g. Pilots being trained in a flight simulator

Storage device
A functional unit into which data can be
placed
retained (stored)
retrieved (accessed)

Main Parameters
Location
Internal storage
External storage
Capacity
Speed
Access Method
Primary Storage (Main memory) always uses Random Access method.
Two methods for storing and accessing instructions or data in secondary (external) storage
direct access
Sequential access

Random Access
Random access means that any cell in the memory can be accessed in a fixed time irrespective of its physical location.

Direct access
Direct access means that the data is stored in a specific location so that any data can be found quickly.
e.g. Hard disk, floppy disk, CD-ROM.

Direct access is the most widely used storage method in external storage devices. The most common direct-access storage medium is the disk.

Access Time
RAM
60 nanoseconds (ns) or less to access memory locations in RAM
Secondary Storage
7 to 9 milliseconds (ms) to access sectors in a hard disk

Sequential access
Sequential access means that the data is stored and accessed in a set order, perhaps alphabetically or by date and time. The most common sequential storage medium is magnetic tape on reels or cassettes.

Sequential-access storage devices are used mostly for backup purposes.
e.g. Reel-to-reel magnetic tape, Tape Cartridges

Magnetic Diskette
The first magnetic diskette was 8” and was used with mini/mainframe computers
A thin flexible disk is permanently sealed within a rigid protective plastic cover
Sizes evolved through 8”, 5 1/4” & 3 1/2” (diameter)
Storage capacity is
Size     High density (H/D)   Low density (L/D)
3 1/2”   1.44 MB              720 KB
5 1/4”   1.2 MB               360 KB

Track
On a data medium, a path on the recording surface associated with a single read/write head as the data medium moves past it.
Sector

A predetermined angular part of a track or band on a magnetic drum or a magnetic disk, that can be addressed.
Most industry-standard PCs use sectors which can store 128 or 256 or 512 or 1024 bytes of information

Seek Time
Time required for the access arm of a direct access storage device to be positioned on the appropriate track
Rotational delay (Latency)
Time taken for the sector containing the required record to come under the read/write head

Access time
The interval of time between the moment data is called from memory / storage and the moment the transmission to the requesting device is completed.
i.e. Total time taken to find and transfer data.
Access time = Latency + Transfer time

Block transfer
The process of transferring one or more blocks of data in one operation.
Block size
The number of bytes or any other appropriate unit, in a block
Blocking Factor
The number of records to be contained in a block
Inter-blocking gap
Space between two consecutive blocks on a data medium.
Magnetic Diskette (3 1/2 inch)
Sector = 512 bytes
Track  = 18 sectors  = 18 * 512 bytes  = 9.0 KB
Disk  = Double sided = 2 * 80 tracks
= 2 * 80 * 9.0 KB  = 1440 KB = 1.44 MB
Size = 3 ½ inch
Capacity = 1.44 MB

Access time = 275 ms
Rotational speed = 720 rpm

Magnetic Diskette (5 1/4 inch)
High Density
Sector = 512 bytes
Track  = 15 sectors  = 15 * 512 bytes  = 7.5 KB
Disk = Double sided = 2 * 80 tracks
= 2 * 80 * 7.5 KB  = 1.2 MB
Low Density
Sector = 512 bytes
Track  = 9 sectors  = 9 * 512 bytes  = 4.5 KB
Disk = Double sided = 2 * 40 tracks
= 2 * 40 * 4.5 KB  = 360 KB
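
The diskette capacities worked out above all follow the same pattern: sides x tracks per side x sectors per track x bytes per sector. A short Python sketch of that arithmetic, using the figures from these notes (the function name is hypothetical):

```python
def diskette_capacity_bytes(sides, tracks_per_side, sectors_per_track, sector_bytes=512):
    """Formatted capacity of a diskette in bytes."""
    return sides * tracks_per_side * sectors_per_track * sector_bytes

formats = {
    '3 1/2" high density': (2, 80, 18),
    '5 1/4" high density': (2, 80, 15),
    '5 1/4" low density': (2, 40, 9),
}

for name, (sides, tracks, sectors) in formats.items():
    capacity = diskette_capacity_bytes(sides, tracks, sectors)
    print(f"{name}: {capacity} bytes = {capacity // 1024} KB")
# 3 1/2" high density: 1474560 bytes = 1440 KB
# 5 1/4" high density: 1228800 bytes = 1200 KB
# 5 1/4" low density: 368640 bytes = 360 KB
```

Note that the familiar "1.44 MB" figure is 1440 KB, with 1 KB taken as 1024 bytes.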

Magnetic Diskette (5 1/4 inch)
Rotational speed = 360 revolutions per minute (rpm)
Two Read/write heads capable of addressing 80 cylinders per diskette at the speed of 3 ms from track to track
Average Access time = 80 milliseconds (ms)
Settling time = 15 ms

Magnetic Disk (Hard Disk)
REMOVABLE DISK
Removable disk pack used in earlier Mainframe & Mini Computers
Disk cartridge – easy to remove like cassettes
FIXED DISK
Installed in a sealed container and it’s not removable
most of the fixed disks use the “Winchester” technology

A disk consists of several platters (e.g. 3). Each platter has two sides, and a number refers to each side (e.g. sides 0, 1, 2, 3 for 4 surfaces). A disk pack may have 20 surfaces (11 platters).
A disk starts out very unstructured – just a lot of bits of magnetic stuff without any organisation, rhyme or reason. Before the system can start writing records to it, the disk must have a structure- a grid work into which the information can be placed.
Formatting a disk is the process of putting the grid work on the disk and building the organisational structure so that files can be found. Once a disk is formatted, it is ready for the system to write data to it.
Formatting organises disks into numbered rings called cylinders. A cylinder on a single side is referred to as a track. Each track is broken into numbered pie slices called sectors. Each sector stores information.
Disk pack = 20 surfaces = 11 Platters
Disk = 2048 cylinders (figure has only 4)
Cylinder = 20 tracks (track in each surface)
Track = 72 sectors (figure outermost has 13)
Sector = 512 bytes
Disk Storage = 512 * 72 * 20 * 2048 bytes = 1,440 MB ≈ 1.44 GB
Rotational speed = 3600 rpm (revolutions per minute) = 16.67 ms per revolution
The time required to position the read-write heads over the required track is the seek time.

The time required for the read-write head to come to a complete stop after it is moved is called the settling time.
The time required for the disk to rotate to the position where the beginning of the desired block arrives at the read-write head is latency.
Average Rotational delay (latency) = ½ revolution

Track capacity = 72 x 512 bytes = 36 KB
Cylinder capacity = 20 x 36 KB = 720 KB
Disk capacity = 2048 x 720 KB = 1,440 MB ≈ 1.44 GB
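The hard disk figures above can be checked with a few lines of arithmetic. This is only a sketch using the example geometry from these notes (72 sectors, 20 tracks per cylinder, 2048 cylinders, 3600 rpm); the constant names are illustrative.

```python
BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 72
TRACKS_PER_CYLINDER = 20   # one track on each of the 20 surfaces
CYLINDERS = 2048
RPM = 3600

track_kb = SECTORS_PER_TRACK * BYTES_PER_SECTOR // 1024
cylinder_kb = TRACKS_PER_CYLINDER * track_kb
disk_kb = CYLINDERS * cylinder_kb

ms_per_revolution = 60_000 / RPM            # one minute = 60,000 ms
average_latency_ms = ms_per_revolution / 2  # on average, half a revolution

print(f"Track capacity    = {track_kb} KB")
print(f"Cylinder capacity = {cylinder_kb} KB")
print(f"Disk capacity     = {disk_kb} KB = {disk_kb // 1024} MB")
print(f"One revolution    = {ms_per_revolution:.2f} ms")
print(f"Average latency   = {average_latency_ms:.2f} ms")
```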

Hard Disk Technology
Removable-pack hard disk system
Contains 6-20 hard disks of 10 1/2 or 14 inch diameter, aligned one above the other in a sealed unit.
Fixed disk drive
High-speed, high-capacity disk drives that are housed in their own cabinets.
Redundant Arrays of Inexpensive Disks (RAID)
The disk system consists of a number of  5 1/4-inch disk drives within a single cabinet and sends data to the computer along several parallel paths simultaneously.
The main purpose is to increase reliability and availability, i.e. if one disk fails, no data is lost.

Magnetic Tapes
Very popular with mainframe computers
Storage density is expressed in ‘bytes per inch’ (bpi) or ‘characters per inch’ (cpi)
Storage density varies with the quality of the tape & the equipment used to read from and write on it.

Inter record Gap
For logical records on tape to be individually accessible, they must be separated by gaps.
Inter block Gap
If records are grouped into blocks (physical records), processing is faster and less tape is wasted because there are fewer gaps, in this case called inter-block gaps (IBG).

BOT {header block} {gap} {data block} {gap} …..
….. {data block} {gap} {data block} EOT
Tape header block = 80 bytes
{label identifier (1-3), label number (4), volume identifier (5-10), accessibility (11), reserved (12-37), owner identifier (38-51), reserved (52-79), label standard version (80)}
Tape width = ½ inch
Data storage in tracks
Tape tracks = 9
Data recorded in blocks of characters
Read/write speed = 50 ips (inches per second)

Blocking factor = 20
Block size = 512 bytes
Recording density = 1,600 bpi (bytes per inch)
Inter-block gap = 0.5 inches
Tape length = 2400 feet = 28,800 inches
Block length = blocking factor x block size / recording density
= 20 x 512 / 1,600 = 6.4 inches

Block + gap = 6.4 + 0.5 = 6.9 inches

Tape blocks = tape length / (block + gap)
= 28,800 / 6.9 = 4,173

Storage capacity = tape blocks x blocking factor x block size
= 4,173 x 20 x 512 bytes ≈ 40 MB

Tape efficiency = block length / (block + gap)
= 6.4 / 6.9 * 100 % ≈ 93 %

Time to write a block = block length / tape speed
= 6.4 / 50 sec (seconds) = 128 ms
Tape start/stop time = 0.02 sec = 20 ms
Time to write a block + start and stop times
= 20 + 128 + 20 = 168 ms

Average access time for a block
= average number of blocks passed x time per block
= (28,800 / 2) / 6.9 x 168 ms ≈ 350 sec
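The tape figures above can be reproduced step by step. The sketch below uses only the parameters stated in these notes; the variable names are illustrative, and the average access time assumes that, on average, half the tape is passed at the start/stop rate of one block at a time.

```python
BLOCKING_FACTOR = 20      # records per block
RECORD_SIZE = 512         # bytes (called "block size" in the notes)
DENSITY_BPI = 1600        # bytes per inch
GAP_INCHES = 0.5          # inter-block gap
TAPE_INCHES = 2400 * 12   # 2400 feet
SPEED_IPS = 50            # inches per second
START_STOP_MS = 20

block_inches = BLOCKING_FACTOR * RECORD_SIZE / DENSITY_BPI
blocks_on_tape = int(TAPE_INCHES / (block_inches + GAP_INCHES))
capacity_bytes = blocks_on_tape * BLOCKING_FACTOR * RECORD_SIZE
efficiency = block_inches / (block_inches + GAP_INCHES)

write_ms = block_inches / SPEED_IPS * 1000
block_ms = START_STOP_MS + write_ms + START_STOP_MS

# Half the tape length, expressed in blocks, times the time per block.
average_access_s = (TAPE_INCHES / 2) / (block_inches + GAP_INCHES) * block_ms / 1000

print(f"Block length     = {block_inches:.1f} inches")
print(f"Blocks on tape   = {blocks_on_tape}")
print(f"Storage capacity = {capacity_bytes} bytes")
print(f"Tape efficiency  = {efficiency:.0%}")
print(f"Time per block   = {block_ms:.0f} ms")
print(f"Average access   = {average_access_s:.0f} sec")
```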

Digital Tape
Magnetic tape
A tape made of thin plastic with a magnetizable layer on which data can be stored
Digital tape stores data in digital format, instead of analogue format

ZIP Devices
Zip drive (portable or internal) uses a special 3.5 inch disk that holds 100 MB or 250 MB.

Optical disk storage system
Data recording is done by using laser technology

Common types of optical disk
Compact Disk Read Only Memory (CD-ROM)
Compact Disk Recordable (CD-R)
Write Once Read Many times (WORM)
Erasable Optical Disk (EOD)
Optical card

CD-ROM
Holds approximately 650 MB
Used for
Data Storage
Encyclopaedias
Catalogues
Games
Entertainment
Movies
Magazines and books

DVD Devices
Originally –  Digital Video Disk (DVD)
Now  – Digital Versatile Disk (DVD)

Refers to a storage medium that can store TV-quality images on a CD-ROM disk with a capacity exceeding 5 GB

Main Circuit Board of a PC
The main circuit board (motherboard or system board) is the central nervous system of the computer. All the important components are either mounted on it or connected to it.
Primary electronic circuitry resides in it.
Consists of
RAM slots
ROM chips
CPU
Clock chip
BIOS chip
Expansion slots
Disk drive controller chip
Connectors for disk drives
Keyboard connectors
Connectors for serial and parallel ports

Integrated Circuit
An integrated circuit (IC) is a small chunk of silicon semiconductor material that contains hundreds of thousands to millions of electronic circuits.

Chips
Integrated circuit chips are used in several different ways
CPU (microprocessor)
ROM chips
RAM (SIMMs)
Video display controller chip
Disk drive controller chip
Coprocessor chip

RAM Chips
Random Access Memory (RAM, main memory, primary storage) is memory that temporarily holds data and instructions that will be needed shortly by the CPU.
Data are stored and retrieved at random from anywhere in the electronic RAM chip, in approximately equal amounts of time, no matter what the specific data locations are
RAM chips are often mounted on a small circuit board, such as Single Inline Memory Module (SIMM) which is plugged into the motherboard.
Two principal types of RAM chips are
DRAM (Dynamic RAM) commonly used
SRAM (Static RAM) for specialised use
RAM is of the following four types.
Conventional memory
Upper memory
Extended memory
Expanded memory

Conventional Memory
Consists of the first 640 kilobytes of RAM
This area is used for running the operating system and applications programs.

Upper Memory
Memory located between 640 KB and 1MB of RAM (384 KB).
Microcomputers with ‘286’ or higher chips use this area for storing parts of the operating system, leaving conventional memory available for running application programs.

Extended Memory
All memory over 1MB. Used by ‘286’ or higher chips.
Not all programs can use extended memory. Indeed, DOS and DOS programs can’t access it; to use it, programs must be written with DOS extenders.

Expanded Memory
Lets 8088-chip-based PCs access memory over the limit of 640KB conventional memory.
Used with ‘386SX’ or higher chips

ROM Chips
Read-Only Memory (ROM, firmware) cannot be written on or erased by the computer user.
Contain programs that are built in at the factory.
There are instructions for basic computer operations, such as those that start the computer or put the characters on the screen
Three variants of ROM chips
PROM (Programmable ROM)
blank chips on which the buyer, using special equipment writes the program. Once the program is written it cannot be erased.
EPROM (Erasable PROM)
like PROM chips, but the contents can be erased and new material written. Erasing requires ultraviolet (UV) light.
EEPROM (Electrically Erasable PROM)
can be reprogrammed using special electrical impulses. Need not be removed from the computer in order to be changed.

Other forms of Memory
Performance of microcomputers can be enhanced further by adding other forms of memory
Cache memory
Video memory (Video RAM)
Flash memory (flash RAM)

Cache memory
a special high-speed memory area that the CPU can access quickly
Video memory (Video RAM)
is used to store display images for the monitor.
Flash memory (flash RAM)
Card consists of circuitry on credit-card size cards that can be inserted into slots connecting to the motherboard. Is non-volatile. Used in notebooks.

Ports
A port (interface) is a connection from the main circuit board to a peripheral device. The peripheral is connected to the port by a special cable.
Ports are arranged along the rear of the main circuit board and provide connections through the back of the system unit.
Ports commonly connect the main circuit board to the following
Keyboard
Monitor
Printer
Mouse
External modem
Joystick
A port is a socket on the outside of the system unit that is connected to an expansion board or the main board on the inside of the system.
Common types of ports
Parallel ports
Serial ports
Video adapter ports
SCSI ports
Game ports
Parallel port
Allows lines to be connected that will enable 8 bits to be transmitted simultaneously (printer).
Serial port (RS-232 port, COM)
Enables a line to be connected that will send bits one after the other on a single line (modem, mice, keyboard).
Video adapter ports
Used to connect the video display monitor outside the computer to the video adapter card inside the system unit. Monitors may have a 9-pin plug or a 15-pin plug.
SCSI ports
Small Computer System Interface (SCSI) provides an interface for transferring data at high speed for up to eight SCSI-compatible devices (external hard-disk drives, magnetic-tape backup units, CD-ROM drives, Scanners).
Game ports
Allows you to attach a joystick or similar game-playing device to the system unit.

Expansion Slots
Expansion card (adapter card) is a printed circuit card with circuitry that gives the computer additional capabilities. This is inserted into an expansion slot on the main board.

Expansion Cards
Memory Expansion cards (SIMMs)
Expansion cards are used to connect the following devices to the main circuit board
Video monitor (Display adapter, graphics display cards)
Disk drive (controller cards)
Scanner (controller cards)
External CD-ROM (controller cards)
Internal modem
Sound
TV tuner
Network

Bus Lines
A bus line (bus) is an electrical pathway through which bits are transmitted within the CPU and between the CPU and other units in the system unit.
Principal PC bus standards (architectures)
Industry Standard Architecture (ISA)
Micro Channel Architecture (MCA)
Enhanced ISA (EISA)
Peripheral Component Interconnect (PCI)
Personal Computer Memory Card International Association (PCMCIA)
Industry Standard Architecture (ISA)
First 8 bits, then 16 bits, is the most common PC bus.
Micro Channel Architecture (MCA)
Used in IBM PS/2 line of microcomputers. 32 bits.
Enhanced ISA (EISA)
32 bits. ISA cards will run in EISA slots.
PCI
The latest standard available in 32-bits and 64 bits
Personal Computer Memory Card International Association (PCMCIA)
Completely open, nonproprietary bus standard for notebooks, sub-notebooks and palmtops.

Local Bus Extensions
Used to bypass existing standard bus systems (they connect peripheral devices directly to the microprocessor).
Peripheral Component Interconnect (PCI)
64-bit data path used in Pentium-based systems.
Video Electronics Standard Association (VESA)
32 bits, used with ‘486 systems

Memory Hierarchy
CPU Registers
Used to store the information required by the current instruction being processed and to keep the status of the processor.
Cache memory
used to reduce the speed gap between the CPU and the main memory by being placed between them
RAM
used as storage for data and/or instructions involved in the programs currently being executed by the CPU
Secondary Storage
used for data or instructions that may be processed at some later time or stored indefinitely

Registers
The CPU must hold the instruction currently being executed, and the other components related to it, within the CPU itself.
Both control unit and ALU have registers.
Registers are high-speed temporary storage areas, to hold both instructions and data during a processing sequence of an instruction.
Several types of registers
Instruction register
holds an instruction (e.g. to add, to multiply or to perform a logical comparison)
Storage register
temporarily holds data retrieved from RAM prior to processing.
Accumulator
temporarily stores the results of continuing arithmetic and logical operations.

Data Representation
The computer understands only the binary language of 1s and 0s.
Each 1 or 0 is a bit, and the representation they form (an 8-bit character: a letter, a digit, a symbol) is called a byte.
Bytes are organised into words for presentation to the processor.
Word is a logical unit of information, made up of bits and bytes, that can be stored in a single memory location.
Word length (size) is the number of bits in a single memory location.

The Machine Cycle
Machine Cycle is the sequence of steps by which an instruction is processed.
Cycle is the length of time the CPU takes to process one machine instruction or word.
A machine cycle comprises two cycles
instruction cycle (fetch, decode)
execution cycle (execute, store)
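The fetch, decode, execute and store steps can be illustrated with a toy simulation. The three-instruction machine below (LOAD, ADD, STORE) is invented purely for this sketch and does not correspond to any real processor; the register names follow the register types described earlier in these notes.

```python
# Hypothetical machine: named memory cells and a three-instruction program.
memory = {"x": 6, "y": 13, "result": 0}
program = [("LOAD", "x"), ("ADD", "y"), ("STORE", "result")]

accumulator = 0
for program_counter in range(len(program)):
    # Instruction cycle: fetch the instruction, then decode it.
    instruction_register = program[program_counter]
    opcode, operand = instruction_register

    # Execution cycle: execute the operation and store the result.
    if opcode == "LOAD":
        accumulator = memory[operand]
    elif opcode == "ADD":
        accumulator = accumulator + memory[operand]
    elif opcode == "STORE":
        memory[operand] = accumulator

print(memory["result"])
```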

CPU Speed
The CPU performance is measured in Million Instructions Per Second (MIPS)
CPU speed is measured based on an internal clock. Clock speed is measured in Hertz, which is a unit of measurement of electrical vibrations.
One Hertz is equal to one cycle per second.
One million Hertz is one megahertz (MHz)
Time to complete one machine cycle, in fractions of a second
milliseconds (one-thousandth of a second) in older computers
microseconds (one-millionth of a second) for most microcomputers
nanoseconds (one-billionth of a second) for mainframes
picoseconds (one-trillionth of a second) in some experimental machines
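The relationship between clock speed and cycle time is a simple reciprocal: a clock of f MHz ticks once every 1000 / f nanoseconds. A small sketch follows; the function name is illustrative, and a full machine cycle on a real CPU may take several clock ticks.

```python
def clock_period_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds for a clock speed given in MHz."""
    return 1000.0 / clock_mhz


for mhz in (4, 100, 700):
    print(f"{mhz} MHz -> {clock_period_ns(mhz):.2f} ns per clock cycle")
```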

Buffer
A temporary data-storage area.
e.g. a file sent to the printer is held in the printer’s buffer because the printing speed is much slower than the data transfer speed.
A disk buffer is an important element in advanced disk input/output strategies.
When an application program requests data from a disk, the system software allocates a buffer and transfers the data from the appropriate disk sector into the buffer.

Disk Cache
Semiconductor memory which temporarily stores information that is frequently requested from the disk drives.
Improves the speed of disk-intensive applications (e.g. database)

Comparison of Capacity and Speed
Memory     Capacity          Speed / Access Time
CPU        8 – 64 bits       4 – 700 MHz
Register   8 – 64 bits       0.8 – 100 MIPS
Cache      64 – 512 KB       25 – 50 ns
RAM        8 KB – 64 MB      60 – 80 ns
Disk       10 MB – 10 GB     6 – 15 ms
Tape       40 MB             128 ms – 5 minutes

What is a Computer?
A Computer is a programmable, multipurpose machine that accepts data (e.g. raw data, facts & figures) and processes, or manipulates it into information we can use, such as summaries or totals
E.g.  An automatic teller machine (ATM) computes  the deposits and withdrawals to give you the total  in your account.
Data: (ISO) A representation of facts, concepts or instructions in a formalised manner suitable for communication, interpretation or processing by human beings or by automatic means.

Letters, numbers, colours, symbols, shapes, temperatures, sound or other facts and figures are data suitable for processing.

Information: (ISO) The meaning that is applied to data by means of the conversions applied to that data. I.e. processed data.
A computer is an electronic device, operating under the control of instructions stored in its own memory unit,
which can accept and store data
(e.g. data entered using a keyboard),
perform arithmetic and logical operations on that data without human intervention and
(e.g. process data into information)
produce output from the processing
(e.g. view information on the screen).
Computer: (ISO) A programmable functional unit that consists of one or more associated processing units and peripheral equipment, that is controlled by internally stored programs and that can perform substantial computation, including numerous arithmetic operations or logic operations, without human intervention during a run.

A computer may be a stand-alone unit or may consist of several interconnected units.
Personal computers, microwave ovens and portable phones are machines that use processing devices (microprocessors).
A collection of circuits implementing the representation and manipulation of bits.

A collection of programs, which allow us to control the devices attached to it.

A collection of tools, which are designed to improve our productivity. (e.g. Calculator, clock, diary, notepad, spreadsheet, appointment scheduler, word processor)

Why Are Computers Needed?
Fundamental Characteristics, to Increase
Accuracy
Speed
Storage capacity
The three fundamental characteristics enable the following by-products
Increased Productivity
Efficient Decision Making
Cost Reduction
Hand over our repetitive & tedious work to the computer
Typewriter     ->  Word Processor/Printer
Card file      ->  Database Management System
Letter         ->  E-mail
Phone dialler  ->  Communication Management program

What does a Computer  do?
Input Operations
Arithmetic Operations
Logic Operations
Output Operations
Storage Operations
Input Operations
Computer can accept data & instructions.
Arithmetic Operations
Computer can process arithmetic operations such as Addition, Multiplication, Subtraction & Division
Logic Operations
Computer can perform logic operations such as AND, OR, NOT, etc.
Output Operations
Computer can produce an output as a screen view, as a hard copy, as a sound output.
Storage Operations
Computer can store a large amount of data & programs permanently & perform tasks later.

Components of a Computer System – Input Units
Input units are used to feed data and instructions to the computer system.
Input units provide the interface between the outside world and the computer system for this purpose.
The most common examples for the input units are
Keyboard
Mouse
Other pointing devices.

Components of a Computer System – Output Units
The output units of a computer system are used to produce the results of the operations performed by the computer.
They are also used to output the error messages and other status of the system.
The most common examples for the output units are
Display monitors
Printers

Components of a Computer System – Internal Storage
This is also called the main memory, and is most commonly RAM (Random Access Memory).
Volatility is one of the specific features of the main storage. That is, it requires a continuous supply of electrical power to retain information.
The internal store is used to
Receive the commands and data from the input units.
Store the information ready to be sent to the output units.
Store the currently running program(s).
Store the data required for the currently running program(s).
Store the intermediate data generated by the currently running program(s).

Components of a Computer System – External Storage
The external storage units are non-volatile.
They are used to store programs and data for future use.
They are also used when the capacity of the internal storage is insufficient to keep the currently running program(s) and the data required.
The common external storage units are
Floppy Disks
Hard Disks
CD ROMs
Magnetic Tapes

Components of a Computer System – Arithmetic & Logic Unit
The Arithmetic & Logic Unit (ALU) and the Control Unit of a computer system are collectively called the Central Processing Unit (CPU).
When all the electronics required to implement the functions of the CPU are included in a single Integrated Circuit(IC) chip, it is called a Microprocessor.

The ALU performs all the arithmetic and logical operations required during the execution of the programs.

Components of a Computer System – Control Unit
Main function of the control unit is to issue the control signals to all the components to activate the role of each of them during the process of running a program.
It receives the individual instructions in a program one by one and then decodes them to identify the type and the sequence of the control signals to be generated.
The control unit is responsible for the overall control of the system.

Hardware, Software and Firmware
Hardware
Consists of all machinery and equipment which comprise a computer system.
All tangible items in a Computer system fall into the category of hardware.
That is, in a Computer system the hardware includes, among other devices, the Keyboard, the Screen, the Printer and the Computer or processing device itself.

Software
Intangible in nature.
Consists of the step-by-step instructions that tell the computer what to do.
Needs some media to exist.
Runs on top of hardware making the hardware usable.
Software is  divided into two basic categories
Application Software
System Software

Firmware
Inbuilt software which has been installed by the manufacturer.
The permanent pieces of software which are not supposed to be altered by the users are presented in this form.
Firmware brings some flexibility to the manufacturing process of computer systems.
E.g.  Machine-language programs stored on a ROM chip

The user interacts with application software. System software enables the application software to interact with the computer and helps the computer to manage its internal resources.

System Software
System Software provides the interface between the hardware and the application software.
In this context it hides the hardware complexities and also brings the different hardware configurations into common platforms
Enables the Application software to interact with the computer & helps it manage its internal resources.
System software makes the hardware of the computer system accessible to the application programs and the users.
The systems software consists of several programs, one of the most important of which is the operating system. The operating system acts as the master control program that runs the computer.

Application Software
Application software may be either customized or Packaged

Customized Software
is the Software designed for a particular customer according to their needs.

Packaged Software
Also called a Software Package, this is the kind of “Off-the-Shelf” program developed for general use.

Packaged Software
Word Processing

The Most popular kind of applications program, allows a person to use a computer to create, edit, save and print documents. Used to prepare letters, memos, reports, manuscripts, etc.

E.g. Microsoft Word

Spreadsheet

Allows a person to use rows, columns and formulas to display, analyse and summarise data, mostly numerical data. Used to do budgets, sales projections, financial plans, etc.

E.g. Microsoft Excel, Lotus 1-2-3

Database

Allows a person to use a computer to define files, records within files and data elements within records in a relatively easy manner and provide a convenient method to access, update and create reports from the data managed in multiple files. Used to manage employee lists, student list letters, etc.

E.g. Microsoft Access, dBase, SQL Server

Graphics

Allows a person to present information in the form of charts and graphics or to create complex freehand artwork. Used for presentations.
Simple graphics are provided by spreadsheet software, while other packages are more sophisticated.

E.g. Microsoft PowerPoint, Adobe Illustrator

Communications

Allows a person to manage the transmission of data between computers over wired or wireless channels. Used for E-mail, Internet, FTP, etc.

E.g. ProComm, Smartcom, Crosstalk

Other useful Software

Personal information managers, desktop publishing, hypertext, scheduling programs.

Customised Software
Accounting, Sales and Distribution, Manufacturing, Management Sciences, Medical and Health Care, Real Estate, Personal Investor, Tax Manager, Time Scheduler, etc.
e.g. ACCOUNTING
Inventory Control
Accounts Receivable
Payroll
General Ledger
Integrated Software
Put together functions of separate software into a single software package.

Application Software Users
Individual
Disabled User
Office Secretary
Classroom/Labs
Company Manager
Artwork
Research
Fishing, coastal data
Air control

Types of Processing
Data may be taken from secondary storage and processed in either of two ways
batch processing (later)
real-time processing (right now)

Batch Processing
Data is collected over several days or weeks and then processed all at one time, as a “batch”.
e.g. banks balance checking accounts after all checks have been processed in a batch.

Real-time Processing
Records information immediately and responds to user requests at the time transactions occur.
e.g. when you use ATM card to withdraw cash, the system automatically computes your account balance then and there.

Storage devices in Data Processing
Storage devices can be classified into two main types
online
offline

Online Storage
Data is directly accessible for processing
Storage medium physically connected to and controlled by  the central processing unit.
e.g. : Magnetic Disk, Magnetic Diskette, Optical Disk

Offline Storage devices
Data is not directly accessible for processing until tape or disk has been loaded onto an input device.
The use of removable computer media. Data not readily accessible to the CPU
e.g. : Magnetic tapes, Tape cartridges

Numbering Systems
Decimal
Binary
Octal
Hexadecimal (Hex)

Decimal Number System
Base (Radix) 10
Digits  0, 1, 2, 3, 4, 5, 6, 7, 8, 9
e.g.   345
Binary Number System
Base  2
Digits 0, 1
e.g.  1101

Octal Number System
Base  8
Digits 0, 1, 2, 3, 4, 5, 6, 7
e.g.  664

Hexadecimal Number System
Base  16
Digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
A, B, C, D, E, F
e.g.  A74D
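Each of the example numbers above can be converted to decimal by positional expansion: every digit is multiplied by a power of the base. The sketch below shows this with a small hand-written function (the name `to_decimal` is illustrative) and then cross-checks it against Python's built-in `int(text, base)`.

```python
DIGITS = "0123456789ABCDEF"


def to_decimal(text, base):
    """Positional expansion: multiply the running value by the base, add the next digit."""
    value = 0
    for ch in text:
        value = value * base + DIGITS.index(ch)
    return value


for text, base in [("345", 10), ("1101", 2), ("664", 8), ("A74D", 16)]:
    value = to_decimal(text, base)
    assert value == int(text, base)   # built-in conversion agrees
    print(f"{text} (base {base}) = {value} decimal"
          f" = {value:b} binary = {value:o} octal = {value:X} hex")
```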
Data Representation
How Data is Stored
BIT – BInary digiT (either 0 or 1)

Basic unit for storing data in main computer memory is the bit. A bit can represent one of only two values.
bit  0 is said to be “off”
bit  1 is said to be “on”

byte – 8 bits
Many computers use a combination of 8 bits (called a byte) as a unit for storing data.
Thus a byte is a location in the computer's main memory consisting of 8 adjacent bits.
When a character is entered from the keyboard, the computer interprets the character and stores it as a series of bits being “on” and “off”.

How Capacity is Expressed
Kilobyte (KB) is about 1000 bytes
1024 bytes (2^10 bytes)
Megabyte (MB) is about 1 million bytes
1024 KB (2^20 bytes)
Gigabyte (GB) is about 1 billion bytes
1024 MB (2^30 bytes)
Terabyte (TB) is about 1 trillion bytes
1024 GB (2^40 bytes)
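The capacity units above are successive powers of 1024, which is why each is only "about" a round decimal number. A short sketch makes the difference visible; the dictionary name is illustrative.

```python
UNITS = {"KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

for name, size in UNITS.items():
    print(f"1 {name} = {size:,} bytes")

# Each unit is 1024 times the previous one.
assert UNITS["MB"] == 1024 * UNITS["KB"]
assert UNITS["GB"] == 1024 * UNITS["MB"]
assert UNITS["TB"] == 1024 * UNITS["GB"]
```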

Data Measurement Units for Characters
American Standard Code for Information Interchange (ASCII)
Most widely used coding system to represent data.
Extended Binary Coded Decimal Interchange Code (EBCDIC)
EBCDIC is commonly used on mainframes.

Character Codes – ASCII
Numbers (0-9)
bits 5 and 6 on and  bits 1-4 used based on binary number system
Letters (A-Z)
bit 7 on and bits 1-5 used based on binary number system

Number  ASCII
0 0110000
1 0110001
2 0110010
3 0110011
4 0110100
5 0110101
6 0110110
7 0110111
8 0111000
9 0111001

Letter    ASCII
A 1000001
B 1000010
C 1000011
D 1000100
E 1000101
F 1000110
G 1000111
H 1001000
I 1001001
J 1001010
K 1001011
L 1001100
M 1001101
N 1001110
O 1001111
P 1010000
Q 1010001
R 1010010
S 1010011
T 1010100
U 1010101
V 1010110
W 1010111
X 1011000
Y 1011001
Z 1011010
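The 7-bit ASCII patterns in the tables above can be generated rather than memorised. A minimal sketch using Python's built-in `ord` and `format` follows; the function name `ascii_bits` is illustrative.

```python
def ascii_bits(ch):
    """Return the 7-bit ASCII pattern of a single character as a string."""
    return format(ord(ch), "07b")


for ch in "09AZ":
    print(ch, ascii_bits(ch))
```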

Character Codes – EBCDIC
Each 8-bit byte is divided into two portions
zone portion and digit portion
digit portion is based on the binary number system
Numbers
All zone bits “on” and binary digits
Letters (A-I)
Two zone bits (7, 8) “on” and binary digits
Letters (J-R)
Three zone bits (5, 7, 8) “on” and binary digits
Letters (S-Z)
Three zone bits (6, 7, 8) “on” and binary digits

Number  EBCDIC
0 11110000
1 11110001
2 11110010
3 11110011
4 11110100
5 11110101
6 11110110
7 11110111
8 11111000
9 11111001

Letter  EBCDIC
A 11000001
B 11000010
C 11000011
D 11000100
E 11000101
F 11000110
G 11000111
H 11001000
I 11001001
J 11010001
K 11010010
L 11010011
M 11010100
N 11010101
O 11010110
P 11010111
Q 11011000
R 11011001
S 11100010
T 11100011
U 11100100
V 11100101
W 11100110
X 11100111
Y 11101000
Z 11101001
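The EBCDIC patterns above can also be produced programmatically. The sketch below assumes Python's `cp037` codec (one of several EBCDIC code pages) is an acceptable stand-in; it agrees with the table for digits and upper-case letters, but other EBCDIC variants differ for some symbols. The function name `ebcdic_bits` is illustrative.

```python
def ebcdic_bits(ch):
    """Return the 8-bit EBCDIC (code page 037) pattern of a single character."""
    return format(ch.encode("cp037")[0], "08b")


for ch in "0AJSZ":
    bits = ebcdic_bits(ch)
    # The first four bits are the zone portion, the last four the digit portion.
    print(ch, bits[:4], bits[4:])
```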

Number Representation
Sign bit (leftmost position of the number)
0 for positive
1 for negative
Binary point (decimal point)
two ways of specifying
Fixed point number representation
Floating-point number representation

Fixed Point Number Representation
Assumes that the binary point is always fixed in one position
this method is not used in most computers
either
in the extreme left to make the stored number a fraction
or
in the extreme right to make the stored number an integer

Integer Representation
Three different ways to represent negative integer with eight bits. Most computers use the signed-2’s complement representation when performing arithmetic operations with integers.
E.g.  +14 0 0001110
-14 1 0001110 (signed-magnitude)
-14 1 1110001 (signed-1’s complement)
-14 1 1110010 (signed-2’s complement)

Signed-magnitude
used in ordinary arithmetic, awkward in computer arithmetic
Signed-1’s complement
Used in some old computers, 0 has two representations (+0 and -0)
Signed-2’s complement
Formed by leaving all least significant 0’s and the first 1 unchanged and then replacing 1’s by 0’s and 0’s by 1’s in all other higher significant bits
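The three 8-bit representations of -14 shown above can be derived mechanically. The following is a sketch only; the function names are illustrative, and the sign bit is simply the leftmost character of each returned string.

```python
def signed_magnitude(n, bits=8):
    """Sign bit followed by the magnitude in plain binary."""
    sign = "1" if n < 0 else "0"
    return sign + format(abs(n), f"0{bits - 1}b")


def ones_complement(n, bits=8):
    """For a negative number, invert every bit of the positive pattern."""
    mask = 2**bits - 1
    return format(n if n >= 0 else ~abs(n) & mask, f"0{bits}b")


def twos_complement(n, bits=8):
    """For a negative number, the 1's complement plus one."""
    mask = 2**bits - 1
    return format(n & mask, f"0{bits}b")


for label, convert in [("signed-magnitude", signed_magnitude),
                       ("signed-1's complement", ones_complement),
                       ("signed-2's complement", twos_complement)]:
    print(f"-14 {convert(-14)} ({label})")
```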

2’s Complement Arithmetic
Does not require subtraction, only addition and complementation.
Add two numbers, including the sign bit and discard any carry out of the sign (leftmost) bit position.
E.g. 6+13, -6+13, 6-13, -6-13

+6    00000110   -6     11111010
+13  00001101   +13  00001101
________________________________________________________
+19  00010011  +7     00000111
_______________________________________________________

+6    00000110   -6     11111010
-13   11110011   -13   11110011
__________________________________________________________
-7     11111001  -19    11101101
________________________________________________________
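The four worked sums above can be reproduced by adding the 8-bit patterns and discarding the carry out of the leftmost bit. This is a minimal sketch; the helper names `add_8bit` and `to_signed` are illustrative.

```python
def add_8bit(a, b):
    """Add two integers as 8-bit 2's complement patterns, discarding any carry."""
    total = (a & 0xFF) + (b & 0xFF)
    return total & 0xFF


def to_signed(pattern):
    """Interpret an 8-bit pattern as a signed 2's complement value."""
    return pattern - 256 if pattern & 0x80 else pattern


for a, b in [(6, 13), (-6, 13), (6, -13), (-6, -13)]:
    result = add_8bit(a, b)
    print(f"{a:+d} + ({b:+d}) = {result:08b} = {to_signed(result):+d}")
```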

Floating-Point Number Representation
Two parts
a signed, fixed-point number called the mantissa, which can be a fraction or an integer
the position of the decimal (binary) point called the exponent
E.g. Decimal number +6132.789 can be written as  fraction +0.6132789 and an exponent +04
The radix r and the radix-point position of the mantissa are always assumed.
A floating-point binary number is represented in a similar manner except that it uses base 2 for the exponent.
E.g. The binary number 10.11
can be written as
0.1011 x 2^2 (fraction 0.1011 and exponent +2)
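The split of a binary number into a fractional mantissa and a base-2 exponent can be observed with Python's `math.frexp`, which returns a mantissa in the range 0.5 to 1 and an integer exponent. The sketch below applies it to the binary number 10.11 (decimal 2.75); it illustrates the idea only and is not the storage format of any particular machine.

```python
import math

value = int("1011", 2) / 2**2   # binary 10.11 = 1011 shifted two places = 2.75
mantissa, exponent = math.frexp(value)

print(f"value    = {value}")
print(f"mantissa = {mantissa} (binary 0.1011 = 1/2 + 1/8 + 1/16)")
print(f"exponent = {exponent}")
print(f"check    = {mantissa * 2**exponent}")
```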

Word Size
Capacities of CPUs are expressed in terms of word size (number of bits).
A word is a group of bits that may be manipulated or stored at one time by the CPU.
Transferring data will be done in chunks of bits (e.g. 8-bit or 32-bit).
Word size can be 8-bit, 16-bit, 32-bit, 64-bit.

Register
A register is a group of flip-flops with each flip-flop capable of storing one bit of information
An n-bit register has a group of n flip-flops and is capable of storing any binary information of n-bits
Register is part of the internal storage having specified storage capacity and usually intended for a specific purpose.

Data Transmission
The conveying of data from one place for reception elsewhere by telecommunication means.
Serial Transmission
data (binary digits) can be transmitted only 1 bit at a time using  only one communications line.
Parallel Transmission
sends each byte (a series of bits) simultaneously using separate lines.

Serial Data Transmission
Data Terminal Equipment (DTE)
Devices such as terminals or computers which connect to data communications equipment

Parallel Data Transmission

Signals
Binary information is represented in digital computers by physical quantities called signals.
Electronic signals such as voltages exist throughout the computer in either one of two recognisable states.
The two states represent a binary variable that can be equal to 1 or 0.
E.g.  3 volts to represent binary 1
0.5 volt to represent binary 0

Parity Bit
Binary information transmitted through some form of communication medium is subject to external noise that could change bits from 1 to 0 and vice versa.
Error detection code is a binary code that detects digital errors during transmission.
The most common error detection code used is the parity bit.
Parity bit
A parity bit is an extra bit included with a binary message to make the total number of 1’s either odd or even.
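Odd parity: the parity bit is chosen so that the total number of 1’s, including the parity bit, is odd. I.e. the parity bit is 0 if the message already has an odd number of 1’s, and 1 if it has an even number.
Even parity: the parity bit is chosen so that the total number of 1’s, including the parity bit, is even. I.e. the parity bit is 0 if the message already has an even number of 1’s, and 1 if it has an odd number.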
At the sending end the parity bit is generated and included to the message. At the receiving end parity check is applied to detect errors.
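
A minimal sketch of generating and checking a parity bit follows. The function names and the 7-bit sample message are assumptions for illustration. A single parity bit detects any odd number of flipped bits, but not an even number.

```python
def parity_bit(message, odd=False):
    """Return the parity bit ('0' or '1') for a string of bits."""
    ones = message.count("1")
    if odd:
        return "0" if ones % 2 == 1 else "1"
    return "0" if ones % 2 == 0 else "1"

def send(message, odd=False):
    """Sending end: append the parity bit to the message."""
    return message + parity_bit(message, odd)

def check(received, odd=False):
    """Receiving end: True if the total number of 1s has the agreed parity."""
    return received.count("1") % 2 == (1 if odd else 0)

frame = send("1011001")        # four 1s, so the even-parity bit is 0
print(frame, check(frame))
corrupted = "0" + frame[1:]    # noise flips the first bit
print(corrupted, check(corrupted))
```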

Gates
Binary logic deals with binary variables and with operations that assume a logical meaning.
The manipulation of binary information is done by logic circuits called gates.
Gates are blocks of hardware that produce signals of binary 1 or 0 when input logic requirements are satisfied.
The input-output relationship of the binary variables for each gate can be represented in tabular form by a truth table

Logic Operations
AND
OR  (Inclusive-OR)
NOT  (Complement)
NAND  (Complement of AND)
NOR  (Complement of OR)
XOR  (Exclusive-OR)
Exclusive-NOR (Equivalence)

AND Logic Gate
Truth Table
A  B   x
________
0  0   0
0  1   0
1  0   0
1  1   1

A, B Binary Input Variables
x Binary Output Variable

OR  Logic Gate
Truth Table
A  B   x
________
0  0   0
0  1   1
1  0   1
1  1   1

NOT Logic Gate
Truth Table
A   x
________
0   1
1   0

NAND Logic Gate
Truth Table
A  B   x
________
0  0   1
0  1   1
1  0   1
1  1   0

NOR Logic Gate
Truth Table
A  B   x
________
0  0   1
0  1   0
1  0   0
1  1   0

XOR Logic Gate
Truth Table
A  B   x
________
0  0   0
0  1   1
1  0   1
1  1   0

Exclusive-NOR Logic Gate
Truth Table
A  B   x
________
0  0   1
0  1   0
1  0   0
1  1   1
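
The truth tables above can be reproduced with a few one-line functions. This is only a sketch: the dictionary name GATES and the helper truth_table are chosen here for illustration, and inputs are assumed to be the integers 0 and 1.

```python
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),   # complement of AND
    "NOR":  lambda a, b: 1 - (a | b),   # complement of OR
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),   # exclusive-NOR (equivalence)
}

def NOT(a):
    """Complement of a single input."""
    return 1 - a

def truth_table(gate):
    """Return the rows (A, B, x) of the truth table for a two-input gate."""
    return [(a, b, GATES[gate](a, b)) for a in (0, 1) for b in (0, 1)]

for name in GATES:
    print(name)
    for a, b, x in truth_table(name):
        print(a, b, x)
```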

Types of Computer Infestations
Virus
a program that can replicate itself by attaching itself to other programs.
Worm
a program that spreads copies throughout a network without needing a host program.
Trojan horse
like a worm, but is substituted for a legitimate program and cannot replicate itself.

What is a virus?
Program
copies itself
may modify system
may have obvious effect
may do other damage

Infecting by an Appending Virus
[Figure: INSTALL.BAT, VIRAL.BAT, PROGRAM.EXE, VIRAL.EXE]

Infecting by an Insertion Virus
[Figure: PROGRAM.EXE, VIRAL.EXE]

Infecting by a Replacement Virus
[Figure: PROGRAM.EXE, VIRAL.EXE, PROGRAM.COM]

Virus Symptoms
Program takes longer than normal to load
Excessive disk access for simple tasks
Less memory than usual available
Files mysteriously disappear or appear
Noticeable reduction in disk space
Executed files have changed size
Files constantly get corrupted
File extensions or file attributes change without reason
Bad sectors on the hard drive continue to increase
Hard disk boots but hangs before DOS prompt or Windows
A message is displayed by the virus scanner software

How do they get transmitted?
Boot Sector Infectors
Program/File Infectors
.exe; .com; .sys; .dll; .ovl; .mnu; .bat; .prg; .obj
.doc; .xls; .dot;
File System Infectors

Where do we get them from?
Installing infected software
Pirate software
Free software (magazines, etc.)
Internet
Email attachments
Shared documents

What are the dangers?
Trigger dates
Some viruses have trigger dates
26 April, etc. is well known
Also …
Viruses with no trigger date are 10 times more likely

Some general precautions
Educate users
Backups of all mission critical applications and data
Original source discs for all software
Disable floppy drive at boot up
Disable macros in MS Office products

General precautions – 2
Take care with Internet files
Buy and use virus scanning software
Create an integrity check and use it
Don’t assume everything is due to viruses

Antivirus (AV) Software
Norton AntiVirus by Symantec Corporation
McAfee Virus Scan by McAfee Associates, Inc.
PC-cillin II by Touchstone Software
Dr. Solomon’s Software
etc., etc.

Protecting Against Viruses
Antivirus (AV) Software
Set AV to run at Startup
e.g. AutoExec.Bat

Anti-Virus Techniques
Boot from a Floppy Disk
FORMAT A:/S
COPY AUTOEXEC.BAT A:
COPY CONFIG.SYS A:
SET COMSPEC=A:\COMMAND.COM
Write Protect Disk

Log Available Disk Space
CD\
DIR >> DIR.LOG
TYPE DIR.LOG > PRN
Same for Bad Sectors

Types of Processing
Data may be taken from secondary storage and processed in either of two ways
batch processing (later)
real-time processing (right now)

Batch Processing
Data is collected over several days or weeks and then processed all at one time, as a “batch”.
e.g. banks balance checking accounts after all the checks have been processed in a batch.

Real-time Processing
Records information immediately and responds to user requests at the time transactions occur.
e.g. when you use an ATM card to withdraw cash, the system automatically computes your account balance then and there.

Storage devices in Data Processing
Storage devices can be classified into two main types
online
offline

Online Storage
Data is directly accessible for processing
Storage medium physically connected to and controlled by  the central processing unit.
e.g. : Magnetic Disk, Magnetic Diskette, Optical Disk

Offline Storage devices
Data is not directly accessible for processing until tape or disk has been loaded onto an input device.
The use of removable computer media. Data not readily accessible to the CPU
e.g. : Magnetic tapes, Tape cartridges

Terminology
Data
Information
Database – File, Record, Field, Byte, Bit
Database System
Database Management System
Relational Database Management System

Data
Raw data which is unprocessed
Text, numbers and other symbolic representations, suitable for communication or interpretation
E.g. Sri Lanka – An island
1957 – FORTRAN was introduced
– Bachelor of Information Technology

Information
Processed data
An organised collection of facts or data

E.g. BIT – Bachelor of Information Technology,     External Degree
17 – Minimum age to register for the BIT

A database is a collection of information related to a particular subject or purpose, such as tracking customer orders or maintaining a music collection. If your database isn’t stored on a computer, or only parts of it are, you may be tracking information from a variety of sources that you’re having to co-ordinate and organise yourself.

Database
A collection of interrelated files

E.g. Course file, Student file, Results file

File
A collection of related records
E.g. BIT Courses (Modules)
IT1101: Mathematics for Computing I
IT1201: Fundamentals of Programming
……
IT2301: Database Management Systems (DBMS)

All data of the database may be in one file (simplest)
or
it may be in a number of files, depending on the way the database was designed and the data subsequently represented
E.g. Student information (name, address,   registration number, date of birth) and   course results for each module in one  file
or
Student information in one file and   course results in another file

When a DBMS works with many files, they are linked together as a database, so that they can be used as if they were a single file.

E.g. Student information and course results are linked using the student examination index number

Record
A group of related fields. Each field is a data item that is part of a larger record
All the student information required for the registration process (Student surname, initials, reference number, date of birth, address, examination centre,  entry requirements, registration number)
E.g. {Dias|A.B.|00A123|01/01/80|
10,Galle Rd,Colombo 3|
Colombo|O/L A/L|R001234}
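
As an illustration, the sample record above can be split into its fields. The field names in FIELD_NAMES are assumptions derived from the list of registration details; the "|" separator is taken from the example.

```python
# Hypothetical field names, following the registration details listed above
FIELD_NAMES = [
    "surname", "initials", "reference_no", "date_of_birth",
    "address", "exam_centre", "entry_requirements", "registration_no",
]

record_text = ("Dias|A.B.|00A123|01/01/80|"
               "10,Galle Rd,Colombo 3|"
               "Colombo|O/L A/L|R001234")

# A record is a group of related fields; each field is one data item
record = dict(zip(FIELD_NAMES, record_text.split("|")))

for name, value in record.items():
    print(name, "=", value)
```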

Field
Represents an attribute or a characteristic or a piece of information
It is a grouping of characters into a word, a group of words (text) or a complete number (group of digits) or symbolic representation.

E.g.  A.B. Dias – Name of a student
00A123 – Reference number of a student
20 – Age of a student

Byte
A single character (letter, number, symbol) is represented using a group of bits

E.g. 01001010 – letter J in ASCII

Bit
The smallest unit of data

E.g. 0 or 1

Database System
A computerised record keeping system that organises data into records in one or more databases / files.

Database Management System (DBMS)
DBMS
An application software that organises data into records in one or more databases and allows organising, accessing and sorting of the data in a variety of formats.
Relational DBMS
Most common type of DBMS. Data elements are stored in different tables made up of rows and columns. Relates data in different tables through the use of common data element(s).

Types of Files
Many types of files
Program files – files containing software instructions
Data files – files that contain data

Two commonly used data files
Master file – a file containing relatively permanent records and is generally updated periodically, e.g. address-label file
Transaction file – a temporary holding file that holds all changes to be made to the master file: additions, deletions, revisions, e.g. new enrolments
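
The relationship between the two files can be sketched as a batch update: the transaction file is read once and each change is applied to a copy of the master file. The keys, addresses and action names below are made-up sample data, not part of the course material.

```python
# Master file: relatively permanent records, keyed by a hypothetical student id
master = {"S001": "10 Galle Rd", "S002": "5 Kandy Rd"}

# Transaction file: additions, revisions and deletions waiting to be applied
transactions = [
    ("add", "S003", "7 Main St"),
    ("revise", "S001", "12 Galle Rd"),
    ("delete", "S002", None),
]

def apply_batch(master, transactions):
    """Return a new master file with every transaction applied in order."""
    updated = dict(master)
    for action, key, value in transactions:
        if action == "add":
            if key in updated:
                raise KeyError("record already exists: " + key)
            updated[key] = value
        elif action == "revise":
            if key not in updated:
                raise KeyError("no such record: " + key)
            updated[key] = value
        elif action == "delete":
            del updated[key]
        else:
            raise ValueError("unknown action: " + action)
    return updated

new_master = apply_batch(master, transactions)
print(new_master)
```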

Access Modes
Sequential Access
information is accessed in sequence. Sequential access storage (magnetic tape) stores data in sequence to support sequential access
Direct Access
information can be accessed directly. Direct access storage (disk) is required. Data may or may not be stored in sequence.

File Organisation
A technique for physically arranging the records of a file on secondary storage devices
Sequential file organisation
Direct / Random / Hashed file organisation
Index file organisation
index sequential file organisation
index non-sequential file organisation

Sequential file organisation
stores records in sequence, one after the other (used with magnetic tape)
Direct (random) file organisation
stores records in no particular sequence, and a record is retrieved according to its key field or unique element of data (used with hard disk)
Hashed file organisation
a direct file organisation which determines the location of the record using a hashing algorithm
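
A minimal sketch of hashed organisation, assuming integer keys and the simple division-remainder method (key modulo number of buckets). The bucket count, keys and names are illustrative only; real systems use more careful hash functions and overflow handling.

```python
NUM_BUCKETS = 7                          # assumed number of storage locations
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    """Hashing algorithm: map a key to a bucket number."""
    return key % NUM_BUCKETS

def store(key, record):
    buckets[bucket_for(key)].append((key, record))

def find(key):
    """Go straight to the bucket for the key, then search only that bucket."""
    for stored_key, record in buckets[bucket_for(key)]:
        if stored_key == key:
            return record
    return None

store(1234, "Dias")
store(1241, "Perera")                    # collides with 1234 (same bucket)
print(bucket_for(1234), bucket_for(1241), find(1241))
```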

Index
a table or other data structure used to determine the location of records in a file that satisfy some condition
Indexed file organisation
stores records either in sequence or non-sequence. However, the file in which the records are stored contains an index that lists each record by its key field and identifies its physical location on the disk. (requires magnetic disk)

Indexed sequential file organisation
stores records in sequential order and an index is created to locate individual records
Indexed non-sequential file organisation
stores records in non-sequential order and an index is created to locate individual records
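
The idea of an index can be sketched with an in-memory "file" of fixed-length records stored in arrival order (non-sequential) and a separate index that maps each key to a byte position. The record size, keys and names are assumptions for illustration.

```python
import io

RECORD_SIZE = 10                         # assumed fixed record length in bytes
records = [(30, b"Silva"), (10, b"Dias"), (20, b"Perera")]

data_file = io.BytesIO()                 # stands in for a file on disk
index = {}                               # key field -> byte offset in the file

for key, name in records:                # records are stored in arrival order
    index[key] = data_file.tell()
    data_file.write(name.ljust(RECORD_SIZE))

def read_record(key):
    """Use the index to jump directly to the record's location."""
    data_file.seek(index[key])
    return data_file.read(RECORD_SIZE).rstrip().decode("ascii")

print(read_record(20))                   # direct access through the index
for key in sorted(index):                # sequential access in key order
    print(key, read_record(key))
```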

Key Field
An important concept in data organisation is the key field.
Key field is a particular field that is chosen to uniquely identify a record so that it can be easily retrieved and processed.
The key field is often a number, e.g. an identification number (NID), customer account number, employee number
The primary characteristic of the key field is that it is unique

Keys
Two types of keys
Primary key and Secondary Key

Primary Key
One field or a combination of fields that uniquely identifies a record
Primary key can be natural (a field(s) of the record is usable as the primary key; e.g. NID of a person) or user defined (a new field is introduced as the primary key; e.g. empno of an employee)
Secondary Key
One field or a combination of fields for which more than one record may have the same combination of values
indexed files are used for this purpose
multiple indexes can be created for a single file

Examples of Databases
Three general types:
personal, public and company

Personal (microcomputer) database
consists of integrated files on a microcomputer that is used mainly by one person. It is a stand-alone database that is not on-line
e.g. Paradox, Access, dBASE, FoxPro, Alpha Five,
FileMaker Pro, Approach
Public database
an enormous compilation of data, any part of which is available for a fee to the public
e.g. Lexis for lawyers, news, weather, travel
Company database
a collection of records shared throughout a company or other organisation. Generally records are private. It may be located in one place (centrally) or may be distributed
e.g. multi-user RDBMS such as Oracle, MS-SQL

Traditional File Environment
Its origins go back to the 1950s, when traditional file management systems were used.
It is a way of collecting and maintaining data in an organisation that leads to each functional area or division creating and maintaining its own data files and programs

E.g. Payroll, Personnel and Benefit divisions each maintaining their own data files and programs

Types of Database Organisations
Many types of Database organisations have been used to manage databases (e.g. hierarchical, network, relational, object-oriented, object-relational)
Relational DBMS is the most widely used type of database organisation
Relational DBMS
relates or connects, data in different files through the use of a key field or common data element

Advantages of DBMS over the traditional file management systems
Sharing of data – same information is available for different users
Economy of files – less data redundancy (excess storage of data)
Data integrity – consistent data
Security – specific data can be limited to selected users
Flexibility – can respond to unanticipated information requirements in a timely fashion.

Relational Data Model
Relational model was proposed during the same period (1970)
but took some time for acceptance
delay in emergence of commercial products
e.g.  INGRES, Oracle, System R (late 70s)
Became dominant due to the complexity of programming, navigating and changing data structures in the older DBMS data models
Used non-procedural (declarative and non-navigational) language, e.g. SQL
The relational model of data has three major components:
Relational database objects
Relational operators
Relational integrity constraints

The Relational Objects
Table
A table is a collection of data about a specific topic, such as employees or suppliers. Using a separate table for each topic means you store that data only once, which makes your database more efficient and reduces data-entry errors. Tables organise data into columns (called fields) and rows (called records).
Data is presented to the user as tables
Tables are comprised of rows and a fixed number of named columns.
Columns are attributes describing an entity. Each column must have a unique name and a data type.
Rows are records that present information about a particular entity occurrence
Within the database, divide your data into separate storage containers called tables; view, add, and update table data using online forms; find and retrieve just the data you want using queries; and analyse or print data in a specific layout using reports

Each table has a primary key.  The primary key is a column or combination of columns that uniquely identify each row of the table.
A foreign key is a set of columns in one table that serve as the primary key in another table
data in different tables are related or connected through the use of a foreign key field or common data element
To store your data, create one table for each type of information you track. To bring the data from multiple tables together in a query, form, or report, you define relationships between the tables.
Index
An ordered set of pointers to the data in the table
A primary key must always have a unique index

Relational Operators
Relational operations are specified using Structured Query Language (SQL) — a standard for relational database access.
Relational operations are set level, meaning that they operate on multiple rows, rather than one record at a time.
SQL is non-procedural, meaning that the user specifies what data is to be retrieved rather than how to retrieve the data.
Each operator takes one or more tables as its operand(s) and produces a table as its result.
Any column value in a table can be referenced, not just keys.
Operations can be combined to form complex operations.
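
The points above can be tried with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table and column names (student, result, index_no) and the sample rows are invented for illustration. The query says what is wanted, the join takes two tables and produces one result table, and the condition refers to a non-key column (marks).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    index_no TEXT PRIMARY KEY,                    -- primary key
    name     TEXT NOT NULL
);
CREATE TABLE result (
    index_no TEXT REFERENCES student(index_no),   -- foreign key
    course   TEXT,
    marks    INTEGER,
    PRIMARY KEY (index_no, course)
);
INSERT INTO student VALUES ('00A123', 'A.B. Dias'), ('00A124', 'C.D. Perera');
INSERT INTO result VALUES ('00A123', 'IT1101', 72),
                          ('00A123', 'IT1201', 65),
                          ('00A124', 'IT1101', 58);
""")

# Set-level, non-procedural: join two tables, filter rows, order the result
rows = conn.execute("""
    SELECT s.name, r.course, r.marks
    FROM student AS s
    JOIN result AS r ON r.index_no = s.index_no
    WHERE r.marks >= 60
    ORDER BY r.course
""").fetchall()
print(rows)
```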

Relational Integrity Constraints
User defined integrity constraints can be enforced by the database server using triggers and stored procedures.
Stored procedures are explicitly executed (i.e. called) by a client application.  They are useful for encapsulating application logic in the database server.
Triggers are implicitly executed by the database server when a DELETE, UPDATE or INSERT SQL command is issued.  They are useful for enforcing user defined integrity constraints.
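
A hedged sketch of a trigger, using SQLite through Python's sqlite3 module (SQLite supports triggers but not stored procedures, so only the trigger is shown). The table, the trigger name and the rule "at most two courses per student" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE enrolment (index_no TEXT, course TEXT);

-- Fired implicitly by the database before every INSERT on enrolment
CREATE TRIGGER limit_courses
BEFORE INSERT ON enrolment
WHEN (SELECT COUNT(*) FROM enrolment WHERE index_no = NEW.index_no) >= 2
BEGIN
    SELECT RAISE(ABORT, 'a student may enrol in at most two courses');
END;
""")

def enrol(index_no, course):
    """Return True if the row was accepted, False if the trigger rejected it."""
    try:
        conn.execute("INSERT INTO enrolment VALUES (?, ?)", (index_no, course))
        return True
    except sqlite3.DatabaseError:
        return False

print(enrol("00A123", "IT1101"), enrol("00A123", "IT1201"),
      enrol("00A123", "IT2301"))
```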

Setting up a Database
Determine the DBMS and storage environment
Design the database and determine its name (filename for personal database)
Define the data and build the data dictionary
Create the database (keys, indexes, constraints, users, access rights) using the software
Enter and edit data (use forms)
Make views, queries, reports and labels
Build application programs

Designing the Database
Determine the contents of the database
Determine what data is needed and the best way to organise it
Determine the tables and their fields
Determine the primary key

Define the data
Define the field names, their type and width
Field name is the name that describes the data to be entered in a field, e.g. emp_no
Field type defines the type of data that will be stored in a field
alphanumeric, numeric, date, logical, memo
Field width has a defined limit on the number of characters it can hold (e.g. 255)
Numeric data may specify the number of decimal places

Naming Fields
Can be up to 64 characters long.
Can include any combination of letters, numbers, spaces, and special characters except a period (.), an exclamation point (!), an accent grave (`), and brackets ([ ]).
Can’t begin with leading spaces.
Can’t include control characters (ASCII values 0 through 31).
Including spaces in field names can produce naming conflicts in Visual Basic for Applications in some circumstances.
To avoid unexpected results, always use the ! operator instead of the . (dot) operator to refer to the value of a field.
e.g.  [TableName]![FieldName]
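
The naming rules listed above can be expressed as a small checker. This is only a sketch of the four rules as stated here; the function name is hypothetical and it does not claim to reproduce every check that Microsoft Access itself performs.

```python
FORBIDDEN = set(".!`[]")      # period, exclamation point, accent grave, brackets

def is_valid_field_name(name):
    """Apply the field-naming rules listed above to a candidate name."""
    if not 1 <= len(name) <= 64:               # up to 64 characters long
        return False
    if name[0] == " ":                         # no leading spaces
        return False
    if any(ch in FORBIDDEN for ch in name):    # no forbidden special characters
        return False
    if any(ord(ch) < 32 for ch in name):       # no control characters (0 to 31)
        return False
    return True

for candidate in ["emp_no", "First Name", " name", "a.b", "x" * 65]:
    print(repr(candidate[:12]), is_valid_field_name(candidate))
```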

Define the data type
Alphanumeric (String) fields [TEXT, CHAR, VARCHAR]- consists of letters, numbers and special characters; you can’t perform numerical calculations using these fields
Numeric fields [NUMBER, INTEGER, DECIMAL, FLOAT]- consists of numbers for calculations; you can’t store text in these fields
Currency fields [CURRENCY, MONEY]- prevent rounding off during calculations
Date fields [DATE/TIME, DATE]- consists of day, month, year
Logical fields [TRUE/FALSE, LOGICAL]- consists of two states (true/false; yes/no; on/off)
Sequence fields [AutoNumber]- Unique sequential (incrementing by 1) or random numbers automatically inserted when a record is added.
Memo fields [MEMO]- consists of lengthy amounts of text and numbers, such as notes or descriptions
Binary fields [OLE Object, RAW]- binary data such as images
OLE Object – Objects (such as Microsoft Word documents, Microsoft Excel spreadsheets, pictures, sounds, or other binary data), created in other programs using the OLE protocol, that can be linked to or embedded in a Microsoft Access table
Hyperlink – Field that will store hyperlinks to access web pages
Lookup Wizard – Creates a field that allows you to choose a value from another table or from a list of values using a combo box. Choosing this option in the data type list starts a wizard to define this for you.

Data Types and Limits
Data type       Size
Text            Up to 255 characters
Memo            Up to 64,000 characters
Number          1, 2, 4, or 8 bytes
Date/Time       8 bytes
Currency        8 bytes
AutoNumber      4 bytes
Yes/No          1 bit
OLE Object      Limited by disk space
Hyperlink       Up to 64,000 characters
Lookup Wizard   4 bytes

Predefined Data Type Formats
Number
Setting          Data         Display
General Number   -3456.789    -3456.789
Currency         -3456.789    ($3,456.79)
Fixed            -3456.789    -3456.79
Standard         3456.789     3,456.79
Percent          0.45         45%
Scientific       -3456.789    -3.46E+03
Date/Time
Setting                      Display
ddd", "mmm d", "yyyy         Mon, Jun 2, 1997
mmmm dd", "yyyy              June 02, 1997
"This is week number "ww     This is week number 22
"Today is "dddd              Today is Tuesday
Yes/No
display in place of Yes, True, or On values
display in place of No, False, or Off values
Text / Memo
Symbol   Description
@        A character or a space is required.
&        Text character is not required.
<        Force all characters to lowercase.
>        Force all characters to uppercase.

You can also create a custom display format for all data types except the OLE Object data type

Selecting the Data Type
Minimise storage space
decide on the storage space required to use for values in the field, e.g. 2 byte integers for marks
Represent all possible values
eliminate illegal values, e.g. can’t store text in numeric fields
Improve data integrity
further restrict possible values, e.g. marks > 0
Support all data manipulations
e.g. numeric for arithmetic calculations; sorting, grouping, indexing operations
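
A possible way to see "eliminate illegal values" and "further restrict possible values" in practice is a column type plus a CHECK constraint, shown here with SQLite via Python's sqlite3 module. The table, the column names and the 0 to 100 range for marks are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        index_no TEXT NOT NULL,
        course   TEXT NOT NULL,
        marks    INTEGER NOT NULL
                 CHECK (typeof(marks) = 'integer' AND marks BETWEEN 0 AND 100)
    )
""")

def try_insert(marks):
    """Return True if the value is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO result VALUES ('00A123', 'IT1101', ?)", (marks,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(72), try_insert(-5), try_insert("seventy"))
```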

Creating a Database
Choose file, new database or choose the file new database button in the toolbar
Choose blank database and choose ok
In the save in list box, select the desired drive & folder
Enter the file name & select create

Database window
The window that appears when you open a Microsoft Access database. It contains Tables, Queries, Forms, Reports, Macros, and Modules tabs that you can click to display a list of all objects of that type in the database.

Create a table
In database window click ‘table’ tab & choose new
Select design view & click ok
Enter field names & field types
Apply the primary key (if you want)
Close & save the table
Datasheet View •  Design View
Table Wizard  •  Import Table
Link Table

Datasheet View
Enter data directly into a blank datasheet. When you save the new datasheet, Microsoft Access will analyse your data and automatically assign the appropriate data type and format for each field. Data from a table, form or query is displayed in a row-column format.
In table Datasheet view, you can add, edit, or view the data in a table. You can also check the spelling and print your table’s data, filter or sort records, change the datasheet’s appearance, or change the table’s structure by adding or deleting columns

Design View
Use Design view to specify all of your table details (field names, data types, primary key) from scratch.
In table Design view, you can create an entire table from scratch, or add, delete, or customise an existing table’s fields.

Relationships
An association established between common fields (columns) in two tables.
A relationship can be
one-to-one,
one-to-many, or
many-to-many.

A one-to-many relationship
The most common type of relationship.
A record in Table A can have many matching records in Table B, but a record in Table B has only one matching record in Table A.

A many-to-many relationship
A record in Table A can have many matching records in Table B, and a record in Table B can have many matching records in Table A.
It is really two one-to-many relationships with a third table.

A one-to-one relationship
Each record in Table A can have only 1 matching record in Table B, & each record in Table B can have only 1 matching record in Table A.
This type of relationship is not common. You might use it to divide a table with many fields, to isolate part of a table for security reasons, or to store information that applies only to a subset of the main table.

Define Relationships
The kind of relationship depends on how the related fields are defined:
A one-to-many relationship is created if only one of the related fields is a primary key or has a unique index.
A one-to-one relationship is created if both of the related fields are primary keys or have unique indexes.
A many-to-many relationship is really two one-to-many relationships with a third table whose primary key consists of two fields – the foreign keys from the two other tables.
After you’ve set up different tables for each subject in your database, you need a way of telling how to bring that information back together again.
The first step in this process is to define relationships between your tables.
After you’ve done that, you can create queries, forms, and reports to display information from several tables at once.
You define a relationship by adding the tables you want to relate to the Relationships window, and
then dragging the key field from one table and dropping it on the key field in the other table.
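
Outside the Relationships window, the same idea can be sketched in SQL. The example below (SQLite through Python's sqlite3 module) models a many-to-many relationship as two one-to-many relationships with a third table whose primary key is made of the two foreign keys. All table names, column names and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id  TEXT PRIMARY KEY, title TEXT);
-- Third (junction) table: its primary key is the pair of foreign keys
CREATE TABLE enrolment (
    student_id INTEGER REFERENCES student(student_id),
    course_id  TEXT    REFERENCES course(course_id),
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Dias'), (2, 'Perera');
INSERT INTO course VALUES ('IT1101', 'Mathematics for Computing I'),
                          ('IT1201', 'Fundamentals of Programming');
INSERT INTO enrolment VALUES (1, 'IT1101'), (1, 'IT1201'), (2, 'IT1101');
""")

def courses_of(name):
    """One student can have many courses."""
    rows = conn.execute(
        "SELECT e.course_id FROM student s "
        "JOIN enrolment e ON e.student_id = s.student_id "
        "WHERE s.name = ? ORDER BY e.course_id", (name,))
    return [row[0] for row in rows]

def students_on(course_id):
    """One course can have many students."""
    rows = conn.execute(
        "SELECT s.name FROM enrolment e "
        "JOIN student s ON s.student_id = e.student_id "
        "WHERE e.course_id = ? ORDER BY s.name", (course_id,))
    return [row[0] for row in rows]

print(courses_of("Dias"), students_on("IT1101"))
```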

Integrity Controls
Restrict Update, e.g. a customer ID can only be changed if it is not found in the Order table
Cascaded Update, e.g. changing a customer ID will result in that value changing to match in the Order table
Set Null Update, e.g. when a customer ID is changed, any customer ID in the Order table which matches the old customer ID is set to NULL
Similar integrity controls can be defined for Delete
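
The three update rules can be compared side by side with SQLite's ON UPDATE clause, via Python's sqlite3 module. This is a sketch: the table and column names are assumed, and foreign-key enforcement must be switched on explicitly in SQLite with PRAGMA foreign_keys.

```python
import sqlite3

def build(rule):
    """Create customer and orders tables linked with the given update rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # enforcement is off by default
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer(customer_id)"
        " ON UPDATE " + rule + ")")
    conn.execute("INSERT INTO customer VALUES (1)")
    conn.execute("INSERT INTO orders VALUES (100, 1)")
    return conn

def change_customer_id(rule):
    """Change customer 1 to 2 and report what happens to the matching order."""
    conn = build(rule)
    try:
        conn.execute("UPDATE customer SET customer_id = 2 WHERE customer_id = 1")
    except sqlite3.IntegrityError:
        return "rejected"
    return conn.execute("SELECT customer_id FROM orders").fetchone()[0]

for rule in ("RESTRICT", "CASCADE", "SET NULL"):
    print(rule, "->", change_customer_id(rule))
```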

Appending, Updating Records
Once tables and views have been created, it is necessary to populate them with data and maintain those data before queries and reports can be written
Use Datasheet or Form view
Datasheet
Form View
Form view is a window that usually displays one or more whole records. Form view is the primary means of adding and modifying data in tables.
To easily view, enter, and change data directly in a table, create a form. When you open a form, Microsoft Access retrieves the data from one or more tables and displays it on screen using the layout you chose in the Form Wizard or using a layout that you created from scratch.
Form View
Forms focus on one record at a time, and they can display fields from multiple tables, pictures, and more.

Query
To find and retrieve just the data that meets conditions you specify, including data from multiple tables, create a query. A query can also update or delete multiple records at the same time, and perform built-in or custom calculations on your data.
You use queries to view, change, and analyse data in different ways. You can also use them as the source of records for forms and reports.

Query-By-Example (QBE)
By using a query we can select the records for a given condition(s)
QBE is the most widely available direct-manipulation database query language.
It is not an international standard like SQL
It is easy to learn for a wide variety of people wanting to make inquiries against a database
Database systems like Microsoft Access translate QBE queries into SQL

Create a Query
In the database window click the ‘Query’ tab & choose New
Select design view & click ok
Select the table or query that you want
Attach the  field names with criteria (if any)
close & save the Query
There are several types of queries
Select query       (Simple/Single-table / Multiple-table)
Cross tab query
Parameter query
Self-join query
Query based on another query
Others
SQL query
(union, pass-through, data-definition, sub-query)
Action query
(make-table, delete, append, update)
Auto Lookup query

Simple query
Single Table
Selected Columns
Selected Rows (Filter using criteria)
Sort order
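
In SQL terms, a simple query of this kind corresponds to a SELECT on one table with a column list, a WHERE clause and an ORDER BY clause. The sketch below uses SQLite through Python's sqlite3 module; the product table and its rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, category TEXT, price REAL);
INSERT INTO product VALUES ('Milk', 'Dairy', 1.20),
                           ('Cheese', 'Dairy', 4.50),
                           ('Tea', 'Beverages', 2.75);
""")

rows = conn.execute("""
    SELECT name, price            -- selected columns
    FROM product                  -- single table
    WHERE category = 'Dairy'      -- selected rows (criteria)
    ORDER BY price DESC           -- sort order
""").fetchall()
print(rows)
```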

Multiple-table query

Cross tab query
calculate a sum, average, count, or other type of total for data that is grouped by two types of information
— one down the left side of the datasheet and another across the top.

Cross tab query
You create a crosstab query with a wizard or from scratch in the query design grid. In the design grid, you specify which field’s values become column headings, which field’s values become row headings, and which field’s values to sum, average, count, or otherwise calculate.
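
A crosstab result can be approximated in standard SQL by grouping on the row heading and using one conditional sum per column heading. The sketch below runs on SQLite through Python's sqlite3 module; the sales table, its values and the fixed quarter columns are assumptions for illustration and this is not the syntax Access generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER);
INSERT INTO sales VALUES ('North', 'Q1', 100), ('North', 'Q2', 150),
                         ('South', 'Q1', 80),  ('North', 'Q1', 20);
""")

# Row headings: region (down the left side); column headings: quarter (across the top)
rows = conn.execute("""
    SELECT region,
           SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1,
           SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print("region  Q1  Q2")
for region, q1, q2 in rows:
    print(region, q1, q2)
```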

Parameter query
Displays one or more predefined dialog boxes that prompt you for the parameter value (criteria). You can also create a custom dialog box that prompts for the query’s parameters.
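
The closest equivalent outside Access is a query with a placeholder whose value is supplied at run time. A minimal sketch with Python's sqlite3 module follows, where "?" marks the parameter; the table, rows and function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, category TEXT);
INSERT INTO product VALUES ('Milk', 'Dairy'), ('Cheese', 'Dairy'),
                           ('Tea', 'Beverages');
""")

def products_in(category):
    """Run the same query with a different criteria value each time."""
    rows = conn.execute(
        "SELECT name FROM product WHERE category = ? ORDER BY name",
        (category,))
    return [row[0] for row in rows]

print(products_in("Dairy"))
print(products_in("Beverages"))
```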

Action query
An action query is a query that makes changes to many records in just one operation.

There are four types of action queries:    delete, update, append, and make-table.

Delete query

Deletes a group of records from one or more tables.

E.g. remove products that are discontinued.

Update query
Makes global changes to a group of records in one or more tables.

E.g. raise prices by 10 percent for all dairy products.

Append query
Adds a group of records from one or more tables to the end of one or more tables.

E.g. suppose that you acquire some new customers and a database containing a table of information on those customers (Germany Customers). To avoid typing all this information in, you’d like to append it to your Customers table.
Append queries are also helpful for:
Appending fields based on criteria.
E.g. you might want to append only the names and addresses of customers with outstanding orders.
Appending records when some of the fields in one table don’t exist in the other table.
E.g. Customers table has 11 fields. Suppose that you want to append records from another table that has fields that match 9 of the 11 fields in the Customers table. An append query will append the data in the matching fields and ignore the others

Make-table queries
Make-table query
Creates a new table from all or part of the data in one or more tables.

Make-table queries are helpful for:
Creating a table to export to other databases.
E.g., you might want to create a table that contains several fields from your Employees table, and then export that table to a database used by your personnel department.

Creating reports that display data from a specified point in time.
E.g., suppose you want to print a report on 15-May-01 that displays the first quarter’s sales totals based on the data that was in the underlying tables as of 9:00 A.M. on 1-Apr-01.
To preserve the data exactly as it was at 9:00 A.M. on 1-Apr-01 , create a make-table query at that point in time to retrieve the records you need and store them in a new table. Then use this table, rather than a query, as the basis for the reports.
Making a backup copy of a table.
Creating a history table that contains old records.
E.g., you could create a table that stores all your old orders before deleting them from your current Orders table.
Improving performance of forms and reports based on multiple-table queries.
E.g. suppose you want to print multiple reports that are based on a five-table query that includes totals. You may be able to speed things up by first creating a make-table query that retrieves the records you need and stores them in one table. Then you can base the reports on this table, so you don’t have to rerun the query for each report.
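
The four action queries have rough SQL counterparts, sketched below with SQLite through Python's sqlite3 module. The tables, columns and rows are invented for illustration, and the statements are generic SQL rather than what Access itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, category TEXT, price REAL, discontinued INTEGER);
INSERT INTO product VALUES ('Milk', 'Dairy', 2.0, 0),
                           ('Cheese', 'Dairy', 4.0, 0),
                           ('Old Tea', 'Beverages', 3.0, 1);
CREATE TABLE new_product (name TEXT, category TEXT, price REAL, discontinued INTEGER);
INSERT INTO new_product VALUES ('Coffee', 'Beverages', 5.0, 0);

-- Make-table: keep a backup copy before changing anything
CREATE TABLE product_backup AS SELECT * FROM product;

-- Delete: remove products that are discontinued
DELETE FROM product WHERE discontinued = 1;

-- Update: raise prices by 10 percent for all dairy products
UPDATE product SET price = ROUND(price * 1.10, 2) WHERE category = 'Dairy';

-- Append: add the records of another table to the end of this one
INSERT INTO product SELECT * FROM new_product;
""")

current = conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM product_backup").fetchone()[0]
print(current, backup_count)
```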

Auto Lookup queries
You can design a multiple-table query to automatically fill in certain field values for a new record. When you enter a value in the join field in the query or in a form based on the query, Microsoft Access looks up and fills in existing information related to that value.
E.g. if you know the value in the join field between a Customers table and an Orders table (typically, a customer identifier such as CustomerID), you could enter the customer ID and have Microsoft Access enter the rest of the information for that customer (Customer Name, Address). If no matching information is found, Microsoft Access displays an error message when the focus leaves the record.

Reports
To analyse your data or present it a certain way in print, create a report. E.g., you might print one report that groups data and calculates totals, and another report with different data formatted for printing mailing labels. Charts can be inserted in reports.
A report is an effective way to present your data in a printed format. Because you have control over the size and appearance of everything on a report, you can display the information the way you want to see it.
Most of the information in a report comes from an underlying table, query, or SQL statement, which is the source of the report’s data. Other information in the report is stored in the report’s design.

Charts
In most cases, you use the Chart Wizard to create a chart. The Chart Wizard will determine from the data you specify whether it should display data from all fields in one global chart, or whether it is more appropriate to show a record-bound chart, so that when you move from record to record you see a chart that represents only the data in the current record.

Chart Wizard
Choose a table or query and its fields.
Choose a chart that will appropriately display the fields selected

Forms
You can use a form for a variety of purposes:
data-entry, dialog box, switchboard
Most of the information in a form comes from an underlying record source. Other information in the form is stored in the form’s design.
You create the link between a form and its record source by using graphical objects called controls. The most common type of control used to display and enter data is a text box.

Sub-Forms
A subform is a form within a form. The primary form is called the main form, and the form within the form is called the subform. This type of form is used to display master and detail records.
A form/subform combination is often referred to as a hierarchical form, a master/detail form, or a parent/child form.
The main form and subform in this type of form are linked so that the subform displays only records that are related to the current record in the main form.
For example, when the main form displays the Beverages category, the subform displays only the products in the Beverages category.
A subform can be displayed as a datasheet, as in the preceding illustration, or it can be displayed as a single or continuous form.
A main form can only be displayed as a single form.
A main form can have any number of subforms if you place each subform on the main form.
You can also nest up to two levels of subforms. This means you can have a subform within a main form, and you can have another subform within that subform.
For example, you could have a main form that displays customers, a subform that displays orders, and another subform that displays order details.

Sorting
Arrange the records in a given order (ascending or descending), using one or more key fields.
E.g. Department name
Salary within Department name
Sort order
The order in which records are displayed – either ascending (A to Z or 0 to 100) or descending (Z to A or 100 to 0).
You can sort records in Form view of a form or subform, or in Datasheet view of a table, query, form, or subform, even if a filter is already applied.
You can also sort filtered data by specifying a sort order in the Advanced Filter/Sort window, or sort a query’s results by specifying a sort order in query Design view.

Filtering
A set of criteria applied to records in order to show a subset of the records.
Select the records for given condition(s).
E.g. Employees in ‘Computer’ Department
Employees who are earning more than “$800”
Microsoft Access has four kinds of filters: Filter By Selection, Filter By Form, Advanced Filter/Sort, and Filter For Input.
The Advanced Filter/Sort window is a window in which you can create a filter from scratch.
You enter criteria expressions in the filter design grid to restrict the records in the open form or datasheet to a subset of records that meet the criteria.
You can also specify a sort order on one or more fields in the design grid.

Introduction to Macros
A macro is a set of one or more actions that each perform a particular operation, such as opening a form or printing a report.
Macros can help you to automate common tasks.
e.g., you can run a macro that prints a report when a user clicks a command button.
Enter the actions you want to carry out
Action arguments (object name, data to use, filter name, condition)

Introduction to Modules
A module is a collection of logically related program statements (e.g. Visual Basic for Applications declarations and procedures) that are stored together as a unit.
Each module has only a single function, which limits the module’s size and complexity.
Each procedure in a module can be a Function procedure or a Sub procedure

Class Modules
Form and report modules are class modules that are associated with a particular form or report.
Form and report modules often contain event procedures that run in response to an event on the form or report.
You can use event procedures to control the behavior of your forms and reports, and their response to user actions such as clicking the mouse on a command button.

Standard Modules
Standard modules contain general procedures that aren’t associated with any other object and frequently used procedures that can be run from anywhere within your database.

Introduction to Wizards
A wizard is a tool that helps the user to do specific tasks.
It asks you questions and takes you through the process step by step, using dialog boxes.
It creates an object according to your selections.
e.g. you can create a table using the Table Wizard
i.e., choose the fields for your table from a variety of sample tables and sample fields such as business contacts or household inventory.

Prototyping Database Applications
A prototype database application, including database design, tables, menus, forms, queries and reports can be created using its application development environment.
To build a robust application, macros need to be converted to modules (e.g. Access macros to VBA modules).
Modules are needed to accomplish complex functionality, include error handling, ease maintenance and gain more program control.

Multimedia
Multimedia refers to technology that presents information in more than one medium, including text, graphics, animation, video, music and voice
A single digital presentation can contain all media
Multimedia has added greater depth and variety to presentations

Multimedia Features
Text
Colour
Powerful Graphics
Animation
Sound (Stereo)
Video
Voice
Music

Allow users to play games or perform interactive learning
Interactive – user controls the direction of a program or presentation on the storage medium
1st multimedia PC introduced in 1991

Multimedia PC (MPC)
An MPC machine is a multimedia personal computer that adheres to standards set by the Multimedia PC (MPC) Marketing Council.

The council is made up of h/w and s/w companies including Intel, Microsoft, IBM, NEC, and Fujitsu.

The MPC Marketing Council has published standards for multimedia PC h/w specifications, which are an extension of a desktop PC configuration.
(e.g. MPC Level 1 in 1990, MPC Level 2 in 1993)

Requirements for Multimedia Applications
Processor
RAM
Colour Display
Pointing Device
Keyboard
CD-ROM drive
Sound Board
Hard-disk drive
Floppy-disk drive
Ports
Software

Processor
The central processor has a numerical name that indicates the basic type (e.g. Pentium) and speed of the processor (e.g. 200MHz)
The more powerful the processor, the faster the multimedia computer will respond.

Random Access Memory (RAM)
It is the main memory at the heart of the computer, in which multimedia programs execute. RAM is measured in megabytes (MB).
Since multimedia objects are big, a minimum of 32 MB is required; 64 MB works well.
64 MB or 128 MB is recommended for large programs like Windows 2000 and for quick hype.

Colour Display (Monitor)
Produces video output that includes colour, graphics, animation and video
Digital video /graphics adapter is used for video output
14” colour monitor is good for multimedia with 640 x 480 pixels on the screen.
Colour Display: Most important is the number of colours the system unit can display.
SVGA monitor can display 256 simultaneous colours chosen from more than 16 million colours.

Pointing Device
The mouse is the pointing device on multimedia computers and is used to engage the interactive parts of the multimedia program.
Alternatives include mouse pens, which let you write with a stylus instead of dragging the mouse; trackballs, which let you spin a ball; and the TrackPoint mounted in the center of the keyboard on IBM notebook computers.

Other Hardware
Standard Key-board (e.g. 101-key keyboard) and two-button mouse.

Multimedia Tools and Devices
They give the multimedia computer the ability to make sound, play music and record movies.

CD-ROM Drive
Audio-input devices and output
Video-input devices and output
Electronic cameras

CD–ROM Drive
Early CD–ROM drives could read computer data but did not have the audio circuitry needed to make sound.
CD–ROM is an evolving technology that keeps improving: after triple- and quadruple-speed drives, 48x drives are now common, and 50x drives are available too.

Digital Audio
A multimedia computer requires waveform audio hardware to record and play back digital audio files.
An 8-bit sound card produces a dynamic range of about 50 dB, whereas a 16-bit sound card increases the dynamic range to about 98 dB (dB = decibel, a measurement of loudness).
The greater the dynamic range, the more faithful the sound reproduction.
A sound board (e.g. AdLib, Microsoft Windows Sound System, Sound Blaster Pro) works with both digital and synthesized sound.
Digital sound: a sound board converts analog sound (e.g. from a microphone) into digital form. When playing back through speakers or headphones, it converts digital sound back into analog.
Synthesized sound is created by the computer to simulate the sound of musical instruments.

Audio Output
A pair of audio speakers is needed to listen to stereo sound
Sound output is through a digital audio card
Speakers or headphones are used for Audio output.
The audio output types are sound, voice, and music.

MIDI
Musical Instrument Digital Interface (MIDI) was invented to provide a means for music keyboards, synthesizers and computers to communicate with each other.
A MIDI synthesizer or keyboard that follows the General MIDI specification uses a standardized set of instrument sounds.
Unlike waveform data, which stores actual digitized sound, a MIDI file contains a series of 3-byte key-on and key-off messages.
A conventional MIDI system, cassette deck, CD player, keyboard instrument or microphone can be used for audio input.
Using an audio board or a MIDI board, analog sounds from these devices are digitised.

Digital Video
It is a combination of sound, video and animation.
This requires massive disk space and faster drives and processors, because video playback has to run at 30 frames per second to achieve TV quality.
Software (e.g. MS Video for Windows) is used to display digital movie clips, and frame-grabber cards are used to convert video footage to digital files.

Video Input
Conventional VCR and Camcorder (still-video camera) can be used for video input
Image Scanner can be used to input text and images
Using a video card (frame-grabber card or full-motion video card) analog signals from these devices are converted to digital form.

Users / Applications
Use is becoming more common in business, the professions and entertainment, education, training, sales presentations as a means of improving the way information is communicated
Used by telephone, cable, broadcasting and entertainment companies
Various kinds of information and entertainment are delivered to homes through wired and wireless communication lines

Multimedia Applications
Home/library: Education, Information, Entertainment, Reference
Education: Interactive learning, Simulation, Reference
Business: Training, Education, Retail sales, Simulation, Visual/audio catalogues, Business presentations
Government: Public information access, Department information, Tourism, Surveys

Introduction to CAD/CAM
Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM) are widely used in manufacturing industries for both design and manufacturing tasks

Introduction to CAD
Engineers, architects and other designers use the computer to assist in designing products and structures by generating drawings of objects on a graphic display screen and then manipulating them until a final design is obtained.
One of the first applications of CAD was to automate the drafting process, which is the process of creating drawings of products.
Special terminals with very high resolution graphics are used to produce drawings of products they are designing.
On the terminal the user has options
of moving the drawing,
turning it to see all sides,
changing its size or perspective,
enlarging a portion of it to display more details,
and other options as well.
In most cases, hard copy can also be produced.
The designer can easily make changes to the design without having to completely redraw the entire drawing.
Changing a design, which once required hundreds of engineering and drafting hours, can now be done in seconds.

Basic Features of CAD
Technology allows three-dimensional objects to be displayed in a wide variety of colours.
The objects may then be turned, rotated or displayed in a variety of sizes or angles.
Furthermore, using mathematical modelling, the engineer can subject the object to mechanical stresses, heat, motion and pressure, and observe the results on the graphic terminal display.
CAD programs allow users to do “what if” overhauls of designs using parametrics.
Provide layers like transparent and overlapping sheets.
Allow automatic dimensioning and ensure greater accuracy than traditional hand-drafting methods.

Applications of CAD
Computers and computer graphics are being used to design products ranging from automobiles and aeroplanes to buildings and electronic circuits.
E.g. Designing shock absorbers
The shock absorber and wheel are drawn by a computer program. The program can then simulate the wheel and tire moving on different surfaces and watch the effect the movement will have on the shock absorber and tire. Better designs are developed faster and less expensively using modelling and simulation capabilities.
E.g. Designing your office
Start by measuring your office space and enter the dimensions into the program to create a diagram of the exterior walls.
Next, add panels to divide the space into cubicles or work areas.
Now you are ready to furnish your new spaces using nearly 200 symbols representing desks, tables, chairs, computers, even plants.
Each symbol can be labelled for a particular occupant, department, manufacturer, price and size
When finished, you can give the diskette or print to your architect or builder to modify into working drawings or create lists of furniture and equipment to be purchased.
E.g. Automobile brake assembly
A designer can draw in three dimensions, rotate the figure on the screen and view the brake system
E.g. Monitoring Pressure / Depth
Use of colour and three-dimensional effect with high resolution graphics allows intricate and complicated drawings to be placed on the computer screen.

CAD Programs
E.g.
Autosketch   EasyCAD2
TurboCAD

These programs include libraries of options such as
–  cabinetry, furniture, fixtures for office designs
–  trees, shrubs and vegetables for landscaping designs

CADD
Application programs that help people to do drafting are called Computer-Aided Design and Drafting (CADD) programs
E.g.
AutoCAD   Microstation

These programs include symbols (points, circles, straight lines, arcs) that help the user put together graphic elements, such as the floor plan of a house

Applications for Professionals
Architects
Plan views, elevations, sectional elevations, perspectives, 3D rendering and walkthrough
Civil Engineers
Structural drawings, plumbing, mapping, highway, contouring
Surveyors
Plotting survey plans, longitudinal and cross sections, contouring, mapping
Electrical Engineers
Control schematics, connection diagrams, printed circuit boards
Electronic Engineers
Schematic diagrams, printed circuit boards
Mechanical Engineers
Machine design, processes, sheet metal layouts, tooling and fixtures, robotics, plant layout

Introduction to CAM
CAD/CAM software allows products designed with CAD to be input into an automated manufacturing system that makes the product.
The manufacturing process is done by robots under the control of computers.
Industrial robots are machines, operating under the control of a computer and related software, that are designed to perform repetitive manufacturing and operational tasks required by a company.
e.g. welding, drilling and material handling

Features of CAM
Robots are ideal for performing repetitive tasks, hour after hour, in a hostile environment, with the last operation being performed with the same precision as the first.
Robots are equipped with arms, hands, optic sensors, etc.
Robots are used for a variety of tasks which either can be performed by a robot more precisely and with more consistency than a human being can perform them, or which are performed in an environment which may endanger the health of a person.

Applications of CAM
E.g. Manufacturing Automobiles
design the manufacturing process of an automobile and use robots for welding of moving car bodies in automobile industry
E.g. Manufacturing Electronic Circuits
Design electronic circuit boards and using a moving robot arm, place electronic components on a printed circuit board
E.g. Manufacturing Garments
Design the garments for the fashion industry. Input designs and specifications into CAM system that enable robot pattern-cutters to automatically cut thousands of patterns from fabric, with only minimal waste.

Introduction to PC Networks & Internet

Evolution of Networks – Pre-Network Environment
All computers were big, expensive, and difficult to maintain.
Concept of the Computer Centre.
Manual method for transportation of data.
Multi-user systems
Number of terminals attached to the same computer.
Remote terminals connected through telecom links
First step in the integration of communications and computers

Evolution… – Islands of Automation
Use of ICs in producing computers made them
less expensive
smaller in size
specialized in usage.
Large organizations became able to procure different computers for their different divisions such as accounts, engineering, sales, etc.
This paved the way to realize the need of getting them interconnected.

Evolution … – Early Computer Networks
Proprietary nature.
Tied purchasers to one brand.
Inter-operability among the different brands was not addressed.

Evolution … – ARPANET
Advanced Research Projects Agency Network.
Originated by the United States Department of Defense.
Built to interconnect the computing facilities then available at government institutions of the United States.
Could interconnect computers of different brands.
Introduction of TCP/IP (Transmission Control Protocol/Internet Protocol).
ARPANET is the root of today’s Internet.

Evolution… – Heterogeneous Networks
Development of Interoperability standards
Computers moving data quickly between dissimilar computers

Evolution of Networks – Global Networks
The Internet.
Networks used by Banks.
Networks used by Airlines.
Networks used by Multinational companies.

The Concept of Networking
The idea of networking has been around for a long time and has taken on many meanings. If you were to look up “network” in your dictionary, you might find any of the following definitions:
An openwork fabric; netting
A system of interlacing lines, tracks, or channels
Any interconnected system; for example, a television-broadcasting network
A system in which a number of independent computers are linked together to share data and peripherals, such as hard disks and printers

Introducing Computer Networking
At its most elementary level, a computer network consists of two computers connected to each other by a cable that allows them to share data.
Computer networking arose as an answer to the need to share data in a timely fashion.
Personal computers are powerful tools that can process and manipulate large amounts of data quickly, but they do not allow users to share that data efficiently.
When they are used without networking, it is known as “working in a stand-alone environment.”

Sharing of Information among the Stand-alone computers

Types of Computer Networks
Computer networks are classified into one of two groups, depending on their size and function.
A local area network (LAN) is the basic building block of any computer network. A LAN can range from simple (two computers connected by a cable) to complex (hundreds of connected computers and peripherals throughout a major corporation).
The distinguishing feature of a LAN is that it is confined to a limited geographic area.
A Wide Area Network (WAN), on the other hand, has no geographical limit.
It can connect computers and other devices on opposite sides of the world.
A WAN is made up of a number of interconnected LANs. Perhaps the ultimate WAN is the Internet.

Why Use a Computer Network?
With the availability and power of today’s personal computers, you might ask why networks are needed. From the earliest networks to today’s high-powered Personal Computers, the answer has remained the same: networks increase efficiency and reduce costs. Computer networks achieve these goals in three primary ways:
Sharing information (or data)
Sharing hardware and software
Centralizing administration and support

Network Applications
Traditional Network Applications
Telnet
Users desiring to connect to a remote system and interact with any of the various servers on a remote system can use the Telnet command.
Rlogin
Users who want to run commands interactively on a remote computer can use this facility, which enables the user to log on to the remote system.

FTP
File transfer between remote systems is enabled through FTP.
E-mail
Enables users to send messages to others without the sender knowing where the receiving host is.

Modern Network Applications
Most of the currently used applications on the networks are based on the WWW platform.

PC Networking Environment…
PC networks are formed by interconnecting PCs into a Local Area Network.
Each PC should contain a Network Interface Card.
The port available on the Network Interface Card is used to connect the PC onto the cabling system forming the network.
The peripheral devices attached to the PCs can be shared on this kind of networks.

Servers — Computers that provide shared resources to network users.
Clients — Computers that access shared network resources provided by a server.
Media — The wires that make the physical connections
Shared data — Files provided to clients by servers across the network.
Shared printers and other peripherals — Additional resources provided by servers.
Resources — Any service or device, such as files, printers, or other items, made available for use by members of the network.

Two Broad Categories of Networks
Peer-to-peer Network
Server-based Network

Other Networking Devices
Modem – A device used to provide remote connectivity over the Telephone lines.
Routers – Used to connect one LAN to another or to connect a LAN to the Internet.

PC Communications Process
A PC can send data to and receive data from other computers using PC Communications s/w.

The steps in the communications s/w process:
Connect: open the phone line circuit and get a dial tone
Dial up: place a call to the host computer
Handshake and log on: make the connections and establish a session
File transfer: send or receive data to or from the host computer
Disconnect: hang up the phone line and reset the communications software to place or receive the next communication

What is the Internet ?
The Internet is a worldwide collection of computer networks, cooperating with each other to exchange information using a common software standard.
Through telephone lines, satellite links and the other data cables the networks and the computers on the Internet are connected together.
The Internet’s basic job is to organize and share information between computers, regardless of the …..
Location
Computer
Information
Transmission
Software

People Use the Internet to :
Publish information
Get information
Buy things

Kinds of Information Available
Text documents
Graphics files (digitized photographs and artwork)
Sound and video files
Downloadable software
etc.

Internet Services
Email
Audio
Video
User groups
World Wide Web
Conferencing
Bulletin Boards
Virtual Reality

World Wide Web
Web Page – A file that contains text, images, etc. and the links to the other pages, sites, etc.
HTML – Hyper Text Markup Language, the language that describes how a page should be formatted.
Web Browser – The piece of software that pulls the requested page from the Web Server, interprets the HTML code and displays the page on the client computer.
Web Server – The computer system which has the necessary hardware and software to hold the web pages and to respond to the requests.

URLs   “Internet Address”
Uniform Resource Locator (URL) is the Internet address of the web page requested.
http://www.lk
The address of the Sri Lanka web site
http://www.microsoft.com
The address of the Microsoft web site.
http://www.ict.cmb.ac.lk
The address of the ICT web site

Controls on Web Pages
Browser tools help you navigate around the Web.
moving back and forth between pages
A “Bookmark” list, “favorites” list or “hotlist”
lets you save the names and locations of favorite sites for easy reference
The Uniform Resource Locator (URL) is the address of a Web site.
Some Web pages contain special graphic buttons that, when clicked, take you to another resource as would a regular hotlink.
Hypertext hotlinks are connections to other pages and resources.
To contact the author or sponsor of a Web site, most contain one or more E-mail links.
Most Web pages use several applets, or small program segments run by Java, ActiveX or some other protocol.
Applets are downloaded when you access a Web site, and run only as long as you remain at that location.
Applets can perform a variety of functions.

Internet Browsers
Browsers provide viewing of HTML pages, photos and graphics, and access to email programs and newsgroups.
Netscape
Internet Explorer
What browsers do!
You ask something.
Browser tells other computer what you want.
Other computer responds.
Browser receives and displays.

Navigating with a Browser
Moving from page to page in the Web is called navigating.
A page that introduces a particular organization, company or person is called the home page.
From the current page you can move to another page using a feature called links.
When you pass your mouse pointer over a link on the current page, it automatically changes to a hand symbol. If you click the mouse button at this instant, the new page pointed to by that link will be displayed.

Introduction to C++

C++ is an object-oriented programming language which was developed by Bjarne Stroustrup at AT&T Bell Laboratories, USA, in the early 1980s.

C++ is basically an extension of C with a major addition of the class construct feature.

The object-oriented features in C++ allow programmers to build large programs with clarity, extensibility and ease of maintenance.
Advantages of C++

1) C++ allows creation of hierarchy-related objects, which can be used by many programmers.

2) C++ is able to map real-world problems properly, while the C part of a C++ program provides the ability to get close to the machine-level details.

3) C++ programs are easy to maintain.

4) The compilers which are used to compile C++ programs are cheap and easily available.

Structure of C++ Program

Execution of all C++ programs begins at main() function.

All C++ statements terminate with semicolons. In C++, main() returns an integer value to the operating system.

By convention, every main() in C++ ends with a return 0 statement; standard C++ returns 0 implicitly if it is omitted.

cout << "introduction to C++" : cout is a predefined object that represents the standard output stream in C++.

<< is called the insertion or put to operator.

So, cout << causes the string "introduction to C++" to be displayed on the screen.

#include <iostream> : this is a C++ header file directive, which causes the preprocessor to add the contents of the iostream file to the program.

using namespace std: Namespace is a new concept introduced by the ANSI C++ standards committee. This defines a scope for the identifiers used in the program.

So, based upon this information, it can be concluded that a C++ program contains four sections:

Section 1 : Include files

Section 2 : Class declaration

Section 3 : Member functions definitions

Section 4 : Main function program

Program 1: program to print a string on the screen

#include <iostream>

using namespace std;

int main()

{

cout << "First program on C++" << "\n";

return 0;

}

The essential components which will be present in all C++ programs as per ANSI standards are:

a) C++ directives (#include <header file name>).

b) using namespace std.

c) int main(). The main function is always preceded by its return type, int. Students will get more information about this concept in chapter 3, which deals with functions.

d) return 0, which will be the last statement of the program.

Introduction to Identifiers and Constants in C++

The smallest individual units in a program are known as Tokens.

a) Keywords

b) Identifiers

c) Constants

Constants

Constants are fixed values that do not change during the execution of a program. C++ supports several kinds of literal constants, including integers, characters, floating-point numbers and strings.

1890 Decimal integer

15.45 Floating-point number

037 Octal integer (the leading 0 signifies that the number is of octal type).

0X2 Hexadecimal integer (the leading 0X signifies that the number is of hexadecimal type).

'A' Character constant

"C++" String constant

Keywords

Keywords are reserved words and cannot be used as names for program variables or any other user-defined program elements.

asm Used for embedding assembly language statements in C++ programs.

auto It is a storage class specifier for the local variables.

bool It is a data type.

break A break statement is used to cause an exit from the loop.

catch catch is used to describe the exception handler code that catches the exception.

char It is a fundamental data type.

class class is used to create user-defined data types.

const It is a data type qualifier.

default It is a default label in a switch statement.

delete It is an operator used to remove the objects from memory.

do It is a control statement.

false It is a Boolean type constant.

for It is a control statement.

goto It is a transfer statement.

double It is a floating-point data type specifier.

long It is a data type.

private It is an access specifier.

public It is an access specifier.

Identifiers

Identifiers refer to the names of variables, functions, arrays, classes etc. which are created by the user.

The naming rules for identifiers in C++ are:

a) Only alphabetic characters, digits and underscores are allowed.

b) Name must not start with a digit.

c) Uppercase and lowercase letters are distinct.

d) Keywords cannot be used as a variable name.

Examples of valid identifiers are: person, _person, p_erson

Examples of invalid identifiers are: 2person, switch, for.

Data Types

 

Variables and their Scope

#include <iostream>
using namespace std;

float y = 10.50; // globally defined

int main()
{
    float y = 10.60; // local to main
    {
        float y = 10.70; // local to inner block
        cout << "INNER BLOCK \n";
        cout << "y = " << y << "\n";
        cout << "::y = " << ::y << "\n"; // accessing the global variable
    }
    cout << "OUTER BLOCK \n";
    cout << "y = " << y << "\n";
    cout << "::y = " << ::y << "\n"; // accessing the global variable
    return 0;
}

Scope Resolution for Variables

The same variable name can be used with different values in different blocks.

A variable that is declared outside every block is said to be a global variable.

A variable that is declared inside a block is said to be a local variable with respect to that block.

Operators and Expressions

Assignation Operator

In a conditional expression of the form condition ? result1 : result2, if condition is true the expression returns result1; if not, it returns result2.

Expression              Result

7 == 5 ? 4 : 3          returns 3, since 7 is not equal to 5.

7 == 5 + 2 ? 4 : 3      returns 4, since 7 is equal to 5 + 2.

5 > 3 ? a : b           returns a, since 5 is greater than 3.

a > b ? a : b           returns the greater one, a or b.

Control Structures

 

Decision Making Constructs

 

if Statement

 

if – else Statement

 

else – if Ladder

 

switch Statement

#include <iostream>
using namespace std;

int main()
{
    int values;
    cout << "\n Enter 25 , 45 or 75 :";
    cin >> values;

    switch (values)
    {
    case 25 :
        cout << "You have entered 25 \n";
        break;
    case 45 :
        cout << "You have entered 45 \n";
        break;
    case 75 :
        cout << "You have entered 75 \n";
        break;
    default :
        cout << "You have entered different value \n";
    }
    return 0;
}

Looping Constructs

 

while Loop

 

do while Loop

for Loop

For example, Listing 2.14 displays the cubes of the numbers from 0 to 14.

 

The for loop executes a section of code a fixed number of times; it is used extensively when the number of times the code must be executed is known in advance. The syntax of this loop is as under:

The syntax for while loop is as shown under:

The three C++ looping statements are explained in this section.

While loop

Do while loop

For loop

When the decision tree is large and all the decisions depend on the value of the same variable, a switch statement may be used. The syntax for the switch statement is as shown under:

For example, the program shown in Listing 2.9 asks the user to enter 25, 45 or 75. If the value entered is one of these three, the corresponding matching statements are executed; if the entered value is different, the default statement is executed.

There is another way of putting “ifs” together when multi-path decisions are involved. A multi-path decision is a chain of “ifs” in which the statement associated with each “else” is an “if”.

The if statement does something if a condition is true; if the condition is not true, nothing happens. The if – else statement is used when the user wants to do one thing if a condition is true and something else if it is false.

The syntax for the if – else statement is as under:

For example, the code shown in Listing 2.6 accepts a number from the user and, if the number is greater than 10, displays a message stating that “Number entered by user is greater than 10”; but if the number entered is less than 10, a message is displayed stating that “Number entered by user is less than 10”.

The if statement is the simplest of the decision statements. The if keyword is followed by a test expression in parentheses. The syntax is as shown:

For example, the code shown in Listing 2.4 accepts a number from the user and, if the number is greater than 10, displays a message stating that “Number entered by user is greater than 10”; but if the number entered is less than 10, the program terminates without displaying the message.

The C++ constructs extensively used for decision making are:

a) if statements.

b) if – else statements.

c) Nested if – else statements

d) switch statements.

Control statements are classified into the following constructs:

1. Looping constructs: loops cause a section of the program to be repeated. Repetition continues while a condition is true; when the condition becomes false, the loop terminates and control passes to the statements following the loop. The loops provided in C++ are the for loop, the while loop and the do while loop.

2. Decision making constructs: in a program, a decision causes a one-time jump to a different part of the program, depending upon the value of an expression. The statements provided in C++ for decision making are the if statement, the if – else statement and the switch statement.

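The assignation operator (=) serves to assign a value to a variable. An assignment is itself an expression whose value is the value assigned, so it can appear inside a larger expression. For example:

a = 2 + (b = 5);

is equivalent to: b = 5; a = 2 + b;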

Arithmetic Operators

The five arithmetical operations supported by the language are:

 

Operator Meaning

+ Addition

– Subtraction

* Multiplication

/ Division

% Modulo (remainder)

Compound Assignment Operators

The compound assignment operators available in C++ are: +=, -=, *=, /=, %=, >>=, <<=, &=, ^=, |=

value += increase; is equivalent to value = value + increase;

a -= 5; is equivalent to a = a - 5;

a /= b; is equivalent to a = a / b;

price *= units + 1; is equivalent to price = price * (units + 1);

and the same for all other operations.

Increment and Decrement Operator

 

Relational Operators

As specified by the ANSI-C++ standard, the result of a relational operation is a Boolean value that can only be true or false, according to the result of the comparison. Here is a list of the relational operators that can be used in C++:

Operator Meaning

== Equal

!= Not Equal

> Greater than

< Less than

>= Greater than or equal to

<= Less than or equal to

Logical Operators

 

Conditional Operator

The conditional operator evaluates a condition and returns one of two values, depending on whether the condition is true or false. Its syntax is: condition ? result1 : result2

Operator ! is equivalent to boolean operation NOT


Other commonly used operators are the increment operator (++) and the decrement operator (--).

They increase or reduce by 1 the value stored in a variable. They are equivalent to += 1 and to -= 1, respectively.

When the increment operator is used as a prefix (++a), the value is increased before the expression is evaluated, and therefore the increased value is used in the expression; when it is used as a postfix (a++), the value stored in a is increased after the expression is evaluated, and therefore the value stored before the increase is used in the expression.

Example 1: B = 3; A = ++B; // A is 4, B is 4

Example 2: B = 3; A = B++; // A is 3, B is 4

The values associated with variables keep changing throughout the program; that is why they are known as variables. All variables have to be declared before they are used in a program.

In C++, variables can be declared wherever they are required, as long as the declaration comes before the first use.

Reference Variables

Reference variables are used to provide an alternative name for a previously defined variable. Reference variables are created by using the syntax as under:

data-type &reference-name = variable-name;

An example illustrating a reference variable is as follows:

float difference = 5;

float &deviation = difference;

Here, “difference” is of type float and “deviation” is an alternative name for it.

 

Data types supported by C++ are:

1. Built-in data type (also known as basic or fundamental data type).

2. User-defined data type.

3. Derived data type.

Built-in Data Type

 

User Defined Data Type

C++ permits its users to define a powerful data type known as a class, which can be used just like any other basic data type to declare variables. Apart from classes, C++ also supports a data type that provides a way of attaching names to numbers; this is known as an enumerated data type.

The enum keyword is used for defining an enumerated data type. The syntax for defining an enum is as under:

enum size {small, medium, large};

Here size is known as the tag, and by using the tag it is possible to declare new variables, for example:

size extralarge; // extralarge is of type size

Derived Data Type

 

Arrays

Arrays are a series of elements (variables) of the same type placed consecutively in memory that can be individually referenced by adding an index to a unique name, for example: int i[5];

 

Functions

Functions are the building blocks of C++ Programs.

A function groups a number of program statements into a unit. This unit can be invoked from other parts of the program.

Pointers

Pointers are extensively used in C++ for memory management and achieving polymorphism as well. Pointers are basically used to store the address of the memory location.

a) Arrays

b) Functions

c) Pointers

The built-in (or fundamental) data types, the size occupied by these data types and ranges allowed, as per ANSI C++ are as shown in table

 

Section 1.1


Numerical Presentation

In science, technology, business, and, in fact, most other fields of endeavour, we are constantly dealing with quantities. Quantities are measured, monitored, recorded, manipulated arithmetically, observed, or in some other way utilized in most physical systems. It is important when dealing with various quantities that we be able to represent their values efficiently and accurately. There are basically two ways of representing the numerical value of quantities: analog and digital.

Analog Representation

In analog representation a quantity is represented by a voltage, current, or meter movement that is proportional to the value of that quantity. Analog quantities such as those cited above have an important characteristic: they can vary over a continuous range of values.

[Figure: analog voltage vs time]

Digital Representation

In digital representation the quantities are represented not by proportional quantities but by symbols called digits. As an example, consider the digital watch, which provides the time of day in the form of decimal digits which represent hours and minutes (and sometimes seconds). As we know, the time of day changes continuously, but the digital watch reading does not change continuously; rather, it changes in steps of one per minute (or per second). In other words, this digital representation of the time of day changes in discrete steps, as compared with the representation of time provided by an analog watch, where the dial reading changes continuously.

[Figure: digital voltage vs time]

The major difference between analog and digital quantities, then, can be simply stated as follows:

Analog = continuous
Digital = discrete (step by step)

Section 1.2


Advantages and Limitations of Digital Techniques

Advantages

1. Easier to design. Exact values of voltage or current are not important, only the range (HIGH or LOW) in which they fall.

2. Information storage is easy.

3. Accuracy and precision are greater.

4. Operation can be programmed. Analog systems can also be programmed, but the variety and complexity of the available operations is severely limited.

5. Digital circuits are less affected by noise, as long as the noise is not large enough to prevent us from distinguishing a HIGH from a LOW.

6. More digital circuitry can be fabricated on IC chips.

Limitations

There is really only one major drawback when using digital techniques:

The real world is mainly analog.

Most physical quantities are analog in nature, and it is these quantities that are often the inputs and outputs that are being monitored, operated on, and controlled by a system. To take advantage of digital techniques when dealing with analog inputs and outputs, three steps must be followed:

1. Convert the real-world analog inputs to digital form. (ADC)
2. Process (operate on) the digital information.
3. Convert the digital outputs back to real-world analog form. (DAC)

The following diagram shows a temperature control system that requires analog/digital conversions in order to allow the use of digital processing techniques.

Section 1.3


Digital Number System

Many number systems are in use in digital technology. The most common are the decimal, binary, octal, and hexadecimal systems. The decimal system is clearly the most familiar to us because it is a tool that we use every day. Examining some of its characteristics will help us to better understand the other systems.

Decimal System

The decimal system is composed of 10 numerals or symbols. These 10 symbols are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; using these symbols as digits of a number, we can express any quantity. The decimal system is also called the base-10 system because it has 10 digits.

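10^3     10^2    10^1    10^0    .    10^-1    10^-2    10^-3
=1000    =100    =10     =1      .    =0.1     =0.01    =0.001

The most significant digit (M) is on the left, the decimal point is in the middle, and the least significant digit (L) is on the right.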

 

Binary System

In the binary system, there are only two symbols or possible digit values, 0 and 1. This base-2 system can be used to represent any quantity that can be represented in decimal or other number system.

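2^3    2^2    2^1    2^0    .    2^-1    2^-2    2^-3
=8     =4     =2     =1     .    =1/2    =1/4    =1/8

The most significant bit (M) is on the left, the binary point is in the middle, and the least significant bit (L) is on the right.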

 

Binary Counting

The Binary counting sequence is shown in the table:

Section 1.4


Representing Binary Quantities

In digital systems the information that is being processed is usually presented in binary form. Binary quantities can be represented by any device that has only two operating states or possible conditions. For example, a switch has only two states: open or closed. We arbitrarily (as we define them) let an open switch represent binary 0 and a closed switch represent binary 1. Thus we can represent any binary number by using a series of switches.

Typical Voltage Assignment

Binary 1: any voltage between 2 V and 5 V
Binary 0: any voltage between 0 V and 0.8 V
Not used: voltages between 0.8 V and 2 V; these may cause errors in a digital circuit.

We can see another significant difference between digital and analog systems. In digital systems, the exact value of a voltage is not important; e.g., a voltage of 3.6 V means the same as a voltage of 4.3 V. In analog systems, the exact value of a voltage is important.

© 2010, University of Colombo School of Computing 1
Database Systems II
Distributed Databases
Prof. G.N.Wikramanayake
Objective & Content
• Describe distributed database architecture
• Produce designs for distributed database systems
• Recognise different categories of distributed database systems
• Infer query processing in a distributed environment
Database Environment
[Figure: departmental functions (Marketing: Sales, Advertising, Marketing, Purchasing; Accounting: Accounting, Accounts Receivable, Accounts Payable) sharing one Corporate Database.]
Centralised DBMS
• In a centralised DBMS, all system components (data, DBMS software and secondary storage devices) reside at a single computer or site.
• A centralised database can be accessed remotely via terminals connected to the site.
[Figure: Travel Agent A and Travel Agent B accessing the central site.]
Distributed Database
• A distributed database is a database that is under the control of a central database management system (DBMS) in which storage devices are not all attached to a common CPU.
• It may be stored in multiple computers located in the same physical location, or may be dispersed over a network of interconnected computers.
Distributed DBMS
• In a distributed DBMS, each site is a database system site in its own right, but the sites have agreed to work together (if necessary).
• A user at any site can access data anywhere in the network exactly as if the data were all stored at the user's own site.
• E.g. 5 sites, 4 databases, 1 replicate.
[Figure: sites S1 to S5, with labels R1, R2, R3 and R1-3.]
Distributed DBMS
A centralised view, but distributed.
Distributed DBMS
• A database server is the software managing a database, and a client is an application that requests information from a server.
• Each computer in a system is a node. A node in a distributed database system acts as a client, a server, or both, depending on the situation.
[Figure: sites S1 to S5, with labels R1, R2, R3 and R1-3.]
Distribution
• Deals with the physical distribution of data over multiple sites
• Three alternative architectures are available:
– Client-server: communication duties are shared between the client machines and servers.
– Peer-to-peer systems: no distinction of client machines versus servers. Integrate all local schemas.
– Non-distributed systems: integrate some local schema.
Clients/Servers (Multiple)
[Figure: a client (applications, client services, communications) connected over a LAN to several servers, each with communications, DBMS services and a database. Labels on the figure: directory, caching, query decomposition, commit protocols.]
Client-server: communication duties are shared between the client machines and servers.
Peer-to-Peer (P2P) or Server-to-Server
[Figure: a client (applications, client services, communications) and servers (communications, DBMS services, database) connected over a LAN. Labels on the figure: SQL interface, programmatic interface, other application support environments.]
Peer-to-peer systems: no distinction of client machines versus servers. Integrate all local schemas.
Multi-DBMS
[Figure: the USER exchanges global requests and responses with a global layer (GUI, GTP, GQP, GQO, GS, GRM). Local requests pass through a Component Interface Processor (CIP) to each component DBMS, which has its own user interface, query processor, query optimizer, transaction manager, scheduler, recovery manager and runtime support processor.]
Non-distributed systems: integrate some local schema.
3 Level Architecture
[Figure: user a / program x; external schema a / sub-schema z; conceptual schema; internal schema; databases. Languages: DDL, SDDL, DML/SQL. Practitioners: DBA, users.]
Distributed DBMS Architecture
[Figure: ES1, ES2, ..., ESn above a single GCS, which sits above LCS1, LCS2, ..., LCSn and LIS1, LIS2, ..., LISn.]
ES: External Schema
GCS: Global Conceptual Schema
LCS: Local Conceptual Schema
LIS: Local Internal Schema
Multi-DBMS Architecture
[Figure: GES1, GES2, ..., GESn and the GCS at the global level; LES11 ... LES1n, ..., LESn1 ... LESnm, LCS1, LCS2, ..., LCSn and LIS1, LIS2, ..., LISn at the local level.]
• GES: Global External Schema
• LES: Local External Schema
• LCS: Local Conceptual Schema
• LIS: Local Internal Schema
Motivation for Distributed Databases
• Organisational and economic reasons: many organisations are decentralised, and a distributed database approach fits the structure of the organisation more naturally. E.g. banks.
Motivation for Distributed Databases
• Interconnection of existing databases: distributed databases are a natural solution when several databases already exist in an organisation and the necessity of performing global applications arises.
Motivation …
• Incremental growth: supports organisational growth (new branches) in a smoother manner than a centralised database approach.
Motivation for Distributed Databases
• Allow data sharing while maintaining some measure of local control (autonomy).
Motivation for Distributed Databases
• Reduce communication overhead: this is not automatically guaranteed by distribution, but depends largely on the quality of the distributed database design.
Motivation for Distributed Databases
• Performance considerations: the existence of several processors results in better performance through the use of parallelism. Smaller databases exist at each site and hence local queries and transactions are improved.
[Figure: Location 1 and Location 2.]
Motivation for Distributed Databases
• Increased reliability and availability: the use of multiple components means that higher reliability can be obtained. Data replication can also be used to increase the availability of data.
[Figure: Location 1 and Location 2.]
Advantages of distributed databases
• Reflects organizational structure: database fragments are located in the departments they relate to.
• Local autonomy: a department can control the data about itself (as its staff are the ones familiar with it).
• Improved availability: a fault in one database system will only affect one fragment, instead of the entire database.
• Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
• Economics: it costs less to create a network of smaller computers with the power of a single large computer.
• Modularity: systems can be modified, added and removed from the distributed database without affecting other modules (systems).
Disadvantages of distributed databases
• Complexity: extra work must be done by the DBAs to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database; for example, joins become prohibitively expensive when performed across multiple systems.
• Economics: increased complexity and a more extensive infrastructure mean extra labour costs.
• Security: remote database fragments must be secured, and because they are not centralized the remote sites must be secured as well. The infrastructure must also be secured (e.g., by encrypting the network links between remote sites).
• Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.
• Inexperience: distributed databases are difficult to work with, and as this is a young field there is not much readily available experience on proper practice.
General Architecture for a Distributed Database
[Figure: users; Global Schema (GS); Fragmentation Schema (FS); Allocation Schema (AS); then, for each of Site 1 ... Site n, a Local Mapping Schema and a Local DBMS. Also labelled: ES1, ES2, ..., ESn; LCS1, LCS2, ..., LCSn; LIS1, LIS2, ..., LISn.]
General Architecture for a Distributed Database
Global Schema: defines all the data contained in the distributed database as if the data were not distributed at all. It consists of a set of global relations.
General Architecture for a Distributed Database
Fragmentation Schema: each global relation can be split into several non-overlapping portions called fragments. The mapping between global relations and fragments is defined in the fragmentation schema. Here, several fragments correspond to one global relation.

Employee (global relation)
Employee Name    Employee Salary    Department Name
Silva            50000              Sales
Perera           45000              Sales
Dias             48000              Marketing
Fernando         51000              Marketing

Employee1 (fragment)
Employee Name    Employee Salary    Department Name
Silva            50000              Sales
Perera           45000              Sales

Employee2 (fragment)
Employee Name    Employee Salary    Department Name
Dias             48000              Marketing
Fernando         51000              Marketing

[Figure labels: fragmented; store at site 1; store at site 2; store at site 3.]
General Architecture for a Distributed Database
Allocation Schema: fragments are physically located at one or several sites of the network. The mapping defined in the allocation schema determines whether the distributed database is redundant or non-redundant. Each allocation corresponds to a fragment if the data is non-redundant; otherwise several allocations will correspond to a fragment.
[Figure labels: Sales, Sales, Marketing, Sales.]
General Architecture for a Distributed Database
Local Mapping Schema: the top three levels are site independent. They do not depend on the data model of the local DBMS. At the lower level, it is necessary to map the objects to those which are manipulated by the local DBMS. This mapping is called the local mapping schema.
[Figure: GS, FS and AS, with ES1, ES2, ..., ESn; LCS1, LCS2, ..., LCSn; LIS1, LIS2, ..., LISn.]
Types of data transparency
• Data structure: “data independence” of centralized DBMSs
• Location of fragments: “location transparency”
• Existence of fragments: “fragmentation transparency”
• Replication of fragments: “replication transparency”
Transparencies
• The distribution is transparent: users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, amongst other things.
• Transactions are transparent: each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, each sub-transaction affecting one database system.
• Network / distribution / location transparency: the DBMSs are presented globally to the user as though they were a single centralised DBMS. The global data dictionary (DD) holds the location of each underlying table, and the DDBMS performs query decomposition and result joining without the user being made aware.
Distributed Execution Plan: Example
• Transaction:
    Read X;
    Read Y;
    Read Z;
    Report X+Y+Z to user
• Allocation information: X is stored in DB1, Y in DB2 and Z in DB3.
• Global execution plan (subtransactions to be executed at different sites):
    Read X from DB1
    Read Y from DB2
    Read Z from DB3
    Calculate X+Y+Z
    Report X+Y+Z
More…
Distributed Execution Plan: Example, cont.
• Global execution plan
[Figure: the Distributed Execution Manager sends R[X], R[Y] and R[Z] to the Local Execution Managers at DB1 (holding X), DB2 (holding Y) and DB3 (holding Z), receives X, Y and Z, and reports X+Y+Z. R[x] means Read x.]
Transparencies
Fragmentation transparency: for access purposes, tables may be split (fragmented) vertically or horizontally. Details of the fragmentation are kept in the global data dictionary. The user views the global DB without awareness of fragmentation, so the query names only the global relation Employee, even though its rows are held in the fragments Employee1 and Employee2 shown in the fragmentation schema example.

SELECT Salary
FROM Employee
WHERE Name = 'Dias';
Transparencies
No fragmentation transparency, no location transparency: the query must name each fragment (Employee1, Employee2) and the site at which it is stored.

SELECT Salary
FROM Employee1 at Site1
WHERE Name = 'Dias'
UNION
SELECT Salary
FROM Employee2 at Site2
WHERE Name = 'Dias';
Transparencies
No fragmentation transparency, with location transparency: the query must name each fragment (Employee1, Employee2) but not the site at which it is stored.

SELECT Salary
FROM Employee1
WHERE Name = 'Dias'
UNION
SELECT Salary
FROM Employee2
WHERE Name = 'Dias';
Replication
Replication transparency: relations may be duplicated for local/global access requirements. The DDBMS handles replication without user awareness. Update propagation is an issue.
Directory Issues
A directory can be classified along three dimensions:
• Type (global to local)
• Location (central to distributed)
• Replication (non-replicated to replicated)
The combinations shown in the figure are:
• Global & central & non-replicated
• Local & central & non-replicated (?)
• Global & distributed & non-replicated (?)
• Local & distributed & non-replicated
• Global & central & replicated (?)
• Global & distributed & replicated
• Local & distributed & replicated
• Local & central & replicated (?)
Types of DDBMS
Homogeneous DDBMS: homogeneous DDBMSs are software that integrates DBMSs geographically distributed over a network. All DBMSs have the same data model and query language, and the hardware is from the same manufacturer and runs the same operating system (server and user). A homogeneous DDBMS uses one DBMS (e.g. MS-SQL or Oracle).
Homogeneous DDBMS
Heterogeneous DDBMS: the local DBMSs and user software differ. A heterogeneous DDBMS uses multiple DBMSs (e.g. Oracle, MS-SQL and PostgreSQL).
Autonomy
• Local applications: applications which do not require data from other sites.
• Global applications: applications which do require data from other sites.
Degree of local autonomy
• Access to the DDBMS is either via the global schema (client), with no local autonomy, or directly via the local schema (server), with some degree of local autonomy.
• If there is a single integrated schema, there is a high degree of distribution transparency: no distribution details are needed.
• If there is no integrated schema, distribution details are required to formulate queries: there is no distribution transparency.
Autonomy
Models without Global Conceptual Schemas: each site has a DBMS of its own. Sites share data by defining export schemas.
• Federated (semi-autonomous)
• Multi-database (autonomous)
Alternatives in Distributed Database Systems
[Figure: three axes (Distribution, Heterogeneity, Autonomy) with the labels Client/server, Peer-to-peer, Distributed DBMS, Federated DBMS, Distributed multi-DBMS and Multi-DBMS.]
Distribution Design Issues
• Why fragment at all?
• How to fragment?
• How much to fragment?
• How to test correctness?
• How to allocate?
• Information requirements?
Fragmentation
• Can't we just distribute relations?
• What is a reasonable unit of distribution?
– relation
  • views are subsets of relations
  • extra communication
– fragments of relations (sub-relations)
  • concurrent execution of a number of transactions that access different portions of a relation
  • views that cannot be defined on a single fragment will require extra processing
  • semantic data control (especially integrity enforcement) is more difficult
Distributed Database Design
The objectives of distributed database design are the separation of data fragmentation from data allocation, control of data redundancy, and independence from local DBMSs.
A database is broken into logical units called fragments, which are assigned for storage at various sites. Data fragmentation is the partitioning of data into a number of disjoint subsets.
© 2010, University of Colombo School of Computing
48
Horizontal fragmentation partitions the records of a global table
into subsets of tuples (rows). A horizontal fragment keeps only certain rows of the
global relation. The reconstruction is done by taking the union of
all fragments.

Employee
Employee Name   Employee Salary   Department Name
Silva           50000             Sales
Perera          45000             Sales
Dias            48000             Marketing
Fernando        51000             Marketing

Employee, horizontally fragmented:

Mkt-Employee
Employee Name   Employee Salary   Department Name
Dias            48000             Marketing
Fernando        51000             Marketing

Sales-Employee
Employee Name   Employee Salary   Department Name
Silva           50000             Sales
Perera          45000             Sales
Vertical fragmentation subdivides the attributes of the global
table into groups, i.e. subsets of attributes (columns). A vertical fragment keeps only certain attributes
of the global relation. The reconstruction is done by taking the join
of all fragments using a common key.

Employee
Employee Name   Employee Salary   Department Name
Silva           50000             Sales
Perera          45000             Sales
Dias            48000             Marketing
Fernando        51000             Marketing

Employee, vertically fragmented:

Employee-Dept
Employee Name   Department Name
Silva           Sales
Perera          Sales
Dias            Marketing
Fernando        Marketing

Employee-Pay
Employee Name   Employee Salary
Silva           50000
Perera          45000
Dias            48000
Fernando        51000
Mixed fragmentation is the result of the successive application of
both fragmentation techniques: the relation is fragmented both horizontally and vertically.

Employee
Employee Name   Employee Salary   Department Name
Silva           50000             Sales
Perera          45000             Sales
Dias            48000             Marketing
Fernando        51000             Marketing

Employee, mix fragmented:

Employee-Sales
Employee Name   Department Name
Silva           Sales
Perera          Sales

Employee-Mkt
Employee Name   Department Name
Dias            Marketing
Fernando        Marketing

Employee-Pay
Employee Name   Employee Salary
Silva           50000
Perera          45000
Dias            48000
Fernando        51000
EMPLOYEE
EmpNo  Name      Salary  DNO
10     Perera    8000    SAL
12     De Silva  7500    MKT
22     Alwis     13000   MKT
25     Silva     12000   SAL
27     Dias      15000   MKT
30     Fernando  10000   SAL

SAL-EMP
EmpNo  Name      Salary  DNO
10     Perera    8000    SAL
25     Silva     12000   SAL
30     Fernando  10000   SAL

MKT-EMP
EmpNo  Name      Salary  DNO
12     De Silva  7500    MKT
22     Alwis     13000   MKT
27     Dias      15000   MKT
EMPLOYEE
EmpNo  Name      Salary  DNO
10     Perera    8000    SAL
12     De Silva  7500    MKT
22     Alwis     13000   MKT
25     Silva     12000   SAL
27     Dias      15000   MKT
30     Fernando  10000   SAL

PER-EMP
EmpNo  Name      DNO
10     Perera    SAL
12     De Silva  MKT
22     Alwis     MKT
25     Silva     SAL
27     Dias      MKT
30     Fernando  SAL

PAY-EMP
EmpNo  Salary
10     8000
12     7500
22     13000
25     12000
27     15000
30     10000
Criteria for Fragment Definition
Completeness Condition
All the data of the global relations must be mapped into the
fragments.
Reconstruction Conditions
It must always be possible to reconstruct each global relation
from its fragments
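
The two conditions above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, that fragments the EMPLOYEE relation with EmpNo horizontally (by DNO) and vertically (personal and pay columns) and then tests completeness and reconstruction. The helper names `horizontal` and `vertical` are hypothetical.

```python
# Global relation: (EmpNo, Name, Salary, DNO)
EMPLOYEE = [
    (10, "Perera", 8000, "SAL"),
    (12, "De Silva", 7500, "MKT"),
    (22, "Alwis", 13000, "MKT"),
    (25, "Silva", 12000, "SAL"),
    (27, "Dias", 15000, "MKT"),
    (30, "Fernando", 10000, "SAL"),
]

def horizontal(rows, predicate):
    """Horizontal fragment: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def vertical(rows, positions):
    """Vertical fragment: keep only the listed column positions."""
    return [tuple(row[i] for i in positions) for row in rows]

# Horizontal fragments, one per department
sal_emp = horizontal(EMPLOYEE, lambda row: row[3] == "SAL")
mkt_emp = horizontal(EMPLOYEE, lambda row: row[3] == "MKT")

# Vertical fragments; both keep the key EmpNo so that they can be joined
per_emp = vertical(EMPLOYEE, (0, 1, 3))  # EmpNo, Name, DNO
pay_emp = vertical(EMPLOYEE, (0, 2))     # EmpNo, Salary

# Horizontal case: reconstruction is the union of the fragments
horizontal_ok = sorted(sal_emp + mkt_emp) == sorted(EMPLOYEE)
disjoint = not (set(sal_emp) & set(mkt_emp))

# Vertical case: reconstruction is the join on the common key EmpNo
salary_of = dict(pay_emp)
joined = [(eno, name, salary_of[eno], dno) for eno, name, dno in per_emp]
vertical_ok = sorted(joined) == sorted(EMPLOYEE)

print(horizontal_ok, disjoint, vertical_ok)
```

If a department code other than SAL or MKT appeared in the data, the union would no longer equal EMPLOYEE and the completeness check would fail, which is the situation the completeness condition rules out.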
Degree of Fragmentation
• There is a finite number of alternatives, ranging from individual
tuples or attributes up to whole relations
• The task is to find a suitable level of partitioning
within this range
Correctness rules of fragmentation
• Completeness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R can also be found in one or more of the Ri.
• Reconstruction
If a relation R is decomposed into fragments R1, R2, …, Rn, it should be possible to define a relational operator ∇ such that
R = ∇ Ri, ∀ Ri ∈ FR
Note that the operator ∇ is different for the different forms of fragmentation (union for horizontal, join for vertical).
• Disjointness
If a relation R is horizontally decomposed into fragments R1, R2, …, Rn, and data item di is in Rj, it is not in any other fragment Rk (k ≠ j).
Example (vertical fragmentation):
R  = (Silva, 50000, Sales)
R1 = (Silva, 50000)
R2 = (Silva, Sales)
R  = R1 ⋈ R2
Data Allocation
The allocation strategy is determined by the system architecture and network.
Four basic approaches:
Centralised: all the data is located at a single site.
Partitioned: the database is partitioned into disjoint fragments and
each fragment is assigned to a particular site.
Replicated: a full copy of the database is allocated to each site.
Selective Replication: data is partitioned into critical and non-critical fragments, and the critical fragments are replicated to achieve
the required level of availability and performance.
[Figure: example allocations of fragments X, Y and Z. Centralised: X+Y+Z at DB. Partitioned: X at DB1, Y at DB2, Z at DB3. Replicated: X+Y+Z at each of DB1, DB2 and DB3. Selective replication: X at DB1, X+Y at DB2, X+Y+Z at DB3.]
Allocation Alternatives
• Non-replicated
– partitioned : each fragment resides at only one site
• Replicated
– fully replicated : each fragment at each site
– partially replicated : each fragment at some of the
sites
• Rule of thumb:
If read-only queries ≥ update queries, replication is
advantageous;
otherwise replication may cause problems.
[Figure: the partitioned, fully replicated and partially replicated allocations of fragments X, Y and Z across DB1, DB2 and DB3.]
Comparison of Replication Alternatives

                       Full Replication      Partial Replication   Partitioning
Query Processing       Easy                  Same difficulty       Same difficulty
Directory Management   Easy or nonexistent   Same difficulty       Same difficulty
Concurrency Control    Moderate              Difficult             Easy
Reliability            Very high             High                  Low
Reality                Possible application  Realistic             Possible application
Information Requirements
• Database information
– selectivity of fragments
– size of a fragment
• Application information
– access types and numbers
– access localities
• Communication network information
– bandwidth
– latency
– communication overhead
• Computer system information
– unit cost of storing data at a site
– unit cost of processing at a site
Cost/benefit of the replicated database allocation strategy can be
estimated in terms of storage cost, communication cost (query and
update time) and data availability.
An optimal data allocation can be theoretically determined to
minimise the total cost (storage+communication+local processing)
subject to some response time and availability constraints.
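
To make the trade-off concrete, here is a toy Python sketch, not taken from the slides, that picks the number of copies of a fragment minimising storage + query + update cost. The cost model is an assumption chosen only for illustration: storage and update cost grow linearly with the number of copies, while query cost falls as copies are added. All numbers and the names `total_cost` and `best_copies` are hypothetical.

```python
def total_cost(copies, storage_per_copy, query_cost, update_cost_per_copy,
               reads, writes):
    """Toy cost model (an assumption, not a real optimiser's formula)."""
    storage = storage_per_copy * copies
    query = reads * query_cost / copies              # more copies, cheaper reads
    update = writes * update_cost_per_copy * copies  # every copy must be updated
    return storage + query + update

def best_copies(max_copies, **workload):
    """Return the number of copies (1..max_copies) with the lowest total cost."""
    return min(range(1, max_copies + 1),
               key=lambda copies: total_cost(copies, **workload))

# Read-heavy workload: replication pays off
read_heavy = best_copies(6, storage_per_copy=10, query_cost=100,
                         update_cost_per_copy=1, reads=50, writes=5)

# Update-heavy workload: a single copy is cheapest
write_heavy = best_copies(6, storage_per_copy=10, query_cost=100,
                          update_cost_per_copy=10, reads=5, writes=50)

print(read_heavy, write_heavy)  # prints "6 1"
```

A real allocation model would also add the response time and availability constraints mentioned above; they are omitted here to keep the sketch short.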
Trade-offs due to Data Replication
[Figure: cost plotted against the number of copies of a fragment (1 to 6), with curves for storage cost, query time, update time and data availability.]
Strategies to Distribute Data
• Fragmentation
• Allocation
• Distribution?
• Access?
• Query Processing?
• Security?
[Figure: five sites S1 to S5 with question marks, asking where the rows of the Employee table should be placed.]

Employee
Employee Name   Employee Salary   Department Name
Silva           50000             Sales
Perera          45000             Sales
Dias            48000             Marketing
Fernando        51000             Marketing
Types of Distribution Schemes
• Round robin
creates an even data distribution by placing rows in the fragments in turn,
regardless of their values
– The relation is scanned in any order and the i-th tuple is
sent to disk number D(i mod n). This ensures an even
distribution of tuples across disks.
Consider using round robin when your queries perform
sequential scans and you have little information about the
data being stored. It is also useful when your application is
update intensive or when fast data loading is important.
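
The D(i mod n) rule can be sketched in a few lines of Python. This is an illustration only; the function name `round_robin` and the sample rows are hypothetical.

```python
def round_robin(rows, n):
    """Send the i-th row to fragment i mod n, whatever its column values are."""
    fragments = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        fragments[i % n].append(row)
    return fragments

rows = ["Silva", "Perera", "Dias", "Fernando", "Alwis"]
fragments = round_robin(rows, 2)
print(fragments)  # [['Silva', 'Dias', 'Alwis'], ['Perera', 'Fernando']]
```

Fragment sizes never differ by more than one row, but nothing about a row's values tells you which fragment it is in, so every fragment has to be scanned for any query.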
• Round robin
Advantages
– no knowledge of the data is needed to achieve an even
distribution among the fragments.
– When column values are updated, rows are not moved
to other fragments because the distribution does not
depend on column values.
Disadvantage
– query optimiser is not able to eliminate fragments when
evaluating a query.
CREATE TABLE table1 (col1 char(5), …)
FRAGMENT BY ROUND ROBIN IN dbspace1, dbspace2;
Distribution Schemes contd.
• Expression based
puts related rows in the same fragment; may create an
uneven distribution of data
Logical and Relational Operators
– Distributes contiguous attribute-value ranges to each
disk. If the value is >= v(i) and < v(i+1) then place it
on disk i+1.
CREATE TABLE table1 (col1 char(5), …)
FRAGMENT BY EXPRESSION
col1 >= 1 AND col1 <= 10 IN dbspace1,
col1 > 10 AND col1 <= 20 IN dbspace2,
REMAINDER IN dbspace3;
CREATE TABLE table1 (col1 char(5), …)
FRAGMENT BY EXPRESSION
col1 IN (1000, 6000, 8500) IN dbspace1,
col1 > 10 AND col1 <= 20 IN dbspace2,
REMAINDER IN dbspace3;
Advantages
• fragments may be eliminated from query scans.
• data can be segmented to support a particular archiving
strategy.
• users can be granted privileges at the fragment level.
• unequal data distribution can be created to offset an
unequal frequency of access.
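
Fragment elimination, the first advantage listed above, can be illustrated with a small Python sketch. It mirrors the range expression used earlier (1 to 10 in dbspace1, 11 to 20 in dbspace2, everything else in the remainder fragment) and assumes integer column values. The function names are hypothetical, and this is not how any particular optimiser is implemented.

```python
def fragment_for(value):
    """Fragment that stores a row with this col1 value."""
    if 1 <= value <= 10:
        return "dbspace1"
    if 10 < value <= 20:
        return "dbspace2"
    return "dbspace3"  # REMAINDER fragment

def fragments_for_range(lo, hi):
    """Fragments that may hold rows with lo <= col1 <= hi; the rest are skipped."""
    needed = []
    if lo <= 10 and hi >= 1:
        needed.append("dbspace1")
    if lo <= 20 and hi > 10:
        needed.append("dbspace2")
    if lo < 1 or hi > 20:
        needed.append("dbspace3")
    return needed

print(fragments_for_range(3, 7))    # ['dbspace1']
print(fragments_for_range(5, 15))   # ['dbspace1', 'dbspace2']
print(fragments_for_range(15, 30))  # ['dbspace2', 'dbspace3']
```

A query such as `WHERE col1 BETWEEN 3 AND 7` therefore needs to touch only one of the three fragments.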
Disadvantages
• CPU resources are required for rule evaluation. The more complex
the rule, the more CPU time is consumed.
• Finding the optimum rule may be an iterative process and,
once found, the rule may need to be monitored.
• More administrative work than round robin.
Consider using an expression strategy when:
• non-overlapping fragments on a single column can be
created.
• The table is accessed with a high degree of selectivity.
• The data access is not evenly distributed.
• Overlapping fragments on single or multiple columns can
be created.
Distribution Schemes contd.
• Expression based
a hash function can be used to evenly distribute
data across fragments, especially when the
column value may not divide commonly
accessed data evenly across fragments.
Hash Functions
– Maps each tuple to a disk location based on a
hash function, whose range is (0, 1, …, n-1). If
hash function returns i, then place tuple on
disk i.
Advantages
• A hash expression yields an even distribution of data.
• When there is an equality search, it permits fragment
elimination during query optimisation.
Disadvantage
• Fragment elimination does not occur during a range
search.
CREATE TABLE table1 (col1 char(5), …)
FRAGMENT BY EXPRESSION
MOD(col1,3) = 0 IN dbspace1, MOD(col1,3) = 1 IN
dbspace2, MOD(col1,3) = 2 IN dbspace3;
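
The behaviour of the MOD(col1, 3) scheme under equality and range searches can be sketched in Python as follows. Integer keys are assumed and the function names are hypothetical.

```python
def hash_fragment(key, n=3):
    """Fragment number for a key under the MOD(key, n) scheme."""
    return key % n

def fragments_for_equality(key, n=3):
    """An equality search needs exactly one fragment."""
    return {hash_fragment(key, n)}

def fragments_for_range(lo, hi, n=3):
    """A range search may need every fragment, because neighbouring keys
    are scattered across the fragments."""
    return {hash_fragment(key, n) for key in range(lo, hi + 1)}

print(fragments_for_equality(7))  # {1}
print(fragments_for_range(4, 6))  # {0, 1, 2}
```

Any range covering at least three consecutive keys touches all three fragments, which is why no fragment can be eliminated for a range search.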
Guidelines
• Avoid the REMAINDER IN clause, as the remainder fragment is always
checked.
• Distribute data so that I/O is balanced across disks. This does not
necessarily mean an even distribution of data.
• Keep fragment expressions simple. Complex expressions
take more CPU time to evaluate. Avoid any expression
that must perform a conversion.
• Optimise data loads by placing the most frequently
accessed fragment first in your fragment statement. This
reduces the number of fragments to be checked.
• If a significant benefit is not expected, do not fragment
the table.
Distributed Query
Processing Methodology
Distributed QP
• The problem of query processing can itself be
decomposed into several sub-problems
corresponding to various layers.
• The first two layers correspond to query rewriting.
• The first three layers are performed by a
central site using global information.
• The fourth is done by the local sites.
Step 1 – Query Decomposition
Use the techniques of a centralised DBMS on global relations. The calculus
query is rewritten in a normalised form suitable for subsequent
manipulation.
• Input: Calculus query on global relations
• Normalisation
– manipulate query quantifiers and qualification by applying logical
operator priority
• Analysis
– detect and reject “incorrect” queries
– possible for only a subset of relational calculus
• Simplification
– eliminate redundant predicates
• Restructuring
– calculus query ⇒ algebraic query
– more than one translation is possible
– use transformation rules
Normalisation
• Lexical and syntactic analysis
– check validity (similar to compilers)
– check for attributes and relations
– type checking on the qualification
• Put into normal form
– Conjunctive normal form
(p11∨p12∨…∨p1n) ∧…∧(pm1∨pm2∨…∨pmn)
– Disjunctive normal form
(p11∧p12 ∧…∧p1n) ∨…∨(pm1 ∧pm2∧…∧pmn)
– OR’s mapped into union
– AND’s mapped into join or selection
Analysis
• Refute incorrect queries
• Type incorrect
– If any of its attribute or relation names are not defined in the global schema
– If operations are applied to attributes of the wrong type
• Semantically incorrect
– Components do not contribute in any way to the generation of the result
– Only a subset of relational calculus queries can be tested for correctness:
those that do not contain disjunction and negation
– To detect, use a
• connection graph (query graph)
• join graph
Analysis – Example
SELECT ename, mgr
FROM Employee E, WorksOn W, Project P
WHERE E.eno = W.eno
AND W.pno = P.pno
AND pname = 'CAD/CAM'
AND duration >= 36
AND title = 'Programmer'
[Figure: query graph with nodes E, W, P and Result. Join edges: E.eno = W.eno and W.pno = P.pno. Projection edges to Result: ename and mgr. Selection labels: Title = “Programmer”, pname = “CAD/CAM”, duration ≥ 36.]
Analysis
• If the query graph is not connected, the query is
wrong.
SELECT ename, mgr
FROM Employee E, WorksOn W, Project P
WHERE E.eno = W.eno
AND pname = 'CAD/CAM'
AND duration >= 36
AND title = 'Programmer'
[Figure: query graph for this query. The join edge between W and P is missing, so the graph is not connected.]
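
The connectivity test can be sketched in Python. As a simplifying assumption, the sketch works on the join graph only (the relation nodes E, W and P with one edge per join predicate) and leaves out the Result node and its projection edges. The function name `is_connected` is hypothetical.

```python
def is_connected(nodes, edges):
    """Return True if every node can be reached from the first node."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current] - seen:
            seen.add(neighbour)
            stack.append(neighbour)
    return seen == set(nodes)

relations = ["E", "W", "P"]

# First query: E.eno = W.eno AND W.pno = P.pno
correct_query = is_connected(relations, [("E", "W"), ("W", "P")])

# Second query: the join between W and P is missing
incorrect_query = is_connected(relations, [("E", "W")])

print(correct_query, incorrect_query)  # True False
```

In the second query, Project is joined to nothing, so the result would be a Cartesian product with Project, which is almost certainly not what the user intended.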
Simplification
• Why simplify?
– Simple query vs complex
• How? Use transformation rules
– elimination of redundancy
– idempotency rules
• p1 ∧ ¬(p1) ⇔ false
• p1 ∧ (p1 ∨ p2) ⇔ p1
• p1 ∨ false ⇔ p1
• …
– application of transitivity
– use of integrity rules
Simplification – Example
SELECT Title
FROM Employee
WHERE ENAME = 'Perera'
OR ( NOT (Title = 'Programmer')
AND (Title = 'Programmer'
OR Title = 'Elect. Eng.')
AND NOT (Title = 'Elect. Eng.'))
simplifies to
SELECT Title
FROM Employee
WHERE ENAME = 'Perera'
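
The equivalence of the two WHERE clauses can be confirmed by brute force over all truth assignments. The Python sketch below is only a check of the example, not a query simplifier. The three booleans stand for the atomic predicates ENAME = “Perera”, Title = “Programmer” and Title = “Elect. Eng.”.

```python
from itertools import product

def original(perera, programmer, elect_eng):
    """The WHERE clause of the query before simplification."""
    return perera or (not programmer
                      and (programmer or elect_eng)
                      and not elect_eng)

def simplified(perera, programmer, elect_eng):
    """The WHERE clause after simplification."""
    return perera

# Compare the two predicates on all 2**3 combinations of truth values
equivalent = all(
    original(*values) == simplified(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)  # True
```

The bracketed part reduces to false because (¬p2 ∧ (p2 ∨ p3)) gives ¬p2 ∧ p3, and that conjoined with ¬p3 is a contradiction of the form p ∧ ¬p.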
Restructuring
• Convert relational calculus to relational
algebra
• Make use of query trees
[Figure: the query graph of the Analysis example (nodes E, W, P and Result) next to an equivalent query tree. Read from the leaves upwards, the tree is
πename, mgr(σTitle=“Programmer”(σduration ≥ 36(σpname=“CAD/CAM”((W ⋈eno E) ⋈pno P))))]
Restructuring
[Figure: an equivalent, restructured query tree over W, E and P. It uses the same joins (⋈eno, ⋈pno), the selections σpname=“CAD/CAM”, σduration ≥ 36 and σTitle=“Programmer”, the final projection πename, mgr, and additional intermediate projections (πeno, pno; πeno, ename; πpno, mgr; πpno, ename).]
Step 2 – Data Localisation
Input: Algebraic query on distributed
relations
• Determine which fragments are involved
• Localisation program
– substitute for each global relation its
materialisation program
– optimise
Example
Assume
• Employee is fragmented into EMP1, EMP2 and
EMP3 as follows:
– EMP1 = σeno≤“E3”(Employee)
– EMP2 = σ“E3”<eno≤“E6”(Employee)
– EMP3 = σeno>“E6”(Employee)
• WorksOn is fragmented into Wrk1 and Wrk2
as follows:
– Wrk1 = σeno≤“E3”(WorksOn)
– Wrk2 = σeno>“E3”(WorksOn)
Example
Replace Employee by (EMP1 ∪ EMP2 ∪ EMP3) and WorksOn by (Wrk1 ∪ Wrk2) in any query.
[Figure: the localised query tree
πename, mgr(σTitle=“Programmer” and duration ≥ 36 and pname=“CAD/CAM”(P ⋈pno ((EMP1 ∪ EMP2 ∪ EMP3) ⋈eno (Wrk1 ∪ Wrk2))))]
Provides Parallelism
Eliminate unnecessary work
[Figure: the reduced query (Wrk1 ⋈eno EMP1) ∪ (Wrk2 ⋈eno EMP2) ∪ (Wrk2 ⋈eno EMP3)]
Reduction with join
• Distribute join over unions
• Apply the reduction rule
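
The reduction rule can be sketched in Python. Distributing the join over the unions gives 3 × 2 = 6 fragment joins, and a join of two fragments whose eno ranges cannot overlap is empty and can be dropped. The sketch assumes that employee numbers E1, E2, … are represented by the integers 1, 2, …, that each fragment is described by a half-open interval (low, high], and that EMP3 holds eno > E6 so the Employee fragments are disjoint. The names `EMP`, `WRK` and `overlaps` are hypothetical.

```python
import math

# Each fragment is described by the interval (low, high] of eno values it holds
EMP = {"EMP1": (-math.inf, 3), "EMP2": (3, 6), "EMP3": (6, math.inf)}
WRK = {"Wrk1": (-math.inf, 3), "Wrk2": (3, math.inf)}

def overlaps(a, b):
    """True if the half-open intervals (a_low, a_high] and (b_low, b_high] intersect."""
    return max(a[0], b[0]) < min(a[1], b[1])

# Keep only the fragment joins that can produce tuples
useful_joins = [
    (emp, wrk)
    for emp in EMP
    for wrk in WRK
    if overlaps(EMP[emp], WRK[wrk])
]
print(useful_joins)
# [('EMP1', 'Wrk1'), ('EMP2', 'Wrk2'), ('EMP3', 'Wrk2')]
```

Three of the six fragment joins are eliminated, and the three remaining ones can be executed in parallel at different sites.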
Step 3 – Global Query
Optimisation
Input: Fragment query
• Find the best (not necessarily optimal) global schedule
– Minimize a cost function
– Distributed join processing
• Bushy vs. linear trees
• Which relation to ship where?
• Ship-whole vs ship-as-needed
– Decide on the use of semi-joins
• Semi-join saves on communication at the expense of more local processing.
– Join methods
• nested loop vs ordered joins (merge join or hash join)
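
The semi-join trade-off mentioned above can be illustrated with a small Python sketch. It assumes Employee is stored at one site and WorksOn at another, with hypothetical sample rows. Instead of shipping the whole Employee relation, the WorksOn site first ships only its join-column values, the Employee site reduces Employee with them, and only the reduced relation is shipped back.

```python
# Site 1 holds Employee(eno, ename); site 2 holds WorksOn(eno, pno)
employee = [("E1", "Perera"), ("E2", "Silva"), ("E3", "Dias"),
            ("E4", "Alwis"), ("E5", "Fernando"), ("E6", "De Silva")]
works_on = [("E1", "P1"), ("E3", "P2"), ("E3", "P3")]

# Step 1: site 2 ships only the distinct join-column values to site 1
join_keys = {eno for eno, _ in works_on}

# Step 2: site 1 computes the semi-join, i.e. the Employee rows that will join
reduced_employee = [row for row in employee if row[0] in join_keys]

# Step 3: site 1 ships the reduced relation; site 2 performs the final join
result = [
    (eno, ename, pno)
    for eno, ename in reduced_employee
    for w_eno, pno in works_on
    if eno == w_eno
]

print(len(employee), len(reduced_employee))  # 6 2
print(result)
```

Two small key values and two tuples cross the network instead of six full tuples, at the price of an extra local pass over Employee, which is the extra local processing the slide refers to.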
Cost-Based Optimisation
• Solution space
– The set of equivalent algebra expressions (query
trees).
• Cost function (in terms of time)
– I/O cost + CPU cost + communication cost
– These might have different weights in different
distributed environments (LAN vs WAN).
– Can also maximise throughput
• Search algorithm
– How do we move inside the solution space?
– Exhaustive search, heuristic algorithms (iterative
improvement, simulated annealing, genetic,…)
Search Space
• Search space characterised by alternative
execution plans
• Focus on join trees
• For N relations, there are O(N!) equivalent
join trees that can be obtained by applying
commutative and associative rules
SELECT ename,mgr
FROM Employee E, WorksOn W, Project P
WHERE E.eno = W.eno
AND W.pno = P.pno
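
For the three relations of this query the linear join orders are easy to enumerate. The Python sketch below is an illustration only: it lists all 3! orders and then keeps those in which every newly added relation shares a join predicate with a relation already joined, so that no Cartesian product is needed. The helper `avoids_cross_product` is hypothetical.

```python
from itertools import permutations

relations = ["Employee", "WorksOn", "Project"]

# Join predicates of the query: E.eno = W.eno and W.pno = P.pno
join_predicates = {
    frozenset(("Employee", "WorksOn")),
    frozenset(("WorksOn", "Project")),
}

def avoids_cross_product(order):
    """True if each relation joins with at least one relation placed before it."""
    joined = {order[0]}
    for relation in order[1:]:
        if not any(frozenset((relation, other)) in join_predicates
                   for other in joined):
            return False
        joined.add(relation)
    return True

orders = list(permutations(relations))
without_cross_product = [o for o in orders if avoids_cross_product(o)]

print(len(orders))                 # 6
print(len(without_cross_product))  # 4
```

With 10 relations there are already 3,628,800 linear orders, which is why the search space has to be restricted by heuristics.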
Search Space
• Restrict by means of heuristics
– Perform unary operations before binary
operations
• Restrict the shape of the join tree
– Consider only linear trees, ignore bushy ones
In a linear tree at least one operand of each join is a base relation,
while a bushy tree may join two intermediate
relations. Bushy trees are, however, good for parallel
processing.
[Figure: a linear join tree and a bushy join tree over the relations R1, R2, R3 and R4.]
Search Strategy
• How to “move” in the search space.
• Deterministic
– Start from base relations and build plans by adding
one relation at each step
– Dynamic programming: breadth-first (build all possible
plans and choose the best)
– Greedy: depth-first (build only one plan)
• Randomised
– Search for optimality around a particular starting point
– Trade optimisation time for execution time
– Better when > 5-6 relations
– Iterative improvement
Distributed Query Optimisation
Problems
• Cost model
– multiple query optimisation
– heuristics to cut down on alternatives
• Larger set of queries
– optimisation only on select-project-join queries
– also need to handle complex queries (e.g., unions, disjunctions, aggregations and sorting)
• Optimisation cost vs execution cost trade-off
– heuristics to cut down on alternatives
– controllable search strategies
• Optimisation/re-optimisation interval
– extent of changes in database profile before re-optimisation is necessary
Object Relational Database
Management Systems
Mr.Harsha Wijayawardhana
Definition of ORDBMS
• Object Relational Database Management
Systems have emerged to enhance the
capabilities of relational database
management systems
• The idea that object-oriented database
concepts can be superimposed on
relational databases is the one more commonly
encountered in available products
Reasons for the evolution of
ORDBMS
• Today’s applications deal with graphics,
images, weather forecasting, biological
genome data, etc.
• Further, we have to deal with audio and
video streaming data
Inadequacy of Relational model
• Handling the above kinds of data was a
problem for the relational model, since many of these applications
treat them as objects.
• Some of these applications were
developed with Object Oriented
Languages
Object Database Management
System
• An object database management system
(ODBMS, also referred to as object-oriented
database management system or OODBMS), is
a database management system (DBMS) that
supports the modeling and creation of data as
objects. This includes some kind of support for
classes of objects and the inheritance of class
properties and methods by subclasses and their
objects.
• (Fundamentals of Database Systems page 359)
History of ODMS
• Early 1980s – Orion Research Project at MCC
Won Kim at MCC (Microelectronics and
Computer Technology Corporation) in Austin,
Texas, begins a research project on ORION.
Two products will later trace their history to
ORION: ITASCA (no longer around) and
Versant.
Commercial Products
• Late 1980s
A Lisp-based system, Graphael, appears from the
French nuclear regulatory efforts. Eventually,
Graphael goes through a re-write and becomes
Matisse.
Servo-Logic begins work on GemStone. Servo-Logic is now GemStone Systems.
Commercial Products cont …
• Start of O2 development at INRIA (France). The founder
of O2 is François Bancilhon, also from MCC.
Tom Atwood at Ontologic produces Vbase, which
supports the proprietary language COP (for C Object
Processor). COP is eventually eclipsed by C++,
Ontologic becomes ONTOS, and the database is
rewritten to support C++. Tom leaves Ontologic in the late
1980s and founds Object Design (now part of Progress
Software) with ObjectStore (based on C++).
Commercial Products cont …
• 1991 – ODMG
Rick Cattell (SunSoft) initiates the ODMG with 5
major OODBMS vendors. The first standard,
ODMG 1.0, was released in 1993. Throughout
the 1990s, the ODMG works with the X3H2
(SQL) committee on a common query language.
Though no specific goal is achieved, the efforts
heavily influence the ODMG OQL (object query
language) and, to a lesser extent, SQL:1999.
Commercial Products cont …
• 2001 – Final ODMG 3.0 standard
released.
Shortly thereafter, the ODMG submits the
ODMG Java Binding to the Java
Community Process as a basis for the
Java Data Objects (JDO) Specification.
Afterwards, the ODMG disbands.
Review of Object Oriented
concepts
• Origins of OO concepts can be traced to
Object Oriented Programming languages
• Today Object Oriented concepts are
applied to many areas: databases,
Software Engineering, Knowledge bases.
• OOPLs have their roots in SIMULA, which
was proposed in the 1960s
Review cont …
• In SIMULA, the concept of a class groups together the
internal data structure of an object in a
class declaration. Researchers later proposed the
concept of the abstract data type, which hides
the internal data structures and specifies
all possible external operations that can be
applied to an object,
leading to the concept of encapsulation.
Review cont …
• Smalltalk, developed at Xerox PARC (Palo
Alto Research Center, California) in the 1970s, was
one of the first languages to incorporate
additional OO concepts
• Object: state (value) and behavior (operations)
• Another key concept in OO systems is that of
type and class hierarchies and inheritance:
multiple inheritance or selective inheritance (pp
362-367 and 380-381)
Review cont …
• Other OO concepts are:
– Polymorphism, which is sometimes referred to as
operator overloading (this gives rise to early
binding in strongly typed systems and late binding in weakly
typed ones)
The Object Model
• The basic modeling primitives are the object and the literal (constant). Each object has a unique identifier. A literal has no identifier.
• Objects and literals can be categorized by their types. All elements of a given type have a common range of states (i.e., the same set of properties) and common behavior (i.e., the same set of defined operations). An object is sometimes referred to as an instance of its type.
• The state of an object is defined by the values it carries for a set of properties. These properties can be attributes of the object itself or relationships between the object and one or more other objects. Typically the values of an object’s properties can change over time.
Object Model cont …
• The behavior of an object is defined by the set of operations that can be executed on or by the object. Operations may have a list of input and output parameters, each with a specified type. Each operation may also return a typed result.
• A database stores objects, enabling them to be shared by multiple users and applications. A database is based on a schema that is defined in ODL and contains instances of the types defined by its schema.
Object specification languages
• The primary objective of these languages
is to facilitate the portability of databases
across ODMG compliant implementations.
These languages also provide a step
toward the interoperability of ODBMSs
from multiple vendors.
Object specification languages cont …
• The two specification languages are the Object Definition Language (ODL) and
the Object Interchange Format (OIF).
Object Definition Language (ODL)
• ODL should support all semantic constructs of
the ODMG Object Model.
• ODL should not be a full programming language,
but rather a definition language for object
specifications.
• ODL should be programming-language
independent.
• ODL should be compatible with the OMG’s Interface Definition Language (IDL).
• ODL should be extensible, not only for future functionality, but also for physical optimizations.
• ODL should be practical, providing value to application developers, while being supportable by the ODBMS vendors within a relatively short time frame after publication of the specification.
• (For further reading on the Object Definition Language, see Fundamentals of Database Systems, pp 399-404)
• (www.odmg.org)
Object Query Language (OQL)
• OQL relies on the ODMG object model.
• OQL is very close to SQL 92. Extensions concern object-oriented notions, like complex objects, object identity, path expressions, polymorphism, operation invocation, late binding.
• OQL provides high-level primitives to deal with sets of objects but is not restricted to this collection construct. It also provides primitives to deal with structures, lists, arrays, and treats such constructs with the same efficiency.
OQL cont …
• OQL is a functional language where operators can freely be composed, as long as the operands respect the type system. This is a consequence of the fact that the result of any query has a type which belongs to the ODMG type model, and thus can be queried again.
• OQL is not computationally complete. It is a simple-to-use query language which provides easy access to an ODBMS.
• Based on the same type system, OQL can be invoked from within programming languages for which an ODMG binding is defined. Conversely, OQL can invoke operations programmed in these languages.
OQL cont …
• OQL does not provide explicit update operators but
rather invokes operations defined on objects for that
purpose, and thus does not breach the semantics of an
ODBMS which, by definition, is managed by the
“methods” defined on the objects.
• OQL provides declarative access to objects. Thus OQL
queries can be easily optimized by virtue of this
declarative nature.
• The formal semantics of OQL can easily be defined.
Simple queries using OQL (syntax
for O2)
• Basic syntax is select …from…where …
SELECT <list of values>
FROM <list of collections and variable assignments>
WHERE <condition>
SELECT SName: p.name
FROM p in People
WHERE p.age > 26
OQL cont …
• Dot notation and Path Expressions
– Let variables t and ta range over objects in extents
(persistent names) of Tutors and TAs (i.e., range over
objects in sets Tutors and TAs).
ta.salary -> real
t.students -> set of tuples of type tuple(name: string,
fee: real) representing students
t.salary -> real
– Cascade of dots can be used if all names represent
objects and not a collection.
OQL cont …
• Find the names of the students of all
tutors:
SELECT s.name
FROM Tutors t, t.students s
OQL cont …
• Sub queries
– Give the names of the Tutors who have a salary
greater than Rs. 3000 and have a student paying
more than Rs. 3000:
SELECT r.name
FROM ( SELECT t FROM Tutors t WHERE t.salary >
3000 ) r, r.students s
WHERE s.fee > 3000
(pp 405 – 409)
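
For readers more familiar with a programming language than with OQL, the path expression and sub-query examples can be mimicked with Python comprehensions over in-memory objects. This is only an analogy, not an OQL engine. The classes and the sample tutors and students are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Student:
    name: str
    fee: float

@dataclass
class Tutor:
    name: str
    salary: float
    students: list = field(default_factory=list)

# Hypothetical extent "Tutors"
tutors = [
    Tutor("Perera", 4000, [Student("Nimal", 3500), Student("Kamal", 2000)]),
    Tutor("Silva", 2500, [Student("Sunil", 5000)]),
    Tutor("Dias", 5000, [Student("Amal", 1000)]),
]

# SELECT s.name FROM Tutors t, t.students s
student_names = [s.name for t in tutors for s in t.students]

# Sub-query: tutors earning more than 3000 ...
well_paid = [t for t in tutors if t.salary > 3000]
# ... who have a student paying more than 3000
tutor_names = [r.name for r in well_paid for s in r.students if s.fee > 3000]

print(student_names)  # ['Nimal', 'Kamal', 'Sunil', 'Amal']
print(tutor_names)    # ['Perera']
```

The second `for` clause in each comprehension plays the role of the dependent range variable `t.students s`: it iterates over a collection reached through a path expression from the first variable.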
ORDBMS
• Some of the Object Relational Database
Management Systems are:
– Oracle 8
– PostgreSQL
– Informix Universal Server
• In addition, we will discuss SQL3, the
version of the SQL language which extends
SQL2 by incorporating object database and
other features such as extended data types
ORDBMS cont …
• SQL 3
– SQL3 object facilities primarily involve
extensions to SQL’s type facilities; however,
extensions to SQL table facilities can also be
considered relevant. Additional facilities
include control structures to make SQL a
computationally complete language for
creating, managing, and querying persistent
object-like data structures.
SQL 3 cont …
• The parts of SQL3 that provide the primary basis
for supporting object-oriented structures are:
– user-defined types (ADTs, named row types, and
distinct types)
– type constructors for row types and reference types
– type constructors for collection types (sets, lists, and
multi sets)
– user-defined functions and procedures
– support for large objects (BLOBs and CLOBs)
SQL 3 cont …
• A row type is a sequence of field
name/data type pairs resembling a table
definition. Two rows are type-equivalent if
both have the same number of fields and
every pair of fields in the same position
have compatible types. The row type
provides a data type that can represent
the types of rows in tables, so that
complete rows can be stored.
SQL 3 cont …
• Operations that may be invoked in SQL include defined
operations on tables (SELECT, INSERT, UPDATE,
DELETE), the implicitly defined functions for
ADT attributes, and routines either explicitly associated
with ADTs or defined separately.
• Routines associated with ADTs are FUNCTION
definitions for type-specific user-defined behavior. The
FUNCTION definitions specify the operations on the
ADT and return a single value of a defined data type.
Functions may either be SQL functions, completely
defined in an SQL schema definition, or external
functions, defined in standard programming languages.
SQL 3 cont …
• SQL functions associated with ADTs are
invoked using either a functional notation
or a dot notation (the dot notation, written with a
double dot, is syntactic sugar for the
functional notation). For example:
BEGIN
  DECLARE r real_estate
  …
  SET r..area = 2540;             /* same as area(r,2540) */
  SET … = r..area;                /* same as area(r) */
  …
  SET … = r..location..state;     /* same as state(location(r)) */
  SET r..location..city = 'LA';   /* same as city(location(r),'LA') */
  …
SQL 3 cont …
• Different routines may have the same name.
• This is referred to as overloading, and may be required, for example, to allow an ADT subtype to redefine an operation inherited from a supertype.
SQL 3 cont …
• SQL3 implements what is sometimes
known as a generalized object model,
meaning that the types of all arguments of
a routine are taken into consideration
when determining what routine to invoke,
rather than using only a single type
specified in the invocation as, for example,
in C++ or Smalltalk.
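The idea of choosing a routine from the types of all its arguments can be sketched in Python. This is only an illustration of the dispatch rule, not SQL3 itself: the registry, the `define` and `invoke` helpers and the `describe` routines are hypothetical names, and the sketch matches exact types only (no subtype matching or ambiguity resolution).

```python
# Hypothetical registry: (routine name, argument types) -> implementation.
_routines = {}


def define(name, *arg_types):
    """Register an implementation of `name` for one combination of argument types."""
    def register(func):
        _routines[(name, arg_types)] = func
        return func
    return register


def invoke(name, *args):
    """Pick the implementation using the types of ALL arguments, not just the first."""
    key = (name, tuple(type(a) for a in args))
    if key not in _routines:
        raise TypeError(f"no routine {name!r} for argument types {key[1]}")
    return _routines[key](*args)


# Two overloaded routines with the same name but different argument types.
@define("describe", int, str)
def describe_int_str(n, s):
    return f"{n} copies of {s}"


@define("describe", str, int)
def describe_str_int(s, n):
    return f"{s} number {n}"


first = invoke("describe", 2, "x")   # dispatches on (int, str)
second = invoke("describe", "x", 2)  # dispatches on (str, int)
```

Both calls use the same routine name; only the combination of argument types decides which body runs.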
SQL 3 cont …
• Inheritance
– An ADT can be defined as a subtype of one
or more ADTs by defining it as UNDER those
ADTs (multiple inheritance is supported). In
this case, the ADT is referred to as a direct
subtype of the ADTs specified in the UNDER
clause, and these ADTs are direct
supertypes.
SQL 3 cont …
– A type can have more than one subtype and
more than one supertype. A subtype inherits
all the attributes and behavior of its
supertypes; additional attributes and behavior
can also be defined. An instance of a subtype
is considered an instance of all of its
supertypes. An instance of a subtype can be
used wherever an instance of any of its
supertypes is expected.
• CREATE TABLE person (name
CHAR(20), sex CHAR(1), age INTEGER);
• CREATE TABLE employee UNDER
person (salary FLOAT);
• CREATE TABLE customer UNDER
person (account INTEGER);
SQL 3 cont …
• A number of other statements are added:
– An assignment statement that allows the result of an
SQL value expression to be assigned to a free
standing local variable, a column, or an attribute of an
ADT.
– A CALL statement that allows invocation of an SQL
procedure.
– A RETURN statement that allows the result of an SQL
value expression to be returned as the RETURNS
value of the SQL function.
– A CASE statement to allow selection of an execution path based on alternative choices.
– An IF statement with THEN, ELSE, and ELSEIF alternatives to allow selection of an execution path based on the truth value of one or more conditions.
– Statements for LOOP, WHILE, and REPEAT to allow repeated execution of a block of SQL statements. WHILE checks a <search condition> before execution of the block, and REPEAT checks it afterwards. All three statements are allowed to have a statement label.
PostgreSQL
• PostgreSQL 8.1
• PostgreSQL’s ancestor was Ingres,
developed at the University of California at
Berkeley (1977-1985). The Ingres code
was later enhanced by Relational
Technologies/Ingres Corporation, which
produced one of the first commercially
successful relational database servers.
Create
• CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name (
    [ { column_name data_type [ DEFAULT default_expr ] [ column_constraint [ … ] ]
      | table_constraint
      | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
      [, … ] ]
  )
  [ INHERITS ( parent_table [, … ] ) ]
  [ WITH OIDS | WITHOUT OIDS ]
  [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
  [ TABLESPACE tablespace ]
Inheritance in PostgreSQL
• CREATE TABLE cities (
    name text,
    population float,
    altitude int -- (in ft)
  );
  CREATE TABLE capitals (
    state char(2)
  ) INHERITS (cities);
The capitals table inherits all the columns of cities.
• SELECT name, altitude FROM cities
WHERE altitude > 500;
   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953
 Madison   |      845
• SELECT name, altitude FROM ONLY
cities WHERE altitude > 500;
   name    | altitude
-----------+----------
 Las Vegas |     2174
 Mariposa  |     1953
Data Warehouse & Mining
© 2012, University of Colombo School of Computing
Information Systems
Transaction Processing Systems
• Substitutes computer-based processing
for manual procedures.
Deals with well-structured processes.
Includes record keeping applications.
E.g. Airline Reservation Systems, Banking
Systems, or the Accounting System
Management Information Systems
• Provides input to be used in the
managerial decision process. Deals with
supporting well-structured decision
situations. Typical information
requirements can be anticipated.
Decision Support Systems
• Provides information to managers who
must make judgements about particular
situations. Supports decision-makers in
situations that are not well structured
Organizational Needs
• Better Strategic Decision Making &
Planning
• Convert Data / Information into Business
Intelligence
• Manipulate Data Analytically to obtain
Business Intelligence
Why Data Warehouse?
• Management Requirements to process data at
corporate level using many files and collection of data
that have been accumulated over the years
• Limitations in operational systems
• Reporting against unintegrated operational data can be
hazardous
• Reporting against data that is distributed across multiple
sources that are incompatible with each other
• Make operational data accessible and efficiently
queryable so that management can get answers to
business questions
Operational (TP) Systems & Data
• Application oriented
• Support day to day operations
• Focused on operational efficiency
• Not focused on complex queries required
by the management
• Not robust enough to meet future needs
• The data serving operational needs is
physically different data from that serving
informational or analytic needs
Online Transaction Processing
(OLTP) Systems
Designed to get data in quickly and to
analyse current events.
Characterised by:
– Process oriented
– Data Normalised
– Current data
– Volatile data
– Updated in real-time
Management Decision Support Requirements
Examples
• Sales by Region, Country, Product and by Quarter
• Customer purchasing trends
• What are the most popular products purchased by
customers between the ages of 15 and 30?
Data Warehouse Systems
Designed to get data out and analyse it quickly.
Characterised by:
– Subject oriented rather than process orientated
– Integrated across subjects and entire enterprise
– De-normalised data
– Time-variant
– Historical
– Non Volatile
– Atomic and Summary data
Data Warehouse
“A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management’s
decision making process” [W.H.Inmon 96].
The four keywords (subject-oriented, integrated, time-variant and non-volatile)
distinguish data warehouses from other data repository
systems, such as relational database systems,
transaction processing systems, and file systems.
Data Warehouse
• Subject-oriented: A data warehouse is organized
around major subjects, such as customer, supplier,
product and sales.
• Rather than concentrating on the day-to-day
operations and transaction processing of an
organisation, a data warehouse focuses on the
modelling and analysis of data for decision makers.
• Hence, data warehouses typically provide a simple
and concise view around particular subject issues
by excluding data that are not useful in the decision
support process.
Data Warehouse
• Integrated: A data warehouse is usually
constructed by integrating multiple
heterogeneous sources, such as relational
databases, flat files, and on-line transaction
records.
• Data cleaning and data integration techniques
are applied to ensure consistency in naming
conventions, encoding structures, attribute
measures, and so on.
Data Warehouse
• Time variant: Data are stored to provide
information from a historical perspective
(e.g., the past 5-10 years).
• Every key structure in the data warehouse
contains, either implicitly or explicitly, an
element of time.
Data Warehouse
• Non-volatile: A data warehouse is always a
physically separate store of data transformed
from the application data found in the
operational environment.
• Due to this separation, a data warehouse
does not require transaction processing,
recovery, and concurrency control
mechanisms.
• It usually requires only two operations in data
accessing: initial loading and access of data.
Operational Database (OD) versus
Data Warehouse (DW)
• Data in the DW is stored primarily for the
purpose of providing data that can be
interrogated by business people to gain value
from information derived from daily operations.
• Use of the DW is to drive decision support.
• The OD is used to process information that is
needed for the purposes of performing
operational tasks.
• The OD is active for updates during all hours
that business activities are executed. The DW is
used for read-only querying during active
business hours.
Data Mart
A departmentalized structure of data feeding
from the Data Warehouse where data is
denormalized based on the department’s
need for information.
Data marts are generally small data warehouses
and use the same structures and many of the
same development methods as data warehouses.
The difference is that they are intended to meet a specific
need or to deliver only one type of information.
An Integrated Environment for
Business Intelligence
[Figure: components labelled Operational data, Legacy data, Other data, Warehouse, Discoverer, Express and Reports, with the note "Seamless Interoperability/Migration".]
DW Designing- Main Considerations
• Data Warehouse or Data mart
• Data extraction, validation and loading
• Data models in the source
• Dimensional model in the DW
• Managing data volumes in the DW
• Refreshing data in the DW
The ETL Process
• Capture
• Scrub or data cleansing
• Transform
• Load and Index
ETL = Extract, Transform and Load
Steps in data reconciliation
Capture = extract: obtaining a snapshot
of a chosen subset of the source data for
loading into the data warehouse
Static extract = capturing a
snapshot of the source data at
a point in time
Incremental extract =
capturing changes that have
occurred since the last static
extract
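The difference between a static and an incremental extract can be sketched as follows. The source rows and the `updated` last-modified column are assumptions made for the illustration; real sources may detect changes differently (for example through logs or triggers).

```python
from datetime import date

# Hypothetical source rows; 'updated' is an assumed last-modified column.
SOURCE = [
    {"id": 1, "name": "Milk", "updated": date(2012, 1, 5)},
    {"id": 2, "name": "Bread", "updated": date(2012, 2, 10)},
    {"id": 3, "name": "Sugar", "updated": date(2012, 3, 1)},
]


def static_extract(rows):
    """Snapshot of the whole source at this point in time."""
    return [dict(r) for r in rows]


def incremental_extract(rows, last_extract):
    """Only the rows changed after the previous extract."""
    return [dict(r) for r in rows if r["updated"] > last_extract]


snapshot = static_extract(SOURCE)
changes = incremental_extract(SOURCE, date(2012, 2, 1))
changed_ids = [r["id"] for r in changes]
```

With the sample rows, the snapshot holds all three records, while the incremental extract after 1 February 2012 holds only the two rows modified since then.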
Steps in data reconciliation (continued)
Scrub = cleanse…uses pattern
recognition and AI techniques to
upgrade data quality
Fixing errors: misspellings,
erroneous dates, incorrect field usage,
mismatched addresses, missing data,
duplicate data, inconsistencies
Also: decoding, reformatting, time
stamping, conversion, key generation,
merging, error detection/logging,
locating missing data
Steps in data reconciliation (continued)
Transform = convert data from format
of operational system to format of data
warehouse
Record-level:
Selection – data partitioning
Joining – data combining
Aggregation – data summarization
Field-level:
single-field – from one field to one field
multi-field – from many fields to one, or
one field to many
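The three record-level transformations can be illustrated with a small sketch. The sales rows, the item lookup and the function names are made up for the example; an ETL tool would normally express the same steps declaratively.

```python
from collections import defaultdict

# Hypothetical operational data.
SALES = [
    {"item": 1152, "region": "West", "qty": 2},
    {"item": 1167, "region": "West", "qty": 1},
    {"item": 1152, "region": "East", "qty": 4},
]
ITEMS = {1152: "Milk", 1167: "Bread"}


def select(rows, region):
    """Selection: partition the data by keeping only rows for one region."""
    return [r for r in rows if r["region"] == region]


def join(rows, items):
    """Joining: combine each sale with the description of its item."""
    return [{**r, "description": items[r["item"]]} for r in rows]


def aggregate(rows):
    """Aggregation: summarise quantity sold per item."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["item"]] += r["qty"]
    return dict(totals)


west = select(SALES, "West")
described = join(SALES, ITEMS)
per_item = aggregate(SALES)
```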
Steps in data reconciliation (continued)
Load/Index = place transformed data
into the warehouse and create indexes
Refresh mode: bulk rewriting of
target data at periodic intervals
Update mode: only changes in
source data are written to data
warehouse
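A minimal sketch of the two load modes, modelling the warehouse table as a Python dictionary keyed by record id. The function names and sample rows are hypothetical.

```python
def refresh(target, source):
    """Refresh mode: discard the target and rewrite it in bulk from the source."""
    target.clear()
    target.update(source)


def update(target, changes):
    """Update mode: write only the changed source rows to the target."""
    for key, row in changes.items():
        target[key] = row


warehouse = {1: "Milk", 2: "Bread"}

# Only record 2 changed and record 3 is new, so only those are written.
update(warehouse, {2: "Brown bread", 3: "Sugar"})
after_update = dict(warehouse)

# A periodic refresh replaces everything with the current source contents.
refresh(warehouse, {1: "Milk", 3: "Sugar"})
after_refresh = dict(warehouse)
```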
Extract, Transform & Load (ETL) Process
The process of integrating data from the
operational systems to DW.
This is a complex, time-consuming and error-prone process:
– Availability of data
– Quality of data
– Granularity
ETL Process – Problems
• Hard Coded names
• Inconsistent field lengths
• Missing values
• Inconsistent values
• Inconsistent field types
• Inconsistent entity handling
• Change in technology
• Resolving conflicts in multiple input files
ETL Software Tools
ETL tools extract data that resides in
disparate sources such as relational
databases, mainframe systems or packaged
applications, transform it, and load it into
data marts and warehouses.
Data Granularity
Granularity refers to the level of detail or
summarization of the units of data in the
Data Warehouse.
The more detail there is, the lower the level of granularity.
The less detail there is, the higher the level of granularity.
Granularity has an impact on:
– Volume of data in the Data Warehouse
– Types of Queries & Reports from the DW
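The effect of granularity on data volume and on the queries that remain possible can be seen in a small sketch. The daily sales rows are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical low-granularity (detailed) data: one row per day and item.
DAILY = [
    ("2012-01-03", "Milk", 10),
    ("2012-01-17", "Milk", 5),
    ("2012-02-02", "Milk", 7),
    ("2012-02-02", "Bread", 3),
]


def summarise_by_month(rows):
    """Raise the granularity from daily to monthly by summing quantities."""
    totals = defaultdict(int)
    for day, item, qty in rows:
        month = day[:7]  # "YYYY-MM"
        totals[(month, item)] += qty
    return dict(totals)


MONTHLY = summarise_by_month(DAILY)
```

The monthly summary has fewer rows than the daily data, but a question such as "how much milk was sold on 17 January?" can no longer be answered from it.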
DATA CUBES
• After defining the star schema we can
create as many cubes as the
requirements call for.
• For example, as in the figure.
A Data Cube
[Figure: data cube with dimensions District, Course (Med, Den, Vet) and Year. Values on the visible Med face:]
District   2001  2002  2003
Colombo     208   228   211
Gampaha      51    67    68
Kalutara     26    31    39

• A concept hierarchy defines a sequence of
mappings from a set of low-level concepts
to more general higher-level concepts.
• Using it, data can be aggregated or
disaggregated. Many concept hierarchies
are implicit within the database schema,
and a hierarchy could be defined for the
locations in the order of school < zone <
district < province < country.
• This allows districts to be aggregated to
provinces (roll-up) as well as districts to
be disaggregated into zones (drill-down).
Roll-up
• The Roll-up operation corresponds to
taking the current data objects and doing
further grouping by one of the dimensions.
• The Roll-up operation is performed on the
central cube by climbing up the concept
hierarchy.
• Thus, it is possible to Roll-up admission
data by grouping districts into provinces.
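A roll-up from district to province can be sketched as a regrouping of cube cells. The Med admission counts below are the ones read from the data cube figure (Colombo, Gampaha, Kalutara for 2001 to 2003); the dictionary layout and the `roll_up` helper are assumptions made for the illustration.

```python
from collections import defaultdict

# Med admissions per (district, year), as read from the data cube figure.
MED = {
    ("Colombo", 2001): 208, ("Colombo", 2002): 228, ("Colombo", 2003): 211,
    ("Gampaha", 2001): 51, ("Gampaha", 2002): 67, ("Gampaha", 2003): 68,
    ("Kalutara", 2001): 26, ("Kalutara", 2002): 31, ("Kalutara", 2003): 39,
}

# One step of the location concept hierarchy: district -> province.
PROVINCE_OF = {"Colombo": "Western", "Gampaha": "Western", "Kalutara": "Western"}


def roll_up(cells, parent_of):
    """Climb one level of the hierarchy and sum the cells that now coincide."""
    totals = defaultdict(int)
    for (district, year), count in cells.items():
        totals[(parent_of[district], year)] += count
    return dict(totals)


BY_PROVINCE = roll_up(MED, PROVINCE_OF)
```

The Western totals come out as 285, 326 and 318 for 2001 to 2003, which matches the Western row of the rolled-up cube.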
Concept Hierarchy
[Figure: location concept hierarchy with the levels Country, Province, District, Zone and School. Labelled nodes: Sri Lanka (country); Western, Sabaragamuwa (provinces); Colombo, Kalutara (districts); Colombo (zone); De-la Salle College (school).]
Roll-up District
[Figure: cube after rolling up District to Province, with dimensions Province (Western, Central, Southern), Course (Med, Den, Vet) and Year (2001, 2002, 2003). The Med row for Western reads 285, 326, 318; the other visible rows read 92, 108, 96 and 149, 142, 140.]
Drill-down
• The drill-down operation is the opposite of
roll-up.
• It navigates from less detailed data to more
detailed data. The data cube of figure 6 can be
drilled down using the location concept hierarchy, and hence districts can be
disaggregated into zones as shown in
figure 9.
Drill-down District
[Figure: cube after drilling down from District to Zone, with dimensions Zone (Colombo, Homagama, Sri Jayawardhanapura), Course (Med, Den, Vet) and Year (2001, 2002, 2003). The visible Med rows read 170, 173, 158; 1, 0, 0; and 12, 7, 6.]
Slice
• The slice operation performs a selection
on one dimension of the given cube,
resulting in a sub-cube.
• For example, we could select the course
dimension and slice for course medicine
and view a sub-cube.
A Slice for Course Medicine
District   2001  2002  2003
Colombo     208   228   211
Gampaha      51    67    68
Kalutara     26    31    39
Dice
• The dice operation defines a sub-cube by
performing a selection on two or more
dimensions.
• For example, a three-dimension dice for
courses “Medicine” and “Dental” for
districts “Colombo” and “Jaffna” for years
2001 and 2002
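Slice and dice can both be sketched as filters over cube cells keyed by (district, course, year). The Med counts for Colombo and Gampaha are taken from the data cube figure; the Den counts are hypothetical placeholders, and the function names are assumptions for this sketch.

```python
YEARS = (2001, 2002, 2003)
MED = {"Colombo": (208, 228, 211), "Gampaha": (51, 67, 68)}  # from the figure
DEN = {"Colombo": (10, 12, 11), "Gampaha": (3, 4, 5)}        # hypothetical values

# Build the cube: (district, course, year) -> number of students.
CUBE = {}
for course, table in (("Med", MED), ("Den", DEN)):
    for district, counts in table.items():
        for year, count in zip(YEARS, counts):
            CUBE[(district, course, year)] = count


def slice_cube(cube, course):
    """Slice: fix one dimension (course), leaving a 2-D sub-cube."""
    return {(d, y): n for (d, c, y), n in cube.items() if c == course}


def dice_cube(cube, districts, courses, years):
    """Dice: select a subset of values on several dimensions at once."""
    return {
        key: n
        for key, n in cube.items()
        if key[0] in districts and key[1] in courses and key[2] in years
    }


MED_SLICE = slice_cube(CUBE, "Med")
SMALL_DICE = dice_cube(CUBE, {"Colombo"}, {"Med", "Den"}, {2001, 2002})
```

The slice drops the course dimension entirely, while the dice keeps all three dimensions but with fewer values on each.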
A 3D Dice
[Figure: sub-cube with dimensions District (Colombo, Jaffna), Course (Med, Den) and Year. Visible values include 208 and 228 for Colombo, and 40, 44 and 40.]
Multidimensional Data Schema
Support
• Decision Support Data tends to be
– Nonnormalized
– Duplicated
– Preaggregated
• Star Schema
– Special Design technique for multidimensional
data representations
– Optimize data query operations instead of
data update operations
Star Schemas
• Data Modeling Technique to map
multidimensional decision support data
into a relational database
• Current Relational modeling techniques do
not serve the needs of advanced data
requirements
Star Schema
• 4 Components
– Facts
– Dimensions
– Attributes
– Attribute Hierarchies
STAR Schema
Facts
• Numeric measurements (values) that represent
a specific business aspect or activity
• Stored in a fact table at the center of the star
schema
• Contains facts that are linked through their
dimensions
• Can be computed or derived at run time
• Updated periodically with data from operational
databases
Dimensions
• Qualifying characteristics that provide
additional perspectives to a given fact
– DSS data is almost always viewed in relation
to other data
• Dimensions are normally stored in
dimension tables
Attributes
• Dimension Tables contain Attributes
• Attributes are used to search, filter, or classify
facts
• Dimensions provide descriptive characteristics
about the facts through their attributes
• Must define common business attributes that will
be used to narrow a search, group information,
or describe dimensions. (ex.: Time / Location /
Product)
• No mathematical limit to the number of
dimensions (3-D makes it easy to model)
Attribute Hierarchies
• Provides a Top-Down data organization
– Aggregation
– Drill-down / Roll-Up data analysis
• Attributes from different dimensions can
be grouped to form a hierarchy
Star Schema
• A single fact table and for each dimension one
dimension table
• Does not capture hierarchies directly
Snowflake schema
• Represent dimensional hierarchy directly by
normalizing tables.
• Easy to maintain and saves storage
Star Schema
[Figure: star schema for the university admissions data. Table and attribute names as far as they can be read from the figure:]
Student Master Fact Table: AL_Year, Exam_Index_No, UGC_Serial_No, Uni_Reg_No, Faculty_Code, Course_Code, Sch_Code, Dist_No
Year: AL_Year, Sp_Remarks
Exam: Exam_Index_No, Dist_No(o), Sub[4], Gra[4], Dist_Ra, All_Island_rank, Exam_st
District: Dist_No, Dist_Name, Population
Application: UGC_Serial_No, Corr_Dist, P_Address, Preference[10,40], Maths_st, A_cou_pre
School: Sch_Code, Sch_Address, Sch_Type, Year12S, Year12C, Year12A, Year13S, Year_13_c, Year_13_A, Year_13r_S, Year_13r_c, Year_13r_A
Uni_stu: Reg_no, Course_Code, Year1_st, Year2_st, Year3_st, Year4_st, Final_st, Stu_st
Faculty: Uni_code, Faculty, No_of_places, F-address
Course: Course_Code, Course_Name, No_of_Vac, Category, status(m/d/u)
Snowflake schemas normalize dimensions to eliminate redundancy. That
is, the dimension data has been grouped into multiple tables instead of
one large table.
E.g. Location and Item dimension table in a star schema might be
normalized into a location table and city table in a snowflake schema.
Part of the DMQL statements:
• define cube cube_master [AL_Year, Exam_Index_No,
UGC_Serial_No, Uni_Reg_No, Faculty_Code,
Course_Code, Sch_Code, Dist_No] : stu_status =
count(*)
• define dimension Year as (AL_Year, Sp_Remarks)
• define dimension District as (Dist_No, Dist_Name,
Population)
• define dimension Course as (Course_Code,
Course_Name, No_of_Vac)
Star Schema Representation
• Fact and Dimensions are represented by
physical tables in the data warehouse database
• Fact tables are related to each dimension table
in a Many to One relationship (Primary/Foreign
Key Relationships)
• Fact Table is related to many dimension tables
– The primary key of the fact table is a composite
primary key from the dimension tables
• Each fact table is designed to answer a specific
DSS question
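The representation described above can be sketched with Python's built-in `sqlite3` module. The table and column names are simplified, hypothetical versions of the admissions example (they are not the schema from the slides), and the row values are sample data. The fact table's primary key is a composite of foreign keys to the dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (names are simplified assumptions).
CREATE TABLE district (dist_no INTEGER PRIMARY KEY, dist_name TEXT);
CREATE TABLE course (course_code TEXT PRIMARY KEY, course_name TEXT);

-- Fact table: many fact rows per dimension row, composite primary key.
CREATE TABLE admission_fact (
    al_year     INTEGER,
    dist_no     INTEGER REFERENCES district(dist_no),
    course_code TEXT REFERENCES course(course_code),
    students    INTEGER,
    PRIMARY KEY (al_year, dist_no, course_code)
);

INSERT INTO district VALUES (1, 'Colombo'), (2, 'Gampaha');
INSERT INTO course VALUES ('MED', 'Medicine');
INSERT INTO admission_fact VALUES
    (2001, 1, 'MED', 208), (2002, 1, 'MED', 228), (2001, 2, 'MED', 51);
""")

# A typical DSS query: filter on dimension attributes, aggregate the facts.
rows = conn.execute("""
    SELECT d.dist_name, SUM(f.students)
    FROM admission_fact AS f
    JOIN district AS d ON d.dist_no = f.dist_no
    JOIN course AS c ON c.course_code = f.course_code
    WHERE c.course_name = 'Medicine'
    GROUP BY d.dist_name
    ORDER BY d.dist_name
""").fetchall()
conn.close()
```

The query answers one specific question (medicine admissions per district) by joining the fact table to its dimensions, which is the access pattern a star schema is designed for.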
Star Schema
• The fact table is always the largest table in
the star schema
• Each dimension record is related to
thousands of fact records
• Star Schema facilitates data retrieval
functions
• DBMS first searches the Dimension
Tables before the larger fact table
Data Warehouse
Implementation
• An Active Decision Support Framework
– Not a Static Database
– Always a Work in Progress
– Complete Infrastructure for Company-Wide
decision support
– Hardware / Software / People / Procedures /
Data
– Data Warehouse is a critical component of the
Modern DSS – But not the Only critical
component
Software for DW & BI
• Oracle Data Warehouse Builder
• Oracle Discoverer
• Business Objects
• Cognos
• MS SQL Server Analysis Services
Business Objects – Complete BI Platform
Main Components
• Reporting (Crystal Reports, Crystal
Reports Analyzer)
• Query & Analysis
• OLAP Intelligence
• Web Intelligence
• Dash Board Manager
• Data Integrator (ETL)
Data Warehousing Applications
• Decision support
• Trend analysis
• Financial forecasting
• Churn Prediction for Telecom subscribers, Credit
Card users etc.
• Insurance fraud analysis
• Call record analysis
• Logistics and Inventory management
• Agriculture
What Is Data Mining?
• Data mining (knowledge discovery in
databases):
– Extraction of interesting (non-trivial, implicit,
previously unknown and potentially useful)
information or patterns from data in large
databases
• What is not data mining?
– (Deductive) query processing.
– Expert systems or small ML/statistical programs
Data Mining
• Discover Previously unknown data
characteristics, relationships,
dependencies, or trends
• Typical Data Analysis Relies on end users
– Define the Problem
– Select the Data
– Initiate the Data Analysis
– Reacts to External Stimulus
Data Mining
• Proactive
• Automatically searches
– Anomalies
– Possible Relationships
– Identify Problems before the end-user does
• Data Mining tools analyze the data, uncover
problems or opportunities hidden in data
relationships, form computer models based on
their findings, and then use the models to
predict business behavior – with minimal end-user intervention
Multidisciplinary Field
[Figure: Data Mining at the centre, surrounded by the contributing fields Database Technology, Statistics, Artificial Intelligence, Machine Learning, Visualization and Other Disciplines.]
Data Mining
• A methodology designed to perform
knowledge-discovery expeditions over the
database data with minimal end-user
intervention
• 3 Stages of Data
– Data
– Information
– Knowledge
Why Data Mining? — Potential
Applications
• Database analysis and decision support
– Market analysis and management
• target marketing, customer relation management,
market basket analysis, cross selling, market
segmentation
– Risk analysis and management
• Forecasting, customer retention, improved
underwriting, quality control, competitive analysis
– Fraud detection and management
Applications
• Customer analytics
– Forecasting buying habits and lifestyle preferences is
a process of data mining and analysis.
• Data Mining in Agriculture
• National Security Agency
• Police-enforced ANPR in the UK
• Quantitative structure-activity relationship
• Surveillance / Mass surveillance
• Processing Loan Applications
• Stock and investment analysis
• Identify successful medical therapies
Data Mining: A KDD Process
– Data mining: the core
of the knowledge
discovery process.
[Figure: the KDD process. Stages shown: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation.]
Association
Description
• Seeks association rules in dataset
• ‘Market basket’ analysis
• Sequence discovery
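Market basket analysis can be illustrated with the two measures commonly used to rate an association rule, support and confidence. These measures are not named on the slide and are introduced here as an assumption; the baskets are invented sample data.

```python
# Hypothetical shopping baskets.
BASKETS = [
    {"milk", "bread"},
    {"milk", "sugar"},
    {"milk", "bread", "rice"},
    {"bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs, baskets) / support(lhs, baskets)


# Rule under test: milk => bread
rule_support = support({"milk", "bread"}, BASKETS)
rule_confidence = confidence({"milk"}, {"bread"}, BASKETS)
```

Here milk and bread appear together in 2 of 4 baskets, and 2 of the 3 baskets with milk also contain bread.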
Clustering
Description
• unsupervised
• seeks to describe dataset in terms of
natural clusters of cases
Discoverer
“An ad hoc query, reporting, and analysis tool”
Reports
“A sophisticated enterprise production reporting tool to build and
distribute high-quality reports”
© 2008, University of Colombo School of Computing
Data Storage and Querying
Dr G.N.Wikramanayake
Dr Jeevani Goonetillake
University of Colombo School of Computing
[Figure: DBMS schema levels. Users and programs (user a, user i/program j, program x) work through sub-schemas (sub-schema a, sub-schema i, sub-schema z), which sit above the Conceptual Schema and the Physical Schema; the DBMS manages the databases and their files (File 1, File 2, File 3).]
Employee record
Name (20 characters) | Address (40 characters) | NID (10 char) | Designation (15 char)
A.B.C. De Silva      |222, Galle Road, Colombo                 |650370690V|Senior Lecturer
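The employee record layout can be sketched as a fixed-length record using Python's `struct` module. The field widths (20, 40, 10, 15) come from the slide; space padding, ASCII encoding and the helper names are assumptions for this illustration.

```python
import struct

# Field widths from the slide: Name, Address, NID, Designation.
WIDTHS = (20, 40, 10, 15)
RECORD = struct.Struct("<20s40s10s15s")  # "<" means no alignment padding


def pack_record(*fields):
    """Pad (or truncate) each field to its fixed width and pack the record."""
    padded = [f.encode("ascii").ljust(w)[:w] for f, w in zip(fields, WIDTHS)]
    return RECORD.pack(*padded)


def unpack_record(data):
    """Split a stored record back into fields, dropping the padding."""
    return tuple(f.decode("ascii").rstrip() for f in RECORD.unpack(data))


employee = ("A.B.C. De Silva", "222, Galle Road, Colombo", "650370690V", "Senior Lecturer")
stored = pack_record(*employee)
restored = unpack_record(stored)
record_length = RECORD.size
```

Because every record has the same length (85 bytes here), the byte offset of record n in a file is simply n times the record length, which is the kind of precise physical structure the DBMS has to know.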
Physical View
• The DBMS must
know
– exact physical location
– precise physical
structure
File
• A collection of data records with similar
characteristics
• Student list, course list at the university
• price list, stock records at a supermarket
Item# Description   Shelf#  Max Stock Reorder Balance
1152    Milk             D01      100            25            70
1167    Bread           D01        70            25            30
1175    Sugar           G02       300           50           120
1172    Rice             G04        50            20            35
Physical Design
• Provide good
performance
– Fast response time
– Minimum disk
accesses
Disk Assembly
[Figure: disk assembly showing the central axle, the head assembly with its read/write heads, the protective and data surfaces, and a cylinder, track and sector (disk block).]
Access time = seek time + rotational delay + transfer time
Disks
• 6 disks (platters); 12 surfaces; 2 outer protected
surfaces; 10 inner data surfaces (coated with a
magnetic substance to record data )
• Each surface 200-400 concentric tracks
• Read/write heads placed over specific track; with
one active at a time
• Set of corresponding tracks is a cylinder, i.e.
track i of all 10 surfaces
Data Block
• A data block (sector) is the smallest unit of
data defined within the database.
• The block size may be defined by the
DB_BLOCK_SIZE parameter
Definitions
• Seek time
– Average time to move the read-write head to
the correct cylinder
• Rotational delay
– Average time for the sector to move under the
read-write head
• Transfer time
– Time to read a sector and transfer the data to
memory
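These three components add up to the access time (access time = seek time + rotational delay + transfer time). A small worked sketch, assuming that the average rotational delay is half a revolution; the drive parameters are hypothetical example values.

```python
def access_time_ms(seek_ms, rpm, block_bytes, transfer_bytes_per_ms):
    """Access time = seek time + rotational delay + transfer time (all in ms)."""
    # Assumption: on average the sector is half a revolution away.
    rotational_delay_ms = 0.5 * 60_000 / rpm
    transfer_ms = block_bytes / transfer_bytes_per_ms
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 8 ms average seek, 7200 rpm, 4096-byte block,
# transfer rate of 4096 bytes per millisecond.
total_ms = access_time_ms(8.0, 7200, 4096, 4096.0)
```

For these values the rotational delay is about 4.17 ms and the transfer takes 1 ms, so one block access costs roughly 13.17 ms, most of it mechanical movement rather than data transfer.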
More Definitions
• Logical Record
– The data about an entity (a row in a table)
• Physical Record
– A sector, page or block on the storage
medium
• Typically several logical records can be
stored in one physical record.
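How many logical records fit in one physical record, and how many blocks a file then needs, can be worked out as below. The sketch assumes fixed-length records that are not split across blocks; the sizes are hypothetical.

```python
import math


def records_per_block(block_bytes, record_bytes):
    """Whole logical records that fit in one physical record (block)."""
    return block_bytes // record_bytes


def blocks_needed(num_records, block_bytes, record_bytes):
    """Blocks required to store the file when records do not span blocks."""
    return math.ceil(num_records / records_per_block(block_bytes, record_bytes))


# Hypothetical file: 1000 records of 100 bytes in 4096-byte blocks.
per_block = records_per_block(4096, 100)
total_blocks = blocks_needed(1000, 4096, 100)
```

With these sizes each block holds 40 records (96 bytes stay unused), so the 1000 records occupy 25 blocks.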
File Organization Techniques
• Three techniques
– Serial or Heap (unordered)
– Sorted
• Sequential (SAM)
• Indexed Sequential (ISAM)
– Hashed or Direct or Random
Serial
• Used for temporary files such as
transaction files, dump files
Transaction#  Item# Description   Quantity
101                  1152    Milk                 01
101                  1167    Bread               02
102                  1167    Bread               01
102                  1175    Sugar               01
103                  1172    Rice                 01
103                  1152    Milk                 01
Heap File
ID    Company    Industry  Symbl.  Price   Earns.  Dividnd.
1767  Tony Lama  Apparel   TONY     45.00  1.50    0.25
1152  Lockheed   Aero      LCH     112.00  1.25    0.50
1175  Ford       Auto      F        88.00  1.70    0.20
1122  Exxon      Oil       XON      46.00  2.50    0.75
1231  Intel      Comp.     INTL     30.00  2.00    0.00
1323  GM         Auto      GM      158.00  2.10    0.30
1378  Texaco     Oil       TX      230.00  2.80    1.00
1245  Digital    Comp.     DEC     120.00  1.80    0.10
Tony Lama was the first record added,
Digital was the last.
Heap File Characteristics
• Insertion
– Fast: New records added at the end of the file
• Retrieval
– Slow: A sequential search is required
• Update / Delete
– Slow:
• Sequential search to find the page
• Make the update or mark for deletion
• Re-write the page
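The heap file behaviour can be sketched in a few lines. This is a minimal in-memory illustration, not a real storage engine: the class name `HeapFile` and its methods are hypothetical, and the records reuse the IDs and company names from the heap file table.

```python
class HeapFile:
    """Unordered file: records are kept in arrival order."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)          # fast: always added at the end

    def find(self, key):
        """Linear search; returns (record, number of records examined)."""
        for examined, record in enumerate(self.records, start=1):
            if record[0] == key:
                return record, examined
        return None, len(self.records)       # not found: whole file was read

heap = HeapFile()
for rec in [(1767, "Tony Lama"), (1152, "Lockheed"), (1175, "Ford"),
            (1122, "Exxon"), (1231, "Intel"), (1323, "GM"),
            (1378, "Texaco"), (1245, "Digital")]:
    heap.insert(rec)

record, examined = heap.find(1245)
print(record, examined)   # (1245, 'Digital') 8
```

Digital was the last record added, so retrieving it examines all eight records.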
Sequential
• Records are recorded in key sequence, but
have no index
• Used for master files in normal batch
processing
Item#  Description  Price
1152   Milk         35.00
1167   Bread        30.00
1172   Rice         60.00
1175   Sugar        50.00
Sequential (Ordered) File
ID    Company    Industry  Symbl.  Price   Earns.  Dividnd.
1122  Exxon      Oil       XON      46.00  2.50    0.75
1152  Lockheed   Aero      LCH     112.00  1.25    0.50
1175  Ford       Auto      F        88.00  1.70    0.20
1231  Intel      Comp.     INTL     30.00  2.00    0.00
1245  Digital    Comp.     DEC     120.00  1.80    0.10
1323  GM         Auto      GM      158.00  2.10    0.30
1378  Texaco     Oil       TX      230.00  2.80    1.00
1480  Conoco     Oil       CON     150.00  2.00    0.50
1767  Tony Lama  Apparel   TONY     45.00  1.50    0.25
Sequential Access
1122…other data
1152  …
1175…
1231…
Sequential File Characteristics
• Older media (cards, tapes)
• Records physically ordered by primary key
• Use when direct access to individual records
is not required
• Accessing records
– Sequential search until record is found
• Binary search can speed up access
– Must know file size and how to determine the mid-point.
Search Sugar
• Sequence of file based on Description
• 4 sequential accesses to reach Sugar
• 2 binary search accesses to reach Sugar
Description  Price  Sequential access  Binary search access
Bread        30.00  1
Milk         35.00  2
Rice         60.00  3                  1
Sugar        50.00  4                  2
Water        20.00
Inserting Records in SAM files
• Insertion
– Slow:
• Sequential search to find where the record goes
• If sufficient space in that page, then rewrite
• If insufficient space, move some records to next
page
• If no space there, keep bumping down until space
is found
– May use an “overflow” file to decrease time
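The "bumping down" step is easier to follow in code. The sketch below models pages as small Python lists with a fixed capacity; the function name `insert_sorted`, the page capacity of 2 and the sample keys are assumptions for illustration, and it expects at least one page to exist.

```python
import bisect

def insert_sorted(pages, key, capacity):
    """Insert key into a list of sorted pages, bumping overflow to later pages."""
    # First page whose largest key is >= key; otherwise use the last page.
    target = next((i for i, p in enumerate(pages) if p and p[-1] >= key),
                  len(pages) - 1)
    bisect.insort(pages[target], key)        # place the key in order in that page
    i = target
    while len(pages[i]) > capacity:          # the page has overflowed
        bumped = pages[i].pop()              # move its largest record down
        if i + 1 == len(pages):
            pages.append([])                 # extend the file with a new page
        pages[i + 1].insert(0, bumped)
        i += 1
    return pages

pages = [[1122, 1152], [1175, 1231], [1245]]
insert_sorted(pages, 1130, capacity=2)
print(pages)   # [[1122, 1130], [1152, 1175], [1231, 1245]]
```

One insertion here rewrites all three pages, which is why an overflow file is often used instead.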
Deletions and Updates to SAM
• Deletion
– Slow:
• Find the record
• Either mark for deletion or free up the space
• Rewrite
• Updates
– Slow:
• Find the record
• Make the change
• Rewrite
Binary Search to Find GM (1323)
Step  ID    Company    Industry  Symbl.  Price   Earns.  Dividnd.
      1122  Exxon      Oil       XON      46.00  2.50    0.75
      1152  Lockheed   Aero      LCH     112.00  1.25    0.50
      1175  Ford       Auto      F        88.00  1.70    0.20
      1231  Intel      Comp.     INTL     30.00  2.00    0.00
1.    1245  Digital    Comp.     DEC     120.00  1.80    0.10
3.    1323  GM         Auto      GM      158.00  2.10    0.30
2.    1378  Texaco     Oil       TX      230.00  2.80    1.00
      1480  Conoco     Oil       CON     150.00  2.00    0.50
      1767  Tony Lama  Apparel   TONY     45.00  1.50    0.25
• Takes 3 accesses as opposed to 6 for linear search.
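The access counts quoted for the two searches can be checked with a short sketch. The function names are illustrative; each function returns the number of records examined, and the key lists are the ones shown on the slides.

```python
def linear_search(keys, target):
    """Return how many records are read before target is found."""
    for accesses, key in enumerate(keys, start=1):
        if key == target:
            return accesses
    return None

def binary_search(keys, target):
    """Return how many records are read by a binary search on sorted keys."""
    lo, hi, accesses = 0, len(keys) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2                 # mid-point of the remaining range
        accesses += 1
        if keys[mid] == target:
            return accesses
        if keys[mid] < target:
            lo = mid + 1                     # target lies in the upper half
        else:
            hi = mid - 1                     # target lies in the lower half
    return None

ids = [1122, 1152, 1175, 1231, 1245, 1323, 1378, 1480, 1767]
print(linear_search(ids, 1323), binary_search(ids, 1323))          # 6 3

items = ["Bread", "Milk", "Rice", "Sugar", "Water"]
print(linear_search(items, "Sugar"), binary_search(items, "Sugar"))  # 4 2
```

The binary search probes 1245, then 1378, then 1323, matching the step numbers in the table.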
Indexed Sequential
• Disk (usually)
• Records physically ordered by primary key
• Index gives physical location of each record
• Records accessed sequentially or directly
via the index
• The index is stored in a file and read into
memory when the file is opened.
• Indexes must be maintained
Indexed Sequential Access
• Given a value for the key
– search the index for the record address
– issue a read instruction for that address
– Fast: Possibly just one disk access
Indexed Sequential Access: Fast
Find record with key 777-13-1212
Index:
Key          Cyl.  Trck  Sect.
279-66-7549  3     10    2
452-75-6301  3     10    3
789-12-3456  3     10    4
Keys in the file:
222-66-7634
255-75-5531
279-66-7549
333-88-9876
382-32-0658
452-75-6301
701-43-5634
777-13-1212
789-12-3456
777-13-1212 < 789-12-3456
Search Cyl. 3, Trck 10, Sect. 4
sequentially.
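The lookup for key 777-13-1212 can be sketched as a search of a small index followed by a scan of one sector. The index entries are those on the slide; the assignment of three keys to each sector is an assumption that is consistent with the slide but not stated on it, and `isam_lookup` is a hypothetical name.

```python
import bisect

# Index from the slide: highest key in a sector -> (cylinder, track, sector).
index_keys = ["279-66-7549", "452-75-6301", "789-12-3456"]
index_addr = [(3, 10, 2), (3, 10, 3), (3, 10, 4)]

# Assumed sector contents, three keys per sector in key order.
sectors = {
    (3, 10, 2): ["222-66-7634", "255-75-5531", "279-66-7549"],
    (3, 10, 3): ["333-88-9876", "382-32-0658", "452-75-6301"],
    (3, 10, 4): ["701-43-5634", "777-13-1212", "789-12-3456"],
}

def isam_lookup(key):
    """Return the address of the sector holding key, or None if absent."""
    pos = bisect.bisect_left(index_keys, key)    # first index entry >= key
    if pos == len(index_keys):
        return None                              # larger than every key on file
    addr = index_addr[pos]
    # Only this one sector is read and searched sequentially.
    return addr if key in sectors[addr] else None

print(isam_lookup("777-13-1212"))   # (3, 10, 4)
```

The keys are fixed-width strings, so plain string comparison orders them the same way as the slide does.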
Inserting into ISAM files
• Not very efficient
– Indexes must be updated
– Must locate where the record should go
– If there is space, insert the new record and
rewrite
– If no space, use an overflow area
– Periodically merge overflow records into file
Deletion and Updates for ISAM
• Fairly efficient
– Find the record
– Make the change or mark for deletion
– Rewrite
– Periodically remove records marked for
deletion
Use ISAM files when:
• Both sequential and direct access are needed.
• Say we have a retail application like Foley’s.
• Customer balances are updated daily.
• Usually sequential access is more efficient
for batch updates.
• But we may need direct access to answer
customer questions about balances.
Random
• A randomly organized file contains records
stored without regard to the sequence of
their control fields.
• Records are stored in some convenient
order, establishing a direct link between the
key of the record and the physical address
of that record.
Direct or Hashed Access
• A portion of disk space is reserved
• A “hashing” algorithm computes record
address
[Figure: the keys 455-72-3566 and 376-87-3425 are fed to the hashing algorithm, which yields an address for each; an overflow area is also shown.]
Hashed Access Characteristics
• No indexes to search or maintain
• Very fast direct access
• Inefficient sequential access
• Use when direct access is needed, but
sequential access is not
• Data cannot be sorted easily
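A hashed file with an overflow area can be sketched as below. Everything here is an assumption made for illustration: the class `HashedFile`, the hashing algorithm (the digits of the key modulo the number of buckets) and the single shared overflow list. The slides do not say which hashing algorithm is used.

```python
class HashedFile:
    def __init__(self, n_buckets, bucket_capacity):
        self.n_buckets = n_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(n_buckets)]   # reserved disk space
        self.overflow = []                              # shared overflow area

    def address(self, key):
        # Assumed hashing algorithm: numeric value of the key mod bucket count.
        return int(key.replace("-", "")) % self.n_buckets

    def insert(self, key, record):
        bucket = self.buckets[self.address(key)]
        # A full bucket sends the record to the overflow area instead.
        target = bucket if len(bucket) < self.capacity else self.overflow
        target.append((key, record))

    def find(self, key):
        # Look in the computed bucket first, then in the overflow area.
        for k, rec in self.buckets[self.address(key)] + self.overflow:
            if k == key:
                return rec
        return None

f = HashedFile(n_buckets=7, bucket_capacity=1)
f.insert("455-72-3566", "record A")
f.insert("376-87-3425", "record B")
print(f.find("376-87-3425"))   # record B
```

No index is searched: the address is computed from the key. Reading the records in key order, however, would need a separate sort, which is why sequential access is inefficient.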
Query Optimization
A query typically has many possible execution strategies,
and the process of choosing a suitable one for
processing a query is known as query optimisation.
The job of the heuristic query optimiser is to transform an
initial query tree into a final query tree that is efficient to
execute.
The optimiser must include rules for equivalence among
relational algebra expressions that can be applied to the
initial tree, guided by the heuristic query optimisation
rules, to produce the final optimised query tree.
Execution strategy – The DBMS must then devise
an execution strategy for retrieving the result of the
query from the internal database files.
Query code generator – According to the chosen
execution plan, the code generator generates the
code to execute the plan.
Runtime database processor – The runtime
database processor has the task of running the query
code (compiled or interpreted) to produce the query
result. If a runtime error occurs the runtime database
processor generates an error message.
Translating SQL Queries into Relational
Algebra
SELECT  p.pno, d.dno, e.ename
FROM    Project AS p, Department AS d, Employee AS e
WHERE   d.dno = p.dept AND d.mgr = e.empno AND
        p.location = 'Colombo';
T1 ← Project ⋈_{dno=dept} Department
T2 ← T1 ⋈_{mgr=empno} Employee
T3 ← σ_{location='Colombo'}(T2)
Result ← π_{pno, dno, ename}(T3)
JOIN Project and Department over dno=dept giving T1
JOIN T1 and Employee over mgr=empno giving T2
RESTRICT T2 where location='Colombo' giving T3
PROJECT T3 over pno, dno, ename giving Result
Queries in Relational Algebra
Equivalent Queries
(a) Result ← π_{pno, dno, ename}(σ_{location='Colombo'}(
(Project ⋈_{dno=dept} Department) ⋈_{mgr=empno} Employee))
(b) T1 ← σ_{location='Colombo'}(Project)
T2 ← T1 ⋈_{dno=dept} Department
T3 ← T2 ⋈_{mgr=empno} Employee
Result ← π_{pno, dno, ename}(T3)
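The equivalence of strategies (a) and (b) can be tried out on a few rows. The sketch below uses lists of dictionaries as relations; the sample rows and the helper names `join` and `project_cols` are made up for illustration, while the attribute names follow the query.

```python
def join(r, s, left, right):
    """Nested-loop equijoin of two lists of dicts on r[left] == s[right]."""
    return [{**a, **b} for a in r for b in s if a[left] == b[right]]

def project_cols(rows):
    """PROJECT over pno, dno, ename."""
    return [(t["pno"], t["dno"], t["ename"]) for t in rows]

# Hypothetical sample data.
project = [{"pno": 1, "dept": 10, "location": "Colombo"},
           {"pno": 2, "dept": 20, "location": "Kandy"},
           {"pno": 3, "dept": 10, "location": "Galle"}]
department = [{"dno": 10, "mgr": 100}, {"dno": 20, "mgr": 200}]
employee = [{"empno": 100, "ename": "Perera"}, {"empno": 200, "ename": "Silva"}]

# Strategy (a): join everything, then restrict on location.
a2 = join(join(project, department, "dept", "dno"), employee, "mgr", "empno")
result_a = project_cols([t for t in a2 if t["location"] == "Colombo"])

# Strategy (b): restrict Project first, then join.
b1 = [t for t in project if t["location"] == "Colombo"]
b3 = join(join(b1, department, "dept", "dno"), employee, "mgr", "empno")
result_b = project_cols(b3)

print(result_a == result_b, len(a2), len(b3))   # True 3 1
```

Both strategies return the same result, but (b) carries one tuple through the joins where (a) carries three.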
Basic algorithms for executing query
operations
• Each DBMS typically has a number of general
database access algorithms that implement
relational operations such as SELECT or
JOIN or combinations of these operations.
• The query optimisation module will consider
only execution strategies that can be
implemented by the DBMS access algorithms
(i.e. storage structures and access paths).
Sorting is one of the primary algorithms used in
query processing.
For example, whenever an SQL query
specifies an ORDER BY clause, the query
result must be sorted.
Sorting is also a key component in sort-merge
algorithms used for JOIN and other
operations (such as UNION and
INTERSECTION), and in duplicate
elimination algorithms for the PROJECT
operation (when an SQL query specifies the
DISTINCT option in the SELECT clause).
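A sort-merge join can be sketched as follows: sort both inputs on the join attribute, then advance through them together. This is a simplified in-memory version that assumes both relations fit in memory; the function name and the sample rows are illustrative, and a DBMS would use an external sort for large files.

```python
def sort_merge_join(r, s, key_r, key_s):
    """Equijoin by sorting both inputs on the join key and merging them."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1                           # r is behind: advance r
        elif a > b:
            j += 1                           # s is behind: advance s
        else:
            # Pair r[i] with the whole run of equal keys in s.
            k = j
            while k < len(s) and s[k][key_s] == a:
                out.append({**r[i], **s[k]})
                k += 1
            i += 1                           # j stays, so duplicates in r re-scan the run
    return out

emps = [{"ename": "A", "dno": 5}, {"ename": "B", "dno": 4}, {"ename": "C", "dno": 5}]
depts = [{"dnum": 5, "dname": "Research"}, {"dnum": 4, "dname": "Admin"}]
for row in sort_merge_join(emps, depts, "dno", "dnum"):
    print(row["ename"], row["dname"])
```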
There are many options for executing a
SELECT operation.
A number of searching algorithms are
possible for selecting records from a file.
Examples include linear search (brute force),
binary search, using a primary index or hash key,
using a clustering index, and using a secondary
index (B-tree) on an equality comparison.
Selecting Records, e.g.
• Primary index (records ordered on a key field)
T1 ← σ_{empno='12345'}(Employee) (single record)
T1 ← σ_{dno>='5'}(Department) (multiple records)
• Clustering index (records ordered on a non-key field)
T1 ← σ_{dno='5'}(Employee) (multiple records)
• B+-tree index (secondary index on equality comparison)
T1 ← σ_{salary>=30000 AND salary<=35000}(Employee) (multiple records)
Using Heuristics in Query Optimisation
• There are two main techniques for query
optimisation: heuristic rules and systematic
estimating.
• Heuristic rules are used to  order the operations in a query
execution strategy. The rules typically reorder the operations
in a query tree or determine an order for executing the
operations specified by a query graph.
• Systematic estimating is used  to cost the different execution
strategies and to choose the execution plan with the lowest
cost estimate.
• Both strategies are usually combined in a query optimiser.
Transformation rules for relational algebra
operations
• There are many rules for transforming relational
algebra operations into equivalent ones.
• These are in addition to those discussed under
relational algebra.
• These rules are used in heuristic optimisation.
• Algorithms that utilise these rules are used to
transform an initial query tree into an optimised
tree that is more efficient to execute.
• Here we look at some examples that demonstrate
such transformations.
Rule 1 (cascade of σ)
Break up any SELECT operations (σ) with
conjunctive conditions (AND) into a
cascade (sequence) of individual SELECT
operations.
σ_{c1 AND c2 AND … AND cn}(R) ≡ σ_{c1}(σ_{c2}(…(σ_{cn}(R))…))
This permits a greater degree of freedom in
moving SELECT operations down different
branches of the tree.
Rule 2 (commutativity of σ)
The SELECT operation is commutative.
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
Rule 3 (commuting σ with π)
If the SELECT condition c involves only attributes
a1, a2, …, an in the PROJECTION list, the two
operations can be commuted.
π_{a1, a2, …, an}(σ_c(R)) ≡ σ_c(π_{a1, a2, …, an}(R))
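Rules 1 and 2 can be checked on a small relation. In this sketch a relation is a list of dictionaries and `select` is a hypothetical helper standing in for σ; the rows and the two conditions are made up.

```python
def select(pred, rows):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rows if pred(t)]

rows = [{"dno": 5, "salary": 30000},
        {"dno": 5, "salary": 20000},
        {"dno": 4, "salary": 40000}]
c1 = lambda t: t["dno"] == 5
c2 = lambda t: t["salary"] >= 25000

conjunctive = select(lambda t: c1(t) and c2(t), rows)   # σ_{c1 AND c2}(R)
cascade_12 = select(c1, select(c2, rows))               # Rule 1: σ_{c1}(σ_{c2}(R))
cascade_21 = select(c2, select(c1, rows))               # Rule 2: order swapped

print(conjunctive == cascade_12 == cascade_21)          # True
```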
Rule 4 (commuting σ with × or ⋈)
If the selection condition c involves only the
attributes of one of the relations being
joined (say R), the two operations can be
commuted as
σ_c(R ⋈ S) ≡ (σ_c(R)) ⋈ S
Alternatively, if the selection condition c can be written as c1
AND c2, where c1 involves only the attributes of R and c2 involves
only the attributes of S, the operations commute as
σ_c(R ⋈ S) ≡ (σ_{c1}(R)) ⋈ (σ_{c2}(S))
The same rules apply if the ⋈ is replaced by a × operation.
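Rule 4 is the one that lets a selection be pushed below a join. A minimal sketch, assuming tiny made-up relations R and S and hypothetical helpers `join` and `select`; the birth dates are reduced to years to keep the example short.

```python
def join(r, s, cond):
    """Nested-loop join of two lists of dicts under a join condition."""
    return [{**a, **b} for a in r for b in s if cond(a, b)]

def select(pred, rows):
    """σ: keep the tuples that satisfy pred."""
    return [t for t in rows if pred(t)]

R = [{"ssn": 1, "bdate": 1960}, {"ssn": 2, "bdate": 1950}, {"ssn": 3, "bdate": 1970}]
S = [{"essn": 1, "pno": 10}, {"essn": 2, "pno": 10}, {"essn": 3, "pno": 20}]
on = lambda a, b: a["ssn"] == b["essn"]
c = lambda t: t["bdate"] > 1957              # involves attributes of R only

late = select(c, join(R, S, on))             # σ_c(R ⋈ S)
early = join(select(c, R), S, on)            # (σ_c(R)) ⋈ S

print(late == early, len(select(c, R)))      # True 2
```

The pushed-down version joins two tuples of R instead of three, and the saving grows with the size of the relations.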
Rule 5 (commuting σ with set operations)
The σ operation commutes with ∪, ∩ and −. If θ
stands for any one of these 3 operations then
σ_c(R θ S) ≡ (σ_c(R)) θ (σ_c(S))
Using rules 2, 3, 4, and 5 concerning the
commutativity of SELECT with other operations,
move each SELECT operation as far down the
query tree as is permitted by the attributes
involved in the select condition.
The objective is to reduce the number of tuples that
appear in the Cartesian product.
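Rule 5 can be checked directly with Python sets, which already provide ∪, ∩ and −. The relations and the condition below are made up for illustration.

```python
def select(pred, rel):
    """σ over a relation held as a set of tuples."""
    return {t for t in rel if pred(t)}

R = {(1, "Colombo"), (2, "Kandy"), (3, "Colombo")}
S = {(3, "Colombo"), (4, "Galle"), (5, "Colombo")}
c = lambda t: t[1] == "Colombo"

# θ ranges over union, intersection and difference.
for theta in (set.union, set.intersection, set.difference):
    assert select(c, theta(R, S)) == theta(select(c, R), select(c, S))

print(sorted(select(c, R | S)))   # [(1, 'Colombo'), (3, 'Colombo'), (5, 'Colombo')]
```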
SELECT lname
FROM   Employee, WorksOn, Project
WHERE  pname = 'Aquarius' AND pnumber = pno AND essn = ssn
       AND bdate > '31/12/1957';
(a) Initial query tree
π_{lname}   (result: 2 tuples)
  σ_{pname='Aquarius' AND pnumber=pno AND essn=ssn AND bdate>'31/12/1957'}
    ×   (250,000,000 tuples)
      ×   (2,500,000 tuples)
        Employee   (5,000 tuples)
        WorksOn   (500 tuples)
      Project   (100 tuples)
Depending on the capacity of the memory, intermediate
records may have to be written to disk, which increases
the total number of read/write operations.
(b) Moving σ operations down the query tree
π_{lname}   (result: 2 tuples)
  σ_{pnumber=pno}
    ×   (100 tuples)
      σ_{essn=ssn}   (100 tuples)
        ×   (500,000 tuples)
          σ_{bdate>'31/12/1957'}   (1,000 tuples)
            Employee   (5,000 tuples)
          WorksOn   (500 tuples)
      σ_{pname='Aquarius'}   (1 tuple)
        Project   (100 tuples)
(c) Applying the more restrictive σ operation first
π_{lname}   (result: 2 tuples)
  σ_{essn=ssn}
    ×   (5,000 tuples)
      σ_{pnumber=pno}   (5 tuples)
        ×   (500 tuples)
          σ_{pname='Aquarius'}   (1 tuple)
            Project   (100 tuples)
          WorksOn   (500 tuples)
      σ_{bdate>'31/12/1957'}   (1,000 tuples)
        Employee   (5,000 tuples)
Use the information that pname is a unique attribute of the Project relation.
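The benefit of trees (b) and (c) over tree (a) shows up in the size of the largest intermediate result. The short calculation below uses only the cardinalities annotated on the three trees; the variable names are illustrative.

```python
# Cardinalities read from the annotated trees (a) to (c).
employee, works_on, project = 5_000, 500, 100
born_after_1957 = 1_000      # tuples left by σ bdate>'31/12/1957' on Employee
aquarius = 1                 # pname is unique, so σ pname='Aquarius' leaves 1 tuple

largest_intermediate = {
    # (a): both Cartesian products are taken before any selection.
    "a": max(employee * works_on, employee * works_on * project),
    # (b): 100 tuples survive σ essn=ssn before the product with Project.
    "b": max(born_after_1957 * works_on, 100 * aquarius),
    # (c): 5 tuples survive σ pnumber=pno before the product with Employee.
    "c": max(aquarius * works_on, 5 * born_after_1957),
}
print(largest_intermediate)
# {'a': 250000000, 'b': 500000, 'c': 5000}
```

Starting with the most restrictive selection cuts the largest intermediate result from 250,000,000 tuples to 5,000.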
(d) Replacing Cartesian product and Select with Join operations
π_{lname}   (result: 2 tuples)
  ⋈_{essn=ssn}
    ⋈_{pnumber=pno}   (5 tuples)
      σ_{pname='Aquarius'}   (1 tuple)
        Project   (100 tuples)
      WorksOn   (500 tuples)
    σ_{bdate>'31/12/1957'}   (1,000 tuples)
      Employee   (5,000 tuples)
(e) Moving Project operations (π) down the query tree
π_{lname}
  ⋈_{essn=ssn}
    π_{essn}
      ⋈_{pnumber=pno}
        π_{pnumber}
          σ_{pname='Aquarius'}
            Project
        π_{essn, pno}
          WorksOn
    π_{ssn, lname}
      σ_{bdate>'31/12/1957'}
        Employee
(The slide marks part of this tree as a sub query.)
Using Selectivity and Cost Estimates in
Query Optimisation
A query optimiser should not depend solely on
heuristic rules; it should also estimate and
compare the costs of executing a query using
different execution strategies and choose
the strategy with the lowest cost estimate.
• Cost components
• Cost functions
• Examples
Cost Components for Query Execution
• Access cost to secondary storage
– Cost of searching for, reading and writing data blocks that reside on secondary storage.
– Cost of searching depends on access file structures such as ordering, hashing, indexes. Also based on how data are allocated on disk.
• Storage cost
– Cost of storing any intermediate files that are generated by an execution strategy for the query
Cost Components …
• Computation cost
– Cost of performing in-memory operations on the data buffers during query execution. E.g. searching, sorting, merging for join, performing computations on field values.
• Memory usage cost
– Cost pertaining to the number of memory buffers needed during query execution.
• Communication cost
– Cost of sending the query and its results from database server to client.
Catalog Information used in Cost Functions
• n_r – number of tuples in relation r
• s_r – size of a tuple in relation r
• b_r – number of blocks containing tuples of r
• f_r – blocking factor – number of tuples of relation r that fit
into one block
• x_a – number of levels of each multilevel index on attribute
a
• d_{a,r} – number of distinct values in relation r for attribute a.
d_{a,r} = n_r = |π_a(r)| if a is unique
• s_{a,r} – selection cardinality of attribute a of relation r –
average number of records satisfying an equality condition. If
a is unique, d_{a,r} = n_r and s_{a,r} = 1; else (if uniformly distributed),
s_{a,r} = n_r/d_{a,r}
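The derived catalog values follow from the basic ones by simple arithmetic. In the sketch below the function name, the block size and the sample relation are assumptions; it also assumes that records do not span blocks, so the blocking factor is rounded down.

```python
import math

def catalog_stats(n_r, s_r, block_bytes, d_a):
    """Derive f_r, b_r and s_{a,r} from basic catalog values."""
    f_r = block_bytes // s_r          # blocking factor: whole tuples per block
    b_r = math.ceil(n_r / f_r)        # blocks needed to hold all n_r tuples
    s_a = n_r / d_a                   # selection cardinality, uniform distribution
    return f_r, b_r, s_a

# Hypothetical relation: 10,000 tuples of 100 bytes, 4,096-byte blocks,
# and 50 distinct values of attribute a.
print(catalog_stats(10_000, 100, 4096, 50))   # (40, 250, 200.0)
```

For a unique attribute d_a equals n_r, so the same formula gives a selection cardinality of 1.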
Cost Functions for SELECT
• Linear Search
– Retrieve all file blocks; cost = b_r
– For an equality condition, average cost = b_r/2 if found, else cost = b_r
• Binary Search
– Search access cost = log2(b_r) + (s_{a,r}/f_r) − 1
– If unique attribute, average cost = log2(b_r)
• Using a primary / secondary key index
– Cost = x_a + 1
• Using a hash key
– Cost ≈ 1
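The cost functions for SELECT can be compared numerically. The sketch below expresses each cost in block accesses; the function names and the sample figures are assumptions, and rounding s_{a,r}/f_r up to whole blocks is an added assumption that makes the unique-attribute case reduce to log2(b_r).

```python
import math

def linear_cost(b_r, equality_found=False):
    """All b_r blocks, or b_r/2 on average for an equality condition that is found."""
    return b_r / 2 if equality_found else b_r

def binary_cost(b_r, s_a=1, f_r=1):
    """log2(b_r) to reach the first match, plus the extra blocks of matches."""
    return math.log2(b_r) + math.ceil(s_a / f_r) - 1

def index_cost(x_a):
    """One access per index level plus one for the data block."""
    return x_a + 1

# Hypothetical file of 256 blocks with a 2-level index.
print(linear_cost(256), linear_cost(256, True), binary_cost(256), index_cost(2))
# 256 128.0 8.0 3
```

With these assumed figures the index needs 3 block accesses where a linear search needs 128 on average, which is the kind of comparison a cost-based optimiser makes.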
Data Storage and Querying
Dr G.N.Wikramanayake
Dr Jeevani Goonetillake
University of Colombo School of Computing
[Figure: schema architecture. Users and programs (user a, user i/program j, program x) work through sub-schemas (a, i, z), which map to the conceptual schema and then to the physical schema; the DBMS manages the files (File 1, File 2, File 3) that make up the databases.]
Physical View
• The DBMS must know
– exact physical location
– precise physical structure
Employee record in the database:
Name (20 characters) | Address (40 characters)  | NID (10 char) | Designation (15 char)
A.B.C. De Silva      | 222, Galle Road, Colombo | 650370690V    | Senior Lecturer
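The fixed-length employee record can be modelled with Python's `struct` module. The field widths and the sample values come from the slide; the function names and the use of NUL padding for short fields are assumptions made for this sketch.

```python
import struct

# Field widths from the slide: Name 20, Address 40, NID 10, Designation 15.
RECORD = struct.Struct("20s40s10s15s")

def pack_employee(name, address, nid, designation):
    """Encode one employee as a fixed-length record."""
    fields = (name, address, nid, designation)
    # struct pads each short byte string with NUL bytes up to its fixed width.
    return RECORD.pack(*(f.encode("ascii") for f in fields))

def unpack_employee(raw):
    """Decode a fixed-length record, stripping the padding."""
    return tuple(f.rstrip(b"\x00").decode("ascii") for f in RECORD.unpack(raw))

raw = pack_employee("A.B.C. De Silva", "222, Galle Road, Colombo",
                    "650370690V", "Senior Lecturer")
print(RECORD.size)             # 85
print(unpack_employee(raw)[2]) # 650370690V
```

Because every record occupies exactly 85 bytes, record i of a file starts at byte offset i × 85, which is the kind of precise physical structure the DBMS has to know.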
File
• A collection of data records with similar
characteristics
• Student list, course list at the university
• price list, stock records at a supermarket
Item# Description   Shelf#  Max Stock Reorder Balance
1152    Milk             D01      100            25            70
1167    Bread           D01        70            25            30
1175    Sugar           G02       300           50           120
1172    Rice             G04        50            20            35
Physical Design
• Provide good
performance
– Fast response time
– Minimum disk
accesses
Disk Assembly
Access time = seek time + rotational delay + transfer time
[Figure: disk assembly showing the central axle, head assembly, read/write heads, protective surfaces, data surfaces, and a cylinder, track and sector (disk block).]
Disks
• 6 disks (platters); 12 surfaces; 2 outer protected
surfaces; 10 inner data surfaces (coated with a
magnetic substance to record data )
• Each surface 200-400 concentric tracks
• Read/write heads placed over specific track; with
one active at a time
• Set of corresponding tracks is a cylinder, i.e.
track i of all 10 surfaces
© 2008, University of Colombo School of Computing
Disks
© 2008, University of Colombo School of Computing © 2008, University of Colombo School of Computing
Data Block
• A data block (sector) is the smallest unit of
data defined within the database.
• block size may be defined by the
DB_BLOCK_SIZE
© 2008, University of Colombo School of Computing
Definitions
• Seek time
– Average time to move the read-write head to
the correct cylinder
• Rotational delay
– Average time for the sector to move under the
read-write head
• Transfer time
– Time to read a sector and transfer the data to
memory
© 2008, University of Colombo School of Computing
More Definitions
• Logical Record
– The data about an entity (a row in a table)
• Physical Record
– A sector, page or block on the storage
medium
• Typically several logical records can be
stored in one physical record.
3
© 2008, University of Colombo School of Computing
File Organization Techniques
• Three techniques
– Serial or Heap (unordered)
–Sorted
• Sequential (SAM)
• Indexed Sequential (ISAM)
– Hashed or Direct or Random
© 2008, University of Colombo School of Computing
Serial
• Used for temporary files such as
transaction files, dump files
Transaction#  Item# Description   Quantity
101                  1152    Milk                 01
101                  1167    Bread               02
102                  1167    Bread               01
102                  1175    Sugar               01
103                  1172    Rice                 01
103                  1152    Milk                 01
© 2008, University of Colombo School of Computing
Heap File
ID Company   Industry   Symbl.   Price  Earns.  Dividnd.
1767    Tony Lama Apparel   TONY   45.00   1.50       0.25
1152    Lockheed    Aero        LCH   112.00   1.25       0.50
1175    Ford            Auto        F           88.00   1.70     0.20
1122    Exxon         Oil           XON    46.00   2.50       0.75
1231    Intel            Comp.      INTL   30.00    2.00       0.00
1323    GM             Auto        GM      158.00   2.10      0.30
1378    Texaco        Oil           TX       230.00   2.80      1.00
1245    Digital        Comp.      DEC    120.00   1.80      0.10
Tony Lama was the first record added,
Digital was the last.
© 2008, University of Colombo School of Computing
Heap File Characteristics
• Insertion
– Fast: New records added at the end of the file
• Retrieval
– Slow: A sequential search is required
• Update – Delete
–Slow:
• Sequential search to find the page
• Make the update or mark for deletion
• Re-write the page
© 2008, University of Colombo School of Computing
Sequential
• Records are recoded in key sequence, but
have no index
• Used for master files in a normal batch
processing
Item# Description       Price
1152    Milk                 35.00
1167    Bread               30.00
1172    Rice                 60.00
1175    Sugar               50.00
© 2008, University of Colombo School of Computing
Sequential (Ordered) File
ID Company   Industry  Symbl.   Price  Earns.  Dividnd.
1122    Exxon         Oil           XON    46.00   2.50       0.75
1152    Lockheed    Aero        LCH   112.00   1.25       0.50
1175    Ford            Auto        F           88.00   1.70     0.20
1231    Intel            Comp.      INTL   30.00    2.00       0.00
1245    Digital        Comp.      DEC    120.00   1.80      0.10
1323    GM             Auto        GM      158.00   2.10      0.30
1378    Texaco        Oil           TX       230.00   2.80      1.00
1480    Conoco Oil           CON    150.00   2.00      0.50
1767    Tony Lama Apparel   TONY   45.00   1.50       0.25
4
© 2008, University of Colombo School of Computing
Sequential Access
1122…other data
1152  …
1175…
1231…
© 2008, University of Colombo School of Computing
Sequential File Characteristics
• Older media (cards, tapes)
• Records physically ordered by primary key
• Use when direct access to individual records
is not required
• Accessing records
– Sequential search until record is found
• Binary search can speed up access
– Must know file size and how to determine mid-point,
© 2008, University of Colombo School of Computing
Search Sugar
• Sequence of file based on Description
• 4 sequential accesses to reach sugar
• 2 binary search accesses to reach
sugar Description       Price
Bread               30.00
Milk                 35.00
Rice                 60.00
Sugar               50.00
Water               20.00
1
2
3
4
1
2
© 2008, University of Colombo School of Computing
© 2008, University of Colombo School of Computing
Inserting Records in SAM files
• Insertion
–Slow:
• Sequential search to find where the record goes
• If sufficient space in that page, then rewrite
• If insufficient space, move some records to next
page
• If no space there, keep bumping down until space
is found
– May use an “overflow” file to decrease time
© 2008, University of Colombo School of Computing
Deletions and Updates to SAM
•Deletion
–Slow:
• Find the record
• Either mark for deletion or free up the space
•Rewrite
• Updates
–Slow:
• Find the record
• Make the change
•Rewrite
5
© 2008, University of Colombo School of Computing
Binary Search to Find GM (1323)
ID Company   Industry  Symbl.   Price  Earns.  Dividnd.
1122    Exxon         Oil           XON    46.00   2.50       0.75
1152    Lockheed    Aero        LCH   112.00   1.25       0.50
1175    Ford            Auto        F           88.00   1.70     0.20
1231    Intel            Comp.      INTL   30.00    2.00       0.00
1. 1245    Digital        Comp.      DEC    120.00   1.80      0.10
3. 1323    GM             Auto        GM      158.00   2.10      0.30
2. 1378    Texaco        Oil           TX       230.00   2.80      1.00
1480    Conoco Oil           CON    150.00   2.00      0.50
1767    Tony Lama Apparel   TONY   45.00   1.50       0.25
• Takes 3 accesses as opposed to 6 for linear search.
© 2008, University of Colombo School of Computing
Indexed Sequential
• Disk (usually)
• Records physically ordered by primary key
• Index gives physical location of each record
• Records accessed sequentially or directly
via the index
• The index is stored in a file and read into
memory when the file is opened.
• Indexes must be maintained
© 2008, University of Colombo School of Computing
Indexed Sequential Access
• Given a value for the key
– search the index for the record address
– issue a read instruction for that address
– Fast: Possibly just one disk access
© 2008, University of Colombo School of Computing
Indexed Sequential Access: Fast
Key                  Cyl. Trck Sect.
279-66-7549       3    10      2
452-75-6301       3    10      3
789-12-3456       3    10      4
777-13-1212 < 789-12-3456
Search Cyl. 3, Trck 10, Sect. 4
sequentially.
222-66-7634
255-75-5531
279-66-7549
333-88-9876
382-32-0658
452-75-6301
701-43-5634
777-13-1212
789-12-3456
Find record with key 777-13-1212
© 2008, University of Colombo School of Computing
Inserting into ISAM files
• Not very efficient
– Indexes must be updated
– Must locate where the record should go
– If there is space, insert the new record and
rewrite
– If no space, use an overflow area
– Periodically merge overflow records into file
© 2008, University of Colombo School of Computing
Deletion and Updates for ISA
• Fairly efficient
– Find the record
– Make the change or mark for deletion
–Rewrite
– Periodically remove records marked for
deletion
6
© 2008, University of Colombo School of Computing
Use ISAM files when:
• Both sequential and direct access is needed.
• Say we have a retail application like Foley’s.
• Customer balances are updated daily.
• Usually sequential access is more efficient
for batch updates.
• But we may need direct access to answer
customer questions about balances.
© 2008, University of Colombo School of Computing
Random
• Randomly organized file contains records
stored without regard to the sequence of
their control fields.
• Records are stored in some convenient
order establishing a direct link between the
key of the record and the physical address
of that record
© 2008, University of Colombo School of Computing
Direct or Hashed Access
• A portion of disk space is reserved
• A “hashing” algorithm computes record
address
Hashing
Algorithm
455-72-3566
Address
Overflow
376-87-3425
Address
© 2008, University of Colombo School of Computing
Hashed Access Characteristics
• No indexes to search or maintain
• Very fast direct access
• Inefficient sequential access
• Use when direct access is needed, but
sequential access is not
• Data cannot be sorted easily
© 2008, University of Colombo School of Computing
A query typically has many possible execution strategies
and the process of choosing a suitable one for
processing a query is known as query optimisation.
The job of the heuristic query optimiser is to transform this
initial query tree into a final query tree that is efficient to
execute.
The optimiser must include rules for equivalence among
relational algebra expressions that can be applied to the
initial tree, guided by the heuristic query optimisation
rules to produce the final optimised query tree.
Query Optimization
© 2008, University of Colombo School of Computing
Execution strategy – The DBMS must then devise
an execution strategy for retrieving the result of the
query from the internal database files.
Query code generator – According to the chosen
execution plan, the code generator generates the
code to execute the plan.
Runtime database processor – The runtime
database processor has the task of running the query
code (compiled or interpreted) to produce the query
result. If a runtime error occurs the runtime database
processor generates an error message.
7
© 2008, University of Colombo School of Computing
Translating SQL Queries into Relational
Algebra
SELECT  p.pno, d.dno, e.ename
FROM     Project as p, Department as d, Employee as e
WHERE  d.dno=p.dept and d.mgr=e.empno and
p.location=‘Colombo’;
T1 ÅProject ∞dno=dept Department
T2 ÅT1 ∞mgr=empno Employee
T3 Åσlocation=‘Colombo’(T2)
Result Åπpno, dno, ename(T3)
JOIN Project and Department over dno=dept giving T1
JOIN T1 and Employee over mgr=empno giving T2
RESTRICT T2 where location=‘Colombo’ giving T3
PROJECT T3 over pno, dno, ename giving Result
© 2008, University of Colombo School of Computing
Queries in Relational Algebra
Equivalent Queries
(a) Result Åπpno, dno, ename (σlocation=‘Colombo’ (
(Project ∞dno=dept Department) ∞mgr=empno
Employee) )
(b) T1 Åσlocation=‘Colombo’(Project)
T2 ÅT1 ∞dno=dept Department
T3 ÅT2 ∞mgr=empno Employee
Result Åπpno, dno, ename(T3)
© 2008, University of Colombo School of Computing
Basic algorithms for executing query
operations
• Each DBMS typically has a number of general
database access algorithms that implement
relational operations such as SELECT or
JOIN or combinations of these operations.
• The query optimisation module will consider
only execution strategies that can be
implemented by the DBMS access algorithms
(i.e. storage structures and access paths).
© 2008, University of Colombo School of Computing
Sorting is one of the primary algorithms used in
query processing.
For example, whenever an SQL query
specifies an ORDER BY clause, the query
result must be sorted.
Sorting is also a key component in sort-merge
algorithms used for JOIN and other
operations (such as UNION and
INTERSECTION), and in duplicate
elimination algorithms for the PROJECT
operation (when an SQL query specifies the
DISTINCT option in the SELECT clause).
© 2008, University of Colombo School of Computing
There are many options for executing a
SELECT operation.
A number of searching algorithms are
possible for selecting records from a file.
Linear search (brute force), binary search,
using a primary index or hash key,
clustering index and using secondary
index (B-tree) on an equality comparison
are examples of searching.
© 2008, University of Colombo School of Computing
Selecting Records, e.g.
• Primary index (records ordered on a key field)
T1 Åσempno=‘12345’(Employee) (single record)
T1 Åσdno>=‘5’(Department) (multiple records)
• Clustering Index (records ordered on a non key field)
T1 Åσdno=‘5’(Employee) (multiple records)
•B+-Tree Index (secondary index on equality comparison)
T1 Åσsalary>=30000 and salary<=35000(Employee) (multiple records)
8
© 2008, University of Colombo School of Computing
Using Heuristics in Query Optimisation
• There are two main techniques for query
optimisation: heuristic rules and systematic
estimating.
• Heuristic rules are used to order the operations in a query
execution strategy. The rules typically reorder the operations
in a query tree or determine an order for executing the
operations specified by a query graph.
• Systematic estimating is used to cost the different execution
strategies and to choose the execution plan with the lowest
cost estimate.
• Both strategies are usually combined in a query optimiser.
Transformation rules for relational algebra
operations
• There are many rules for transforming relational
algebra operations into equivalent ones.
• These are in addition to those discussed under
relational algebra.
• These rules are used in heuristic optimisation.
• Algorithms that utilise these rules are used to
transform an initial query tree into an optimised
tree that is more efficient to execute.
• Here we look at some examples that demonstrate
such transformations.
Rule 1 (cascade of σ)
Break up any SELECT operations (σ) with
conjunctive conditions (AND) into a
cascade (sequence) of individual SELECT
operations.
σ_{c1 AND c2 AND … AND cn}(R) ≡ σ_{c1}(σ_{c2}(…(σ_{cn}(R))…))
This permits a greater degree of freedom in
moving SELECT operations down different
branches of the tree.
Rule 2 (commutativity of σ)
The SELECT operation is commutative:
σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R))
Rule 3 (commuting σ with π)
If the SELECT condition c involves only attributes
a1, a2, …, an in the PROJECTION list, the two
operations can be commuted:
π_{a1, a2, …, an}(σ_c(R)) ≡ σ_c(π_{a1, a2, …, an}(R))
Rule 4 (commuting σ with X or ⋈)
If all the attributes in the selection condition c belong
to only one of the relations being joined (say R),
the two operations can be commuted as
σ_c(R ⋈ S) ≡ (σ_c(R)) ⋈ S
Alternatively, if the selection condition c can be written as c1
AND c2, where c1 involves only the attributes of R and c2 involves
only the attributes of S, the operations commute as
σ_c(R ⋈ S) ≡ (σ_{c1}(R)) ⋈ (σ_{c2}(S))
The same rules apply if the ⋈ is replaced by a X operation.
Rule 5 (commuting σ with set operations)
The σ operation commutes with ∪, ∩ and −. If θ
stands for any one of these three operations, then
σ_c(R θ S) ≡ (σ_c(R)) θ (σ_c(S))
Using rules 2, 3, 4 and 5 concerning the
commutativity of SELECT with other operations,
move each SELECT operation as far down the
query tree as is permitted by the attributes
involved in the select condition.
The objective is to reduce the number of tuples that
appear in the Cartesian product.
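The effect of Rule 4 can be sketched with small in-memory relations. The helper names (`select`, `cartesian`) and the sample tuples below are assumptions made for illustration only; the point is that both orders give the same result, but pushing the selection down shrinks the product that has to be formed.

```python
from itertools import product


def select(pred, rel):
    """σ: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]


def cartesian(r, s):
    """X: merge every pair of tuples (dicts) into one combined tuple."""
    return [{**a, **b} for a, b in product(r, s)]


# Hypothetical relations R (Employee-like) and S (WorksOn-like).
R = [{"ssn": 1, "byear": 1955}, {"ssn": 2, "byear": 1960}, {"ssn": 3, "byear": 1962}]
S = [{"essn": 1, "pno": 10}, {"essn": 2, "pno": 10}, {"essn": 3, "pno": 20}]


def c(t):
    # Condition that involves attributes of R only.
    return t["byear"] > 1957


full_product = cartesian(R, S)              # R X S: 3 * 3 = 9 tuples
late = select(c, full_product)              # σ_c(R X S)
early = cartesian(select(c, R), S)          # (σ_c(R)) X S: 2 * 3 = 6 tuples formed
```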
SELECT lname
FROM   Employee, WorksOn, Project
WHERE  pname='Aquarius' AND pnumber=pno AND essn=ssn
       AND bdate>'31/12/1957';
(a) Initial query tree (number of tuples in brackets)
π lname  [2]
  σ pname='Aquarius' AND pnumber=pno AND essn=ssn AND bdate>'31/12/1957'
    X  [250,000,000]
      X  [2,500,000]
        Employee  [5,000]
        WorksOn  [500]
      Project  [100]
Depending on the capacity of the memory, intermediate
records may have to be written to disk, which increases
the total number of read/write operations.
(b) Moving the σ operations down the query tree (number of tuples in brackets)
π lname  [2]
  σ pnumber=pno  [2]
    X  [100]
      σ essn=ssn  [100]
        X  [500,000]
          σ bdate>'31/12/1957'  [1,000]
            Employee  [5,000]
          WorksOn  [500]
      σ pname='Aquarius'  [1]
        Project  [100]
(c) Applying the more restrictive σ operation first (number of tuples in brackets)
π lname  [2]
  σ essn=ssn  [2]
    X  [5,000]
      σ pnumber=pno  [5]
        X  [500]
          σ pname='Aquarius'  [1]
            Project  [100]
          WorksOn  [500]
      σ bdate>'31/12/1957'  [1,000]
        Employee  [5,000]
Use the information that pname is a unique attribute of the Project relation.
(d) Replacing Cartesian product and Select with Join operations (number of tuples in brackets)
π lname  [2]
  ⋈ essn=ssn  [2]
    ⋈ pnumber=pno  [5]
      σ pname='Aquarius'  [1]
        Project  [100]
      WorksOn  [500]
    σ bdate>'31/12/1957'  [1,000]
      Employee  [5,000]
(e) Moving Project operations (π) down the query tree
π lname
  ⋈ essn=ssn
    π essn  (sub query)
      ⋈ pnumber=pno
        π pnumber
          σ pname='Aquarius'
            Project
        π essn, pno
          WorksOn
    π ssn, lname
      σ bdate>'31/12/1957'
        Employee
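To see why the reordering matters, the sketch below compares the largest intermediate result formed in trees (a) to (d). It reads the numbers annotated on the trees as tuple counts (an interpretation of the figure annotations: Employee 5,000; WorksOn 500; Project 100); the dictionary layout is an assumption for illustration.

```python
# Sizes of the two product/join results formed in each tree,
# read off the tuple counts annotated on the slides.
products = {
    "a": [5_000 * 500, 5_000 * 500 * 100],  # Employee X WorksOn, then X Project
    "b": [1_000 * 500, 100 * 1],            # selections pushed below the products
    "c": [1 * 500, 5 * 1_000],              # most restrictive selection applied first
    "d": [5, 2],                            # product + selection pairs replaced by joins
}

# The largest intermediate result is what may have to be written to disk.
largest = {tree: max(sizes) for tree, sizes in products.items()}
```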
Using Selectivity and Cost Estimates in
Query Optimisation
A query optimiser should not depend solely on
heuristic rules; it should also estimate and
compare the costs of executing a query using
different execution strategies and choose
the strategy with the lowest cost estimate.
• Cost components
• Cost functions
• Examples
Cost Components for Query Execution
• Access cost to secondary storage
– Cost of searching for, reading and writing data blocks that reside on secondary storage.
– Cost of searching depends on access file structures such as ordering, hashing, indexes. Also based on how data are allocated on disk.
• Storage cost
– Cost of storing any intermediate files that are generated by an execution strategy for the query
Cost Components …
• Computation cost
– Cost of performing in-memory operations on the data buffers during query execution. E.g. searching, sorting, merging for join, performing computations on field values.
• Memory usage cost
– Cost pertaining to the number of memory buffers needed during query execution.
• Communication cost
– Cost of sending the query and its results from database server to client.
Catalog Information used in Cost Functions
• n_r – number of tuples in relation r
• s_r – size of a tuple in relation r
• b_r – number of blocks containing tuples of r
• f_r – blocking factor – number of tuples of relation r that fit
into one block
• x_a – number of levels of each multilevel index on attribute a
• d_{a,r} – number of distinct values in relation r for attribute a;
d_{a,r} = |π_a(r)|, and d_{a,r} = n_r if a is unique
• s_{a,r} – selection cardinality of attribute a of relation r –
average number of records satisfying an equality condition. If
a is unique, d_{a,r} = n_r and s_{a,r} = 1; otherwise (if values are
uniformly distributed), s_{a,r} = n_r / d_{a,r}
Cost Functions for SELECT
• Linear Search
– Retrieve all file blocks; cost = b_r
– For an equality condition on a key, average cost = b_r/2 if found, else cost = b_r
• Binary Search
– Search access cost = log2(b_r) + (s_{a,r} / f_r) – 1
– If the attribute is unique, average cost = log2(b_r)
• Using a primary / secondary key index
– Cost = x_a + 1
• Using a hash key
– Cost ≈ 1
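The cost functions above can be sketched as small Python helpers. The function names and the sample catalog values are assumptions for illustration; the ceilings are also an assumption, added so that the costs come out as whole block accesses.

```python
import math


def linear_search_cost(b_r, found_on_key=False):
    # A full scan reads every block; an equality hit on a key reads half on average.
    return b_r / 2 if found_on_key else b_r


def binary_search_cost(b_r, s_ar=1, f_r=1):
    # log2(b_r) accesses to locate the first match, plus the extra blocks
    # that hold the remaining s_ar matching records (f_r records per block).
    return math.ceil(math.log2(b_r)) + math.ceil(s_ar / f_r) - 1


def index_cost(x_a):
    # One access per index level plus one access for the data block.
    return x_a + 1


# Hypothetical catalog values: a file of 1,000 blocks and a 3-level index.
scan = linear_search_cost(1_000)
unique_binary = binary_search_cost(1_000)
nonunique_binary = binary_search_cost(1_000, s_ar=25, f_r=10)
via_index = index_cost(3)
```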
Data Storage, Indexing
© 2010, University of Colombo School of Computing
Dr. Jeevani Goonetillake
File Organization and Storage Structures
Primary Storage (Main Memory)
• Fast
• Volatile
• Expensive
Secondary Storage (Files in disks or tapes)
• Non-Volatile
Disk Storage Devices
• Preferred secondary storage device for high
storage capacity and low cost.
• Data stored as magnetized areas on magnetic
disk surfaces.
• A disk pack contains several magnetic disks
connected to a rotating spindle.
• Disks are divided into concentric circular
tracks on each disk surface. Track capacities
vary typically from 4 to 50 Kbytes.
Disk Storage Devices
• Since a track usually contains a large amount of
information, it is divided into smaller blocks or
sectors.
• The block size B is fixed for each system.
• Typical block sizes range from B=512 bytes to
B=4096 bytes. Whole blocks are transferred
between disk and main memory for processing.
Disk Storage Devices
• A read-write head moves to the track that contains
the block to be transferred.
• Disk rotation moves the block under the read-write
head for reading or writing.
• Reading or writing a disk block is time consuming
because of the seek time s and rotational delay
(latency) rd.
Blocking
• Blocking: refers to storing a number of records in
one block on the disk.
• Blocking factor (bfr) refers to the number of
records per block.
• There may be empty space in a block if an integral
number of records do not fit in one block.
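As a small worked sketch of blocking, the function below computes the blocking factor, the unused space per block and the number of blocks for a file. It assumes fixed-length records that do not span blocks; the function name and the sample sizes are hypothetical.

```python
import math


def blocking(block_size, record_size, num_records):
    """Return (bfr, unused bytes per block, blocks needed) for unspanned records."""
    bfr = block_size // record_size            # whole records that fit in one block
    unused = block_size - bfr * record_size    # empty space left in each block
    blocks = math.ceil(num_records / bfr)      # blocks needed to hold the file
    return bfr, unused, blocks


# Hypothetical file: 512-byte blocks, 100-byte records, 1,000 records.
bfr, unused, blocks = blocking(block_size=512, record_size=100, num_records=1_000)
```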
Files of Records
• A file is a sequence of records, where each record is a
collection of data values (or data items).
• A file descriptor (or file header) includes information
that describes the file, such as the field names and their
data types, and the addresses of the file blocks on disk.
• Records are stored on disk blocks. The blocking factor
bfr for a file is the (average) number of file records
stored in a disk block.
Operations on Files
• OPEN: Readies the file for access, and associates a
pointer that will refer to a current file record at each
point in time.
• FIND: Searches for the first file record that satisfies a
certain condition, and makes it the current file record.
• FINDNEXT: Searches for the next file record (from
the current record) that satisfies a certain condition, and
makes it the current file record.
• READ: Reads the current file record into a program
variable.
• INSERT: Inserts a new record into the file, and makes
it the current file record.
Operations on Files
• DELETE: Removes the current file record from the
file, usually by marking the record to indicate that it is
no longer valid.
• MODIFY: Changes the values of some fields of the
current file record.
• CLOSE: Terminates access to the file.
• REORGANIZE: Reorganizes the file records. For
example, the records marked deleted are physically
removed from the file or a new organization of the file
records is created.
• READ_ORDERED: Reads the file blocks in order of a
specific field of the file.
Unordered Files
• Also called a heap or a pile file.
• New records are inserted at the end of the file.
• To search for a record, a linear search through
the file records is necessary. This requires
reading and searching half the file blocks on
average, and is hence quite expensive.
• Record insertion is quite efficient.
• To delete a record, the record is marked as
deleted. Space is reclaimed during periodic
reorganization.
Ordered Files
• Also called a sequential file.
• File records are kept sorted by the values of an ordering
field.
• Insertion is expensive: records must be inserted in the
correct order.
• A binary search can be used to search for a record on its
ordering field value. This requires reading and searching
about log2(b) of the b file blocks on average, an improvement
over linear search.
• Reading the records in order of the ordering field is quite
efficient.
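The difference between the two file organizations can be sketched by counting block reads. The file below is modelled as a list of blocks, each a sorted list of keys; the function names and the sample file are assumptions for illustration.

```python
def linear_search(blocks, key):
    """Scan blocks in order (unordered-file style); return (found, blocks_read)."""
    for reads, block in enumerate(blocks, start=1):
        if key in block:
            return True, reads
    return False, len(blocks)


def binary_search(blocks, key):
    """Binary search over the blocks of an ordered file; return (found, blocks_read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        reads += 1                     # one block is read per probe
        block = blocks[mid]
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads


# Hypothetical ordered file: 1,024 blocks of 4 keys each (keys 0..4095).
blocks = [list(range(i, i + 4)) for i in range(0, 4096, 4)]
linear_result = linear_search(blocks, 4095)
binary_result = binary_search(blocks, 4095)
```

Searching for the last key forces the linear search to read every block, while the binary search needs only about log2 of the number of blocks.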
Average Access Times
The following table shows the average access time to access a specific record
for a given type of file
Hashed Files
• The file blocks are divided into M equal-sized buckets,
numbered bucket0, bucket1, …, bucket M-1.
• One of the file fields is designated to be the hash key of
the file.
• The record with hash key value K is stored in bucket i,
where i=h(K), and h is the hashing function.
• Search is very efficient on the hash key.
• Collisions occur when a new record hashes to a bucket
that is already full. An overflow file is kept for storing
such records.
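A minimal sketch of a static hashed file with M buckets and an overflow area, following the description above. The class name, the modulo hash function and the bucket capacity are assumptions chosen for illustration, not a specific DBMS implementation.

```python
class HashedFile:
    """Static hashing: M fixed buckets plus one overflow area."""

    def __init__(self, num_buckets, bucket_capacity):
        self.m = num_buckets
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(num_buckets)]
        self.overflow = []                      # records whose bucket was full

    def h(self, key):
        return key % self.m                     # hashing function: bucket i = h(K)

    def insert(self, key, record):
        bucket = self.buckets[self.h(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record)) # collision: bucket already full

    def search(self, key):
        # Look in the home bucket first, then in the overflow area.
        for k, rec in self.buckets[self.h(key)] + self.overflow:
            if k == key:
                return rec
        return None


f = HashedFile(num_buckets=4, bucket_capacity=2)
for k in (1, 5, 9, 2):                          # 1, 5 and 9 all hash to bucket 1
    f.insert(k, f"rec{k}")
```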
Hashed Files
• There are numerous methods for collision resolution,
including the following:
Open addressing: Proceeding from the occupied
position specified by the hash address, the program
checks the subsequent positions in order until an unused
(empty) position is found.
Chaining: A collision is resolved by placing the new
record in an unused overflow location and setting the
pointer of the occupied hash address location to the
address of that overflow location.
Multiple hashing: The program applies a second hash
function if the first results in a collision.
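Open addressing can be sketched with linear probing over a fixed-size table. The function name, the modulo hash and the table size are assumptions for illustration.

```python
def insert_open_addressing(table, key):
    """Insert key using linear probing; return the position that was used."""
    m = len(table)
    start = key % m                       # hash address
    for step in range(m):
        pos = (start + step) % m          # check subsequent positions in order
        if table[pos] is None:            # unused (empty) position found
            table[pos] = key
            return pos
    raise RuntimeError("hash table is full")


table = [None] * 7
# All four keys hash to address 3, so each collision probes one position further.
slots = [insert_open_addressing(table, k) for k in (10, 17, 24, 3)]
```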
Hashed Files
• The hash function h should distribute the records
uniformly among the buckets; otherwise, search
time will be increased because many overflow
records will exist.
• Main disadvantage of static hashing:
the fixed number of buckets M is a problem if the
number of records in the file grows or shrinks.
Hashed Files
Limitations
• Inappropriate for some retrievals:
– based on pattern matching,
e.g. find all students with ID like 98xxxxxx;
– involving ranges of values,
e.g. find all students from 50100000 to 50199999;
– based on a field other than the hash field.
Indexes
• Index: a data structure that allows particular records in a
file to be located more quickly, much like the index in a book.
• An index can be sparse or dense:
– Sparse: has an index record for only some of the search key values
(e.g. Staff Ids: CS001, EE001, MA001). Applicable to
ordered data files only.
– Dense: has an index record for every search key value (e.g. Staff
Ids: CS001, CS002, .. CS089, EE001, EE002, ..).
Indexes
• Data file: a file containing the logical
records
• Index file: a file containing the index
records
• Indexing field: the field used to order the
index records in the index file
Dense Index
–The index is usually specified on one
field of the file (although it could be
specified on several fields)
– One form of an index is a file of
entries <field value, pointer to
record>, which is ordered by field
value
– The index is called an access path
on the field.
[Figure: Sparse Index]
Primary Index
• Defined on an ordered data file.
• The data file is ordered on a key field.
• Includes one index entry for each block in the data file;
the index entry has the key field value for the first record
in the block, which is called the block anchor.
• A primary index is a nondense (sparse) index, since it
includes an entry for each disk block of the data file,
holding the key of its anchor record, rather than an entry
for every search value.
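A lookup through a primary (sparse) index can be sketched as follows: find the last block anchor that is less than or equal to the search key, then search only that block. The sample blocks and the function name `lookup` are assumptions for illustration.

```python
from bisect import bisect_right

# Ordered data file: each block holds records sorted on the key field.
data_blocks = [[2, 5, 8], [12, 15, 19], [23, 27, 31], [40, 44, 49]]

# Primary index: one <block anchor, block number> entry per data block.
index = [(block[0], i) for i, block in enumerate(data_blocks)]
anchors = [anchor for anchor, _ in index]


def lookup(key):
    """Return (block_no, found) for key using the sparse index."""
    pos = bisect_right(anchors, key) - 1      # last anchor <= key
    if pos < 0:
        return None, False                    # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no, key in data_blocks[block_no]
```

For example, `lookup(27)` consults the index, goes straight to block 2 and finds the record there, while `lookup(13)` reads block 1 and reports that the key is absent.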
Clustering Index
• Defined on an ordered data file
• The data file is ordered on a non-key field unlike primary
index, which requires that the ordering field of the data file
have a distinct value for each record.
• Includes one index entry for each distinct value of the
field; the index entry points to the first data block that
contains records with that field value.
• It is another example of nondense index.
Secondary Index
• A secondary index provides a secondary means of
accessing a file for which some primary access already
exists.
• The secondary index may be on a field which is a
candidate key and has a unique value in every record, or
a non-key with duplicate values.
• The index is an ordered file with two fields.
• The first field is of the same data type as some non-ordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record
pointer.
Secondary Index
• There can be many secondary indexes (and hence,
indexing fields) for the same file.
• Includes one entry for each record in the data file; hence,
it is a dense index.
Multi-Level Indexes
• Since a single-level index is an ordered file, we can
create a primary index to the index itself.
• In this case, the original index file is called the first-level
index and the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, …,
top level until all entries of the top level fit in one disk
block.
• A multi-level index can be created for any type of first
level index (primary, secondary, clustering) as long as
the first-level index consists of more than one disk block.
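The number of levels can be sketched by repeatedly indexing the previous level until one block suffices. The function name and the sample values (30,000 first-level entries, 68 entries per index block) are assumptions for illustration.

```python
import math


def index_levels(first_level_entries, entries_per_block):
    """Return the number of blocks at each index level, first level first."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / entries_per_block)
        levels.append(blocks)
        if blocks == 1:            # the top level fits in one disk block
            break
        entries = blocks           # next level: one entry per block of this level
    return levels


levels = index_levels(first_level_entries=30_000, entries_per_block=68)
```

With these values the index has three levels, so a lookup reads three index blocks plus one data block.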
Multi-Level Indexes
• Such a multi-level index is a form of search tree.
• However, insertion and deletion of new index
entries is a severe problem because every level
of the index is an ordered file.
Dynamic Multilevel Indexes Using B+-Trees
• Most multi-level indexes use the B+-tree data
structure because of the insertion and deletion problem.
• This leaves space in each tree node (disk block) to allow
for new index entries.
• The data structure is a variation of search trees that
allows efficient insertion and deletion of search
values.
• In the B+-tree data structure, each node
corresponds to a disk block.
• Each node is kept between half-full and completely full.
Dynamic Multilevel Indexes Using
B+-Trees
• An insertion into a node that is not full is quite
efficient.
• If a node is full the insertion causes a split into two
nodes.
• Splitting may propagate to other tree levels
Dynamic Multilevel Indexes Using
B+-Trees
• A deletion is quite efficient if a node does not
become less than half full.
• If a deletion causes a node to become less than
half full, it must be merged with neighboring
nodes.
B+ tree
The structure of the internal nodes of a B+ tree
of order p is as follows:
• Each internal node is of the form
<P_1, K_1, P_2, K_2, …, P_{q-1}, K_{q-1}, P_q>
where q ≤ p. Each P_i is a tree pointer.
• Within each node, K_1 < K_2 < … < K_{q-1}.
• Each node has at most p tree pointers.
• Each node with q tree pointers, q ≤ p, has q-1
search key field values.
B+ tree
The structure of the leaf nodes of a B+ tree of
order p is as follows:
• Each leaf node is of the form
<<K_1, Pr_1>, <K_2, Pr_2>, …, <K_{q-1}, Pr_{q-1}>, P_next>
where q ≤ p. Each Pr_i is a data pointer. P_next
points to the next leaf node of the B+ tree.
• Within each node, K_1 < K_2 < … < K_{q-1}.
• All leaf nodes are at the same level.
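The node structures above can be sketched as two small classes with an equality search and a range scan. This is a read-only illustration over a hand-built tree (no insertion or splitting); the class names and the convention that pointer P_i leads to keys X with K_{i-1} < X ≤ K_i are assumptions.

```python
from bisect import bisect_left


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # K_1 < K_2 < ... < K_{q-1}
        self.children = children    # P_1 ... P_q (tree pointers)


class Leaf:
    def __init__(self, keys, records):
        self.keys = keys            # K_1 < K_2 < ... < K_{q-1}
        self.records = records      # Pr_i: data pointer for K_i
        self.next = None            # P_next: next leaf node


def find_leaf(node, key):
    # Follow P_i where K_{i-1} < key <= K_i until a leaf is reached.
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    return node


def search(root, key):
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    if i < len(leaf.keys) and leaf.keys[i] == key:
        return leaf.records[i]
    return None


def range_scan(root, low, high):
    # Descend once, then follow the P_next pointers along the leaf level.
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, r in zip(leaf.keys, leaf.records):
            if k > high:
                return out
            if k >= low:
                out.append(r)
        leaf = leaf.next
    return out


l1, l2, l3 = Leaf([1, 5], ["r1", "r5"]), Leaf([7, 8], ["r7", "r8"]), Leaf([9, 12], ["r9", "r12"])
l1.next, l2.next = l2, l3
root = Internal([5, 8], [l1, l2, l3])
```

Because all data pointers sit at the leaf level and the leaves are chained, a range query needs only one descent from the root.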
Difference between B-tree and B+-tree
• In a B-tree, pointers to data records exist at all
levels of the tree.
• In a B+-tree, all pointers to data records exist at
the leaf-level nodes.
• A B+-tree can have fewer levels (or a higher capacity
of search values) than the corresponding B-tree.
Physical Database Design
and Tuning
Dr. Jeevani Goonetillake
Objective
• Identify commonly asked queries, and typical update
operations, and adjust the design to improve performance
for the operations
identified.
Database tuning – as user requirements evolve, we tune or
adjust all aspects of a database design for better
performance.
Overview
• After ER design, schema refinement, and the definition
of views, we have the logical and external schemas for
our database.
• The next step is to choose indexes and to refine the
conceptual and external schemas (if necessary) to meet
performance goals.
• We must begin by understanding the workload:
– The most important queries and how often they arise.
– The most important updates and how often they arise.
– The desired performance for these queries and
updates.
Understanding the Workload
• For each query in the workload:
– Which relations does it access?
– Which attributes are retrieved?
– Which attributes are involved in selection/join conditions?  How
selective are these conditions likely to be?
• For each update in the workload:
– Which attributes are involved in selection/join conditions?  How
selective are these conditions likely to be?
– The type of update (INSERT/DELETE/UPDATE), and the
attributes that are affected.
Decisions to Make
• What indexes should we create?
– Which relations should have indexes?
– What field(s) should be the search key?
– Should we build several indexes?
• For each index, what kind of an index should it be?
– Primary?
– Clustered?
– Hash/tree?  Dynamic/static?
– Dense/sparse?
Decisions to Make
• Should we make changes to the conceptual
schema?
– Consider alternative normalized schemas?
(Remember, there are many choices in
decomposing into BCNF, etc.)
– Should we "undo" some decomposition steps and
settle for a lower normal form?
(Denormalization.)
– Horizontal partitioning, replication, views …
Choice of Indexes
• One approach: consider the most important queries.  Consider the
best plan using the current indexes, and see if a better plan is
possible with an additional index.  If so, create it.
• Before creating an index, must also consider the impact on updates
in the workload!
– Trade-off: indexes can make queries go faster, updates slower.
Require disk space, too.
Issues to Consider in Index
Selection
• Attributes mentioned in a WHERE clause are candidates for index
search keys.
– Exact match condition suggests hash index.
– Range query suggests tree index.
• Clustering is especially useful for range queries, although it
can help on equality queries as well in the presence of
duplicates.
• Try to choose indexes that benefit as many queries as possible.
• Since only one index can be clustered per relation, choose it based
on important queries that would benefit the most from clustering.
Issues in Index Selection (Contd.)
• Multi-attribute search keys should be considered when a
WHERE clause contains several conditions.
– If range selections are involved, order of attributes
should be carefully chosen to match the range
ordering.
– Such indexes can sometimes enable index-only
strategies for important queries.
Index Only Plan
• An index-only plan is a query evaluation plan that needs to
access only the indexes on the data records, and not the data
records themselves, in order to answer the query.
• Index-only plans are much faster than regular plans since they
do not require reading the data records.
• If a query that accesses only one field (for example, the
average value of a field) is executed repeatedly, it
would be an advantage to create an index with this field as
the search key so that an index-only plan can be used.
Index-Only Plans
• A number of queries can be answered without retrieving
any tuples from one or more of the relations involved if
a suitable index is available.

SELECT D.mgr, E.eid
FROM Dept D, Emp E
WHERE D.dno=E.dno
(tree index on <E.dno,E.eid>)

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno
(index on <E.dno>)

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
(tree index on <E.dno,E.sal>)

SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND
E.sal BETWEEN 3000 AND 5000
(tree index on <E.age,E.sal> or <E.sal,E.age>)
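An index-only evaluation of the MIN(E.sal) query can be sketched by treating a tree index on <dno, sal> as a sorted list of entries. The sample Emp records and the function name are assumptions for illustration; the point is that the answer is computed from index entries alone, without touching the data records.

```python
from itertools import groupby

# Hypothetical Emp data records: (eid, dno, sal).
emp = [(1, 10, 4000), (2, 20, 3500), (3, 10, 3000), (4, 20, 5200), (5, 30, 4100)]

# Dense tree index on <dno, sal>: sorted (dno, sal) entries, one per record.
index_dno_sal = sorted((dno, sal) for _, dno, sal in emp)


def min_sal_per_dno(index):
    """Answer SELECT dno, MIN(sal) ... GROUP BY dno from the index alone."""
    # Within each dno group the entries are sorted on sal,
    # so the first entry of the group carries the minimum salary.
    return [(dno, next(entries)[1])
            for dno, entries in groupby(index, key=lambda e: e[0])]


result = min_sal_per_dno(index_dno_sal)
```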
Issues in Index Selection
• When considering a join condition:
– Hash index on inner is very good for
Index Nested Loops.
• Should be clustered if join column is not key
for inner, and inner tuples need to be
retrieved.
– Clustered B+ tree on join column(s) good
for Sort-Merge.
Example1
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno
• Hash index on D.dname supports 'Toy' selection.
– Given this, index on D.dno is not needed.
• Hash index on E.dno allows us to get matching
(inner) Emp tuples for each selected (outer) Dept
tuple.
Example1
• What if WHERE included: "… AND E.age=25"?
– Could retrieve Emp tuples using index on E.age,
then join with Dept tuples satisfying dname
selection.
– If E.age index is already created, this query
provides much less motivation for adding an E.dno
index.
Example2
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
AND E.hobby='Stamps' AND E.dno=D.dno
• Clearly, Emp should be the outer relation.
– Suggests that we build a hash index on D.dno.
Example2
• What index should we build on Emp?
– B+ tree on E.sal could be used, OR an index on E.hobby
could be used.  Only one of these is needed, and which is
better depends upon the selectivity of the conditions.
• As a rule of thumb, equality selections are more selective
than range selections.
• As both examples indicate, our choice of indexes is guided by
the plan(s) that we expect an optimizer to consider for a
query.
Clustering and Joins
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno
• Clustering is especially important when accessing inner tuples in
Index Nested Loops (INL).
– Should make index on E.dno clustered.
• Suppose that the WHERE clause is instead:
WHERE E.hobby='Stamps' AND E.dno=D.dno
– If many employees collect stamps, Sort-Merge join may be
worth considering.
• Summary:  Clustering is useful whenever many tuples  are to be
retrieved.
Multi-Attribute Index Keys
• To retrieve Emp records with age=30 AND sal=4000, an index on
<age,sal> would be better than an index on age or an index on sal.
– Such indexes also called composite or concatenated indexes.
– Choice of index key orthogonal to clustering etc.
• If condition is:  20<age<30  AND  3000<sal<5000:
– Clustered tree index on <age,sal> or <sal,age> is best.
• If condition is:  age=30  AND  3000<sal<5000:
– Clustered <age,sal> index much better than <sal,age> index.
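Why the order of fields matters for age=30 AND 3000<sal<5000 can be sketched by counting how many index entries each composite index has to scan. The two indexes are modelled as sorted lists of tuples; the sample data and function names are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical Emp (age, sal) values: 20 ages x 16 salary levels.
emp = [(age, sal) for age in range(20, 40) for sal in range(1000, 9000, 500)]

idx_age_sal = sorted(emp)                              # composite key <age, sal>
idx_sal_age = sorted((sal, age) for age, sal in emp)   # composite key <sal, age>


def scanned_age_sal(age, lo, hi):
    # The condition maps to one contiguous run of <age, sal> entries.
    start = bisect_right(idx_age_sal, (age, lo))
    end = bisect_left(idx_age_sal, (age, hi))
    return end - start


def scanned_sal_age(age, lo, hi):
    # With <sal, age>, every entry with lo < sal < hi is scanned
    # and then filtered on age.
    start = bisect_right(idx_sal_age, (lo, float("inf")))
    end = bisect_left(idx_sal_age, (hi, float("-inf")))
    matches = sum(1 for _, a in idx_sal_age[start:end] if a == age)
    return end - start, matches


entries_age_sal = scanned_age_sal(30, 3000, 5000)
entries_sal_age, matching = scanned_sal_age(30, 3000, 5000)
```

On this data the <age,sal> index scans exactly the 3 qualifying entries, whereas the <sal,age> index scans 60 entries to find the same 3.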
Summary
• Database design consists of several tasks:  requirements analysis,
conceptual design, schema refinement, physical design and tuning.
– In general, have to go back and forth between these tasks to refine
a database design, and decisions in one task can influence the
choices in another task.
• Understanding the nature of the workload for the application, and the
performance goals, is essential to developing a good design.
– What are the important queries and updates?  What
attributes/relations are involved?
Summary (Contd.)
• Indexes must be chosen to speed up important queries (and perhaps
some updates!).
– Index maintenance overhead on updates to key fields.
– Choose indexes that can help many queries, if possible.
– Build indexes to support index-only strategies.
– Clustering is an important decision; only one index on a given
relation can be clustered!
– Order of fields in composite index key can be important.
• Static indexes may have to be periodically re-built.
Database Tuning
• The process of continuing to revise/adjust the
physical database design by monitoring resource
utilization as well as internal DBMS processing to
reveal bottlenecks such as contention for the same
data or devices.
• Goal:
– To make applications run faster
– To lower the response time of queries/transactions
– To improve the overall throughput of transactions
Tuning Indexes
• Reasons for tuning indexes
– Certain queries may take too long to run for lack of
an index;
– Certain indexes may not get utilized at all;
– Certain indexes may be causing excessive overhead
because the index is on an attribute that undergoes
frequent changes
• Options for tuning indexes
– Drop existing indexes and/or build new ones
– Change a non-clustered index to a clustered index
(and vice versa)
– Rebuilding the index
Tuning Queries
• In some situations involving correlated
queries, temporary tables are useful.
• The order of tables in the FROM clause may
affect the join processing.
• Some query optimizers perform worse on
nested queries compared to their equivalent
un-nested counterparts.
Tuning Queries
• A query with multiple selection conditions that are
connected via OR may not prompt the query
optimizer to use any index. Such a query may be split
up and expressed as a union of queries, each with a
condition on an attribute that causes an index to be
used.
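The OR-to-UNION rewrite can be sketched with Python's built-in `sqlite3` module. The Emp schema, the index names and the sample rows are assumptions for illustration; the sketch only checks that both forms return the same rows, and whether the optimizer actually uses an index for either form depends on the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (eid INTEGER PRIMARY KEY, age INTEGER, dno INTEGER);
    CREATE INDEX idx_emp_age ON Emp(age);
    CREATE INDEX idx_emp_dno ON Emp(dno);
    INSERT INTO Emp VALUES (1, 25, 5), (2, 30, 4), (3, 25, 4), (4, 41, 7);
""")

# Original form: two conditions connected via OR.
or_query = "SELECT eid FROM Emp WHERE age = 25 OR dno = 4"

# Rewritten form: a union of queries, each with a condition on one indexed attribute.
union_query = """
    SELECT eid FROM Emp WHERE age = 25
    UNION
    SELECT eid FROM Emp WHERE dno = 4
"""

or_rows = sorted(conn.execute(or_query).fetchall())
union_rows = sorted(conn.execute(union_query).fetchall())
```

UNION removes duplicates, so the two forms agree here because eid is unique; with a non-unique select list the rewrite would need more care.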
• Apply the following transformations:
– A NOT condition may be transformed into a positive expression.
– Embedded SELECT blocks may be replaced by joins.
– WHERE conditions may be rewritten to utilize the
indexes on multiple columns.
Tuning the Conceptual Schema
• The choice of conceptual schema should be guided by the workload,
in addition to redundancy issues:
– We may settle for a 3NF schema rather than BCNF.
– Workload may influence the choice we make in decomposing a
relation into 3NF or BCNF.
– We may further decompose a BCNF schema!
– We might denormalize (i.e., undo a decomposition step), or we
might add fields to a relation.
– We might consider horizontal decompositions.
• If such changes are made after a database is in use, this is called schema
evolution; we might want to mask some of these changes from
applications by defining views.
Summary of Database Tuning
• The conceptual schema should be refined by
considering performance criteria and workload:
– May choose 3NF or lower normal form over BCNF.
– May choose among alternative decompositions into
BCNF (or 3NF) based upon the workload.
– May denormalize, or undo some decompositions.
– May choose a horizontal decomposition of a relation.
© 2008, University of Colombo School of Computing 1
Transaction Management
Dr G.N.Wikramanayake
Dr Jeevani Goonetillake
University of Colombo School of Computing
• Concurrency control deals with influencing
how data can be viewed and updated by
users accessing the same information at
one time.
• Concurrency control allows users to use the
database concurrently without damaging the
transactions of other users.
• It supports and ensures the availability and
correct operations of simultaneous multiple
access in the database system.
Concurrency Control
• Single user – at most one user at a time
can use the system. Restricted to some PC
DBMS.
• Multi-user – many users can use the
system concurrently (at the same time).
Most DBMS are multi-user. Airline
reservations systems, banks, insurance
agencies, stock exchanges are multi-user
systems operated concurrently.
Concurrency Control
Multiple users can use computer systems
simultaneously because of the concept of
multiprogramming. When there is only one CPU, a
multiprogramming operating system executes some
commands from one program, then suspends that
program and executes some commands from the
next program, and so on. A program is resumed at
the point where it was suspended when it gets its
turn to use the CPU again. Hence, concurrent
execution of the programs is actually interleaved.
Truly simultaneous processing of multiple programs
requires multiple CPUs.
Multiprogramming
[Figure: Interleaved model of concurrent execution. Single CPU: processes A and B alternate along the time axis. Multiple CPUs: A and B execute in parallel along the time axis.]
The basic unit of data transfer from the disk to the
computer memory is one block. For the purposes of
discussion, consider transactions at the level of
data items (fields of some record in the database)
and disk blocks. At this level, the database
access operations that a transaction can include
are:
• READ(X) – reads database item X into a
program variable X;
• WRITE(X) – writes the value of program variable
X into the database item X.
Database Access Operations
• Executing a READ(X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if not
already in some main memory buffer).
3. Copy item X from the buffer to the program variable
named X.
• Executing a WRITE(X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if not
already in some main memory buffer).
3. Copy item X from the program variable named X into its
correct location in the buffer.
4. Store the updated block from the buffer back to disk
(either immediately or at some later point of time)
SELECT labmark INTO old_mark FROM enrol
WHERE studno = sno and courseno = cno
FOR UPDATE OF labmark;
UPDATE enrol SET labmark = new_mark
WHERE studno = sno and courseno = cno;
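The READ(X) and WRITE(X) steps listed above can be sketched as follows. This is a minimal in-memory model, not a real buffer manager: the dictionaries disk, item_block, buffers and program_vars and the flush parameter are all hypothetical names chosen for illustration.

```python
# Hypothetical model: 'disk' maps a block address to the items stored in that block.
disk = {0: {"X": 80}, 1: {"Y": 100}}
item_block = {"X": 0, "Y": 1}    # which disk block holds each item
buffers = {}                      # main-memory copies of disk blocks
program_vars = {}                 # the transaction's program variables

def read_item(name):
    addr = item_block[name]                   # 1. find the block address
    if addr not in buffers:                   # 2. copy the block into a buffer if needed
        buffers[addr] = dict(disk[addr])
    program_vars[name] = buffers[addr][name]  # 3. copy the item to the program variable

def write_item(name, flush=True):
    addr = item_block[name]                   # 1. find the block address
    if addr not in buffers:                   # 2. copy the block into a buffer if needed
        buffers[addr] = dict(disk[addr])
    buffers[addr][name] = program_vars[name]  # 3. copy the variable into the buffer
    if flush:                                 # 4. store the block on disk (now or later)
        disk[addr] = dict(buffers[addr])

read_item("X")
program_vars["X"] -= 5
write_item("X", flush=False)
print(disk[0]["X"], buffers[0]["X"])   # the disk copy is stale until the block is flushed
write_item("X")
print(disk[0]["X"])
```

Deferring step 4 is what makes recovery necessary: a crash between the buffer update and the flush loses the change unless it was logged.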
• A transaction is an atomic unit of work that is either
completed in its entirety or not done at all. For recovery
purposes, the system needs to keep track of when the
transaction starts, terminates, and commits or aborts. The
recovery manager keeps track of:
BEGIN – marks the beginning of transaction execution.
READ or WRITE – read or write operations on the
database items that are executed.
END – specifies that READ and WRITE transaction
operations have ended and marks the end of
transaction execution.
Transaction States and additional operations
COMMIT signals a successful end of the transaction
so that any changes (updates) executed by the
transaction can be safely committed to the
database and will not be undone.
ROLLBACK (or ABORT) signals that the
transaction has ended unsuccessfully so that any
changes or effects that the transaction may have
applied to the database must be undone.
Transaction States and additional operations
Atomicity – A transaction is an atomic unit of processing.
It is either performed in its entirety or not performed at
all.
Consistency preservation – A correct execution of the
transaction must take the database from one
consistent state to another
Isolation  – A transaction should not make its updates
visible to other transactions until it is committed.
Durability or permanency – Once a transaction
changes the database and the changes are
committed, these changes must never be lost because
of subsequent failures.
Properties of Transactions
e.g. Transfer 50 from account A (A=1000) to B (B= 2000)
T1: BEGIN
READ(A);
A = A – 50;
WRITE(A);
READ(B);
B = B + 50;
WRITE(B);
END;
A=950; B=2050;
Transaction Properties …
Consistency – take the database from one consistent state to
another
Value of A+B (3000) should be same before
transaction and after transaction
Atomicity – either performed in its entirety or not performed at all
Transaction failure after WRITE(A), but before
WRITE(B), then A=950; B=2000; i.e. 50 is lost
Data is now inconsistent as A+B is now 2950
Durability – changes must never be lost because of subsequent
failures
Recover database: remove changes of a partially
done transaction (A=1000; B=2000); reconstruct
completed transactions (A=950; B=2050)
Isolation  – updates not visible to other transactions until
committed
Between WRITE(A) and WRITE(B) if second
transaction reads A and B it sees inconsistent data
as A+B = 2950
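The transfer example above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the account table, the transfer function and its fail_midway flag are hypothetical, and the failure is simulated by raising an exception after the first update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move 'amount' from A to B as one atomic unit of work."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure after WRITE(A)")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()      # COMMIT: both updates become permanent
    except RuntimeError:
        conn.rollback()    # ROLLBACK: the partial update to A is undone

def total(conn):
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

transfer(conn, 50, fail_midway=True)
after_failure = dict(conn.execute("SELECT name, balance FROM account"))
transfer(conn, 50)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(after_failure, after_success, total(conn))
```

In both cases A+B stays at 3000, which is the consistency condition stated above.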
E.g. Transaction T1 – The number of reservations for airline A is X and
the number of reservations for airline B is Y; N reservations are cancelled
from A and booked for B.
Transaction T2 – M reservations are added to airline A.
T1            T2
READ(X)       READ(X)
X = X – N     X = X + M
WRITE(X)      WRITE(X)
READ(Y)
Y = Y + N
WRITE(Y)
Problems with Concurrent Use
Several problems can occur when
concurrent transactions execute in an
uncontrolled manner.
This occurs when two transactions that access the same database item have
their operations interleaved in a way that makes the value of some database
item incorrect. Initially X = 80 and Y = 100.
T1            T2            Values
READ(X)                     X = 80, N = 5, M = 4
X = X – N                   X = 75
              READ(X)       X = 80
              X = X + M     X = 84
WRITE(X)                    X = 75
READ(Y)
              WRITE(X)      X = 84
Y = Y + N                   Y = 105
WRITE(Y)
Result: X + Y = 84 + 105 = 189, but X should be 80 – 5 + 4 = 79.
1. The lost update problem
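The lost update schedule can be replayed step by step. In this minimal sketch the dictionary db stands in for the database and t1 and t2 hold each transaction's private program variables; these names are illustrative only.

```python
db = {"X": 80, "Y": 100}
N, M = 5, 4
t1, t2 = {}, {}            # private program variables of T1 and T2

t1["X"] = db["X"]          # T1: READ(X)    -> 80
t1["X"] -= N               # T1: X = X - N  -> 75
t2["X"] = db["X"]          # T2: READ(X)    -> 80 (T1 has not written yet)
t2["X"] += M               # T2: X = X + M  -> 84
db["X"] = t1["X"]          # T1: WRITE(X)   -> 75
t1["Y"] = db["Y"]          # T1: READ(Y)
db["X"] = t2["X"]          # T2: WRITE(X)   -> 84, overwriting T1's update
t1["Y"] += N               # T1: Y = Y + N  -> 105
db["Y"] = t1["Y"]          # T1: WRITE(Y)

serial_x = 80 - N + M      # value of X under either serial order
print(db, serial_x)
```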
This occurs when one transaction updates a database
item and then the transaction fails for some reason.
T1            T2            Values
READ(X)                     X = 80, N = 5, M = 4
X = X – N                   X = 75
WRITE(X)                    X = 75
              READ(X)       X = 75
              X = X + M     X = 79
              WRITE(X)      X = 79
READ(Y)
ROLLBACK
T1 aborts: changing X back to its original value gives X = 80,
but it should be 80 + 4 = 84.
2. The temporary update (Dirty read) problem
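The same schedule can be replayed as a short sketch. It assumes, as the slide does, that the rollback of T1 simply restores the value X had before T1 wrote it; the variable names are hypothetical.

```python
db = {"X": 80}
N, M = 5, 4

before_image = db["X"]     # value of X before T1's update, kept for rollback
db["X"] = db["X"] - N      # T1: READ(X), X = X - N, WRITE(X) -> 75 (uncommitted)

t2_x = db["X"]             # T2: READ(X) sees the uncommitted ("dirty") value 75
t2_x += M                  # T2: X = X + M -> 79
db["X"] = t2_x             # T2: WRITE(X)

db["X"] = before_image     # T1: ROLLBACK restores X = 80, wiping out T2's update

expected = before_image + M   # what X should be if only T2 had taken effect
print(db["X"], expected)
```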
If one transaction is calculating an aggregate summary function on a number
of records while other transactions are updating some of these records, the
aggregate function may calculate some values before they are updated and
others after they are updated.
T1            T2
              sum = 0
READ(X)
X = X – N
WRITE(X)
              READ(X)
              sum = sum + X
              READ(Y)
              sum = sum + Y     sum = X + Y = 75 + 100 = 175
READ(Y)
Y = Y + N
WRITE(Y)
3. The incorrect summary problem
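A short replay of this schedule, using the earlier values X = 80, Y = 100 and N = 5 (an assumption carried over from the lost update example), shows the summing transaction seeing X after the update but Y before it.

```python
db = {"X": 80, "Y": 100}
N = 5

total = 0                    # T2: sum = 0
db["X"] = db["X"] - N        # T1: READ(X), X = X - N, WRITE(X) -> 75
total += db["X"]             # T2: READ(X), sum = sum + X       -> 75
total += db["Y"]             # T2: READ(Y), sum = sum + Y       -> 175
db["Y"] = db["Y"] + N        # T1: READ(Y), Y = Y + N, WRITE(Y) -> 105

consistent_total = db["X"] + db["Y"]   # 180 both before and after T1
print(total, consistent_total)
```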
Another problem that may occur is the unrepeatable
read, where a transaction T2 reads an item twice
(e.g. X) and the item is changed by another
transaction (e.g. T1) between the two reads.
T1            T2
              …..
              READ(X)       X = 80
READ(X)
X = X – N     …..
WRITE(X)
              READ(X)       X = 75
              ….
4. Unrepeatable Read problem
A set of rows that has been read once might be
different when read again, due to the insertion of a new record.
T1 T2
…..
SELECT X 3 records
INSERT(X)
…..
SELECT X 4 records
….
Phantom Phenomenon
• Concurrency control deals with influencing how
data can be viewed and updated by users
accessing the same information simultaneously.
Do you want one user to view/change an order
that is being changed/viewed by another user?
• There are two classes of concurrency control:
(i)  applies to read-only database access;
levels of isolation:  dirty read, committed read,
repeatable read
(ii) applies to updating database records: serializable
Concurrency Control
• Database server process reads from the
database table without checking for locks (let
this process look at dirty data). This can be
useful when the table is static; 100% accuracy
is not as important as speed and freedom from
contention; you cannot wait for locks to be
released.
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
READ UNCOMMITTED;
All are possible {dirty read, non-repeatable, phantom}
Dirty Reads
• Database server process reads rows from
the database after seeing that lock could be
acquired (do not let this process look at dirty
data). This can be useful for lookups;
queries; reports yielding general information
(e.g. month-ending sales analyses).
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
READ COMMITTED;
Dirty read not possible; non-repeatable & phantom possible
Committed Reads
• Database server process puts locks on all rows
examined to satisfy the query (do not let other
processes change any of the rows I have
looked at until I am done). It can be used for
critical, aggregate arithmetic (e.g. account
balancing); coordinated lookups from several
tables (e.g. reservation systems).
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
REPEATABLE READ;
Dirty read and non-repeatable not possible; phantom possible
Repeatable Reads
• Always guarantees correct execution of
transactions.
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE;
None are possible {dirty read, non-repeatable, phantom}
Serializable
• Concurrency control is enforced using
locking: database level; table level; page
level; row level; key level
• Database Level Locking: Other users
cannot access the database, which is opened in exclusive
mode (e.g. DATABASE stores EXCLUSIVE). It can be used when executing a
large number of updates involving many
tables; archiving the database files for
backups; altering the structure of the
database.
Locking
• Other users cannot modify the table. It can
be used to: avoid conflict with other users
during batch operations that affect most or
all of the rows of a table; avoid running out
of locks when running an operation as a
transaction; prevent users from updating a
table for a period of time; prevent access to
a table while altering its structure or creating
indexes.
Table Level Locking
• Table Level Locking in Share Mode: Others
may SELECT from the table.
SQL Syntax:
LOCK TABLE table_name IN SHARE MODE
• Table Level Locking in Exclusive Mode:
Others may not SELECT from the table.
SQL Syntax:
LOCK TABLE table_name IN EXCLUSIVE
MODE
Table Level Locking
• Unlocking a Table:
SQL Syntax: UNLOCK TABLE table_name
• Setting the Lock Mode:
• Wait forever for the lock to be released.
SQL Syntax: SET LOCK MODE TO WAIT
• Do not wait for lock to be released.
SQL Syntax: SET LOCK MODE TO NOT WAIT
• Wait 20 seconds for lock to be released.
SQL Syntax: SET LOCK MODE TO WAIT 20
Lock/Unlock
Schedule (History) – A schedule S of n transactions T1, T2,
…, Tn is an ordering of the operations of the transactions,
subject to the constraint that the operations of Ti in S must
appear in the same order in which they occur in Ti.
Serial Schedules – For every transaction T participating in
the schedule, all the operations of T are executed
consecutively in the schedule. Otherwise the schedule is
called non-serial. Serial schedules are always correct.
Serializable – A non-serial schedule is serializable if it is
equivalent to one of the serial schedules of the same transactions.
Otherwise it is non-serializable.
Serializability of Schedules
• Protocols (sets of rules) are used to guarantee
serializability:
• locking data items to prevent multiple
transactions from accessing the items
concurrently.
• timestamps, where a unique identifier for each
transaction is generated by the system.
[immediate update]
• multi-version, where multiple versions of a
data item are used. [shadow paging]
Guaranteeing Serializability
• Two types of locks:
– Binary – can have two states or values, Locked
and unlocked;
– Shared and Exclusive locks – read_locked item
is called shared locked; write_locked item is
called exclusive locked.
Locking Techniques
• Guaranteeing Serializability by Two-phase
locking
• If all locking operations precede the first
unlock operation in the transaction such a
transaction can be divided into 2 phases
• Expanding or growing phase, where new
locks on items can be acquired but none can
be released and Shrinking phase, where
existing locks can be released but no new
locks can be acquired.
Two-phase locking
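Whether a single transaction obeys the two-phase rule can be checked mechanically. The sketch below assumes operations are written as strings such as "read_lock(Y)" and "unlock(Y)"; the function name is_two_phase and the two sample transactions are hypothetical.

```python
def is_two_phase(operations):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False                       # becomes True at the first unlock
    for op in operations:
        kind = op.split("(")[0]
        if kind == "unlock":
            shrinking = True
        elif kind in ("read_lock", "write_lock") and shrinking:
            return False                    # a lock request in the shrinking phase
    return True

# Releases Y before locking X, so it does not follow two-phase locking.
not_2pl = ["read_lock(Y)", "READ(Y)", "unlock(Y)",
           "write_lock(X)", "READ(X)", "WRITE(X)", "unlock(X)"]

# Same work, but all locks are acquired before the first unlock.
follows_2pl = ["read_lock(Y)", "READ(Y)", "write_lock(X)", "unlock(Y)",
               "READ(X)", "WRITE(X)", "unlock(X)"]

print(is_two_phase(not_2pl), is_two_phase(follows_2pl))
```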
• If every transaction in a schedule follows the two-phase locking protocol the schedule is guaranteed
to be serializable, eliminating the need to test for
serializability of schedules any more.
• Locking can be used to solve the concurrency control
problems, but it can also lead to the problem of
deadlock.
• Deadlock – occurs when each of two or more
transactions are in a simultaneous wait state, each
of them waiting for others to release a lock before it
can proceed.
Two-phase locking
T1                    T2
read_lock(Y)
READ(Y)
                      read_lock(X)
                      READ(X)
…..                   …..
write_lock(X) wait
                      write_lock(Y) wait
Deadlock
• Two main methods for dealing with the deadlock
problem:  deadlock prevention and  deadlock
detection & recovery.
Deadlock Prevention method
• Uses deadlock prevention protocol to ensure
that the system will never enter a deadlock
state.
– Each transaction locks all its data before it
begins execution.
– Either all requested data items are locked in one
step or none are locked.
Deadlock Handling
Disadvantages:
• low data utilisation: many data items may be
locked but unused for a long period of time
• possible starvation: a transaction which
requires a number of data items for its
operation may find itself in an indefinite wait
state while at least one of the data items is
always locked by some other transaction.
Deadlock Prevention
Allows the system to enter a deadlock state,
but examines the state of the system
periodically to detect whether a deadlock
has occurred.
If it has, the system attempts to recover from
the deadlock.
Deadlock Detection
• Keep information about the current locks of
data items to different transactions, as well as
any outstanding locking request for data items.
• Invoke an algorithm which uses this information
to determine whether the system has entered a
deadlock state. A typical technique is to use
the Wait-for-Graph (WFG) and periodically
invoke an algorithm to search for cycles in the
graph. Each transaction involved in the cycle is
said to be deadlocked.
Deadlock Detection Process
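The cycle search over a wait-for graph can be sketched as below. The graph is assumed to be a dictionary mapping each transaction to the transactions it is waiting for; the function name find_deadlocked is hypothetical, and the simple depth-first search is meant for small graphs rather than as an efficient detector.

```python
def find_deadlocked(wait_for):
    """Return the set of transactions that lie on a cycle of the wait-for graph."""
    deadlocked = set()

    def visit(node, path):
        if node in path:                                 # came back to a node on the path
            deadlocked.update(path[path.index(node):])   # everything on the cycle
            return
        for nxt in wait_for.get(node, ()):
            visit(nxt, path + [node])

    for start in wait_for:
        visit(start, [])
    return deadlocked

# T1 waits for T2 and T2 waits for T1 (a cycle); T3 waits for T1 but is not on the cycle.
graph = {"T1": ["T2"], "T2": ["T1"], "T3": ["T1"]}
print(find_deadlocked(graph))
```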
• The most common solution is to roll back one or more
transactions so that the deadlock can be broken.
• Three issues are involved in deadlock recovery:
– Issue of choosing a victim – determine which
transaction(s) among a set of deadlocked transactions to
roll back to break the deadlock.
– Issue of rollback operation – determine how far the chosen
victim transaction should be rolled back (total or
partial).
– Issue of starvation – avoid a situation where some
transaction may always be chosen as the victim due to
selections based on cost factors. This may prevent the
transaction from ever completing its job.
Recovery Aspects
• Both methods may result in transaction
rollback
• both methods require overheads
• prevention method is commonly used if the
probability of the system entering a
deadlock state is relatively high
• Otherwise detection and recovery method
should be used
Comparison
• Consider the following two transactions. If they are
executed as one of the two serial schedules T1, T2 or T2, T1
then serializability is guaranteed.
T1               T2
read_lock(Y)     read_lock(X)
READ(Y)          READ(X)
unlock(Y)        unlock(X)
write_lock(X)    write_lock(Y)
READ(X)          READ(Y)
X = X + Y        Y = X + Y
WRITE(X)         WRITE(Y)
unlock(X)        unlock(Y)
If the initial values are X=20, Y=30, then
T1, T2 would give X=50, Y=80 and
T2, T1 would give X=70, Y=50.
Checking for Serializability
• Assuming that no technique is used to guarantee
serializability (e.g. two-phase locking is not used),
if T1 and T2 are executed concurrently the schedule
will be serializable only if it gives the same result
as one of the above two serial schedules.
E.g., the following schedule is non-serializable.
T1               T2
read_lock(Y)
READ(Y)
unlock(Y)
                 read_lock(X)
                 READ(X)
                 unlock(X)
                 write_lock(Y)
                 READ(Y)
                 Y = X + Y
                 WRITE(Y)
                 unlock(Y)
write_lock(X)
READ(X)
X = X + Y
WRITE(X)
unlock(X)
It would give X=50, Y=50.
Checking for Serializability
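The results quoted above can be reproduced with a small schedule interpreter. This is a sketch under stated assumptions: lock and unlock steps are left out, each step is a hypothetical (transaction, operation, item) tuple, and the "ADD" operation stands for the assignment item = X + Y on the transaction's local copies. Comparing final values for one initial state is only the result check used on the slide, not a general serializability test.

```python
def run(schedule, x=20, y=30):
    """Execute a schedule of (txn, op, item) steps and return the final database."""
    db = {"X": x, "Y": y}
    local = {"T1": {}, "T2": {}}            # private program variables
    for txn, op, item in schedule:
        if op == "READ":
            local[txn][item] = db[item]
        elif op == "WRITE":
            db[item] = local[txn][item]
        elif op == "ADD":                   # item = X + Y, using local copies
            local[txn][item] = local[txn]["X"] + local[txn]["Y"]
    return db

T1 = [("T1", "READ", "Y"), ("T1", "READ", "X"), ("T1", "ADD", "X"), ("T1", "WRITE", "X")]
T2 = [("T2", "READ", "X"), ("T2", "READ", "Y"), ("T2", "ADD", "Y"), ("T2", "WRITE", "Y")]

serial_results = [run(T1 + T2), run(T2 + T1)]

# T1 reads Y, then all of T2 runs, then T1 finishes.
interleaved = [T1[0]] + T2 + T1[1:]
result = run(interleaved)
print(serial_results, result, result in serial_results)
```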
Timestamp Ordering
Another method for guaranteeing serializability. There are
no locks and hence no deadlocks.
The basic idea is that if a transaction A starts before transaction B,
then A should behave as if it completed in its entirety before B
started – i.e. as a serial schedule.
Transaction A is assigned a unique timestamp TS(A)
before it starts executing.
Next Transaction B is assigned TS(B) where TS(A) < TS(B)
WRITE-TS(X) denotes the largest timestamp of any
transaction that executed WRITE(X) successfully
READ-TS(X) denotes the largest timestamp of any
transaction that executed READ(X) successfully
Timestamp Ordering Protocol
Suppose transaction A issues READ(X)
• If TS(A) < WRITE-TS(X), then A needs to read a value of X that was
overwritten by another transaction, say B [A should never be allowed to
see B’s updates]. Hence Rollback A.
• If TS(A) ≥ WRITE-TS(X), then READ(X) is executed and READ-TS(X) = MAX{TS(A), READ-TS(X)}
Suppose transaction A issues WRITE(X)
• If TS(A) < READ-TS(X), then the value of X that A is producing was
needed previously, and the system assumed that it would never change [A
should never be allowed to update anything that B has already seen].
Hence Rollback A.
• If TS(A) < WRITE-TS(X), then A is attempting to write an obsolete value
of X [A should never be allowed to update anything that B has already
changed]. Hence Rollback A.
• Otherwise WRITE(X) is executed and WRITE-TS(X) = MAX{TS(A),
WRITE-TS(X)}
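The read and write rules above translate almost directly into code. In this minimal sketch the timestamps of items that have never been accessed are assumed to be 0, and the names Rollback, read_ts, write_ts, read and write are hypothetical; actual data values are not modelled.

```python
class Rollback(Exception):
    """Raised when the issuing transaction must be rolled back."""

read_ts, write_ts = {}, {}     # READ-TS(X) and WRITE-TS(X); missing entries count as 0

def read(ts, item):
    if ts < write_ts.get(item, 0):                   # a younger transaction already wrote item
        raise Rollback(f"TS={ts} < WRITE-TS({item})")
    read_ts[item] = max(ts, read_ts.get(item, 0))

def write(ts, item):
    if ts < read_ts.get(item, 0):                    # a younger transaction already read item
        raise Rollback(f"TS={ts} < READ-TS({item})")
    if ts < write_ts.get(item, 0):                   # the value being written is obsolete
        raise Rollback(f"TS={ts} < WRITE-TS({item})")
    write_ts[item] = max(ts, write_ts.get(item, 0))

# TS(T1) = 1, TS(T2) = 2: T1 reads X, T2 reads and writes Y, then T1 tries to read Y.
read(1, "X")
read(2, "Y")
write(2, "Y")
try:
    read(1, "Y")
    outcome = "T1 continues"
except Rollback:
    outcome = "T1 rolled back"
print(outcome, read_ts, write_ts)
```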
Timestamp Ordering Protocol
T1 (TS(T1)=1): READ(Y); READ(X); Z = X + Y
T2 (TS(T2)=2): READ(Y); Y = Y – 500; WRITE(Y); READ(X); X = X + 500; WRITE(X)
(READ-TS(X), WRITE-TS(X)): (0,0) → (1,0) → (2,0) → (2,2)
(READ-TS(Y), WRITE-TS(Y)): (0,0) → (1,0) → (2,0) → (2,2)
Both T1 and T2 complete successfully. The result is equivalent to the serial schedule T1, T2.
Timestamp Ordering Protocol
T1 (TS(T1)=1): READ(X); READ(Y); Z = X + Y
T2 (TS(T2)=2): READ(Y); Y = Y – 500; WRITE(Y); READ(X); X = X + 500; WRITE(X)
(READ-TS(X), WRITE-TS(X)): (0,0) → (1,0)
(READ-TS(Y), WRITE-TS(Y)): (0,0) → (2,0) → (2,2)
When T1 then issues READ(Y), TS(T1) = 1 < WRITE-TS(Y) = 2, so T1 is rolled back.
Timestamp Ordering Protocol
There are schedules that are possible under
timestamp ordering but not possible under two-phase locking.
There are schedules that are possible under
two-phase locking but not possible under
timestamp ordering.
Recovery from Failure
• Three types of failures: transaction, system and
media failure. Recovery allows a database system
to recover from physical or software failures when
they occur in the system.
• Transaction failure: a transaction fails after executing
some of its operations but before executing all of them.
• System failure, also called a soft crash: the contents of
volatile storage are lost (e.g. power
failure). This affects all transactions currently in
progress but does not cause damage to the
database.
Types of Failures
1. A computer failure (system crash)
– A hardware or software error occurs in the
computer system during transaction execution.
E.g. Hardware error, internal memory lost.
2. A transaction or system error
– Some operation in the transaction may cause
the failure. E.g. integer overflow, division by
zero, erroneous parameter values, logical
programming error. User may interrupt using
control-C.
Recovery from Failure
3. Local errors or exception conditions detected by
the transaction.
– During transaction execution, certain conditions
may occur that necessitate cancellation of the
transaction. Done using programmed ABORT.
E.g. data value not found, insufficient account
balance.
4. Concurrency control enforcement.
– Concurrency control method may decide to abort the
transaction (e.g. violates serializability) or to be
restarted later (e.g. several transactions are in a state
of deadlock).
Recovery from Failure
5. Disk failure
– Some disk blocks may lose their data (a read or
write malfunction, or a disk read/write head crash)
during a read or write operation of a transaction.
6. Physical problems and disasters
– Power or air-conditioning failure, fire, theft,
sabotage, overwriting disks or tapes by
mistake, mounting of a wrong tape.
Failure types 1-4 occur more commonly than
the types 5-6.
Recovery via Reprocessing
• Go back to a known point and reprocess the
workload – periodically make copies of the
database (save).
• Keep a record of all transactions since the copy.
• When failure occurs restore the database from the
save and reprocess all transactions.
• This strategy is often infeasible, as reprocessing takes the same amount
of time as the original processing (e.g. 24 hours).
• Also, it is impossible to guarantee the same order of
concurrent transactions.
Recovery via Rollback / Rollforward
• Save the results of transactions and, when a
failure occurs, recover
by removing changes (rollback) and then
reapplying changes (rollforward).
• Here a log is kept. The log contains a
record of data changes in chronological
order.
Recovery via Rollback / Rollforward
• At certain prescribed intervals (e.g. after a
specified number of entries have been
written to the log) the system automatically
takes a checkpoint:
– Physically write the contents of the database
buffers out to the physical database.
– Physically write a special checkpoint record
out to the physical log. This record gives a
list of all transactions that were in progress
at that time, e.g. T2 and T3.
Transactions
[Figure: Timeline of transactions T1 to T5 against time, with a checkpoint at time tc and a system failure at time tf.]
Recovery Process
• Recreate (or not destroy) the outputs of all
completed transactions.
• Abort all transactions in process at the time
of the failure.
• Remove database changes generated by
aborted transactions.
• Restart aborted transactions.
When system restarts after a failure
– Using the checkpoint record identify all
transactions that were in progress at that time.
UNDO={T2, T3}. Initial REDO list is empty.
REDO={}.
– Search forward through the log starting from
the checkpoint record.
– If a “start” log entry is found for transaction T,
add T to the UNDO list.
E.g. T4, T5. UNDO={T2, T3, T4, T5}.
– If a “commit” log entry is found for transaction T
move T from the UNDO list to the REDO list.
E.g. T2, T4. UNDO={T3, T5}, REDO={T2, T4}.
– When end of the log is reached, the UNDO and
REDO lists are identified.
– System now works backwards through the log,
undoing the transactions in the UNDO list and
then it works forward again redoing the
transactions in the REDO list.
i.e. rollback and rollforward.
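The construction of the UNDO and REDO lists can be sketched as follows. The order of the "start" and "commit" entries after the checkpoint is an assumption made for illustration (the slides only name the transactions), and the function name classify is hypothetical.

```python
def classify(active_at_checkpoint, log_after_checkpoint):
    """Scan the log forward from the checkpoint and build the UNDO and REDO lists."""
    undo = list(active_at_checkpoint)   # in progress when the checkpoint was taken
    redo = []
    for entry, txn in log_after_checkpoint:
        if entry == "start":
            undo.append(txn)            # started after the checkpoint
        elif entry == "commit":
            undo.remove(txn)            # committed before the failure
            redo.append(txn)
    return undo, redo

# Assumed log order after the checkpoint record.
log = [("start", "T4"), ("commit", "T2"), ("start", "T5"), ("commit", "T4")]
undo, redo = classify(["T2", "T3"], log)
print(undo, redo)
```

The recovery manager would then work backwards through the log undoing the transactions in undo, and forwards redoing those in redo.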
Recovery via Rollback / Rollforward
Possible data items of a log record: relative record no, transaction id,
reverse pointer, forward pointer, time, type of operation, object,
old value, new value.
Rec  Trans  Rev  Fwd  Time   Operation  Object    Old value  New value
1    OT1    0    2    11.42  START
2    OT1    1    4    11.43  MODIFY     CUST143   Old        New
3    OT2    0    8    11.46  START
4    OT1    2    5    11.47  MODIFY     SPAA      Old        New
5    OT1    4    7    11.47  INSERT     ORDER11              Value
6    CT1    0    9    11.48  START
7    OT1    5    0    11.49  COMMIT
8    OT2    3    0    11.50  COMMIT
9    CT1    6    10   11.51  MODIFY     SPBB      Old        New
10   CT1    9    0    11.51  COMMIT
Log instances for the OT1, OT2 and CT1 transactions. A write-ahead log is
maintained.
Recovery outline
• Recovery from transaction failures usually means
that the database is restored to some state from
the past so that a correct state – close to the time of
failure – can be reconstructed from the past state.
The system recovery activity is carried out as part of
the system’s restart procedure.
Three main techniques for recovery from failures:
deferred update, immediate update, shadow
paging
Deferred Update
• Do not update the database until after a
transaction reaches its commit point.
• Then updates are recorded in the database.
• If transaction fails to reach commit it will not
have changed the database in any way – no
need to undo the failed transactions.
Deferred Update
Transactions
READ(A)
A = A-50
WRITE(A)
READ(B)
B = B+50
WRITE(B)
READ(C)
C = C-100
WRITE(C)
Log
<T1 start>
<T1, A, 950>
<T1, B, 2050>
<T1 commit>
<T2 start>
<T2, C, 600>
<T2 commit>
Database
A=1000; B=2000; C=700
A=950; B=2050
C=600
T1
T2
A=1000; B=2000
A=950; B=2050; C=700
A=950; B=2050; C=600
Update database when <COMMIT>
If fails at             no REDO/UNDO required
REDO needed as some changes may not have been recorded
From what point to REDO?
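Recovery under deferred update only needs REDO, which can be sketched as below. The log is modelled as hypothetical tuples mirroring the entries above, for example ("write", "T1", "A", 950) for <T1, A, 950>, and the failure is assumed to happen before <T2 commit> is written.

```python
def redo_committed(db, log):
    """Reapply the new values written by committed transactions, in log order."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value        # redo is idempotent: it just sets the new value
    return db

# Log as it stands at the assumed failure point: T2 has not committed.
log = [("start", "T1"), ("write", "T1", "A", 950), ("write", "T1", "B", 2050),
       ("commit", "T1"), ("start", "T2"), ("write", "T2", "C", 600)]

db = redo_committed({"A": 1000, "B": 2000, "C": 700}, log)
print(db)
```

T2's write is simply ignored, since under deferred update it never reached the database and so needs no UNDO.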
Deferred Update with Checkpoint
Log
<T0 commit>
<T1 start>
<checkpoint T1>
<T1, A, 950>
<T1, B, 2050>
<T1 commit>
<T2 start>
<T2, C, 600>
<T2 commit>
<T3 start>
Database
A=1000; B=2000; C=700
A=950; B=2050
C=600
A=1000; B=2000
A=950; B=2050; C=700
A=950; B=2050; C=600
Update database when <CHECKPOINT>
If fails at             need to REDO/UNDO from CHECKPOINT
Immediate Update
• Database may be updated by some operations
of a transaction before the transaction reaches
its commit point.
• These operations are typically recorded in the
log on disk by force-write before they are
applied to the database.
• If a transaction fails the effect of its operations
must be undone.
Immediate Update
Log
<T1 start>
<T1, A, 1000, 950>
<T1, B, 2000, 2050>
<T1 commit>
<T2 start>
<T2, C, 700, 600>
<T2 commit>
Database
A=1000; B=2000; C=700
A=950
B=2050
C=600
A=1000; B=2000
A=950; B=2050; C=700
A=950; B=2050; C=600
Update database when <WRITE>
If fails at             need to UNDO, but how far?
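With immediate update the log holds both old and new values, so recovery can undo uncommitted writes and redo committed ones. In this sketch the log tuples are hypothetical, for example ("write", "T1", "A", 1000, 950) for <T1, A, 1000, 950>, and the failure is assumed to occur after T2's write reached the database but before <T2 commit>.

```python
def recover(db, log):
    """Undo writes of uncommitted transactions, then redo writes of committed ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                    # UNDO: scan the log backwards
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]                  # restore the old value
    for rec in log:                              # REDO: scan the log forwards
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]                  # reapply the new value
    return db

log = [("start", "T1"), ("write", "T1", "A", 1000, 950),
       ("write", "T1", "B", 2000, 2050), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "C", 700, 600)]

# Database contents at the assumed failure point: T2's uncommitted write is already applied.
db = recover({"A": 950, "B": 2050, "C": 600}, log)
print(db)
```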
Immediate Update with Checkpoint
Log
<T0 commit>
<T1 start>
<checkpoint T1>
<T1, A, 1000, 950>
<T1, B, 2000, 2050>
<T1 commit>
<T2 start>
<T2, C, 700, 600>
<T2 commit>
<T3 start>
Database
A=1000; B=2000; C=700
A=950
B=2050
C=600
A=1000; B=2000
A=950; B=2050; C=700
A=950; B=2050; C=600
Also Update database when <CHECKPOINT>
If fails at             need to REDO/UNDO
Shadow Paging
• The database management system keeps
more than one copy of a data item on disk.
• No need to undo a failed transaction, as the
original copy of the data is not lost or
changed.
[Figure: Before update, only the old copy of the db exists. After update, the old copy of the db (to be deleted) exists alongside the new copy of the db.]
Multi-version
• Reads are never delayed. Reads never delay updates.
– if T2 asks for Read(X) when T1 has write(X) then T2 is given
access to previously committed version of X;
– if T2 asks for Write(X) when T1 has Read(X) then T2 is given
access to X
• It is never necessary to rollback a read-only transaction
• Deadlock is possible only between update transactions
– If T2 asks for Write(X) when T1 has Write(X) then T2 goes to
wait state
1
© 2008, University of Colombo School of Computing 1
Transaction Management
Dr G.N.Wikramanayake
Dr Jeevani Goonetillake
University of Colombo School of Computing
© 2008, University of Colombo School of Computing 2
• Concurrency control deals with influencing
how data can be viewed and updated by
users accessing the same information at
one time.
• Concurrency control allows users to use the
database concurrently without damaging the
transactions of other users.
• It supports and ensures the availability and
correct operations of simultaneous multiple
access in the database system.
Concurrency Control
© 2008, University of Colombo School of Computing 3
• Single user – at most one user at a time
can use the system. Restricted to some PC
DBMS.
• Multi-user – many users can use the
system concurrently (at the same time).
Most DBMS are multi-user. Airline
reservations systems, banks, insurance
agencies, stock exchanges are multi-user
systems operated concurrently.
Concurrency Control
© 2008, University of Colombo School of Computing 4
Multiple users can use computer systems
simultaneously because of the concept of
multiprogramming.  When only one CPU, the
multiprogramming operating systems execute some
commands from one program, then suspend that
program and execute some commands from the
next program and so on. A program is resumed at
the point where it was suspended when it gets its
turn to use the CPU again. Hence, concurrent
execution of the program is actually  interleaved.
Simultaneous processing of multiple programs are
done with multiple CPUs.
Multiprogramming
© 2008, University of Colombo School of Computing 5
Interleaved model of concurrent
execution
A A
B B
t1 t2
Single CPU
↑ ↑time
A
B
t1 t3
Multiple CPUs
↑ ↑time
© 2008, University of Colombo School of Computing 6
The basic unit of data transfer from the disk to the
computer memory is one block. For discussion
purpose, consider transactions at the level of
data item (field of some record in the database)
and disk blocks. At this level the database
access operations that a transaction can include
are
• READ(X) – reads database item X into a
program variable X;
• WRITE(X) – write the value of program variable
X into the database item X.
Database Access Operations
2
© 2008, University of Colombo School of Computing 7
• Executing a READ(X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if not
already in some main memory buffer).
3. Copy item X from the buffer to the program variable
named X.
• Executing a WRITE(X)
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if not
already in some main memory buffer).
3. Copy item X from the program variable named X into its
correct location in the buffer.
4. Store the updated block from the buffer back to disk
(either immediately or at some later point of time)
SELECT labmark INTO old_mark FROM enrol
WHERE studno = sno and courseno = cno
FOR UPDATE OF labmark;
UPDATE enrol SET labmark = new_mark
WHERE studno = sno and courseno = cno;
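The READ(X) and WRITE(X) steps listed above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular DBMS is implemented: the names disk, block_of, buffers, dirty, fetch and flush are all hypothetical, and the "disk" is just a dictionary.

```python
# Hypothetical in-memory model of the disk, the block directory and the buffer pool.
disk = {0: {"X": 80, "Y": 100}}   # block 0 holds the data items X and Y
block_of = {"X": 0, "Y": 0}       # step 1: which disk block contains each item
buffers = {}                      # main-memory buffers, keyed by block id
dirty = set()                     # buffered blocks that have not been written back


def fetch(block_id):
    """Step 2: copy the block into a buffer unless it is already buffered."""
    if block_id not in buffers:
        buffers[block_id] = dict(disk[block_id])
    return buffers[block_id]


def read_item(name):
    """READ(X): copy item X from the buffer to a program variable."""
    return fetch(block_of[name])[name]


def write_item(name, value):
    """WRITE(X): copy the program variable into the buffered block."""
    fetch(block_of[name])[name] = value
    dirty.add(block_of[name])


def flush():
    """Step 4 of WRITE(X): store updated blocks back to disk (possibly later)."""
    for block_id in dirty:
        disk[block_id] = dict(buffers[block_id])
    dirty.clear()


x = read_item("X")
write_item("X", x - 5)
print(disk[0]["X"], buffers[0]["X"])  # the disk copy is stale until the flush
flush()
print(disk[0]["X"])
```

The gap between write_item and flush is the reason a recovery mechanism is needed: an update can exist only in a memory buffer when a failure occurs.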
• A transaction is an atomic unit of work that is either
completed in its entirety or not done at all. For recovery
purposes, the system needs to keep track of when the
transaction starts, terminates, and commits or aborts. The
recovery manager keeps track of:
BEGIN marks the beginning of transaction execution.
READ or WRITE are the read or write operations on the
database items that are executed.
END specifies that the READ and WRITE operations of the
transaction have ended and marks the end of
transaction execution.
Transaction States and additional operations
COMMIT signals a successful end of the transaction
so that any changes (updates) executed by the
transaction can be safely committed to the
database and will not be undone.
ROLLBACK (or ABORT) signals that the
transaction has ended unsuccessfully so that any
changes or effects that the transaction may have
applied to the database must be undone.
Transaction States and additional operations
Atomicity – A transaction is an atomic unit of processing.
It is either performed in its entirety or not performed at
all.
Consistency preservation – A correct execution of the
transaction must take the database from one
consistent state to another
Isolation  – A transaction should not make its updates
visible to other transactions until it is committed.
Durability or permanency – Once a transaction
changes the database and the changes are
committed, these changes must never be lost because
of subsequent failures.
Properties of Transactions
e.g. Transfer 50 from account A (A=1000) to B (B= 2000)
T1: BEGIN
READ(A);
A = A – 50;
WRITE(A);
READ(B);
B = B + 50;
WRITE(B);
END;
A=950; B=2050;
Transaction Properties …
Consistency – take the database from one consistent state to
another
The value of A+B (3000) should be the same before
the transaction and after the transaction
Atomicity – either performed in its entirety or not performed at all
If the transaction fails after WRITE(A) but before
WRITE(B), then A=950; B=2000; i.e. 50 is lost
Data is now inconsistent as A+B is now 2950
Durability – changes must never be lost because of subsequent
failures
Recover database: remove changes of a partially
done transaction (A=1000; B=2000); reconstruct
completed transactions (A=950; B=2050)
Isolation – updates not visible to other transactions until
committed
Between WRITE(A) and WRITE(B), if a second
transaction reads A and B it sees inconsistent data
as A+B = 2950
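The atomicity and consistency points can be illustrated with a small Python sketch of the transfer T1. It is a simplified model under stated assumptions: the function transfer and its fail_after_write_a flag are hypothetical, and atomicity is obtained here by working on a private copy and installing it only on success, which is just one possible way to get all-or-nothing behaviour.

```python
def transfer(db, amount, fail_after_write_a=False):
    """Run T1 on a private copy; install the copy only if T1 completes."""
    work = dict(db)                  # private workspace of the transaction
    work["A"] = work["A"] - amount   # READ(A); A = A - 50; WRITE(A)
    if fail_after_write_a:
        return db, "ROLLBACK"        # failure: the partial update is discarded
    work["B"] = work["B"] + amount   # READ(B); B = B + 50; WRITE(B)
    return work, "COMMIT"


db = {"A": 1000, "B": 2000}
after_ok, status_ok = transfer(db, 50)
after_fail, status_fail = transfer(db, 50, fail_after_write_a=True)

print(status_ok, after_ok, sum(after_ok.values()))
print(status_fail, after_fail, sum(after_fail.values()))
```

In both outcomes A+B stays at 3000; the inconsistent state A=950, B=2000 never becomes visible outside the transaction.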
E.g. Transaction T1 – The number of reservations for airline A is X and
the number of reservations for airline B is Y; N reservations on A are
cancelled and booked on B.
Transaction T2 – M reservations are added to airline A.
T1              T2
READ(X)         READ(X)
X = X – N       X = X + M
WRITE(X)        WRITE(X)
READ(Y)
Y = Y + N
WRITE(Y)
Problems with Concurrent Use
Several problems can occur when
concurrent transactions execute in an
uncontrolled manner.
This occurs when two transactions that access the same database item
have their operations interleaved in a way that makes the value of some
database item incorrect. Initially X=80; Y=100; N = 5, M = 4.
T1              T2              Value
READ(X)                         X = 80
X = X – N                       X = 75
                READ(X)         X = 80
                X = X + M       X = 84
WRITE(X)                        X = 75
READ(Y)
                WRITE(X)        X = 84
Y = Y + N                       Y = 105
WRITE(Y)
Result: X+Y = 84+105 = 189,
but X should be 80-5+4 = 79
1. The lost update problem
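The lost update can be reproduced with a small Python simulation. This is a sketch for illustration only: the run function and the operation strings are hypothetical, each transaction keeps its own program variables, and the interleaved order is the one shown in the lost update schedule.

```python
def run(schedule, x=80, y=100, n=5, m=4):
    """Execute (transaction, operation) pairs against a tiny database."""
    db = {"X": x, "Y": y}
    local = {"T1": {}, "T2": {}}      # program variables of each transaction
    for txn, op in schedule:
        v = local[txn]
        if op == "READ(X)":
            v["X"] = db["X"]
        elif op == "READ(Y)":
            v["Y"] = db["Y"]
        elif op == "X=X-N":
            v["X"] -= n
        elif op == "X=X+M":
            v["X"] += m
        elif op == "Y=Y+N":
            v["Y"] += n
        elif op == "WRITE(X)":
            db["X"] = v["X"]
        elif op == "WRITE(Y)":
            db["Y"] = v["Y"]
    return db


T1 = ["READ(X)", "X=X-N", "WRITE(X)", "READ(Y)", "Y=Y+N", "WRITE(Y)"]
T2 = ["READ(X)", "X=X+M", "WRITE(X)"]

serial = [("T1", op) for op in T1] + [("T2", op) for op in T2]
interleaved = [
    ("T1", "READ(X)"), ("T1", "X=X-N"),
    ("T2", "READ(X)"), ("T2", "X=X+M"),
    ("T1", "WRITE(X)"), ("T1", "READ(Y)"),
    ("T2", "WRITE(X)"),               # overwrites T1's update of X
    ("T1", "Y=Y+N"), ("T1", "WRITE(Y)"),
]

print("serial:     ", run(serial))
print("interleaved:", run(interleaved))
```

The serial run ends with X = 79, while the interleaved run ends with X = 84 because T2 wrote a value computed from the X it read before T1's write.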
This occurs when one transaction updates a database
item and then the transaction fails for some reason, after
another transaction has already read the updated item.
T1              T2              Value
READ(X)                         X = 80, N = 5, M = 4
X = X – N                       X = 75
WRITE(X)                        X = 75
                READ(X)         X = 75
                X = X + M       X = 79
                WRITE(X)        X = 79
READ(Y)
ROLLBACK
The abort changes X back to its original value, which gives X = 80,
but X should be 80+4 = 84
2. The temporary update (Dirty read) problem
If one transaction is calculating an aggregate summary function on a
number of records while other transactions are updating some of these
records, the aggregate function may calculate some values before they
are updated and others after they are updated.
T1              T2
                sum = 0
READ(X)
X = X – N
WRITE(X)
                READ(X)
                sum = sum + X
                READ(Y)
                sum = sum + Y   sum=X+Y=75+100=175
READ(Y)
Y = Y + N
WRITE(Y)
3. The incorrect summary problem
Another problem that may occur is the unrepeatable
read, where a transaction T2 reads an item twice
(e.g. X) and the item is changed by another
transaction (e.g. T1) between the two reads.
T1              T2              Value
                …..
                READ(X)         X=80
READ(X)
X = X – N       …..
WRITE(X)
                READ(X)         X=75
                ….
4. Unrepeatable Read problem
A set of rows that has been read once might be
different when it is read again, because a new record was inserted in between.
T1 T2
…..
SELECT X 3 records
INSERT(X)
…..
SELECT X 4 records
….
Phantom Phenomenon
• Concurrency control deals with influencing how
data can be viewed and updated by users
accessing the same information simultaneously.
Do you want one user to view/change an order
that is being changed/viewed by another user?
• There are two classes of concurrency control:
(i)  applies to read-only database access;
levels of isolation:  dirty read, committed read,
repeatable read
(ii) applies to updating database records: serializable
Concurrency Control
• Database server process reads from the
database table without checking for locks (let
this process look at dirty data). This can be
useful when the table is static; 100% accuracy
is not as important as speed and freedom from
contention; you cannot wait for locks to be
released.
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
READ UNCOMMITTED;
All are possible {dirty read, non-repeatable, phantom}
Dirty Reads
• Database server process reads rows from
the database after seeing that lock could be
acquired (do not let this process look at dirty
data). This can be useful for lookups;
queries; reports yielding general information
(e.g. month-ending sales analyses).
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
READ COMMITTED;
Dirty read not possible; non-repeatable & phantom possible
Committed Reads
• Database server process puts locks on all rows
examined to satisfy the query (do not let other
processes change any of the rows I have
looked at until I am done). It can be used for
critical, aggregate arithmetic (e.g. account
balancing); coordinated lookups from several
tables (e.g. reservation systems).
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
REPEATABLE READ;
Dirty read and non-repeatable not possible; phantom possible
Repeatable Reads
• Always guarantees correct execution of
transactions.
SQL Syntax:
SET TRANSACTION ISOLATION LEVEL
SERIALIZABLE;
Dirty read, non-repeatable read and phantom are all not possible
Serializable
• Concurrency control is enforced using
locking: database level; table level; page
level; row level; key level
• Database Level Locking: Other users
cannot access the database, e.g. DATABASE stores
EXCLUSIVE. It can be used when executing a
large number of updates involving many
tables; archiving the database files for
backups; altering the structure of the
database.
Locking
• Other users cannot modify the table. It can
be used to: avoid conflict with other users
during batch operations that affect most or
all of the rows of a table; avoid running out
of locks when running an operation as a
transaction; prevent users from updating a
table for a period of time; prevent access to
a table while altering its structure or creating
indexes.
Table Level Locking
• Table Level Locking in Share Mode: Others
may SELECT from the table.
SQL Syntax:
LOCK TABLE table_name IN SHARE MODE
• Table Level Locking in Exclusive Mode:
Others may not SELECT from the table.
SQL Syntax:
LOCK TABLE table_name IN EXCLUSIVE
MODE
Table Level Locking
• Unlocking a Table:
SQL Syntax: UNLOCK TABLE table_name
• Setting the Lock Mode:
• Wait forever for the lock to be released.
SQL Syntax: SET LOCK MODE TO WAIT
• Do not wait for lock to be released.
SQL Syntax: SET LOCK MODE TO NOT WAIT
• Wait 20 seconds for lock to be released.
SQL Syntax: SET LOCK MODE TO WAIT 20
Lock/Unlock
Schedule (History) – A schedule S of n transactions T1, T2,
…, Tn is an ordering of the operations of the transactions,
subject to the constraint that the operations of each Ti in S must
appear in the same order in which they occur in Ti.
Serial Schedules – For every transaction T participating in
the schedule, all the operations of T are executed
consecutively in the schedule. Otherwise the schedule is
called non-serial. Serial schedules are always correct.
Serializable – A non-serial schedule is serializable if it is
equivalent to one of the serial schedules; otherwise it is
non-serializable. The non-serial schedules thus form two
disjoint groups.
Serializability of Schedules
• Protocols, or sets of rules, are used to guarantee
serializability:
• locking data items to prevent multiple
transactions from accessing the items
concurrently.
• timestamps, where a unique identifier for each
transaction is generated by the system.
[immediate update]
• multi-version, where multiple versions of a
data item are used. [shadow paging]
Guaranteeing Serializability
• Two types of locks:
– Binary – can have two states or values, Locked
and unlocked;
– Shared and Exclusive locks – read_locked item
is called shared locked; write_locked item is
called exclusive locked.
Locking Techniques
• Guaranteeing Serializability by Two-phase
locking
• A transaction follows the two-phase locking protocol
if all its locking operations precede its first
unlock operation. Such a transaction can be divided into 2 phases:
• Expanding or growing phase, where new
locks on items can be acquired but none can
be released, and Shrinking phase, where
existing locks can be released but no new
locks can be acquired.
Two-phase locking
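Whether a single transaction obeys the two-phase rule can be checked mechanically. The sketch below is illustrative only: is_two_phase is a hypothetical helper, operations are plain strings, and the two example transactions are made up for the demonstration.

```python
def is_two_phase(ops):
    """Return True if no lock is acquired after the first unlock."""
    shrinking = False                     # becomes True at the first unlock
    for op in ops:
        if op.startswith("unlock"):
            shrinking = True
        elif op.startswith(("read_lock", "write_lock")) and shrinking:
            return False                  # lock requested in the shrinking phase
    return True


# Releases Y before locking X, so it is not two-phase.
not_two_phase = ["read_lock(Y)", "READ(Y)", "unlock(Y)",
                 "write_lock(X)", "READ(X)", "X = X + Y", "WRITE(X)", "unlock(X)"]

# Same work, but every lock is acquired before the first unlock.
two_phase = ["read_lock(Y)", "READ(Y)", "write_lock(X)", "unlock(Y)",
             "READ(X)", "X = X + Y", "WRITE(X)", "unlock(X)"]

print(is_two_phase(not_two_phase), is_two_phase(two_phase))
```

The second transaction holds its lock on Y slightly longer, which is the price paid for the serializability guarantee.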
• If every transaction in a schedule follows the two-phase locking protocol the schedule is guaranteed
to be serializable, eliminating the need to test for
serializability of schedules any more.
• Locking can be used to solve the concurrency control
problems, but it can also lead to the problem of
deadlock.
• Deadlock – occurs when two or more
transactions are simultaneously in a wait state, each
of them waiting for one of the others to release a lock before it
can proceed.
Two-phase locking
T1                      T2
read_lock(Y)
READ(Y)
                        read_lock(X)
                        READ(X)
…..                     …..
write_lock(X) wait
                        write_lock(Y) wait
Deadlock
• Two main methods for dealing with the deadlock
problem:  deadlock prevention and  deadlock
detection & recovery.
Deadlock Prevention method
• Uses deadlock prevention protocol to ensure
that the system will never enter a deadlock
state.
– Each transaction locks all its data before it
begins execution.
– Either all requested data items are locked in one
step or none are locked.
Deadlock Handling
Disadvantages:
• low data utilisation: many data items may be
locked but unused for a long period of time
• possible starvation: a transaction which
requires a number of data items for its
operation may find itself in an indefinite wait
state while at least one of the data items is
always locked by some other transaction.
Deadlock Prevention
Allows the system to  enter a deadlock state,
but examines the state of the system
periodically to detect whether a deadlock
has occurred.
If it has, the system attempts to recover from
the deadlock.
Deadlock Detection
• Keep information about the current allocation of locks on
data items to different transactions, as well as
any outstanding locking requests for data items.
• Invoke an algorithm which uses this information
to determine whether the system has entered a
deadlock state. A typical technique is to use
the Wait-for-Graph (WFG) and periodically
invoke an algorithm to search for cycles in the
graph. Each transaction involved in a cycle is
said to be deadlocked.
Deadlock Detection Process
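A cycle search on a wait-for graph can be sketched as follows. This is a minimal illustration rather than a production detector: the graph is a hypothetical dictionary mapping each transaction to the set of transactions it is waiting for, and find_deadlocked is an assumed helper name.

```python
def find_deadlocked(wait_for):
    """Return the transactions that lie on some cycle of the wait-for graph."""
    deadlocked = set()
    for start in wait_for:
        stack = list(wait_for.get(start, ()))
        seen = set()
        while stack:
            t = stack.pop()
            if t == start:                # we came back to the start: a cycle
                deadlocked.add(start)
                break
            if t not in seen:
                seen.add(t)
                stack.extend(wait_for.get(t, ()))
    return deadlocked


# T1 waits for T2 and T2 waits for T1 (a cycle); T3 only waits for T1.
wait_for = {"T1": {"T2"}, "T2": {"T1"}, "T3": {"T1"}}
print(sorted(find_deadlocked(wait_for)))
```

T3 is blocked but not deadlocked: once the cycle between T1 and T2 is broken by rolling back a victim, T3 can eventually proceed.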
• The most common solution is to roll back one or more
transactions so that the deadlock can be broken.
• Three issues are involved in deadlock recovery:
– Issue of choosing a victim – determine which
transaction(s) among a set of deadlocked transactions to
roll back to break the deadlock.
– Issue of rollback operation – determine how far the chosen
victim transaction should be rolled back (totally or
partially).
– Issue of starvation – avoid a situation where some
transaction may always be chosen as the victim due to
selections based on cost factors. This may prevent the
transaction from ever completing its job.
Recovery Aspects
• Both methods may result in transaction
rollback
• both methods require overheads
• prevention method is commonly used if the
probability of the system entering a
deadlock state is relatively high
• Otherwise detection and recovery method
should be used
Comparison
• Consider the following two transactions. If they are
executed as one of the two serial schedules T1, T2 or T2, T1,
then serializability is guaranteed.
Checking for Serializability
T1              T2
read_lock(Y)    read_lock(X)
READ(Y)         READ(X)
unlock(Y)       unlock(X)
write_lock(X)   write_lock(Y)
READ(X)         READ(Y)
X = X + Y       Y = X + Y
WRITE(X)        WRITE(Y)
unlock(X)       unlock(Y)
If the initial values are X=20, Y=30, then
T1, T2 would give X=50, Y=80 and
T2, T1 would give X=70, Y=50.
• Assume that no technique is used to guarantee
serializability (e.g. two-phase locking is
not used). If T1 and T2 are executed
concurrently, the schedule will be
serializable only if it gives the same result as one
of the above two serial schedules.
Checking for Serializability
E.g., the following schedule is non-serializable.
T1                      T2
read_lock(Y)
READ(Y)
unlock(Y)
                        read_lock(X)
                        READ(X)
                        unlock(X)
                        write_lock(Y)
                        READ(Y)
                        Y = X + Y
                        WRITE(Y)
                        unlock(Y)
write_lock(X)
READ(X)
X = X + Y
WRITE(X)
unlock(X)
This schedule would give X=50, Y=50.
Timestamp Ordering
Another method for guaranteeing serializability. No locks
are used, so there is no deadlock.
The basic idea is that if a transaction A starts before transaction B,
then A should behave as if it completed in its entirety before B
started – i.e. as in a serial schedule.
Transaction A is assigned a unique timestamp TS(A)
before it starts executing.
The next transaction B is assigned TS(B), where TS(A) < TS(B).
WRITE-TS(X) denotes the largest timestamp of any
transaction that executed WRITE(X) successfully
READ-TS(X) denotes the largest timestamp of any
transaction that executed READ(X) successfully
Timestamp Ordering Protocol
Suppose transaction A issues READ(X)
• If TS(A) < WRITE-TS(X), then A needs to read a value of X that was
already overwritten by another transaction, say B [A should never be allowed to
see B’s updates]. Hence Rollback A.
• If TS(A) ≥ WRITE-TS(X), then READ(X) is executed and
READ-TS(X) = MAX{TS(A), READ-TS(X)}
Suppose transaction A issues WRITE(X)
• If TS(A) < READ-TS(X), then the value of X that A is producing was
needed previously, and the system assumed that it would never change [A
should never be allowed to update anything that B has already seen].
Hence Rollback A.
• If TS(A) < WRITE-TS(X), then A is attempting to write an obsolete value
of X [A should never be allowed to update anything that B has already
changed]. Hence Rollback A.
• Otherwise WRITE(X) is executed and WRITE-TS(X) = MAX{TS(A),
WRITE-TS(X)}
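The read and write rules translate almost directly into code. The following Python sketch is an illustration only: the class TimestampOrdering and its method names are hypothetical, timestamps default to 0 for items never accessed, and the example interleaving (T1 with timestamp 1, T2 with timestamp 2) is an assumption chosen to trigger a rollback.

```python
class TimestampOrdering:
    """Minimal model of the basic timestamp ordering checks."""

    def __init__(self):
        self.read_ts = {}    # READ-TS(X): largest timestamp that read X
        self.write_ts = {}   # WRITE-TS(X): largest timestamp that wrote X

    def read(self, ts, item):
        if ts < self.write_ts.get(item, 0):
            return "ROLLBACK"             # a younger transaction already wrote X
        self.read_ts[item] = max(ts, self.read_ts.get(item, 0))
        return "OK"

    def write(self, ts, item):
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "ROLLBACK"             # a younger transaction read or wrote X
        self.write_ts[item] = max(ts, self.write_ts.get(item, 0))
        return "OK"


to = TimestampOrdering()
results = [
    to.read(1, "X"),    # T1: READ(X)
    to.read(2, "Y"),    # T2: READ(Y)
    to.write(2, "Y"),   # T2: WRITE(Y)
    to.read(1, "Y"),    # T1: READ(Y), but TS(T1)=1 < WRITE-TS(Y)=2
]
print(results)
print(to.read_ts, to.write_ts)
```

A rolled-back transaction would normally be restarted with a new, larger timestamp; that step is left out of the sketch.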
Timestamp Ordering Protocol
TS(T1)=1, TS(T2)=2
T1: READ(Y); READ(X); Z = X + Y
T2: READ(Y); Y = Y – 500; WRITE(Y); READ(X); X = X + 500; WRITE(X)
(READ-TS(X), WRITE-TS(X)): (0,0) → (1,0) → (2,0) → (2,2)
(READ-TS(Y), WRITE-TS(Y)): (0,0) → (1,0) → (2,0) → (2,2)
Both T1 and T2 complete successfully. The result is equivalent to the serial schedule T1, T2.
Timestamp Ordering Protocol
TS(T1)=1, TS(T2)=2
T1: READ(X); READ(Y); Z = X + Y
T2: READ(Y); Y = Y – 500; WRITE(Y); READ(X); X = X + 500; WRITE(X)
(READ-TS(X), WRITE-TS(X)): (0,0) → (1,0)
(READ-TS(Y), WRITE-TS(Y)): (0,0) → (2,0) → (2,2)
1 < 2: Rollback
T1 is rolled back.
Timestamp Ordering Protocol
There are schedules that are possible under
timestamp but not possible under two-phase locking
There are schedules that are possible under
two-phase locking but not possible under
timestamp
Recovery from Failure
• Three types of failures:  transaction, system and
media failure. Recovery allows a database system
to recover from physical or software failures when
they occur in the system.
• Transaction failure: a transaction fails after executing some of its
operations but before executing all of them.
• System failure, also called a soft crash: the contents of
volatile storage are lost (e.g. power
failure). This affects all transactions currently in
progress but does not cause damage to the
database.
Types of Failures
1. A computer failure (system crash)
– A hardware or software error occurs in the
computer system during transaction execution.
E.g. Hardware error, internal memory lost.
2. A transaction or system error
– Some operation in the transaction may cause
the failure. E.g. integer overflow, division by
zero, erroneous parameter values, logical
programming error. User may interrupt using
control-C.
Recovery from Failure
3. Local errors or exception conditions detected by
the transaction.
– During transaction execution, certain conditions
may occur that necessitate cancellation of the
transaction. This is done using a programmed ABORT.
E.g. data value not found, insufficient account
balance.
4. Concurrency control enforcement.
– Concurrency control method may decide to abort the
transaction (e.g. violates serializability) or to be
restarted later (e.g. several transactions are in a state
of deadlock).
Recovery from Failure
5. Disk failure
– Some disk blocks may lose their data (because of a read or
write malfunction or a disk read/write head crash)
during a read or write operation of a transaction.
6. Physical problems and disasters
– Power or air-conditioning failure, fire, theft,
sabotages, overwriting disks or tapes by
mistake, mounting of a wrong tape.
Failure types 1-4 occur more commonly than
the types 5-6.
Recovery via Reprocessing
• Go back to a known point and reprocess the
workload – periodically make copies of the
database (save).
• Keep a record of all transactions since the copy.
• When failure occurs restore the database from the
save and reprocess all transactions.
• This strategy is often infeasible, as reprocessing takes the same amount
of time as the original processing (e.g. 24 hours).
• Also, it is impossible to guarantee the same order of
concurrent transactions.
Recovery via Rollback / Rollforward
• Save the results of transactions and, when a
failure occurs, recover
by removing changes (rollback) and then
reapplying changes (rollforward).
• Here a log is kept. The log contains a
record of data changes in chronological
order.
Recovery via Rollback / Rollforward
• At certain prescribed intervals, e.g. after a
specified number of entries have been
written to the log, the system automatically
takes a checkpoint:
– Physically write the contents of the database
buffers out to the physical database.
– Physically write a special checkpoint record
out to the physical log. This record gives a
list of all transactions that were in progress
at that time, e.g. T2 and T3.
[Figure: timeline of transactions T1 to T5 against time, with the checkpoint at time tc and the system failure at time tf.]
Recovery Process
• Recreate (or not destroy) the outputs of all
completed transactions.
• Abort all transactions in process at the time
of the failure.
• Remove database changes generated by
aborted transactions.
• Restart aborted transactions.
When system restarts after a failure
– Using the checkpoint record identify all
transactions that were in progress at that time.
UNDO={T2, T3}. Initial REDO list is empty.
REDO={}.
– Search forward through the log starting from
the checkpoint record.
– If a “start” log entry is found for transaction T,
add T to the UNDO list.
E.g. T4, T5. UNDO={T2, T3, T4, T5}.
– If a “commit” log entry is found for transaction T
move T from the UNDO list to the REDO list.
E.g. T2, T4. UNDO={T3, T5}, REDO={T2, T4}.
– When end of the log is reached, the UNDO and
REDO lists are identified.
– System now works backwards through the log,
undoing the transactions in the UNDO list and
then it works forward again redoing the
transactions in the REDO list.
i.e. rollback and rollforward.
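The construction of the UNDO and REDO lists can be sketched in Python. This is an illustrative model: the function classify and the tuple-based log format are hypothetical, and the relative order of the log entries after the checkpoint is assumed, since only the resulting lists are given.

```python
def classify(active_at_checkpoint, log_after_checkpoint):
    """Build the UNDO and REDO lists by scanning forward from the checkpoint."""
    undo = list(active_at_checkpoint)     # in progress when the checkpoint was taken
    redo = []
    for entry, txn in log_after_checkpoint:
        if entry == "start":
            undo.append(txn)              # started after the checkpoint
        elif entry == "commit":
            undo.remove(txn)              # committed before the failure
            redo.append(txn)
    return undo, redo


# Assumed order of log entries between the checkpoint and the failure.
log = [("start", "T4"), ("commit", "T2"), ("start", "T5"), ("commit", "T4")]
undo, redo = classify(["T2", "T3"], log)
print("UNDO =", undo, "REDO =", redo)
```

With these lists in hand, the system walks the log backwards for the UNDO transactions and then forwards for the REDO transactions.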
Recovery via Rollback / Rollforward
Possible data items of a log record: relative record no, transaction id,
reverse pointer, forward pointer, time, type of operation, object,
old value, new value.
Rec  Trans  Rev  Fwd  Time   Operation  Object   Old value  New value
1    OT1    0    2    11.42  START
2    OT1    1    4    11.43  MODIFY     CUST143  Old        New
3    OT2    0    8    11.46  START
4    OT1    2    5    11.47  MODIFY     SPAA     Old        New
5    OT1    4    7    11.47  INSERT     ORDER11             Value
6    CT1    0    9    11.48  START
7    OT1    5    0    11.49  COMMIT
8    OT2    3    0    11.50  COMMIT
9    CT1    6    10   11.51  MODIFY     SPBB     Old        New
10   CT1    9    0    11.51  COMMIT
Log instances for the transactions OT1, OT2 and CT1. A write-ahead log is
maintained.
Recovery outline
• Recovery from transaction failures usually means
that the database is restored to some state from
the past so that correct state – close to the time of
failure – can be reconstructed from the past state.
The system recovery activity is carried out as part of
the system’s restart procedure.
Three main techniques for recovery from failures:
deferred update, immediate update, shadow
paging
Deferred Update
• Do not update the database until after a
transaction reaches its commit point.
• Then updates are recorded in the database.
• If transaction fails to reach commit it will not
have changed the database in any way – no
need to undo the failed transactions.
Before update After update
Deferred Update
Transactions
T1: READ(A); A = A-50; WRITE(A); READ(B); B = B+50; WRITE(B)
T2: READ(C); C = C-100; WRITE(C)
Initial database: A=1000; B=2000; C=700
Log                 Database
<T1 start>
<T1, A, 950>
<T1, B, 2050>
<T1 commit>         A=950; B=2050
<T2 start>
<T2, C, 600>
<T2 commit>         C=600
Database states: A=1000; B=2000, then A=950; B=2050; C=700, then A=950; B=2050; C=600
Update database when <COMMIT>
If the failure occurs before the commit, no REDO/UNDO is required.
After the commit, REDO is needed as some changes may not have been recorded.
From what point to REDO?
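Recovery under deferred update only ever needs REDO, which the following Python sketch illustrates. It is a simplified model: recover_deferred and the tuple-based log records are hypothetical, and the log is assumed to end just before T2 commits.

```python
def recover_deferred(db, log):
    """Redo the writes of committed transactions; ignore everything else."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, new_value = rec
            db[item] = new_value          # redo is safe to repeat (idempotent)
    return db


# Log at the time of failure: T1 committed, T2 did not reach its commit point.
log = [
    ("start", "T1"),
    ("write", "T1", "A", 950),
    ("write", "T1", "B", 2050),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "C", 600),
]
db = recover_deferred({"A": 1000, "B": 2000, "C": 700}, log)
print(db)
```

T2's write of C is simply ignored, because under deferred update it never reached the database; this sketch redoes from the start of the log, which is the cost that a checkpoint reduces.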
Deferred Update with Checkpoint
Initial database: A=1000; B=2000; C=700
Log                 Database
<T0 commit>
<T1 start>
<checkpoint T1>
<T1, A, 950>
<T1, B, 2050>
<T1 commit>         A=950; B=2050
<T2 start>
<T2, C, 600>
<T2 commit>         C=600
<T3 start>
Database states: A=1000; B=2000, then A=950; B=2050; C=700, then A=950; B=2050; C=600
Update database when <CHECKPOINT>
If the system fails, REDO/UNDO is needed only from the CHECKPOINT.
Immediate Update
• Database may be updated by some operations
of a transaction before the transaction reaches
its commit point.
• These operations are typically recorded in the
log on disk by force-write before they are
applied to the database.
• If a transaction fails the effect of its operations
must be undone.
Before update After update
Immediate Update
Initial database: A=1000; B=2000; C=700
Log                     Database
<T1 start>
<T1, A, 1000, 950>      A=950
<T1, B, 2000, 2050>     B=2050
<T1 commit>
<T2 start>
<T2, C, 700, 600>       C=600
<T2 commit>
Database states: A=1000; B=2000, then A=950; B=2050; C=700, then A=950; B=2050; C=600
Update database when <WRITE>
If the system fails before a transaction commits, UNDO is needed, but how far back?
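With immediate update the log records carry both the old and the new value, so recovery can undo and redo. The Python sketch below is illustrative only: recover_immediate and the record layout (type, transaction, item, old value, new value) are hypothetical, and the database state at the time of failure is an assumed example in which T1's write of B had not yet reached disk.

```python
def recover_immediate(db, log):
    """Undo uncommitted writes (backwards), then redo committed writes (forwards)."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                      # UNDO pass
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old_value, new_value = rec
            db[item] = old_value
    for rec in log:                                # REDO pass
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, old_value, new_value = rec
            db[item] = new_value
    return db


log = [
    ("start", "T1"),
    ("write", "T1", "A", 1000, 950),
    ("write", "T1", "B", 2000, 2050),
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "C", 700, 600),    # T2 had not committed at the failure
]
# Assumed state on disk at the failure: A and C written, B still the old value.
db = recover_immediate({"A": 950, "B": 2000, "C": 600}, log)
print(db)
```

The undo pass restores C to 700 and the redo pass brings B up to 2050, so the database reflects exactly the committed transaction T1.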
Immediate Update with Checkpoint
Initial database: A=1000; B=2000; C=700
Log                     Database
<T0 commit>
<T1 start>
<checkpoint T1>
<T1, A, 1000, 950>      A=950
<T1, B, 2000, 2050>     B=2050
<T1 commit>
<T2 start>
<T2, C, 700, 600>       C=600
<T2 commit>
<T3 start>
Database states: A=1000; B=2000, then A=950; B=2050; C=700, then A=950; B=2050; C=600
Also update database when <CHECKPOINT>
If the system fails, both REDO and UNDO are needed.
Shadow Paging
• The database management system keeps
more than one copy of a data item on disk.
• No need to undo a failed transaction, as the
original copy of the data is not lost or
changed.
Before update After update
Old copy of the db Old copy of the db
(to be deleted) New copy of the db
Multi-version
• Reads are never delayed. Reads never delay updates.
– if T2 asks for Read(X) when T1 has write(X) then T2 is given
access to previously committed version of X;
– if T2 asks for Write(X) when T1 has Read(X) then T2 is given
access to X
• It is never necessary to rollback a read-only transaction
• Deadlock is possible only between update transactions
– If T2 asks for Write(X) when T1 has Write(X) then T2 goes to
wait state
Transaction Management
© 2010, University of Colombo School of Computing 1
Dr. Jeevani Goonetillake
Single User Vs Multiuser
Systems
• Single user - at most one user at a time can use
the system. Restricted to some PC DBMS.
• Multi-user - many users can use the system
concurrently (at the same time). Most DBMS are
multi-user. E.g. airline reservation systems,
banks, insurance agencies and stock exchanges are
multi-user systems operated concurrently.
MultiProgramming
• Multiple users can use computer systems
simultaneously because of the concept of
multiprogramming.
• When there is only one CPU, the multiprogramming
operating system
– executes some commands from one program,
– then suspends that program and executes some
commands from the next program, and so on. A program
is resumed at the point where it was suspended when it
gets its turn to use the CPU again.
Interleaved Processing Vs
Parallel Processing
Hence, concurrent execution of the programs is
actually interleaved. Simultaneous processing of
multiple programs is done with multiple CPUs.
Transaction Support
Transaction
Action, or series of actions, carried out by
user or application, which accesses or
changes contents of database.
• Logical unit of work on the database.
• Transforms database from one
consistent state to another, although
consistency may be violated during
transaction.
Example Transaction
• E.g. Transaction T1 - The number of reservations for airline
A is X and the number of reservations for airline B is Y; N
reservations on A are cancelled and booked on B.
• Transaction T2 - M reservations are added to airline A.
T1  T2
read_item(X)  read_item(X)
X=X-N  X=X+M
write_item(X)  write_item(X)
read_item(Y)
Y=Y+N
write_item(Y)
Transaction Support
• Can have one of two outcomes:
– Success – transaction commits and
database reaches a new consistent state.
– Failure – transaction aborts, and database
must be restored to consistent state before it
started.
– Such a transaction is rolled back or undone.
Properties of Transactions
•Four basic (ACID) properties of a transaction are:
Atomicity ‘All or nothing’ property.
Consistency Must transform database from one
consistent state to another.
Isolation Partial effects of incomplete transactions
should not be visible to other transactions.
Durability Effects of a committed transaction are
permanent and must not be lost because of
later failure.
Transaction Support
• For recovery purpose, the system needs to
keep track of when the transaction starts,
terminates and commits or aborts. The
recovery manager keeps track of:
– BEGIN_TRANSACTION marks the beginning of
transaction execution
– READ or WRITE operations on the database items
that are executed.
– END_TRANSACTION specifies that READ and
WRITE transaction operations have ended and
mark the end of transaction execution.
Transaction Support
– COMMIT_TRANSACTION signals a successful end
of the transaction so that any changes (updates)
executed by the transaction can be safely
committed to the database and will not be undone.
– ROLLBACK (or ABORT) signals that the transaction
has ended unsuccessfully so that any changes or
effects that the transaction may have applied to the
database must be undone.
Concurrency Control
Process of managing simultaneous operations
on the database without having them interfere
with one another.
• Prevents interference when two or more users
are accessing database simultaneously and at
least one is updating data.
• Although two transactions may be correct in
themselves, interleaving of operations may
produce an incorrect result.
Concurrency Control
• Three examples of potential problems
caused by concurrency:
– Lost update problem.
– Uncommitted dependency problem.
– Inconsistent analysis problem.
Lost Update Problem
• This occurs when two transactions that
access the same database item have their
operations interleaved in a way that makes
the value of some database item incorrect.
E.g. Originally there were 80 reservations
on the flight.
– T1 transfers 5 seat reservations from the flight
corresponding to X to the flight corresponding
to Y.
– T2 reserves 4 seats on X.
– Serially, final result of X should be 79.
Lost Update Problem
T1                T2                Comment
read_item(X)                        X = 80, N = 5, M = 4
X = X - N                           X = 75
                  read_item(X)      X = 80
                  X = X + M         X = 84
write_item(X)
read_item(Y)
                  write_item(X)
Y = Y + N
write_item(Y)
Result: gives X = 84, but should be 80 - 5 + 4 = 79
• The update in T1 that removed the five seats
from X was lost.
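The interleaving above can be replayed step by step. The following is a minimal Python sketch, not part of the original slides: the dictionary `db`, the function names, and the initial value Y = 20 are assumptions made only for illustration.

```python
# Replay of the lost-update interleaving using private per-transaction copies.
# The initial value of Y (20) is an assumed figure; the slides do not give one.
def lost_update_schedule():
    db = {"X": 80, "Y": 20}
    N, M = 5, 4

    t1_x = db["X"]      # T1: read_item(X)  -> 80
    t1_x = t1_x - N     # T1: X = X - N     -> 75
    t2_x = db["X"]      # T2: read_item(X)  -> 80 (T1 has not written yet)
    t2_x = t2_x + M     # T2: X = X + M     -> 84
    db["X"] = t1_x      # T1: write_item(X) -> 75
    t1_y = db["Y"]      # T1: read_item(Y)
    db["X"] = t2_x      # T2: write_item(X) overwrites T1's update
    t1_y = t1_y + N     # T1: Y = Y + N
    db["Y"] = t1_y      # T1: write_item(Y)
    return db


def serial_schedule():
    db = {"X": 80, "Y": 20}
    N, M = 5, 4
    db["X"] -= N        # T1 runs to completion first
    db["Y"] += N
    db["X"] += M        # then T2
    return db
```

Under these assumptions the interleaved run leaves X at 84, while the serial run leaves X at 79, which is the value the example expects.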
Uncommitted Dependency Problem
• Occurs when one transaction can see
intermediate results of another transaction
before it has committed.
• T1 cancels 5 seat reservations updating X to
75. Later T1 aborts, so X should be back at
original value of 80.
• T2 has read new value of X (75) and reserves
4 seats, giving X = 79, instead of 84.
Uncommitted Dependency Problem
T1                T2                Comment
read_item(X)                        X = 80, N = 5, M = 4
X = X - N                           X = 75
write_item(X)
                  read_item(X)      X = 75
                  X = X + M         X = 79
                  write_item(X)
read_item(Y)
abort                               changes X back to its original value,
                                    giving X = 80, but should be 80 + 4 = 84
Uncommitted Dependency Problem
• T2 reads the ‘temporary’ value of X, which will
not be recorded permanently in the database
because of the failure of T1.
• The value of item X that is read by T2 is called
dirty data, because it has been created by a
transaction that has not completed and
committed yet. Hence this problem is also
known as the dirty read problem.
Uncommitted Dependency Problem
• This problem can be avoided by
preventing T2 from reading X until after
T1 commits or aborts.
Inconsistent Analysis Problem
• If one transaction is calculating an
aggregate summary function on a number
of records while other transactions are
updating some of these records, the
aggregate function may calculate some
values before they are updated and
others after they are updated.
Inconsistent Analysis Problem
T1                T2
                  sum = 0
read_item(X)
X = X - N
write_item(X)
                  read_item(X)
                  sum = sum + X
                  read_item(Y)
                  sum = sum + Y
read_item(Y)
Y = Y + M
write_item(Y)
• Problem avoided by preventing T2 from reading X and
Y until after T1 completed updates.
Serializability
• Objective of a concurrency control protocol is
to schedule transactions in such a way as to
avoid any interference.
• Could run transactions serially, but this limits
degree of concurrency or parallelism in
system.
• Serializability identifies those executions of
transactions guaranteed to ensure
consistency.
Serializability
Schedule
Sequence of reads/writes by set of
concurrent transactions.
Serial Schedule
Schedule where operations of each
transaction are executed consecutively
without any interleaved operations from other
transactions.
• No guarantee that results of all serial
executions of a given set of transactions will be
identical.
Nonserial Schedule
• Schedule where operations from set of
concurrent transactions are interleaved.
• Objective of serializability is to find nonserial
schedules that allow transactions to execute
concurrently without interfering with one another.
• In other words, want to find nonserial schedules
that are equivalent to some serial schedule. Such
a schedule is called serializable.
Equivalence of Schedules
• Two schedules are called result equivalent if
they produce the same final state of the
database.
Two schedules can accidentally produce the
same final database.
• For two schedules to be equivalent the
operations applied to each data item affected
by the schedules should be applied to that
item in both schedules in the same order.
Equivalence of Schedules
• In serializability, ordering of read/writes is
important:
(a) If two transactions only read a data item,
they do not conflict and order is not important.
(b) If two transactions either read or write
completely separate data items, they do not
conflict and order is not important.
(c) If one transaction writes a data item and
another reads or writes same data item, order
of execution is important.
Equivalence of Schedules
• Two definitions of equivalence of
schedules are generally used:
– Conflict equivalence
– View equivalence
Conflict equivalence
• Two schedules are said to be conflict
equivalent if order of any two conflicting
operations is the same in both schedules.
• A schedule S is conflict serializable if it is
(conflict) equivalent to some serial schedule S’.
• Conflict serializable schedule orders any
conflicting operations in same way as some
serial execution.
Precedence Graph
• Precedence graph is used for determining the
conflict serializability of a schedule.
• Create:
– node for each transaction;
– a directed edge Ti → Tj, if Tj reads the value of an
item written by Ti;
– a directed edge Ti → Tj, if Tj writes a value into an
item after it has been read by Ti.
• If precedence graph contains cycle schedule is
not conflict serializable.
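The precedence-graph test can be sketched in a few lines of Python. This is an illustrative sketch, not the course's implementation: the schedule encoding as `(transaction, operation, item)` tuples and all function names are assumptions. It also draws an edge for write-write conflicts, which the usual textbook construction includes although the rule list above only names the read-after-write and write-after-read cases.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """schedule: list of (transaction, operation, item); operation is 'r' or 'w'."""
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti == tj or item_i != item_j:
                continue
            # Two operations conflict when at least one of them is a write.
            if op_i == "w" or op_j == "w":
                edges[ti].add(tj)
    return edges


def has_cycle(edges):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            if colour[nxt] == GREY:
                return True          # back edge: a cycle exists
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in list(edges))


def is_conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))
```

For example, the lost-update interleaving `r1(X); r2(X); w1(X); w2(X)` produces edges T1 → T2 and T2 → T1, so the sketch reports it as not conflict serializable.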
Example
• T9 is transferring £100 from one account
with balance bal_x to another account with
balance bal_y.
• T10 is increasing balance of these two
accounts by 10%.
View Serializability
• Offers less restrictive definition of
schedule equivalence than conflict
serializability.
• Two schedules S1 and S2 are view
equivalent if the following three conditions
hold:
– For each data item x, if Ti reads initial value of x
in S1, Ti must also read initial value of x in S2.
View Serializability
– For each read on x by Ti in S1, if value
read by Ti is written by Tj, Ti must also
read value of x produced by Tj in S2.
– For each data item x, if last write on x
performed by Ti in S1, same transaction
must perform final write on x in S2.
View Serializability
• Schedule is view serializable if it is view
equivalent to a serial schedule.
• Every conflict serializable schedule is view
serializable, although converse is not true.
• It can be shown that any view serializable
schedule that is not conflict serializable
contains one or more blind writes.
• In general, testing whether schedule is
view serializable is NP-complete.
Example – View Serializable
schedule
• T1: r1(X); w1(X); T2: w2(X); and T3: w3(X);
S1 : r1(X); w2(X); w1(X); w3(X);
w2(X) and w3(X) – blind writes
Schedule S1 is view serializable since it is
equivalent to the serial schedule T1, T2, T3.
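The three view-equivalence conditions can be checked mechanically for this example. The sketch below is an illustration under stated assumptions: the tuple encoding of schedules and the function names are hypothetical, and comparing the reads as a sorted list is only adequate for small examples such as this one, where no transaction reads the same item twice.

```python
def view_profile(schedule):
    """Return (reads_from, final_writer) for a list of (txn, op, item) steps."""
    last_writer = {}   # item -> transaction that wrote it most recently
    reads_from = []    # (reader, item, writer); writer is None for the initial value
    for txn, op, item in schedule:
        if op == "r":
            reads_from.append((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    # key=repr lets tuples that contain None be sorted without a TypeError.
    return sorted(reads_from, key=repr), dict(last_writer)


def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)


S1 = [("T1", "r", "X"), ("T2", "w", "X"), ("T1", "w", "X"), ("T3", "w", "X")]
SERIAL_T1_T2_T3 = [("T1", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X"), ("T3", "w", "X")]
SERIAL_T2_T1_T3 = [("T2", "w", "X"), ("T1", "r", "X"), ("T1", "w", "X"), ("T3", "w", "X")]
```

In both S1 and the serial order T1, T2, T3 the only read, r1(X), sees the initial value and the final write is w3(X), so the two schedules compare equal. In the order T2, T1, T3 the read would see T2's value, so that order is not view equivalent to S1.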
Concurrency Control
Dr. Jeevani Goonetillake
Database Concurrency Control
• 1   Purpose of Concurrency Control
– To enforce Isolation (through mutual exclusion)
among conflicting transactions.
– To preserve database consistency through
consistency preserving execution of transactions.
– To resolve read-write and write-write conflicts.
• Example:
– In concurrent execution environment if T1 conflicts
with T2 over a data item A, then the existing
concurrency control decides whether T1 or T2 gets
A and whether the other transaction is rolled back or waits.
Classification of Techniques:
1. Locking data items to prevent multiple transactions from
accessing the items concurrently; a number of locking
protocols have been proposed.
2. Use of timestamps. A timestamp is a unique identifier
for each transaction, generated by the system.
3. Multiversion concurrency control protocols that use
multiple versions of a data item.
4. Optimistic Concurrency Control:  based on the concept
of  validation  or  certification  of a transaction after it
executes its operations;  these are sometimes called
optimistic protocols.
Locking
•A lock: a variable associated with a data
item that describes the status of the item
with respect to possible operations that
can be applied to it.
• Generally, there is one lock for each data
item in the database.
• Granularity of locking varies : typically
rows or sets of rows. An entire relation
may be locked, or an entire database.
Types of Locks
• Binary locks:  only two states of a lock;
too simple and too restrictive; not used in
practice.
• Shared/exclusive locks:  which provide
more general locking capabilities and are
used in practical database locking
schemes. (Read Lock as a shared lock,
Write Lock as an exclusive lock).
• Certify lock:  used to improve
performance of locking protocols.
Binary Locks
A binary lock can have two states or values:
locked and unlocked (or 1 and 0, for simplicity).
A binary lock enforces mutual exclusion on the data item;
i.e., at a time only one transaction can hold a lock.
A distinct lock is associated with each database item X. If the value of the lock on X is 1, item X  cannot be accessed  by a database operation that requests the item.
If the value of the lock on X is 0, the item can be accessed when requested.
Binary Locks
If LOCK (X) = 1, the transaction is forced to wait.
If LOCK(X) = 0, it is set to 1 (the transaction locks the item)
and the transaction is allowed to access item X.
unlock_item(X) : sets LOCK(X) to 0 (unlocks the item) so
that X may be accessed by other transactions.
Binary Locking Scheme
Every transaction must obey the following rules. Rules are
enforced by the LOCK MANAGER
1. A transaction T must issue the operation lock_item(X)
before any read_item(X) or write_item(X) operations
are performed in T.
2. A transaction T must issue the operation unlock_item(X)
after all read_item(X) and write_item(X) operations
are completed in T.
3. A transaction T will not issue a lock_item(X) operation
if it already holds the lock on item X.
4. A transaction T will not issue an unlock_item(X)
operation on X unless it already holds the lock on item X.
Binary Locks
The following code performs the lock operation:
B: if LOCK (X) = 0 (*item is unlocked*)
      then LOCK (X) ← 1 (*lock the item*)
   else begin
      wait (until LOCK (X) = 0 and
         the lock manager wakes up the transaction);
      goto B
   end;
Binary Locks
The following code performs the unlock
operation:
LOCK (X) ←0 (*unlock the item*)
if any transactions are waiting then
wake up one of the waiting transactions;
Shared/Exclusive (or Read/Write)
locks
• A lock associated with an item X,LOCK(X), now
has three possible states:
“read-locked,” “write-locked,” or “unlocked.”
•A read-locked item is also called share-locked,
because other transactions are allowed to read
the item.
•A write-locked item is called exclusive-locked,
because a single transaction exclusively holds
the lock on the item.
Shared/Exclusive (or Read/Write)
locks
– Two lock modes:
• (a) shared (read)  (b) exclusive (write).
– Conflict matrix (Y: both locks can be held at once, N: conflict)
         Read    Write
Read      Y       N
Write     N       N
Shared/Exclusive (or Read/Write)
locks
The following code performs the read lock operation:
B: if LOCK (X) = “unlocked” then
      begin LOCK (X) ← “read-locked”;
         no_of_reads (X) ← 1;
      end
   else if LOCK (X) = “read-locked” then
      no_of_reads (X) ← no_of_reads (X) + 1
   else begin wait (until LOCK (X) = “unlocked” and
         the lock manager wakes up the transaction);
      go to B
   end;
Shared/Exclusive (or Read/Write)
locks
The following code performs the write lock
operation:
B: if LOCK (X) = “unlocked”
then LOCK (X) ←“write-locked”;
else begin
wait (until LOCK (X) = “unlocked” and
the lock manager wakes up the transaction);
go to B
end;
Shared/Exclusive (or Read/Write)
locks
The following code performs the unlock operation:
if LOCK (X) = “write-locked” then
   begin LOCK (X) ← “unlocked”;
      wake up one of the waiting transactions, if any
   end
else if LOCK (X) = “read-locked” then
   begin
      no_of_reads (X) ← no_of_reads (X) - 1;
      if no_of_reads (X) = 0 then
         begin
            LOCK (X) ← “unlocked”;
            wake up one of the waiting transactions, if any
         end
   end;
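The read lock, write lock and unlock operations can be combined into one small lock table. The Python sketch below is a simplification and not the slides' algorithm verbatim: the class and method names are hypothetical, and a request that would have to wait simply returns `False` instead of blocking the transaction. The size of the holder set plays the role of no_of_reads(X).

```python
class LockTable:
    """Shared/exclusive lock table; requests return False instead of waiting."""

    def __init__(self):
        self.state = {}     # item -> "read-locked" or "write-locked"
        self.holders = {}   # item -> set of transactions holding the lock

    def read_lock(self, txn, item):
        if self.state.get(item) == "write-locked":
            return False                    # the caller would have to wait
        self.state[item] = "read-locked"
        self.holders.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        if item in self.state:
            return False                    # any existing lock blocks a writer
        self.state[item] = "write-locked"
        self.holders[item] = {txn}
        return True

    def unlock(self, txn, item):
        holders = self.holders.get(item, set())
        holders.discard(txn)
        if not holders:                     # last reader, or the writer, released
            self.state.pop(item, None)
            self.holders.pop(item, None)
```

A real lock manager would also queue the waiting transactions and wake one of them on unlock, as the pseudocode describes; that part is left out here.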
Shared/Exclusive (or Read/Write)
locks
RULES FOR Read/Write LOCKS
1. A transaction T must issue the operation read_lock(X) or
write_lock(X) before any read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any
write_item(X) operation is performed in T.
3. A transaction T must issue the operation unlock(X) after
all read_item(X) and write_item(X) operations are completed in T.
Shared/Exclusive (or Read/Write)
locks
4.  A transaction T will not issue a read_lock(X) operation if it
already holds a read (shared) lock or a write (exclusive) lock on
item X.
( EXCEPTIONS: DOWNGRADING OF LOCK from WRITE TO
READ)
5.  A transaction T will not issue a write_lock(X) operation if it already
holds a read (shared) lock or write (exclusive) lock on item X.
(EXCEPTIONS: UPGRADING OF LOCK FROM READ   TO WRITE)
6. A transaction T will not issue an unlock(X) operation unless it
already holds a read (shared) lock or a write (exclusive) lock on item
X.
Lock conversion
– Lock upgrade: existing read lock to write lock
   if Ti has a read-lock (X) and Tj has no read-lock (X) (i ≠ j) then
      convert read-lock (X) to write-lock (X)
   else
      force Ti to wait until Tj unlocks X
– Lock downgrade: existing write lock to read lock
   Ti has a write-lock (X) (*no other transaction can have any lock on X*)
      convert write-lock (X) to read-lock (X)
Database Concurrency Control
T1                T2                Result
read_lock (Y);    read_lock (X);    Initial values: X=20; Y=30
read_item (Y);    read_item (X);    Result of serial execution
unlock (Y);       unlock (X);       T1 followed by T2:
write_lock (X);   write_lock (Y);   X=50, Y=80.
read_item (X);    read_item (Y);    Result of serial execution
X:=X+Y;           Y:=X+Y;           T2 followed by T1:
write_item (X);   write_item (Y);   X=70, Y=50
unlock (X);       unlock (Y);
Database Concurrency Control
T1                T2                Result
read_lock (Y);                      X=50; Y=50
read_item (Y);                      (matches neither serial result,
unlock (Y);                         so the schedule is nonserializable)
                  read_lock (X);
                  read_item (X);
                  unlock (X);
                  write_lock (Y);
                  read_item (Y);
                  Y:=X+Y;
                  write_item (Y);
                  unlock (Y);
write_lock (X);
read_item (X);
X:=X+Y;
write_item (X);
unlock (X);
(Time runs downward.)
Two-Phase Locking Techniques
• Two Phases:
– (a) Locking (Growing)
– (b) Unlocking (Shrinking).
• Locking (Growing) Phase:
– A transaction applies locks (read or write) on desired data
items one at a time.
• Unlocking (Shrinking) Phase:
– A transaction unlocks its locked data items one at a time.
• Requirement:
– For a transaction these two phases must be mutually
exclusive, that is, during locking phase unlocking phase
must not start and during unlocking phase locking phase
must not begin.
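The two-phase requirement is easy to check for a single transaction: no lock request may follow the first unlock. The following Python sketch shows one way to do this; the function name and the encoding of a transaction as a list of operation names are assumptions for illustration.

```python
def follows_two_phase_locking(operations):
    """operations: sequence of names such as 'read_lock', 'write_lock', 'unlock'."""
    shrinking = False
    for op in operations:
        if op == "unlock":
            shrinking = True          # the shrinking phase has started
        elif op in ("read_lock", "write_lock") and shrinking:
            return False              # a lock was requested after an unlock
    return True


# A transaction that unlocks Y before locking X, and a variant that locks both first.
T1 = ["read_lock", "read_item", "unlock",
      "write_lock", "read_item", "write_item", "unlock"]
T1_TWO_PHASE = ["read_lock", "read_item", "write_lock", "unlock",
                "read_item", "write_item", "unlock"]
```

The first sequence violates the requirement because `write_lock` comes after an `unlock`; the second acquires all its locks before releasing any.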
Two-Phase Locking Techniques
T’1               T’2               Comment
read_lock (Y);    read_lock (X);    T’1 and T’2 follow two-phase
read_item (Y);    read_item (X);    policy but they are subject to
write_lock (X);   write_lock (Y);   deadlock, which must be
unlock (Y);       unlock (X);       dealt with.
read_item (X);    read_item (Y);
X:=X+Y;           Y:=X+Y;
write_item (X);   write_item (Y);
unlock (X);       unlock (Y);
Two-Phase Locking Techniques
• Conservative:
– Prevents deadlock by locking all desired data items before transaction begins execution.
• Basic:
– Transaction locks data items incrementally.  This may cause deadlock, which must be dealt with.
• Strict:
– A stricter version of Basic, where exclusive (write) locks are released only after the transaction terminates (commits, or aborts and is rolled back).  This is the most commonly used two-phase locking algorithm.
• Rigorous:
– Like Strict 2PL, but all locks (shared and exclusive) are released only upon termination.
Limitations Of 2 PL
1. The two-phase locking protocol guarantees
serializability but it does not permit all possible
serializable schedules.
2. Use of locks can cause two additional
problems:      deadlock and starvation.
Deadlock
– Deadlock
T’1               T’2               Comment
read_lock (Y);                      T’1 and T’2 did follow two-phase
read_item (Y);                      policy but they are deadlocked
                  read_lock (X);
                  read_item (X);
write_lock (X);
(waits for X)
                  write_lock (Y);
                  (waits for Y)
– Deadlock (T’1 and T’2)
Deadlock
Deadlock prevention
– A transaction locks all data items it refers to
before it begins execution.
– This way of locking prevents deadlock since a
transaction never waits for a data item.
– The conservative two-phase locking uses this
approach.
Deadlock
• Deadlock detection and resolution
– In this approach, deadlocks are allowed to happen.
The scheduler maintains a wait-for-graph for detecting
cycle.  If a cycle exists, then one transaction involved
in the cycle is selected (victim) and rolled-back.
– A wait-for-graph is created using the lock table.  As
soon as a transaction is blocked, it is added to the
graph.  When a chain like: Ti waits for Tj waits for Tk
waits for Ti or Tj occurs, then this creates a cycle.
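Cycle detection on a wait-for graph can be sketched with a depth-first search. This Python sketch is illustrative only: representing the graph as a dictionary from each blocked transaction to the transactions it waits for, and the function name `find_deadlock`, are assumptions rather than anything defined in the slides.

```python
def find_deadlock(wait_for):
    """wait_for maps each blocked transaction to the transactions it waits for.
    Returns one cycle as a list of transactions, or None if there is none."""

    def visit(node, path):
        if node in path:
            return path[path.index(node):]   # the cycle starts where node first appeared
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None
```

For the deadlock example, `find_deadlock({"T1": ["T2"], "T2": ["T1"]})` returns a cycle containing both transactions, and a scheduler would then pick one of them as the victim. The search is written for clarity, not efficiency.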
Deadlock
• Deadlock avoidance
– There are many variations of two-phase
locking algorithm.
– Some avoid deadlock by not letting the cycle
complete.
– That is, as soon as the algorithm discovers that
blocking a transaction is likely to create a
cycle, it rolls back the transaction.
– Wound-Wait and Wait-Die algorithms use
timestamps to avoid deadlocks by rolling-back
victim.
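The slides name the two schemes without spelling out their rules. The Python sketch below encodes the usual textbook definitions, stated here as an assumption: a smaller timestamp means an older transaction, in Wait-Die an older requester waits while a younger requester is aborted, and in Wound-Wait an older requester aborts the younger lock holder while a younger requester waits. The function names and return strings are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Wait-Die: an older requester waits; a younger requester dies (is aborted)."""
    return "wait" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester, ts_holder):
    """Wound-Wait: an older requester wounds (aborts) the holder; a younger one waits."""
    return "abort holder" if ts_requester < ts_holder else "wait"
```

In both schemes the transaction that is rolled back is always the younger one, so a wait can never close a cycle in the wait-for graph.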
Starvation
– Starvation occurs when a particular transaction
consistently waits or is restarted and never gets a
chance to proceed further.
– In a deadlock resolution it is possible that the same
transaction may consistently be selected as victim and
rolled-back.
– This limitation is inherent in all priority based
scheduling mechanisms.
– In Wound-Wait scheme a younger transaction may
always be wounded (aborted) by a long running older
transaction which may create starvation.
Timestamp
– A monotonically increasing variable (integer)
indicating the age of an operation or a
transaction.  A larger timestamp value
indicates a more recent event or operation.
– Timestamp based algorithm uses timestamp to
serialize the execution of concurrent
transactions.
Timestamp based concurrency
control algorithm
Basic Timestamp Ordering
– 1.  Transaction T issues a write_item(X) operation:
• If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then a
younger transaction has already read or written the data item, so abort
and roll back T and reject the operation.
• If the condition in the previous bullet does not hold, then execute
write_item(X) of T and set write_TS(X) to TS(T).
– 2.  Transaction T issues a read_item(X) operation:
• If write_TS(X) > TS(T), then a younger transaction has
already written to the data item, so abort and roll back T and
reject the operation.
• If write_TS(X) ≤ TS(T), then execute read_item(X) of T and
set read_TS(X) to the larger of TS(T) and the current
read_TS(X).
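The two rules of basic timestamp ordering translate almost directly into code. The following Python sketch is a minimal illustration: the `Item` class, the function names and the string return values are assumptions, and aborting is represented only by the return value.

```python
class Item:
    """A data item with the two timestamps used by basic timestamp ordering."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def to_write(item, ts):
    # Rule 1: reject the write if a younger transaction already read or wrote X.
    if item.read_ts > ts or item.write_ts > ts:
        return "abort"
    item.write_ts = ts
    return "ok"


def to_read(item, ts):
    # Rule 2: reject the read if a younger transaction already wrote X.
    if item.write_ts > ts:
        return "abort"
    item.read_ts = max(item.read_ts, ts)
    return "ok"
```

For example, after a transaction with timestamp 5 reads X, a write by a transaction with timestamp 3 is rejected, while a write with timestamp 7 is accepted.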
Strict Timestamp Ordering
1.  Transaction T issues a write_item(X)
operation:
If TS(T) > read_TS(X), then delay T until the
transaction T’ that wrote or read X has terminated
(committed or aborted)
2.  Transaction T issues a read_item(X)
operation:
If TS(T) > write_TS(X), then delay T until the
transaction T’ that wrote or read X has terminated
(committed or aborted).
Thomas’s Write Rule
– 1. If read_TS(X) > TS(T) then abort and roll back
T and reject the operation.
– 2. If write_TS(X) > TS(T), then just ignore the
write operation and continue execution.  This
is because only the most recent write counts in
case of two consecutive writes.
– 3. If the conditions given in 1 and 2 above do not
occur, then execute write_item(X) of T and set
write_TS(X) to TS(T).
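Thomas's write rule differs from basic timestamp ordering only in how an outdated write is treated. A small Python sketch of the three cases follows; the function name and the `(action, new_write_ts)` return shape are assumptions for illustration.

```python
def thomas_write(read_ts, write_ts, ts):
    """Apply Thomas's write rule; returns (action, resulting write_TS(X))."""
    if read_ts > ts:
        return "abort", write_ts      # a younger transaction already read X
    if write_ts > ts:
        return "ignore", write_ts     # obsolete write: skip it and continue
    return "write", ts                # perform the write and advance write_TS(X)
```

The "ignore" case is the difference: basic timestamp ordering would abort T there, whereas this rule lets T continue without performing the write.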
Validation (Optimistic) Concurrency
Control Schemes
In this technique serializability is checked only at the time of commit,
and transactions are aborted in case of non-serializable schedules.
• Three phases:
1. Read phase
2. Validation phase
3. Write phase
1. Read phase:
– A transaction can read values of committed data
items.  However, updates are applied only to local
copies (versions) of the data items (in database
cache).
2. Validation phase: Serializability is checked before
transactions write their updates to the database.
– This phase for Ti checks that, for each transaction Tj that
is either committed or is in its validation phase, one of
the following conditions holds:
• Tj completes its write phase before Ti starts its read
phase.
• Ti starts its write phase after Tj completes its write
phase, and the read_set of Ti has no items in
common with the write_set of Tj
• Both the read_set and write_set of Ti have no items in
common with the write_set of Tj, and Tj completes its read
phase.
• When validating Ti, the conditions are checked in the order listed for each
transaction Tj, since (1) is the simplest condition to check.  If
(1) is false then (2) is checked and if (2) is false then (3) is
checked.  If none of these conditions holds, the validation fails
and Ti is aborted.
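The validation test can be sketched as a function over the recorded phase times and read/write sets. Everything in this Python sketch is an assumption made for illustration: the `Txn` record, its integer time fields and the function names. In particular, the third condition is completed in the usual textbook way (Tj finishes its read phase before Ti finishes its own), which the slide text only states in part.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    read_start: int
    read_end: int
    write_start: int = None      # None until the write phase begins
    write_end: int = None        # None until the write phase completes
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validates_against(ti, tj):
    # (1) Tj completed its write phase before Ti started its read phase.
    if tj.write_end is not None and tj.write_end < ti.read_start:
        return True
    # (2) Ti starts writing after Tj finished writing, and Ti read nothing Tj wrote.
    if (tj.write_end is not None and ti.write_start is not None
            and tj.write_end < ti.write_start
            and not (ti.read_set & tj.write_set)):
        return True
    # (3) Ti's read and write sets do not overlap Tj's write set, and Tj finished
    #     its read phase before Ti finished its read phase (assumed completion).
    return (not ((ti.read_set | ti.write_set) & tj.write_set)
            and tj.read_end < ti.read_end)


def validate(ti, others):
    """Ti passes validation only if some condition holds for every Tj."""
    return all(validates_against(ti, tj) for tj in others)
```

If `validate` returns `False`, Ti would be aborted and restarted; otherwise its local updates are applied in the write phase.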
3. Write phase: On a successful validation
transactions’ updates are applied to the
database; otherwise, transactions are restarted.
Granularity of data items and Multiple
Granularity Locking
• A lockable unit of data defines its granularity. Granularity
can be coarse (entire database) or it can be fine (a tuple
or an attribute of a relation).
• Data item granularity significantly affects concurrency
control performance. Thus, the degree of concurrency is
low for coarse granularity and high for fine granularity.
• Example of data item granularity:
1. A field of a database record (an attribute of a tuple)
2. A database record (a tuple of a relation)
3. A disk block
4. An entire file
5. The entire database
Granularity of data items and Multiple
Granularity Locking
• The following diagram illustrates a
hierarchy of granularity from coarse
(database) to fine (record).
[Diagram: tree with root DB; files f1 and f2 on the second level; pages p11, p12, …, p1n on the third level; records r111 … r11j on the lowest level.]
Granularity of data items and Multiple
Granularity Locking
To manage such hierarchy, in addition to read and
write, three additional locking modes, called
intention lock modes are defined:
– Intention-shared (IS): indicates that a shared lock(s)
will be requested on some descendant node(s).
– Intention-exclusive (IX): indicates that an exclusive
lock(s) will be requested on some descendant node(s).
– Shared-intention-exclusive (SIX): indicates that the
current node is locked in shared mode but an
exclusive lock(s) will be requested on some
descendant node(s).
Granularity of data items and
Multiple Granularity Locking
• These locks are applied using the following compatibility
matrix:
        IS     IX     S      SIX    X
IS      yes    yes    yes    yes    no
IX      yes    yes    no     no     no
S       yes    no     yes    no     no
SIX     yes    no     no     no     no
X       no     no     no     no     no
Intention-shared (IS)
Intention-exclusive (IX)
Shared-intention-exclusive (SIX)
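The compatibility matrix can be stored as a lookup table and used to decide whether a requested mode can be granted on a node. The Python sketch below is illustrative; the names `MODES`, `ROWS`, `COMPATIBLE` and `can_grant` are assumptions, while the yes/no values are copied from the matrix above.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]

# One row per mode already held; columns follow the order in MODES.
ROWS = {
    "IS":  "yes yes yes yes no",
    "IX":  "yes yes no  no  no",
    "S":   "yes no  yes no  no",
    "SIX": "yes no  no  no  no",
    "X":   "no  no  no  no  no",
}

COMPATIBLE = {
    (held, requested): cell == "yes"
    for held, row in ROWS.items()
    for requested, cell in zip(MODES, row.split())
}


def can_grant(requested, held_modes):
    """True if `requested` is compatible with every mode other transactions hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For instance, `can_grant("IS", ["IX"])` is true, whereas `can_grant("S", ["IX"])` is false because another transaction intends to write somewhere below the node.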
Granularity of data items and
Multiple Granularity Locking
• The set of rules which must be followed for producing serializable
schedule are
1. The lock compatibility must be adhered to.
2. The root of the tree must be locked first, in any mode.
3. A node N can be locked by a transaction T in S or IS mode only if
the parent node is already locked by T in either IS or IX mode.
4. A node N can be locked by T in X, IX, or SIX mode only if the
parent of N is already locked by T in either IX or SIX mode.
5. T can lock a node only if it has not unlocked any node (to enforce
2PL policy).
6. T can unlock a node, N, only if none of the children of N are
currently locked by T.
Granularity of data items and
Multiple Granularity Locking
• T1 wants to update record r111 and record
r211.
• T2 wants to update all records on page p12.
• T3 wants to read record r11j and the entire
f2 file.
Granularity of data items and Multiple
Granularity Locking
T1                T2                T3
IX(db)
IX(f1)
                  IX(db)
                                    IS(db)
                                    IS(f1)
                                    IS(p11)
IX(p11)
X(r111)
                  IX(f1)
                  X(p12)
                                    S(r11j)
IX(f2)
IX(p21)
X(r211)
unlock(r211)
unlock(p21)
unlock(f2)
                                    S(f2)
Granularity of data items and Multiple
Granularity Locking
T1                T2                T3
                  unlock(p12)
                  unlock(f1)
                  unlock(db)
unlock(r111)
unlock(p11)
unlock(f1)
unlock(db)
                                    unlock(r11j)
                                    unlock(p11)
                                    unlock(f1)
                                    unlock(f2)
                                    unlock(db)
Database Recovery Techniques
Dr. Jeevani Goonetillake
Types of Failure
– The database may become unavailable for use
due to
• Transaction failure:  Transactions may fail
because of incorrect input, deadlock, incorrect
synchronization.
• System failure:  System may fail because of
addressing error, application error, operating system
fault, RAM failure, etc.
• Media failure:  Disk head crash, power disruption,
etc.
Purpose of Database Recovery
– Recovery manager is responsible for
transaction atomicity and durability.
• Undo actions of aborted transactions.
• Actions from committed transactions can survive
system crashes.
– To bring the database into the last consistent
state, which existed prior to the failure.
Transaction Log
– For recovery from any type of failure data values prior to
modification (BFIM – Before Image) and the new value after
modification (AFIM – After Image) are required.
– These values and other information are stored in a sequential file
called the transaction log.  A sample log is given below.  Back P and
Next P point to the previous and next log records of the same
transaction.
T ID    Back P    Next P    Operation    Data item    BFIM       AFIM
T1      0         1         Begin
T1      1         4         Write        X            X = 100    X = 200
T2      0         8         Begin
T1      2         5         W            Y            Y = 50     Y = 100
T1      4         7         R            M            M = 200    M = 200
T3      0         9         R            N            N = 400    N = 400
T1      5         nil       End
Data Caching
– Data items to be modified are first stored into
database cache by the Cache Manager (CM).
– After modification they are flushed (written) to
the disk.
Data Update
• In-place update: The disk version of the data item is
overwritten by the cache version (i.e. writes the buffer
back to the same original disk location).
– Immediate Update:  As soon as a data item is
modified in cache, the disk copy is updated.
– Deferred Update:  All modified data items in the cache
are written either after a transaction ends its execution
or after a fixed number of transactions have completed
their execution.
• Shadow update:  The modified version of a data item
does not overwrite its disk copy but is written at a
separate disk location.
Write-Ahead Logging
• When in-place update (immediate or deferred) is used
then log is necessary for recovery and it must be
available to recovery manager.  This is achieved by
Write-Ahead Logging (WAL) protocol.  WAL states that
– For Undo: Before a data item’s AFIM is flushed to the
database disk (overwriting the BFIM) its BFIM must be
written to the log and the log must be saved on a
stable store (log disk).
– For Redo: Before a transaction executes its commit
operation, all its AFIMs must be written to the log and
the log must be saved on a stable store.
Steal/No-Steal and Force/No-Force
Possible ways for flushing database cache to database
disk:
• Steal:  Cache page updated by a transaction can
be flushed to disk before transaction commits.
• No-Steal: Cache cannot be flushed before
transaction commit.
• Force:  all Cache pages updated by a transaction
are immediately flushed (forced) to disk when the
transaction commits.
• No-Force:  Modified pages may not immediately
be written to disk after a transaction commits.
Steal/No-Steal and Force/No-Force
– These give rise to four different ways for
handling recovery:
• Steal/No-Force (Undo/Redo)
• Steal/Force (Undo/No-redo)
• No-Steal/No-Force (Redo/No-undo)
• No-Steal/Force (No-undo/No-redo)
Typical database systems employ a
Steal/No-Force strategy.
Transaction Roll-back (Undo) and
Roll-Forward (Redo)
To maintain atomicity, a transaction’s operations are
redone or undone.
• Undo: Restore all BFIMs on to disk (Remove all
AFIMs).
• Redo: Restore all AFIMs on to disk.
– Database recovery is achieved either by performing
only Undos or only Redos or by a combination of the
two. These operations are recorded in the log as they
happen.
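A rough sketch of how undo and redo could be driven from the log is shown below. The record layout ("write", txn, item, BFIM, AFIM) and ("commit", txn) and the function name recover are assumptions made for this example. Uncommitted transactions are undone in reverse log order and committed ones are redone in forward order.

```python
def recover(disk, log):
    """Apply undo, then redo, to a dict-based 'disk' using the log records."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Undo: restore the BFIMs of uncommitted transactions, newest record first.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, item, bfim, _afim = rec
            disk[item] = bfim

    # Redo: reapply the AFIMs of committed transactions, oldest record first.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, item, _bfim, afim = rec
            disk[item] = afim

    return disk
```

For example, if T1 wrote X (1 to 5) without committing and T2 wrote Y (2 to 7) and committed, recovery leaves X at 1 and Y at 7 regardless of which values had reached the disk before the failure.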
Checkpoints in the System Log
• From time to time (randomly or under some criteria) the database
flushes its buffer to the database disk to minimize the task of recovery.
The following steps define a checkpoint operation:
1. Suspend execution of transactions temporarily.
2. Force write modified buffer data to disk.
3. Write a [checkpoint] record to the log, save the log to disk.
4. Resume normal transaction execution.
• During recovery, redo or undo is required for transactions that appear
after the [checkpoint] record.
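The checkpoint steps can be sketched as follows. This is a single-threaded toy model, so steps 1 and 4 (suspending and resuming transactions) are not modelled; the function names and the ("checkpoint",) record format are assumptions. The helper records_after_checkpoint follows the simplification used above, in which recovery only looks at log records written after the last checkpoint.

```python
def checkpoint(cache, disk, log):
    """Steps 2 and 3: force-write modified buffers, then log the checkpoint."""
    disk.update(cache)            # step 2: flush modified buffer data to disk
    cache.clear()
    log.append(("checkpoint",))   # step 3: write the [checkpoint] record


def records_after_checkpoint(log):
    """Return the log records that follow the most recent checkpoint."""
    last = max(
        (i for i, rec in enumerate(log) if rec[0] == "checkpoint"),
        default=-1,               # no checkpoint yet: the whole log is relevant
    )
    return log[last + 1:]
```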
Deferred Update
• Defer or postpone any actual updates to the database
until the transaction completes its execution
successfully and reaches its commit point.
• After the transaction reaches its commit point and the
log is force-written to disk, the updates are recorded in
the database.
• If a transaction fails before reaching its commit point,
none of its updates have reached the database, so there is
no need to undo any operation. Hence this is
known as the NO-UNDO/REDO recovery algorithm.
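A minimal sketch of deferred update, assuming a dict-based disk and a hypothetical class DeferredUpdateDB: writes are held in a per-transaction workspace and reach the database only after the commit record has been logged, so an abort simply discards the workspace.

```python
class DeferredUpdateDB:
    """Toy deferred-update database (illustrative only)."""

    def __init__(self, disk):
        self.disk = dict(disk)   # database disk: item -> value
        self.log = []            # assumed to be force-written at commit
        self.pending = {}        # txn -> {item: new value}, not yet on disk

    def write(self, txn, item, value):
        # Updates are deferred: only the transaction's workspace changes.
        self.pending.setdefault(txn, {})[item] = value

    def commit(self, txn):
        writes = self.pending.pop(txn, {})
        # Log the AFIMs and the commit record first ...
        for item, value in writes.items():
            self.log.append(("write", txn, item, value))
        self.log.append(("commit", txn))
        # ... and only then record the updates in the database.
        self.disk.update(writes)

    def abort(self, txn):
        # Nothing reached the disk, so there is nothing to undo.
        self.pending.pop(txn, None)
```

If a crash happens after the commit record is logged but before the disk is updated, the logged AFIMs are what a redo pass would reapply; that pass is not modelled here.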
Immediate Update
Undo/No-redo Algorithm
– In this algorithm AFIMs of a transaction are
flushed to the database disk under WAL before
it commits.
– For this reason the recovery manager undoes
all active (uncommitted) transactions during recovery.
– No transaction is redone.
– It is possible that a transaction has
completed execution and is ready to commit, but
this transaction is also undone.
Shadow Paging
• The AFIM does not overwrite its BFIM but
is recorded at another place on the disk.
Thus, at any time a data item has its AFIM
and BFIM (shadow copy of the data item)
at two different places on the disk.
[Figure: database disk holding data items X and Y together with X’ and Y’]
X and Y:  Shadow copies of data items
X’ and Y’: Current copies of data items
During transaction execution, the shadow
directory is never modified.
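The idea can be sketched with two directories that map each data item to a slot on disk. The class ShadowPagedDB and the use of a separate current directory are assumptions taken from the usual description of shadow paging; only the current directory changes while a transaction runs.

```python
class ShadowPagedDB:
    """Toy shadow-paging store (illustrative only)."""

    def __init__(self, pages):
        self.disk = list(pages)                     # page slots on disk
        self.shadow_dir = list(range(len(pages)))   # item index -> slot (BFIMs)
        self.current_dir = list(self.shadow_dir)    # item index -> slot (AFIMs)

    def write(self, item, value):
        # The AFIM goes to a fresh slot; the shadow copy is left untouched.
        self.disk.append(value)
        self.current_dir[item] = len(self.disk) - 1

    def read(self, item):
        return self.disk[self.current_dir[item]]

    def commit(self):
        # The current directory becomes the new shadow directory.
        self.shadow_dir = list(self.current_dir)

    def abort(self):
        # Discard the current directory; the shadow copies are still intact.
        self.current_dir = list(self.shadow_dir)
```

Because the shadow copies are never overwritten, recovering from a failed transaction in this sketch amounts to falling back to the shadow directory, with no undo or redo of individual items.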
