
Big Data Architect Resume

CA

PROFESSIONAL SUMMARY:

  • Highly skilled IT professional with more than 10 years of experience in the analysis, design, development and deployment of data warehouse (DW) projects for the Health Care, Banking, Finance (CFO), Accounting and Insurance domains.
  • Worked on several proofs of concept to implement streaming/ETL pipelines, data science and machine learning (using Spark), AWS, and other cutting-edge technologies.
  • Developed and delivered long-term strategic goals for data architecture vision and standards in conjunction with data users, clients, and other key stakeholders.
  • Involved in the end-to-end SDLC of project development: gathering business requirements, analysis and design reviews, development, code walkthroughs, production implementation and post-implementation validation, using traditional ETL, cloud and Hadoop platforms.
  • Oversaw the mapping of data sources, data movement, interfaces, and analytics, with the goal of ensuring data quality and adherence to the data retention policy.
  • Documented the data architecture and environment in order to maintain a current and accurate view of the larger data picture.
  • Established methods and procedures for tracking data quality, completeness, redundancy, and improvement.
  • Created strategies and plans for data security, backup, disaster recovery, business continuity, and archiving.
  • Defined best practices for Informatica/DataStage/Hadoop development and implemented them at the enterprise level.
  • Strong Knowledge of data warehouse implementation process, from business requirements through logical modeling, physical database design, data sourcing and data transformation, data loading and performance tuning.
  • Involved in data profiling and data cleansing activities using Confidential Information Server.
  • Expert in data warehousing techniques for data cleansing, surrogate key assignment, slowly changing dimensions (SCD) and Change Data Capture (CDC).
  • Used data replication primarily to capture data changes in the source system and apply Change Data Capture (CDC) loads to the target system or generate flat files (a brief sketch of this pattern follows this list).
  • Used TCP/IP as the transport protocol to stream data from the CDC data replication process through InfoSphere DataStage to other target systems.
  • Worked extensively with multiple file types including VSAM, XML, flat files, JSON, Avro, Parquet and Salesforce objects.
  • Extensive experience in automating several reusable components using UNIX Scripts and PowerShell scripts
  • Excellent experience in scheduling & monitoring production jobs using Control-M and Autosys.
  • Extensively worked with Parallel Extender for parallel processing while working with bulk data sources.
  • Experience in RDBMS with good programming knowledge in Oracle 10g, SQL Server 2008 R2/2005, DB2, Teradata, Netezza, SQL, PL/SQL, DB triggers, views, stored procedures, functions and packages.
  • Excellent working knowledge in Business Objects (Building IDT and Webi reports)
  • Experience executing projects using Agile methodologies.
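
The SCD/CDC bullets above describe change-data-capture loading into a dimension; below is a minimal, hypothetical PySpark sketch of a Type 2 slowly changing dimension load. All table and column names (dw.customer_dim, stg.customer_updates, customer_id, is_current, and so on) are illustrative assumptions, and the projects above implemented this pattern with DataStage/Informatica CDC stages rather than this exact code.

# Minimal sketch of an SCD Type 2 (history-preserving) load in PySpark.
# All object names are hypothetical; unionByName(allowMissingColumns=True)
# needs Spark 3.1+.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

dim = spark.table("dw.customer_dim")            # existing dimension with history
changes = spark.table("stg.customer_updates")   # CDC feed of changed source rows
changed_keys = changes.select("customer_id").distinct()

history = dim.filter("is_current = 0")                              # closed versions, untouched
current = dim.filter("is_current = 1")
unchanged = current.join(changed_keys, "customer_id", "left_anti")  # current rows with no change

# Expire the current version of every changed customer.
expired = (current.join(changed_keys, "customer_id", "left_semi")
                  .withColumn("is_current", F.lit(0))
                  .withColumn("end_date", F.current_date()))

# Insert a new current version carrying the changed attributes.
new_rows = (changes.withColumn("is_current", F.lit(1))
                   .withColumn("start_date", F.current_date())
                   .withColumn("end_date", F.lit(None).cast("date")))

result = (history.unionByName(unchanged)
                 .unionByName(expired)
                 .unionByName(new_rows, allowMissingColumns=True))
result.write.mode("overwrite").saveAsTable("dw.customer_dim_scd2")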

TECHNICAL EXPERIENCE:

Big Data Tools: Python 2.7.10, Sqoop 1.4.3, PIG 0.12.0, Hadoop 2.2.0, Hive 0.12.0, Scala, Zeppelin, Spark 1.6.2, Kafka, Storm, Flume, YARN, Oozie, Ambari, MongoDB

BI Tools: Business Objects webi and Crystal Reports

RDBMS: Oracle 9i, 10g and 11g, DB2 8.1, MS-SQL Server 2000, Sybase ASE, Teradata, Netezza

OPERATING SYSTEM: Unix, HP-UX (UX, IA), Confidential AIX 5.2, Sun Solaris, Red Hat Linux 8, SUSE Linux, MS-DOS, Windows 2000/XP/2003/7

TESTING TOOLS: QTP, QC, and Test Director

SERVERS: Citrix, VMware and Domino mail servers.

CMM TOOL: StarTeam and RTC (Rational Team Concert)

SCHEDULING TOOL: Control-M and Autosys

VERSION AND SOURCE CONTROL: Borland StarTeam and RTC

WORK EXPERIENCE:

Confidential, CA

Big Data Architect

Responsibilities:

  • Actively involved in business requirement meetings to understand operational, analytical and reporting data needs, and converted them into robust IT solutions adhering to governance standards and the Information Architecture.
  • Provided design oversight by reviewing and approving design solutions across all bank agile streams.
  • Created short-term tactical solutions to achieve long-term objectives and an overall data management roadmap.
  • Involved in the end-to-end life cycle of project development, covering design reviews, construction, code walkthroughs, production implementation and post-implementation validation.
  • Established methods and procedures for tracking data quality, completeness, redundancy, metadata, data lineage and data improvement.
  • Addressed data-related problems regarding systems integration, compatibility and multiple-source integration.
  • Established processes for governing the identification, collection and use of corporate metadata; took steps to assure metadata accuracy and validity.
  • Conducted data capacity planning, life cycle, duration and usage requirement analyses, feasibility studies, and other tasks.
  • Created strategies and plans for data security, backup, disaster recovery, business continuity and archiving.
  • Ensured that data strategies and architectures were in regulatory compliance.
  • Identified and developed opportunities for data reuse, migration or retirement.
  • Involved in relational and dimensional data modeling review meetings for creating logical and physical structures.
  • Developed and implemented applications using Hadoop, Data Stage and Informatica
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Automated the jobs that pull data from the FTP server and load it into Hive tables using Oozie workflows; also wrote Pig scripts to run ETL jobs on the data in HDFS.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Created partitioned tables in Hive and used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (a brief sketch of this follows this list).
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (a second sketch after this list illustrates the pattern).
  • Processed data using MapReduce, stored the results in HBase, and displayed them as pie and/or bar charts per user requirements.
  • Worked on several proofs of concept to implement streaming/ETL pipelines, machine learning, AWS and other cutting-edge technologies.
  • Coordinated the project development with onsite team and off-shore development teams in India.
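
As referenced in the partitioned-tables bullet above, the following is a minimal, hypothetical sketch of a partitioned Hive table, a static-partition load, and a reporting query, expressed through Spark SQL. Database, table and column names (dw.claims, stg.claims_raw, claim_amount, etc.) are assumptions for illustration; bucketing (CLUSTERED BY) would be added on the Hive side where needed.

# Hypothetical sketch: partitioned Hive table, a partition load and a
# reporting query via Spark SQL. All object names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive_partition_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS dw.claims (
        claim_id     STRING,
        member_id    STRING,
        claim_amount DOUBLE
    )
    PARTITIONED BY (claim_year INT, claim_month INT)
    STORED AS PARQUET
""")

# Load one month's data from a staging table into the matching partition.
spark.sql("""
    INSERT OVERWRITE TABLE dw.claims
    PARTITION (claim_year = 2017, claim_month = 6)
    SELECT claim_id, member_id, claim_amount
    FROM stg.claims_raw
    WHERE load_year = 2017 AND load_month = 6
""")

# Partition pruning keeps the reporting query cheap.
spark.sql("""
    SELECT claim_year, claim_month, SUM(claim_amount) AS total_paid
    FROM dw.claims
    WHERE claim_year = 2017
    GROUP BY claim_year, claim_month
""").show()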
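
The MapReduce-to-Spark migration bullet refers to rewriting map and reduce phases as Spark transformations; the second sketch below shows the shape of such a rewrite for a simple per-key aggregation. It is expressed in PySpark only to keep all examples in one language (the actual migration used Scala), and the input path and record layout are assumptions.

# Hypothetical example of restating a MapReduce-style aggregation as Spark
# transformations. Input path and CSV record layout are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mr_to_spark_sketch").getOrCreate()
sc = spark.sparkContext

# MapReduce version (conceptually): the mapper emits (account_id, amount)
# pairs and the reducer sums amounts per account_id. In Spark the same job
# is two transformations and an action.
lines = sc.textFile("hdfs:///data/transactions/*.csv")

totals = (lines
          .map(lambda line: line.split(","))            # "map" phase: parse record
          .map(lambda cols: (cols[0], float(cols[2])))  # emit (account_id, amount)
          .reduceByKey(lambda a, b: a + b))             # "reduce" phase: sum per key

totals.saveAsTextFile("hdfs:///out/account_totals")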

Technologies: DataStage 9.1, DB2, SQL server, Netezza, RTC, UNIX, CONTROL-M, Python 2.7.10, Sqoop 1.4.3, PIG 0.12.0, Hadoop 2.2.0, Spark 1.6.3, Spark 2.2.0, Oozie, Hive 0.12.0, BOXI R2

Confidential

Big Data Architect

Responsibilities:

  • End-to-end responsibility for COMPAS data warehousing system maintenance and enhancements
  • Handling and resolving critical business and technical issues
  • Created dimensional tables and fact tables based on the warehouse design
  • Serving as a focal point for communication with all stakeholders
  • Handling interface and integration issues with other teams
  • Created the data warehouse with major data marts such as sales, customers, product, month and transactional details (a star-schema sketch follows this list).
  • Created drill-down and slice-and-dice reports using Business Objects and deployed them for end users.
  • Leading a team of 6, spread across both onsite and offshore.
  • Involved in job design to extract, cleanse and parameterize the jobs for portability and flexibility at runtime, to apply business rules and logic at the transformation level, and to load data into the data warehousing system.
  • Involved in preparation of unit test cases and performed unit testing
  • A key player in the team, promoting team spirit, accelerated learning and continuous improvement.
  • Performing monitoring and tracking of team activities against the project plan/ schedule and providing status reports to the management.
  • Involved in extracting the data from different source systems
  • Fixed complex issues
  • Performance-tuned all Confidential jobs, saving more than 3 hours of job processing time every day
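
As referenced above, a star-schema layout for a sales data mart typically pairs one fact table with product, customer and date dimensions. The sketch below is a hypothetical illustration of that layout and a typical reporting query; the project itself was built on Oracle/PL SQL and DataStage, and Spark SQL is used here only to keep all examples in one language. Every table and column name is an assumption.

# Hypothetical star-schema sketch for a sales data mart: one fact table
# keyed to product, customer and date dimensions. All names are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("star_schema_sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("""
    CREATE TABLE IF NOT EXISTS dm.dim_product (
        product_key INT, product_code STRING, product_name STRING, category STRING
    ) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS dm.dim_customer (
        customer_key INT, customer_id STRING, customer_name STRING, region STRING
    ) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS dm.dim_date (
        date_key INT, calendar_date DATE, month INT, year INT
    ) STORED AS PARQUET
""")
spark.sql("""
    CREATE TABLE IF NOT EXISTS dm.fact_sales (
        product_key INT, customer_key INT, date_key INT,
        quantity INT, sales_amount DOUBLE
    ) STORED AS PARQUET
""")

# Typical drill-down style reporting query: monthly sales by product category.
spark.sql("""
    SELECT d.year, d.month, p.category, SUM(f.sales_amount) AS total_sales
    FROM dm.fact_sales f
    JOIN dm.dim_date d    ON f.date_key = d.date_key
    JOIN dm.dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, d.month, p.category
""").show()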

Technologies: DataStage 8.1, Business Objects, PL/SQL, Oracle 10g and ASP. Server platform: Linux (RHEL5). Client platforms: Win XP and Win 7.

Confidential, San Antonio, Texas

Big Data Architect

Responsibilities:

  • Leading a team of 8, spread across both onsite and offshore
  • Handling and resolving critical business and technical issues
  • Serving as a focal point for communication with all stakeholders
  • Handling interface and integration issues with other teams
  • Performing monitoring and tracking of team activities against the project plan/ schedule and providing status reports
  • Building the enterprise data model, including P&C, Life and Bank models for IMSM
  • Involved in extracting the data from different source systems
  • Involved in job design to extract, cleanse and parameterize the jobs for portability and flexibility at runtime, to apply business rules and logic at the transformation stage, and to load data into the data warehouse
  • Used multiple stages such as Sequential File, Sort, Merge, Join, Lookup, Filter, Transformer, Column Import/Export, SCD and other processing stages during ETL development
  • Involved in data cleansing in key processes
  • Involved in preparation of unit test cases and performed unit testing
  • A key player in the team, promoting team spirit, accelerated learning and continuous improvement

Technologies: Confidential DataStage 8.101 (Administrator, Designer and Director), DB2, Oracle 9i, MSSQL. Server platform: Linux (RHEL5). Client platforms: Win XP and Win 7.

Confidential

Big Data Architect

Responsibilities:

  • Involved in extracting the data from different data sources like SQL Server and flat files.
  • Managing the team (maintenance and work assignments, report submissions to management)
  • Involved in L3 support activity (direct customer interactions, designing and testing complicated customer scenarios, and delivering the .dsx files to the customers)
  • Involved in extracting the data from staging to relational tables.
  • Provided analytical solutions by validating the data.
  • Using multiple stages like Sequential File, FTP enterprise stage, Sort, Merge, Join, Lookup, Filter, Transformer, and Aggregator during ETL development.
  • Managing Repository Metadata from DataStage Manager.
  • Involved in designing automated batch jobs.
  • Involved in creating and maintaining Sequencer and Batch jobs.
  • Involved in data cleansing in key processes.
  • Performing client-side and server-side testing for all DataStage stages.

Technologies: Confidential DataStage 8.101 (Administrator, Designer and Director), DB2, Oracle 9i, MSSQL. Server platform: Linux (RHEL5). Client platforms: Win XP and Win 7.
