
Big Data Engineer Resume


SUMMARY

  • Highly accomplished technical and business professional with over twenty years of demonstrated success in project/program leadership, including Software Development, Business Intelligence, and Data Warehouse assessment, strategy, and implementation. Proven ability to initiate, plan, and execute technical projects. Well experienced in providing technical and management expertise to cross-functional development teams.
  • Results- and challenge-driven Data Warehouse Lead with extensive ETL development, production support, systems design, and team leadership history. Collaborative coordinator with a solid reputation for building positive professional relationships, troubleshooting and resolving system/application issues, and technically supporting large end-user groups.
  • Effective communicator respected for defining project requirements, implementing creative methodologies/strategies, and facilitating timely project execution.
  • Over 2 years of experience in Hadoop and its ecosystem components, including HDFS, MapReduce, Hive, Sqoop, HBase, Apache Pig, Oozie, and Flume, on the Cloudera distribution.
  • Excellent understanding of Hadoop architecture and underlying framework including storage management.
  • Knowledge of the architecture and functionality of NoSQL databases such as Cassandra and MongoDB.
  • Experience in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Experience in loading data to HDFS from the Linux file system.
  • Over 8 years of strong Business Analysis experience in Data Analysis, User Requirement Gathering, User Requirement Analysis, Data Cleansing, Data Transformations, Data Relationships, Source Systems Analysis, and Reporting Analysis.
  • Expertise in SAS/BASE, SAS/ACCESS, SAS/GRAPH, SAS/SQL, SAS/MACROS, SAS/ODS, SAS/REPORT, and SAS/CONNECT; proficient in data manipulation, analysis, and report/graph generation using DATA steps with various SAS functions, procedures, macros, and ODS facilities on PC SAS.
  • Good knowledge of data warehouse concepts and principles: star schema, snowflake schema, surrogate keys, and normalization/denormalization (see the sketch following this list).
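
For reference, the following is a minimal SQL sketch of the dimensional modeling concepts mentioned above: a hypothetical star schema with surrogate-keyed, denormalized dimensions and a fact table. The table and column names (dim_customer, dim_date, fact_sales) are illustrative only and are not taken from any engagement described in this resume.

    -- Hypothetical star schema: a fact table keyed by surrogate keys
    -- that reference denormalized dimension tables.
    CREATE TABLE dim_customer (
        customer_key   INTEGER PRIMARY KEY,   -- surrogate key
        customer_id    VARCHAR(20),           -- natural/business key
        customer_name  VARCHAR(100),
        region         VARCHAR(50)            -- denormalized attribute
    );

    CREATE TABLE dim_date (
        date_key       INTEGER PRIMARY KEY,   -- surrogate key, e.g. 20160131
        calendar_date  DATE,
        month_name     VARCHAR(10),
        year_number    INTEGER
    );

    CREATE TABLE fact_sales (
        date_key       INTEGER REFERENCES dim_date (date_key),
        customer_key   INTEGER REFERENCES dim_customer (customer_key),
        quantity_sold  INTEGER,
        sales_amount   DECIMAL(12,2)
    );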

TECHNICAL SKILLS

Big-Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, MongoDB, Cassandra, Hive, Pig, Oozie, Flume, Zookeeper, Impala

Cloud Technologies: AWS EC2, DynamoDB, Redshift, Amazon ElastiCache

NoSQL: HBase, MongoDB, Cassandra

Databases: Oracle, Netezza, Hive, Teradata, SQL Server, PostgreSQL

CRM: Salesforce.com, Apex Language, Apex Classes, Apex Triggers, SOQL, SOSL, Visual Force (Pages, Components & Controllers), S-Controls, Apex Web Services, APEX, Workflow & Approvals, Dashboards, Reports, Analytic Snapshots, Custom Objects, Force.com Eclipse IDE Plug-in.

Data Warehousing, ETL, Custom Integration: Outbound Messages, Workflow & Approvals, Reports, Custom Objects and Tabs, Custom Application, and Data Loader.

Languages: SQL, MS SQL, PL/SQL, SQL *LOADER, Unix Shell Script, SAS/Base, SAS Macros, SAS Graph, C, PRO*C

Reporting Tools: Business Objects XI, Web Intelligence 2.6, Business Objects Web Intelligence XI 3.0, BRIO QUERY, Crystal Reports XI R2, Cognos, Spotfire, Integrated Review, Oracle Reports

Data Modeling, Data Architecture: Logical Modeling, Physical Modeling, Relational Modeling, ER Diagrams, Dimensional Data Modeling (Star Schema, Snowflake Schema, Fact and Dimension Tables)

Environment: Informatica Power Center, Repository Manager, Workflow Manager, Data Analyzer, Metadata Manager, Data Profiling, Data Quality, Pentaho Data Integration (Kettle), Spoon, Schema Workbench, Pentaho Admin, Pentaho Dashboard Design.

Other Tools: UNIX, Linux, VAX/VMS, Windows NT, Windows XP, Eclipse, Erwin, Visual Studio 2008, SQL Server BI, TOAD, SQL Navigator, SQL*Loader, PVCS, First Doc, SharePoint, Jira

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential

Responsibilities:

  • Worked on architecting, designing, and developing the software for an integrated big data processing service built on top of a Hadoop infrastructure.
  • Created design documents and reviewed them with the team, in addition to assisting the Business Analyst / Project Manager in explaining them to the line of business.
  • Manage data pipelines that bring data into the Hadoop cluster for further analysis, both from traditional relational databases such as Oracle and MySQL using Sqoop and from log-based sources in XML and JSON file formats; Oozie is used as the scheduler for the pipeline.
  • Manage the analytics projects that process data from the Hadoop file system using Hive and send the aggregated/consolidated data to a NoSQL environment such as HBase.
  • Created Hive tables to store the processed results in a tabular format.
  • Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
  • Developed a deep understanding of complex distributed systems and designed innovative solutions for customer requirements.
  • Review proposed mappings with relevant personnel, e.g. the Business Analyst, Business System Analyst, and Data Analyst.
  • Generated detailed design documentation for the source-to-target transformations.
  • Wrote scripts to monitor data load/transformation.
  • Involved in the iteration planning process under the Agile Scrum methodology.
  • Created Hive tables to load large sets of structured, semi-structured, and unstructured data coming from various sources (a minimal sketch follows this list).
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team by using Sqoop to import/export data.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Moved flat files of crawled data generated from various sources to HDFS for further processing.
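
To illustrate the Hive work described above, here is a minimal HiveQL sketch under assumed conditions: a hypothetical JSON event feed landed in HDFS by the Sqoop/Oozie pipeline, exposed as an external table and aggregated for downstream consumers. All table, column, and path names are illustrative and are not taken from the actual project.

    -- External table over semi-structured JSON files landed in HDFS
    -- (hypothetical feed; the ingestion pipeline writes one directory
    -- per event_date partition under the LOCATION path).
    CREATE EXTERNAL TABLE IF NOT EXISTS click_events (
        event_time  STRING,
        user_id     STRING,
        page_url    STRING
    )
    PARTITIONED BY (event_date STRING)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/data/raw/click_events';

    -- Register the partitions already written under LOCATION.
    MSCK REPAIR TABLE click_events;

    -- The kind of aggregation whose results were consolidated for
    -- downstream stores and BI reporting.
    CREATE TABLE daily_page_views AS
    SELECT event_date, page_url, COUNT(*) AS view_count
    FROM click_events
    GROUP BY event_date, page_url;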

Sr. Data Warehouse Engineer

Confidential

Responsibilities:

  • As an ETL Developer, implementing ETL processes to design, develop, test, install, migrate, and troubleshoot various ETL mappings.
  • Working closely with business analysts to understand the requirements, needs, and technical metadata and use cases for synchronization of data.
  • Working closely with the team by gathering the functional requirements and working with the project manager to provide a high-level design and the cost associated with the project.
  • Conduct impact analysis for changes, enhancements, upgrades, etc.
  • Creating technical documentation listing the step-by-step solutions provided for requirements, enhancements, or issues in the existing system.
  • Work with Informatica Power Center client tools such as Repository Manager, Designer, Workflow Manager, and Workflow Monitor to create new mappings, sessions, and workflows based on the source-to-target mapping (STTM) document, following the coding standards (the kind of logic such a mapping implements is sketched after this list).
  • Participate in testing and in the preparation of test plans and test cases to ensure the code changes meet the requirements.
  • Code migration to production after certification.
  • Closely monitoring the jobs in production to make sure they are running according to the SLA.
  • Tuning the mappings and sessions to increase the performance of the jobs.
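
Informatica mappings are built graphically in the Designer rather than hand-coded, but the source-to-target logic a typical STTM-driven mapping implements can be sketched in SQL. The sketch below is illustrative only; the staging, reference, and target table names (stg_customer, ref_country, target_customer) are hypothetical and do not describe the actual mappings.

    -- Source-to-target logic of a simple mapping: filter and standardize
    -- staging rows, resolve a code via a lookup, then load the target.
    INSERT INTO target_customer (customer_id, customer_name, country_code, load_date)
    SELECT
        src.cust_id,
        TRIM(UPPER(src.cust_name)),        -- expression transformation
        COALESCE(ref.iso_code, 'UNK'),     -- lookup transformation with default
        CURRENT_DATE                       -- audit column
    FROM stg_customer src
    LEFT JOIN ref_country ref
           ON ref.country_name = src.country
    WHERE src.cust_id IS NOT NULL;         -- filter transformation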

Principal Programmer

Confidential

Responsibilities:

  • Responsibilities include assessing business rules, designing, reviewing, performing source to target data mapping; implementing and optimizing ETL processes.
  • Parsing high-level design specifications into simple ETL coding and mapping standards.
  • Used Informatica Designer to design mappings that populated the data into the target star schema.
  • Lead a team of developers delivering strategic projects and provide support for applications in production.
  • Participate and contribute in defining/adjusting the BI strategy.
  • Architect, design, develop, and deliver BI solutions using Business Objects and Oracle.
  • Lead technology governance groups that define policies and best practices and make design decisions in ongoing projects.
  • Mentor development teams regarding best practices and technology stacks used in solutions.
  • Validate results and identify areas for further analysis.
  • Design and develop Business Objects Universes, Reports, and Dashboards (a representative report query is sketched below).
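
As a representative example of the reporting side of this work, the query below shows the kind of star-join SQL a Business Objects universe generates against a dimensional model. It reuses the hypothetical star schema sketched in the summary section; none of the names refer to an actual client schema.

    -- Hypothetical report query: fact table joined to its dimensions,
    -- aggregated by year and region.
    SELECT
        d.year_number,
        c.region,
        SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_date     d ON d.date_key     = f.date_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY d.year_number, c.region
    ORDER BY d.year_number, c.region;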

Consultant

Confidential

Responsibilities:

  • Perform randomization, CRF design, and database design activities; create views; develop edit checks and data derivation procedures; and create IR reports and SAS data sets to support the implementation of new clinical trials for all DevOps sites (an example edit check is sketched after this list).
  • Generated tables, listings and graphs according to Protocol and Statistical Analysis Plan (SAP).
  • Developed SAS programs using SAS/BASE and SAS/SQL for preparing analysis and reports from databases.
  • Coordinated with clinical, data management and statistics departments in identifying and defining tables/listings and programming, running and testing them. As a programmer worked very closely with the statisticians to ensure that the output is representative and reflects the data contained in the database.
  • Work in close partnership with project teams at other sites to ensure the delivery of study implementation services according to Service Level Agreements (SLAs).
  • Be accountable for the quality and timeliness of the deliverables of the study implementation area of GCDS (Global Clinical Data Services).
  • Ensure that all GCDS study implementation activities are conducted in compliance with relevant regulatory requirements using agreed standards such as GRADES.
  • Manage time and prioritize competing tasks to achieve area specific goals.
  • Contribute to the development and evolution of Service Level Agreements (SLAs) to specify timing and quality of services provided by the data acquisition group within GCDS.
  • Contribute to the development and implementation of change based on WW strategies and standards; and in collaboration with local and global site functional heads.
  • Provide technical input to projects where required.
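
To illustrate the edit-check work referenced above, here is a generic SQL sketch of one such check, flagging visits dated before the subject's informed consent. The table and column names (subject_visits, subject_enrollment) are hypothetical; the actual checks were implemented in the study database per the data validation specifications.

    -- Hypothetical edit check: list visits recorded before the
    -- subject's informed-consent date, for data management review.
    SELECT v.subject_id,
           v.visit_name,
           v.visit_date,
           e.consent_date
    FROM subject_visits v
    JOIN subject_enrollment e
      ON e.subject_id = v.subject_id
    WHERE v.visit_date < e.consent_date;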

Sr. Programmer Analyst

Confidential

Responsibilities:

  • Involved in study design based on the CRF provided by the Clinical Data Management Team Lead (CDMTL). During study design, a Study Definition Matrix (SDM) is created, which specifies the DCM name, Question Group, Questions, DVG names, SAS name, and SAS label for each CRF.
  • Development of data entry screens, which involves creating Question Groups, Questions, DCIs (Data Collection Instruments), and DCMs (Data Collection Modules).
  • Request the Global Librarian to create required DVGs.
  • Scheduling the DCM according to the event specified in the CRF.
  • Creating Oracle Clinical Validation & Derivation procedures based on the request provided by the Data Management Group.
  • Complex derivations are handled through PL/SQL procedures (a minimal sketch follows this list).
  • Creating reports using Brio query for each study.
  • Testing Validation & Derivation procedures and verifying the accuracy of the output.
  • Testing applications according to the Study Design Document & CRF.
  • Generating the SAS extract view report and verifying that all required fields exist.
  • Maintain appropriate study application documentation.
  • Provided support to various user groups on Change requests, Production problems and enhancements with assistance from team members.
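
Because the complex derivations above were handled through PL/SQL, the following is a minimal standalone PL/SQL sketch of a derivation. It is illustrative only: the vitals table and its weight_kg/height_cm/bmi columns are hypothetical, and the sketch does not follow Oracle Clinical's generated derivation-procedure framework.

    -- Hypothetical derivation: compute BMI for one subject where both
    -- inputs are present (illustrative table and column names).
    CREATE OR REPLACE PROCEDURE derive_bmi (p_subject_id IN VARCHAR2) IS
    BEGIN
        UPDATE vitals v
           SET v.bmi = ROUND(v.weight_kg / POWER(v.height_cm / 100, 2), 1)
         WHERE v.subject_id = p_subject_id
           AND v.weight_kg IS NOT NULL
           AND v.height_cm IS NOT NULL;
    END derive_bmi;
    /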
