
Big Data Engineer Resume

San Antonio, TX

EXPERIENCE SUMMARY:

  • Around 6 years of experience in Data Engineering
  • Around 3 years of experience in Big Data/Hadoop technologies
  • Hands-on experience with Spark, RDDs, DataFrames, Spark SQL and components of the Hadoop ecosystem, including Hive, HBase, Pig and Sqoop
  • Experienced in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Experienced in importing and exporting data between Cassandra, AWS and HDFS using the Spark API, and in using Spark SQL to analyze the data.
  • Experienced with AWS S3
  • Clear understanding of MapReduce, YARN and Spark.
  • Good understanding of messaging systems like Kafka and the Dataset API in Spark.
  • Working Knowledge on Spark Streaming.
  • Experienced in creating scripts for Hive table creation and Loading files to Hive tables.
  • Expertise in loading data to Hadoop hive tables from flat files and relational tables.
  • Certified in IBM Info Sphere DataStage v9.1 and US Core Banking
  • Experienced in developing Tableau Reports and Dashboards
  • Experienced in table design and data modelling using CA AllFusion ERwin Data Modeler.
  • Experienced in scheduling and monitoring ETL jobs using BMC Control-M.
  • Strong working experience with databases such as SQL Server, Oracle 11g, DB2 and Netezza
  • Expert in UNIX shell scripts, BASIC routines and awk commands
  • Experienced in designing and developing parallel jobs using DataStage/QualityStage in the DataStage Designer client
  • Extensive experience working with SQL, including query optimization.
  • Certified Oracle Java SE 6 Programmer (OCPJP 6).
  • Experienced in scaled Agile and Waterfall project management; certified Scrum Master
  • Industry experience: core banking, banking collections, insurance
  • Good working experience in client-side programming with D3.js, HTML5, JavaScript and CSS3.
  • Experienced with build and deployment tools such as IBM RTC, Borland StarTeam and GitLab
  • Trained and certified in Core Java, Servlets, JSP, RESTful services, JSON and JDBC.
  • Served as onsite coordinator for an 8-member team, managing the Mexico and Chennai teams

TECHNICAL EXPERTISE:

Hardware / Platforms: Windows, Linux, UNIX

Programming Languages: Java 7.0/J2EE, SQL, Hive, Scala, Shell Script, JavaScript

Hadoop: Hive, Pig, Oozie, Flume, Hbase, Sqoop, Impala, Spark, HDFS, Kafka, AWS S3

ETL Tool: IBM DataStage 11.5/9.2 Clients

Data Model Tool: CA ERwin Data Modeler

Report Tool: Tableau 10.4, D3.js, HTML 5, CSS3

Web Services: RESTful web services

Report Development: HTML5, CSS3, Bootstrap, JavaScript, AJAX and jQuery

Databases: Netezza, Oracle 10g/11g, DB2 LUW, MySQL 5.3, SQL Server

Integration control: GITLAB, IBM RTC

Batch Scheduling tool: BMC Control M

Other tools: IntelliJ IDEA, Eclipse, Oracle SQL Developer 2.1, Squirrel SQL Client, SQL Server Management Studio, PuTTY 0.6, WinSCP 4.2.5, MobaXterm, StarTeam repository, Sublime Text, Eclipse plug-in development, ServiceNow, SoapUI, Notepad++, Rational Team Concert

PROFESSIONAL EXPERIENCE:

Confidential, San Antonio, TX

Big Data Engineer

Responsibilities:

  • Developed Sqoop jobs to ingest data from MySQL into HDFS and created Hive external tables for querying the data.
  • Used the Spark DataFrame API to ingest Oracle data into S3 and load it into Redshift, and wrote a script to move RDBMS data to Redshift.
  • Created RDDs and applied transformations and actions while implementing Spark applications.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark 1.6 for data aggregation and queries, and wrote data back to the OLTP system through Sqoop.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and RDDs.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Loaded data into the Cassandra NoSQL database.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Auto-scaled EMR instances based on data volume, and scheduled and executed Spark scripts on EMR.
  • Validated the source and final output data, testing with the Dataset API instead of RDDs.
  • Designed the dimensional data model using ERwin Data Modeler (star schema, snowflake schema)
  • Prepared high-level design documents, source-to-target mappings and data dictionaries
  • Built data mart and data warehouse tables in Netezza and SQL Server with help from DBAs.
  • Designed and developed parallel jobs using DataStage/QualityStage in the DataStage Designer client
  • Created configuration and parameter files for the reusable shell scripts.
  • Developed reports using Tableau, SAP Lumira and D3.js
  • Used GitHub for version control
  • Proposed and developed dashboards using D3.js for analytics
  • Scheduled, monitored and supported jobs using Control-M
  • Developed a purge script in shell
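
The nested-JSON handling above used the Spark DataFrame API; the sketch below is a hypothetical plain-Python illustration of the same flattening idea (record structure and column names are invented for the example), so the transformation logic is easy to follow outside a Spark cluster.

```python
def flatten(record, parent_key="", sep="."):
    """Recursively flatten a nested dict into dotted column names,
    mirroring how nested JSON fields become flat columns for Hive/SQL."""
    flat = {}
    for key, value in record.items():
        col = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, col, sep))  # descend into nested struct
        else:
            flat[col] = value
    return flat

# Illustrative record (not from the actual pipeline)
row = {"id": 1, "name": "acct",
       "address": {"city": "San Antonio", "geo": {"zip": "78201"}}}
print(flatten(row))
# {'id': 1, 'name': 'acct', 'address.city': 'San Antonio', 'address.geo.zip': '78201'}
```

In Spark itself the equivalent is selecting nested fields with dotted paths (e.g. `address.geo.zip`) into top-level columns before writing to Hive.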

Environment: HDFS, Hive, Pig, Spark, IBM DataStage 11.5, Aginity, JSON, Netezza, DB2, MS SQL, GitLab, Cassandra

Confidential, San Antonio, TX

Big Data Engineer

Responsibilities:

  • Developed Sqoop jobs to ingest data from MySQL into HDFS and created Hive external tables for querying the data.
  • Used the Spark DataFrame API to ingest Oracle data into S3
  • Created RDDs and applied transformations and actions while implementing Spark applications.
  • Developed Scala scripts and UDFs using both DataFrames and RDDs in Spark 1.6 for data aggregation and queries, and wrote data back to the OLTP system through Sqoop.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and RDDs.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, and effective and efficient joins and transformations during the ingestion process itself.
  • Loaded data into the Cassandra NoSQL database.
  • Processed complex/nested JSON and CSV data using the DataFrame API.
  • Validated the source and final output data, testing with the Dataset API instead of RDDs.
  • Created Hive tables and worked with them using HiveQL
  • Designed the dimensional data model using ERwin Data Modeler (star schema, snowflake schema)
  • Prepared high-level design documents, source-to-target mappings and data dictionaries
  • Built data mart and data warehouse tables in Netezza and SQL Server with help from DBAs.
  • Designed and developed parallel jobs using DataStage/QualityStage in the DataStage Designer client
  • Created configuration and parameter files and a purge script for the reusable shell scripts.
  • Developed reports using Tableau, SAP Lumira and D3.js
  • Used GitHub for version control
  • Proposed and developed dashboards using D3.js for analytics
  • Scheduled, monitored and supported jobs using Control-M
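
The source-vs-target validation above was done with the Spark Dataset API; as a hedged, plain-Python sketch of the same reconciliation idea (row counts plus an order-independent checksum, with invented sample rows), it might look like this:

```python
import hashlib

def checksum(rows):
    """Order-independent checksum: hash each row's sorted items,
    then XOR the digests, so row order does not affect the result."""
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).hexdigest()
        acc ^= int(digest, 16)
    return acc

# Illustrative source and target extracts (not real data)
source = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}]
target = [{"id": 2, "amt": 20}, {"id": 1, "amt": 10}]  # same rows, different order

assert len(source) == len(target)          # count check
assert checksum(source) == checksum(target)  # content check
print("source and target reconcile")
```

The XOR-of-hashes trick matters because distributed reads rarely preserve row order, so a naive concatenated hash would spuriously fail.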

Environment: HDFS, Hive, Pig, Spark, IBM DataStage 11.5, Aginity, JSON, Netezza, DB2, MS SQL, GitLab, Cassandra

Confidential

ETL Developer

Responsibilities:

  • Interacted with the onshore team on a daily basis and gathered business requirements
  • Prepared high-level design documents, source-to-target mappings and data dictionaries
  • Performed data modelling and table design using ERwin Data Modeler
  • Created Hive tables and worked with them using HiveQL
  • Developed a file-watcher script to check file availability, initiate cycles and send alerts to the business.
  • Developed DataStage jobs extensively using the Aggregator, Sort, Merge and Data Set stages in Parallel Extender to achieve better job performance.
  • Developed reports using Tableau, SAP Lumira and D3.js
  • Built Hive tables in HDFS for archival purposes
  • Developed DataStage jobs to extract data from HDFS and applied transformations per the ETL design.
  • Used DataStage sequencer jobs extensively to manage interdependencies and run DataStage server/parallel jobs in order.
  • Prepared technical designs/specifications for data extraction, transformation, loading and cleansing.
  • Developed job sequences with restart capability using Job Activity, Exec Command and E-Mail Notification activities and triggers.
  • Created reusable change jobs with the Change Capture stage, producing insert and update datasets for type 2 tables
  • Used surrogate key generator files to generate table sequences for type 2 tables.
  • Used DataStage Designer to develop processes for extracting, cleansing, transforming, integrating and loading data into the data warehouse database
  • Created DataStage jobs to populate data files with lookup values, using the Complex Flat File, Change Capture and Transformer stages.
  • Wrote DataStage jobs in DataStage Designer, with in-depth experience across stages (Column Export, Column Import, Change Capture, Surrogate Key Generator)
  • Created DataStage job run scripts and scheduled them in Control-M
  • Deployed ETL code across environments using IBM RTC and GitLab
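
The change-capture pattern above was implemented with DataStage's Change Capture stage; this plain-Python sketch (with invented keys and columns) shows the underlying idea: compare incoming rows against the current type 2 dimension and split them into insert and update sets.

```python
def change_capture(existing, incoming, key="id"):
    """Split incoming rows into inserts (new keys) and updates (changed rows),
    the same classification the DataStage Change Capture stage produces."""
    current = {row[key]: row for row in existing}
    inserts, updates = [], []
    for row in incoming:
        old = current.get(row[key])
        if old is None:
            inserts.append(row)       # new business key -> new dimension row
        elif old != row:
            updates.append(row)       # changed attributes -> expire old version, add new
        # identical rows are dropped (no change)
    return inserts, updates

# Illustrative dimension snapshot and daily feed (not real data)
existing = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Dallas"}]
incoming = [{"id": 1, "city": "Austin"},    # unchanged
            {"id": 2, "city": "Houston"},   # changed
            {"id": 3, "city": "El Paso"}]   # new
ins, upd = change_capture(existing, incoming)
print(ins, upd)
```

For a type 2 load, the update set would then close out the old row (end-date it) and insert the new version with a fresh surrogate key.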

Environment: HDFS, Hive, Pig, IBM DataStage 9.1, Hadoop, Tableau, Data Modeling, Netezza, DB2, PUTTY 0.6, WinSCP 4.2.5

Confidential

ETL Developer

Responsibilities:

  • Coordinated with the onshore team and gathered business requirements
  • Developed knowledge of the technical and business aspects of the components.
  • Worked on change requests per client and project technical specification needs.
  • Worked on the data architecture of the ETL process.
  • Designed the data model using ERwin Data Modeler (star schema)
  • Developed reports using Tableau
  • Developed jobs to load data into IBM GPFS using Hive
  • Developed ETL jobs using IBM DataStage 9.1/8.x to load data into the data warehouse
  • Created DataStage jobs (ETL processes) to populate the data warehouse from source systems such as ODS, ASCII and binary files, and scheduled them using the DataStage Sequencer for SI testing.
  • Worked on data warehouse development and ETL using IBM DataStage 9.1/8.x
  • Troubleshot performance issues with SQL tuning.
  • Worked with the Change Capture and Slowly Changing Dimension (SCD) stages.
  • Extensively customized Salesforce Object Query Language (SOQL) to extract data from SFDC.
  • Extensively customized DB2 SQL and created truncate and runstats scripts used in before- and after-job routines.
  • Performed unit testing and integration testing of jobs and the overall flow.
  • Performed DataStage development to load the data warehouse using SQL against Oracle, DB2 and Netezza databases, working across DataStage versions on Unix.
  • Collaborated with the team on high-level design documents for extract, transform, validate and load processes, data dictionaries, metadata descriptions, file layouts and flow diagrams.
  • Collaborated with the team on low-level design documents mapping files from source to target and implementing business logic.
  • Generated surrogate keys for dimension and fact tables for indexing and faster data access in the data warehouse.
  • Worked with associate product and component developers on the team.
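
Surrogate-key generation above was done with DataStage's Surrogate Key Generator (backed by a state file); as a minimal sketch under that assumption, the idea is a monotonic counter seeded from the current maximum key (the seed value and column names here are illustrative):

```python
def surrogate_keys(start):
    """Yield an unending sequence of surrogate keys after the given seed,
    standing in for the Surrogate Key Generator stage's state file."""
    key = start
    while True:
        key += 1
        yield key

max_existing_key = 1007  # in practice: SELECT MAX(dim_key) FROM the dimension
gen = surrogate_keys(max_existing_key)

# Assign keys to new dimension rows (illustrative rows)
new_rows = [{"cust": "A"}, {"cust": "B"}]
for row in new_rows:
    row["dim_key"] = next(gen)
print(new_rows)  # dim_key values 1008 and 1009
```

The surrogate key, rather than the business key, is what fact tables join on, which is why it must be generated once and never reused.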

Environment: IBM DataStage 9.1, Hadoop, GPFS, Teradata, DB2, PuTTY 0.6, WinSCP 4.2.5, Microsoft SQL Server

Confidential

ETL Developer

Responsibilities:

  • Extracted data from sources such as Oracle, Sybase and mainframe DB2, transformed it using business logic, and loaded it into the target warehouse using DataStage built-in passive stages such as ODBC, Oracle 8i, Sequential File, and Hashed File for lookups.
  • Developed, validated and scheduled ETL jobs using the DataStage Director.
  • Performed unit testing, integration testing and user acceptance testing in the project.
  • Performed aggregation operations on the extracted input data and filtered data using built-in active stages.
  • Prepared ETL specifications for developing jobs
  • Compiled, validated, ran, monitored and scheduled jobs using Control-M.
  • Worked with the ETL tool to import metadata from the repository, create new job categories and define new data elements
  • Parameterized all file paths to allow runtime flexibility and ease of migration between server environments.
  • Coordinated with the onsite team and clients
  • Created work streams in the IBM RTC client
  • Set up workspaces and checked out the code base from production
  • Merged code into the production code base
  • Created change sets to check code into production
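
The path-parameterization rule above (in DataStage, job parameters and parameter sets) can be sketched in plain Python; every directory, environment name and file name below is an invented example, but the mechanism, resolving `$PARAM` placeholders per environment at run time, is the point:

```python
import string

# Per-environment parameter sets (illustrative paths)
PARAMS = {
    "dev":  {"LANDING_DIR": "/data/dev/landing",  "ARCHIVE_DIR": "/data/dev/archive"},
    "prod": {"LANDING_DIR": "/data/prod/landing", "ARCHIVE_DIR": "/data/prod/archive"},
}

def resolve(template, env):
    """Substitute $PARAM placeholders from the chosen environment's values,
    so the same job definition migrates between environments unchanged."""
    return string.Template(template).substitute(PARAMS[env])

path = "$LANDING_DIR/claims_20240101.dat"
print(resolve(path, "dev"))   # /data/dev/landing/claims_20240101.dat
print(resolve(path, "prod"))  # /data/prod/landing/claims_20240101.dat
```

Because the job only ever references `$LANDING_DIR`, promoting it from dev to prod is a parameter-set change, not a code change.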

Environment: IBM DataStage 9.1, Hadoop, Netezza, DB2, PUTTY 0.6, WinSCP 4.2.5

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Coordinated requirement-gathering meetings with the QA, business and development teams
  • Involved in system coding and testing, performed in accordance with established standards and under supervision.
  • Worked with the Agile-Scrum methodology, participated in all agile ceremonies and played a key role in iteration planning
  • Designed and developed the user interface (GUI) using JSP, HTML, JavaScript, Ajax and CSS
  • Used Hibernate and JDBC to fetch data from the database.
  • Strong hands-on experience with OOP concepts and the core Java API, including exception handling, the Collections API and multithreading
  • Developed several internal tools, highlighted below
  • Used shell scripting in UNIX to run batch processes (Java application programs invoked through script files).
  • Consumed existing SOAP web services
  • Developed a review tool using Java and Swing.
  • Developed and unit tested individual modules using JUnit.
  • Worked on post-production application maintenance, support and bug fixes.

Environment: Java 5, J2EE, JDBC, Hibernate, Spring Core, JSP, WAS 6.0, SOAP, XML, SQL, JMS, Apache Tomcat 5, Oracle 10g, JUnit, IBM RTC, Eclipse plug-in development and Swing.
