
Senior Software/Big Data Engineer Resume




  • Over 7 years of experience in development, implementation and testing of data warehousing and Business Intelligence solutions.
  • 5 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
  • Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase and Spark.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Hive, Pig, and PySpark.
  • Experienced in using distributed computing technologies such as Hadoop, Python and Spark, and in effective use of MapReduce, SQL and Cassandra to solve big data problems.
  • Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Chef.
  • Experience in designing and developing POCs using Scala, Spark SQL and MLlib libraries and deploying them on a YARN cluster.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Hortonworks and Cloudera.
  • Expertise in integration of various data sources like RDBMS, spreadsheets, text files, JSON and XML files.
  • Experience in extracting the data from RDBMS into HDFS using Sqoop.
  • Experience in collecting the logs from log collector into HDFS using Flume.
  • Experience in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), and Dimensional Data Modeling with the Ralph Kimball methodology (Star Schema Modeling, Snowflake Modeling for Fact and Dimension tables) using Analysis Services.
  • Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Experience in NoSQL databases (HBase, Cassandra and MongoDB), database performance tuning and data modeling.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
  • Strong experience with architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra.
  • Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka and ZooKeeper based log collection platform.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.


Hadoop/Big Data: HDFS, MapReduce (M-R), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala

Operating Systems / Environment: Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans

Modeling Tools: UML on Rational Rose, Rational ClearCase, Enterprise Architect, Microsoft Visio

IDEs: Eclipse, NetBeans, JUnit testing tool, Log4j for logging

Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)

Web Servers: WebLogic, WebSphere, Apache Tomcat 7

Build Tools: Maven, Scala Build Tool (SBT), Ant

Operating systems and Virtual Machines: Linux (Red Hat, Ubuntu, CentOS), Oracle VirtualBox, VMware Player, Workstation 11

ETL Tools: Talend for Big data, Informatica


Confidential - OR

Senior Software/Big Data Engineer


  • Worked on building an ingestion framework to ingest data from different sources like Oracle, SQL Server, delimited flat files, XML, Parquet and JSON into Hadoop and building tables in Hive.
  • Worked on building big data analytic solutions to provide near real time and batch data as per Business requirements.
  • Assisted in leading the plan, build, and run phases within the Enterprise Analytics Team.
  • Solved and supported real business issues using knowledge of the Hadoop Distributed File System and open source frameworks.
  • Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining the data architecture.
  • Designed, developed and tested software applications and built automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
  • Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL queries in Spark.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Scheduled multiple Hive and Pig jobs using the Oozie workflow engine.
  • Involved in developing a data quality tool for checking all data ingestion into Hive tables
  • Collaborated with BI teams to ensure data quality and availability with live visualization
  • Designed, developed and maintained workflows in Oozie, integrating Shell, Java, Sqoop, Hive and Spark actions in Oozie workflow nodes to run data pipelines.
  • Designed and supported multi-tenancy on the data platform to allow other teams to run their applications.
  • Used Impala for low latency queries, visualization and faster querying purposes.
  • Created Hive queries to process large sets of structured, semi-structured and unstructured data and store them in managed and external tables.
  • Created HBase tables to load large sets of structured data.
  • Managed and reviewed Hadoop log files.
  • Wrote code to encrypt and decrypt data for PII groups.
  • Performed Real time event processing of data from multiple servers in the organization using Apache Kafka and Flume.
  • Processed JSON files and ingested into Hive tables
  • Used Python to parse XML files and created flat files from them.
  • Used HBase to support front-end applications that retrieve data using row keys.
  • Used Control-M as the enterprise scheduler to schedule all jobs.
  • Used Bitbucket extensively as the code repository.

Environment: Cloudera, Hue, Java, Python, SQL, Shell scripting, Control-M, Oozie, Spark, Sqoop, Bitbucket, Hive, Impala

Confidential - TX

Senior Software/Big Data Engineer


  • Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into HDFS.
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig and Sqoop.
  • Architected, designed and developed business applications and data marts for reporting.
  • Worked with SMEs and conducted JAD sessions; documented the requirements using UML and use case diagrams.
  • Followed the SDLC methodology for data warehouse development, using Kanbanize.
  • Configured Apache Mahout Engine.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources (data lake) and persist the data into HDFS.
  • Developed Big Data solutions focused on pattern matching and predictive modeling.
  • The objective of this project was to build a data lake as a solution using Apache Hadoop.
  • Developed the code to perform data extractions from Oracle Database and load the data into the Hadoop platform using the ingestion framework.
  • Created Hive external tables to stage data and then moved the data from staging to main tables.
  • Exported data from Hive tables into Oracle database.
  • Pulled the data from the data lake (HDFS) and massaged the data with various RDD transformations.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created a data pipeline using Processor Groups and multiple processors in Apache NiFi for flat files and RDBMS as part of a POC using Amazon EC2.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded the data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Used Hortonworks distribution with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in different phases of Development life including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on configuring and managing disaster recovery and backup for Cassandra data.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile and network devices and pushed it to HDFS.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache NiFi 1.7, YARN, HBase, PL/SQL, MongoDB, Pig 0.17, Sqoop 1.4, Apache Flume 1.8

Confidential - Bridgeport, CT

Hadoop Developer


  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Researched, evaluated, architected, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
  • Designed and developed architecture for a data services ecosystem spanning relational, NoSQL, and Big Data technologies.
  • Responsible for the data architecture design delivery, data model development, review, approval and data warehouse implementation.
  • Designed and developed the conceptual then logical and physical data models to meet the needs of reporting.
  • Involved in designing and developing data models and data marts that support the Business Intelligence data warehouse.
  • Implemented logical and physical relational databases and maintained database objects in the data model using Erwin.
  • Responsible for Big Data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN and MapReduce.
  • Performed the data mapping and data design (data modeling) to integrate the data across multiple databases into the EDW.
  • Designed both 3NF data models and dimensional data models using Star and Snowflake schemas.
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
  • Worked on data modeling and advanced SQL with columnar databases using AWS.
  • Performed reverse engineering of the dashboard requirements to model the required data marts.
  • Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
  • Created data migration and cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Conducted meetings with business and development teams for data validation and end-to-end data mapping.
  • Involved in debugging and tuning the PL/SQL code, tuning queries and optimization for the SQL database.
  • Led data migration from legacy systems into modern data integration frameworks from conception to completion.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
  • Generated DDL and created the tables and views in the corresponding architectural layers.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop.
  • Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
  • Participated in code/design reviews and provided input into best practices for reports and universe development.
  • Involved in the validation of the OLAP, unit testing and system testing of the OLAP report functionality and data displayed in the reports.
  • Created a high-level, industry-standard, generalized data model to convert into logical and physical models at later stages of the project using Erwin and Visio.
  • Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.

Environment: Erwin 9.7, HDFS, HBase, Hadoop 3.0, Metadata, MS Visio 2016, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.

Confidential - San Francisco, CA

Data Analyst/Data Engineer


  • Participated in requirement gathering sessions and JAD sessions with users, subject matter experts, architects and BAs.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Worked with business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Planned and defined system requirements as Use Cases, Use Case Scenarios and Use Case Narratives using UML (Unified Modeling Language) methodologies.
  • Participated in JAD sessions involving the discussion of various reporting needs.
  • Reverse engineered the existing data marts and identified the data elements, dimensions, facts and measures required for reports.
  • Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle.
  • Created data dictionaries for various data models to help other teams understand the actual purpose of each table and its columns.
  • Developed the required data warehouse model using Star schema for the generalized model.
  • Involved in designing and developing SQL Server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
  • Used a forward engineering approach for designing and creating databases for the OLAP model.
  • Developed and maintained the Data Dictionary to create Metadata Reports for technical and business purposes.
  • Worked with the BI team in providing SQL queries, Data Dictionaries and mapping documents.
  • Responsible for the analysis of business requirements and design implementation of the business solution.
  • Extensively involved in Data Governance, covering data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
  • Designed and developed Oracle database Tables, Views and Indexes and maintained the databases by deleting and removing old data.
  • Developed Data Mapping, Data Governance, Transformation and Cleansing rules for Data Management involving OLTP, ODS and OLAP.
  • Conducted user interviews, gathered requirements and analyzed the requirements using Rational Rose and RequisitePro (RUP).
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
  • Created E/R Diagrams and Data Flow Diagrams, grouped and created the tables, and validated the data.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Translated business concepts into XML vocabularies by designing Schemas with UML.

Environment: MS Visio 2014, PL/SQL, Oracle 11g, OLAP, XML, OLTP, SQL Server, Transact-SQL.

Confidential, IL

ETL Consultant


  • Worked as an Informatica developer and was involved in creating initial documentation for the project and setting the goals for the Data Integration team from an ETL perspective.
  • Played a key role in designing the application that would migrate the existing data into the Annuity warehouse effectively using Informatica PowerCenter.
  • Parsed high-level design spec to simple ETL coding and mapping standards.
  • Created FTP connections and database connections for the sources and targets.
  • Involved in creating test files and performed testing to check the errors.
  • Loaded data to the interface tables from multiple data sources such as MS Access, SQL Server, flat files and Excel spreadsheets using SQL*Loader, Informatica and ODBC connections.
  • Created different transformations for loading the data into targets, including Source Qualifier, Joiner, Update Strategy, Lookup (connected and unconnected), Rank, Expression, Aggregator, and Sequence Generator.
  • Simplified the data flow by using a Router transformation to check multiple conditions at the same time.
  • Created reusable transformations and mapplets to bring in the common aspects of data flow and avoid complexity in the mappings.
  • Created sessions and sequential and concurrent batches for proper execution of mappings using Workflow Manager.
  • Used shortcuts to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
  • Extensively worked with SQL scripts to validate the pre and post data load.
  • Used Session parameters, Mapping variable/parameters and created Parameter files for imparting flexible runs of workflows based on changing variable values.
  • Responsible for monitoring scheduled, running, completed and failed sessions. Involved in debugging the failed mappings and developing error handling methods.
  • Generated weekly and monthly status reports on the number of incidents handled by the support team.
  • Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
  • Designed a mapplet to update a slowly changing dimension table to keep full history which was used across the board.
  • Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
  • Responsible for post-production support and served as SME for the project.

Environment: Informatica PowerCenter, Oracle 10g, PL/SQL, SQL Server, SQL Developer, Toad, Windows NT, Stored Procedures, Business Intelligence Development Studio, Microsoft Visio 2003, Business Objects.

Confidential, McLean, VA

ETL Developer


  • Developed logical data models, performed reverse engineering and built physical data models of the CRM system using Erwin and InfoSphere.
  • Involved in the design and development of data migration from the legacy system using Oracle Loader and import/export tools for the OLTP system.
  • Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
  • Wrote batch programs to run validation packages.
  • Extensively worked with Informatica PowerCenter (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplets and Transformations) to import source and target definitions into the repository and to build mappings.
  • Made extensive use of stored procedures, functions, packages and user-defined functions.
  • Used indexes to enhance the performance of individual queries and stored procedures for the OLTP system.
  • Dropped and recreated the indexes on tables for performance improvements in the OLTP application.
  • Tuned SQL queries using Show Plans and Execution Plans for better performance.
  • Carried out full life cycle software development processes, especially as they pertain to data movement and data integration.

Environment: Informatica PowerCenter, Oracle 10g, SQL, PL/SQL, DB2, Stored Procedures, UNIX
