
Senior Software/Big Data Engineer Resume - OR


PROFESSIONAL SUMMARY:

  • Over 7 years of experience in the development, implementation, and testing of Data Warehousing and Business Intelligence solutions.
  • 5 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Experience in designing, building, and implementing the complete Hadoop ecosystem, comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase, and Spark.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Hive, Pig, and PySpark.
  • Experienced in using distributed computing architectures such as Hadoop and Spark with Python, and in the effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Chef.
  • Experience in designing and developing POCs using Scala, Spark SQL, and MLlib libraries and deploying them on YARN clusters.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Hortonworks and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Experience in extracting data from RDBMS into HDFS using Sqoop.
  • Experience in collecting logs from log collectors into HDFS using Flume.
  • Experience in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
  • Solid knowledge of Data Marts, Operational Data Stores (ODS), and Dimensional Data Modeling with the Ralph Kimball methodology (Star Schema and Snowflake modeling for fact and dimension tables) using Analysis Services.
  • Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
  • Experience in NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
  • Hands-on experience in Normalization (1NF, 2NF, 3NF, and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments.
  • Experience with Client-Server application development using Oracle PL/SQL, SQL PLUS, SQL Developer, TOAD, and SQL LOADER.
  • Strong experience in architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
  • Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka and ZooKeeper based log collection platform (a minimal sketch of this pattern follows this summary).
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
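
The Scala log producer referenced above can be illustrated with a minimal sketch, not the production implementation; the broker address, topic name, and log path are assumed placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

// Minimal sketch: forward application log lines to a Kafka topic.
// Broker address, topic name, and log path are placeholders, not project details.
object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker-1:9092") // assumed broker
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Send each log line as a record keyed by its source file name.
      for (line <- Source.fromFile("/var/log/app/app.log").getLines())
        producer.send(new ProducerRecord[String, String]("app-logs", "app.log", line))
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```

In the described platform the producer watched logs continuously for incremental output; a file tailer or a Flume/Kafka Connect source would replace the one-shot file read shown here.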

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce (MR), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala

Operating Systems / Environment: Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights

Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans

Modeling Tools: UML on Rational Rose, Rational Clear Case, Enterprise Architect, Microsoft Visio

IDEs / Tools: Eclipse, NetBeans, JUnit (testing), Log4j (logging)

Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)

Web Servers: WebLogic, WebSphere, Apache Tomcat 7

Build Tools: Maven, Scala Build Tool (SBT), Ant

Operating Systems and Virtual Machines: Linux (Red Hat, Ubuntu, CentOS), Oracle VirtualBox, VMware Player, VMware Workstation 11

ETL Tools: Talend for Big data, Informatica

EXPERIENCE:

Confidential - OR

Senior Software/Big Data Engineer

Responsibilities:

  • Worked on building an ingestion framework to ingest data from different sources such as Oracle, SQL Server, delimited flat files, XML, Parquet, and JSON into Hadoop, and on building tables in Hive.
  • Worked on building big data analytic solutions to provide near real time and batch data as per Business requirements.
  • Assisted in leading the plan, building, and running states within the Enterprise Analytics Team
  • Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Performed detailed analysis of business problems and technical environments and used this analysis in designing the solution and maintaining the data architecture.
  • Designed and developed software applications, testing, and building automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
  • Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL in Spark (a minimal sketch of this pattern follows this list).
  • Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Used the Oozie workflow engine to schedule and run multiple Hive and Pig jobs.
  • Involved in developing a data quality tool for checking all data ingestion into Hive tables
  • Collaborated with BI teams to ensure data quality and availability with live visualization
  • Designed, developed, and maintained workflows in Oozie, integrating Shell actions, Java actions, Sqoop actions, Hive actions, and Spark actions in Oozie workflow nodes to run data pipelines.
  • Designed and supported multi-tenancy on the data platform to allow other teams to run their applications.
  • Used Impala for low latency queries, visualization and faster querying purposes.
  • Created Hive queries to process large sets of structured, semi-structured, and unstructured data and store them in managed and external tables.
  • Created HBase tables to load large sets of structured data.
  • Managed and reviewed Hadoop log files.
  • Wrote code to encrypt and decrypt data for PII groups.
  • Performed Real time event processing of data from multiple servers in the organization using Apache Kafka and Flume.
  • Processed JSON files and ingested them into Hive tables.
  • Used Python to parse XML files and created flat files from them.
  • Used HBase to support front-end applications that retrieve data using row keys.
  • Used Control-M as the enterprise scheduler to schedule all jobs.
  • Used Bitbucket extensively as the code repository.
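
As referenced above, the Spark-to-Hive ingestion and non-equi-join work can be summarized with a minimal sketch. The paths, database, and table/column names (analytics.events, analytics.rates, event_ts, valid_from/valid_to) are assumptions for illustration, not project details.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of the Spark-to-Hive ingestion pattern described in the list above.
// Paths, database, table, and column names are illustrative placeholders; the
// analytics.rates table is assumed to already exist.
object IngestToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ingest-to-hive-sketch")
      .enableHiveSupport() // lets Spark read/write Hive metastore tables
      .getOrCreate()

    // Land raw JSON as Parquet under the location of a Hive external table.
    spark.read.json("hdfs:///data/raw/events/")
      .write.mode("overwrite").parquet("hdfs:///data/curated/events/")

    // Declare the external table over the curated location (schema assumed).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS analytics.events
        |(event_id STRING, amount DOUBLE, event_ts TIMESTAMP)
        |STORED AS PARQUET
        |LOCATION 'hdfs:///data/curated/events/'""".stripMargin)

    // A non-equi join in Spark SQL: pick the rate effective at event time.
    spark.sql(
      """SELECT e.event_id, e.amount * r.rate AS converted_amount
        |FROM analytics.events e
        |JOIN analytics.rates r
        |  ON e.event_ts >= r.valid_from AND e.event_ts < r.valid_to""".stripMargin)
      .show(10)

    spark.stop()
  }
}
```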

Environment: Cloudera, Hue, Java, Python, SQL, Shell scripting, Talend, Control-M, Oozie, Spark, Sqoop, Bitbucket, Scala, Hive, Impala.

Confidential - TX

Senior Software/Big Data Engineer

Responsibilities:

  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
  • Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
  • Architected, designed, and developed business applications and data marts for reporting.
  • Worked with SMEs, conducted JAD sessions, and documented the requirements using UML and use case diagrams.
  • Followed an SDLC methodology for Data Warehouse development, tracked using Kanbanize.
  • Configured Apache Mahout Engine.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • The objective of this project was to build a data lake as a solution using Apache Hadoop.
  • Developed the code to perform Data extractions from Oracle Database and load it into Hadoop platform using ingestion framework.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Worked on exporting data from Hive tables into the Oracle database.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS and HBase into Spark RDDs and implemented in-memory computation to generate the output response.
  • Developed complete end-to-end big data processing in the Hadoop ecosystem.
  • Used Hortonworks distribution with Infrastructure Provisioning / Configuration.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release as per the business requirements.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Implemented partitioning, dynamic partitions, and buckets in Hive (a minimal Spark sketch of the dynamic-partition insert follows this list).
  • Used Flume to collect, aggregate, and store web log data from different sources such as web servers, mobile, and network devices, and pushed it to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
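
A minimal sketch of the metric aggregation and dynamic-partition insert described in this list is shown below. The database, table, and column names (lake.transactions, reporting.daily_totals, txn_date, region, amount) are assumed, and the target table is assumed to already exist partitioned by txn_date.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

// Minimal sketch of a partitioned/bucketed aggregation written back to a Hive
// table via dynamic partitioning. All names are assumed for illustration.
object PartitionedAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-aggregation-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Enable non-strict dynamic partitioning before inserting by partition column.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Aggregate daily totals from a partitioned, bucketed source table.
    val daily = spark.table("lake.transactions")
      .groupBy(col("txn_date"), col("region"))
      .agg(sum(col("amount")).alias("total_amount"))

    // Write the metrics into a Hive table partitioned by txn_date
    // (partition column goes last in the SELECT list).
    daily.createOrReplaceTempView("daily_metrics")
    spark.sql(
      """INSERT OVERWRITE TABLE reporting.daily_totals PARTITION (txn_date)
        |SELECT region, total_amount, txn_date FROM daily_metrics""".stripMargin)

    spark.stop()
  }
}
```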

Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Microsoft Azure, Apache Nifi 1.7, Yarn, HBase, PL/SQL, MongoDB, Pig 0.17, Sqoop 1.4, Apache Flume 1.8

Confidential - Bridgeport, CT

Hadoop Developer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
  • Designed and developed the conceptual, logical, and physical data models to meet the needs of reporting.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN, and MapReduce.
  • Performed the Data Mapping and Data design (Data Modeling) to integrate the data across multiple databases into the EDW.
  • Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
  • Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
  • Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
  • Performed reverse engineering of the dashboard requirements to model the required data marts.
  • Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
  • Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
  • Handled performance requirements for databases in OLTP and OLAP models.
  • Conducted meetings with business and development teams for data validation and end-to-end data mapping.
  • Involved in debugging and tuning PL/SQL code, tuning queries, and optimization for the SQL database.
  • Led data migration from legacy systems into modern data integration frameworks from conception to completion.
  • Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
  • Generated DDL and created the tables and views in the corresponding architectural layers.
  • Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop (a rough Spark-based analogue is sketched after this list).
  • Involved in performing extensive Back-End testing by writing SQL queries and PL/SQL stored procedures to extract the data from SQL Database.
  • Participated in code/design reviews and provided input into best practices for report and universe development.
  • Involved in the validation of the OLAP, Unit testing and System Testing of the OLAP Report Functionality and data displayed in the reports.
  • Created a high-level industry standard, generalized data model to convert it into logical and physical model at later stages of the project using Erwin and Visio.
  • Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
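
The MySQL-to-HDFS movement above was done with Sqoop; as a rough Spark/Scala analogue (not the Sqoop command itself), the same import can be sketched as below. The connection details, table name, and partition bounds are placeholders, and a MySQL JDBC driver is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Rough Spark analogue of the MySQL-to-HDFS import described above.
// The original work used Sqoop; host, schema, table, and bounds here are assumed.
object MySqlToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mysql-to-hdfs-sketch")
      .getOrCreate()

    // Read the source table in parallel JDBC partitions.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/sales") // assumed host/schema
      .option("dbtable", "orders")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .option("numPartitions", "4")
      .option("partitionColumn", "order_id")
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .load()

    // Persist to HDFS as Parquet for downstream MapReduce/Hive processing.
    orders.write.mode("overwrite").parquet("hdfs:///data/staging/orders/")

    spark.stop()
  }
}
```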

Environment: Erwin 9.7, HDFS, HBase, Hadoop 3.0, Metadata, MS Visio 2016, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.

Confidential - San Francisco, CA

Data Analyst/Data Engineer

Responsibilities:

  • Participated in requirement gathering and JAD sessions with users, Subject Matter Experts, Architects, and BAs.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
  • Participated in JAD sessions involving the discussion of various reporting needs.
  • Reverse engineered the existing data marts and identified the Data Elements, Dimensions, Facts, and Measures required for reports.
  • Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle.
  • Created data dictionaries for various data models to help other teams understand the actual purpose of each table and its columns.
  • Developed the required data warehouse model using Star schema for the generalized model.
  • Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
  • Used forward engineering approach for designing and creating databases for OLAP model.
  • Developed and maintained Data Dictionary to create Metadata Reports for technical and business purpose.
  • Worked with BI team in providing SQL queries, Data Dictionaries and mapping documents.
  • Responsible for the analysis of business requirements and design implementation of the business solution.
  • Extensively involved in Data Governance that involved data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
  • Designed and Developed Oracle database Tables, Views, Indexes and maintained the databases by deleting and removing old data.
  • Developed Data mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
  • Conducted user interviews, gathered requirements, and analyzed the requirements using Rational Rose, RequisitePro, and RUP.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, and OOD (Object-Oriented Design) using UML and Visio.
  • Created E/R Diagrams, Data Flow Diagrams, grouped and created the tables, validated the data.
  • Designed the data marts in dimensional data modeling using star and snowflake schemas.
  • Translated business concepts into XML vocabularies by designing Schemas with UML.

Environment: MS Visio 2014, PL/SQL, Oracle 11g, OLAP, XML, OLTP, SQL server, Transact-SQL.

Confidential, IL

ETL Consultant

Responsibilities:

  • Worked as an Informatica developer; involved in creating the initial documentation for the project and setting the goals for the Data Integration team from an ETL perspective.
  • Played a key role in designing the application that would migrate the existing data into the Annuity warehouse effectively using Informatica PowerCenter.
  • Parsed high-level design spec to simple ETL coding and mapping standards.
  • Created FTP connections and database connections for the sources and targets.
  • Involved in creating test files and performed testing to check the errors.
  • Loaded Data to the Interface tables from multiple data sources such as MS Access, SQL Server, Flat files and Excel Spreadsheets using SQL Loader, Informatica and ODBC connection.
  • Created different transformations for loading the data into targets, such as Source Qualifier, Joiner, Update Strategy, Lookup (connected and unconnected), Rank, Expression, Aggregator, and Sequence Generator transformations.
  • Simplified the data flow by using a Router transformation to check multiple conditions at the same time.
  • Created reusable transformations and mapplets to encapsulate common aspects of the data flow and avoid complexity in the mappings.
  • Created sessions, sequential and concurrent batches for proper execution of mappings using workflow manager.
  • Used shortcuts to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
  • Extensively worked with SQL scripts to validate the pre and post data load.
  • Used session parameters and mapping variables/parameters, and created parameter files to enable flexible runs of workflows based on changing variable values.
  • Responsible for monitoring scheduled, running, completed and failed sessions. Involved in debugging the failed mappings and developing error handling methods.
  • Generated weekly and monthly status reports on the number of incidents handled by the support team.
  • Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
  • Designed a mapplet, used across the board, to update a slowly changing dimension table and keep full history (the Type 2 logic is illustrated in the sketch after this list).
  • Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
  • Responsible for post-production support and served as SME for the project.
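
The slowly changing dimension mapplet above kept full history (Type 2). A conceptual Spark/Scala sketch of the same logic is shown below; it is not Informatica code, and the table and column names (dw.customer_dim, stg.customer_updates, address, the SCD flag/date columns) are assumed for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, current_date, lit}

// Conceptual illustration of Type 2 SCD processing in Spark, standing in for the
// Informatica mapplet described above. Assumed dimension layout:
// customer_id, address, is_current, start_date, end_date.
object ScdType2Sketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scd-type2-sketch")
      .enableHiveSupport()
      .getOrCreate()

    val dim = spark.table("dw.customer_dim").where(col("is_current") === true).alias("d")
    val stg = spark.table("stg.customer_updates").alias("s") // customer_id, address

    // Current rows whose tracked attribute changed in the incoming feed.
    val changed = dim.join(stg, col("d.customer_id") === col("s.customer_id"))
      .where(col("d.address") =!= col("s.address"))

    // Close out the old version of each changed row.
    val expired = changed.select(
      col("d.customer_id"), col("d.address"),
      lit(false).alias("is_current"),
      col("d.start_date"),
      current_date().alias("end_date"))

    // Open a new current version carrying the updated attribute.
    val opened = changed.select(
      col("s.customer_id"), col("s.address"),
      lit(true).alias("is_current"),
      current_date().alias("start_date"),
      lit(null).cast("date").alias("end_date"))

    // Keep full history: prior versions, unchanged current rows, expired and new versions.
    val changedKeys = changed.select(col("d.customer_id").alias("customer_id"))
    val history = spark.table("dw.customer_dim").where(col("is_current") === false)
    val unchangedCurrent = spark.table("dw.customer_dim")
      .where(col("is_current") === true)
      .join(changedKeys, Seq("customer_id"), "left_anti")

    // Written to a staging table and swapped in afterwards, since the source
    // table cannot be overwritten while it is being read.
    history.unionByName(unchangedCurrent).unionByName(expired).unionByName(opened)
      .write.mode("overwrite").saveAsTable("dw.customer_dim_staged")

    spark.stop()
  }
}
```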

Environment: Informatica PowerCenter, Talend, Oracle 10g, PL/SQL, SQL Server, SQL Developer, TOAD, Windows NT, Stored Procedures, Business Intelligence Development Studio, Microsoft Visio 2003, Business Objects.

Confidential, McLean, VA

ETL Developer

Responsibilities:

  • Experienced in logical data modeling, reverse engineering, and physical data modeling of the CRM system using Erwin and InfoSphere.
  • Involved in the design and development of Data Migration from the legacy system using Oracle Loader and import/export tools for the OLTP system.
  • Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
  • Wrote batch programs to run validation packages.
  • Extensively worked on Informatica PowerCenter (Source Analyzer, Warehouse Designer, Mapping Designer, Mapplets, and Transformations) to import source and target definitions into the repository and to build mappings.
  • Made extensive use of stored procedures, functions, packages, and user-defined functions.
  • Made proper use of indexes to enhance the performance of individual queries and stored procedures for the OLTP system.
  • Dropped and recreated the Indexes on tables for performance improvements for OLTP application
  • Tuned SQL queries using Show Plans and Execution Plans for better performance
  • Followed full life cycle software development processes, especially as they pertain to data movement and data integration.

Environment: Informatica PowerCenter, Talend, Oracle 10g, SQL, PL/SQL, DB2, Stored Procedures, UNIX
