Senior Software/Big Data Engineer Resume
OR
SUMMARY
- Over 7 years of experience in the development, implementation and testing of Data Warehousing and Business Intelligence solutions
- 5 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing and implementing data models for enterprise-level applications and systems.
- Experience in designing, building and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Pig, Sqoop, Oozie, HBase and Spark.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Hive, Pig, and PySpark.
- Experienced in using distributed computing architectures such as Hadoop and Spark, and in the effective use of MapReduce, Python, SQL and Cassandra to solve big data problems
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Chef.
- Experience in developing and designing POCs using Scala, Spark SQL and MLlib libraries, then deploying them on the YARN cluster
- Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Hortonworks and Cloudera.
- Expertise in integration of various data sources like RDBMS, spreadsheets, text files, JSON and XML files.
- Experience in extracting the data from RDBMS into HDFS using Sqoop.
- Experience in collecting the logs from log collector into HDFS using Flume.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper
- Solid knowledge of Data Marts, Operational Data Store (ODS), and Dimensional Data Modeling with the Ralph Kimball Methodology (Star Schema and Snowflake Modeling for Fact and Dimension tables) using Analysis Services
- Expertise in Data Architecture, Data Modeling, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import, and Data Export through the use of multiple ETL tools such as Informatica PowerCenter.
- Experience in NoSQL databases - HBase, Cassandra & MongoDB - database performance tuning & data modeling.
- Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Hands-on experience in Normalization (1NF, 2NF, 3NF and BCNF) and Denormalization techniques for effective and optimum performance in OLTP and OLAP environments
- Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
- Strong experience in architecting highly performant databases using PostgreSQL, PostGIS, MySQL and Cassandra
- Experienced in implementing a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka- and ZooKeeper-based log collection platform
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce (MR), Hue, Hive, Pig, HBase, Impala, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark with Scala
Operating Systems / Environment: Windows, Ubuntu, Linux, iOS, Cloudera CDH, EC2, S3, IBM BigInsights
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, Java Beans
Modeling Tools: UML on Rational Rose, Rational Clear Case, Enterprise Architect, Microsoft Visio
IDEs: Eclipse, NetBeans; JUnit testing tool; Log4j for logging
Databases: Oracle, DB2, MS SQL Server, MySQL, MS Access, Teradata, NoSQL (HBase, MongoDB, Cassandra)
Web Servers: WebLogic, WebSphere, Apache Tomcat 7
Build Tools: Maven, Scala Build Tool (SBT), Ant
Operating systems and Virtual Machines: Linux (Red Hat, Ubuntu, Centos), Oracle virtual box, VMware player, Workstation 11
ETL Tools: Talend for Big data, Informatica
PROFESSIONAL EXPERIENCE
Confidential - OR
Senior Software/Big Data Engineer
Responsibilities:
- Worked on building an ingestion framework to ingest data from different sources like Oracle, SQL Server, delimited flat files, XML, Parquet and JSON into Hadoop, and building tables in Hive
- Worked on building big data analytic solutions to provide near real time and batch data as per Business requirements.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team
- Engaged in solving and supporting real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
- Performed detailed analysis of business problems and technical environments, and used this data in designing the solution and maintaining the data architecture
- Designed and developed software applications, tests, and build automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
- Worked on building a Spark framework to ingest data into Hive external tables and run complex computational and non-equi-join SQL in Spark.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
- Used the Oozie workflow engine to schedule and run multiple Hive and Pig jobs.
- Involved in developing a data quality tool for checking all data ingestion into Hive tables
- Collaborated with BI teams to ensure data quality and availability with live visualization
- Designed, developed and maintained workflows in Oozie, integrating Shell, Java, Sqoop, Hive and Spark actions in Oozie workflow nodes to run data pipelines
- Designed and supported multi-tenancy on our data platform to allow other teams to run their applications
- Used Impala for low latency queries, visualization and faster querying purposes.
- Created Hive queries to process large sets of structured, semi-structured and unstructured data and store it in Managed and External tables.
- Created HBase tables to load large sets of structured data.
- Managed and reviewed Hadoop log files.
- Coded to ENCRYPT/DECRYPT data for PII groups.
- Performed Real time event processing of data from multiple servers in the organization using Apache Kafka and Flume.
- Processed JSON files and ingested into Hive tables
- Used python to parse XML files and created flat files from them.
- Used Hbase to support front end applications that retrieve data using row keys.
- Used Control-M as Enterprise Scheduler to schedule all our jobs
- Used Bit-Bucket extensively for code repository
Environment: Cloudera, Hue, Java, Python, SQL, shell scripting, Control-M, Oozie, Spark, Sqoop, Bitbucket, Hive, Impala
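The XML-to-flat-file parsing mentioned in the responsibilities above can be sketched with the Python standard library; the element names, fields and delimiter here are invented for illustration, not taken from the actual project:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; real feeds would be read from files on an edge node.
SAMPLE_XML = """<customers>
  <customer id="1001"><name>Acme Corp</name><state>OR</state></customer>
  <customer id="1002"><name>Globex</name><state>TX</state></customer>
</customers>"""

def xml_to_delimited(xml_text: str, delimiter: str = "|") -> str:
    """Flatten <customer> records into a delimited flat-file body
    suitable for loading into a Hive external table."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["id", "name", "state"])  # header row
    for cust in root.iter("customer"):
        writer.writerow([cust.get("id"), cust.findtext("name"), cust.findtext("state")])
    return out.getvalue()

print(xml_to_delimited(SAMPLE_XML))
```

In practice the resulting flat file would be copied to HDFS and exposed through a Hive external table definition.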
Confidential - TX
Senior Software/Big Data Engineer
Responsibilities:
- Implemented the Big Data solution using Hadoop, Hive and Informatica to pull/load the data into the HDFS system.
- Installed and configured Hadoop ecosystem components like HBase, Flume, Pig and Sqoop.
- Architected, designed and developed business applications and Data Marts for reporting.
- Worked with SMEs and conducted JAD sessions; documented the requirements using UML and use case diagrams
- Used an SDLC methodology for Data Warehouse development using Kanbanize.
- Configured Apache Mahout Engine.
- Used Agile (SCRUM) methodologies for Software Development.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Developed Big Data solutions focused on pattern matching and predictive modeling
- The objective of this project was to build a data lake as a solution using Apache Hadoop.
- Developed the code to perform data extractions from the Oracle database and load it into the Hadoop platform using the ingestion framework.
- Created Hive external tables to stage data and then moved the data from staging to main tables
- Worked on exporting data from Hive tables into the Oracle database.
- Pulled the data from the data lake (HDFS) and massaged the data with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
- Created a data pipeline using processor groups and multiple processors in Apache NiFi for flat files and RDBMS as part of a POC using Amazon EC2.
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded the data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem
- Used Hortonworks distribution with Infrastructure Provisioning / Configuration.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Involved in different phases of Development life including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup of Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from Cassandra through Sqoop, placed them in HDFS and processed them.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Used Flume to collect, aggregate, and store web log data from different sources like web servers, mobile and network devices, and pushed it to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Apache Spark 2.3, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.7, Yarn, HBase, PL/SQL, MongoDB, Pig 0.17, Sqoop 1.4, Apache Flume 1.8
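The bucketing technique mentioned above can be illustrated in plain Python; this is a simplified stand-in for Hive's bucket assignment (Hive applies its own hash function internally), with invented keys:

```python
from collections import defaultdict

def assign_bucket(key: str, num_buckets: int) -> int:
    """Simplified bucket assignment: hash the key and take it modulo the
    bucket count, so equal keys always land in the same bucket."""
    return sum(ord(c) for c in key) % num_buckets

def bucket_rows(rows, key_field, num_buckets):
    """Group rows into buckets by the hashed value of the bucketing column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[assign_bucket(row[key_field], num_buckets)].append(row)
    return buckets

rows = [{"id": "a1"}, {"id": "b2"}, {"id": "c3"}, {"id": "a1"}]
buckets = bucket_rows(rows, "id", 4)
```

Because identical keys hash to the same bucket, joins and sampling on the bucketed column can skip most of the data, which is what makes bucketed Hive tables faster to query.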
Confidential - Bridgeport, CT
Hadoop Developer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Researched, evaluated, architected, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
- Designed and developed the architecture for a data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Responsible for the data architecture design delivery, data model development, review, approval and Data Warehouse implementation.
- Designed and developed the conceptual, logical and physical data models to meet the needs of reporting.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational databases and maintained database objects in the data model using Erwin.
- Responsible for Big Data initiatives and engagement, including analysis, brainstorming, POCs, and architecture.
- Worked with the Hadoop ecosystem, covering HDFS, HBase, YARN and MapReduce.
- Performed the Data Mapping and Data design (Data Modeling) to integrate the data across multiple databases into the EDW.
- Designed both 3NF data models and dimensional data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Worked with SQL Server Analysis Services (SSAS) and SQL Server Reporting Service (SSRS).
- Worked on data modeling and advanced SQL with columnar databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Cleansed, extracted and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
- Involved in debugging and tuning the PL/SQL code, tuning queries, and optimizing the SQL database.
- Led data migration from legacy systems into modern data integration frameworks, from conception to completion.
- Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy DB2 and SQL Server database systems.
- Generated DDL and created the tables and views in the corresponding architectural layers.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Involved in performing extensive back-end testing by writing SQL queries and PL/SQL stored procedures to extract the data from the SQL database.
- Participated in code/design reviews and provided input into best practices for report and universe development.
- Involved in the validation of the OLAP reports: unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
- Created a high-level, industry-standard, generalized data model and converted it into logical and physical models at later stages of the project using Erwin and Visio.
- Involved in translating business needs into long-term architecture solutions and reviewing object models, data models and metadata.
Environment: Erwin 9.7, HDFS, HBase, Hadoop 3.0, Metadata, MS Visio 2016, SQL Server 2016, SDLC, PL/SQL, ODS, OLAP, OLTP, flat files.
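A source-to-target mapping document of the kind described above can be sketched as a simple rename-and-validate step; the source and target column names below are invented for illustration:

```python
# Hypothetical source-to-target mapping, in the spirit of the mapping
# documents described above (column names are invented).
MAPPING = {
    "src_cust_nm": "customer_name",
    "src_cust_dob": "date_of_birth",
    "src_st_cd": "state_code",
}

def apply_mapping(source_row: dict, mapping: dict) -> dict:
    """Rename source columns to their EDW target names, failing loudly on
    any source column that has no mapping entry."""
    unmapped = [c for c in source_row if c not in mapping]
    if unmapped:
        raise KeyError(f"unmapped source columns: {unmapped}")
    return {mapping[c]: v for c, v in source_row.items()}

row = {"src_cust_nm": "Jane Doe", "src_cust_dob": "1980-01-01", "src_st_cd": "CT"}
print(apply_mapping(row, MAPPING))
```

Failing on unmapped columns is the point: a mapping document is only trustworthy if every source element is accounted for in the target model.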
Confidential - San Francisco, CA
Data Analyst/Data Engineer
Responsibilities:
- Participated in requirement gathering session, JAD sessions with users, Subject Matter experts, Architect's and BAs.
- Optimized and updated UML models (Visio) and Relational Data Models for various applications.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Planned and defined system requirements to Use Case, Use Case Scenario and Use Case Narrative using the UML (Unified Modeling Language) methodologies.
- Participated in JAD sessions involving the discussion of various reporting needs.
- Reverse engineered the existing data marts and identified the Data Elements, Dimensions, Facts and Measures required for reports.
- Extensively used PL/SQL in writing database packages, stored procedures, functions and triggers in Oracle.
- Created data dictionaries for various data models to help other teams understand the actual purpose of each table and its columns.
- Developed the required data warehouse model using a Star schema for the generalized model.
- Involved in designing and developing SQL server objects such as Tables, Views, Indexes (Clustered and Non-Clustered), Stored Procedures and Functions in Transact-SQL.
- Used forward engineering approach for designing and creating databases for OLAP model.
- Developed and maintained a Data Dictionary to create metadata reports for technical and business purposes.
- Worked with the BI team in providing SQL queries, Data Dictionaries and mapping documents.
- Responsible for the analysis of business requirements and design implementation of the business solution.
- Extensively involved in Data Governance, covering data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
- Designed and developed Oracle database tables, views and indexes, and maintained the databases by deleting and removing old data.
- Developed Data Mapping, Data Governance, Transformation and Cleansing rules for the Data Management involving OLTP, ODS and OLAP.
- Conducted user interviews and gathered requirements, analyzing the requirements using Rational Rose, Requisite Pro and RUP.
- Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
- Created E/R Diagrams and Data Flow Diagrams, grouped and created the tables, and validated the data.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Translated business concepts into XML vocabularies by designing Schemas with UML.
Environment: MS Visio 2014, PL/SQL, Oracle 11g, OLAP, XML, OLTP, SQL server, Transact-SQL.
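The data-dictionary work described above amounts to flattening table metadata into a readable report; a minimal sketch, with invented star-schema table and column definitions:

```python
# Invented example definitions for a dimension table and a fact table.
TABLES = {
    "dim_customer": [("customer_key", "INT", "Surrogate key"),
                     ("customer_name", "VARCHAR(100)", "Full legal name")],
    "fact_sales": [("customer_key", "INT", "FK to dim_customer"),
                   ("sale_amount", "DECIMAL(12,2)", "Sale amount in USD")],
}

def data_dictionary(tables: dict) -> list:
    """Flatten table metadata into 'table.column<TAB>type<TAB>description'
    lines, sorted by table name for a stable report."""
    lines = []
    for table, cols in sorted(tables.items()):
        for name, dtype, desc in cols:
            lines.append(f"{table}.{name}\t{dtype}\t{desc}")
    return lines

for line in data_dictionary(TABLES):
    print(line)
```

In a real project the definitions would be pulled from the modeling tool or the database catalog rather than hand-written.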
Confidential, IL
ETL Consultant
Responsibilities:
- Worked as an Informatica developer and involved in creation of initial documentation for the project and setting the goals for Data Integration team from ETL perspective.
- Played a key role in designing the application that would effectively migrate the existing data into the Annuity warehouse using Informatica PowerCenter.
- Parsed high-level design spec to simple ETL coding and mapping standards.
- Created FTP connections and database connections for the sources and targets.
- Involved in creating test files and performed testing to check the errors.
- Loaded Data to the Interface tables from multiple data sources such as MS Access, SQL Server, Flat files and Excel Spreadsheets using SQL Loader, Informatica and ODBC connection.
- Created different transformations for loading the data into targets, such as Source Qualifier, Joiner, Update Strategy, Lookup (connected and unconnected), Rank, Expression, Aggregator, and Sequence Generator transformations.
- Simplified the data flow by using a Router transformation to check multiple conditions at the same time.
- Created reusable transformations and mapplets to encapsulate the common aspects of the data flow and avoid complexity in the mappings.
- Created sessions, sequential and concurrent batches for proper execution of mappings using workflow manager.
- Used shortcuts to reuse objects without creating multiple objects in the repository and inherit changes made to the source automatically.
- Extensively worked with SQL scripts to validate the pre and post data load.
- Used Session parameters, Mapping variable/parameters and created Parameter files for imparting flexible runs of workflows based on changing variable values.
- Responsible for monitoring scheduled, running, completed and failed sessions. Involved in debugging the failed mappings and developing error handling methods.
- Generated weekly and monthly report Status for the number of incidents handled by the support team.
- Maintained source and target mappings, transformation logic and processes to reflect the changing business environment over time.
- Designed a mapplet to update a slowly changing dimension table to keep full history which was used across the board.
- Maintained documentation for corporate Data Dictionary with attributes, table names and constraints.
- Responsible for post production support and SME to the project.
Environment: Informatica PowerCenter, Oracle 10g, PL/SQL, SQL Server, SQL Developer, TOAD, Windows NT, Stored Procedures, Business Intelligence Development Studio, Microsoft Visio 2003, Business Objects.
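The slowly-changing-dimension mapplet described above (Type 2: keep full history) can be sketched in plain Python; field names and sample values are hypothetical, and the real implementation lived in Informatica, not Python:

```python
from datetime import date

def scd2_upsert(dimension: list, key: str, new_attrs: dict, today: date) -> None:
    """Type-2 SCD update: expire the current row for `key` if its
    attributes changed, then append a new current row, preserving history."""
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            if all(row[k] == v for k, v in new_attrs.items()):
                return  # no change: nothing to do
            row["end_date"] = today  # close out the old version
            break
    dimension.append({"key": key, **new_attrs,
                      "start_date": today, "end_date": None})

dim = []
scd2_upsert(dim, "C1", {"city": "Chicago"}, date(2020, 1, 1))
scd2_upsert(dim, "C1", {"city": "Denver"}, date(2021, 6, 1))
# dim now holds the expired Chicago row and the current Denver row
```

The design choice here is that history is never overwritten: every change appends a row, and "current" is simply the row whose end date is open.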
Confidential, McLean, VA
ETL Developer
Responsibilities:
- Experience in developing logical data models, reverse engineering and physical data modeling of a CRM system using Erwin and InfoSphere.
- Involved in the design and development of Data Migration from the legacy system using Oracle Loader and import/export tools for the OLTP system.
- Worked closely with the Data Business Analyst to ensure the process stays on track, develop consensus on data requirements, and document data element/data model requirements via the approved process and templates.
- Was involved in writing Batch Programs to run Validation Packages.
- Extensively worked on Informatica Power Center-Source analyzer, Data warehousing designer, Mapping Designer, Mapplet and Transformations to import source and target definitions into the repository and to build mappings.
- Extensive use of Stored Procedures/Functions/Packages and User-Defined Functions
- Made proper use of indexes to enhance the performance of individual queries and stored procedures for the OLTP system
- Dropped and recreated the indexes on tables for performance improvements in the OLTP application
- Tuned SQL queries using Showplans and Execution Plans for better performance
- Followed full life cycle software development processes, especially as they pertain to data movement and data integration
Environment: Informatica PowerCenter, Oracle 10g, SQL, PL/SQL, DB2, Stored Procedures, UNIX
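The index-tuning work described above (dropping/recreating indexes and reading execution plans) can be demonstrated with SQLite as a stand-in for the production Oracle/DB2 databases; the table and column names are invented:

```python
import sqlite3

# In-memory database with a hypothetical OLTP table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE policy (policy_id INTEGER, holder_name TEXT)")
conn.executemany("INSERT INTO policy VALUES (?, ?)",
                 [(i, f"holder_{i}") for i in range(1000)])

def query_plan(conn):
    """Return SQLite's execution-plan text for a point lookup on policy_id."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM policy WHERE policy_id = 500").fetchall()
    return " ".join(r[-1] for r in rows)

before = query_plan(conn)  # no index yet: the planner scans the table
conn.execute("CREATE INDEX idx_policy_id ON policy (policy_id)")
after = query_plan(conn)   # with the index: the planner searches the index

print(before)
print(after)
```

The same workflow applies on Oracle or SQL Server, just with their own plan tools (Explain Plan, Showplan) in place of SQLite's `EXPLAIN QUERY PLAN`.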