Sr. Big Data Engineer Resume
New York, NY
SUMMARY:
- Over 8 years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
- Good experience working with ETL tools such as SSIS and Informatica and with reporting tools such as SQL Server Reporting Services (SSRS).
- Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expert-level server-side development using J2EE, J2SE, EJB, Spring, Servlets, Python, and C++ on Windows, Unix, and Linux platforms.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
- Experience in cloud development architecture on Amazon AWS (EC2, Redshift) and basic experience with Azure.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Excellent Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Experience in dimensional data modeling, star and snowflake schemas, and fact and dimension tables.
- Strong knowledge of Spark, along with Scala, for handling large-scale data processing in streaming pipelines.
- Thorough knowledge of writing DDL, DML, and transaction queries in SQL for Oracle and Teradata databases.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
- Experience in designing components with UML: use case, class, sequence, deployment, and component diagrams for the requirements.
- Good working knowledge of UNIX commands, such as changing file and group permissions.
- Experience in building reports using SQL Server Reporting Services and Crystal Reports.
- Experience in data transformation, source-to-target data mapping, and data cleansing procedures.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Experience in text analytics and in developing statistical machine learning and data mining solutions to various business problems.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in source databases and to track data quality.
- Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop, as in the sketch below.
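As an illustration of the Sqoop transfers called out above, the following is a minimal sketch of a Sqoop import driven from Python; the JDBC URL, credentials path, table, and HDFS directory are hypothetical placeholders rather than details from any specific engagement.

```python
import subprocess

def sqoop_import_table(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Run a Sqoop import that copies one RDBMS table into HDFS.

    All connection details passed in are hypothetical placeholders.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,             # e.g. jdbc:mysql://db-host/sales
        "--username", username,
        "--password-file", password_file,  # HDFS path holding the DB password
        "--table", table,
        "--target-dir", target_dir,        # HDFS output directory
        "--num-mappers", str(mappers),     # parallel map tasks
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import_table(
        jdbc_url="jdbc:mysql://db-host:3306/sales",
        username="etl_user",
        password_file="/user/etl/.db_password",
        table="orders",
        target_dir="/data/raw/orders",
    )
```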
TECHNICAL SKILLS:
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, ANSI SQL, UNIX Shell Scripting, Perl, AWK, sed
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio, and Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE:
Sr. Big Data Engineer
Confidential - New York, NY
Responsibilities:
- Designed and developed software applications, testing, and build automation tools.
- Involved in the end-to-end process of Hadoop jobs using technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Performed querying of both managed and external tables created by Hive using Impala.
- Designed/developed tables, views, various SQL queries, stored procedures, functions.
- Involved in PL/SQL code review and modification for the development of new requirements.
- Extracted data from existing data sources and performed ad-hoc queries.
- Utilized SAS and SQL extensively for collecting, validating and analyzing the raw data received from the client.
- Executed data extraction programs, performed data profiling, and analyzed data for accuracy and quality.
- Analyzed the data using advanced Excel functions such as pivot tables, VLOOKUP, and visualizations to produce descriptive analyses of the data.
- Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
- Used advanced Microsoft Excel features, including pivot tables and other functions, to prepare reports and dashboards from user data.
- Maintained numerous monthly scripts, executed on a monthly basis, that produced reports submitted on time for business review.
- Developed ad-hoc reports using Crystal reports for performance analysis by business users.
- Implemented the big data solution using Hadoop and Hive to pull and load the data into HDFS.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Implemented and configured workflows using Oozie to automate jobs.
- Migrated the required data from MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they related to the development of analytics.
- Wrote Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
- Designed, built, and implemented a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Pig, HBase, MongoDB, and Spark.
- Worked within Scrum/Agile and Waterfall project execution methodologies.
- Managed data movement from various file systems to HDFS using UNIX command-line utilities.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Managed and supported enterprise data warehouse operations and advanced predictive big data application development using Cloudera.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis (see the sketch below).
- Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, Hbase, Zookeeper and Sqoop.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements, and used Sqoop to import and export data from Oracle and MySQL.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
Environment: Hadoop 3.0, MapReduce, Hive 2.3, Pig 0.17, HDFS, HBase 1.2, MongoDB, Agile, Azure, MySQL, Oozie, Sqoop 1.4.
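A minimal sketch of the partitioned external Hive table and analysis-query pattern described in this role, expressed through PySpark's spark.sql; the database, table, column, and HDFS path names are illustrative assumptions, not actual project objects.

```python
from pyspark.sql import SparkSession

# Hive support lets spark.sql() manage tables in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-partitioned-tables-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over data already landed in HDFS, partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS ods.web_events (
        event_id   STRING,
        user_id    STRING,
        event_type STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/ods/web_events'
""")

# Register a newly landed partition, then run an analysis query over it.
spark.sql("ALTER TABLE ods.web_events ADD IF NOT EXISTS PARTITION (load_date='2018-06-01')")
daily_summary = spark.sql("""
    SELECT load_date, event_type, COUNT(*) AS events, SUM(amount) AS total_amount
    FROM ods.web_events
    WHERE load_date = '2018-06-01'
    GROUP BY load_date, event_type
""")
daily_summary.show()
```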
Confidential, Merrimack, NH
Sr. Big Data Engineer
Responsibilities:
- Worked as a Big Data Engineer to Import and export data from different databases.
- Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain Software Development Life Cycle (SDLC).
- Wrote Python scripts for Location Analytics project deployment on a Linux cluster/farm and for AWS cloud deployment.
- Wrote a Scala/Spark application on AWS EMR to process and transform billions of REST and mobile events generated on Realtor.com every hour.
- Created logical and physical data models for relational (OLTP) systems and dimensional (OLAP) star-schema models for fact and dimension tables using Erwin.
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily loads.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Used Python, Ruby, Pig, Hive, Sqoop to implement various tools & utilities for data import & export.
- Utilized U-SQL for data analytics and ingestion of raw data in Azure and Blob storage.
- Performed thorough data analysis for the purpose of overhauling the database using ANSI-SQL.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML.
- Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend and comparative analysis.
- Designed data marts following Ralph Kimball's dimensional data mart modeling methodology, using Erwin.
- Implemented reporting in PySpark and Zeppelin, with querying through Airpal and AWS Athena (see the sketch below).
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Netezza.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on importing and exporting data between relational database systems such as DB2 and HDFS/Hive, and vice versa, using Sqoop.
- Supported various reporting teams and worked with the data visualization tool Tableau.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Designed data aggregations in Hive for ETL processing on Amazon EMR to process data per business requirements.
- Developed and designed data integration and migration solutions in Azure.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Prepared complex ANSI-SQL queries, views and stored procedures to load data into staging area.
- Involved in Hive-HBase integration by creating hive external tables and specifying storage as HBase format.
- Analyzed the data that consumed the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
- Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
- Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed the data through Hive-HBase integration.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
Environment: Hadoop 3.0, Kafka 1.1, SQL, PL/SQL, OLAP, ANSI-SQL, Java, Python, OLTP, SDLC, MDM, Netezza, Oracle 12c, XML, MapReduce, SSRS, MySQL, T-SQL, Teradata 15, Azure, AWS, ETL, HDFS, Hive 2.3, Sqoop 1.4, Tableau, Pig 0.17.
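A minimal PySpark sketch of the event transformation and reporting pattern from this role, assuming a hypothetical S3 bucket and an illustrative event schema (event_ts, channel, event_type, user_id); it writes partitioned Parquet of the kind Zeppelin or Athena could query.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event-reporting-sketch").getOrCreate()

# Raw REST/mobile events as JSON in S3; bucket and schema are hypothetical.
events = spark.read.json("s3://example-bucket/raw/events/2018/06/01/")

# Light transformation: derive an event date and aggregate the reporting columns.
daily = (events
         .withColumn("event_date", F.to_date(F.col("event_ts")))
         .groupBy("event_date", "channel", "event_type")
         .agg(F.count("*").alias("events"),
              F.countDistinct("user_id").alias("unique_users")))

# Partitioned Parquet output suitable for downstream ad-hoc querying.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/daily_event_summary/"))
```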
Confidential - West Point, PA
Sr. Data Engineer
Responsibilities:
- Worked as a Data Engineer; designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in an Agile Environment.
- Designed & Developed tool like Ambari & Chef for MPact Software, Hadoop, Hbase & SparkDeployment, Configuration, Monitoring, HA, Load & Data Balancing, Scalability on AWS, ESXI, XEN & distributed cluster using Java 8, Spring, Scala, Python & Ruby.
- Created data models for AWS Redshift and Hive from dimensional data models.
- Executed change management processes surrounding new releases of SAS functionality.
- Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
- Performed normalization of the existing model to third normal form (3NF) to speed up DML statement execution.
- Participated in data collection, data cleaning, data mining, developing models and visualizations.
- Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL, in both directions, and used Talend for the same purpose.
- Developed server-side and client-side web applications using Spring 2.5, Struts 2, EJB, Hibernate, iBatis, JSF, JSTL, ExtJS, and Web 2.0 Ajax frameworks; developed small intranet sites using Python.
- Developed automated job flows that ran through Oozie daily and on demand, executing MapReduce jobs internally.
- Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
- Extracted tables and exported data from Teradata through Sqoop and placed it in Cassandra.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
- Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.
- Worked on Master data Management (MDM) Hub and interacted with multiple stakeholders.
- Worked with Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate (see the sketch below).
- Extensively involved in development and implementation of SSIS and SSAS applications.
- Collaborated with ETL, and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
- Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Worked on database performance tuning, including indexing and optimizing SQL statements.
- Worked with the OLTP system to identify daily transactions, the types of transactions that occurred, and the amount of resources used.
- Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.
- Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
Environment: Hadoop 3.0, HDFS, HBase 1.2, SSIS, SSAS, OLAP, OLTP, ETL, Java, ANSI-SQL, AWS, SDLC, T-SQL, SAS, MySQL, Sqoop 1.4, Cassandra 3.0, MongoDB, Hive 2.3, SQL, PL/SQL, Teradata 15, Oracle 12c, MDM.
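The real-time ingest described above used Kafka and Storm; as a simplified stand-in for the same Kafka-to-HBase pattern, here is a plain-Python sketch using the kafka-python and happybase clients, with hypothetical broker, topic, table, and column-family names.

```python
import json

import happybase                  # HBase client over the Thrift gateway
from kafka import KafkaConsumer  # kafka-python consumer

# Connection details and names below are hypothetical placeholders.
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers=["kafka-broker:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

hbase = happybase.Connection("hbase-thrift-host")
table = hbase.table("user_events")  # column family 'e' assumed to exist

for message in consumer:
    event = message.value
    # Row key: user id plus event timestamp keeps a user's events together.
    row_key = f"{event['user_id']}#{event['event_ts']}"
    table.put(row_key, {
        b"e:type":   event["event_type"].encode("utf-8"),
        b"e:amount": str(event.get("amount", 0)).encode("utf-8"),
    })
```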
Confidential - Lowell, AR
Sr. Data Analyst/Data Engineer
Roles & Responsibilities
- Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
- Used Agile Central (Rally) to enter tasks with visibility for the whole team and the Scrum Master.
- Connected to AWS Redshift through Tableau to extract live data for real time analysis.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Implemented forward engineering to create tables, views, SQL scripts, and mapping documents.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Involved in PL/SQL code review and modification for the development of new requirements.
- Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
- Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
- Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS).
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle (see the sketch below).
- Worked in importing and cleansing of data from various sources like Teradata, flat files, SQL Server with high volume data.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Integrated various sources into the staging area of the data warehouse to integrate and cleanse data.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
Environment: Erwin 9.6, Teradata R14, Oracle 11g, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL.
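A minimal sketch of the SQL-based data profiling mentioned above, run against a Redshift cluster through psycopg2 (Redshift speaks the PostgreSQL wire protocol); the cluster endpoint, credentials, schema, table, and column names are hypothetical placeholders.

```python
import psycopg2

# Connection parameters are hypothetical placeholders.
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439,
    dbname="edw",
    user="analyst",
    password="***",
)

def profile_column(table, column):
    """Return row count, null count, and distinct count for one column.

    Table and column names are trusted identifiers in this sketch.
    """
    sql = f"""
        SELECT COUNT(*)                                           AS total_rows,
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END)  AS null_rows,
               COUNT(DISTINCT {column})                           AS distinct_values
        FROM {table}
    """
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchone()

for col in ("member_id", "claim_date", "claim_amount"):
    total, nulls, distinct = profile_column("staging.claims", col)
    print(f"{col}: rows={total}, nulls={nulls}, distinct={distinct}")

conn.close()
```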
Confidential - SFO, CA
Sr. Data Analyst/Data Modeler
Responsibilities:
- Worked as a Sr. Data Analyst/Data Modeler responsible for all data-related aspects of the project.
- Participated in requirement gathering and JAD sessions with users, subject matter experts, and BAs.
- Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Improved SQL query performance by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
- Extensively used data transformation tools such as SSIS, Informatica, and DataStage.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Performed data mining using very complex SQL queries and discovered patterns in the data.
- Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
- Worked on Normalization and De-Normalization techniques for OLAP systems.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, geographical maps, and Gantt charts.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Used E/R Studio for reverse engineering to connect to existing databases and the ODS, create graphical representations in the form of entity relationships, and elicit more information.
- Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
- Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects (see the sketch below).
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
- Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Created a list of domains in E/R Studio and worked on building up the data dictionary for the company.
- Created a data mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software.
Environment: E/R Studio V15, Teradata, SQL, PL/SQL, T-SQL, OLTP, OLAP, SSIS, SSRS, Tableau, Netezza.
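A minimal sketch of the kind of source-to-target validation query work described above, using pyodbc to run the same reconciliation aggregate on a source and a target system and compare the results; the DSNs, table names, date value, and columns are hypothetical assumptions, not details from the engagement.

```python
import pyodbc

# DSN names and the reconciliation query are hypothetical placeholders.
SOURCE_DSN = "DSN=teradata_src"
TARGET_DSN = "DSN=netezza_edw"

CHECK_SQL = """
    SELECT COUNT(*) AS row_count, SUM(claim_amount) AS total_amount
    FROM {table}
    WHERE load_date = ?
"""

def fetch_totals(dsn, table, load_date):
    """Run the reconciliation aggregate on one system and return the row."""
    conn = pyodbc.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(CHECK_SQL.format(table=table), load_date)
        return cur.fetchone()
    finally:
        conn.close()

def validate(load_date):
    """Compare row count and amount totals between source and target."""
    src = fetch_totals(SOURCE_DSN, "claims", load_date)
    tgt = fetch_totals(TARGET_DSN, "edw.fact_claims", load_date)
    if (src.row_count, src.total_amount) == (tgt.row_count, tgt.total_amount):
        print(f"{load_date}: source and target reconcile.")
    else:
        print(f"{load_date}: MISMATCH source={tuple(src)} target={tuple(tgt)}")

validate("2017-09-30")
```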