Sr. Big Data Engineer Resume
New York, NY
SUMMARY:
- Over 8 years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
- Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
- Expert in Amazon EMR, Spark, Kinesis, S3, Boto3, Elastic Beanstalk, ECS, CloudWatch, Lambda, ELB, VPC, ElastiCache, DynamoDB, Redshift, RDS, Athena, Zeppelin, and Airflow.
- Good experience working with ETL tools such as SSIS and Informatica and with reporting tools such as SQL Server Reporting Services (SSRS).
- Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Expert-level server-side development using J2EE, J2SE, EJB, Spring, Servlets, Python, and C++ on Windows, Unix, and Linux platforms.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
- Experience in cloud development architecture on Amazon AWS (EC2, Redshift) and basic experience with Azure.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Excellent Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Experience in dimensional data modeling, star and snowflake schemas, and fact and dimension tables.
- Strong knowledge of Spark, along with Scala, for handling large-scale data processing in streaming pipelines.
- Thorough knowledge of writing DDL, DML, and transaction queries in SQL for Oracle and Teradata databases.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
- Experience in designing components with UML: use case, class, sequence, deployment, and component diagrams for the requirements.
- Good working knowledge of UNIX commands, such as changing file and group permissions.
- Experience in building reports using SQL Server Reporting Services and Crystal Reports.
- Experience in data transformation, source-to-target data mapping, and data cleansing procedures.
- Experience in writing stored procedures and complex SQL queries using relational databases like Oracle, SQL Server, and MySQL.
- Experience in text analytics and in developing statistical machine learning and data mining solutions to various business problems.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in source databases and to track data quality.
- Experience in importing and exporting terabytes of data between HDFS and relational database systems using Sqoop, as in the sketch below.
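As an illustration of the Sqoop transfers called out above, the following is a minimal sketch of a Sqoop import driven from Python; the JDBC URL, credentials path, table, and HDFS directory are hypothetical placeholders rather than details from any specific engagement.

```python
import subprocess

def sqoop_import_table(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Run a Sqoop import that copies one RDBMS table into HDFS.

    All connection details passed in are hypothetical placeholders.
    """
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,             # e.g. jdbc:mysql://db-host/sales
        "--username", username,
        "--password-file", password_file,  # HDFS path holding the DB password
        "--table", table,
        "--target-dir", target_dir,        # HDFS output directory
        "--num-mappers", str(mappers),     # parallel map tasks
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import_table(
        jdbc_url="jdbc:mysql://db-host:3306/sales",
        username="etl_user",
        password_file="/user/etl/.db_password",
        table="orders",
        target_dir="/data/raw/orders",
    )
```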
TECHNICAL SKILLS:
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, ANSI SQL, UNIX Shell Scripting, Perl, AWK, sed
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio, and Visual SourceSafe
Operating System: Windows, Unix, Sun Solaris
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
PROFESSIONAL EXPERIENCE:
Sr. Big Data Engineer
Confidential - New York, NY
Responsibilities:
- Designed and developed software applications, testing, and build automation tools.
- Involved in the end-to-end process of Hadoop jobs using technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Performed querying of both managed and external tables created by Hive using Impala.
- Designed/developed tables, views, various SQL queries, stored procedures, functions.
- Involved in PL/SQL code review and modification for the development of new requirements.
- Extracted data from existing data sources and performed ad-hoc queries.
- Utilized SAS and SQL extensively for collecting, validating and analyzing the raw data received from the client.
- Executed data extraction programs, performed data profiling, and analyzed data for accuracy and quality.
- Analyzed the data using advanced Excel functions such as pivot tables, VLOOKUP, and visualizations to produce descriptive analyses of the data.
- Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
- Used advanced Microsoft Excel features, including pivot tables and other functions, to prepare reports and dashboards from user data.
- Maintained numerous monthly scripts, executed on a monthly basis, that produced reports submitted on time for business review.
- Developed ad-hoc reports using Crystal reports for performance analysis by business users.
- Implemented the big data solution using Hadoop and Hive to pull and load the data into HDFS.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Implemented and configured workflows using Oozie to automate jobs.
- Migrated the required data from MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they related to the development of analytics.
- Wrote Hadoop jobs to analyze data using MapReduce, Hive, and Pig.
- Designed, built, and implemented a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Pig, HBase, MongoDB, and Spark.
- Worked within Scrum/Agile and Waterfall project execution methodologies.
- Managed data movement from various file systems to HDFS using UNIX command-line utilities.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Managed and supported enterprise data warehouse operations and advanced predictive big data application development using Cloudera.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis (see the sketch below).
- Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, Hbase, Zookeeper and Sqoop.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements, and used Sqoop to import and export data from Oracle and MySQL.
- Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
Environment: Hadoop 3.0, MapReduce, Hive 2.3, Pig 0.17, HDFS, HBase 1.2, MongoDB, Agile, Azure, MySQL, Oozie, Sqoop 1.4.
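A minimal sketch of the partitioned external Hive table and analysis-query pattern described in this role, expressed through PySpark's spark.sql; the database, table, column, and HDFS path names are illustrative assumptions, not actual project objects.

```python
from pyspark.sql import SparkSession

# Hive support lets spark.sql() manage tables in the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-partitioned-tables-sketch")
         .enableHiveSupport()
         .getOrCreate())

# External table over data already landed in HDFS, partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS ods.web_events (
        event_id   STRING,
        user_id    STRING,
        event_type STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION '/data/ods/web_events'
""")

# Register a newly landed partition, then run an analysis query over it.
spark.sql("ALTER TABLE ods.web_events ADD IF NOT EXISTS PARTITION (load_date='2018-06-01')")
daily_summary = spark.sql("""
    SELECT load_date, event_type, COUNT(*) AS events, SUM(amount) AS total_amount
    FROM ods.web_events
    WHERE load_date = '2018-06-01'
    GROUP BY load_date, event_type
""")
daily_summary.show()
```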
Confidential, Merrimack, NH
Sr. Big Data Engineer
Responsibilities:
- Worked as a Big Data Engineer to Import and export data from different databases.
- Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain Software Development Life Cycle (SDLC).
- Wrote Python scripts for Location Analytics project deployment on a Linux cluster/farm and for AWS cloud deployment.
- Wrote a Scala/Spark application on AWS EMR to process and transform billions of REST and mobile events generated on Realtor.com every hour.
- Created logical and physical data models for relational (OLTP) systems and dimensional (OLAP) star-schema models for fact and dimension tables using Erwin.
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily loads.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Used Python, Ruby, Pig, Hive, Sqoop to implement various tools & utilities for data import & export.
- Utilized U-SQL for data analytics and ingestion of raw data in Azure and Blob storage.
- Performed thorough data analysis for the purpose of overhauling the database using ANSI-SQL.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML.
- Worked with medical claim data in the Oracle database for Inpatient/Outpatient data validation, trend and comparative analysis.
- Designed data marts following Ralph Kimball's dimensional data mart modeling methodology, using Erwin.
- Implemented reporting in PySpark and Zeppelin, with querying through Airpal and AWS Athena (see the sketch below).
- Implemented data validation using MapReduce programs to remove unnecessary records before moving data into Hive tables.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Netezza.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on importing and exporting data between relational database systems such as DB2 and HDFS/Hive, and vice versa, using Sqoop.
- Supported various reporting teams and worked with the data visualization tool Tableau.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Designed data aggregations in Hive for ETL processing on Amazon EMR to process data per business requirements.
- Developed and designed data integration and migration solutions in Azure.
- Developed Sqoop scripts to extract data from MySQL and load it into HDFS.
- Prepared complex ANSI-SQL queries, views and stored procedures to load data into staging area.
- Involved in Hive-HBase integration by creating hive external tables and specifying storage as HBase format.
- Analyzed the data that consumed the most resources and made changes to the back-end code using PL/SQL stored procedures and triggers.
- Deployed SSRS reports to Report Manager and created linked reports, snapshots, and subscriptions for the reports and worked on scheduling of the reports.
- Integrated data from multiple sources (SQL Server, DB2, Teradata) into the Hadoop cluster and analyzed the data through Hive-HBase integration.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
Environment: Hadoop 3.0, Kafka 1.1, SQL, PL/SQL, OLAP, ANSI-SQL, Java, Python, OLTP, SDLC, MDM, Netezza, Oracle 12c, XML, MapReduce, SSRS, MySQL, T-SQL, Teradata 15, Azure, AWS, ETL, HDFS, Hive 2.3, Sqoop 1.4, Tableau, Pig 0.17.
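A minimal PySpark sketch of the event transformation and reporting pattern from this role, assuming a hypothetical S3 bucket and an illustrative event schema (event_ts, channel, event_type, user_id); it writes partitioned Parquet of the kind Zeppelin or Athena could query.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("event-reporting-sketch").getOrCreate()

# Raw REST/mobile events as JSON in S3; bucket and schema are hypothetical.
events = spark.read.json("s3://example-bucket/raw/events/2018/06/01/")

# Light transformation: derive an event date and aggregate the reporting columns.
daily = (events
         .withColumn("event_date", F.to_date(F.col("event_ts")))
         .groupBy("event_date", "channel", "event_type")
         .agg(F.count("*").alias("events"),
              F.countDistinct("user_id").alias("unique_users")))

# Partitioned Parquet output suitable for downstream ad-hoc querying.
(daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3://example-bucket/curated/daily_event_summary/"))
```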
Confidential - West Point, PA
Sr. Data Engineer
Responsibilities:
- Worked as a Data Engineer; designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Participated in design discussions and assured functional specifications are delivered in all phases of SDLC in an Agile Environment.
- Designed & Developed tool like Ambari & Chef for MPact Software, Hadoop, Hbase & SparkDeployment, Configuration, Monitoring, HA, Load & Data Balancing, Scalability on AWS, ESXI, XEN & distributed cluster using Java 8, Spring, Scala, Python & Ruby.
- Created data models for AWS Redshift and Hive from dimensional data models.
- Executed change management processes surrounding new releases of SAS functionality.
- Prepared complex T-SQL queries, views, and stored procedures to load data into the staging area.
- Performed normalization of the existing model to third normal form (3NF) to speed up DML statement execution.
- Participated in data collection, data cleaning, data mining, developing models and visualizations.
- Worked with Sqoop to transfer data between HDFS and relational databases such as MySQL, in both directions, and used Talend for the same purpose.
- Developed server-side and client-side web applications using Spring 2.5, Struts 2, EJB, Hibernate, iBatis, JSF, JSTL, ExtJS, and Web 2.0 Ajax frameworks; developed small intranet sites using Python.
- Developed automated job flows that ran through Oozie daily and on demand, executing MapReduce jobs internally.
- Used SSIS and T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data-mart.
- Extracted tables and exported data from Teradata through Sqoop and placed it in Cassandra.
- Worked on analyzing and examining customer behavioral data using MongoDB.
- Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
- Worked on Amazon AWS concepts like EMR and EC2 web services for fast and efficient processing of Big Data.
- Generated parameterized queries for generating tabular reports using global variables, expressions, functions, and stored procedures using SSRS.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Created jobs and transformation in Pentaho Data Integration to generate reports and transfer data from HBase to RDBMS.
- Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.
- Worked on Master data Management (MDM) Hub and interacted with multiple stakeholders.
- Worked with Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate (see the sketch below).
- Extensively involved in development and implementation of SSIS and SSAS applications.
- Collaborated with ETL, and DBA teams to analyze and provide solutions to data issues and other challenges while implementing the OLAP model.
- Implemented Star Schema methodologies in modeling and designing the logical data model into Dimensional Models.
- Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
- Worked on database performance tuning, including indexing and optimizing SQL statements.
- Worked with the OLTP system to identify daily transactions, the types of transactions that occurred, and the amount of resources used.
- Developed a Conceptual Model and Logical Model using Erwin based on requirements analysis.
- Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
- Generated ad-hoc SQL queries using joins, database connections, and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
Environment: Hadoop 3.0, HDFS, HBase 1.2, SSIS, SSAS, OLAP, OLTP, ETL, Java, ANSI-SQL, AWS, SDLC, T-SQL, SAS, MySQL, Sqoop 1.4, Cassandra 3.0, MongoDB, Hive 2.3, SQL, PL/SQL, Teradata 15, Oracle 12c, MDM.
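The real-time ingest described above used Kafka and Storm; as a simplified stand-in for the same Kafka-to-HBase pattern, here is a plain-Python sketch using the kafka-python and happybase clients, with hypothetical broker, topic, table, and column-family names.

```python
import json

import happybase                  # HBase client over the Thrift gateway
from kafka import KafkaConsumer  # kafka-python consumer

# Connection details and names below are hypothetical placeholders.
consumer = KafkaConsumer(
    "user-events",
    bootstrap_servers=["kafka-broker:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

hbase = happybase.Connection("hbase-thrift-host")
table = hbase.table("user_events")  # column family 'e' assumed to exist

for message in consumer:
    event = message.value
    # Row key: user id plus event timestamp keeps a user's events together.
    row_key = f"{event['user_id']}#{event['event_ts']}"
    table.put(row_key, {
        b"e:type":   event["event_type"].encode("utf-8"),
        b"e:amount": str(event.get("amount", 0)).encode("utf-8"),
    })
```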
Confidential - Lowell, AR
Sr. Data Analyst/Data Engineer
Roles & Responsibilities
- Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
- Used Agile Central (Rally) to enter tasks with visibility for the whole team and the Scrum Master.
- Connected to AWS Redshift through Tableau to extract live data for real time analysis.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Performed data mapping and data design (data modeling) to integrate data across multiple databases into the EDW.
- Implemented forward engineering to create tables, views, SQL scripts, and mapping documents.
- Captured the data logs from web server into HDFS using Flume for analysis.
- Involved in PL/SQL code review and modification for the development of new requirements.
- Developed Data mapping, Transformation and Cleansing rules for the Data Management involving OLTP and OLAP.
- Worked closely with the ETL SSIS developers to explain the complex data transformation logic.
- Worked with MDM systems team with respect to technical aspects and generating reports.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Used SAS procedures such as MEANS and FREQ, along with other statistical calculations, for data validation.
- Developed and presented Business Intelligence reports and product demos to the team using SSRS (SQL Server Reporting Services).
- Implemented the installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS).
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle (see the sketch below).
- Worked in importing and cleansing of data from various sources like Teradata, flat files, SQL Server with high volume data.
- Handled importing of data from various data sources, performed transformations using MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Integrated various sources into the staging area of the data warehouse to integrate and cleanse data.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
Environment: Erwin 9.6, Teradata R14, Oracle 11g, SQL, T-SQL, PL/SQL, AWS, Agile, OLAP, OLTP, SSIS, HDFS, SAS, Flume, SSRS, Sqoop, MapReduce, MySQL.
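A minimal sketch of the SQL-based data profiling mentioned above, run against a Redshift cluster through psycopg2 (Redshift speaks the PostgreSQL wire protocol); the cluster endpoint, credentials, schema, table, and column names are hypothetical placeholders.

```python
import psycopg2

# Connection parameters are hypothetical placeholders.
conn = psycopg2.connect(
    host="example-cluster.redshift.amazonaws.com",
    port=5439,
    dbname="edw",
    user="analyst",
    password="***",
)

def profile_column(table, column):
    """Return row count, null count, and distinct count for one column.

    Table and column names are trusted identifiers in this sketch.
    """
    sql = f"""
        SELECT COUNT(*)                                           AS total_rows,
               SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END)  AS null_rows,
               COUNT(DISTINCT {column})                           AS distinct_values
        FROM {table}
    """
    with conn.cursor() as cur:
        cur.execute(sql)
        return cur.fetchone()

for col in ("member_id", "claim_date", "claim_amount"):
    total, nulls, distinct = profile_column("staging.claims", col)
    print(f"{col}: rows={total}, nulls={nulls}, distinct={distinct}")

conn.close()
```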
Confidential - SFO, CA
Sr. Data Analyst/Data Modeler
Responsibilities:
- Worked as a Sr. Data Analyst/Data Modeler responsible for all data-related aspects of the project.
- Participated in requirement gathering and JAD sessions with users, subject matter experts, and BAs.
- Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Improved SQL query performance by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
- Extensively used data transformation tools such as SSIS, Informatica, and DataStage.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Performed data mining using very complex SQL queries and discovered patterns in the data.
- Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake schemas.
- Worked on Normalization and De-Normalization techniques for OLAP systems.
- Created Tableau scorecards and dashboards using stacked bars, bar graphs, geographical maps, and Gantt charts.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Used E/R Studio for reverse engineering to connect to existing databases and the ODS, create graphical representations in the form of entity relationships, and elicit more information.
- Created mappings using pushdown optimization to achieve good performance in loading data into Netezza.
- Wrote complex SQL queries to validate the data against different kinds of reports generated by Business Objects (see the sketch below).
- Wrote T-SQL statements for data retrieval and was involved in performance tuning of T-SQL queries and stored procedures.
- Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business users.
- Generated periodic reports based on the statistical analysis of the data using SQL Server Reporting Services (SSRS).
- Created a list of domains in E/R Studio and worked on building up the data dictionary for the company.
- Created a data mapping document after each assignment and wrote the transformation rules for each field as applicable.
- Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software.
Environment: E/R Studio V15, Teradata, SQL, PL/SQL, T-SQL, OLTP, OLAP, SSIS, SSRS, Tableau, Netezza.
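A minimal sketch of the kind of source-to-target validation query work described above, using pyodbc to run the same reconciliation aggregate on a source and a target system and compare the results; the DSNs, table names, date value, and columns are hypothetical assumptions, not details from the engagement.

```python
import pyodbc

# DSN names and the reconciliation query are hypothetical placeholders.
SOURCE_DSN = "DSN=teradata_src"
TARGET_DSN = "DSN=netezza_edw"

CHECK_SQL = """
    SELECT COUNT(*) AS row_count, SUM(claim_amount) AS total_amount
    FROM {table}
    WHERE load_date = ?
"""

def fetch_totals(dsn, table, load_date):
    """Run the reconciliation aggregate on one system and return the row."""
    conn = pyodbc.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(CHECK_SQL.format(table=table), load_date)
        return cur.fetchone()
    finally:
        conn.close()

def validate(load_date):
    """Compare row count and amount totals between source and target."""
    src = fetch_totals(SOURCE_DSN, "claims", load_date)
    tgt = fetch_totals(TARGET_DSN, "edw.fact_claims", load_date)
    if (src.row_count, src.total_amount) == (tgt.row_count, tgt.total_amount):
        print(f"{load_date}: source and target reconcile.")
    else:
        print(f"{load_date}: MISMATCH source={tuple(src)} target={tuple(tgt)}")

validate("2017-09-30")
```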