Sr. Big Data Engineer Resume

Merrimack, NH


  • Over 9 years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Experience with NoSQL databases (HBase, Cassandra, and MongoDB), database performance tuning, and data modeling.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
  • Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and in the effective use of MapReduce, SQL, and Cassandra to solve big data problems.
  • Knowledge of and working experience with big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
  • Hands-on experience with normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, and Chef.
  • Experience in designing and developing POCs using Scala, Spark SQL, and MLlib libraries and deploying them on a YARN cluster.
  • Experience in Text Analytics, developing different Statistical Machine Learning, Data Mining solutions to various business problems and generating data visualizations using R, SAS and Python and creating dashboards using tools like Tableau.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, JSON and XML files.
  • Solid knowledge of Data Marts, Operational Data Store (ODS), OLAP, Dimensional Data Modeling with Ralph Kimball Methodology (Star Schema Modeling, Snow-Flake Modeling for FACT and Dimensions Tables) using Analysis Services.
  • Expertise in data migration, data profiling, data cleansing, transformation, integration, data import, and data export using multiple ETL tools such as Informatica PowerCenter.
  • Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, TOAD, and SQL*Loader.
  • Strong experience architecting highly performant databases using PostgreSQL, PostGIS, MySQL, and Cassandra.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, MLDM, and MDM.
  • Experienced with R and Python for statistical computing, and with MLlib (Spark), Matlab, Excel, Minitab, SPSS, and SAS.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Strong Experience in working with Databases like Teradata and proficiency in writing complex SQL, PL/SQL for creating tables, views, indexes, stored procedures and functions.
  • Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
  • Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning in DB2, Oracle, and DataStage.
  • Good experience working with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
  • Experience in Data transformation, Data Mapping from source to target database schemas, Data Cleansing procedures.
  • Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Wrote SQL queries and PL/SQL programs, created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
  • Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.


Big Data Tools: Hadoop 3.0 ecosystem, MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Apache NiFi 1.6, Cassandra 3.11

Cloud Architecture: Amazon AWS (EC2, Elasticsearch, Elastic Load Balancing) and MS Azure

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

BI Tools: Tableau 10, Tableau server 10, Tableau Reader 10, SAP Business Objects, Crystal Reports

Programming Languages: SQL, PL/SQL, UNIX shell scripting, R, awk, sed

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.

Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Waterfall Model.


Confidential - Merrimack, NH

Sr. Big Data Engineer


  • Provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
  • Assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Solved and supported real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Performed detailed analysis of business problems and technical environments and used this analysis in designing solutions and maintaining the data architecture.
  • Designed, developed, and tested software applications and built automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Defined compute (storage and CPU) estimation formulas for ELT and data consumption workloads from reporting tools and ad hoc users.
  • Analyzed big data analytics technologies and applications for business intelligence analysis.
  • Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Led the architecture and design of data processing, warehousing, and analytics initiatives.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive external tables to stage data and then moved the data from staging to the main tables.
  • Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull and load data into HDFS.
  • Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
  • Actively involved in the design, new development, and SLA-based support tickets of Big Machines applications.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
  • Developed numerous MapReduce jobs in Scala for data cleansing and analyzed data in Impala.
  • Created a data pipeline using process groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco-system.
  • Built a data lake as a cloud-based solution in AWS using Apache Spark and provided visualization of the ETL orchestration using the CDAP tool.
  • Installed and configured a multi-node cluster in the cloud on Amazon Web Services (AWS) EC2.
  • Conducted proofs of concept to determine the feasibility of, and evaluate, Big Data products.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large data sets to determine the optimal way to aggregate and report on them.
  • Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
  • Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.

Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Scala 2.12, Spark, Sqoop 1.4, Apache NiFi, AWS, EC2, SQL Server, Oracle 12c

Confidential - Cary, NC

Sr. Data Engineer


  • Worked with the analysis teams and management teams and supported them based on their requirements.
  • Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
  • Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
  • Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
  • Developed live reports in drill-down mode to improve usability and user interaction.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Wrote Python scripts to parse XML documents and load the data in database.
  • Used Python to extract weekly information from XML files.
  • Developed Python scripts to clean the raw data.
  • Used the AWS CLI to aggregate clean files in Amazon S3 and used Amazon EC2 clusters to deploy files into buckets.
  • Used the AWS CLI with IAM roles to load data into a Redshift cluster.
  • Provided technical support during delivery of MDM (Master Data Management) components.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
  • Designed reports in SSRS to create, execute, and deliver tabular reports using shared and specified data sources. Also debugged and deployed reports in SSRS.
  • Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
  • Used Hive, Impala and Sqoop utilities and Oozie workflows for data extraction and data loading.
  • Used Spark API over Hadoop YARN to perform analytics on data in Hive.
  • Worked with the Data Governance, Data Quality, and Metadata Management teams to understand the project.
  • Implemented an optimized join of different data sets to get top claims by state using MapReduce.
  • Created HBase tables to store data in various formats coming from different sources.
  • Responsible for importing log files from various sources into HDFS using Flume.
  • Developed routines to capture and report data quality issues and exception scenarios.
  • Created data mapping documents and data flow diagrams.
  • Developed Linux shell scripts using the nzsql and nzload utilities to load data from flat files into the Netezza database.
  • Generated dual-axis bar charts, pie charts, and bubble charts with multiple measures, using data blending when merging different sources.
  • Developed dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly using quick filters for on-demand information.
  • Created dashboard-style reports using QlikView components such as List Box, Slider, Buttons, Charts, and Bookmarks.
  • Coordinated with data architects and data modelers to create new schemas and views in Netezza to improve report execution time, and worked on creating optimized data mart reports.
  • Performed QA on the data and added data sources, snapshots, and caching to the reports.
  • Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.

Environment: SAS, SQL, Teradata, Oracle, PL/SQL, UNIX, XML, Python, AWS, SSRS, T-SQL, Hive, Sqoop

Confidential - Troy, NY

Data Engineer


  • Worked with business analysts to understand the user requirements, layout, and look of the interactive dashboard to be developed in Tableau.
  • Gathered and documented all business requirements to migrate reports from SAS to a Netezza platform using the MicroStrategy reporting tool.
  • Manipulated, cleansed, and processed data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.
  • Used Python programs for data manipulation and to automate report generation from multiple data sources or dashboards.
  • Designed and implemented the data warehouse life cycle and entity-relationship/multidimensional modeling using star and snowflake schemas.
  • Involved extensively in creating Tableau Extracts, Tableau Worksheet, Actions, Tableau Functions, Tableau Connectors (Live and Extract) including drill down and drill up capabilities and Dashboard color coding, formatting and report operations (sorting, filtering, Top-N Analysis, hierarchies).
  • Blended patient information from different sources for research using Tableau and Python.
  • Used Boto3 to integrate Python application with AWS Redshift, Teradata and S3.
  • Involved in Netezza Administration Activities like backup/restore, performance tuning, and Security configuration.
  • Wrote complex SQL statements to perform high-level and detailed validation tasks for new data and/or architecture changes within the model, comparing Teradata data against Netezza data.
  • Used Python libraries such as pandas, NumPy, and SciPy to analyze and manipulate data from AWS Redshift and Teradata.
  • Developed Python programs and batch scripts on Windows to automate ETL processes into AWS Redshift.
  • Managed the Metadata associated with the ETL processes used to populate the Data Warehouse.
  • Created a sheet selector using parameters to accommodate multiple chart types (pie, bar, line, etc.) in a single dashboard.
  • Published workbooks with user filters so that only the appropriate teams could view them.
  • Worked on SAS Visual Analytics & SAS Web Report Studio for data presentation and reporting.
  • Extensively used SAS/Macros to parameterize the reports so that the user could choose the summary and sub-setting variables to be used from the web application.
  • Created Teradata external loader connections such as MLoad, Upsert, Update, and FastLoad while loading data into target tables in the Teradata database.
  • Resolved data-related issues by assessing data quality, testing dashboards, and evaluating existing data sources.
  • Created DDL scripts for implementing data modeling changes, reviewed SQL queries, and took part in database design and implementing RDBMS-specific features.
  • Created data mapping documents mapping Logical Data Elements to Physical Data Elements and Source Data Elements to Destination Data Elements.
  • Wrote SQL and PL/SQL scripts to extract data from the database to meet business requirements and for testing purposes.
  • Designed the ETL process using Informatica to populate the data mart from flat files into an Oracle database.
  • Involved in Data analysis, reporting using Tableau and SSRS.
  • Involved in all phases of the SDLC using Agile and participated in daily scrum meetings across teams.

Environment: Tableau Server 9.3, Tableau Desktop 9.3, AWS Redshift, Teradata, Python, SQL, PostgreSQL, Linux, Teradata SQL Assistant, Netezza, EC2, S3, Windows, PL/SQL

Confidential - San Francisco, CA

Sr. Data Analyst


  • Worked with business requirements analysts/subject matter experts to identify and understand requirements. Conducted user interviews and data analysis review meetings.
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting and writing data.
  • Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
  • Prepared complex T-SQL queries, views and stored procedures to load data into staging area.
  • Wrote and executed unit, system, integration, and UAT scripts in data warehouse projects.
  • Worked extensively with the DBA and reporting teams to improve report performance through appropriate indexes and partitioning.
  • Extensively used SQL, T-SQL and PL/SQL to write stored procedures, functions, packages and triggers.
  • Worked on data analysis, data profiling, data modeling, and data governance, identifying data sets, source data, source metadata, data definitions, and data formats.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Imported and cleansed high-volume data from various sources such as Teradata, flat files, and SQL Server.
  • Prepared process flow and activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
  • Designed and developed Use Cases, Activity Diagrams, Sequence Diagrams, using UML and Business Process Modeling.
  • Used advanced MS Excel features, including VLOOKUP and pivot tables, to identify issues in the data and helped with further modifications to build new versions.
  • Wrote SQL scripts to run ad hoc queries, PL/SQL scripts, stored procedures, and triggers, and prepared reports for management.
  • Manipulated, cleansed, and processed data using Excel, Access, and SQL.
  • Created reports analyzing a large-scale database using Microsoft Excel analytics within the legacy system.
  • Created reports from several discovered patterns, using pivoting in Microsoft Excel to analyze pertinent data.
  • Performed Data Analysis and Data validation by writing complex SQL queries.
  • Developed the retail reporting requirements by analyzing the existing Business Objects reports.

Environment: PL/SQL, DB2, T-SQL, SQL, Teradata 14, MS Visio 2012, MS Excel 2012, MS Access 2012


Data Analyst


  • Worked closely with various business teams in gathering the business requirements.
  • Worked with business analysts to design weekly reports using Crystal Reports.
  • Performed data cleansing and data migration for accurate reporting.
  • Worked extensively on SQL querying using joins, aliases, functions, triggers, and indexes.
  • Managed all indexing, debugging and query optimization techniques for performance tuning using T-SQL.
  • Wrote T-SQL statements for data retrieval and tuned the performance of T-SQL queries and stored procedures.
  • Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools (Base SAS, SAS macros, SAS/GRAPH, SAS/SQL, SAS/CONNECT, and SAS/ACCESS).
  • Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting and writing data.
  • Developed SQL Server database to replace existing Access databases.
  • Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Developed SQL scripts involving complex joins for reporting purposes.
  • Developed ad hoc reports using Crystal reports for performance analysis by business users.
  • Assisted with designing database packages and procedures.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization.
  • Performed data analysis and reporting using MS PowerPoint, MS Access, and SQL Assistant.
  • Worked with CSV files to get input from the MySQL database.
  • Created functions, triggers, views and stored procedures using MySQL.
  • Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.

Environment: Crystal Reports, T-SQL, SAS, PL/SQL, DB2, SQL Server, MS PowerPoint, MS Access, SQL Assistant, MySQL
