Sr. Big Data Engineer Resume
Chicago, IL
SUMMARY:
- 8+ years of experience as a Big Data Engineer, Data Engineer, and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Experience in designing Conceptual, Logical, and Physical data models using Erwin and ER/Studio data modeling tools.
- Strong knowledge of Spark and Scala for large-scale batch and streaming data processing.
- Proficient in system analysis, ER/dimensional modeling, database design, and implementing RDBMS-specific features.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experienced in distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, with effective use of MapReduce, SQL, and Cassandra to solve big data problems.
- Working knowledge of big data tools such as Hadoop, Azure Data Lake, and AWS Redshift.
- Experience in developing data models for both OLTP and OLAP systems.
- Exposure to both the Kimball and Inmon data warehousing approaches.
- Experience in dimensional data modeling, star schema/snowflake schema design, and fact and dimension tables.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Experience in building reports using SQL Server Reporting Services and Crystal Reports.
- Experience in reverse engineering physical data models from data and SQL scripts.
- Worked extensively on forward engineering processes and created DDL scripts to implement data modeling changes.
- Experience in creating partitions, indexes, and indexed views to improve performance, reduce contention, and increase data availability.
- Good experience working with ETL tools such as SSIS and Informatica, and reporting tools such as SQL Server Reporting Services (SSRS), Cognos, and Business Objects.
- Expert in writing and optimizing SQL queries in Oracle and SQL Server.
- Extensive experience in development of Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Experience in designing data marts and creating cubes.
- Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
- Performed extensive data profiling and analysis to detect and correct inaccurate data and to track data quality.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Experience in using SSIS in solving complex business problems.
- Proficient in writing DDL, DML commands using SQL developer and Toad.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
- Proficient in data governance, data quality, metadata management, and master data management.
- Involved in analysis, development, and migration of stored procedures, triggers, views, and other related database objects.
- Proficient in R Programming Language, Data extraction, Data cleaning, Data Loading, Data Transformation, and Data visualization.
- Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
TECHNICAL SKILLS:
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, C++, UNIX Shell Scripting, Perl, AWK, SED
Big Data Tools (Hadoop Ecosystem): Hadoop 3.0, MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Cloudera Manager, Neo4j, Apache NiFi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
Testing and Defect Tracking Tools: HP/Mercury Quality Center, WinRunner, MS Visio & Visual SourceSafe
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
PROFESSIONAL EXPERIENCE:
Confidential - Chicago, IL
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Responsible for building scalable distributed data solutions using Big Data technologies such as Apache Hadoop, MapReduce, shell scripting, and Hive.
- Used Agile (SCRUM) methodologies for Software Development.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed end-to-end ETL processing from Oracle to AWS using Amazon S3, EMR, and Spark.
- Developed code to perform data extractions from the Oracle database and load them into the AWS platform using AWS Data Pipeline.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Responsible for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Implemented business logic by writing UDFs and configuring CRON Jobs.
- Extensively involved in writing PL/SQL, stored procedures, functions and packages.
- Created logical and physical data models using Erwin and reviewed these models with business team and data architecture team.
- Led architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Involved in converting MapReduce programs into Spark transformations using the Spark Python API.
- Developed Spark scripts using Python and Bash shell commands per requirements.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Responsible for translating business and data requirements into logical data models in support of enterprise data models, ODS, OLAP, OLTP, and operational data structures.
- Created SSIS packages to migrate data from heterogeneous sources such as MS Excel, flat files, and CSV files.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created data pipelines with processor groups and multiple processors in Apache NiFi for flat file and RDBMS sources as part of a POC on Amazon EC2.
- Worked closely with SSIS and SSRS developers to explain complex data transformation logic.
- Designed Data Marts by following Star Schema and Snowflake Schema Methodology, using industry leading Data modeling tools like Erwin.
- Developed the Star Schema/Snowflake Schema for proposed warehouse models to meet the requirements.
- Designed class and activity diagrams using Power Designer and UML tools like Visio.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Developed a Spark Streaming application to pull data from the cloud into Hive tables.
- Used Spark SQL to process large volumes of structured data (a brief sketch follows this list).
- Used Talend for Big Data integration with Spark and Hadoop.
- Used Microsoft Windows Server and authenticated the client-server relationship via the Kerberos protocol.
- Assigned names to DataFrame columns using case classes in Scala.
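A minimal PySpark sketch of the Spark SQL processing referenced above (comparable project code was written in Scala and Python); the database, table, column, and path names below are hypothetical.

```python
# Illustrative sketch only: read a Hive table from the data lake, aggregate it
# with Spark SQL, and persist the result to HDFS. All names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("structured-data-processing")
         .enableHiveSupport()
         .getOrCreate())

# Register the source table for SQL access
spark.table("datalake.orders").createOrReplaceTempView("orders")

# Aggregate with Spark SQL
daily_totals = spark.sql("""
    SELECT order_date, customer_id, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date, customer_id
""")

# Persist the result back to HDFS in Parquet format
daily_totals.write.mode("overwrite").parquet("hdfs:///warehouse/daily_totals")
```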
Environment: Hive 2.3, MapReduce, Hadoop 3.0, HDFS, Oracle, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, Python, PL/SQL, C++, Erwin 9.7, NoSQL, OLAP, OLTP, SSIS, MS Excel 2016, SSRS, Visio
Confidential - Atlanta, GA
Sr. Data Engineer
Responsibilities:
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Involved in the Agile development methodology as an active member in Scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and used ZooKeeper for cluster coordination services.
- Architected, designed, and developed business applications and data marts for reporting.
- Involved in different phases of the development life cycle, including analysis, design, coding, unit testing, integration testing, review, and release per business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Installed and configured Hadoop Ecosystem components.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
- Experience in server infrastructure development on Gateway, ELB, Auto Scaling, DynamoDB, Elasticsearch, and Virtual Private Cloud (VPC).
- Worked with Kafka and built use cases relevant to our environment.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into the RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Provided thought leadership for the architecture and design of Big Data analytics solutions for customers, and actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations to implement Big Data solutions.
- Created integration relational 3NF models that functionally relate to other subject areas, and was responsible for determining the corresponding transformation rules in the Functional Specification Document.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Worked with Spark to improve the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and NoSQL databases such as HBase and Cassandra (see the sketch after this list).
- Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS, HBase and Elastic Search.
- Developed Spark code using Scala for faster testing and processing of data.
- Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
- Installed and configured Apache Hadoop across multiple nodes on AWS EC2.
- Developed Pig Latin scripts to replace the existing legacy process with Hadoop, feeding the resulting data to AWS S3.
- Collaborated with business users to gather requirements for building Tableau reports per business needs.
- Developed continuous flow of data into HDFS from social feeds using Apache Storm Spouts and Bolts.
- Involved in loading data from Unix file system to HDFS.
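A minimal sketch of the Kafka-to-HDFS streaming flow referenced above, written here with the PySpark Structured Streaming API rather than the Scala Spark Streaming code used on the project; broker addresses, topic, and paths are hypothetical, and the spark-sql-kafka connector package is assumed to be on the classpath.

```python
# Illustrative sketch only: consume a Kafka topic and persist the stream to HDFS.
# Broker, topic, and path names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read the raw Kafka stream (requires the spark-sql-kafka connector package)
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          .load()
          .select(col("key").cast("string"), col("value").cast("string")))

# Continuously write the stream to HDFS as Parquet with checkpointing
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())

query.awaitTermination()
```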
Environment: Spark, 3NF, Flume 1.8, Sqoop 1.4, Pig 0.17, Hadoop 3.0, YARN, HDFS, HBase 1.2, Kafka, Scala 2.12, NoSQL, C++, Cassandra 3.11, Elasticsearch, MLlib, Teradata 15, MapReduce, UNIX, Zookeeper 3.4
Confidential - Greensboro, NC
Sr. Data Analyst/Data Engineer
Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in extraction, transformation and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL, SAS/macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Created and modified several database objects such as Tables, Views, Indexes, Constraints, Stored procedures, Packages, Functions and Triggers using SQL and PL/SQL.
- Created large datasets by combining individual datasets using various inner and outer joins in SAS/SQL and dataset sorting and merging techniques using SAS/Base.
- Developed live reports in a drill-down mode to facilitate usability and enhance user interaction.
- Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
- Wrote Python scripts to parse XML documents and load the data into the database (a minimal sketch follows this list).
- Used Python to extract weekly information from XML files.
- Developed Python scripts to clean the raw data.
- Worked with the AWS CLI to aggregate clean files in Amazon S3 and with Amazon EC2 clusters to deploy files into S3 buckets.
- Used the AWS CLI with IAM roles to load data into the Redshift cluster.
- Responsible for in-depth data analysis and creation of data extract queries in both Netezza and Teradata databases.
- Extensive development on the Netezza platform using PL/SQL and advanced SQL.
- Validated regulatory finance data and created automated adjustments using advanced SAS Macros, PROC SQL, UNIX (Korn Shell) and various reporting procedures.
- Designed reports in SSRS to create, execute, and deliver tabular reports using shared and report-specific data sources; also debugged and deployed reports in SSRS.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Used Hive and Sqoop utilities and Oozie workflows for data extraction and data loading.
- Developed routines to capture and report data quality issues and exception scenarios.
- Created data mapping documents and data flow diagrams.
- Developed Linux shell scripts using the nzsql/nzload utilities to load data from flat files into the Netezza database.
- Involved in generating dual-axis bar charts, pie charts, and bubble charts with multiple measures, using data blending when merging different sources.
- Developed dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to explore the data on the fly using quick filters for on-demand information.
- Created dashboard-style reports using QlikView components such as list boxes, sliders, buttons, charts, and bookmarks.
- Coordinated with data architects and data modelers to create new schemas and views in Netezza to improve report execution time, and worked on creating optimized data mart reports.
- Worked on QA of the data and on adding data sources, snapshots, and caching to the reports.
- Involved in troubleshooting at database levels, error handling and performance tuning of queries and procedures.
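A minimal sketch of the XML parsing and load step referenced above; the file name, tag names, and target table are hypothetical, and sqlite3 stands in for the actual target database driver.

```python
# Illustrative sketch only: parse a simple XML feed and load it into a table.
# Tag names, file paths, and the target table are hypothetical.
import sqlite3  # stand-in for the actual target database driver
import xml.etree.ElementTree as ET

def parse_records(xml_path):
    """Yield (id, name, amount) tuples from <record> elements."""
    tree = ET.parse(xml_path)
    for record in tree.getroot().iter("record"):
        yield (
            record.findtext("id"),
            record.findtext("name"),
            float(record.findtext("amount", default="0")),
        )

def load(xml_path, db_path="weekly.db"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS weekly_data (id TEXT, name TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO weekly_data VALUES (?, ?, ?)", parse_records(xml_path)
    )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load("weekly_feed.xml")
```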
Environment: SAS, SQL, Teradata, Oracle, PL/SQL, UNIX, XML, Python, AWS, SSRS, TSQL, Hive, Sqoop
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Worked closely with cross-functional data warehouse team members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on an as-needed basis.
- Used SQL Server and MS Excel on a daily basis to manipulate the data for business intelligence reporting needs.
- Developed stored procedures, user-defined functions, and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP cubes, including sub-reports, bar charts, and matrix reports, using SSRS.
- Developed ad-hoc reports with VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources while performing data integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries, and involved in back-end testing and resolving data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed data manipulation using MS Excel pivot tables and produced various charts for the mock reports (a brief sketch follows this list).
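A small pandas sketch of the pivot-style summarization described above; the original reporting used Excel pivot tables and SQL, and the sample data and column names below are hypothetical.

```python
# Illustrative sketch only: a pandas pivot mirroring an Excel pivot-table summary.
# Sample data and column names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "amount": [120.0, 80.0, 150.0, 90.0, 60.0],
})

# Region-by-product totals, analogous to an Excel pivot table
summary = sales.pivot_table(
    index="region", columns="product", values="amount",
    aggfunc="sum", fill_value=0,
)
print(summary)
```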
Environment: SQL Server, MS Excel (VLOOKUP, pivot tables), T-SQL, SSRS, SSIS, OLAP, PowerPoint.