Sr. Big Data Engineer Resume
Shelton, CT
SUMMARY
- Over 7 years of experience as a Big Data Engineer, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Experience working with NoSQL databases (HBase, Cassandra, and MongoDB), including database performance tuning and data modeling.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Strong development experience in Java/JDK 7, JEE 6, Maven, Jenkins, Jersey, Servlets, JSP, Struts, Spring, Hibernate, JDBC, JavaBeans, JMS, JNDI, XML, XML Schema, and related web technologies.
- Good experience working with analysis tools such as Tableau for regression analysis, pie charts, and bar graphs.
- Worked on Hadoop, Hive, Java, Python, Scala, and the Struts web framework.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries.
- Good knowledge on Python Collections, Python Scripting and Multi-Threading.
- Written multiple MapReduce programs in Python for data extraction, transformation and aggregation from multiple file formats.
- Extensive experience in writing Storm topology to accept the events from Kafka producer and emit into Cassandra DB.
- Excellent working knowledge of data modeling tools such as Erwin, Power Designer, and ER/Studio.
- Proficient working experience with big data tools such as Hadoop, Azure Data Lake, AWS, AWS Glue, and Redshift.
- Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
- Experience in professional system support and solution-based IT services for the Linux OS.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with clear understanding of design goals and development for OLTP and dimension modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Designing and Developing Oracle PL/SQL and Shell Scripts, Data Conversions and Data Cleansing.
- Good knowledge of Amazon Web Services (AWS) concepts such as EMR and EC2; successfully loaded files into HDFS from Oracle, SQL Server, Teradata, and Netezza using Sqoop.
- Experienced in building data warehouses on the Azure platform using Azure Databricks and Azure Data Factory.
- Extensive knowledge of working with IDE tools such as MyEclipse, RAD, IntelliJ, and NetBeans.
- Expert in Amazon EMR, S3, ECS, ElastiCache, DynamoDB, and Redshift.
- Experience in installing, configuring, supporting, and managing the Cloudera Hadoop platform, including CDH4 & CDH5 clusters.
- Experience in Dimensional Data Modeling, Star/Snowflake schema, FACT and Dimension tables.
- Good working knowledge of Apache NiFi as an ETL tool for batch and real-time processing.
TECHNICAL SKILLS
Big Data Ecosystem: Map Reduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon Redshift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, UNIX shell Scripting
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
Operating System: Windows 7/8/10, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho.
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
PROFESSIONAL EXPERIENCE
Confidential - Shelton, CT
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise on Hadoop technologies as they relate to the development of analytics.
- Generated Java APIs for retrieval and analysis of data in NoSQL databases such as HBase.
- Developed Hive queries to process the data and generate the data for visualizing.
- Responsible for installing, configuring, supporting and managing of Hadoop Clusters.
- Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
- Involved in writing client-side scripts using JavaScript and server-side scripts using JavaBeans, and used servlets for handling the business logic.
- Developed analytical components using Scala, Spark, Apache Mesos and Spark Stream.
- Conducted JAD sessions with management, vendors, users and other stakeholders for open and pending issues to develop specifications.
- Developed Java web applications using JSP and Servlets, Struts, Hibernate, Spring, REST web services, and SOAP.
- Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark (an illustrative sketch follows this section).
- Managed and reviewed Hadoop log files as a part of administration for troubleshooting purposes.
- Loaded data into Hive Tables from Hadoop Distributed File System (HDFS) to provide SQL-like access on Hadoop data.
- Used Erwin tool to develop a Conceptual Model based on business requirements analysis.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Experienced in developing Web Services with Python programming language.
- Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Used Apache Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files in AWS S3.
- Build and maintain SQL scripts, Indexes, and complex queries for data analysis and extraction.
- Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS.
- Worked with Hadoop development Distributions like Cloudera (CDH4 & CDH5), Hortonworks and Amazon Web Services (AWS) in testing big data solutions & also for monitoring and managing Hadoop clusters.
- Used Python to extract weekly information from XML files.
- Developed numerous MapReduce jobs in Scala for data cleansing and analyzing data in Impala.
- Collected large amounts of log data using Apache Flume and aggregated it in HDFS using Pig/Hive for further analysis.
- Generated various reports using SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS) for business analysts and the management team.
- Created AWS S3 buckets, managed policies for the S3 buckets, and utilized the S3 buckets for storage.
- Performed front-end development utilizing HTML5, CSS3, and JavaScript, leveraging the Bootstrap framework and a Java backend.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Designed and Developed Oracle and UNIX Shell Scripts for Data Import/Export and Data Conversions.
- Wrote Python scripts to parse XML documents and load the data in database.
- Designed and implemented business intelligence to support sales and operations functions to increase customer satisfaction.
- Developed and implemented different Pig UDFs to write ad-hoc and scheduled reports as required by the Business team.
Environment: Hive 2.3, Agile, Hadoop 3.0, HDFS, Erwin 9.7, NoSQL, AWS, MapReduce, Kafka, Scala, SSRS, SSIS, HBase 1.2, Python
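A minimal PySpark sketch of the Glue-style extraction and aggregation work referenced above. This is illustrative only: the S3 paths, column names, and event layout are hypothetical placeholders, and a production Glue job would typically also use the GlueContext wrappers.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("glue-style-aggregation-sketch").getOrCreate()

# Hypothetical S3 location for raw clickstream-style events.
raw = spark.read.json("s3://example-bucket/adobe/raw/")

# Consolidate events per visitor and day, keeping simple engagement metrics.
daily = (
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .groupBy("visitor_id", "event_date")
       .agg(
           F.count("*").alias("events"),
           F.countDistinct("page_url").alias("unique_pages"),
       )
)

# Persist the consolidated view back to S3 as Parquet for downstream reporting.
daily.write.mode("overwrite").parquet("s3://example-bucket/adobe/daily_summary/")
```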
Confidential - Bellevue, WA
Data Engineer
Responsibilities:
- Implemented the big data solution using Hadoop and Hive to pull and load data into HDFS.
- Utilized Java and MySQL day to day to debug and fix issues with client processes.
- Involved in HDFS maintenance and accessed it through the web UI and the Hadoop Java API.
- Configured LVM (Logical Volume Manager) on various Linux servers.
- Responsible for building scalable distributed data solutions using Hadoop components.
- Worked on an Agile approach to bring in multiple systems in the Data warehouse environment.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Wrote DDL and DML statements for creating, altering tables and converting characters into numeric values.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Developed PySpark code to mimic the transformations performed in the on-premise environment.
- Created Sqoop jobs for importing the data from relational database systems into HDFS.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
- Responsible for creating on-demand tables on S3 files using AWS Lambda functions and AWS Glue with Python and PySpark.
- Designed the schema, configured and deployed AWS Redshift for optimal storage and fast retrieval of data.
- Installed and configured Hadoop ecosystem components such as Flume, Pig, and Sqoop.
- Worked on designing and developing ETL workflows using Java for processing data in HDFS/HBase.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Performed upgrades of packages and patches in Linux.
- Created Hive external tables to stage data and then moved the data from staging to the main tables.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Used the AWS cloud for infrastructure provisioning and configuration.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Implemented Kafka for collecting real-time transaction data, which was then processed with Spark Streaming in Python to gather actionable insights (an illustrative sketch follows this section).
- Monitored all MapReduce read jobs running on the cluster using Ambari and ensured that they were able to read data as expected.
- Developed prototype Spark applications using Spark Core, Spark SQL, and the DataFrame API, and developed several custom user-defined functions in Hive and Pig using Java and Python.
- Worked on configuring and managing disaster recovery and backups for Cassandra data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from Cassandra through Sqoop, placed them in HDFS, and processed them.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Configured Solaris and Red Hat Linux servers with JumpStart and Kickstart.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
Environment: Hadoop 3.0, Hive 2.3, HDFS, Agile, Sqoop, PL/SQL, AWS, Flume, Pig 0.17, YARN, MapReduce, Oozie 2.3, Python
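A minimal sketch of the Kafka-to-HDFS streaming ingestion described above, written with PySpark Structured Streaming. It assumes the spark-sql-kafka connector is available on the classpath; the broker address, topic name, and HDFS paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

# Hypothetical broker and topic names; the real ones are environment-specific.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers the payload as binary; cast it to a string for downstream parsing.
parsed = events.select(
    F.col("value").cast("string").alias("raw_event"),
    F.col("timestamp"),
)

# Append the stream to HDFS as Parquet, with a checkpoint for fault tolerance.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "hdfs:///data/transactions/raw")              # hypothetical path
    .option("checkpointLocation", "hdfs:///checkpoints/transactions")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```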
Confidential - McLean, VA
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable big data platforms.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Analyzed the SQL scripts and designed solutions to implement them using PySpark (an illustrative sketch follows this section).
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Designed and developed the conceptual, logical, and physical data models to meet the needs of reporting.
- Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
- Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
- Implemented logical and physical relational database and maintained Database Objects in the data model using Erwin.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API; created new custom columns, depending on the use case, while ingesting the data into the Hadoop data lake using PySpark.
- Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
- Worked with Hadoop eco system covering HDFS, HBase, YARN and MapReduce.
- Performed the data mapping and data design (data modeling) to integrate the data across multiple databases into the EDW.
- Worked on implementing Spark using Scala and Spark SQL for faster analysis and processing of data.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/De-normalization techniques for optimum performance in relational and dimensional database environments.
- Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
- Performed reverse engineering of the dashboard requirements to model the required data marts.
- Cleansed, extracted, and analyzed business data on a daily basis and prepared ad-hoc analytical reports using Excel and T-SQL.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Created Data Migration and Cleansing rules for the Integration Architecture (OLTP, ODS, DW).
- Handled performance requirements for databases in OLTP and OLAP models.
- Conducted meetings with business and development teams for data validation and end-to-end data mapping.
Environment: MapReduce, YARN, HBase, HDFS, Hadoop 3.0, Erwin 9.1, AWS, T-SQL, OLTP, OLAP
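A minimal sketch of migrating a Hive/SQL query into PySpark DataFrame transformations, as described above. The sales table, its columns, and the query itself are hypothetical examples, not the original project's SQL.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("hive-to-spark-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive-style query being migrated (illustrative only):
#   SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
#   FROM sales
#   WHERE order_date >= '2018-01-01'
#   GROUP BY region;

sales = spark.table("sales")  # hypothetical Hive table

summary = (
    sales.filter(F.col("order_date") >= "2018-01-01")
         .groupBy("region")
         .agg(
             F.count("*").alias("orders"),
             F.sum("amount").alias("revenue"),
         )
)

summary.show()
```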
Confidential
Data Analyst
Responsibilities:
- Worked with data analysts for requirements gathering, business analysis, and project coordination.
- Developed stored procedures in SQL Server to standardize DML transactions such as insert, update and delete from the database.
- Developed SQL Queries to fetch complex data from different tables in databases using joins, database links.
- Performed data analysis of existing databases to understand the data flow and business rules applied to different databases using SQL.
- Created SSIS package to load data from Flat files, Excel and Access to SQL server using connection manager.
- Developed all the required stored procedures, user defined functions and triggers using T-SQL and SQL.
- Produced various types of reports using SQL Server Reporting Services (SSRS).
- Performed detailed data analysis and identified the key facts and dimensions necessary to support the business requirements.
- Used MS Visio and Rational Rose to represent system under development in a graphical form by defining use case diagrams, activity and workflow diagrams.
- Wrote complex SQL and PL/SQL procedures, functions, and packages to validate data and support the testing process.
- Performed Data Analysis and Data validation by writing SQL queries using SQL assistant.
- Translated business concepts into XML vocabularies by designing XML Schemas with UML.
- Gathered business requirements through interviews, surveys with users and Business analysts.
- Worked on Data Mining and data validation to ensure the accuracy of the data between the warehouse and source systems.
- Involved in writing Java API for Amazon Lambda to manage some of the AWS services.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Created reports to retrieve data using Stored Procedures.
- Ensured the compliance of the extracts to the Data Quality Center initiatives.
- Worked in data management performing data analysis, gap analysis, and data mapping.
Environment: SQL, SSIS, T-SQL, SSRS, PL/SQL, XML, Excel 2010.