Senior Big Data Engineer Resume

Columbus, IN

SUMMARY:

  • 8+ years of professional experience in Information Technology, with expertise in Big Data using the Hadoop framework and in analysis, design, development, testing, documentation, deployment, and integration using SQL and Big Data technologies.
  • Excellent knowledge of Hadoop architecture, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Experience in creating Spark Streaming jobs to process huge sets of data in real time.
  • Experience tuning Spark jobs for storage and processing efficiency.
  • Extensive experience using Maven as a build tool to produce deployable artifacts from source code.
  • Experienced with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying them on public or private clouds.
  • Proficient in tools such as Erwin (Data Modeler, Model Mart, Navigator), ER Studio, IBM Metadata Workbench, Oracle data profiling tools, Informatica, Oracle Forms, Reports, SQL*Plus, Toad, and Crystal Reports.
  • Good understanding of Hadoop Gen1/Gen2 architecture and hands-on experience with Hadoop components such as Job Tracker, Task Tracker, Name Node, Secondary Name Node, Data Node, and MapReduce concepts, as well as the YARN architecture, which includes the Node Manager, Resource Manager, and Application Master.
  • Involved in writing data transformations and data cleansing using Pig operations, with good experience retrieving and processing data using Hive.
  • Worked with HBase to conduct quick lookups (updates, inserts, and deletes) in Hadoop.
  • Experienced in loading datasets into Hive for ETL (Extract, Transform, Load) operations.
  • Experience in importing and exporting data using Sqoop between Relational Database Systems and HDFS.
  • Experience in Microsoft Azure cloud services such as Azure SQL Data Warehouse, Azure SQL Server, Azure Databricks, Azure Data Lake, Azure Blob Storage, and Azure Data Factory.
  • Extensive experience importing and exporting data using data ingestion tools like Flume.
  • Experience in database development using SQL and PL/SQL, working on databases such as Oracle 12c/11g/10g, SQL Server, and MySQL.
  • Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
  • Developed Apache Spark jobs using Scala and Python for faster data processing, using the Spark Core and Spark SQL libraries for querying (see the sketch after this list).
  • Experience working with Amazon AWS services such as EC2, EMR, S3, KMS, Kinesis, Lambda, API Gateway, and IAM.
  • Experience with the Oozie workflow scheduler, managing Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Proficient in Hive optimization techniques such as bucketing and partitioning.
  • Experience in text analytics, developing statistical machine learning and data mining solutions for various business problems, generating data visualizations using R, SAS, and Python, and creating dashboards with tools such as Tableau.
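
A minimal PySpark sketch of the Spark SQL querying pattern referenced above (illustrative only, not taken from any project in this resume); the file path and column names are placeholders.

    # Register a CSV extract as a temporary view and query it with Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("orders-query").getOrCreate()

    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("hdfs:///data/raw/orders.csv"))   # hypothetical input path

    orders.createOrReplaceTempView("orders")

    daily_totals = spark.sql("""
        SELECT order_date, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date
        ORDER BY order_date
    """)

    daily_totals.show(20, truncate=False)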

TECHNICAL SKILLS:

Languages: SQL, PL/SQL, Python, Java, Scala, C, HTML, Unix/Linux shell scripting

Data Modeling Tools: ERwin, Power Designer, Embarcadero ER Studio, IBM Rational Software Architect, MS Visio, Star Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables

ETL Tools: AWS Redshift Matillion, Alteryx, Informatica PowerCenter, Ab Initio

Big Data: HDFS, MapReduce, Spark, Airflow, YARN, NiFi, HBase, Hive, Pig, Flume, Sqoop, Kafka, Oozie, Hadoop, Zookeeper, Spark SQL

Concepts and Methods: Business Intelligence, Data Warehousing, Data Modeling, Requirement Analysis

RDBMS: Oracle 9i/10g/11g/12c, Teradata, MySQL, MS SQL

NoSQL: MongoDB, HBase, Cassandra

Cloud Platform: Microsoft Azure, AWS (Amazon Web Services), Snowflake

Application Servers: Apache Tomcat, WebSphere, WebLogic, JBoss

Other Tools: Azure Databricks, Azure Data Explorer, Azure HDInsight, Power BI

Operating Systems: UNIX, Windows, Linux

PROFESSIONAL EXPERIENCE:

Confidential

Senior Big Data Engineer

Responsibilities:

  • Performed data manipulation on extracted data using Python Pandas.
  • Worked with subject matter experts and the project team to identify, define, collate, document, and communicate data migration requirements.
  • Used HBase to store the majority of the data, which needed to be divided by region.
  • Created Hive, Pig, SQL and HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Using Nebula Metadata, registered business and technical datasets for the corresponding SQL scripts.
  • Worked within the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases. Configured Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Scala (see the sketch after this list).
  • Involved in debugging, monitoring, and troubleshooting issues.
  • Developed highly scalable and reliable data engineering solutions for moving data efficiently across systems using WhereScape (ETL tool).
  • Experienced in building a data warehouse on the Azure platform using Azure Databricks and Data Factory.
  • Implemented and integrated data while developing large-scale system software with Hadoop ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig.
  • Designed and implemented Sqoop incremental and delta imports on tables without primary keys or dates from Teradata and SAP HANA, appending directly into the Hive warehouse.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
  • Built custom Tableau / SAP BusinessObjects dashboards for Salesforce, accepting parameters from Salesforce to show the relevant data for the selected object.
  • Hands-on Ab Initio ETL, data mapping, transformation, and loading in a complex, high-volume environment.
  • Set up Azure infrastructure such as storage accounts, integration runtimes, service principal IDs, and app registrations to enable scalable and optimized support for business users' analytical requirements in Azure.
  • Experience in developing Map Reduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Experience in using Zookeeper and Oozie operational services to coordinate clusters and schedule workflows.
  • Experience in using Kafka and Kafka brokers to initiate the Spark context and process live streaming data.
  • Wrote MapReduce code to process and parse data from various sources, storing the parsed data into HBase and Hive using HBase-Hive integration.
  • Involved in the Sqoop implementation, which helps load data from various RDBMS sources into Hadoop systems and vice versa.
  • Automated the data processing with Oozie to automate data loading into the Hadoop Distributed File System.
  • Expertise in Python and Scala; wrote user-defined functions (UDFs) for Hive and Pig using Python.
  • Developed best practices, processes, and standards for effectively carrying out data migration activities. Worked across multiple functional projects to understand data usage and implications for data migration.
  • Prepared data migration plans including migration risks, milestones, quality, and business sign-off details.
  • Used Sqoop to channel data between HDFS and various RDBMS sources.
  • Wrote PySpark and Spark SQL transformations in Azure Databricks to implement complex business rules; developed Oozie workflows to run multiple Hive, Pig, Tealeaf, MongoDB, Git, Sqoop, and Spark jobs.
  • Implemented a Python codebase for branch management over Kafka features.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Retrieved data from FS to S3 using Spark commands.
  • Configured Zookeeper and worked on Hadoop High Availability with the Zookeeper failover controller, adding support for a scalable, fault-tolerant data solution.
  • Used HBase/Phoenix to support front end applications that retrieve data using row keys
  • Created Metric tables, End user views in Snowflake to feed data for Tableau refresh.
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Used OOZIE Operational Services for batch processing and scheduling workflows dynamically.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
  • Analyzed data, identified anomalies, and provided usable insights to customers.
  • Ensured the accuracy and integrity of data through analysis, testing, and profiling using Ataccama.
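
The sketch below illustrates the Kafka-to-HDFS streaming pattern referenced in the Spark Streaming bullet above. It is a minimal PySpark Structured Streaming example rather than the Scala code used on the project, and it assumes the spark-sql-kafka connector is available; broker addresses, topic name, and HDFS paths are placeholders.

    # Read a Kafka topic as a stream and land it on HDFS as Parquet files.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "events-topic")
              .option("startingOffsets", "latest")
              .load()
              .selectExpr("CAST(key AS STRING) AS key",
                          "CAST(value AS STRING) AS value",
                          "timestamp"))

    query = (events.writeStream
             .format("parquet")                                   # write the stream as Parquet files on HDFS
             .option("path", "hdfs:///data/streams/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .outputMode("append")
             .trigger(processingTime="1 minute")
             .start())

    query.awaitTermination()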

Environment: Python, Hadoop, Azure (Databricks, Data Factory, Data Lake, Data Storage), Teradata, Unix, DB2, PL/SQL, MS SQL, Ab Initio ETL, Data Mapping, Spark, Tableau, Nebula Metadata, SQL Server, Scala, Git.

Confidential, Columbus, IN

Big Data Engineer

Responsibilities:

  • Extracted large datasets from AWS using SQL queries to create reports.
  • Designed and maintained CI/CD pipelines.
  • Involved in HBase setup and storing data into HBase, which was used for further analysis.
  • Designed a data analysis pipeline in Python, using Amazon Web Services such as S3, EC2, Lambda, Auto Scaling, CloudWatch, IAM, Security Groups, CloudFormation, and Elastic MapReduce.
  • Configured Zookeeper and worked on Hadoop High Availability with the Zookeeper failover controller, adding support for a scalable, fault-tolerant data solution.
  • Applied DevOps practices for microservices using Kubernetes as the orchestrator.
  • Extensively used open-source languages Perl, Python, Scala, and Java.
  • Responsible for building scalable distributed data solutions using Hadoop; involved in job management using the Fair Scheduler and developed job processing scripts using Oozie workflows.
  • Extensively worked with partitions, dynamic partitioning, and bucketed tables in Hive; designed both managed and external tables and optimized Hive queries (see the partitioning sketch after this list).
  • Designed workflows & coordinators for the task management and scheduling using Oozie to orchestrate the jobs
  • Involved in the Sqoop implementation, which helps load data from various RDBMS sources into Hadoop systems and vice versa.
  • Experience developing Kafka producers and consumers for streaming millions of events per second (see the Kafka sketch after this list).
  • Expertise in writing Hadoop jobs for analyzing data using Spark, Hive, Pig, and MapReduce.
  • Knowledge of job workflow scheduling and locking tools/services such as Oozie, Zookeeper, Airflow, and Apache NiFi.
  • Created continuous integration and continuous delivery (CI/CD) pipeline on AWS that helps to automate steps in software delivery process
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS, and worked extensively with Sqoop for importing metadata from Oracle.
  • Implemented sentiment analysis and text analytics on Twitter social media feeds and market news using Scala and Python.
  • Developed ETL jobs using Spark-Scala to migrate data from Oracle to new Hive tables.
  • Set up DevOps pipelines for CI/CD with Git, Jenkins, and the Nexus repository.
  • Developed Oozie workflows to automate the tasks of loading data into NiFi and pre-processing it with Pig.
  • Utilized Sqoop, Kafka, Flume, and the Hadoop File System APIs for implementing data ingestion pipelines.
  • Migrated an in-house database to the AWS cloud; designed, built, and deployed multiple applications utilizing the AWS stack (including EC2 and RDS), focusing on high availability and auto-scaling.
  • Expertise in data transformation and analysis using Spark, Pig, and Hive.
  • Experience in configuring Zookeeper to coordinate the servers in clusters and to maintain data consistency, which is important for decision making in the process.
  • Design and develop hive, HBase data structure and Oozie workflow.
  • Experience in developing Map Reduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Built and configured Apache Tez on Hive and Pig to achieve better response times while running MR jobs.
  • Experience in transferring Streaming data, data from different data sources into HDFS and NoSQL databases using Apache Flume. Cluster coordination services through Zookeeper.
  • Scheduled nightly batch jobs using Oozie to perform schema validation and IVP transformation at a larger scale to take advantage of the power of Hadoop.
  • Worked extensively on AWS Components such as Elastic Map Reduce (EMR)
  • Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
  • Programmed using Java and Scala.
  • Automated the data processing with Oozie to automate data loading into the Hadoop Distributed File System.
  • Worked on various applications using Python-integrated IDEs such as Visual Studio and PyCharm.
  • Updated Python scripts to match data with our database stored in AWS CloudSearch, so that we could assign each document a response label for further classification.
  • Implemented a Python codebase for branch management over Kafka features.
  • Successfully worked on results from the Kafka server output.
  • Extended Hive and Pig core functionality by writing custom UDFs, UDTFs, and UDAFs.
  • Used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Generated report on predictive analytics using Python and Tableau including visualizing model performance and prediction results.
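
A minimal, hypothetical sketch of the Hive dynamic partitioning and bucketing work referenced above, written with PySpark against a Hive-enabled SparkSession; the table, column, and path names are placeholders, and the staging table is assumed to already exist.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning")
             .enableHiveSupport()
             .getOrCreate())

    # Allow dynamic-partition inserts
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # External table partitioned by load date
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
            customer_id BIGINT,
            product_id  STRING,
            amount      DOUBLE
        )
        PARTITIONED BY (load_date STRING)
        STORED AS ORC
        LOCATION 'hdfs:///warehouse/external/sales_ext'
    """)

    # Dynamic-partition insert from an assumed staging table
    spark.sql("""
        INSERT OVERWRITE TABLE sales_ext PARTITION (load_date)
        SELECT customer_id, product_id, amount, load_date
        FROM sales_staging
    """)

    # Bucketed table using Spark's own bucketing (analogous to Hive buckets)
    (spark.table("sales_ext")
         .write
         .bucketBy(32, "customer_id")
         .sortBy("customer_id")
         .format("parquet")
         .saveAsTable("sales_bucketed"))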
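
A hedged sketch of a Kafka producer and consumer pair like those referenced above, shown with the kafka-python client for illustration (the project may have used a different client or language); broker and topic names are placeholders.

    import json
    from kafka import KafkaProducer, KafkaConsumer

    BROKERS = ["broker1:9092", "broker2:9092"]   # hypothetical brokers
    TOPIC = "clickstream-events"                  # hypothetical topic

    # Producer: publish JSON-encoded events
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda event: json.dumps(event).encode("utf-8"),
    )
    producer.send(TOPIC, {"user_id": 42, "action": "page_view"})
    producer.flush()

    # Consumer: read events from the beginning of the topic
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        consumer_timeout_ms=5000,      # stop iterating when no new messages arrive
    )
    for message in consumer:
        print(message.offset, message.value)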

Environment: Hadoop, Hive, Sqoop, HDFS, Kafka, MapReduce, Scala, Spark, Python, Oozie, AWS, EC2, S3, EMR, Lambda, Auto Scaling, CloudWatch, Talend, SAS, Docker, Jenkins, Kubernetes, Unix, PL/SQL, Oracle 12c.

Confidential, Reston, VA

Big Data Engineer

Responsibilities:

  • Used Spark API over EMR Cluster Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDDs in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Used Oozie and Zookeeper operational services for coordinating cluster and scheduling workflows.
  • Developed data pipeline using Spark, Hive, Pig and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
  • Setup and benchmarked Hadoop/HBase clusters for internal use.
  • Developed Map Reduce/Spark Python modules for machine learning & predictive analytics in Hadoop on AWS.
  • Created Sqoop scripts to import/export user profile data from RDBMS to S3 data lake.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and Pair RDDs.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experience in job workflow scheduling and monitoring tools such as Oozie, and good knowledge of Zookeeper for coordinating the servers in clusters and maintaining data consistency.
  • Experience with open-source Kafka, Zookeeper, and Kafka Connect.
  • Experience in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows.
  • Used Spark and Hive to implement the transformations needed to join the daily ingested data to historic data.
  • Used Spark-Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model which gets the data from Kafka in near real time.
  • Developed a Python script to hit REST APIs and extract data to AWS S3 (see the sketch after this list).
  • Configured Zookeeper to coordinate and support Kafka, Spark, Spark Streaming, HBase, and HDFS.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Scheduled nightly batch jobs using Oozie to perform schema validation and IVP transformation at a larger scale to take advantage of the power of Hadoop.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Loaded data into HBase using both bulk load and non-bulk load.
  • Strong experience in core Java, Scala, SQL, PL/SQL, and RESTful web services.
  • Developed reusable transformations to load data from flat files and other data sources to the Data Warehouse.
  • Used Zookeeper to provide coordination services to the cluster. Experienced in managing and reviewing Hadoop log files.
  • Developed Sqoop jobs for performing incremental loads from RDBMS into HDFS and further applied Spark transformations
  • Worked on designing the MapReduce and YARN flows, writing MapReduce scripts, performance tuning, and debugging.
  • Involved in HBASE setup and storing data into HBASE, which will be used for further analysis.
  • Worked in AWS environment for development and deployment of Custom Hadoop Applications.
  • Assisted the operations support team with transactional data loads by developing SQL*Loader and Unix scripts.
  • Implemented a Python script to call the Cassandra REST API, performed transformations, and loaded the data into Hive.
  • Worked extensively with Python and built a custom ingest framework.
  • Developed MapReduce jobs for data cleanup in Python and C#.
  • Experienced in writing live Real-time Processing using Spark Streaming with Kafka.
  • Created Cassandra tables to store various data formats of data coming from different sources.
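
A minimal sketch of the REST-to-S3 extract referenced above; the endpoint, bucket, and object key are hypothetical, and error handling is intentionally kept simple.

    import json
    import boto3
    import requests

    API_URL = "https://api.example.com/v1/transactions"   # hypothetical endpoint
    BUCKET = "my-raw-data-bucket"                          # hypothetical bucket
    KEY = "landing/transactions/2020-01-01.json"           # hypothetical object key

    def extract_to_s3():
        response = requests.get(API_URL, params={"date": "2020-01-01"}, timeout=30)
        response.raise_for_status()                        # fail fast on HTTP errors
        payload = response.json()                          # assumes the API returns a JSON array

        s3 = boto3.client("s3")
        s3.put_object(
            Bucket=BUCKET,
            Key=KEY,
            Body=json.dumps(payload).encode("utf-8"),
            ContentType="application/json",
        )
        return len(payload)

    if __name__ == "__main__":
        print(f"Wrote {extract_to_s3()} records to s3://{BUCKET}/{KEY}")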

Environment: Hadoop YARN, Spark, Spark Streaming, Spark SQL, Scala, Kafka, Python, Hive, Sqoop, Impala, Tableau, Talend, Oozie, Control-M, Java, AWS S3, Oracle, Linux

Confidential

Data Engineer

Responsibilities:

  • Performed troubleshooting and fixed and deployed many Python bug fixes for the Learning Management System.
  • Worked on configuring the Zookeeper, Kafka, and Logstash cluster for data ingestion and Elasticsearch performance and optimization; worked on Kafka for live streaming of data.
  • Wrote MapReduce jobs using the Java API and Pig Latin.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Currently developing a generic database data extractor and replicator that moves data across several database types - initially to cater Hive/Impala, Redshift, Vertica and Amazon S3 sources and/or destinations. The tool is being written in Scala and Apache Spark.
  • Created and maintained SQL Server scheduled jobs, executing stored procedures for the purpose of extracting data from Oracle into SQL Server. Extensively used Tableau for customer marketing data visualization
  • Used the Python Flask framework to build modular and maintainable applications.
  • Developed an export framework using Python, Sqoop, Hive, and Netezza with Aginity Workbench.
  • Automated data movements using Python scripts.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Performed end-to-end architecture and implementation assessment of various AWS services such as Amazon EMR, Redshift, and S3.
  • Developed fully customized framework using python, shell script, Sqoop & hive.
  • Involved in data ingestion from different RDBMS sources into Hadoop using Sqoop.
  • Used the AWS Glue catalog with crawlers to get data from S3 and perform SQL query operations.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Created jobs to import data from Oracle MAINODS to Elasticsearch.
  • Worked with NiFi and various HDFS file formats such as Avro, SequenceFile, and JSON, and various compression formats such as Snappy and bzip2.
  • Used Spark to process the data before ingesting it into HBase; both batch and real-time Spark jobs were created using Scala.
  • Used SQL Server Integration Services (SSIS) for extraction, transformation, and loading of data into the target system from multiple sources.
  • Used Python Boto3 to configure AWS services such as Glue, EC2, and S3 (see the sketch after this list).
  • Wrote multiple Hive UDFs using core Java and OOP concepts, and Spark functions within Python programs.
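
A hedged Boto3 sketch of the Glue and S3 configuration work referenced above; the crawler, database, bucket, and prefix names are hypothetical.

    import boto3

    glue = boto3.client("glue", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    # Kick off an existing crawler so the Glue catalog picks up new S3 partitions
    glue.start_crawler(Name="raw-orders-crawler")

    # Inspect the tables the crawler has registered in the catalog
    tables = glue.get_tables(DatabaseName="raw_orders_db")
    for table in tables["TableList"]:
        print(table["Name"], table["StorageDescriptor"]["Location"])

    # Sanity-check the underlying S3 data the catalog points at
    objects = s3.list_objects_v2(Bucket="my-raw-data-bucket", Prefix="orders/")
    for obj in objects.get("Contents", []):
        print(obj["Key"], obj["Size"])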

Environment: Hadoop, Hive, Python, Django, Flask, Sqoop, Kafka, Scala, NiFi, AWS, S3, Glue, Redshift, Syncsort, Oracle 11g, DB2, SSIS, Elasticsearch, Kibana, Tableau, PL/SQL, SQL Server, SQL Developer

Confidential

Data Engineer

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model.
  • Created Excel charts and pivot tables for ad-hoc data pulls.
  • Collected and aggregated large amounts of log data and staged the data in HDFS for further analysis.
  • Experience in managing and reviewing Hadoop log files.
  • Created columnstore indexes on dimension and fact tables in the OLTP database to enhance read operations.
  • Built a GUI that prompts the user to enter information, charity items to donate, and delivery options.
  • Developed a fully functioning C# program that connects to SQL Server Management Studio and integrates information users enter with preexisting information in the database.
  • Involved in detailed documentation and wrote functional specifications for the module.
  • Built reports and report models using SSRS to enable end user report builder usage.
  • Implemented SQL functions to receive user information from front end C# GUIs and store it into database
  • Deployed Web, presentation, and business components on Apache Tomcat Application Server.
  • Developed PL/SQL procedures for different use case scenarios
  • Apache ANT was used for the entire build process.
  • Worked on report writing using SQL Server Reporting Services (SSRS), creating various types of reports such as table, matrix, and chart reports, as well as web reporting by customizing URL access.

Environment: Hadoop, SQL, SSRS, SSIS, OLTP, PL/SQL, Oracle 9i, Log4j, ANT, ClearCase, Windows.
