Big Data Engineer Resume

MO

SUMMARY

  • 6+ years of experience as a Big Data Engineer, Data Analyst, and Python Developer, covering the design, development, and implementation of data models for enterprise-level applications.
  • Hands-on experience with major components of the Hadoop ecosystem, including Spark (PySpark), Hive, HBase-Hive integration, Pig, Sqoop, Flume, the MapReduce framework, and HDFS.
  • Experience using AWS services such as EMR, S3, CloudWatch, Lambda, and Glue to run and monitor Hadoop and Spark jobs in an AWS environment.
  • Responsible for transforming and loading large sets of structured, semi-structured, and unstructured data.
  • Expertise in developing multiple Kafka producers and consumers as per requirements.
  • Proficient in developing Spark applications in Scala and Python to transform data using DataFrames, Datasets, and RDDs.
  • Performed Data Profiling and Data Analysis using SQL on different extracts.
  • Used the REPL to develop Spark scripts in the Spark shell.
  • Experience in developing a pipeline using Kafka to store data into HDFS.
  • Experience in creating databases, users, tables, triggers, macros, views, stored procedures, functions, packages, and hash indexes in Teradata database.
  • Good experience using Apache NiFi to automate data movement between various Hadoop systems.
  • Experienced in designing time-driven and data-driven automated workflows using Oozie.
  • Strong knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
  • Experience configuring ZooKeeper to coordinate servers in clusters and maintain data consistency.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
  • Experience importing and exporting data using Sqoop, from HDFS to relational database systems and from relational database systems to HDFS.
  • Extensive experience developing Bash and PL/SQL scripts.
  • Proficient in relational databases like Oracle, MySQL and SQL Server.
  • Knowledge of integrated development environments such as Eclipse, NetBeans, IntelliJ, and STS.
  • Capable of working with SDLC, Agile, and Waterfall methodologies.
  • Excellent communication, interpersonal, and problem-solving skills; a team player able to adapt quickly to new environments and technologies.
  • Perform data profiling; identify and communicate data quality issues and work with other teams as needed to resolve them.
  • Set up standards and processes for the design and implementation of various Hadoop-based applications.
  • Extensive experience with version control systems such as Git and SVN.
  • Extremely organized, with demonstrated ability to handle several tasks and assignments simultaneously within the scheduled time.
  • Able to work with managers and executives to understand business objectives and deliver as per business needs; a firm believer in teamwork.
  • Excellent problem-solving, programming, active-thinking, and communication skills.
  • Experience working both independently and collaboratively to solve problems.

TECHNICAL SKILLS

Big Data Technologies: Hadoop (HDFS, MapReduce), Spark, PySpark, Hive, Kafka-Storm, Pig, Sqoop, Oozie, Cassandra.

Big Data Distributions: Cloudera, Hortonworks, Amazon EMR

Programming Languages: Java, Python, Scala, Unix Shell scripting, Spark SQL, HiveQL, C, C++

Databases: MySQL, MS-SQL Server, NoSQL (HBase, MongoDB)

Java Technologies: JSP, Servlets, JDBC, JUnit.

Web Technologies: HTML, XML, JavaScript, jQuery, CSS, Python.

Operating Systems: Windows, Linux (Ubuntu)

Web Services: AWS.

Reporting/ETL Tools: Tableau, QlikView, Informatica, Pentaho.

Servers: Apache Tomcat, WebSphere, JBoss

IDEs: Eclipse, IntelliJ IDEA, NetBeans.

PROFESSIONAL EXPERIENCE

Confidential, MO

Big Data Engineer

Responsibilities:

  • Worked with the technical staff, business managers, and practitioners in the business unit to determine the requirements and functionality needed for a project.
  • Developed ETL programs to move data from DynamoDB to AWS Redshift using AWS Lambda, writing Python functions for specific events based on use cases.
  • Implemented solutions on the AWS cloud computing platform using RDS, Python, DynamoDB, S3, and Redshift.
  • Moved data from traditional databases such as MS SQL Server, MySQL, and Oracle into Hadoop using Sqoop.
  • Developed Oozie workflows to automate loading data into HDFS and preprocessing it with Pig.
  • Created Oozie workflows and coordinator jobs to remove jobs in time for data availability.
  • Imported and exported data into HDFS and Hive from relational databases and Teradata using Sqoop.
  • Involved in creating Hive tables, loading data, and analyzing it using Hive queries.
  • Created and configured EC2 instances on AWS (Amazon Web Services) to establish clusters in the cloud.
  • Worked on a CI/CD solution using Git, Jenkins, and Docker to set up and configure big data architecture on the AWS cloud platform.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
  • Used AWS Lambda to run code without managing servers, triggered by S3 and SNS.
  • Integrated a Kafka publisher into Spark jobs to capture errors from the Spark application and push them into a database.
  • Involved in moving log files generated from various sources into HDFS for further processing through Elasticsearch, Kafka, Flume, and Talend, and processed the files using Piggybank.
  • Wrote custom Kafka consumer code and modified existing producer code in Python to push data to Spark Streaming jobs.
  • Exposed to all aspects of the software development life cycle (SDLC), including analysis, planning, development, testing, implementation, and post-production analysis of projects. Worked with Waterfall and Scrum/Agile methodologies.
  • Configured a MongoDB cluster with high availability and load balancing, and performed CRUD operations.
  • Loaded data from flat files into the target database using ETL processes, applying business logic for inserting and updating records during the load.
  • Performed day-to-day database maintenance tasks, including database monitoring, backups, and space and resource utilization.
  • Used Tableau for data visualization and reports; integrated Tableau with Alteryx for data and analytics.
  • Built NiFi data pipelines in a Docker container environment during the development phase.
  • Created Docker images to support models in different formats and developed in different languages.
  • Used Maven to build and deploy code on a YARN cluster.
  • Used JIRA as the Scrum tool for the Scrum task board and for working on user stories.
  • Experience building scripts with Maven and performing continuous integration with systems like Jenkins.

Environment: Hadoop, Spark, Hive, HDFS, Kafka, UNIX, Shell, AWS Services, Python, Scala, Glue, Oozie, SQL, AWS SageMaker, NiFi, Docker, Kubernetes, Tableau

Confidential, CA

Hadoop Developer

Responsibilities:

  • Developed a data set process for data modeling and recommended ways to improve data quality, efficiency, and reliability.
  • Performed extraction, transformation, and loading (ETL) and data cleansing of data from various sources such as XML files, flat files, and databases; involved in UAT, batch testing, and test plans.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL). Involved in developing Hive DDLs to create, drop, and alter tables.
  • Extracted data from various sources such as Oracle, Teradata, and SQL Server and loaded it into HDFS using Sqoop import.
  • Created Hive staging tables and external tables and joined the tables as required.
  • Implemented dynamic partitioning, static partitioning, and bucketing.
  • Installed and configured Hadoop MapReduce, Hive, HDFS, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
  • Implemented Sqoop jobs for data ingestion from Oracle to Hive.
  • Managed Azure infrastructure, including Azure Web Roles, Worker Roles, VM Role, Azure SQL, Azure Storage, and Azure AD licenses, as well as virtual machine backup and recovery from a Recovery Services vault, using Azure PowerShell and the Azure Portal.
  • Working experience with Azure Databricks, organizing data into notebooks and making it easy to visualize data using dashboards.
  • Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and Azure Data Lake Analytics. Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Developed Spark scripts in Python on Azure HDInsight for data aggregation and validation, and verified their performance against MR jobs.
  • Utilized Azure HDInsight to monitor and manage the Hadoop cluster.
  • Automated Sqoop incremental imports using Sqoop jobs and automated the jobs using Oozie.
  • Worked with various compression and file formats such as Avro, Parquet, and text.
  • Responsible for writing Hive queries to analyze data in the Hive warehouse using HQL.
  • Responsible for creating complex dynamic-partition tables in Hive for better performance and faster querying.
  • Involved in developing Hive user-defined functions, compiling them into JARs, adding them to HDFS, and executing them with Hive queries.
  • Worked with various file formats such as delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files. Skilled in using columnar file formats such as RC, ORC, and Parquet.
  • Developed custom Unix/Bash shell scripts for pre- and post-validation of the master and slave nodes, before and after configuration of the name node and data nodes, respectively.
  • Developed job workflows in Oozie to automate loading data into HDFS.
  • Implemented compact and efficient storage of big data using file formats such as Avro, Parquet, and JSON, with compression methods such as GZip and Snappy applied on top of the files.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Worked on Spark using Python as well as Scala and Spark SQL for faster testing and processing of data.
  • Extensively used Stash, Bitbucket, and GitHub for source control.
  • Migrated MapReduce jobs to Spark jobs to achieve better performance.
  • Wrote test cases, analyzed test results, and reported them to product teams.

Environment: Hadoop, HDFS, Hive, Sqoop, Apache Spark, Spark SQL, ETL, Maven, Oozie, Scala, Python 3, Unix shell scripting.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in data extraction that includes analyzing, reviewing, and modeling based on requirements using higher-level tools such as Hive and Spark.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
  • Experience writing Unix scripts to automate the sanitization process.
  • Experience using Sqoop to import and export data from HDFS to RDBMS and vice versa.
  • Experience developing workflows in Oozie for running MapReduce jobs and Hive queries.
  • Extensive practical experience with incremental imports by creating Sqoop metastore jobs.
  • Experience using Apache Flume for collecting, aggregating, and moving large amounts of data from application servers, and for handling a variety of streaming, high-velocity data.
  • Experience in extraction, transformation, and loading (ETL) of data from multiple sources such as XML files and databases.
  • Involved in writing complex Hive queries to load and process data in the Hadoop file system, and in performance tuning.
  • Implemented Spark applications in Scala to perform advanced procedures such as text analytics and processing, using DataFrames and the Spark SQL API with Spark's in-memory computing capabilities for faster data processing.
  • Designed the process to ingest data from the Cloudera cluster edge node into Hive for further processing.
  • Developed scripts to use Secure File Transfer Protocol (SFTP) to send files securely from one server to another.
  • Worked with Control-M for scheduling and automating batch processing.
  • Validated against the target tables loaded into the cluster and reported any discrepancies.
  • Raised defects for any issues found in the process.
  • Prepared use cases for big data work in technical workshops.
  • Organized daily Scrum meetings with the team, prioritized product backlog items, and was responsible for timely delivery and deployment of product releases.

Environment: MySQL, Spark, Hadoop, Hive, HDFS, Control-M, Flume, Sqoop, SQL Server, Unix Scripts.

Confidential

Python Developer

Responsibilities:

  • Followed the Software Development Life Cycle (SDLC) and used Agile methodology to develop the application.
  • Designed the front end of the application using Python 2.7, HTML, CSS, JSON, and jQuery. Worked on the back end of the application.
  • Involved in the analysis and design of the application features.
  • Created the UI using JavaScript and HTML/CSS.
  • Wrote back-end code in Python.
  • Used JavaScript and XML to update a portion of a webpage.
  • Worked on changes to OpenStack to accommodate large-scale data center deployment.
  • Worked with MySQL on simple queries and wrote stored procedures for normalization.
  • Responsible for handling the integration of the database system.
  • Developed and deployed SOAP-based web services on Tomcat server.
  • Used an object-relational mapping (ORM) solution to map the data representation from the MVC model to an SQL-based schema.
  • Used an IDE to develop the application and JIRA for bug and issue tracking.
  • Used Git to coordinate team development.

Environment: Python 2.7, MySQL, HTML, CSS, JavaScript, jQuery, Sublime Text, JIRA, Git.

Confidential

Python Developer

Responsibilities:

  • Designed user interfaces using HTML and CSS.
  • Designed and developed the application using Python.
  • Involved in designing and testing the application.
  • Worked on many code changes and enhancements as per requirements.
  • Implemented and optimized complex SQL queries for report generation.
  • Managed the code repository via Git.
  • Developed test cases to verify business logic.

Environment: Python, HTML, CSS, GIT, MySQL.
