
Big Data Engineer Resume


Tampa, Florida

SUMMARY

  • 5+ years of professional experience in the IT industry, including 3+ years developing applications with Big Data tools in the Apache Hadoop/Spark ecosystems and 2 years in the software application development lifecycle with Python technologies.
  • Excellent understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, Edge Node, and the MapReduce programming paradigm.
  • Extensive hands-on development of complex data ingestion pipelines, data transformations, data management, and data governance in a centralized enterprise data hub.
  • Experienced in working with the Spark ecosystem, using Spark SQL, DataFrames, and Scala/Python queries on different data file formats such as .txt and .csv.
  • Designed and created Hive external tables with partitioning and bucketing, using a shared metastore instead of the default embedded Derby metastore (see the first sketch after this list).
  • Experience with Autosys scheduler to manage Hadoop jobs by developing, deploying, and maintaining JIL scripts.
  • Experience in integrating Hive and HBase for effective operations.
  • Knowledge of supporting data projects using Elastic MapReduce (EMR) on Amazon Web Services (AWS) and importing/exporting data to and from S3.
  • Experience working on different file formats like Avro, Parquet, ORC, and Sequence and Compression techniques like Snappy in Hadoop.
  • Strong understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase.
  • Experience in creating HBase tables to load large sets of data from various data sources.
  • Implemented a POC to migrate MapReduce jobs to Spark RDD and DataFrame transformations using Scala and Python.
  • Knowledge of creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
  • Working knowledge of major Hadoop ecosystem components such as Hive, Sqoop, and Impala.
  • Good experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
  • Knowledge of high-throughput streaming applications that read from Kafka queues and write enriched data back to outbound Kafka queues (see the second sketch after this list).
  • Experience in tuning and troubleshooting performance issues in the Hadoop cluster.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experience working with various Robotic Process Automation tools like Blue Prism, UiPath, and Appian Process Robot to develop automation solutions for repetitive, rule-based, and mundane business tasks.
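
A minimal PySpark sketch of the kind of partitioned, bucketed Hive external table described above; the database, table, column, and path names are hypothetical, and the shared metastore is assumed to be configured via hive-site.xml.

    from pyspark.sql import SparkSession

    # enableHiveSupport() picks up the shared metastore from hive-site.xml
    # rather than the default embedded Derby metastore.
    spark = (SparkSession.builder
             .appName("hive-ddl-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.transactions (
            txn_id      STRING,
            amount      DOUBLE,
            customer_id STRING
        )
        PARTITIONED BY (txn_date STRING)
        CLUSTERED BY (customer_id) INTO 16 BUCKETS
        STORED AS ORC
        LOCATION '/data/warehouse/sales/transactions'
    """)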
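
A minimal Spark Structured Streaming sketch of a Kafka-in, Kafka-out enrichment flow like the one described above; broker addresses, topic names, and the enrichment step are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka-enrichment-sketch").getOrCreate()

    # Read raw events from the inbound topic (hypothetical broker/topic names).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "events_in")
           .load())

    # Placeholder enrichment: tag each record with a processing timestamp.
    enriched = (raw.selectExpr("CAST(value AS STRING) AS payload")
                .withColumn("value", F.concat_ws("|", F.col("payload"),
                                                 F.current_timestamp().cast("string")))
                .select("value"))

    # Write enriched records to the outbound topic; the Kafka sink needs a checkpoint location.
    query = (enriched.writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("topic", "events_out")
             .option("checkpointLocation", "/tmp/checkpoints/events_enrichment")
             .start())
    query.awaitTermination()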

TECHNICAL SKILLS

Programming Languages: Java, Python, Scala

Scripting Languages: Shell script, JavaScript, HTML, CSS, XML

Development tools: IntelliJ, Eclipse, Visual Studio

Database: MySQL, Oracle, SQL Server, HBase, Cassandra, MongoDB

Cloud: AWS, S3, Redshift, EC2

Operating System: Mac OS, Windows 10, Linux

Version Control Tools: Subversion, GitHub

Methodologies: Agile, Waterfall

Big Data Ecosystem: Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Kafka, Oozie, Impala, Spark, Spark SQL (DataFrames and Datasets)

Data Visualization: Tableau

PROFESSIONAL EXPERIENCE

Confidential, Tampa, Florida

Big Data Engineer

Responsibilities:

  • Responsible for designing, implementing, and testing the end-to-end ETL data pipeline using Big Data tools.
  • Responsible for ingestion, consumption, maintenance, production bug fixes, data cleansing, troubleshooting production job failures with workarounds, or re-running jobs if necessary.
  • Implemented Spark jobs utilizing DataFrames and the Spark SQL API for faster processing of batch and real-time streaming data.
  • Handled large datasets using partitions, broadcasts in Spark, effective and efficient joins, transformations, and other optimizations during the ingestion process itself (see the sketch after this list).
  • Wrote ETL jobs using Spark/Scala, Pig/MapReduce/HBase, and Databricks.
  • Wrote MapReduce/Pig programs for ETL and developed customized UDFs in Java.
  • Developed a data pipeline to retrieve data from Kafka and store data into HDFS.
  • Experience with Autosys to automate and schedule daily jobs.
  • Developed complex ETL transformations and performance tuning.
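
A minimal PySpark sketch of the broadcast-join pattern referenced above, where a small lookup table is broadcast to the executors so the large dataset is not shuffled; all paths and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

    transactions = spark.read.parquet("/data/raw/transactions")            # large fact data
    merchants = spark.read.csv("/data/ref/merchants.csv", header=True)     # small lookup table

    # Broadcasting the small side avoids shuffling the large side during the join.
    joined = transactions.join(broadcast(merchants), on="merchant_id", how="left")

    # Land the enriched data partitioned by date for downstream consumers.
    (joined.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .parquet("/data/curated/transactions_enriched"))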

Environment: Linux, Eclipse, jdk1.8.0, Hadoop 2.9.0, HDFS, Map Reduce, Hive 2.3, Kafka 2.0.0, CDH 5.4.0, Autosys r11.3, Sqoop 1.4.7, Tableau, Shell Scripting, Scala 2.12, Spark 2, Python 3.6/3.5/3.4, Maven Repository, Gradle Build.

Confidential

Big Data Support Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop and Spark with Scala.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of batch and real-time streaming data.
  • Developed scripts to perform business transformations on the data using Hive and Impala for downstream applications.
  • Handled large datasets using partitions, broadcasts in Spark, effective and efficient joins, transformations, and others during the ingestion process itself.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Spark RDDs, and Scala (see the first sketch after this list).
  • Created a common data lake for migrated data to be used by other members of the team.
  • Implemented pre-defined operators in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (see the second sketch after this list).
  • Worked with different file formats (SequenceFile, Avro, RC, Parquet, and ORC) and different compression codecs (gzip, Snappy, LZO).
  • Experience in working with Amazon Redshift clusters for storing large datasets.
  • Expertise in reading data from Amazon S3 and processing it using Spark applications.
  • Developed complex ETL transformation & performance tuning.
  • Imported and exported data using Sqoop between HDFS and relational databases (Oracle and Netezza).
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Worked extensively on Hive, SQL, Scala, Spark, and Shell.
  • Developed a data pipeline using Kafka to store data into HDFS.
  • Experienced in writing Spark RDD transformations and actions for input data, and Spark SQL queries and DataFrames to import data from data sources, perform data transformations and read/write operations using Spark Core, and save the results to an output directory in HDFS.
  • Responsible for the design and development of Spark applications using Scala to interact with Hive and MySQL databases.
  • Experience with Oozie workflow to automate and schedule daily jobs.
  • Experience with job control tools like Autosys.
  • Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
  • Hands-on experience in installing, configuring, and using ecosystem components like Hadoop, MapReduce, and HDFS.
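
A minimal sketch of converting a Hive aggregation into equivalent Spark DataFrame transformations, as referenced above; the original Hive query appears in the comment, and all database, table, and column names are hypothetical.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder
             .appName("hive-to-dataframe-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Original Hive query:
    #   SELECT region, SUM(amount) AS total_amount
    #   FROM sales_db.transactions
    #   WHERE txn_date >= '2018-01-01'
    #   GROUP BY region;

    totals = (spark.table("sales_db.transactions")
              .filter(F.col("txn_date") >= "2018-01-01")
              .groupBy("region")
              .agg(F.sum("amount").alias("total_amount")))

    totals.show()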
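
A toy word-count style sketch of the pre-defined RDD operators mentioned above (map, flatMap, filter, reduceByKey); the input and output paths are hypothetical.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-operators-sketch").getOrCreate()
    sc = spark.sparkContext

    lines = sc.textFile("hdfs:///data/raw/events.txt")

    counts = (lines
              .flatMap(lambda line: line.split())      # flatMap: one line -> many tokens
              .filter(lambda token: token)             # filter: drop empty tokens
              .map(lambda token: (token.lower(), 1))   # map: token -> (key, 1) pair
              .reduceByKey(lambda a, b: a + b))        # reduceByKey: sum counts per key

    counts.saveAsTextFile("hdfs:///data/out/event_counts")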

Environment: Linux, Eclipse, jdk1.8.0, Hadoop 2.9.0, HDFS, Map Reduce, Hive 2.3, Kafka 2.0.0, CDH 5.4.0, Autosys r11.3, Sqoop 1.4.7, Tableau, Shell Scripting, Scala 2.12, Spark 2, Python 3.6/3.5/3.4, Maven Repository, Gradle Build.

Confidential

Software Engineer / Python Developer

Responsibilities:

  • Involved in Requirements gathering, Requirement analysis, Design, Development, Integration, and Deployment.
  • Built ETL jobs using the PySpark API with Jupyter notebooks on an on-premises cluster for certain transformation needs, with HDFS as the data storage system.
  • Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load a huge number of CSV files with different schemas into Hive ORC tables (see the first sketch after this list).
  • Built database models, views, and APIs using Python for interactive web-based solutions.
  • Worked closely with DevOps for CI/CD for Deploying into Cloud using Jenkins and Chef.
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Developed Python batch processors to consume and produce various feeds.
  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Utilized PyUnit, the Python unit-testing framework, to test the functionality of the application (see the second sketch after this list).
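
A minimal PySpark sketch of loading CSV files with differing schemas into a Hive ORC table, as described above; the feed paths and table name are hypothetical, and unionByName with allowMissingColumns assumes Spark 3.1+ (older versions would need manual column alignment).

    from functools import reduce
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("csv-to-orc-sketch")
             .enableHiveSupport()
             .getOrCreate())

    paths = ["/data/feeds/feed_a/*.csv", "/data/feeds/feed_b/*.csv"]
    frames = [spark.read.csv(p, header=True, inferSchema=True) for p in paths]

    # Union by column name so differing schemas line up; missing columns become nulls.
    combined = reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)

    (combined.write
     .mode("append")
     .format("orc")
     .saveAsTable("staging_db.feed_events"))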
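
A small PyUnit (unittest) sketch of the testing approach mentioned above; the function under test is a hypothetical helper of the kind used in the batch processors.

    import unittest

    def normalize_feed_amount(raw_value):
        """Hypothetical helper: parse a feed amount string into integer cents."""
        return int(round(float(raw_value.strip().replace(",", "")) * 100))

    class NormalizeFeedAmountTest(unittest.TestCase):
        def test_plain_value(self):
            self.assertEqual(normalize_feed_amount("12.50"), 1250)

        def test_value_with_whitespace_and_separator(self):
            self.assertEqual(normalize_feed_amount(" 1,000.05 "), 100005)

    if __name__ == "__main__":
        unittest.main()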

Environment: Python, Django, MySQL, PyUnit, Git, DevOps, Flask, JSON, Ansible, PHP.

Confidential

Data Analyst Trainee

Responsibilities:

  • Worked with a team of developers on Python applications for prioritizing tasks and for risk management.
  • Hands-on experience with Python libraries such as NumPy, SciPy, and Matplotlib.
  • Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
  • Worked on the development of SQL and stored procedures on MySQL.
  • Involved in Agile Methodologies and SCRUM Process.
  • Delivered interactive dashboards to predict daily job metrics and visualize trends and seasonality across locations (see the sketch after this list).
  • Involved in Requirements gathering, Requirement analysis, Design, Development, Integration, and Deployment.
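
A minimal pandas/Matplotlib sketch of the kind of trend and seasonality view behind the dashboards mentioned above; the input file and column names are hypothetical.

    import pandas as pd
    import matplotlib.pyplot as plt

    jobs = pd.read_csv("daily_job_metrics.csv", parse_dates=["run_date"])

    # Roll daily counts up to weekly totals per location to expose trend and seasonality.
    weekly = (jobs
              .groupby(["location", pd.Grouper(key="run_date", freq="W")])["jobs_completed"]
              .sum()
              .unstack("location"))

    weekly.plot(figsize=(10, 4), title="Weekly jobs completed by location")
    plt.xlabel("Week")
    plt.ylabel("Jobs completed")
    plt.tight_layout()
    plt.savefig("weekly_jobs_by_location.png")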

Environment: Python, MySQL, Numpy, Pandas, NLTK, Scikit-learn, Seaborn, Matplotlib, Tidyverse, Git and Linux.
