Big Data Engineer Resume
Tampa, Florida
SUMMARY
- 5+ years of professional experience in the IT industry, including 3+ years of experience with Big Data tools developing applications on the Apache Hadoop/Spark ecosystems and 2 years of experience in the software application development lifecycle with Python technologies.
- Excellent understanding/knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Edge Node, and the MapReduce programming paradigm.
- Extensive hands-on development of complex data ingestion pipelines, data transformations, data management, and data governance in a centralized enterprise data hub.
- Experienced in working with the Spark ecosystem, using Spark SQL, DataFrames, and Scala/Python queries on data file formats such as .txt and .csv.
- Designed and created Hive external tables using a shared metastore instead of the default Derby metastore, with partitioning and bucketing (see the sketch after this list).
- Experience with Autosys scheduler to manage Hadoop jobs by developing, deploying, and maintaining JIL scripts.
- Experience in integrating Hive and HBase for effective operations.
- Knowledge of supporting data projects using Elastic MapReduce (EMR) on Amazon Web Services (AWS) and importing/exporting data to and from S3.
- Experience working with file formats such as Avro, Parquet, ORC, and SequenceFile, and compression techniques such as Snappy in Hadoop.
- Strong understanding of NoSQL databases and hands-on work experience in writing applications on NoSQL databases like HBase.
- Experience in creating HBase tables to load large sets of data from various data sources.
- Implemented a POC to migrate MapReduce jobs to Spark RDD and DataFrame transformations using Scala and Python.
- Knowledge of creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Working knowledge of major Hadoop ecosystem components such as Hive, Sqoop, and Impala.
- Good experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
- Knowledge of high-throughput streaming applications that read from Kafka queues and write enriched data back to outbound Kafka queues.
- Experience in tuning and troubleshooting performance issues in the Hadoop cluster.
- Good working experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Experience working with various Robotic Process Automation tools, such as Blue Prism, UiPath, and Appian Process Robot, to develop automation solutions for repetitive, rule-based business tasks.
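
To illustrate the Spark SQL and Hive work summarized above: a minimal PySpark sketch of creating a partitioned, bucketed Hive external table against a shared metastore and querying a CSV feed with Spark SQL. The database, table, columns, and HDFS paths are hypothetical placeholders, not the actual project objects.

```python
from pyspark.sql import SparkSession

# Spark session with Hive support so DDL is issued against the shared metastore
spark = (SparkSession.builder
         .appName("hive-external-table-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical database, table, and HDFS locations used purely for illustration
spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.transactions (
        txn_id      STRING,
        customer_id STRING,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 16 BUCKETS
    STORED AS ORC
    LOCATION 'hdfs:///data/sales/transactions'
""")

# Read a raw CSV feed into a DataFrame and query it with Spark SQL
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///landing/sales/2020-01-01/*.csv"))
raw.createOrReplaceTempView("raw_transactions")

spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM raw_transactions
    GROUP BY customer_id
""").show()
```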
TECHNICAL SKILLS
Programming Languages: Java, Python, Scala
Scripting Languages: Shell script, JavaScript, HTML, CSS, XML
Development tools: IntelliJ, Eclipse, Visual Studio
Database: MySQL, Oracle, SQL Server, HBase, Cassandra, MongoDB
Cloud: AWS, S3, Redshift, EC2
Operating System: Mac OS, Windows 10, Linux
Version Control Tools: Subversion, GitHub
Methodologies: Agile, Waterfall
Big Data Ecosystem: Hadoop, MapReduce, YARN, Hive, Pig, Sqoop, HBase, Kafka, Oozie, Impala, Spark, Spark SQL (DataFrames and Datasets)
Data Visualization: Tableau
PROFESSIONAL EXPERIENCE
Confidential, Tampa, Florida
Big Data Engineer
Responsibilities:
- Responsible for designing, implementing, and testing ETL data pipelines end to end.
- Responsible for ingestion, consumption, maintenance, production bug fixes, data cleansing, and troubleshooting production job failures with workarounds or by re-running jobs when necessary.
- Implemented Spark utilizing DataFrames and the Spark SQL API for faster processing of batch and real-time streaming data.
- Handled large datasets using partitions, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
- Wrote ETL jobs using Spark/Scala, Pig/MapReduce/HBase, and Databricks.
- Wrote MapReduce/Pig programs for ETL and developed custom UDFs in Java.
- Developed a data pipeline to retrieve data from Kafka and store it in HDFS (see the sketch after this list).
- Experience with Autosys to automate and schedule daily jobs.
- Developed complex ETL transformations and performed performance tuning.
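
A minimal sketch of the Kafka-to-HDFS pipeline pattern referenced above, using PySpark Structured Streaming. The broker addresses, topic name, and HDFS paths are hypothetical, and the spark-sql-kafka connector package is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Subscribe to a hypothetical Kafka topic; brokers and topic are placeholders
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "events-topic")
          .option("startingOffsets", "latest")
          .load())

# Kafka delivers key/value as binary; cast the payload to string for downstream parsing
parsed = events.select(
    col("key").cast("string").alias("key"),
    col("value").cast("string").alias("value"),
    col("timestamp"))

# Land the stream on HDFS as Parquet, with a checkpoint location for recovery
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/events")
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .outputMode("append")
         .start())

query.awaitTermination()
```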
Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, HDFS, MapReduce, Hive 2.3, Kafka 2.0.0, CDH 5.4.0, Autosys r11.3, Sqoop 1.4.7, Tableau, Shell Scripting, Scala 2.12, Spark 2, Python 3.6/3.5/3.4, Maven Repository, Gradle Build.
Confidential
Big Data Support Engineer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop and Spark with Scala.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of batch and real-time streaming data.
- Developed scripts to perform business transformations on the data using Hive and Impala for downstream applications.
- Handled large datasets using partitions, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
- Involved in converting Hive/SQL queries into Spark transformations using Spark Data Frames, Spark RDDs, and Scala.
- Created a common data lake for migrated data to be used by other members of the team.
- Implemented pre-defined operators in Spark such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (see the sketch after this list).
- Worked with different file formats (SequenceFile, Avro, RC, Parquet, and ORC) and different compression codecs (gzip, Snappy, LZO).
- Experience working with Amazon Redshift clusters for storing large datasets.
- Expertise in reading data from Amazon S3 and processing it using Spark applications.
- Developed complex ETL transformations and performed performance tuning.
- Imported and exported data using Sqoop between HDFS and relational databases (Oracle and Netezza).
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Worked extensively with Hive, SQL, Scala, Spark, and shell scripting.
- Developed a data pipeline using Kafka to store data into HDFS.
- Experienced in writing Spark RDD transformations and actions on input data, and Spark SQL queries and DataFrames to import data from data sources, perform data transformations and read/write operations using Spark Core, and save the results to an output directory in HDFS.
- Responsible for the design and development of Spark applications using Scala to interact with Hive and MySQL databases.
- Experience with Oozie workflow to automate and schedule daily jobs.
- Experience with job control tools like Autosys.
- Scheduled and managed cron jobs and wrote shell scripts to generate alerts.
- Hands-on experience installing, configuring, and using ecosystem components like Hadoop, MapReduce, and HDFS.
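
A minimal PySpark sketch of the pre-defined RDD operators listed above (flatMap, filter, map, reduceByKey, aggregateByKey), applied to a simple word-count style aggregation. The input and output HDFS paths are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-operators-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical raw text input on HDFS
lines = sc.textFile("hdfs:///data/input/events.txt")

# flatMap: split each line into words; filter: drop empty tokens;
# map: pair each word with 1; reduceByKey: sum the counts per word
counts = (lines.flatMap(lambda line: line.split())
               .filter(lambda word: word != "")
               .map(lambda word: (word.lower(), 1))
               .reduceByKey(lambda a, b: a + b))

# aggregateByKey: the same per-key sum expressed with an explicit zero value
totals = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word.lower(), 1))
               .aggregateByKey(0, lambda acc, v: acc + v, lambda a, b: a + b))

counts.saveAsTextFile("hdfs:///data/output/word_counts")
print(totals.take(5))
```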
Environment: Linux, Eclipse, JDK 1.8.0, Hadoop 2.9.0, HDFS, MapReduce, Hive 2.3, Kafka 2.0.0, CDH 5.4.0, Autosys r11.3, Sqoop 1.4.7, Tableau, Shell Scripting, Scala 2.12, Spark 2, Python 3.6/3.5/3.4, Maven Repository, Gradle Build.
Confidential
Software Engineer / Python Developer
Responsibilities:
- Involved in Requirements gathering, Requirement analysis, Design, Development, Integration, and Deployment.
- Built ETL jobs using the PySpark API with Jupyter notebooks on an on-premises cluster for various transformation needs, with HDFS as the data storage system.
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
- Developed Spark applications in Python (PySpark) in a distributed environment to load a large number of CSV files with different schemas into Hive ORC tables.
- Built database models, views, and APIs using Python for interactive web-based solutions.
- Worked closely with DevOps on CI/CD, deploying to the cloud using Jenkins and Chef.
- Developed entire frontend and backend modules using Python on the Django web framework.
- Developed Python batch processors to consume and produce various feeds.
- Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package (see the sketch after this list).
- Utilized PyUnit, the Python unit testing framework, for testing the functionality of the application.
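
A minimal sketch of running a parameterized MySQL query from Python with the mysql-connector-python driver, in the spirit of the batch-feed queries above. The connection settings, table, and column names are hypothetical.

```python
import mysql.connector

# Connection parameters are placeholders for illustration only
conn = mysql.connector.connect(
    host="localhost",
    user="app_user",
    password="secret",
    database="feeds_db",
)

try:
    cursor = conn.cursor()
    # Parameterized query keeps values out of the SQL string
    cursor.execute(
        "SELECT feed_id, status FROM feed_runs WHERE run_date = %s",
        ("2019-06-01",),
    )
    for feed_id, status in cursor.fetchall():
        print(feed_id, status)
finally:
    conn.close()
```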
Environment: Python, Django, MySQL, PyUnit, Git, DevOps, Flask, JSON, Ansible, PHP.
Confidential
Data Analyst Trainee
Responsibilities:
- Experienced in working with a team of developers on Python applications for task prioritization and risk management.
- Hands-on experience with Python libraries such as NumPy, SciPy, and Matplotlib.
- Experience writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Worked on the development of SQL and stored procedures on MySQL.
- Involved in Agile methodologies and the Scrum process.
- Delivered interactive dashboards to predict daily job metrics and visualize trends and seasonality across locations (see the sketch after this list).
- Involved in Requirements gathering, Requirement analysis, Design, Development, Integration, and Deployment.
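
A minimal pandas/Matplotlib sketch of the kind of trend-and-seasonality view behind the dashboards above. The CSV file and column names (run_date, location, duration_minutes) are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily job-metrics extract; file and column names are illustrative
df = pd.read_csv("daily_job_metrics.csv", parse_dates=["run_date"])

# Weekly average job duration per location to surface trends and seasonality
weekly = (df.set_index("run_date")
            .groupby("location")["duration_minutes"]
            .resample("W")
            .mean()
            .unstack(level=0))

weekly.plot(figsize=(10, 4), title="Weekly average job duration by location")
plt.xlabel("Week")
plt.ylabel("Duration (minutes)")
plt.tight_layout()
plt.savefig("job_duration_trends.png")
```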
Environment: Python, MySQL, Numpy, Pandas, NLTK, Scikit-learn, Seaborn, Matplotlib, Tidyverse, Git and Linux.