
Big Data Engineer Resume


Dallas, TX

SUMMARY:

  • 12 years of experience in IT, with 9+ years of experience in Big Data frameworks
  • Proven success in team leadership, focusing on mentoring team members and managing tasks for efficiency.
  • Worked with various stakeholders to gather requirements and create as-is and as-was dashboards.
  • Recommended and used various best practices to improve dashboard performance for Tableau server users.
  • Expert in the design of custom reports using data extraction and reporting tools, and in the development of algorithms based on business cases.
  • Strong fundamentals in SQL data modeling
  • Hands-on in performance tuning and report optimization using methods such as extracts, context filters, efficient calculations, data source filters, indexing, and partitioning over SQL.
  • Accustomed to working in production environments, managing migrations, installations, and development.
  • Created dashboards in Tableau using various features of Tableau like Custom-SQL, Multiple Tables, Blending, Extracts, Parameters, Filters, Calculations, Context Filters, Data source filters, Hierarchies, Filter Actions, Maps, etc.
  • Modified existing and added new functionalities to Financial and Strategic summary dashboards.
  • Strong SQL skills to query data for validation, reporting and dashboarding.
  • Worked with Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera)
  • Expert with BI tools like Tableau and PowerBI, data interpretation, modeling, data analysis, and reporting, with the ability to assist in directing planning based on insights.
  • In-depth understanding of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large datasets efficiently.
  • Track record of results as a project manager in an Agile methodology using data-driven analytics.
  • Experience with the Hadoop ecosystem, big data tools, and database technologies
  • Experience in data manipulation, data analysis, and data visualization of structured data, semi-structured data, and unstructured data
  • Understanding of the Hadoop architecture and its ecosystem, including HDFS, YARN, MapReduce, Sqoop, Avro, Spark, Hive, HBase, Flume, and Zookeeper
  • Creative skills in developing elegant solutions to challenges related to pipeline engineering
  • Knowledge of the Spark architecture and programming Spark applications
  • Ability to program in various languages such as Python, Java, C++, and Scala
  • Experience in Object-oriented programming and functional programming
  • Creates Bash scripts to automate software installation, file management, and data pipelines
  • Knowledge in data governance, data operations, computer security, and cryptology
  • Coding skills with PySpark, SparkContext, and Spark SQL
  • Pipeline development skills with Apache Airflow, Kafka, and NiFi
  • Experience working with Docker Engine and a variety of container images

TECHNICAL SKILLS:

PROJECT METHODS: Agile, Kanban, Scrum, DevOps, Continuous Integration, Test-Driven Development, Unit Testing, Functional Testing, Design Thinking, Lean Six Sigma

HADOOP DISTRIBUTIONS: Hadoop, Cloudera Hadoop, Hortonworks Hadoop

AMAZON AWS: EC2, SQS, S3, MapR, Elastic Cloud

CLOUD SERVICES: SolrCloud, Databricks, DataStax

CLOUD DATABASE & TOOLS: Redshift, DynamoDB, Cassandra, Apache HBase, SQL

PROGRAMMING LANGUAGES: Python, Java, SQL, HQL, HTML5, Spark, Spark Streaming, PySpark, PyTorch, C++, C#, Scala

SCRIPTING: Hive, MapReduce, SQL, Spark SQL, Shell Scripting

CONTINUOUS INTEGRATION (CI/CD): Jenkins

FILE FORMATS: CSV, JSON, Avro, Parquet, ORC

FILE SYSTEMS: HDFS

ETL TOOLS: Apache Flume, Kafka, Sqoop

PROFESSIONAL EXPERIENCE:

Big Data Engineer

Confidential - Dallas, TX

Responsibilities:

  • Developed data pipelines with Kafka and Spark.
  • Contributed to designing the data pipeline with a Lambda Architecture.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Involved in installing, configuring, supporting, and managing Hadoop clusters, including cluster administration.
  • Created Tables, Stored Procedures, and extracted data using PL/SQL for business users whenever required.
  • Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
  • Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for huge volumes of data.
  • Worked extensively with partitions, dynamic partitioning, and bucketing of tables in Hive; designed both managed and external tables and optimized Hive queries.
  • Developed Spark applications using Scala and Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Worked with Spark to improve performance and optimize existing algorithms in Hadoop.
  • Used SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
  • Used Spark Streaming APIs to perform transformations and actions on the fly, building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra (see the sketch after this list).
  • Developed Kafka consumer API in Scala for consuming data from Kafka topics.
  • Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system.
  • Migrated an existing on-premises application to AWS.
  • Used AWS services like EC2 and S3 for small data sets processing and storage.
  • Experienced in maintaining the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
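
A minimal PySpark Structured Streaming sketch of the Kafka-based ingestion described above. The broker address, topic name, and message schema are hypothetical, and a Parquet sink stands in for the Cassandra writes (which would use the spark-cassandra-connector) so the example stays self-contained; it also assumes the spark-sql-kafka package is on the classpath.

    # Illustrative PySpark Structured Streaming job: Kafka -> parsed JSON -> Parquet sink.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Schema of the incoming JSON messages (assumed for this sketch).
    schema = StructType([
        StructField("user_id", StringType()),
        StructField("event", StringType()),
        StructField("ts", TimestampType()),
    ])

    # Read from Kafka in near real time (hypothetical broker and topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "events")
           .load())

    # Kafka values arrive as bytes; cast to string and parse the JSON payload.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("data"))
              .select("data.*"))

    # Persist the parsed stream; the real pipeline targeted Cassandra, but a
    # Parquet sink keeps this sketch runnable without extra infrastructure.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "/tmp/events_parquet")
             .option("checkpointLocation", "/tmp/events_checkpoint")
             .start())

    query.awaitTermination()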

Big Data Developer

Confidential - Fremont, CA

Responsibilities:

  • Experience installing Apache Kafka.
  • Configured a cluster of Zookeeper.
  • In charge of Topic creation and management.
  • Worked as L1 support on Jira requests for Kafka.
  • Worked on Topic partitioning and replication.
  • Designed a PoC for Confluent Kafka.
  • Prepared configuration documentation for Kafka to operate effectively.
  • Created a producer application that sends API messages over Kafka (see the sketch after this list).
  • Defined API security key and other necessary credentials to run Kafka architecture.
  • Wrote Python code that tracks Kafka message delivery.
  • Implemented the API key and credentials in the Python program.
  • Defined a catalog of APIs to be consumed by a Kafka producer application.
  • Implemented a Python codebase with branch management for Kafka features.
  • Defined Kafka Zookeeper offset storage.
  • Verified results from Kafka server output.
  • Coordinated Kafka consumers for data validation.
  • Defined serialization for data sent over Kafka.
  • Created an application in Python to consume messages from Kafka.
  • Developed dashboards to monitor Kafka performance using ELK.
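
A minimal sketch of the Python producer and consumer pattern described above, using the kafka-python client. The broker address, topic name, and message fields are hypothetical, and the real application also handled API keys and credentials.

    # Illustrative producer/consumer pair using the kafka-python client.
    import json
    from kafka import KafkaProducer, KafkaConsumer

    TOPIC = "api-messages"   # hypothetical topic

    # Producer: serializes dicts to JSON and reports delivery via the returned future.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    future = producer.send(TOPIC, {"endpoint": "/orders", "status": 200})
    record_metadata = future.get(timeout=10)   # blocks until the broker acknowledges
    print(f"delivered to {record_metadata.topic}[{record_metadata.partition}] "
          f"offset {record_metadata.offset}")
    producer.flush()

    # Consumer: reads the same topic and deserializes the JSON payloads.
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers="localhost:9092",
        group_id="api-message-validators",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )

    for message in consumer:
        print(message.value)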

Data Engineer

Confidential - Rockville, MD

Responsibilities:

  • Created and executed Hadoop Ecosystem installation and document configuration scripts on Google Cloud Platform.
  • Transformed batch data from several tables containing tens of thousands of records from SQL Server, MySQL, PostgreSQL, and csv file datasets into data frames using PySpark.
  • Researched and downloaded jars for Spark-avro programming.
  • Developed a PySpark program that writes dataframes to HDFS as Avro files (see the sketch after this list).
  • Utilized Spark's parallel processing capabilities to ingest data.
  • Created and executed HQL scripts that create external tables in a raw layer database in Hive.
  • Developed a script that copies Avro-formatted data from HDFS into the raw layer external tables.
  • Created PySpark code that uses Spark SQL to generate dataframes from the Avro-formatted raw layer and writes them to data service layer internal tables in ORC format.
  • In charge of PySpark code that creates dataframes from data service layer tables and writes them to a Hive data warehouse.
  • Installed Airflow and created a database in PostgreSQL to store metadata from Airflow.
  • Configured the files that allow Airflow to communicate with its PostgreSQL database.
  • Developed Airflow DAGs in Python by importing the Airflow libraries.
  • Utilized Airflow to schedule, automatically trigger, and execute the data ingestion pipeline.
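
A minimal PySpark sketch of the raw-layer to data-service-layer flow described above. The JDBC connection details, column names, HDFS paths, and Hive database/table names are hypothetical, and the Avro step assumes the spark-avro package is available.

    # Illustrative batch pipeline: RDBMS -> Avro raw layer on HDFS -> ORC DSL table in Hive.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("rdbms-to-hive-sketch")
             .enableHiveSupport()          # lets saveAsTable target the Hive metastore
             .getOrCreate())

    # 1. Batch-read a source table from an RDBMS over JDBC (PostgreSQL shown here).
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:postgresql://dbhost:5432/sales")   # hypothetical
              .option("dbtable", "public.orders")
              .option("user", "etl_user")
              .option("password", "***")
              .load())

    # 2. Write the dataframe to HDFS as Avro for the raw layer.
    orders.write.mode("overwrite").format("avro").save("hdfs:///data/raw/orders")

    # 3. Read the raw Avro back, apply cleanup with Spark SQL, and persist it
    #    to a data-service-layer internal table stored as ORC.
    raw_orders = spark.read.format("avro").load("hdfs:///data/raw/orders")
    raw_orders.createOrReplaceTempView("raw_orders")

    dsl_orders = spark.sql("""
        SELECT order_id, customer_id, CAST(order_ts AS DATE) AS order_date, amount
        FROM raw_orders
        WHERE amount IS NOT NULL
    """)

    (dsl_orders.write
     .mode("overwrite")
     .format("orc")
     .saveAsTable("dsl.orders"))        # hypothetical Hive database/table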

Data Engineer

Confidential - Atlanta, GA

Responsibilities:

  • Installed Hadoop, MySQL, PostgreSQL, SQL Server, Sqoop, Hive, and HBase.
  • Created .bashrc files and all other XML configuration files to automate the deployment of Hadoop VMs on AWS EMR.
  • Experience creating and organizing HDFS over a staging area.
  • Troubleshot RSA SSH keys in Linux for authorization purposes.
  • Inserted data from multiple CSV files into MySQL, SQL Server, and PostgreSQL using Spark.
  • Utilized Sqoop to import structured data from MySQL, SQL Server, PostgreSQL, and a semi-structured CSV file dataset into the HDFS data lake.
  • Developed a raw layer of external tables within S3 containing data copied from HDFS.
  • Created a data service layer of internal tables in Hive for data manipulation and organization (see the sketch after this list).
  • Inserted data into DSL internal tables from RAW external tables.
  • Achieved business intelligence by creating and analyzing an application service layer in Hive containing internal tables of the data, which are also integrated with HBase.
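
A minimal sketch of the raw external table and data service layer internal table setup described above, run through Spark SQL with Hive support. The S3 bucket, database names, and column layout are hypothetical.

    # Illustrative raw-layer / DSL-layer table setup executed through Spark SQL.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("raw-to-dsl-sketch")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS raw")
    spark.sql("CREATE DATABASE IF NOT EXISTS dsl")

    # Raw layer: an external table over CSV data copied to S3, so dropping the
    # table never deletes the underlying files.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw.customers (
            customer_id STRING,
            name        STRING,
            state       STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        STORED AS TEXTFILE
        LOCATION 's3a://example-data-lake/raw/customers/'
    """)

    # Data service layer: a managed (internal) table stored as ORC.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS dsl.customers (
            customer_id STRING,
            name        STRING,
            state       STRING
        )
        STORED AS ORC
    """)

    # Move cleaned rows from the raw external table into the DSL internal table.
    spark.sql("""
        INSERT OVERWRITE TABLE dsl.customers
        SELECT customer_id, trim(name), upper(state)
        FROM raw.customers
        WHERE customer_id IS NOT NULL
    """)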

Data Engineer/Data Scientist

Confidential - Coopersburg, PA

Responsibilities:

  • Utilized pandas to create dataframes
  • Imported a csv dataset into a dataframe using pandas.
  • Ingested data from various RDBMS sources.
  • Wrote Python code to manipulate and organize the dataframe so that all attributes in each field were formatted identically.
  • Utilized Matplotlib to graph the manipulated dataframes for further analysis (see the sketch after this list).
  • Graphs provided the data visualization needed to present the information in a simple form.
  • Exported manipulated dataframes to Microsoft Excel and utilized its choropleth map feature.
  • Created a PowerPoint presentation on the information discovered, using data visualization techniques such as bar graphs and choropleth maps.
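
A minimal pandas/Matplotlib sketch of the load-clean-plot workflow described above; the file name and column names are hypothetical.

    # Illustrative pandas/Matplotlib workflow: load a CSV, normalize a column, and plot.
    import pandas as pd
    import matplotlib.pyplot as plt

    # Load the dataset into a dataframe (hypothetical file).
    df = pd.read_csv("donations.csv")

    # Normalize formatting so every value in a field looks the same
    # (strip whitespace, consistent case, numeric types).
    df["state"] = df["state"].str.strip().str.upper()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

    # Aggregate and visualize with a simple bar chart.
    totals = df.groupby("state")["amount"].sum().sort_values(ascending=False)

    totals.plot(kind="bar", figsize=(10, 4), title="Total donations by state")
    plt.ylabel("Amount (USD)")
    plt.tight_layout()
    plt.savefig("donations_by_state.png")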

Database Administrator

Confidential, Pen Argyl, PA

Responsibilities:

  • Designed a database around the interrelationships of the data that needed to be stored and the logical structure of its tables
  • Users selected products in a C#-generated GUI, and their information was stored in SQL Server (managed through SQL Server Management Studio)
  • Outlined the purpose and function of the database
  • Ensured all necessary information was gathered and organized prior to design
  • Separated information into relevant subject areas and then divided it into appropriate tables
  • Designed a relational model for several tables with specified primary keys
  • Executed normalization practices to ensure data integrity and eliminate data redundancy
  • Wrote C# scripts that generate a GUI with text boxes, drop-down menus, and buttons
  • GUI prompts the user to enter personal information, charity items to donate, and delivery options
  • Developed a fully functioning C# program that connects to SQL Server and integrates information users enter with preexisting information in the database
  • Implemented SQL functions to receive user information from front-end C# GUIs and store it in the database (see the sketch after this list)
  • Utilized SQL functions to select information from the database and send it to the front end upon user request
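
A minimal Python sketch of the parameterized insert/select pattern described above (the original front end was C#); the pyodbc connection string, table, and columns are hypothetical.

    # Illustrative parameterized insert/select against SQL Server, analogous to the
    # C# front end described above.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=CharityDonations;"
        "UID=app_user;PWD=***"
    )
    cursor = conn.cursor()

    # Store the information a user entered in the GUI (parameterized to avoid SQL injection).
    cursor.execute(
        "INSERT INTO Donors (FirstName, LastName, Email) VALUES (?, ?, ?)",
        ("Ada", "Lovelace", "ada@example.com"),
    )
    conn.commit()

    # Fetch it back for display in the front end.
    cursor.execute(
        "SELECT DonorId, FirstName, LastName FROM Donors WHERE Email = ?",
        ("ada@example.com",),
    )
    for row in cursor.fetchall():
        print(row.DonorId, row.FirstName, row.LastName)

    conn.close()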

Software Developer

Confidential - Philadelphia, PA

Responsibilities:

  • Developed a first-person shooter version of the classic game Asteroids with Unity and C# scripts
  • Player uses pitch, yaw, roll, and thrust to fly around in three-dimensional space while avoiding alien fire and asteroids
  • Implemented a player controller script which allows the player to steer and thrust the ship using the arrow keys and fire proton torpedoes with the space bar
  • Created an alien controller script that flies alien ships around the player, shooting at the player while gradually closing in
  • Coded a script that revolves and rotates asteroids around the level and triggers a particle system explosion when an asteroid is shot by the player
  • Audio manager script was written to store pitch and volume settings for each sound clip for every game object
  • A destroyed asteroid triggers a debris script that creates and randomly projects smaller asteroid game objects out from the center of the original asteroid game object
  • Gun game object recoils and triggers a muzzle flash effect when the player presses the space bar
  • HUD displays the ship's current pitch, yaw, and roll coordinates along with the player's health points and whether or not the ship's force field is active
  • Random rotation script gives asteroid game objects a lifelike appearance of naturally floating in outer space
  • Created a torpedo script that instantiates a proton torpedo game object when the player presses the space bar and projects the torpedo forward from the player at high velocity
  • Developed a global stats script which passes and holds information for all of the game objects in the game
  • Main menu was created as a scene in Unity with a script that allows the player to navigate to the high score table, reset the high score table, and select a level of difficulty to play
  • Created a game-over scene containing a script which allows the player to exit the game or restart after the player has died
  • High score table is a scene containing a script which holds and displays the initials entered by the top players and the score earned by each player

Software Developer

Confidential - Bethlehem, PA

Responsibilities:

  • Worked on a team to develop image editing software that iterates through each pixel of an image, manipulating each bit and altering the color to achieve the desired effect
  • Developed C++ code to add a chromatic aberration filter to the image editing software
  • Developed a Simplified Advanced Encryption Standard (S-AES) program which can encrypt any given 16-bit plaintext
  • Implemented a decryption method to decipher encrypted text, and a differential cryptanalysis attack on the encrypted text for one-round S-AES
  • String to bit set function receives a string, converts it to a 4-bit set, and returns it
  • Cryptanalysis function receives eight 4-bit sets, manipulates the bits using S-box transformation, makes logical comparisons, and returns the resulting 4-bit set needed for a cryptanalysis attack
  • Key expansion function receives a 16-bit key, manipulates the bits, and expands the bit size of the key
  • Nibble substitution function receives an 8-bit set, shuffles bits one nibble at a time, substitutes bits using S-box transformations, and returns the resulting 8-bit set
  • Nibble rotation function receives an 8-bit set, shuffles bits with rotation, and returns the result
  • Add round key function receives two 16-bit sets, performs an XOR operation on them, and returns the resulting key (see the sketch after this list)
  • Shift row function receives a 16-bit set, shuffles the bits, and returns the new 16-bit set
  • Mix column function receives a 16-bit set, shuffles bits one nibble at a time, performs an XOR operation on each nibble, shuffles bits one nibble, and returns the mixed 16-bit set
  • Nibble substitution function receives a 16-bit set, shuffles bits one nibble at a time, substitutes bits using S-box transformation, and returns the 16-bit set
  • S-box function receives a nibble, assigns it to a variable; comparison operators assign a new nibble to the variable, and the function returns the new nibble
  • Multable function receives a nibble; comparison operators assign a new nibble to the variable, and the function returns the nibble
  • Inverse mix column function executes mix column in reverse order
  • Inverse nibble substitution function executes nibble substitution in reverse order
  • Inverse S-box function receives a string nibble; comparison operators assign a new string nibble to the variable that was passed, and the function returns that string
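
A small Python sketch of two of the S-AES building blocks described above (the original was written in C++): the XOR-based add round key and S-box nibble substitution on a 16-bit state. The S-box values are the commonly published S-AES table, and the example state and key are hypothetical.

    # Illustrative S-AES helpers operating on 16-bit integer states.
    # Commonly published S-AES substitution box: maps each 4-bit nibble to another nibble.
    SBOX = [0x9, 0x4, 0xA, 0xB, 0xD, 0x1, 0x8, 0x5,
            0x6, 0x2, 0x0, 0x3, 0xC, 0xE, 0xF, 0x7]
    # Inverse S-box, derived by inverting the mapping above.
    INV_SBOX = [SBOX.index(i) for i in range(16)]

    def add_round_key(state: int, round_key: int) -> int:
        """XOR a 16-bit state with a 16-bit round key."""
        return (state ^ round_key) & 0xFFFF

    def nibble_sub(state: int, sbox=SBOX) -> int:
        """Substitute each of the four nibbles of a 16-bit state through the S-box."""
        result = 0
        for shift in (12, 8, 4, 0):
            nibble = (state >> shift) & 0xF
            result |= sbox[nibble] << shift
        return result

    def inverse_nibble_sub(state: int) -> int:
        """Undo nibble_sub by running the state through the inverse S-box."""
        return nibble_sub(state, sbox=INV_SBOX)

    # Example round fragment with a hypothetical plaintext block and round key.
    state = 0x6F6B
    key = 0xA73B
    mixed = add_round_key(nibble_sub(state), key)
    assert inverse_nibble_sub(add_round_key(mixed, key)) == state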
