Big Data Engineer Resume
Baltimore, Maryland
SUMMARY
- 7 years’ experience in Big Data development using Apache Spark, Hive, Apache Kafka, and Hadoop.
- 9 years’ total IT/software/database design/development/deployment/support experience (7 years in Big Data and 2 years in software/IT data systems).
- Experienced with Big Data technologies such as Amazon Web Services (AWS), Microsoft Azure, Apache Kafka, Python, Apache Spark, Hive, and Hadoop.
- Experienced in analyzing Microsoft SQL Server data models and in identifying and creating the inputs needed to convert existing dashboards that use Excel as a data source.
- Applied Python-based design and development to multiple projects.
- Created PySpark DataFrames on multiple projects and tied them into Kafka.
- Configured Hadoop and Apache Spark in Big Data environments.
- Built AWS CloudFormation templates used alongside Terraform with existing plugins.
- Developed AWS CloudFormation templates to create custom pipeline infrastructure (a brief sketch follows this summary).
- Implemented AWS IAM user roles and policies to authenticate and control user access.
- Applied expertise in designing custom reports with data extraction and reporting tools and in developing algorithms based on business cases.
- Performance-tuned data-heavy dashboards and reports using options such as extracts, context filters, efficient calculations, data source filters, and indexing and partitioning in the data source.
- Wrote SQL queries for data validation of reports and dashboards.
- Worked with Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
- Proven success working on different Big Data technology teams operating within an Agile/Scrum project methodology.
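The CloudFormation and pipeline bullets above describe infrastructure-as-code work; the sketch below shows, in Python with boto3, how such a pipeline stack might be created programmatically. It is a minimal illustration: the stack name, template file, and wait step are assumptions, not details taken from any of the projects listed here.

```python
# Minimal sketch, assuming boto3 is installed and AWS credentials are configured.
# The stack name and template path are hypothetical placeholders.
import boto3

def deploy_pipeline_stack(stack_name: str, template_path: str) -> str:
    """Create a CloudFormation stack for a data pipeline from a local template."""
    cfn = boto3.client("cloudformation")
    with open(template_path) as f:
        template_body = f.read()
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        # Required when the template creates IAM roles/policies, as in the IAM bullet above.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until creation finishes so callers know the pipeline infrastructure exists.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]

if __name__ == "__main__":
    print(deploy_pipeline_stack("etl-pipeline-stack", "pipeline.yaml"))
```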
TECHNICAL SKILLS
File Formats: Parquet, Avro, JSON, ORC, Text, CSV.
Apache: Apache Ant, Apache Flume, Apache Hadoop, Apache YARN, Apache Hive, Apache Kafka, Apache Maven, Apache Oozie, Apache Spark, Apache Tez, Apache Zookeeper, Cloudera Impala.
Operating Systems: Unix/Linux, Windows 10, Ubuntu, Apple OS.
Scripting: HiveQL, MapReduce, XML, FTP, Python, UNIX/Linux shell scripting.
Distributions: Cloudera CDH 4/5, Hortonworks HDP 2.5/2.6, Amazon Web Services (AWS), Elastic (ELK).
Data Processing (Compute) Engines: Apache Spark, Spark Streaming, Flink, Storm.
Data Visualization Tools: Pentaho, QlikView, Tableau, Power BI, Matplotlib.
Databases & Data Structures: Microsoft SQL Server (2005, 2008 R2, 2012), Apache Cassandra, Amazon Redshift, Amazon DynamoDB, Apache HBase, Apache Hive, MongoDB.
PROFESSIONAL EXPERIENCE
Confidential, Baltimore, Maryland
Big Data Engineer
Responsibilities:
- Created a PySpark streaming job to receive real-time data from Kafka (see the sketch at the end of this role).
- Defined Spark data schema and set up development environment inside the cluster.
- Designed a Spark Python job to consume information from S3 buckets using Boto3.
- Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance.
- Created a pipeline to gather data using Pyspark, Kafka, and HBase.
- Used Spark streaming to receive real-time data using Kafka.
- Worked with unstructured data and parsed out the information using Python built-in functions.
- Configured a Python API Producer file to ingest data from the Slack API, using Kafka for real-time processing with Spark.
- Processed data with the Natural Language Toolkit (NLTK) to count important words and generate word clouds.
- Started and configured master and slave nodes for Spark.
- Set up cloud compute engine instances in managed and unmanaged modes and handled SSH key management.
- Worked on virtual machines to run pipelines on a distributed system.
- Led presentations about the Hadoop ecosystem, best practices, and data architecture in Hadoop.
- Managed Hive connections with tables, databases, and external tables.
- Installed Hadoop from the terminal and applied the required configurations.
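A minimal sketch of the kind of PySpark streaming job described in the bullets above: it subscribes to a Kafka topic, applies a defined schema, and writes the parsed stream out. Broker addresses, the topic name, and the schema fields are hypothetical placeholders.

```python
# Minimal sketch of a PySpark Structured Streaming job reading real-time data from Kafka.
# Requires the spark-sql-kafka package on the Spark classpath; names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

# Schema for the JSON payload carried in each Kafka message value.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to the Kafka topic; each row arrives with binary key/value columns.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON value into typed columns using the schema defined above.
events = raw.select(from_json(col("value").cast("string"), event_schema).alias("e")).select("e.*")

# Console sink keeps the sketch self-contained; a real job would write to HBase, S3, etc.
query = events.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```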
Confidential, Cincinnati, OH
AWS Data Engineer
Responsibilities:
- Created and managed cloud VMs with the AWS EC2 command-line interface and the AWS Management Console.
- Used Apache Spark DataFrame API over the Cloudera platform to perform analytics on Hive data.
- Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
- Used Ansible Python Script to generate inventory and push the deployment to AWS Instances.
- Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 buckets.
- Implemented Amazon EMR for processing Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2), with Amazon Simple Storage Service (S3) and AWS Redshift for storage.
- Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 (a sketch follows this role's bullets).
- Populated database tables via AWS Kinesis Firehose and AWS Redshift.
- Automated the installation of the ELK agent (Filebeat) with an Ansible playbook; developed a Kafka queue system to collect log data without data loss and publish it to various sources.
- Applied AWS CloudFormation templates for Terraform with existing plugins.
- Developed AWS CloudFormation templates to create custom pipeline infrastructure.
- Implemented AWS IAM user roles and policies to authenticate and control access.
- Specified nodes and performed data analysis queries on Amazon Redshift clusters on AWS.
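A minimal sketch of a Lambda function of the kind described above: triggered by S3 object-created events and recording each object in a DynamoDB table. The table name and item attributes are hypothetical.

```python
# Minimal sketch of an AWS Lambda handler for S3 object-created events.
# The DynamoDB table name and key schema are hypothetical placeholders.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ingested-objects")  # assumed table with "object_key" as partition key

def handler(event, context):
    """Record each newly created S3 object in DynamoDB."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)
        table.put_item(Item={
            "object_key": key,
            "bucket": bucket,
            "size_bytes": size,
        })
    return {"processed": len(records)}
```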
Confidential, Houston, Texas
Hadoop Engineer
Responsibilities:
- Integrated Kafka with Spark Streaming for real-time processing of logistics data.
- Used shell scripts to migrate the data between Hive, HDFS and MySQL.
- Installed and configured an HDFS cluster for Big Data extraction, transformation, and loading.
- Utilized Zookeeper and Spark interface for monitoring proper execution of Spark Streaming.
- Configured Linux on multiple Hadoop environments, setting up Dev, Test, and Prod clusters with the same configuration.
- Created a pipeline to gather data using Pyspark, Kafka and HBase.
- Sent requests to source REST Based API from a Scala script via Kafka producer.
- Utilized a cluster of multiple Kafka brokers to handle replication needs and allow for fault tolerance.
- Hands-on with Spark Core, SparkSession, Spark SQL, and the DataFrame/Dataset/RDD APIs, using Spark jobs and the DataFrame API to load structured data into Spark clusters.
- Created a Kafka broker that uses the schema to fetch structured data in structured streaming.
- Defined Spark data schema and set up development environment inside the cluster.
- Interacted with data residing in HDFS using PySpark to process it (see the sketch after these bullets).
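A minimal sketch of processing HDFS-resident data with PySpark and landing it in Hive, with a summary pushed toward MySQL over JDBC, in the spirit of the bullets above. All paths, table names, and connection details are hypothetical.

```python
# Minimal sketch: read raw files from HDFS, clean them, persist to Hive, summarize to MySQL.
# Paths, table names, credentials, and the JDBC URL are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdfs-to-hive")
    .enableHiveSupport()  # lets saveAsTable write managed Hive tables
    .getOrCreate()
)

# Read delimited logistics data directly from HDFS.
shipments = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///data/raw/shipments/")
)

# Basic processing before loading: drop malformed rows and deduplicate on the key.
clean = shipments.dropna(subset=["shipment_id"]).dropDuplicates(["shipment_id"])

# Persist to Hive for downstream HQL queries.
clean.write.mode("overwrite").saveAsTable("logistics.shipments")

# Push a small summary to MySQL over JDBC (the MySQL driver must be on the classpath).
clean.groupBy("status").count().write.mode("overwrite").jdbc(
    url="jdbc:mysql://db-host:3306/reporting",
    table="shipment_status_counts",
    properties={"user": "etl_user", "password": "change-me", "driver": "com.mysql.cj.jdbc.Driver"},
)
```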
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Connected and ingested data using different ingestion tools such as Kafka and Flume.
- Worked on importing the received data into Hive using Spark.
- Applied HQL to query the desired data in Hive for further analysis.
- Implemented partitioning, dynamic partitions, and bucketing in Hive, which increased performance and gave the data a proper, logical organization (sketched after this list).
- Decoded the raw data and loaded it into JSON before sending the batched streaming file over the Kafka producer.
- Received the JSON response in a Kafka consumer written in Python.
- Established a connection between HBase and Spark to transfer the newly populated DataFrame.
- Designed Spark Scala job to consume information from S3 Buckets.
- Monitored background operations in Hortonworks Ambari.
- Monitored HDFS job status and the health of DataNodes according to specifications.
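A minimal sketch of the Hive partitioning and bucketing approach noted above, issued through spark.sql so the example stays in Python. The database, table, and column names are hypothetical.

```python
# Minimal sketch of a partitioned, bucketed Hive table loaded with dynamic partitioning.
# Database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-partitioning").enableHiveSupport().getOrCreate()

# Target table: partitioned by ingest date, bucketed by customer for faster joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.events_by_day (
        event_id    STRING,
        customer_id STRING,
        payload     STRING
    )
    PARTITIONED BY (ingest_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Dynamic partitioning lets Hive derive each partition value from the data itself.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("""
    INSERT INTO TABLE analytics.events_by_day PARTITION (ingest_date)
    SELECT event_id, customer_id, payload, ingest_date
    FROM analytics.events_staging
""")
```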
Confidential, Oklahoma City, OK
Software/IT Data Systems Programmer
Responsibilities:
- Gathered and analyzed requirements from the client and prepared a requirements specification document.
- Identified data types and wrote and ran SQL data cleansing and analysis scripts (a sketch follows this role).
- Formatted files to import and export data to a SQL Server repository.
- Applied Git to store and organize SQL queries.
- Improved the database user interface by replacing manual user input with automated inputs.
- Re-designed forms for easier access.
- Applied code modifications and wrote new scripts in Python.
- Worked with software/IT technology team to improve data integration processing.
- Reported and resolved discrepancies in a timely manner through the appropriate channels.
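A minimal sketch of running a SQL Server data-cleansing script from Python with pyodbc, in the spirit of the bullets above. The connection string, table, and cleansing rule are hypothetical.

```python
# Minimal sketch of a SQL data-cleansing pass against SQL Server via pyodbc.
# Server, database, table, and column names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=db-server;DATABASE=client_db;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# Trim stray whitespace and normalize empty strings to NULL before analysis.
cursor.execute("""
    UPDATE dbo.customers
    SET email = NULLIF(LTRIM(RTRIM(email)), '')
""")

# Quick data-quality check: how many rows still lack a usable email address?
cursor.execute("SELECT COUNT(*) FROM dbo.customers WHERE email IS NULL")
missing = cursor.fetchone()[0]
print(f"Rows missing email after cleansing: {missing}")

conn.commit()
conn.close()
```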