
Big Data Engineer Resume


Parsippany, NJ

SUMMARY

  • 10+ years’ experience in IT; 7+ years’ experience in the Hadoop/Big Data/Cloud space.
  • Demonstrated hands-on skill with Big Data technologies such as Amazon Web Services (AWS), Microsoft Azure, Apache Kafka, Python, Apache Spark, Hive, and Hadoop.
  • Experienced in analyzing Microsoft SQL Server data models and in identifying and creating inputs to convert existing dashboards that use Excel as a data source.
  • Skilled in Python-based design and development.
  • Skilled in creating PySpark DataFrames on multiple projects and tying them into Kafka.
  • Configure Hadoop and Apache Spark in Big Data environments.
  • Build AWS CloudFormation templates used with Terraform and existing plugins.
  • Develop AWS CloudFormation templates to create custom pipeline infrastructure.
  • Implement AWS IAM user roles and policies to authenticate and control user access.
  • Apply expertise in designing custom reports using data extraction and reporting tools and in developing algorithms based on business cases.
  • Performance-tune data-heavy dashboards and reports using options such as extracts, context filters, efficient calculations, data source filters, indexing, and partitioning in the data source.
  • Program SQL queries for data validation of reports and dashboards.
  • Proven skill in setting up Data Lakes and Big Data ecosystems (Hadoop, Spark, Hortonworks, Cloudera).
  • Effective in working with project stakeholders to gather requirements and create As-Is and As-Was dashboards.
  • Recommend and apply best practices to improve dashboard performance for Tableau Server users.
  • Design and develop custom reports using data extraction and reporting tools, and develop algorithms based on business cases.
  • Apply in-depth knowledge of Hadoop architecture and components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and MapReduce concepts.
  • Maintain the ELK stack (Kibana) and write Spark scripts in Scala.
  • Configure Spark Streaming to receive real-time data from internal systems and store the streamed data in HDFS (see the sketch after this list).
  • Create Hive scripts for ETL, create Hive tables, and write Hive queries.
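
As a reference for the Spark Streaming and Kafka points above, the sketch below shows a minimal PySpark Structured Streaming job that lands a Kafka topic on HDFS. The broker address, topic name, and paths are placeholder assumptions, not details from a specific project.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Placeholder broker, topic, and paths; requires the spark-sql-kafka connector.
    spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "events")
              .load())

    # Kafka delivers keys and values as bytes; cast to strings before further parsing.
    parsed = events.select(col("key").cast("string"), col("value").cast("string"))

    # Land the stream on HDFS as Parquet; the checkpoint lets the job recover
    # after a restart without duplicating output.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/raw/events")
             .option("checkpointLocation", "hdfs:///checkpoints/events")
             .start())

    query.awaitTermination()

The checkpoint location is what allows the file sink to pick up where it left off after a restart instead of rewriting data.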

TECHNICAL SKILLS

Apache: Kafka, Flume, Hadoop, YARN, Hive, Maven, Oozie, Spark, ZooKeeper

Distributions and Cloud Platforms: Hortonworks, Cloudera, Amazon Web Services (AWS), AWS Lambda, AWS EMR, ELK, Azure

Scripting: HiveQL, MapReduce, XML, FTP, Python, UNIX, Shell scripting, Linux

Data Processing (Compute) Engines: Apache Spark, Spark Streaming

Databases: Microsoft SQL Server, Apache HBase, Apache Hive

Data Visualization Tools: Tableau, Power BI

Scheduler Tool: Airflow

File Formats: Parquet, Avro, JSON, ORC, Text, CSV

Operating Systems: Unix/Linux, Windows, Ubuntu, macOS

PROFESSIONAL EXPERIENCE

Big Data Engineer

Confidential

Responsibilities:

  • Applied Amazon Web Services (AWS) Cloud services such as EC2, S3, EBS, RDS, VPC, and IAM.
  • Set up Puppet master and client nodes and wrote scripts to deploy applications to Dev, QA, and production environments.
  • Developed and maintained build/deployment scripts for testing, staging, and production environments using Maven, shell, Ant, and Perl scripts.
  • Developed Puppet modules and roles/profiles for installation and configuration of software required for various applications/blueprints.
  • Wrote Python scripts to manage AWS resources through API calls using the Boto SDK and worked with the AWS CLI (see the sketch after this list).
  • Wrote Ansible playbooks to launch AWS instances and used Ansible to manage web applications.
  • Created UDFs using Scala.
  • Archived data using Amazon Glacier.
  • Monitored resources such as Amazon database services, CPU and memory utilization, and EBS volumes.
  • Monitored logs to better understand how the system was functioning.
  • Automated, configured, and deployed instances on AWS, Azure environments, and Data Centers.
  • Worked hands-on with EC2, CloudWatch, and CloudFormation, and managed security groups on AWS.
  • Maintained high-availability clustered and standalone server environments and refined automation components with scripting and configuration management (Ansible).
  • Implemented Apache Spark and Spark Streaming projects using Scala and Spark SQL.
  • Wrote Spark applications using Python and Scala.
  • Implemented Microsoft Azure Cloud Services (PaaS & IaaS), Storage, Web Apps, Active Directory, Application Insights, Internet of Things (IoT), Azure Search, Key Vault, Visual Studio Online (VSO) and SQL Azure.
  • Automated vulnerability-management patching and CI/CD using Chef, GitLab, Jenkins, and AWS/OpenStack.
  • Implemented Spark on AWS EMR using PySpark and utilized the DataFrame and Spark SQL APIs for faster data processing.
  • Configured network architecture on AWS with VPCs, subnets, Internet Gateways, NAT, and route tables.
  • Set up system and network security monitoring using CloudWatch and Nagios.
  • Worked on CI/CD pipelines for code deployment, engaging tools such as Git, Jenkins, and CodePipeline from developer code check-in through deployment.
  • Customized Kibana for dashboards and reporting to provide visualization of log data and streaming data.
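
A hedged sketch of the kind of Boto-based housekeeping script referenced above, written with boto3; the region, tag filter, and bucket name are illustrative assumptions rather than actual environment details.

    import boto3

    # Placeholder region, tag filter, and bucket name.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    def stop_idle_dev_instances():
        """Stop running EC2 instances tagged Environment=dev."""
        reservations = ec2.describe_instances(
            Filters=[
                {"Name": "tag:Environment", "Values": ["dev"]},
                {"Name": "instance-state-name", "Values": ["running"]},
            ]
        )["Reservations"]
        instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
        return instance_ids

    def report_large_objects(bucket, min_bytes=1024 ** 3):
        """Print S3 objects larger than the threshold (default 1 GiB)."""
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                if obj["Size"] >= min_bytes:
                    print(obj["Key"], obj["Size"])

    if __name__ == "__main__":
        print("Stopped:", stop_idle_dev_instances())
        report_large_objects("example-logs-bucket")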

Cloud Engineer

Confidential, Parsippany, NJ

Responsibilities:

  • Configured, deployed, and automated instances on AWS, Azure environments, and Data Centers.
  • Applied EC2, CloudWatch, and CloudFormation, and managed security groups on AWS.
  • Programmed software installation shell scripts.
  • Programmed scripts to extract data from multiple databases.
  • Programmed scripts to schedule Oozie workflows to execute daily tasks.
  • Produced distributed query agents to run queries against Hive.
  • Loaded data from sources such as HDFS and HBase into Spark DataFrames and implemented in-memory computation to generate output (see the sketch after this list).
  • Developed Spark programs using PySpark.
  • Produced Hive external tables and designed information models in Hive.
  • Produced multiple Spark Streaming and batch Spark jobs using Python.
  • Processed terabytes of data in real time using Spark Streaming.
  • Created and managed code reviews.
  • Wrote streaming applications with Spark Streaming/Kafka.
  • Created automated Python scripts to convert data from different sources and to generate ETL pipelines.
  • Applied Hive optimization techniques such as partitioning, bucketing, map join, and parallel execution.
  • Ingested data from various sources and processed Data-at-Rest utilizing Big Data technologies such as HBase, Hadoop, MapReduce frameworks, and Hive.
  • Monitored Amazon database services and CPU/memory utilization using CloudWatch.
  • Used Spark SQL to achieve faster results than Hive during data analysis.
  • Converted HiveQL/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
  • Developed JDBC/ODBC connections between Hive and Spark for the transfer of newly populated DataFrames from MSSQL.
  • Executed Hadoop/Spark jobs on AWS EMR using programs and data stored in S3 Buckets.
  • Implemented a Cloudera Hadoop distribution cluster using AWS EC2.
  • Worked with AWS Lambda functions for event-driven processing across various AWS resources.
  • Managed AWS Redshift clusters, including launching clusters by specifying the nodes and performing data analysis queries.
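
The sketch below illustrates the HDFS-to-DataFrame batch pattern referenced above: load data from HDFS, aggregate it in memory with Spark SQL, and write the result back out. The paths, column names, and aggregation are assumptions for illustration only.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-batch-aggregation").getOrCreate()

    # Load the source data from HDFS into a DataFrame (path and columns are placeholders).
    orders = spark.read.parquet("hdfs:///data/raw/orders")
    orders.createOrReplaceTempView("orders")

    # The aggregation runs in memory across the cluster, which is where the
    # speed-up over an equivalent Hive/MapReduce query comes from.
    daily_totals = spark.sql("""
        SELECT order_date,
               SUM(amount) AS total_amount,
               COUNT(*)    AS order_count
        FROM orders
        GROUP BY order_date
    """)

    # Persist the result for downstream reporting.
    daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals")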

Data Engineer

Confidential, Houston, TX

Responsibilities:

  • Designed relational database management system (RDBMS) schemas and integrated them with Sqoop and HDFS.
  • Created Hive external tables on the RAW data layer pointing to HDFS locations (see the sketch after this list).
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Utilized Hive Query Language (HQL) to create Hive internal tables on a data service layer (DSL) Hive database.
  • Integrated Hive internal tables with Apache HBase data store for data analysis and read/write access.
  • Developed Spark jobs using Spark SQL, PySpark, and DataFrames API to process structured, and unstructured data into Spark clusters.
  • Analyzed and processed Hive internal tables according to business requirements and saved the new queried tables in the application service layer (ASL) Hive database.
  • Developed deep working knowledge of Hadoop architecture and components such as HDFS, Name Node, Data Node, Resource Manager, Secondary Name Node, Node Manager, and MapReduce concepts.
  • Installed, configured, and tested Sqoop data ingestion tool and Hadoop ecosystems.
  • Imported and appended data using Sqoop from different Relational Database Systems to HDFS.
  • Exported and inserted data from HDFS into Relational Database Systems using Sqoop.
  • Automated the pipeline flow using Bash script.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Connected Business Intelligence tools such as Tableau and Power BI to the tables in the data warehouse.
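
A minimal sketch of the RAW-to-DSL Hive layering described above, expressed as HiveQL run through a Hive-enabled SparkSession. The database names, columns, and HDFS locations are placeholders.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("raw-to-dsl")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS raw")
    spark.sql("CREATE DATABASE IF NOT EXISTS dsl")

    # External table on the RAW layer: Hive tracks only the schema, while the
    # files stay at their existing HDFS location.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw.customers (
            customer_id STRING,
            name        STRING,
            signup_date STRING
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'hdfs:///data/raw/customers'
    """)

    # Internal (managed) table in the DSL database, populated from the RAW layer.
    spark.sql("""
        CREATE TABLE dsl.customers
        STORED AS PARQUET
        AS
        SELECT customer_id, name, CAST(signup_date AS DATE) AS signup_date
        FROM raw.customers
    """)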

Hadoop Data Engineer

Confidential, Salisbury, NC

Responsibilities:

  • Used Zookeeper and Oozie for coordinating the cluster and programming workflows.
  • Used Sqoop to efficiently transfer data between relational databases and HDFS, and used Flume to stream log data from servers.
  • Enforced partitioning and bucketing in Hive for better organization of the data (see the sketch after this list).
  • Worked with different file formats and compression techniques to meet standards.
  • Loaded data from UNIX systems into HDFS.
  • Used UNIX shell scripts to automate the build process and to perform routine jobs such as file transfers between different hosts.
  • Documented technical specs, data flows, data models, and class models.
  • Documented requirements gathered from stakeholders.
  • Successfully loaded files from Teradata into HDFS and from HDFS into Hive.
  • Involved in researching various available technologies, industry trends, and cutting-edge applications.
  • Performed data ingestion using Flume with a Kafka source and an HDFS sink.
  • Performed storage capacity management, performance tuning, and benchmarking of clusters.
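
A minimal PySpark sketch of the partitioning and bucketing referenced above, writing into a Hive-enabled metastore; the table name, columns, and bucket count are assumptions for illustration.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("partitioned-bucketed-load")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS ops")

    # Placeholder staging path and column names.
    logs = spark.read.json("hdfs:///data/staging/server_logs")

    # Partitioning by date lets queries prune whole directories; bucketing by
    # host keeps rows for the same host in the same files, which helps joins
    # and aggregations on that column.
    (logs.write
         .mode("overwrite")
         .partitionBy("log_date")
         .bucketBy(32, "host")
         .sortBy("host")
         .saveAsTable("ops.server_logs"))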

Linux Systems Administrator

Confidential, San Jose, CA

Responsibilities:

  • Installed, configured, monitored, and administered Linux servers.
  • Installed, deployed, and managed Red Hat Enterprise Linux, CentOS, and Ubuntu, and installed patches and packages for Red Hat Linux servers.
  • Configured and installed Red Hat and CentOS Linux servers on virtual machines and on bare metal.
  • Performed kernel and database configuration optimization such as I/O resource usage on disks.
  • Created and modified users and groups with root permissions.
  • Administered local and remote servers over SSH on a daily basis.
  • Created and maintained Python scripts for automating build and deployment processes (see the sketch after this list).
  • Utilized Nagios-based open-source monitoring tools to monitor Linux Cluster nodes.
  • Created users, managed user permissions, maintained user and file system quotas, and installed and configured DNS.
  • Worked with DBA team for database performance issues, network related issues on LINUX/UNIX servers and with vendors regarding hardware related issues.
  • Monitored CPU, memory, hardware and software including raid, physical disk, multipath, filesystems, and networks using Nagios monitoring tool.
  • Hosted servers using Vagrant on Oracle virtual machines.
  • Automated daily tasks using Bash scripts, documented changes in the environment and on each server, and analyzed error logs, user logs, and /var/log messages.
  • Adhered to industry standards by securing systems, directory and file permissions, and groups, and by supporting user account management and user creation.
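
A simple, hypothetical example of the Python build-and-deployment automation mentioned above; the hosts, paths, and service name are placeholders, and the build and deploy commands stand in for whatever the real pipeline used.

    import subprocess
    import sys

    # Placeholder application directory, target hosts, and service name.
    APP_DIR = "/opt/apps/example-service"
    TARGET_HOSTS = ["web01.example.com", "web02.example.com"]

    def run(cmd):
        """Run a command, echo it, and stop the script if it fails."""
        print("+", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit(f"command failed: {' '.join(cmd)}")

    def build():
        run(["git", "-C", APP_DIR, "pull", "--ff-only"])
        run(["make", "-C", APP_DIR, "package"])

    def deploy():
        for host in TARGET_HOSTS:
            run(["rsync", "-az", f"{APP_DIR}/dist/", f"{host}:/srv/example-service/"])
            run(["ssh", host, "sudo", "systemctl", "restart", "example-service"])

    if __name__ == "__main__":
        build()
        deploy()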
