
Big Data Engineer Resume


Chicago, IL

SUMMARY:

  • Big Data Developer with extensive experience across the finance, e-commerce, and insurance industries.
  • In-Depth knowledge of Hadoop including HDFS, Yarn, MapReduce, Hive.
  • Proficient in Scala programming with Spark including Spark SQL, Spark Streaming, MLlib, GraphX.
  • Hands-on experience in stream processing with Spark Streaming, Storm, and Flink.
  • Experience in NoSQL databases including HBase (with Phoenix), Cassandra and MongoDB.
  • Experience in RDBMS including Oracle and MySQL.
  • Hands-on experience with the Amazon S3 object storage service.
  • Proficient in ad-hoc queries with Hive, Impala (with Kudu), and Phoenix (with HBase).
  • In-Depth knowledge of data ingestion tools Nifi, Sqoop and Flume.
  • Hands-on experience in building real-time data pipelines using Kafka (with Zookeeper), Spark Streaming, and HBase.
  • Experience in Kafka real-time Change Data Capture (CDC) using Spring XD.
  • Experience in AWS including EMR, Elasticsearch, S3, RDS, DynamoDB, Kinesis, Redshift, Lambda.
  • In-Depth knowledge of Machine Learning with Python.
  • In-Depth knowledge of Algorithms and Data Structures.
  • Excellent understanding of object-oriented programming with Java and functional programming with Scala.
  • Excellent programming, analytical, communication, and interpersonal skills; a fast learner and good team player.

TECHNICAL SKILLS:

Hadoop/Spark Ecosystem: Hadoop 2.x, Spark 2.x, Hive 2.x, HBase 2.2.0, Nifi 1.9.2, Sqoop 1.4.6, Flume 1.9.0, Kafka 2.3.0, YARN, Mesos 1.8.0, Zookeeper 3.4.x

Database: Oracle, MySQL, HBase 2.2.0, Cassandra 3.11

AWS: S3, RDS, DynamoDB, Kinesis, EMR, Elasticsearch, Redshift, Lambda

Programming Languages: Java 8, Python 3, Scala 2.x

Software/Frameworks: Git, JIRA, Maven, sbt, JUnit, Jenkins, Spring, IntelliJ IDEA, Eclipse

Linux: Shell, Bash, Vim, nano, APT, Wget, pip

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Ingested initial high-volume (billions of records) data from ERP systems into MySQL, NoSQL databases, Amazon S3, and HDFS.
  • Deployed AWS Data Pipeline and built AWS Lambda functions to trigger pipeline execution in response to AWS S3 event notifications (see the Lambda sketch after this list).
  • Generated batch processing reports using MapReduce and Spark and loaded outputs to databases.
  • Optimized structured data processing using Spark SQL and Structured Streaming.
  • Performed and optimized ad-hoc queries with Hive, Impala (with Kudu) and Phoenix (with HBase) to achieve comprehensive data analysis.
  • Built real-time data pipelines using Kafka (with Zookeeper), Spark Streaming, and HBase (see the streaming sketch after this list).
  • Brokered real-time streaming data to data persistence clusters (mainly HDFS) for further batch processing and ad-hoc queries.
  • Cooperated with data science teams running risk detection algorithms on streaming data.
  • Loaded stream processing outputs to HBase for scalable storage and fast query.
  • Used Git for version control and Maven for project management.
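
A minimal sketch of the kind of S3-triggered Lambda function described above, assuming the aws-lambda-java-events library; S3IngestTrigger and activatePipeline are hypothetical names, and the real pipeline-activation call would wrap the relevant AWS SDK request:

    import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
    import com.amazonaws.services.lambda.runtime.events.S3Event
    import scala.collection.JavaConverters._

    // Invoked by an S3 event notification; kicks off downstream processing
    // for each newly created object.
    class S3IngestTrigger extends RequestHandler[S3Event, Void] {
      override def handleRequest(event: S3Event, context: Context): Void = {
        event.getRecords.asScala.foreach { record =>
          val bucket = record.getS3.getBucket.getName
          val key    = record.getS3.getObject.getKey
          context.getLogger.log(s"New object s3://$bucket/$key")
          activatePipeline(bucket, key)   // hypothetical helper, not a real SDK call
        }
        null
      }

      // Placeholder: in the real function this would call the AWS SDK to
      // activate the Data Pipeline run for the newly arrived object.
      private def activatePipeline(bucket: String, key: String): Unit = ()
    }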
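
A condensed sketch of that Kafka-to-HBase path, assuming the spark-streaming-kafka-0-10 integration and the standard HBase client API; the broker list, the "events" topic and table, and the "d" column family are illustrative placeholders:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaToHBasePipeline {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hbase"), Seconds(5))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",              // placeholder brokers
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "risk-pipeline")

        // Direct stream from the "events" topic.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Write each partition of each micro-batch to HBase with the client API;
        // assumes keyed messages, reusing the Kafka key as the HBase row key.
        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("events"))
            records.foreach { rec =>
              val put = new Put(Bytes.toBytes(rec.key))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"), Bytes.toBytes(rec.value))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }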

Environment: Scala 2.12.0, Nifi 1.9.2, HBase 2.2.0, Hadoop 2.9.2, AWS, Spark 2.4.3, Hive 2.3.5, Impala 3.2.0, Phoenix 5.0.0, Kafka 2.3.0, Zookeeper 3.5.5

Confidential, Chicago, IL

Hadoop Developer

Responsibilities:

  • Ingested initial high-volume (billions of records) data using Nifi from relational databases and local file systems into HDFS.
  • Leveraged NiFi's REST API to automate the creation and monitoring of new ingestion pipelines.
  • Applied structure to large amounts of unstructured data and created and loaded tables in Hive.
  • Manipulated tables in Hive, performed and optimized ad-hoc queries with HiveQL to achieve comprehensive data analysis.
  • Utilized Amazon Elasticsearch Service to analyze and visualize large amounts of data.
  • Used Spark as a drop-in replacement for Hadoop MapReduce jobs, answering queries in a much shorter amount of time.
  • Leveraged Spark (with Scala) and Spark SQL to speed up analysis across internal and external data sources and generated batch processing reports (see the sketch below).
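
A minimal sketch of that Hive-on-Spark reporting pattern, assuming a Hive-enabled SparkSession; the orders_raw table, its columns, the HDFS location, and the output table name are illustrative placeholders rather than the actual schema:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object DailyOrderReport {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL see the tables defined over the ingested HDFS data.
        val spark = SparkSession.builder
          .appName("daily-order-report")
          .enableHiveSupport()
          .getOrCreate()

        // Project structure onto the raw files as an external Hive table.
        spark.sql(
          """CREATE EXTERNAL TABLE IF NOT EXISTS orders_raw (
            |  order_id STRING, customer_id STRING, amount DOUBLE, order_ts TIMESTAMP)
            |STORED AS PARQUET
            |LOCATION '/data/orders'""".stripMargin)

        // Aggregate with Spark SQL instead of a MapReduce job and persist the report.
        val report = spark.sql(
          """SELECT to_date(order_ts) AS order_date,
            |       COUNT(*)          AS order_cnt,
            |       SUM(amount)       AS revenue
            |FROM orders_raw
            |GROUP BY to_date(order_ts)""".stripMargin)

        report.write.mode(SaveMode.Overwrite).saveAsTable("daily_order_report")
        spark.stop()
      }
    }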

Environment: Hadoop 2.9.2, Hive 2.3.5, Spark 2.4.3, Scala 2.12.0

Confidential, Chicago, IL

Spark Developer

Responsibilities:

  • Built real-time data pipelines using Kafka (with Zookeeper), Spark Streaming (with Scala) and HBase.
  • Developed a Kafka messaging system that collected events generated by the data network (e.g., weblogs) and brokered that data to real-time analytics clusters and data persistence clusters (HDFS).
  • Developed and optimized a high-throughput, low-latency multi-threaded Kafka producer (see the producer sketch after this list).
  • Provided a comprehensive comparison of relational databases, NoSQL databases, object stores, and distributed file systems against OLAP and OLTP business requirements.
  • Cooperated with the data science team, using Spark Streaming and MLlib to run predictive models on streaming data for real-time analytics (see the scoring sketch after this list).
  • Optimized structured stream processing using Structured Streaming with the Dataset API.
  • Loaded stream processing outputs to HBase for scalable storage and fast query.
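
A minimal sketch of such a tuned, multi-threaded producer; the broker list, the "weblogs" topic, and the specific batching/compression settings are illustrative assumptions, not the original configuration:

    import java.util.Properties
    import java.util.concurrent.{Executors, TimeUnit}
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object WeblogProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092")   // placeholder brokers
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
        // Throughput/latency knobs: batch more per request, compress, keep acks cheap.
        props.put(ProducerConfig.ACKS_CONFIG, "1")
        props.put(ProducerConfig.LINGER_MS_CONFIG, "20")
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, (64 * 1024).toString)
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4")

        // A single KafkaProducer instance is thread-safe, so all sender threads share it.
        val producer = new KafkaProducer[String, String](props)
        val pool     = Executors.newFixedThreadPool(4)

        (1 to 1000).foreach { i =>
          pool.submit(new Runnable {
            def run(): Unit = producer.send(
              new ProducerRecord[String, String]("weblogs", s"session-${i % 16}", s"event-$i"))
          })
        }

        pool.shutdown()
        pool.awaitTermination(1, TimeUnit.MINUTES)
        producer.flush()
        producer.close()
      }
    }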
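
A condensed sketch of scoring a stream with a fitted MLlib pipeline through the Dataset API, assuming a classification model trained and saved offline by the data science team (so that prediction and probability columns exist after transform); the event schema, model path, topic, and console sink are placeholders:

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types._

    // Shape of the incoming JSON events (fields are placeholders).
    case class Txn(user_id: String, amount: Double, event_ts: java.sql.Timestamp)

    object StreamingRiskScorer {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("streaming-risk-scorer").getOrCreate()
        import spark.implicits._

        val schema = new StructType()
          .add("user_id", StringType)
          .add("amount", DoubleType)
          .add("event_ts", TimestampType)

        // Typed Dataset of parsed events via the Dataset API.
        val txns = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   // placeholder brokers
          .option("subscribe", "transactions")                  // placeholder topic
          .load()
          .select(from_json($"value".cast("string"), schema).as("e"))
          .select("e.*")
          .as[Txn]

        // Pipeline model trained offline and saved to HDFS (path is a placeholder);
        // its transform applies the feature stages and classifier to each micro-batch.
        val model  = PipelineModel.load("hdfs:///models/risk-detector")
        val scored = model.transform(txns.toDF())

        scored.select("user_id", "prediction", "probability")
          .writeStream
          .format("console")        // swap for the HBase writer in the real pipeline
          .outputMode("append")
          .start()
          .awaitTermination()
      }
    }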

Environment: Scala 2.12, HBase 2.2.0, Hadoop 2.9.2, Spark 2.4.3, Phoenix 5.0.0, Kafka 2.3.0, Zookeeper 3.5.5

Confidential

Data Analyst

Responsibilities:

  • Performed extensive SQL queries to achieve comprehensive data analysis.
  • Cleansed and preprocessed the data using pandas (with Python) for further modeling and analysis.
  • Cooperated with the data science team to apply machine learning algorithms (classification, clustering, regression) using scikit-learn.
  • Interpreted and visualized the analysis results using Tableau.

Environment: Python 3, pandas 0.24.0, scikit-learn 0.21.3, MySQL, Tableau
