Big Data Engineer Resume
Chicago, IL
SUMMARY:
- Big Data Developer with extensive experience across the finance, e-commerce, and insurance industries.
- In-depth knowledge of Hadoop, including HDFS, YARN, MapReduce, and Hive.
- Proficient in Scala programming with Spark, including Spark SQL, Spark Streaming, MLlib, and GraphX.
- Hands-on experience in stream processing with Spark Streaming, Storm, and Flink.
- Experience in NoSQL databases, including HBase (with Phoenix), Cassandra, and MongoDB.
- Experience in RDBMS including Oracle and MySQL.
- Hands-on experience with the Amazon S3 object store service.
- Proficient in ad-hoc queries with Hive, Impala (with Kudu), and Phoenix (with HBase).
- In-depth knowledge of the data ingestion tools NiFi, Sqoop, and Flume.
- Hands-on experience in building real-time data pipelines using Kafka (with Zookeeper), Spark Streaming, and HBase.
- Experience with real-time Change Data Capture (CDC) into Kafka using Spring XD.
- Experience in AWS including EMR, Elasticsearch, S3, RDS, DynamoDB, Kinesis, Redshift, Lambda.
- In-depth knowledge of machine learning with Python.
- In-depth knowledge of algorithms and data structures.
- Excellent understanding of object-oriented programming with Java and functional programming with Scala.
- Excellent programming, analytical, communication, and interpersonal skills; a fast learner and a good team player.
TECHNICAL SKILLS
Hadoop/Spark Ecosystem: Hadoop 2.x, Spark 2.x, Hive 2.x, HBase 2.2.0, NiFi 1.9.2, Sqoop 1.4.6, Flume 1.9.0, Kafka 2.3.0, Yarn 1.17.3, Mesos 1.8.0, Zookeeper 3.4.x
Database: Oracle, MySQL, HBase 2.2.0, Cassandra 3.11
AWS: S3, RDS, DynamoDB, Kinesis, EMR, Elasticsearch, Redshift, Lambda
Programming Language: Java 8, Python 3, Scala 2.x
Software/Framework: Git, JIRA, Maven, sbt, JUnit, Jenkins, Spring, IntelliJ IDEA, Eclipse
Linux: Shell, Bash, Vim, nano, APT, Wget, pip
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Ingested initial high-volume data (billions of records) from ERP systems into MySQL, NoSQL databases, Amazon S3, and HDFS.
- Deployed AWS Data Pipeline and built AWS Lambda functions to trigger execution in response to Amazon S3 event notifications.
- Generated batch processing reports using MapReduce and Spark and loaded the outputs into databases.
- Optimized structured data processing using Spark SQL and Structured Streaming.
- Performed and optimized ad-hoc queries with Hive, Impala (with Kudu), and Phoenix (with HBase) for comprehensive data analysis.
- Built real-time data pipelines using Kafka (with Zookeeper), Spark Streaming, and HBase (sketched below).
- Brokered real-time streaming data to data persistence clusters (mainly HDFS) for further batch processing and ad-hoc queries.
- Cooperated with data science teams running risk detection algorithms on streaming data.
- Loaded stream processing outputs to HBase for scalable storage and fast query.
- Used Git for version control and Maven for project management.
Environment: Scala 2.12.0, NiFi 1.9.2, HBase 2.2.0, Hadoop 2.9.2, AWS, Spark 2.4.3, Hive 2.3.5, Impala 3.2.0, Phoenix 5.0.0, Kafka 2.3.0, Zookeeper 3.5.5
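A minimal sketch of the kind of Kafka -> Spark Streaming -> HBase pipeline described above, assuming the spark-streaming-kafka-0-10 integration and the HBase Java client; the broker address, topic, table, and column-family names are illustrative placeholders rather than production values:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object EventPipeline {
  def main(args: Array[String]): Unit = {
    // 5-second micro-batches over a direct Kafka stream.
    val ssc = new StreamingContext(new SparkConf().setAppName("event-pipeline"), Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "kafka-broker:9092", // placeholder broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-pipeline",
      "auto.offset.reset"  -> "latest"
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Write each micro-batch to HBase, opening one connection per partition.
    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn  = ConnectionFactory.createConnection()
        val table = conn.getTable(TableName.valueOf("events"))
        records.foreach { rec =>
          // Fall back to partition:offset when a record has no key.
          val rowKey = Option(rec.key).getOrElse(s"${rec.partition}:${rec.offset}")
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Opening the HBase connection inside foreachPartition keeps connection handling on the executors rather than serializing it from the driver.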
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Ingested initial high-volume data (billions of records) using NiFi from relational databases and local file systems into HDFS.
- Leveraged NiFi's REST API to automate the creation and monitoring of new ingestion pipelines.
- Applied structure to large amounts of unstructured data and created and loaded tables in Hive.
- Manipulated Hive tables and performed and optimized ad-hoc HiveQL queries for comprehensive data analysis.
- Utilized Amazon Elasticsearch Service to analyze and visualize large amounts of data.
- Used Spark as a drop-in replacement for Hadoop MapReduce jobs to answer queries in a much shorter time.
- Leveraged Spark (with Scala) and Spark SQL to speed up analysis across internal and external data sources and generate batch processing reports (sketched below).
Environment: Hadoop 2.9.2, Hive 2.3.5, Spark 2.4.3, Scala 2.12.0
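A minimal sketch of using Spark SQL with Hive support in place of a MapReduce report job, as described above; the database, table, and column names are illustrative placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveBatchReport {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark SQL read the same metastore tables previously queried with HiveQL.
    val spark = SparkSession.builder()
      .appName("hive-batch-report")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder database/table name.
    val orders = spark.table("sales.orders")

    val report = orders
      .filter("order_date >= '2019-01-01'")
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))
      .orderBy("region")

    // Persist the batch report back to Hive for downstream ad-hoc queries.
    report.write.mode("overwrite").saveAsTable("reports.sales_by_region")

    spark.stop()
  }
}
```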
Confidential, Chicago, IL
Spark Developer
Responsibilities:
- Built real-time data pipelines using Kafka (with Zookeeper), Spark Streaming (with Scala) and HBase.
- Developed a Kafka messaging system that collected events generated by the data network (e.g., weblogs) and brokered the data to real-time analytics clusters and data persistence clusters (HDFS).
- Developed and optimized a high-throughput, low-latency multi-threaded Kafka producer (sketched below).
- Provided a comprehensive comparison of relational databases, NoSQL databases, object stores, and distributed file systems against OLAP and OLTP business requirements.
- Cooperated with the data science team, using Spark Streaming and MLlib to run predictive models on streaming data for real-time analytics.
- Optimized structured stream processing using Structured Streaming with the Dataset API.
- Loaded stream processing outputs to HBase for scalable storage and fast query.
Environment: Scala 2.12, HBase 2.2.0, Hadoop 2.9.2, Spark 2.4.3, Phoenix 5.0.0, Kafka 2.3.0, Zookeeper 3.5.5
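A minimal sketch of a multi-threaded Kafka producer along the lines described above, assuming the standard Kafka Java client used from Scala; the broker address, topic name, and synthetic messages are placeholders:

```scala
import java.util.Properties
import java.util.concurrent.{Executors, TimeUnit}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object WeblogProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker:9092") // placeholder broker
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    // Throughput-oriented settings: a small linger to batch records, larger batches, compression.
    props.put(ProducerConfig.LINGER_MS_CONFIG, "5")
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, (64 * 1024).toString)
    props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4")
    props.put(ProducerConfig.ACKS_CONFIG, "1")

    // A single KafkaProducer instance is thread-safe and shared across the sender threads.
    val producer = new KafkaProducer[String, String](props)
    val pool = Executors.newFixedThreadPool(4)

    (0 until 4).foreach { worker =>
      pool.submit(new Runnable {
        override def run(): Unit = {
          // In a real pipeline each worker would read weblog events from its source;
          // synthetic messages stand in for them here.
          (0 until 100000).foreach { i =>
            producer.send(new ProducerRecord[String, String]("weblogs", s"$worker-$i", s"event-$i"))
          }
        }
      })
    }

    pool.shutdown()
    pool.awaitTermination(10, TimeUnit.MINUTES)
    producer.close()
  }
}
```

Sharing one producer across threads and letting linger.ms and batch.size drive batching is the usual way to trade a few milliseconds of latency for higher throughput.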
Confidential
Data Analyst
Responsibilities:
- Performed extensive SQL queries to achieve comprehensive data analysis.
- Cleansed and preprocessed the data using pandas (Python) for further modeling and analysis.
- Cooperated with the data science team and applied machine learning algorithms (classification, clustering, regression) using scikit-learn.
- Interpreted and visualized the analysis results using Tableau.
Environment: Python 3, pandas 0.24.0, scikit-learn 0.21.3, MySQL, Tableau