
Big Data Engineer Resume


New York, NY

SUMMARY

  • Big Data professional with over six years of experience in Data Science and Data Analytics using Big Data technologies, along with Enterprise Application Development.
  • Gained considerable experience working alongside seasoned professionals in the IT industry.
  • Hardworking team player and progressive thinker seeking a challenging environment in which to apply theoretical knowledge and hands-on expertise.
  • 5+ years of professional experience in the finance, e-commerce, and technology domains, developing, implementing, and configuring Java technologies for desktop, web, and cloud applications.
  • 3+ years of experience with Apache Hadoop and Apache Spark.
  • Extensive knowledge of concepts in Big Data, Scalable Distributed and Parallel Computing, and Data Science.
  • Excellent knowledge of the Hadoop architecture and components such as the HDFS, MapReduce, YARN, and Hadoop Ecosystem.
  • Experience working with Spark APIs and resilient distributed datasets (RDDs) for batch processing of large data sets (a brief sketch appears after this list).
  • Hands-on experience with Hadoop ecosystem components such as Hive, Pig, HBase, Oozie, Zookeeper, Sqoop, Mahout, and Flume.
  • Experience in writing custom MapReduce computing jobs in Java.
  • Experience with data pipeline and data logging using Kafka, Flume and Storm.
  • Hands-on experience importing and exporting data with Sqoop between HDFS and Relational Database Management Systems (RDBMS) such as Oracle.
  • Extensive experience installing, configuring, and managing multi-node Hadoop clusters on Linux, using distributions such as Cloudera (CDH 3/4/5) and cloud platforms such as Amazon Web Services.
  • Proficient in analyzing data using HiveQL, Pig Latin scripts and custom UDFs.
  • Working experience with NoSQL databases such as HBase and Cassandra.
  • Well-versed in applying various open-source APIs, tools, and virtualization for Big Data related tasks.
  • Knowledge of data storage repositories such as data lakes and data warehouses as well as data cleansing and wrangling using Talend.
  • Familiar with Python data analytics tools such as NumPy and Pandas.
  • Extensive knowledge of Object-Oriented Programming (OOP) and machine learning algorithms for recognizing patterns in data.
  • Well-versed in core Java programming, servlets, multi-threading, and concurrency principles, with growing knowledge of Scala, R, and emerging technologies such as Apache Flink, Docker, and Titan.
  • Work experience in a Test-Driven Development environment and knowledge of software development methodologies such as Agile, Scrum, RUP, and Waterfall.
  • Able to work independently as well as part of a team on group projects with both onshore and offshore teams.
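
A minimal sketch of the Spark batch processing referenced above, using the Java RDD API; the input path, CSV layout, and output location are assumptions made for illustration:

    // Sketch of a Spark batch job over an RDD (Java API); paths and fields are illustrative.
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class BatchValidationJob {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("BatchValidationJob");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Assumed record layout: id,amount,source
            JavaRDD<String> lines = sc.textFile("hdfs:///data/incoming/records.csv");

            // Keep only well-formed records, then total the amount per source.
            JavaRDD<String[]> valid = lines.map(l -> l.split(","))
                                           .filter(f -> f.length == 3 && !f[1].isEmpty());

            valid.mapToPair(f -> new Tuple2<>(f[2], Double.parseDouble(f[1])))
                 .reduceByKey(Double::sum)
                 .saveAsTextFile("hdfs:///data/validated/totals");

            sc.close();
        }
    }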

TECHNICAL SKILLS

Big Data: Hadoop, Spark, MapReduce, YARN, Zookeeper, Hive, Pig, Solr, Sqoop, Flume, Oozie, Storm, Kafka, Mahout, Cloudera, Talend

Languages: Java, Scala, Python, C#, C++, HTML, Shell

RDBMS: Oracle 10g/11g/12c, MySQL

NoSQL: Cassandra, HBase

Cloud: Amazon Web Services, Oracle E-Business Suite, Oracle Fusion Financials

Tools: Eclipse IDE, NetBeans, BlueJ, Visual Studio, MS Office 365, VMware, VirtualBox, Talend, Titan

Operating Systems: Windows XP/7/8/10, Mac OS X, Linux

PROFESSIONAL EXPERIENCE

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Developed MapReduce and Spark jobs using Java for batch processing and validating income data from multiple file formats and sources.
  • Built data pipeline using MapReduce, Flume, Sqoop, Pig, and HDFS for financial analysis.
  • Implemented Spark SQL and Spark Streaming for faster processing of real-time trading data.
  • Pipelined and analyzed real-time streaming data logs using Spark with Kafka (a brief sketch appears after this list).
  • Imported data from different databases into the HDFS using Sqoop and performed transformations using Hive.
  • Collected and aggregated large amounts of log data using Flume.
  • Wrote HiveQL queries and executed Pig scripts to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and computed metrics for financial reporting.
  • Developed product profiles using Pig and product-specific UDFs.
  • Built scalable distributed data solutions and wrote CQL queries in Cassandra.
  • Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
  • Used Impala to read, write, and query Hadoop data in HDFS alongside data stored in Cassandra.
  • Configured Kafka to read and write messages from external programs and to handle real-time data.
  • Involved in writing Storm topology to accept data from Kafka and process the data.
  • Used Solr to index search data and perform real-time updates.
  • Participated in data cleansing using Talend.
  • Monitored Hadoop cluster using Cloudera Manager in CDH 5.
  • Participated in displaying financial statistical analysis in distributed graphs using Titan.
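
A minimal sketch of the Spark-with-Kafka streaming pipeline referenced above, using the Java API for the Kafka 0.10 direct stream; the broker address, topic name, consumer group, and record layout are assumptions made for illustration:

    // Sketch of a Spark Streaming job that consumes trade events from Kafka and
    // counts them per symbol in each micro-batch; names below are illustrative.
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;
    import scala.Tuple2;

    public class TradeStreamJob {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("TradeStreamJob");
            JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");         // assumed broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "trade-stream");                   // assumed group id
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    ssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(
                        Collections.singletonList("trades"), kafkaParams));

            // Each record value is assumed to be "symbol,price,quantity"; count trades per symbol.
            stream.mapToPair(r -> new Tuple2<>(r.value().split(",")[0], 1L))
                  .reduceByKey(Long::sum)
                  .print();

            ssc.start();
            ssc.awaitTermination();
        }
    }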

Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Kafka, Storm, Flume, Solr, Impala, Oracle 11g, Cloudera Manager, CDH 5, Cassandra, Linux, Java SE 8, Scala, Titan.

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Wrote MapReduce jobs to filter and parse inventory data stored in HDFS (a brief sketch appears after this list).
  • Configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster for data pipelining.
  • Imported and exported data into the HDFS from the Oracle database using Sqoop.
  • Integrated MapReduce with Cassandra to import bulk amounts of logged data.
  • Converted ETL operations to the Hadoop system using Hive transformations and functions.
  • Conducted streaming jobs with basic Python to process terabytes of formatted data for machine learning purposes.
  • Used Flume to collect, aggregate and store the web log data and loaded it into the HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed custom Pig UDFs for product-specific needs.
  • Implemented and configured workflows using Oozie to automate jobs.
  • Performed Hadoop cluster management and configuration of multiple nodes on AWS.
  • Created storage buckets in AWS to hold the data and maintained the data repository for future needs and reusability.
  • Involved in cluster coordination services through ZooKeeper.
  • Participated in managing and reviewing the Hadoop log files.
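
A minimal sketch of the filtering and parsing MapReduce work referenced above, written against the standard Hadoop Java API; the "sku,warehouse,quantity" record layout is an assumption made for illustration:

    // Mapper that drops malformed inventory records and emits (sku, quantity)
    // pairs for a summing reducer (e.g. Hadoop's built-in IntSumReducer).
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class InventoryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length == 3) {                      // keep only well-formed lines
                try {
                    int qty = Integer.parseInt(fields[2].trim());
                    context.write(new Text(fields[0].trim()), new IntWritable(qty));
                } catch (NumberFormatException ignored) {
                    // Skip records whose quantity is not numeric.
                }
            }
        }
    }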

Environment: Hadoop, MapReduce, Hive, Flume, Sqoop, Zookeeper, Pig, Oozie, Python, Java SE 8, Oracle 11g, HBase, AWS, Linux

Confidential, Jacksonville, FL

Big Data Engineer

Responsibilities:

  • Analyzed and prepared functional specifications for the business and system requirements.
  • Developed custom MapReduce use cases in Java to log customer behavior data and loaded it into HDFS.
  • Fixed bugs and improved Java source code to support clusters.
  • Used Sqoop to transfer data between the Oracle database and HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Migrated the dataset into Hive for ETL purposes and optimized Pig UDFs.
  • Wrote column-mapping scripts to generate ETL queries in Hive.
  • Developed Hive schemas to help business users extract data files (a brief sketch appears after this list).
  • Handled importing data from various sources and performed transformations using Pig and Hive.
  • Used Impala to query data stored in the HDFS.
  • Participated in Mahout implementation for machine learning analysis.
  • Performed data analysis on large datasets and presented results to risk, finance, accounting, pricing, sales, marketing, and compliance teams.
  • Imported data into Excel and created pivot tables and statistical models.
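
A minimal sketch of defining a partitioned Hive schema and running an extract-style query over JDBC, as referenced above; the server address, table, columns, and sample partition value are assumptions made for illustration:

    // Creates a date-partitioned Hive table and runs a small aggregate query
    // through the Hive JDBC driver; connection details are illustrative.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveSchemaSetup {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-server:10000/default", "", "");
                 Statement stmt = conn.createStatement()) {

                // Partitioning by day keeps business-user extracts cheap to scan.
                stmt.execute("CREATE TABLE IF NOT EXISTS customer_events ("
                           + "customer_id STRING, event_type STRING, amount DOUBLE) "
                           + "PARTITIONED BY (event_date STRING) "
                           + "STORED AS ORC");

                // Example extract over a single partition.
                ResultSet rs = stmt.executeQuery(
                    "SELECT event_type, COUNT(*) FROM customer_events "
                  + "WHERE event_date = '2016-01-15' GROUP BY event_type");
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }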

Environment: Hadoop, MapReduce, AWS, Hive, CDH, Sqoop, Pig, Oracle 11g, Java SE 7, Python, Zookeeper, Impala, Linux

Confidential, Des Plaines, IL

Java Developer

Responsibilities:

  • Involved in the analysis, design, and development of the application based on J2EE using Spring and Hibernate.
  • Involved in developing the user interface using Struts.
  • Built scalable web services under the RESTful model.
  • Developed the user interface screens using JavaScript and HTML and performed client-side validations.
  • Developed unit-testing classes using JUnit.
  • Implemented Spring MVC to handle user requests and used various controllers to delegate flow (a brief sketch appears after this list).
  • Used JDBC to connect to the database and wrote SQL queries and stored procedures to fetch, insert, and update records in database tables.
  • Worked with Servlets to handle and process electronic prescriptions, history, and analysis.
  • Conducted data analysis with basic Python and wrangled data for data repositories.
  • Applied machine learning principles to study market behavior for a trading platform.
  • Worked with JavaScript and CSS to improve application performance.
  • Used Log4J logging framework for logging.
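
A minimal sketch of the Spring MVC request handling referenced above; the URL path, view name, and the prescription-detail use case are assumptions made for illustration:

    // Controller that handles GET /prescriptions/{id} and delegates rendering
    // to a view resolved by the configured ViewResolver; names are illustrative.
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    @RequestMapping("/prescriptions")
    public class PrescriptionController {

        @RequestMapping(value = "/{id}", method = RequestMethod.GET)
        public String viewPrescription(@PathVariable("id") long id, Model model) {
            model.addAttribute("prescriptionId", id);  // data the JSP view will display
            return "prescriptionDetail";               // logical view name
        }
    }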

Environment: Java SE 7, JDBC, Spring, Hibernate, Struts, Servlets, HTML, JavaScript, Apache Tomcat, jQuery, JUnit, XML, SQL

Confidential, Newark, DE

Java Developer

Responsibilities:

  • Participated in Agile/Scrum meetings with cross-functional and offshore teams to define client requirements and develop reports.
  • Designed UML use-case, activity, and class diagrams for technical documentation and requirements.
  • Developed and tested the web application using core Java.
  • Involved in the front-end design using JavaScript, CSS, HTML, and Servlets, and used Hibernate to connect to the database.
  • Used Spring as the middle-tier framework.
  • Utilized Java multi-threading and synchronization, and built APIs for concurrent models and processes (a brief sketch appears after this list).
  • Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
  • Wrote Stored Procedures, Queries and Functions in SQL.
  • Used Log4J logging framework for logging messages.
  • Performed testing using JUnit.
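
A minimal sketch of the Java multi-threading pattern referenced above: a fixed thread pool running independent tasks concurrently and collecting their results; the task payload is an assumption made for illustration:

    // Fixed thread pool executing Callable tasks in parallel; Future.get()
    // blocks until each task finishes, so results come back in submit order.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ConcurrentProcessor {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            List<Future<Integer>> results = new ArrayList<>();

            for (int i = 1; i <= 10; i++) {
                final int taskId = i;
                Callable<Integer> task = () -> taskId * taskId; // stands in for real processing
                results.add(pool.submit(task));
            }

            for (Future<Integer> f : results) {
                System.out.println("result = " + f.get());
            }
            pool.shutdown();
        }
    }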

Environment: Java EE 6/7, SQL, JavaScript, Servlets, JDBC, HTML, CSS, Apache Struts, Hibernate, Spring, XML, Eclipse, Oracle 10g
