Big Data Engineer Resume
New York, NY
SUMMARY
- Big Data professional with over six years of experience in Data Science and Data Analytics using Big Data technologies, along with Enterprise Application Development.
- Considerable experience gained working alongside seasoned professionals in the IT industry.
- Hardworking team player and progressive thinker seeking a challenging environment in which to apply theoretical knowledge and hands-on expertise.
- 5+ years of professional experience in the finance, e-commerce, and technology domains, developing, implementing, and configuring Java technologies for desktop, web, and cloud applications.
- 3+ years of experience with Apache Hadoop and Apache Spark.
- Extensive knowledge of concepts in Big Data, Scalable Distributed and Parallel Computing, and Data Science.
- Excellent knowledge of the Hadoop architecture and components such as the HDFS, MapReduce, YARN, and Hadoop Ecosystem.
- Experience working with Spark APIs and resilient distributed datasets (RDDs) for batch processing of data streams.
- Hands-on experience with Hadoop ecosystem components such as Hive, Pig, HBase, Oozie, Zookeeper, Sqoop, Mahout, and Flume.
- Experience in writing custom MapReduce computing jobs in Java.
- Experience with data pipeline and data logging using Kafka, Flume and Storm.
- Hands-on experience importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS) such as Oracle.
- Extensive experience installing, configuring, and managing multi-node Hadoop clusters on Linux, using distributions such as Cloudera (CDH 3/4/5) and cloud platforms such as Amazon Web Services.
- Proficient in analyzing data using HiveQL, Pig Latin scripts and custom UDFs.
- Working experience with NoSQL databases such as HBase and Cassandra.
- Well-versed in applying various open-source APIs, tools, and virtualization technologies to Big Data tasks.
- Knowledge of data storage repositories such as data lakes and data warehouses as well as data cleansing and wrangling using Talend.
- Familiar with Python data analytics tools such as NumPy and Pandas.
- Extensive knowledge of Object-Oriented Programming (OOP) and machine learning algorithms for detecting patterns in data.
- Well-versed in core Java programming, servlets, multi-threading, and concurrency principles, with growing knowledge of Scala, R, and emerging technologies such as Apache Flink, Docker, and Titan.
- Work experience in Test-Driven Development (TDD) environments and knowledge of software development methodologies such as Agile, Scrum, RUP, and Waterfall.
- Able to work independently as well as part of a team on group projects with both onshore and offshore teams.
TECHNICAL SKILLS
Big Data: Hadoop, Spark, MapReduce, YARN, Zookeeper, Hive, Pig, Solr, Sqoop, Flume, Oozie, Storm, Kafka, Mahout, Cloudera, Talend
Languages: Java, Scala, Python, C#, C++, HTML, Shell
RDBMS: Oracle 10g/11g/12c, MySQL
NoSQL: Cassandra, HBase
Cloud: Amazon Web Services, Oracle E-Business Suite, Oracle Fusion Financials
Tools: Eclipse IDE, NetBeans, BlueJ, Visual Studio, MS Office 365, VMware, VirtualBox, Talend, Titan
Operating Systems: Windows XP/7/8/10, Mac OS X, Linux
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Developed MapReduce and Spark jobs in Java for batch processing and validating incoming data from multiple file formats and sources.
- Built data pipeline using MapReduce, Flume, Sqoop, Pig, and HDFS for financial analysis.
- Implemented Spark SQL and Spark Streaming for faster processing of real-time trading data.
- Pipelined and analyzed real-time streaming data logs using Spark with Kafka (a streaming sketch follows this section).
- Imported data from different databases into the HDFS using Sqoop and performed transformations using Hive.
- Collected and aggregated large amounts of log data using Flume.
- Wrote HiveQL queries and executed Pig scripts to study customer behavior.
- Used Hive to analyze the partitioned and bucketed data and computed metrics for financial reporting.
- Developed product profiles using Pig and product specific UDFs.
- Built scalable distributed data solutions and wrote CQL queries in Cassandra.
- Scheduled and executed workflows in Oozie to run Hive and Pig jobs.
- Used Impala to read, write, and query Hadoop data in HDFS that had been loaded from Cassandra.
- Configured Kafka to read and write messages from external programs and handle real time data.
- Involved in writing a Storm topology to consume data from Kafka and process it.
- Used Solr to index search data and perform real-time updates.
- Participated in data cleansing using Talend.
- Monitored Hadoop cluster using Cloudera Manager in CDH 5.
- Participated in presenting financial statistical analysis as distributed graphs using the Titan graph database.
Environment: Hadoop, MapReduce, Spark, Pig, Hive, Sqoop, Oozie, HBase, Kafka, Storm, Flume, Solr, Impala, Oracle 11g, Cloudera Manager, CDH 5, Cassandra, Linux, Java SE 8, Scala, Titan.
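Code sample (illustrative only): a minimal sketch of a Spark Streaming job consuming log messages from Kafka, along the lines of the pipeline described above. The broker address, topic name, batch interval, and per-batch count are placeholder assumptions rather than details from an actual project.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import kafka.serializer.StringDecoder;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class TradeLogStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("TradeLogStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092"); // placeholder broker
        Set<String> topics = Collections.singleton("trade-logs"); // placeholder topic

        // Direct stream from Kafka; each record arrives as a (key, value) pair of strings
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        // Count log lines per micro-batch as a stand-in for the real analysis
        stream.map(record -> record._2())
              .count()
              .print();

        jssc.start();
        jssc.awaitTermination();
    }
}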
Confidential, New York, NY
Big Data Engineer
Responsibilities:
- Wrote MapReduce jobs to filter and parse inventory data stored in HDFS (a mapper sketch follows this section).
- Configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster for data pipelining.
- Imported and exported data into the HDFS from the Oracle database using Sqoop.
- Integrated MapReduce with Cassandra to import bulk amounts of log data.
- Converted ETL operations to the Hadoop system using Hive transformations and functions.
- Conducted streaming jobs with basic Python to process terabytes of formatted data for machine learning purposes.
- Used Flume to collect, aggregate and store the web log data and loaded it into the HDFS.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
- Developed custom Pig UDFs for product-specific needs.
- Implemented and configured workflows using Oozie to automate jobs.
- Performed Hadoop cluster management and configuration of multiple nodes on AWS.
- Created buckets in AWS to store the data and maintained the data repository for future needs and reusability.
- Involved in cluster coordination services through Zookeeper.
- Participated in managing and reviewing Hadoop log files.
Environment: Hadoop, MapReduce, Hive, Flume, Sqoop, Zookeeper, Pig, Oozie, Python, Java SE 8, Oracle 11g, HBase, AWS, Linux
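Code sample (illustrative only): a minimal sketch of the kind of map-only MapReduce filter described above for inventory data. The comma delimiter and the position of the quantity field are assumptions made for illustration.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only filter: keep inventory records whose quantity field parses and is positive.
public class InventoryFilterMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        if (fields.length < 3) {
            return; // skip malformed lines
        }
        try {
            int quantity = Integer.parseInt(fields[2].trim()); // assumed quantity column
            if (quantity > 0) {
                context.write(NullWritable.get(), value);
            }
        } catch (NumberFormatException e) {
            // skip records with a non-numeric quantity
        }
    }
}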
Confidential, Jacksonville, FL
Big Data Engineer
Responsibilities:
- Analyzed and prepared functional specifications for the business and system requirements.
- Developed custom MapReduce use cases in Java to log customer behavior data and loaded it into HDFS.
- Fixed bugs and improved Java source code to support clusters.
- Used Sqoop to transfer data between the Oracle database and HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Migrated the dataset into Hive for ETL purposes and optimized Pig UDFs (a UDF sketch follows this section).
- Wrote column-mapping scripts to generate ETL queries in Hive.
- Developed Hive schemas to help business users extract data files.
- Handled importing data from various sources and performed transformations using Pig and Hive.
- Used Impala to query data stored in the HDFS.
- Participated in Mahout implementation for machine learning analysis.
- Performed data analysis on large datasets and presented results to the risk, finance, accounting, pricing, sales, marketing, and compliance teams.
- Imported data into Excel and created pivot tables and statistical models.
Environment: Hadoop, MapReduce, AWS, Hive, CDH, Sqoop, Pig, Oracle 11g, Java SE 7, Python, Zookeeper, Impala, Linux
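Code sample (illustrative only): a minimal sketch of a custom Pig EvalFunc UDF of the kind referenced above; the function name and normalization logic are placeholders. A UDF like this is registered in a Pig script with REGISTER and then called like a built-in function.

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Pig UDF that normalizes a product code to a trimmed, upper-case form.
public class NormalizeProductCode extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null; // pass nulls through unchanged
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}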
Confidential, Des Plaines, IL
Java Developer
Responsibilities:
- Involved in the analysis, design, and development of the application based on J2EE using Spring and Hibernate.
- Involved in developing the user interface using Struts.
- Built scalable web services following the RESTful model.
- Developed user interface screens using JavaScript and HTML and performed client-side validation.
- Developed unit-testing classes using JUnit.
- Implemented Spring MVC to handle the user requests and used various controllers to delegate flow.
- Used JDBC to connect to the database and wrote SQL queries and stored procedures to fetch, insert, and update data in database tables (a DAO sketch follows this section).
- Worked with Servlets to handle and process electronic prescriptions, history, and analysis.
- Conducted data analysis with basic Python and wrangled data for data repositories.
- Applied machine learning principles to study market behavior for a trading platform.
- Worked with JavaScript and CSS to improve application performance.
- Used Log4J logging framework for logging.
Environment: Java SE 7, JDBC, Spring, Hibernate, Struts, Servlets, HTML, JavaScript, Apache Tomcat, jQuery, JUnit, XML, SQL
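Code sample (illustrative only): a minimal sketch of the plain-JDBC data access described above, using a PreparedStatement against an Oracle database. The connection URL, credentials, and table/column names are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Fetch prescription counts for a patient via plain JDBC.
public class PrescriptionDao {

    private static final String URL = "jdbc:oracle:thin:@//dbhost:1521/ORCL"; // placeholder

    public int countPrescriptions(String patientId) throws SQLException {
        String sql = "SELECT COUNT(*) FROM prescriptions WHERE patient_id = ?";
        try (Connection conn = DriverManager.getConnection(URL, "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, patientId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        }
    }
}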
Confidential, Newark, DE
Java Developer
Responsibilities:
- Participated in Agile/Scrum meetings with cross-functional and offshore teams to define client requirements and develop reports.
- Designed UML use-case, activity, and class diagrams for technical documentation and requirements.
- Developed and tested the web application using core Java.
- Involved in the front-end design using JavaScript, CSS, HTML, and Servlets, and used Hibernate to connect to the Oracle database.
- Used Spring as the middle-tier framework.
- Utilized Java multi-threading and synchronization and built APIs for concurrent models and processes (a thread-pool sketch follows this section).
- Implemented Hibernate in the data access object layer to access and update information in the Oracle Database.
- Wrote Stored Procedures, Queries and Functions in SQL.
- Used Log4J logging framework for logging messages.
- Performed testing using JUnit.
Environment: Java EE 6/7, SQL, JavaScript, Servlets, JDBC, HTML, CSS, Apache Struts, Hibernate, Spring, XML, Eclipse, Oracle 10g
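Code sample (illustrative only): a minimal sketch of the Java multi-threading and concurrency work mentioned above, using an ExecutorService thread pool; the pool size and task logic are placeholders.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Run independent pricing tasks concurrently on a fixed thread pool, then collect the results.
public class ConcurrentPricing {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Double>> results = new ArrayList<>();

        for (int i = 0; i < 8; i++) {
            final int taskId = i;
            results.add(pool.submit(new Callable<Double>() {
                @Override
                public Double call() {
                    // stand-in for a real pricing calculation
                    return taskId * 1.05;
                }
            }));
        }

        for (Future<Double> result : results) {
            System.out.println("price = " + result.get());
        }
        pool.shutdown();
    }
}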