Big Data Engineer Resume

New York, NY

SUMMARY:

  • 6+ years of professional IT experience, including 3+ years building scalable, distributed, high-performance computing solutions in on-premise and cloud environments using big data technologies such as Hadoop HDFS, MapReduce 2, Apache Pig, Hive, Sqoop, Cassandra 2.0, Kafka 0.10.1, and AWS.
  • Extensive hands-on implementation experience with several Big Data technology stacks in recent projects in the Retail, Integrated Eligibility, and Auditor domains.
  • Expertise in Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode, and in MapReduce concepts.
  • Proficient in Spark 2.1.0 architecture, the Spark API, Spark SQL, and Spark Streaming. Wrote Spark jobs in Scala to read and process large data sets in text and Parquet file formats by converting them into RDDs, DataFrames, and Datasets, with custom logic to curate and enrich the data per business needs.
  • Hands-on experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • In-depth experience with the Hortonworks 2.6 Big Data distribution.
  • Expertise in UNIX/Linux operating systems; developed shell scripts for business processes and for loading data from different interfaces into HDFS.
  • Hands-on experience transforming and loading data from source systems such as flat files, Excel, XML, Oracle, and SQL Server.
  • Hands-on experience writing MapReduce jobs on the Hadoop 2.7.3 ecosystem, including Hive and Pig.
  • Proficient in using Apache Kafka 0.10.1, a distributed publish-subscribe messaging system, to move large data sets between applications in real time. Familiar with the Kafka architecture.
  • Strong experience identifying performance bottlenecks and tuning the performance of MapReduce 2 programs, Spark 2.1.0 programs, and Hive scripts.
  • Experience creating Oozie 4.2.0 workflow and coordinator jobs for MapReduce, Spark, Hive, and Sqoop.
  • Experience in creating Reusable Transformations (Joiner, Sorter, Aggregator, Expression, Lookup, Router, Filter, Update Strategy, Normalizer and Rank) and Mappings using Informatica Designer and processing tasks using Workflow Manager to move data from multiple sources into targets.
  • Performed unit testing to verify that data loads into targets are accurate.
  • Good working experience writing SQL joins, nested queries, and unions.
  • Good understanding of Cassandra architecture and core concepts.
  • Hands-on experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
  • Hands-on experience in cloud computing; hosted Spark applications on AWS using EC2, S3, and RDS.
  • Proficient in object-oriented concepts, with complete software development life cycle (SDLC) experience: requirements gathering, detailed design, development, and system and user acceptance testing.
  • In-depth knowledge of Machine Learning and Deep Learning, with hands-on experience implementing Machine Learning algorithms, including Linear Regression, Logistic Regression, SVM, Decision Tree, Random Forest, and K-means, in Python using scikit-learn, Keras, and TensorFlow.
  • Deep knowledge of the J2EE framework and the internals of its architecture, including JSP, Servlets, JSF, JDBC, JUnit, and J2EE design patterns.

TECHNICAL SKILLS:

Hadoop Technologies: HDFS, MR2, YARN, Hive, Sqoop, Pig, Kafka, Flume

Methodologies: Agile methodology, UML, Design Patterns

Cloud-Computing Technologies: AWS EC2, AWS S3, AWS EMR

Programming Technologies: Core Java 1.6, Scala

Client-Side Technologies: JavaScript, HTML, XML

Databases: Oracle 10g/11g, MySQL, Cassandra

Scripting Languages: UNIX Korn shell, Bash, Python

Operating Systems: Windows XP, Windows 7, Linux

Big Data Ecosystem: MapReduce, Spark 2.1

PROFESSIONAL EXPERIENCE:

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Evaluated Hadoop projects across the ecosystem, then extended and deployed them on big data clusters with high availability and elastic load tolerance.
  • Bulk-imported data from various sources into Hadoop 2.7.3 and transformed it flexibly using Kafka 0.10.1 and Flume 1.6.0.
  • Developed MapReduce programs to extract and transform data sets; loaded the resulting data sets into Cassandra, and vice versa, using Kafka 0.10.1.
  • Automated the installation and monitoring of Hadoop ecosystem components in our open source infrastructure stack, specifically Cassandra 2.0, HDFS, MapReduce, YARN, Hive, and Kafka.
  • Created Hive schemas using performance techniques such as partitioning and bucketing.
  • Wrote extensive Hive queries to transform data for use by downstream models.
  • Explored Spark 2.1.0 to improve the performance of existing algorithms in Hadoop 2.7.3 using SparkContext, Spark SQL, and DataFrames.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.1.0 for data aggregation and queries, and wrote data back into RDBMS through Sqoop 1.4.6.
  • Provided developer and operations documentation to educate peer teams.
  • Built linear regression and logistic regression models on patient and claims data in RDBMS using the Spark MLlib library.
  • Configured Spark Streaming with Kafka to clean and aggregate real-time data.

Environment: Hadoop 2.7.3, Java, UNIX, HDFS, MR 2, Hive 2.1.0, Spark 2.1.0, Sqoop 1.4.6, Cassandra 2.0, Oozie 4.2.0

Confidential, New York, NY

Big Data Engineer

Responsibilities:

  • Architected, implemented, and tested data processing pipelines (Hadoop 2.7.3 and Spark 2.1.0) and data mining algorithms in a variety of hosted settings, such as AWS, Azure, and Confidential's own clusters.
  • Used Docker to ensure that APIs and processing pipelines can be easily deployed across a variety of hardware and software architectures.
  • Used Spark 2.1.0 over Hortonworks 2.6 and Hadoop 2.7.3 to perform analytics on data in Hive.
  • Built continuous integration and automated deployment using Bash scripts in a Linux environment.
  • Wrote Spark SQL scripts in Scala for testing and transforming data.
  • Performed unit testing for Spark and Spark Streaming with pytest and ScalaCheck.
  • Built a linear regression model with stochastic gradient descent (SGD) to predict product prices, and logistic regression models to predict the conversion rate of each market across the nation, in spark-shell using the DataFrame API and the Spark MLlib API.
  • Developed Oozie 4.2.0 workflow jobs to execute Hive 2.0.0, Sqoop 1.4.6, and MapReduce actions.
  • Supported the creation and maintenance of optimized data pipeline architectures for large, complex data sets using Kafka 0.10.1 and Spark Streaming.
  • Improved analytic workflow performance for machine learning, natural language processing, graph, and related algorithms for the data science team using the Spark MLlib library.
  • Assembled large, complex data sets that met Confidential business requirements by using Cassandra.

Environment: Hadoop 2.7.3, Spark 2.1.0, Sqoop 1.4.6, Hive 2.1.0, Kafka 0.10.1.1, Cassandra 2.0, Python, Java, Scala, AWS, Linux, Hortonworks 2.6, Oozie 4.2.0

Confidential

Big Data Engineer

Responsibilities:

  • Worked on the Hortonworks 2.1 Hadoop platform.
  • Used Sqoop to migrate historical data from Oracle and SQL Server to HDFS and Hive 0.13.
  • Created multiple Hive tables with partitioning and bucketing for more efficient data access.
  • Used HiveQL for data transformation, cleansing, and filtering.
  • Wrote MapReduce jobs and User Defined Functions (UDFs) in Hive for data aggregation.
  • Delivered real-time credit card transaction data from multiple sources into the Kafka messaging system.
  • Stored streaming data in HBase.
  • Performed unit testing using JUnit.
  • Used Git for version control, JIRA for issue tracking, and Jenkins for continuous integration.
  • Built and improved reliable, low-overhead authentication and authorization mechanisms to control access to resources and data.

Environment: Hadoop 2.4.0, HDP 2.1, HDFS, JUnit, Python, Hive 0.13, HBase, Kafka 0.8.1, Zookeeper 3.4.5, Oozie 4.0.0, Oracle, Git, JIRA

Confidential

Java Developer

Responsibilities:

  • Used Agile Methodology for developing the software and participated in Scrum meetings.
  • Managed the navigation and web application page flow through Spring Web Flow.
  • Responsible for implementing various modules of the application using Spring MVC architecture.
  • Used Hibernate Query Language (HQL) to write queries against the database.
  • Implemented various J2EE Design patterns like Singleton, Business Delegate, Data Access Object (DAO), and Factory pattern.
  • Implemented Spring ORM with Hibernate, creating Hibernate POJOs and mapping them to the MySQL database using Hibernate annotations.
  • Used Maven to manage application dependencies, wrote the Maven pom.xml, and deployed the application on the Tomcat application server.
  • Developed the application using Git for version control and the Eclipse IDE for development.

Environment: Spring MVC, Spring Core, Spring Web Flow, Spring ORM, Hibernate, MySQL, JUnit, RESTful web services, Maven, Git, Apache Tomcat, Eclipse IDE and Linux

Confidential

Java Developer

Responsibilities:

  • Developed the application under JEE architecture; designed and developed dynamic, browser-compatible user interfaces using JSP, custom tags, HTML, CSS, and JavaScript.
  • Deployed and maintained JSP and Servlet components on WebLogic 8.0.
  • Developed the application server's persistence layer using JDBC, SQL, and Hibernate.
  • Used JDBC to connect the web applications to the RDBMS.
  • Implemented test-first unit testing using JUnit.
  • Developed and utilized J2EE services and JMS components for messaging communication in WebLogic.
  • Configured the development environment using the WebLogic application server for developer integration testing.

Environment: Java/J2EE, SQL, Oracle 10g, JSP 2.0, JavaScript, WebLogic 8.0, HTML, JDBC 3.0, XML, JUnit, Servlets, MVC, MyEclipse
