Big Data Engineer Resume

Corvallis, OR

SUMMARY:

  • 5+ years of IT experience in software analysis, design, development and implementation of Big Data, Hadoop and Java/J2EE technologies.
  • 3+ years of hands-on experience with Big Data ecosystems including Hadoop, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, MongoDB, Kafka, Maven, Spark, Scala, HBase and Cassandra.
  • Experience in installation, configuration and deployment of Big Data solutions.
  • Write MapReduce jobs, HiveQL queries, Pig scripts and Spark applications.
  • Hands-on experience with NoSQL databases like HBase and Cassandra, and relational databases like Oracle and MySQL.
  • Hands-on experience processing large sets of semi-structured data.
  • Experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Extracted and processed streaming log data from various sources and integrated it into HDFS using Flume.
  • Support code/design analysis, strategy development and project planning.
  • Create reports for the BI team, using Sqoop to load data into HDFS and Hive.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that execute internally as MapReduce jobs.
  • Work closely with the business and analytics team in gathering the system requirements.
  • Load and transform large sets of structured and semi-structured data.
  • Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, Pig, Hive, Sqoop, Oozie, Spark, Spark SQL, MapReduce, HDFS

Languages and Frameworks: Python (Preferred), Java, HiveQL, SQL, C, Scala

Database and Web Technologies: MySQL, Oracle 11g, PL/SQL, ESQL, Hadoop Map-Reduce

Operating system: Linux, Windows

Machine Learning Libraries: scikit-learn, NumPy, matplotlib, NetworkX, LibSVM

Tools: MATLAB, Weka, PyCharm, Confidential Integration Bus, WebSphere MQ, Confidential Data Studio, Eclipse, Maven, NetBeans

PROFESSIONAL EXPERIENCE:

Confidential, Corvallis, OR

Big Data Engineer

Responsibilities:

  • Involved in communicating with clients during the development phase.
  • Importing/Exporting data to/from HDFS and Hive using Sqoop.
  • Defining multiple data validation rules and creating the corresponding Hive queries.
  • Loading the data into Hive managed tables using partitions and buckets.
  • Built a data pipeline using Pig and Java/Scala MapReduce to store data onto HDFS.
  • Validated the test cases using Spark SQL.
  • Developed Scala and Spark applications that execute the Hive queries through HiveContext in Spark, for faster data processing than standard MapReduce programs.
  • Developed Oozie workflow for Spark jobs.
  • Provided data analysis support by running Pig and Hive queries.
  • Good knowledge of writing Hive UDFs as well as of partitioning and bucketing.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Involved in managing and reviewing the Hadoop log files.

Environment: Hadoop, Hive, Zookeeper, Map Reduce, Sqoop, Pig, HDFS, Flume, DB2, HBase, Scala

Confidential, Boca Raton, FL

Big Data/Hadoop Developer

Responsibilities:

  • Analyzed the Hadoop cluster using different Big Data analytic tools including Flume, Sqoop, Spark, Pig, Hive and MapReduce.
  • Involved in loading data from the Linux file system to HDFS.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Processed unstructured data using Pig and Hive.
  • Developed Pig Latin scripts to extract data from the web server output files to load into HDFS.
  • Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
  • Developed Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
  • Exported the result set from Hive to MySQL using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
  • Debugged and performance-tuned Hive and Pig jobs.
  • Gained experience in managing and reviewing Hadoop log files.
  • Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
  • Used HBase as a NoSQL database.
  • Actively involved in code review and bug fixing to improve performance.

Environment: Hadoop, HDFS, Pig, Hive, MapReduce, Sqoop, Spark, Flume, Linux, HBase, Java, Oozie

Confidential

Software Developer

Responsibilities:

  • Developed web services in Confidential Integration Bus (WebSphere Message Broker v9.0) and integrated the services with back-end System of Records and Confidential Business Process Manager (BPM).
  • Performed unit testing of the services and provided support during integration testing.
  • Experience in 24x7 on-call production support and troubleshooting problems related to WebSphere Application Servers.
  • Worked closely with the application team to resolve application-related issues.
  • Good team player with excellent communication skills; self-starter and self-motivated.
  • Conducted knowledge transfer and technical sessions within the team.
  • Implemented a movie rating prediction program in Python, based on both item-based and user-based collaborative filtering.
  • This program forms the basis of a recommendation system, which recommends a movie to a user when the predicted rating for that user and movie is high.
  • Used it for the MovieLens 20M Dataset, which has 20 million movie ratings by 138k users for 27k movies.
  • Developed a database system along with a standalone application using Core Java, Java Swing and Oracle 10g that handles all transactions and tasks required for managing store operations.
  • Developed a reliable and robust database management system by creating stored procedures and triggers in PL/SQL to maintain the ACID properties of the database system. Also normalized it to Confidential.
  • Created functionality to store and retrieve order details, customer details, bill details, inventory details and supplier information.
  • To maintain security of the data, created two modules, general and administrative, with different access controls.
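
The movie rating prediction described above (item-based and user-based collaborative filtering) can be illustrated with a minimal Python sketch. This is not the original program: the toy ratings, the function names, the cosine similarity measure, the averaging of the two predictions and the recommendation threshold are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data (not taken from MovieLens): user -> {movie: rating}
RATINGS = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "u3": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine(a, b):
    """Cosine similarity computed over the keys two rating dicts share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(a[k] ** 2 for k in common))
    norm_b = sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transpose(ratings):
    """Turn user -> {movie: rating} into movie -> {user: rating}."""
    by_movie = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            by_movie.setdefault(movie, {})[user] = rating
    return by_movie


def predict_user_based(ratings, user, movie):
    """Similarity-weighted average of other users' ratings for the movie."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or movie not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[movie]
        den += abs(sim)
    return num / den if den else None


def predict_item_based(ratings, user, movie):
    """Similarity-weighted average of the user's ratings of other movies."""
    by_movie = transpose(ratings)
    if movie not in by_movie:
        return None
    num = den = 0.0
    for other_movie, rating in ratings[user].items():
        if other_movie == movie:
            continue
        sim = cosine(by_movie[movie], by_movie[other_movie])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None


def recommend(ratings, user, threshold=3.5):
    """Unseen movies whose averaged prediction meets the threshold.

    Averaging the two predictions is an assumption of this sketch.
    """
    results = []
    unseen = set(transpose(ratings)) - set(ratings[user])
    for movie in unseen:
        preds = [p for p in (predict_user_based(ratings, user, movie),
                             predict_item_based(ratings, user, movie))
                 if p is not None]
        if preds and sum(preds) / len(preds) >= threshold:
            results.append((movie, sum(preds) / len(preds)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Calling `recommend(RATINGS, "u1")` returns a list of `(movie, predicted_rating)` pairs for the movies that user has not yet rated. On a dataset the size of MovieLens 20M, a pure-Python loop like this would be too slow, so sparse matrices or a distributed engine would be a more realistic choice.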