Big Data Engineer Resume

Charlotte, NC

SUMMARY

  • 6+ years of experience in analysis, design, and development using Big Data technologies and Java.
  • Experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
  • Configured ZooKeeper, Flume, Kafka, and Sqoop on existing Hadoop clusters.
  • Hands-on experience with Hadoop administration, configuration management, monitoring, debugging, and performance tuning.
  • Experience with a range of databases and sources, including Oracle, Netezza, MySQL, SQL Server, DB2, PostgreSQL, and mainframes.
  • Participated in requirement analysis, reviews, and working sessions to understand requirements and system design.
  • Knowledge of consolidating data into a single repository using data lakes.
  • Experience developing front ends using JSF, JavaScript, HTML, XHTML, and CSS.
  • Experience working with web/application servers: IBM WebSphere, Oracle WebLogic, and Apache Tomcat.
  • Experience designing highly transactional websites using J2EE technologies, handling design and implementation in Eclipse.

TECHNICAL SKILLS

Languages: Java, Python, R, Scala

Platforms: Linux, Windows

Big Data: Hadoop, HDFS, MapReduce, Pig, Zookeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala

J2SE / J2EE Technologies: Java, J2EE, JDBC, JSF, JSP, Web Services, Maven

Web Technologies: HTML, XHTML, CSS, JavaScript, JSF, AJAX, XML, Shell Script, QlikView

Cloud Technologies: AWS, EC2, S3, Redshift, Data Pipeline, EMR

Web/Application Servers: WebSphere, WebLogic Application Server, Apache Tomcat

IDE / Tools: Eclipse, IntelliJ, RStudio

Methodologies: Agile, Scrum, Kanban

PROFESSIONAL EXPERIENCE

Confidential, Charlotte, NC

Big Data Engineer

Responsibilities:

  • Used Sqoop to pull data from RDBMS sources such as Teradata, Netezza, and Oracle and store it in Hadoop (a minimal sketch of this load pattern appears below).
  • Created external Hive tables to store and query the loaded data.
  • Loaded data monthly, weekly, or daily depending on the portfolio.
  • Portfolios included retail, auto, cards, home loans, and reference data.
  • Joined retail data sourced from both mainframes and RDBMS and stored it in a single location.
  • Scrubbed historical data held in Hive tables and in files located in HDFS.
  • Applied optimization techniques including partitioning and bucketing.
  • Built an internal shell-script tool that compares RDBMS and Hadoop data to verify that source and target match (see the reconciliation sketch below).
  • Worked with copybook files, converting them from ASCII and binary formats, storing them in HDFS, and creating Hive tables so that the mainframes could be decommissioned and Hadoop made the primary source; handled the same conversion in reverse for exports back to the mainframes.
  • Wrote Pig scripts to transform data into a structured format.
  • Worked with Text, Avro, and Parquet file formats, with Snappy as the default compression.
  • Created Oozie workflows to automate the process in a structured manner.
  • Organized storage into three layers: a raw layer, an intermediate layer, and a publish layer.
  • Used Impala to query data in the publish layer, giving other teams and business users faster access.
  • Worked with Autosys, creating JIL definitions with dependencies on other jobs so that jobs run in parallel and the process is fully automated.
  • Used the Eclipse IDE to create new files and modify existing ones as needed.
  • Used an SVN repository to check in and check out code.

Environment: Hadoop, HDFS, Cloudera, Hive, Impala, Shell Script, Eclipse, SVN, Linux, Oozie, Autosys, Teradata, Netezza, Oracle.
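
The load pattern described above might look roughly like the sketch below; the connect string, credentials file, table, column, and HDFS path names are illustrative placeholders rather than the actual objects used on this engagement.

#!/bin/bash
# Minimal sketch of the Sqoop-to-Hive load pattern described above.
# All connection, table, column, and path names here are hypothetical.

LOAD_DT="2016-01-31"

# Pull one RDBMS table into a dated HDFS directory (hypothetical Oracle source).
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL \
  --username etl_user \
  --password-file /user/etl/.sqoop_password \
  --table RETAIL_ACCOUNTS \
  --target-dir /data/raw/retail/accounts/load_dt=${LOAD_DT} \
  --fields-terminated-by '\001' \
  --num-mappers 4

# Expose the loaded files through a partitioned external Hive table.
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.retail_accounts (
  account_id BIGINT,
  balance    DECIMAL(18,2),
  open_dt    STRING
)
PARTITIONED BY (load_dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE
LOCATION '/data/raw/retail/accounts';

ALTER TABLE raw_db.retail_accounts
  ADD IF NOT EXISTS PARTITION (load_dt='${LOAD_DT}');
"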
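
The source/target reconciliation tool mentioned above followed the general shape sketched here: count rows on both sides and flag any mismatch. The SQL*Plus invocation, credential handling, and table names are simplified assumptions, not the actual implementation.

#!/bin/bash
# Minimal sketch of the RDBMS-vs-Hadoop reconciliation tool described above.
# Table names, the Oracle connect string, and credential handling are placeholders.

TABLE="RETAIL_ACCOUNTS"
HIVE_TABLE="raw_db.retail_accounts"

# Row count on the source side (hypothetical SQL*Plus call; ORACLE_PW assumed to be set).
SRC_COUNT=$(sqlplus -s "etl_user/${ORACLE_PW}@ORCL" <<EOF
SET HEADING OFF FEEDBACK OFF PAGESIZE 0
SELECT COUNT(*) FROM ${TABLE};
EXIT;
EOF
)

# Row count on the target side.
TGT_COUNT=$(hive -S -e "SELECT COUNT(*) FROM ${HIVE_TABLE};")

# Strip whitespace before comparing.
SRC_COUNT=$(echo "${SRC_COUNT}" | tr -d '[:space:]')
TGT_COUNT=$(echo "${TGT_COUNT}" | tr -d '[:space:]')

if [ "${SRC_COUNT}" = "${TGT_COUNT}" ]; then
  echo "MATCH: ${TABLE} source=${SRC_COUNT} target=${TGT_COUNT}"
else
  echo "MISMATCH: ${TABLE} source=${SRC_COUNT} target=${TGT_COUNT}" >&2
  exit 1
fi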

Confidential, Charlotte, NC

Hadoop Engineer

Responsibilities:

  • Managed several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
  • Worked closely with engineering teams and participated in infrastructure and framework development.
  • Worked on POCs in an R&D environment with Hive 2, Spark SQL, and Kafka before offering these services to application teams.
  • Used Spark SQL to create structured data with DataFrames, querying other data sources through JDBC and Hive.
  • Automated deployment and management of Hadoop services, including monitoring.
  • Worked closely with the Alpide team, ensuring all issues were addressed and resolved promptly.
  • Contributed to the evolving architecture of the services to meet changing requirements for scaling, reliability, performance, manageability, and cost.
  • Performed capacity planning of Hadoop clusters based on application requirements.
  • Held peer reviews with application teams for their releases and ensured they maintained standards.
  • Created Sentry policy files giving business users access to the required databases and tables through Impala in the dev, UAT, and prod environments (a policy sketch appears below).
  • Migrated existing data from RDBMS (Netezza, Oracle, and Teradata) to Hadoop using Sqoop, and ingested server logs into HDFS using Flume.
  • Created managed and external tables in Hive and implemented partitioning and bucketing for space and performance efficiency.
  • Used Impala for select queries so business users could retrieve tables faster.
  • Developed an Oozie shell wrapper implementing a re-run process for common workflows and sub-workflows (a minimal wrapper sketch appears below).
  • Used the Autosys scheduler to automate jobs.
  • Used various file formats (Avro, Parquet, JSON, Text) with Snappy compression.
  • Used a CVS repository to check in and check out code.

Environment: Hadoop, HDFS, Hive, Sqoop, Impala, Flume, Spark SQL, Kafka, Python, Oozie, Autosys, Linux, Oracle, Netezza, CVS, Cloudera.
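
A file-based Sentry policy of the kind referenced above might look like the sketch below. The group, role, server, and database names and the HDFS path are illustrative placeholders, and it assumes the file-based policy provider rather than the Sentry service.

#!/bin/bash
# Minimal sketch of a file-based Sentry policy granting read-only Impala access.
# All names and paths here are hypothetical.

cat > sentry-provider.ini <<'EOF'
[groups]
# Map a business-user group to a read-only role.
bi_users = publish_read_role

[roles]
# SELECT-only access to every table in the publish database.
publish_read_role = server=server1->db=publish_db->table=*->action=select
EOF

# Place the policy file where HiveServer2/Impala are configured to read it.
hdfs dfs -put -f sentry-provider.ini /user/hive/sentry/sentry-provider.ini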
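
The Oozie re-run wrapper could be as simple as the sketch below; the Oozie URL, properties file, and polling interval are placeholders, and the status parsing assumes the standard output of oozie job -info.

#!/bin/bash
# Minimal sketch of an Oozie re-run wrapper of the kind described above.
# URL, properties path, and polling interval are illustrative placeholders.

OOZIE_URL="http://oozie-host.example.com:11000/oozie"
PROPS="/home/etl/oozie/rerun.properties"   # contains oozie.wf.rerun.failnodes=true plus the job config
WF_JOB_ID="$1"

if [ -z "${WF_JOB_ID}" ]; then
  echo "Usage: $0 <workflow-job-id>" >&2
  exit 1
fi

# Re-run only the failed actions of the workflow.
oozie job -oozie "${OOZIE_URL}" -rerun "${WF_JOB_ID}" -config "${PROPS}"

# Poll until the workflow leaves the RUNNING state.
STATUS="RUNNING"
while [ "${STATUS}" = "RUNNING" ]; do
  sleep 60
  STATUS=$(oozie job -oozie "${OOZIE_URL}" -info "${WF_JOB_ID}" | awk '/^Status/ {print $NF}')
  echo "Workflow ${WF_JOB_ID} status: ${STATUS}"
done

[ "${STATUS}" = "SUCCEEDED" ] || exit 1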

Confidential, Jersey City, NJ

Big Data Developer

Responsibilities:

  • Worked closely with business sponsors on architectural solutions to meet their business needs.
  • Conducted information-sharing and teaching sessions to raise awareness of industry trends and upcoming initiatives, keeping solution architecture designs aligned with business strategies and goals.
  • Performance-tuned the application at various layers (MapReduce, Hive).
  • Used QlikView to create a visual interface for real-time data processing.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive (a minimal sketch appears below).
  • Imported and exported data between HDFS and various databases: Netezza, Oracle, MySQL, and DB2.
  • Automated the process of pulling data from source systems into Hadoop and exporting it as JSON files to a specified location.
  • Migrated Hive queries to Impala.
  • Worked with various file formats (Avro, Parquet, Text) using Snappy compression.
  • Created analysis batch job prototypes using Hadoop, Pig, Oozie, Hue, and Hive.
  • Used a Git repository to check in and check out code.
  • Documented operational problems following standards and procedures, using JIRA as the reporting tool.

Environment: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Java, Linux shell scripting, Oracle, Netezza, MySQL, DB2, QlikView, Git.
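
The dynamic-partition load and the Hive-to-Impala handoff referenced above might look roughly like the sketch below; the database, table, and column names, bucket count, and file format are illustrative assumptions.

#!/bin/bash
# Minimal sketch of a dynamic-partition insert in Hive followed by an Impala metadata refresh.
# All object names here are hypothetical.

hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;

-- Target table partitioned by load date and bucketed on the customer key.
CREATE TABLE IF NOT EXISTS publish_db.transactions (
  txn_id      BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(18,2)
)
PARTITIONED BY (load_dt STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS PARQUET;

-- Dynamic-partition insert: Hive derives load_dt from the last selected column.
INSERT OVERWRITE TABLE publish_db.transactions PARTITION (load_dt)
SELECT txn_id, customer_id, amount, load_dt
FROM staging_db.transactions_raw;
"

# Refresh Impala's view of the table so business users can query the new partitions.
impala-shell -i impala-host.example.com -q "INVALIDATE METADATA publish_db.transactions;"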

Confidential

Java Developer

Responsibilities:

  • Used the class-responsibility-collaborator (CRC) model to identify and organize classes in the Hospital Management System.
  • Used sequence diagrams to show the object interactions involved with the Use-Cases of a user of the system.
  • Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
  • Designed HTML screens with JSP for the front-end.
  • Made JDBC calls from the servlets to the database.
  • Designed stored procedures to extract and calculate billing information, connecting to Oracle.
  • Formatted results from the database as HTML reports for the client.
  • Used JavaScript for client-side validation.
  • Used servlets as controllers and entity/session beans for business logic.
  • Used WebLogic to deploy the application to local and development environments.
  • Used Eclipse for building the application.
  • Participated in User review meetings and used Test Director to periodically log the development issues, production problems and bugs.
  • Implemented and supported the project from development and unit testing through to the production environment.
  • Used CVS Version manager for source control and CVS Tracker for change control management.

Environment: Java, JSP, JDBC, JavaScript, HTML, WebLogic, Eclipse, CVS.
