Big Data Engineer Resume
Charlotte, NC
SUMMARY
- 6+ years of experience in analysis, design, and development using Big Data technologies and Java.
- Experience with Hadoop, HDFS, Hive, Pig, MapReduce, and Spark.
- Configured ZooKeeper, Flume, Kafka, and Sqoop on existing Hadoop clusters.
- Hands-on experience with Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
- Experience with various databases and sources, including Oracle, Netezza, MySQL, SQL Server, DB2, Postgres, and mainframes.
- Participated in requirements analysis, reviews, and working sessions to understand requirements and system design.
- Knowledge of consolidating data into a single repository using data lakes.
- Experience developing front ends using JSF, JavaScript, HTML, XHTML, and CSS.
- Experience working with web/application servers: IBM WebSphere, Oracle WebLogic, and Apache Tomcat.
- Experience designing highly transactional websites using J2EE technologies and handling design and implementation in Eclipse.
TECHNICAL SKILLS
Languages: Java, Python, R, Scala
Platforms: LINUX, Windows
Big Data: Hadoop, HDFS, MapReduce, Pig, ZooKeeper, Hive, Sqoop, Flume, Kafka, Spark, Impala
J2SE / J2EE Technologies: Java, J2EE, JDBC, JSF, JSP, Web Services, Maven
Web Technologies: HTML, XHTML, CSS, JavaScript, JSF, AJAX, QlikView, XML, Shell Script
Cloud Technologies: AWS, EC2, S3, Redshift, Data Pipeline, EMR
Web/Application Servers: WebSphere, WebLogic Application Server, Apache Tomcat
IDE / Tools: Eclipse, IntelliJ, RStudio
Methodologies: Agile, Scrum, Kanban
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Big Data Engineer
Responsibilities:
- Used Sqoop to pull data from RDBMS sources such as Teradata, Netezza, and Oracle and store it in Hadoop (see the ingest sketch after this list).
- Created external Hive tables to store and query the loaded data.
- Loaded data monthly, weekly, or daily depending on the portfolio.
- Data included retail, auto, cards, home loans, and reference data.
- Joined retail data spread across mainframes and RDBMS sources and stored it in a single location.
- Scrubbed historical data in Hive tables and in files located in HDFS.
- Applied optimization techniques such as partitioning and bucketing.
- Created an internal shell-script tool to compare RDBMS and Hadoop data and verify that source and target match (see the reconciliation sketch below).
- Worked with copybook files, converting them from ASCII and binary formats, storing them in HDFS, and creating Hive tables so mainframes could be decommissioned and Hadoop made the primary source; did the same for exports back to the mainframes.
- Wrote Pig scripts to transform the data into structured formats.
- Worked with Text, Avro, and Parquet file formats, with Snappy as the default compression.
- Created Oozie workflows to automate the process in a structured manner.
- Stored data in three layers: raw, intermediate, and publish.
- Used Impala to query data in the publish layer so other teams and business users could access it with faster processing.
- Worked with Autosys and created JIL definitions with job dependencies so that jobs run in parallel and are fully automated.
- Used the Eclipse IDE to review new files, existing files, and required modifications.
- Used an SVN repository to check in and check out code.
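A minimal shell sketch of the ingest pattern described above, assuming a Sqoop pull from Oracle followed by a partitioned external Hive table; the connection string, table names, and HDFS paths are hypothetical placeholders.

    #!/bin/bash
    # Hypothetical example: pull a portfolio table from Oracle into HDFS with Sqoop,
    # then expose it through a partitioned external Hive table.
    # Connection details, table names, and paths are placeholders.

    sqoop import \
      --connect jdbc:oracle:thin:@//oracle-host:1521/ORCL \
      --username etl_user --password-file /user/etl/.ora_pwd \
      --table RETAIL.ACCOUNTS \
      --target-dir /data/raw/retail/accounts/load_dt=2016-01-31 \
      --as-avrodatafile \
      --num-mappers 8

    hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_db.accounts (
      account_id BIGINT,
      customer_id BIGINT,
      balance DECIMAL(18,2)
    )
    PARTITIONED BY (load_dt STRING)
    STORED AS AVRO
    LOCATION '/data/raw/retail/accounts';

    ALTER TABLE raw_db.accounts ADD IF NOT EXISTS
      PARTITION (load_dt='2016-01-31')
      LOCATION '/data/raw/retail/accounts/load_dt=2016-01-31';
    "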
Environment: Hadoop, HDFS, Cloudera, Hive, Impala, Shell Script, Eclipse, SVN, Linux, Oozie, Autosys, Teradata, Netezza, Oracle.
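A simplified sketch of the kind of source-to-target reconciliation script mentioned above, assuming row counts are compared between Oracle and Hive; credentials, host, and table names are hypothetical.

    #!/bin/bash
    # Hypothetical reconciliation check: compare a row count in the source RDBMS
    # against the corresponding Hive table and flag any mismatch.
    TABLE="RETAIL.ACCOUNTS"
    HIVE_TABLE="raw_db.accounts"
    LOAD_DT="2016-01-31"

    # Row count from the source database (sqlplus used here as an illustration).
    SRC_COUNT=$(sqlplus -s etl_user/"$ORA_PWD"@ORCL <<EOF
    SET HEADING OFF FEEDBACK OFF PAGESIZE 0
    SELECT COUNT(*) FROM ${TABLE};
    EOF
    )

    # Row count from the Hive table for the same load date.
    TGT_COUNT=$(hive -S -e "SELECT COUNT(*) FROM ${HIVE_TABLE} WHERE load_dt='${LOAD_DT}';")

    if [ "$(echo "$SRC_COUNT" | tr -d '[:space:]')" = "$(echo "$TGT_COUNT" | tr -d '[:space:]')" ]; then
      echo "MATCH: ${TABLE} -> ${HIVE_TABLE} (${TGT_COUNT} rows)"
    else
      echo "MISMATCH: source=${SRC_COUNT} target=${TGT_COUNT}" >&2
      exit 1
    fi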
Confidential, Charlotte, NC
Hadoop Engineer
Responsibilities:
- Managed several Hadoop clusters and other Hadoop ecosystem services in development and production environments.
- Worked closely with engineering teams and participated in infrastructure and framework development.
- Worked on POCs in an R&D environment with Hive2, Spark SQL, and Kafka before offering these services to application teams.
- Used Spark SQL to create structured data with DataFrames, querying other data sources through JDBC and Hive.
- Automated deployment and management of Hadoop services, including implementing monitoring.
- Worked closely with the Alpide team, ensuring all issues were addressed and resolved quickly.
- Contributed to the evolving architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and cost.
- Performed capacity planning of Hadoop clusters based on application requirements.
- Conducted peer reviews with application teams for their releases and ensured they maintained standards.
- Created Sentry policy files to give business users access to the required databases and tables through Impala in the dev, UAT, and prod environments (see the policy sketch after this list).
- Migrated existing data from RDBMS (Netezza, Oracle, and Teradata) to Hadoop using Sqoop, and ingested server logs into HDFS using Flume.
- Created managed and external tables in Hive and implemented partitioning and bucketing for space and performance efficiency.
- Used Impala for select queries so business users could retrieve tables faster.
- Developed an Oozie shell wrapper implementing a re-run process for common workflows and sub-workflows (see the wrapper sketch below).
- Used the Autosys scheduler to automate jobs.
- Used various file formats (Avro, Parquet, JSON, Text) with Snappy compression.
- Used a CVS repository to check in and check out code.
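An illustrative file-based Sentry policy along the lines of the one described above; the group, role, database, and path names are hypothetical placeholders.

    #!/bin/bash
    # Hypothetical file-based Sentry policy granting business users read-only
    # access to a publish database through Impala/Hive; names are placeholders.
    cat > sentry-provider.ini <<'EOF'
    [groups]
    # OS/LDAP group -> Sentry role
    biz_users = publish_read_role

    [roles]
    # Read-only access to every table in the publish database
    publish_read_role = server=server1->db=publish_db->table=*->action=select
    EOF

    # Push the policy file to the location the Sentry file provider reads from.
    hdfs dfs -put -f sentry-provider.ini /user/hive/sentry/sentry-provider.ini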
Environment: Hadoop, HDFS, Hive, Sqoop, Impala, Flume, Spark SQL, Kafka, Python, Oozie, Autosys, Linux, Oracle, Netezza, CVS, Cloudera.
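A condensed sketch of an Oozie re-run wrapper of the kind described above, assuming failed-node re-runs; the Oozie URL and properties file path are placeholders.

    #!/bin/bash
    # Hypothetical wrapper: submit an Oozie workflow, or re-run only its failed
    # nodes when an existing workflow ID is supplied. URL and paths are placeholders.
    OOZIE_URL="http://oozie-host:11000/oozie"
    PROPS="/apps/etl/job.properties"
    WF_ID="$1"   # optional: existing workflow ID to re-run

    if [ -z "$WF_ID" ]; then
      # Fresh run
      oozie job -oozie "$OOZIE_URL" -config "$PROPS" -run
    else
      # Re-run, skipping nodes that already succeeded
      oozie job -oozie "$OOZIE_URL" -config "$PROPS" \
        -Doozie.wf.rerun.failnodes=true \
        -rerun "$WF_ID"
    fi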
Confidential, Jersey City, NJ
Big Data Developer
Responsibilities:
- Worked closely with business sponsors on architectural solutions to meet their business needs.
- Conducted information-sharing and teaching sessions to raise awareness of industry trends and upcoming initiatives, ensuring alignment between business strategies and goals and solution architecture designs.
- Performance-tuned the application at various layers (MapReduce, Hive).
- Used QlikView to create a visual interface for real-time data processing.
- Implemented partitioning, dynamic partitioning, and bucketing in Hive (see the sketch after this list).
- Imported and exported data between HDFS and various databases: Netezza, Oracle, MySQL, and DB2.
- Automated the process of pulling data from source systems into Hadoop and exporting it as JSON files to a specified location.
- Migrated Hive queries to Impala (see the Impala sketch below).
- Worked with various file formats (Avro, Parquet, Text) using Snappy compression.
- Created analysis batch-job prototypes using Hadoop, Pig, Oozie, Hue, and Hive.
- Used a Git repository to check in and check out code.
- Documented operational problems following standards and procedures using JIRA.
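A brief sketch of the dynamic-partitioning and bucketing pattern referenced above, run through the Hive CLI; the database, table, and column names are hypothetical.

    # Hypothetical example: load a staging table into a partitioned, bucketed
    # target using Hive dynamic partitioning. Names are placeholders.
    hive -e "
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    CREATE TABLE IF NOT EXISTS analytics.txn_by_day (
      txn_id BIGINT,
      account_id BIGINT,
      amount DECIMAL(18,2)
    )
    PARTITIONED BY (txn_date STRING)
    CLUSTERED BY (account_id) INTO 32 BUCKETS
    STORED AS PARQUET;

    -- The partition column comes last in the SELECT list and is resolved at run time.
    INSERT OVERWRITE TABLE analytics.txn_by_day PARTITION (txn_date)
    SELECT txn_id, account_id, amount, txn_date
    FROM staging.transactions;
    "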
Environment: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Java, Linux shell scripting, Oracle, Netezza, MySQL, DB2, QlikView, Git.
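A small sketch of running a migrated query through impala-shell, including a metadata refresh after a Hive-side load; the host and table names are placeholders.

    # Hypothetical example: after a table is loaded or altered in Hive, refresh
    # Impala's metadata and run the migrated query through impala-shell.
    IMPALAD="impala-host:21000"

    impala-shell -i "$IMPALAD" -q "INVALIDATE METADATA analytics.txn_by_day;"

    impala-shell -i "$IMPALAD" -B --output_delimiter=',' -q "
    SELECT txn_date, COUNT(*) AS txn_count, SUM(amount) AS total_amount
    FROM analytics.txn_by_day
    GROUP BY txn_date
    ORDER BY txn_date;
    " -o daily_totals.csv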
Confidential
Java Developer
Responsibilities:
- Used the class-responsibility-collaborator (CRC) model to identify and organize classes in the Hospital Management System.
- Used sequence diagrams to show the object interactions involved in the use cases of a system user.
- Involved in Database Design by creating Data Flow Diagram (Process Model) and ER Diagram (Data Model).
- Designed HTML screens with JSP for the front-end.
- Made JDBC calls from the servlets to the database.
- Involved in designing stored procedures to extract and calculate billing information, connecting to Oracle.
- Formatted results from the database as HTML reports for the client.
- Used JavaScript for client-side validation.
- Used servlets as controllers and entity/session beans for business logic.
- Used WebLogic to deploy the application in local and development environments.
- Used Eclipse for building the application.
- Participated in user review meetings and used Test Director to periodically log development issues, production problems, and bugs.
- Implemented and supported the project through development and unit testing into the production environment.
- Used CVS Version manager for source control and CVS Tracker for change control management.
Environment: Java, JSP, JDBC, JavaScript, HTML, WebLogic, Eclipse, and CVS.