Java/J2EE Developer Resume
SUMMARY:
- Almost 9 years of experience in the IT industry, with extensive experience in Big Data technologies, Java, and J2EE.
- Focused on Data Warehousing, Data Modeling, Data Integration, Data Migration, ETL processes, and Business Intelligence.
- 4+ years of exclusive experience with Big Data technologies and the Hadoop stack.
- Strong experience working with HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, YARN, Oozie, and HBase.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Cloudera (CDH3, CDH4) and YARN distributions.
- Integrated Datameer with tools and cloud-based platforms within the Big Data ecosystem (e.g., Spark, Tez, Azure HDInsight, Amazon EMR, Redshift, Google Dataproc).
- Over two years of hands-on experience using the Spark framework with Scala.
- Good exposure to performance tuning of Hive queries, MapReduce jobs, and Spark jobs.
- Worked with various file formats, including delimited text files, clickstream log files, Apache log files, Avro files, JSON files, and XML files.
- Good understanding of the compression techniques used in Hadoop processing, such as Gzip, Snappy, and LZO.
- Extensively used ETL methodology for performing Data Profiling, Data Migration, Extraction, Transformation, and Loading using Talend; designed data conversions from a wide variety of source systems, including Netezza, Oracle, DB2, SQL Server, Teradata, and Hive, and non-relational sources such as flat files, XML, and mainframe files.
- Expertise in importing data from and exporting data to traditional RDBMSs using Apache Sqoop.
- Tuned Pig and Hive scripts by analyzing their join, grouping, and aggregation behavior.
- Extensively worked on HiveQL and join operations, wrote custom UDFs, and gained good experience in optimizing Hive queries (a minimal UDF sketch follows this summary).
- Worked on various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS).
- Mastered the use of columnar file formats such as RCFile, ORC, and Parquet.
- Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Hands-on experience in installing, configuring, and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience with various JavaScript MVC frameworks and libraries such as AngularJS, Underscore.js, and Node.js.
- Extensive experience with modern front-end JavaScript frameworks and libraries, including Bootstrap, jQuery, and AngularJS.
- Involved in business analysis and technical design sessions with business and technical staff to develop requirements documents and ETL design specifications.
- Developed and maintained ETL (data extraction, transformation, and loading) mappings using Informatica Designer 8.6 to extract data from multiple source systems, comprising databases such as Oracle 10g and SQL Server 7.2 as well as flat files, into the staging area and EDW, and then into the data marts.
- Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Good experience in optimizing MapReduce algorithms using combiners and custom partitioners.
- Hands-on experience with NoSQL databases such as HBase and MongoDB.
- Expertise in back-end/server-side Java technologies such as Web Services, Java Persistence API (JPA), Java Message Service (JMS), and Java Database Connectivity (JDBC).
- Experience includes application development in Java (client/server), JSP, Servlet programming, Enterprise JavaBeans, Struts, JSF, JDBC, Spring, Spring Integration, and Hibernate.
- Very good understanding of the Agile Scrum process.
- Experience in using version control tools such as Bitbucket and SVN.
- Good knowledge of Oracle 8i, 9i, and 10g databases and excellent SQL query-writing skills.
- Performed performance tuning and productivity improvement activities.
- Extensive use of use-case diagrams, use-case models, and sequence diagrams in Rational Rose.
- Proactive in time management and problem solving; self-motivated with good analytical skills.
- Analytical and organizational skills, with the ability to multitask and meet deadlines.
- Excellent interpersonal skills in areas such as teamwork, communication and presentation to business users or management teams.
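As referenced above, a minimal sketch of a custom Hive UDF in Scala; the class name and the normalization logic are hypothetical:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trims and upper-cases a free-text code so that joins
// and group-bys in HiveQL see a normalized value
class NormalizeCode extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toUpperCase)
  }
}
```

Packaged into a jar and registered with ADD JAR and CREATE TEMPORARY FUNCTION, such a UDF can then be called inline in HiveQL queries.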
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, Teradata, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, YARN, Oozie, Storm, Kafka and Flume.
Streaming Technologies: Spark Streaming, Storm
Scripting Languages: Python, Bash, JavaScript, HTML5, CSS3
Programming Languages: Java, Scala, SQL, PL/SQL
Databases: Oracle, SQL Server, DB2, Teradata, Netezza (RDBMS); HBase, MongoDB (NoSQL)
Java/J2EE Technologies: Servlets, JSP (EL, JSTL, Custom Tags), JSF, Apache Struts, JUnit, Hibernate 3.x, Log4j, JavaBeans, EJB 2.0/3.0, JDBC, RMI, JMS, JNDI.
Tools: Eclipse, IntelliJ, Maven, Ant, MS Visual Studio, NetBeans, ETL tools
Methodologies: Agile, Waterfall
PROFESSIONAL EXPERIENCE:
Confidential, Plano, TX
Bigdata/Spark /Scala Developer
Responsibilities:
- Collaborated on insights with other Data Scientists, Business Analysts, and partners.
- Uploaded data to Hive on Hadoop and combined new tables with existing databases.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a POC using Apache Impala for data processing on top of Hive.
- Developed Scala scripts and UDFs using DataFrames, Spark SQL, and Datasets in Spark 2.1 for data aggregation and queries, writing data back into the OLTP system through Sqoop (a minimal sketch follows this section).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Spark applications in Scala, utilizing the DataFrame and Spark SQL APIs for faster data processing.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities according to the requirements.
- Built a data pipeline consisting of Spark, Hive, Sqoop, and custom-built input adapters to ingest, transform, and analyze operational data.
- Developed Spark jobs and Hive Jobs to summarize and transform data.
- Experienced in performance tuning of Spark applications: setting the right batch interval, choosing the correct level of parallelism, and tuning memory.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Automated jobs using Oozie.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, and DataFrames.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Extracted data from Oracle, SQL Server, and MySQL databases into HDFS using Sqoop.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
- Worked on a cluster of 72 nodes.
- Worked extensively with Sqoop for importing metadata from Oracle.
- Created Hive tables and loaded data from one database to another by developing Spark jobs.
- Implemented schema extraction for Parquet, Avro, and ORC file formats in Hive.
- Experienced in job management using the Fair Scheduler; developed job-processing scripts using Oozie workflows.
- Actively participated in software development lifecycle, including design and code reviews, test development, test automation.
- Used the Remedy tool to log and monitor customer issues via incident management tickets, as well as service requests, change management, and problem management.
Environment: Cloudera, Hadoop, Spark, Sqoop, Hive, Scala, Shell Scripting, Kafka, Flume, Solr, Impala, Spark Core, Spark SQL
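A minimal sketch of the kind of Spark 2.1 aggregation job described above, assuming a Hive-backed SparkSession; the database, table, and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyTxnAggregation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; runs on YARN when launched with spark-submit
    val spark = SparkSession.builder()
      .appName("DailyTxnAggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive table of operational transactions
    val txns = spark.table("ops_db.transactions")

    // Cleanse and aggregate: drop malformed rows, then summarize per account per day
    val daily = txns
      .filter(col("amount").isNotNull && col("account_id").isNotNull)
      .groupBy(col("account_id"), to_date(col("event_ts")).as("txn_date"))
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("txn_count"))

    // Land the summary in a Hive staging table; a downstream Sqoop export
    // would push it back to the OLTP system
    daily.write.mode("overwrite").saveAsTable("ops_db.daily_txn_summary")

    spark.stop()
  }
}
```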
Confidential, Ann Arbor, MI
Sr. Hadoop/Scala Developer
Responsibilities:
- Worked on developing a scalable AWS EMR environment on highly scalable distributed systems, allowing the EMR cluster to be brought down at any time and brought back up in minimal time.
- Created and developed an end-to-end Hadoop data-ingestion pipeline, ingesting raw SQL Server data into S3 and processing it with Spark; the processed data was then pushed to Redshift for RI reports.
- Developed a Spark Streaming pipeline that ingests activity data and email delivery events into S3 using Kinesis, processes the data with Spark, and stores it in an S3 bucket and Redshift.
- Created full and hourly incremental imports into EMR using Sqoop; a MySQL RDS instance stores the Sqoop metastore outside of EMR for high availability.
- Implemented the Databricks spark-redshift connector in a Scala program to push the processed data to the Redshift database; Redshift provides columnar, compressed storage and scales linearly and seamlessly (a minimal sketch follows this section).
- Worked on the performance tuning of Spark data frames for aggregation using dynamic partition, creating the temp views needed.
- Deployed and managed applications in the datacenter, in virtual environments, and on the Azure platform.
- Involved in importing real-time data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
- Involved in developing Hive DDLs to create, alter, and drop Hive tables, and worked with Storm and Kafka.
- Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
- Developed Spark applications for the entire batch processing by using Scala.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Ingested the backfill (historical) data from SQL Server into S3 using BCP (bulk ingestion).
- Implemented the S3 API for accessing S3 buckets and data for processing, developed custom aggregate functions using Spark SQL, and performed interactive querying.
- Protected the cluster with VPCs and security group settings so that only required firewall access is provided; users access resources through AWS roles.
- Utilized the Spark DataFrame and Spark SQL APIs extensively for all processing.
- Worked on POCs with Apache Spark using Scala to introduce Spark into the project.
- Applied SSL encryption to data at rest and in transit, including transfers to AWS and Redshift.
- Ran ANALYZE and VACUUM on the Redshift database after loading the historical data to maintain server performance and resource utilization.
- Created a DR cluster; 100% S3 HA cross-region replication is used, and the Sqoop and Hive metastores sit outside of EMR for high availability.
- Ingested event and activity (historical) data for all 400+ customers into S3, processed it with Spark, and stored it in Redshift for availability to the portal.
Environment: EMR, S3, Hadoop, Kafka, Spark, Spark Streaming, Sqoop, Hive, Java, Scala, Shell Scripting
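A minimal sketch, assuming the open-source Databricks spark-redshift connector, of the Redshift push described above; the JDBC URL, S3 temp directory, and table name are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper: writes a processed DataFrame to Redshift via the
// Databricks spark-redshift connector, which stages data in S3 and issues
// a Redshift COPY behind the scenes
def writeToRedshift(df: DataFrame): Unit = {
  df.write
    .format("com.databricks.spark.redshift")
    .option("url", "jdbc:redshift://example-cluster:5439/reports?user=etl&password=****") // placeholder
    .option("dbtable", "ri_reports.email_delivery_events")                                // placeholder
    .option("tempdir", "s3a://example-etl-bucket/redshift-staging/")                      // placeholder
    .option("forward_spark_s3_credentials", "true")
    .mode(SaveMode.Append)
    .save()
}
```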
Confidential, Ridgefield Park, NJ
Sr. Hadoop/Spark Developer
Responsibilities:
- Created end-to-end Spark applications using Scala to perform data cleansing, validation, transformation, and summarization activities on user behavioral data.
- Developed custom FTP adapters to pull clickstream data from FTP servers directly into HDFS using the HDFS FileSystem API.
- Integrated Kafka with Spark Streaming for real-time data processing (a minimal sketch follows this section).
- Used the Spark SQL and DataFrame APIs extensively to build Spark applications.
- Used the Spark engine and Spark SQL for data analysis and handed the results to the data scientists for further analysis.
- Performed streaming data ingestion from Kafka into the Spark processing environment.
- Built a prototype for real time analysis using Spark streaming and Kafka.
- Closely worked with data science team in building Spark MLlib applications to build various predictive models.
- Extensively used the Akka actor architecture for scalable, hassle-free multithreading; the actors handled millions of activity messages per second with ease by propagating messages.
- Developed an automation tool in Java and Scala on the Akka framework to allow Aspect's customers to put a background load of agent traffic on the system while they perform functional testing on the real clients.
- Developed web services using the Java-based Play framework, which follows the reactive design paradigm, coupled with the Akka toolkit, which provides an actor-based programming model on the JVM.
- Partitioned and bucketed Hive tables, performed joins on them, and utilized Hive SerDes such as RegEx, JSON, and Avro.
- Imported data from different sources into Spark RDD for processing.
- Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
- Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, and effective, efficient joins and transformations during the ingestion process itself.
- Created Hive tables with dynamic partitions and buckets for sampling, and worked on them using HiveQL.
- Optimized Hive analytics SQL queries, created tables and views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in moving relational data with legacy labels to HDFS and HBase tables using Sqoop, and vice versa.
- Wrote Sqoop scripts for inbound and outbound data to HDFS and validated the data before loading to check for duplicates.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Used Sqoop jobs with incremental imports to import data from RDBMSs; exported the analyzed data back to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Worked with different compression techniques, such as LZO and Snappy, to save storage and optimize data transfer over the network.
Environment: Hadoop, HDFS, Hive, Sqoop, MapReduce, Cloudera, Kafka, ZooKeeper, HBase, AWS, UNIX Shell Scripting.
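A minimal sketch of the Kafka-to-Spark Streaming integration described above, using the standard spark-streaming-kafka-0-10 direct stream; broker addresses, topic, and consumer group are placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object ClickstreamStreaming {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickstreamStreaming")
    // 10-second micro-batches feed the Spark engine for batch-style processing
    val ssc = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092,broker2:9092", // placeholder brokers
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "clickstream-consumers",              // placeholder group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("clickstream"), kafkaParams)
    )

    // Count events per micro-batch as a stand-in for the real transformations
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```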
Confidential, San Francisco, CA
Sr Hadoop Developer
Responsibilities:
- Stored the processed data by using low-level Java APIs to ingest data directly into HBase and HDFS (a minimal sketch follows this section).
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode high availability, capacity planning, and slot configuration.
- Experience in managing and reviewing Hadoop log files.
- Experience in Hive partitioning and bucketing, performing joins on Hive tables, and utilizing Hive SerDes such as RegEx, JSON, and Avro.
- Developed and integrated Java programs to move flat files from Linux systems into the Hadoop ecosystem, with file validation before loading into Hive tables.
- Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
- Executed cluster upgrade tasks on the staging platform before applying them to the production cluster.
- Performed maintenance, monitoring, deployments, and upgrades across the infrastructure supporting all of our Hadoop clusters.
- Installed and configured various components of Hadoop ecosystem.
- Optimized Hive analytics SQL queries, created tables and views, wrote custom UDFs, and implemented Hive-based exception processing.
- Involved in moving relational data with legacy labels to HDFS and HBase tables using Sqoop, and vice versa.
- Replaced Hive's default Derby metadata store with MySQL.
- Supported setting up the QA environment and updating configurations for implementing Pig scripts.
- Configured Fair Scheduler to provide fair resources to all the applications across the cluster.
Environment: Cloudera 5.4, Cloudera Manager, Hue, Kafka, HBase, HDFS, Hive, Pig, Sqoop, MapReduce, DataStax, IBM DataStage 8.1 (Designer, Director, Administrator), flat files, Oracle 11g/10g, PL/SQL, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting.
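A minimal sketch of direct HBase ingestion through the low-level client API (shown here from Scala on the JVM); the table, row key, and column names are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseIngest {
  def main(args: Array[String]): Unit = {
    // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    try {
      // Hypothetical table keyed by portfolio id plus timestamp
      val table = connection.getTable(TableName.valueOf("portfolio_events"))
      val put = new Put(Bytes.toBytes("portfolio42#20150101T000000"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
        Bytes.toBytes("""{"amount": 100}"""))
      table.put(put)
      table.close()
    } finally {
      connection.close()
    }
  }
}
```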
Confidential
Java/J2EE Developer
Responsibilities:
- Created use case diagrams, sequence diagrams, functional specifications, and user interface diagrams using StarUML.
- Involved in complete requirement analysis, design, coding and testing phases of the project.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Developed user interfaces using JSP, HTML, XML and JavaScript.
- Generated XML Schemas and used XML Beans to parse XML files.
- Created stored procedures and functions; used JDBC to process database calls to DB2/AS400 and SQL Server databases (a minimal sketch follows this section).
- Developed code to create XML files and flat files from data retrieved from databases and XML files.
- Created data sources and helper classes utilized by all the interfaces to access and manipulate data.
- Developed a web application called iHUB (integration hub) to initiate all the interface processes, using the Struts framework, JSP, and HTML.
- Developed the interfaces using Eclipse 3.1.1 and JBoss 4.1; involved in integration testing, bug fixing, and production support.
Environment: Java 1.3, Servlets, JSPs, JavaMail API, JavaScript, HTML, MySQL 2.1, Swing, Java Web Server 2.0, JBoss 2.0, RMI, Rational Rose, Red Hat Linux 7.1.
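A minimal sketch of the JDBC stored-procedure calls described above, written in Scala for consistency with the other sketches; the connection URL, credentials, and procedure name are placeholders:

```scala
import java.sql.{DriverManager, Types}

object OrderSummaryCall {
  def main(args: Array[String]): Unit = {
    // Placeholder DB2 connection details; the driver jar must be on the classpath
    val conn = DriverManager.getConnection(
      "jdbc:db2://dbhost:50000/ordersdb", "appuser", "****")
    try {
      // Hypothetical procedure: IN fiscal year, OUT order count
      val stmt = conn.prepareCall("{call GET_ORDER_SUMMARY(?, ?)}")
      stmt.setInt(1, 2005)
      stmt.registerOutParameter(2, Types.INTEGER)
      stmt.execute()
      println(s"Order count: ${stmt.getInt(2)}")
      stmt.close()
    } finally {
      conn.close()
    }
  }
}
```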
Confidential
PL/SQL Developer
Responsibilities:
- Wrote Stored Procedures in PL/SQL.
- Defragmented tables and implemented partitioning, compression, and indexes for improved performance and efficiency.
- Involved in table redesign, implementing partitioned tables and partitioned indexes to make the database faster and easier to maintain.
- Used the SQL Server SSIS tool to build high-performance data-integration solutions, including extraction, transformation, and load packages for data warehousing.
- Extracted data from XML files and loaded it into the database.
- Created and modified SQL*Plus, PL/SQL, and SQL*Loader scripts for data conversions.
- Worked on XML along with PL/SQL to develop and modify web forms.
- Designed data models and design specifications, and analyzed dependencies.
- Created indexes on tables to improve performance by eliminating full table scans, and created views to hide the actual tables and reduce the complexity of large queries.
- Involved in creating UNIX shell scripts.
- Maintained the logical and physical structure of the database.
- Created tablespaces, tables, views, and scripts for automating database activities.
- Coded various stored procedures, packages, and triggers to incorporate business logic into the application.
Environment: Oracle 9i/10g, PL/SQL, Erwin 4.1, C, C++, Oracle Designer 2000, Windows 2000, Toad, SQL*Plus.
